JupyterLab Configuration

The tdp-jupyter chart packages JupyterHub 5.3.0 with Spark integration for TDP on Kubernetes.

What is JupyterLab?

JupyterLab is the interactive notebook interface of TDP Kubernetes — the environment where analysts and data engineers write and run code directly in the browser, organize files, and explore data.

When a user logs in, JupyterLab opens in an isolated environment: dedicated storage, memory, and CPU, without interference from other users.

The main difference from a local Jupyter installation is the integration with Apache Spark: notebooks do not run Spark on the user's machine — they connect to the TDP Spark cluster (tdp-spark) and distribute processing across the cluster workers.

How JupyterLab works in TDP Kubernetes

JupyterLab is served by JupyterHub — the multi-user server that orchestrates the lifecycle of environments in Kubernetes.

JupyterHub handles authentication, creation, and termination of user pods; JupyterLab is the interface the user sees and works with inside each pod.

Instead of each analyst installing Jupyter locally, JupyterHub centralizes everything in the cluster: each login results in a dedicated Kubernetes pod running an isolated JupyterLab environment.

Learn more

See JupyterLab — Concepts for a complete overview of the tool, its architecture, and how it works.

Deployed components

| Component | Description |
| --- | --- |
| Hub | Central server that manages authentication and environment creation |
| Proxy | Reverse proxy that routes requests to each user's environment |
| Single-user pods | One Kubernetes pod per logged-in user, with the notebook and kernel |
| Spark integration | ConfigMap that points notebooks to the cluster's Spark master |

The single-user model

Each time a user logs in, the Hub creates a dedicated Pod for them in Kubernetes.

When the user finishes and stops the server, the Pod is terminated (but storage persists if configured with a PVC).

This means you need to size the cluster for the peak number of simultaneous users.
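Sizing for peak concurrency is simple arithmetic: multiply the per-user guarantees by the expected number of simultaneous users and add fixed overhead. A back-of-envelope sketch, where all figures are illustrative assumptions rather than chart defaults:

```python
# Back-of-envelope cluster sizing for peak concurrent users.
# All per-user figures below are assumptions, not chart defaults.
peak_users = 20
mem_per_user_gib = 2.0   # singleuser memory guarantee (assumed)
cpu_per_user = 0.5       # singleuser CPU guarantee (assumed)
overhead_gib = 4.0       # Hub, proxy and system pods (assumed)

total_mem_gib = peak_users * mem_per_user_gib + overhead_gib
total_cpu = peak_users * cpu_per_user

print(f"Plan for ~{total_mem_gib:.0f} GiB RAM and ~{total_cpu:.1f} CPUs")
```

Match the per-user figures to whatever you set under `tdp-jupyter.singleuser` so the estimate reflects your actual guarantees.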

Prerequisites

  • Kubernetes 1.19+, Helm 3.2+
  • PV provisioner
  • Spark cluster via the tdp-spark chart (or a compatible installation), when using Spark integration

Installation (OCI)

Terminal input
```shell
helm install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-jupyter \
  -n <namespace> --create-namespace
```

Main parameters

| Parameter | Description |
| --- | --- |
| tdp-jupyter.enabled | Enable JupyterHub |
| tdp-jupyter.hub.resources | Hub CPU/memory |
| tdp-jupyter.hub.config.JupyterHub.authenticator_class | Authentication (default: dummy) |
| tdp-jupyter.singleuser.* | User CPU/memory/storage |
| tdp-jupyter.proxy.service.* | Service type / NodePorts |
| TDPConfigurations.externalDatabase.* | Optional external PostgreSQL |
| tdpSparkIntegration.enabled | Spark integration |
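A minimal values file exercising these parameters might look as follows; the resource figures and guarantees are illustrative placeholders, not chart defaults, and the `singleuser` keys assume the standard JupyterHub chart schema:

```yaml
# Illustrative values file -- adjust figures to your workload.
tdp-jupyter:
  enabled: true
  hub:
    resources:
      limits:
        cpu: 1
        memory: 1Gi
  singleuser:
    cpu:
      guarantee: 0.5
      limit: 2
    memory:
      guarantee: 1G
      limit: 2G
  proxy:
    service:
      type: NodePort

tdpSparkIntegration:
  enabled: true
```

Pass it to the install command with `-f values.yaml`.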

Spark integration

```yaml
tdpSparkIntegration:
  enabled: true
  configMap:
    sparkConfig:
      "spark.executor.instances": "2"
      "spark.executor.memory": "4g"
      "spark.executor.cores": "3"

tdp-spark:
  spark:
    worker:
      replicaCount: 2
      resources:
        limits:
          cpu: 4
          memory: 6Gi
```

Adjust the Spark master (spark://...) and network policies to match the actual namespace and Spark release in your cluster; verify services and labels with kubectl get svc in the target environment.
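To see how the `sparkConfig` mapping reaches a notebook, it helps to render it in the `spark-defaults.conf` format Spark reads at startup. A small sketch, where the master URL is a placeholder (verify the real service name with `kubectl get svc`):

```python
# Sketch: render the ConfigMap's sparkConfig mapping as spark-defaults.conf
# lines ("key value" per line). The master URL below is an assumed
# placeholder, not a real service name.
spark_config = {
    "spark.master": "spark://<spark-master-service>:7077",  # assumed shape
    "spark.executor.instances": "2",
    "spark.executor.memory": "4g",
    "spark.executor.cores": "3",
}

defaults = "\n".join(f"{key} {value}" for key, value in spark_config.items())
print(defaults)
```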

External database (PostgreSQL)

```yaml
TDPConfigurations:
  externalDatabase:
    enabled: true
    recreate: false
    externalSecret:
      releaseName: "<postgresql-release-name>"

tdp-jupyter:
  hub:
    db:
      type: postgres
      url: "postgresql://<db-user>@<postgresql-host>.<namespace>.svc.cluster.local:5432/<database>"
      password: null
      upgrade: false
    extraEnv:
      PGPASSWORD:
        valueFrom:
          secretKeyRef:
            name: "<jupyter-db-secret-name>"
            key: password
```
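The `secretKeyRef` above expects a Secret holding the database password under the `password` key. In practice that Secret is usually created by the PostgreSQL release itself; the shape below is a hypothetical sketch for reference only (verify the real name and keys with `kubectl get secret`):

```yaml
# Hypothetical Secret shape -- the real one typically comes from the
# PostgreSQL release; names and keys here are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: <jupyter-db-secret-name>
type: Opaque
stringData:
  password: <db-password>
```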

Service: NodePort / LoadBalancer / ClusterIP

NodePort (default): set tdp-jupyter.proxy.service.type to NodePort; the node ports themselves come from the values file (see helm show values).

LoadBalancer / ClusterIP: set proxy.service.type to LoadBalancer or ClusterIP, depending on the desired exposure.
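A sketch of the override, assuming the standard JupyterHub chart layout under the tdp-jupyter key:

```yaml
tdp-jupyter:
  proxy:
    service:
      type: LoadBalancer   # or ClusterIP for in-cluster access only
```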

Storage

```yaml
singleuser:
  storage:
    dynamic:
      storageClass: <storage-class>
      capacity: 5Gi
```

Network policies

The chart exposes singleuser.networkPolicy with typical egress for Spark and S3 storage — use it as a template and adjust namespaceSelector, labels, and ports to your cluster (do not copy tenant/namespace names from another environment).
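A template along those lines, assuming the standard JupyterHub chart's singleuser.networkPolicy schema; every label, namespace, and port below is an assumption to be replaced with your cluster's real values:

```yaml
# Template only -- namespaceSelector labels and ports are assumptions;
# adjust to your Spark release and storage endpoint.
tdp-jupyter:
  singleuser:
    networkPolicy:
      enabled: true
      egress:
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: <spark-namespace>
          ports:
            - port: 7077   # Spark master (assumed default port)
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: <storage-namespace>
          ports:
            - port: 9000   # S3-compatible storage (assumed port)
```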

Access

NodePort

Terminal input
```shell
kubectl get nodes -o wide
```

Use http://<node-ip>:<http-nodeport> (port as per your values).

Port-forward

Terminal input
```shell
kubectl port-forward -n <namespace> svc/<jupyter-proxy-service-name> 8080:80
```

Open http://localhost:8080. The Service name typically derives from the release (verify with kubectl get svc).

LoadBalancer

Terminal input
```shell
kubectl get svc -n <namespace>
```

Use the external IP of the proxy Service when applicable.

Default authentication

  • User: admin
  • The chart sets an example initial password; you must change it before using it in production.
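Overriding the example password follows the standard JupyterHub chart config keys for the dummy authenticator (verify the exact keys against your chart version with helm show values):

```yaml
# Replace the example credentials before any production use.
tdp-jupyter:
  hub:
    config:
      JupyterHub:
        authenticator_class: dummy
      DummyAuthenticator:
        password: <strong-password>
      Authenticator:
        admin_users:
          - admin
```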

Security / LDAP

See Security — JupyterLab.

Troubleshooting

Terminal input
```shell
kubectl -n <namespace> get pods -l app.kubernetes.io/instance=<release>
kubectl -n <namespace> logs -l app.kubernetes.io/component=hub
kubectl -n <namespace> logs -l app.kubernetes.io/component=proxy
kubectl -n <namespace> get pvc -l app.kubernetes.io/instance=<release>
```

Uninstallation

Terminal input
```shell
helm uninstall <release> -n <namespace>
kubectl delete configmap tdp-spark-jupyter-integration -n <namespace>
```

For the complete list of keys, use helm show values with the chart version you installed.