# JupyterLab Configuration
The tdp-jupyter chart packages JupyterHub 5.3.0 with Spark integration for TDP on Kubernetes.
## What is JupyterLab?
JupyterLab is the interactive notebook interface of TDP Kubernetes — the environment where analysts and data engineers write and run code directly in the browser, organize files, and explore data.
When a user logs in, JupyterLab opens in an isolated environment: dedicated storage, memory, and CPU, without interference from other users.
The main difference from a local Jupyter installation is the integration with Apache Spark: notebooks do not run Spark on the user's machine — they connect to the TDP Spark cluster (tdp-spark) and distribute processing across the cluster workers.
## How JupyterLab works in TDP Kubernetes
JupyterLab is served by JupyterHub — the multi-user server that orchestrates the lifecycle of environments in Kubernetes.
JupyterHub handles authentication, creation, and termination of user pods; JupyterLab is the interface the user sees and works with inside each pod.
Instead of each analyst installing Jupyter locally, JupyterHub centralizes everything in the cluster: each login results in a dedicated Kubernetes pod running an isolated JupyterLab environment.
See JupyterLab — Concepts for a complete overview of the tool, its architecture, and how it works.
## Deployed components
| Component | Description |
|---|---|
| Hub | Central server that manages authentication and environment creation |
| Proxy | Reverse proxy that routes requests to each user's environment |
| Single-user pods | One Kubernetes pod per logged-in user, with the notebook and kernel |
| Spark integration | ConfigMap that points notebooks to the cluster's Spark master |
### The single-user model
Each time a user logs in, the Hub creates a dedicated Pod for them in Kubernetes.
When the user finishes and stops the server, the Pod is terminated (but storage persists if configured with a PVC).
This means you need to size the cluster for the peak number of simultaneous users.
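As a back-of-the-envelope sizing aid, the peak-user requirement can be estimated from per-pod resource requests and node capacity. The node and pod figures below are illustrative assumptions, not values from this chart; substitute your own `singleuser` requests and node sizes.

```python
# Rough capacity estimate for simultaneous JupyterLab users.
# All figures are illustrative assumptions -- substitute your own
# singleuser resource requests and node sizes.

def pods_per_node(node_cpu, node_mem_gi, pod_cpu, pod_mem_gi, overhead=0.85):
    """How many single-user pods fit on one node, keeping ~15% headroom
    for system daemons, the Hub, and the proxy."""
    by_cpu = int(node_cpu * overhead // pod_cpu)
    by_mem = int(node_mem_gi * overhead // pod_mem_gi)
    return min(by_cpu, by_mem)

# Example: 8-vCPU / 32 GiB nodes, each user pod requesting 1 CPU and 4 GiB.
capacity = pods_per_node(node_cpu=8, node_mem_gi=32, pod_cpu=1, pod_mem_gi=4)
peak_users = 20
nodes_needed = -(-peak_users // capacity)  # ceiling division
print(capacity, nodes_needed)  # -> 6 pods per node, 4 nodes for 20 users
```

The 15% headroom is a deliberate safety margin: the Hub, proxy, and kubelet daemons also consume node resources.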
## Prerequisites
- Kubernetes 1.19+, Helm 3.2+
- PV provisioner
- Spark cluster via the `tdp-spark` chart (or a compatible installation), when using Spark integration
## Installation (OCI)

```shell
helm install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-jupyter \
  -n <namespace> --create-namespace
```
## Main parameters

| Parameter | Description |
|---|---|
| `tdp-jupyter.enabled` | Enable JupyterHub |
| `tdp-jupyter.hub.resources` | Hub CPU/memory |
| `tdp-jupyter.hub.config.JupyterHub.authenticator_class` | Authentication (default: dummy) |
| `tdp-jupyter.singleuser.*` | User CPU/memory/storage |
| `tdp-jupyter.proxy.service.*` | Service type / NodePorts |
| `TDPConfigurations.externalDatabase.*` | Optional external PostgreSQL |
| `tdpSparkIntegration.enabled` | Spark integration |
## Spark integration

```yaml
tdpSparkIntegration:
  enabled: true
  configMap:
    sparkConfig:
      "spark.executor.instances": "2"
      "spark.executor.memory": "4g"
      "spark.executor.cores": "3"

tdp-spark:
  spark:
    worker:
      replicaCount: 2
      resources:
        limits:
          cpu: 4
          memory: 6Gi
```

Adjust the Spark master (`spark://...`) and network policies to match the actual namespace and Spark release in your cluster; verify Services and labels with `kubectl get svc` in the target environment.
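From inside a notebook, the connection string for the Spark master follows the standard in-cluster DNS pattern. The sketch below shows how that URL is built and how it would be used with PySpark; the service name `tdp-spark-master`, namespace `analytics`, and port `7077` are assumptions, so confirm them with `kubectl get svc` before use.

```python
# Sketch: building the in-cluster spark:// URL for the TDP Spark master.
# Service name, namespace, and port below are ASSUMPTIONS -- verify with
# `kubectl get svc` in your environment.

def spark_master_url(service: str, namespace: str, port: int = 7077) -> str:
    """Build the in-cluster spark:// URL for a Spark master Service."""
    return f"spark://{service}.{namespace}.svc.cluster.local:{port}"

master = spark_master_url("tdp-spark-master", "analytics")
print(master)  # -> spark://tdp-spark-master.analytics.svc.cluster.local:7077

# In a notebook you would then create a session (requires pyspark and a
# reachable master; shown as a comment since it needs a live cluster):
#
#     from pyspark.sql import SparkSession
#     spark = (SparkSession.builder
#              .master(master)
#              .config("spark.executor.instances", "2")
#              .config("spark.executor.memory", "4g")
#              .getOrCreate())
```

The `spark.executor.*` values mirror the `sparkConfig` keys above, which the integration ConfigMap makes available to notebooks.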
## External database (PostgreSQL)

```yaml
TDPConfigurations:
  externalDatabase:
    enabled: true
    recreate: false
    externalSecret:
      releaseName: "<postgresql-release-name>"

tdp-jupyter:
  hub:
    db:
      type: postgres
      url: "postgresql://<db-user>@<postgresql-host>.<namespace>.svc.cluster.local:5432/<database>"
      password: null
      upgrade: false
    extraEnv:
      PGPASSWORD:
        valueFrom:
          secretKeyRef:
            name: "<jupyter-db-secret-name>"
            key: password
```
## Service: NodePort / LoadBalancer / ClusterIP

- NodePort (default): set `tdp-jupyter.proxy.service.type: NodePort` and the `nodePorts` as per the values file (see `helm show values`).
- LoadBalancer / ClusterIP: set `proxy.service.type` to `LoadBalancer` or `ClusterIP`, depending on the desired exposure.
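For example, a values override switching the proxy to a cloud load balancer could look like the fragment below (key path taken from the parameter table above; verify against `helm show values`):

```yaml
tdp-jupyter:
  proxy:
    service:
      type: LoadBalancer
```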
Storage
singleuser:
storage:
dynamic:
storageClass: <storage-class>
capacity: 5Gi
## Network policies

The chart exposes `singleuser.networkPolicy` with typical egress for Spark and S3 storage. Use it as a template and adjust `namespaceSelector`, labels, and ports to your cluster (do not copy tenant/namespace names from another environment).
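A hedged sketch of what such an egress rule might look like, using standard Kubernetes NetworkPolicy syntax. The namespace label and port are placeholders (Spark's standalone master commonly listens on 7077); adapt both to your cluster.

```yaml
tdp-jupyter:
  singleuser:
    networkPolicy:
      egress:
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: <spark-namespace>
          ports:
            - protocol: TCP
              port: 7077  # Spark master port; verify in your cluster
```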
## Access

### NodePort

```shell
kubectl get nodes -o wide
```

Use `http://<node-ip>:<http-nodeport>` (port as per your values).

### Port-forward

```shell
kubectl port-forward -n <namespace> svc/<jupyter-proxy-service-name> 8080:80
```

Open `http://localhost:8080`. The Service name typically derives from the release (verify with `kubectl get svc`).

### LoadBalancer

```shell
kubectl get svc -n <namespace>
```

Use the external IP of the proxy Service when applicable.
## Default authentication

- User: `admin`
- The chart sets an example initial password; you must change it before using it in production.
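If the chart follows the upstream JupyterHub Helm chart convention of forwarding `hub.config` to JupyterHub, the dummy authenticator's password can be overridden in your values file. The key path below is an assumption; confirm it with `helm show values`.

```yaml
tdp-jupyter:
  hub:
    config:
      DummyAuthenticator:
        password: "<strong-password>"
```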
## Security / LDAP
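As a starting point, JupyterHub's `ldapauthenticator` can typically be enabled through `hub.config`, assuming the chart forwards `hub.config` to JupyterHub as upstream JupyterHub Helm charts do and the hub image includes the `jupyterhub-ldapauthenticator` package. All values below are placeholders for illustration.

```yaml
tdp-jupyter:
  hub:
    config:
      JupyterHub:
        authenticator_class: ldapauthenticator.LDAPAuthenticator
      LDAPAuthenticator:
        server_address: "<ldap-server-host>"
        bind_dn_template:
          - "uid={username},ou=people,dc=example,dc=com"
```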
## Troubleshooting

```shell
kubectl -n <namespace> get pods -l app.kubernetes.io/instance=<release>
kubectl -n <namespace> logs -l app.kubernetes.io/component=hub
kubectl -n <namespace> logs -l app.kubernetes.io/component=proxy
kubectl -n <namespace> get pvc -l app.kubernetes.io/instance=<release>
```
## Uninstallation

```shell
helm uninstall <release> -n <namespace>
kubectl delete configmap tdp-spark-jupyter-integration -n <namespace>
```

For the complete list of keys, use `helm show values` with the chart version you installed.