# JupyterLab Configuration
The tdp-jupyter chart packages JupyterHub 5.3.0 with Spark integration for TDP on Kubernetes.
## What is JupyterLab?
JupyterLab is the interactive notebook interface of TDP Kubernetes — the environment where analysts and data engineers write and run code directly in the browser, organize files, and explore data.
When a user logs in, JupyterLab opens in an isolated environment: dedicated storage, memory, and CPU, without interference from other users.
The main difference from a local Jupyter installation is the integration with Apache Spark: notebooks do not run Spark on the user's machine — they connect to the TDP Spark cluster (tdp-spark) and distribute processing across the cluster workers.
## How JupyterLab works in TDP Kubernetes
JupyterLab is served by JupyterHub — the multi-user server that orchestrates the lifecycle of environments in Kubernetes.
JupyterHub handles authentication, creation, and termination of user pods; JupyterLab is the interface the user sees and works with inside each pod.
Instead of each analyst installing Jupyter locally, JupyterHub centralizes everything in the cluster: each login results in a dedicated Kubernetes pod running an isolated JupyterLab environment.
See JupyterLab — Concepts for a complete overview of the tool, its architecture, and how it works.
## Deployed components
| Component | Description |
|---|---|
| Hub | Central server that manages authentication and environment creation |
| Proxy | Reverse proxy that routes requests to each user's environment |
| Single-user pods | One Kubernetes pod per logged-in user, with the notebook and kernel |
| Spark integration | ConfigMap that points notebooks to the cluster's Spark master |
### The single-user model
Each time a user logs in, the Hub creates a dedicated Pod for them in Kubernetes.
When the user finishes and stops the server, the Pod is terminated (but storage persists if configured with a PVC).
This means you need to size the cluster for the peak number of simultaneous users.
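As a back-of-the-envelope sizing aid, the peak-user requirement can be estimated from per-pod resource requests and node capacity. The node and pod figures below are illustrative assumptions, not values from this chart; substitute your own `singleuser` requests and node sizes.

```python
# Rough capacity estimate for simultaneous JupyterLab users.
# All figures are illustrative assumptions -- substitute your own
# singleuser resource requests and node sizes.

def pods_per_node(node_cpu, node_mem_gi, pod_cpu, pod_mem_gi, overhead=0.85):
    """How many single-user pods fit on one node, keeping ~15% headroom
    for system daemons, the Hub, and the proxy."""
    by_cpu = int(node_cpu * overhead // pod_cpu)
    by_mem = int(node_mem_gi * overhead // pod_mem_gi)
    return min(by_cpu, by_mem)

# Example: 8-vCPU / 32 GiB nodes, each user pod requesting 1 CPU and 4 GiB.
capacity = pods_per_node(node_cpu=8, node_mem_gi=32, pod_cpu=1, pod_mem_gi=4)
peak_users = 20
nodes_needed = -(-peak_users // capacity)  # ceiling division
print(capacity, nodes_needed)  # -> 6 pods per node, 4 nodes for 20 users
```

The 15% headroom is a deliberate safety margin: the Hub, proxy, and kubelet daemons also consume node resources.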
## Prerequisites
- Kubernetes 1.19+, Helm 3.2+
- PV provisioner
- Spark cluster via the `tdp-spark` chart (or a compatible installation), when using Spark integration
## Installation (OCI)

```shell
helm install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-jupyter \
  -n <namespace> --create-namespace
```
## Main parameters

| Parameter | Description |
|---|---|
| `tdp-jupyter.enabled` | Enable JupyterHub |
| `tdp-jupyter.hub.resources` | Hub CPU/memory |
| `tdp-jupyter.hub.config.JupyterHub.authenticator_class` | Authentication (default: dummy) |
| `tdp-jupyter.singleuser.*` | User CPU/memory/storage |
| `tdp-jupyter.proxy.service.*` | Service type / NodePorts |
| `TDPConfigurations.externalDatabase.*` | Optional external PostgreSQL |
| `tdpSparkIntegration.enabled` | Spark integration |
## Spark integration

```yaml
tdpSparkIntegration:
  enabled: true
  configMap:
    sparkConfig:
      "spark.executor.instances": "2"
      "spark.executor.memory": "4g"
      "spark.executor.cores": "3"

tdp-spark:
  spark:
    worker:
      replicaCount: 2
      resources:
        limits:
          cpu: 4
          memory: 6Gi
```

Adjust the Spark master (`spark://...`) and network policies to match the actual namespace and Spark release in your cluster; verify Services and labels with `kubectl get svc` in the target environment.
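From inside a notebook, the connection string for the Spark master follows the standard in-cluster DNS pattern. The sketch below shows how that URL is built and how it would be used with PySpark; the service name `tdp-spark-master`, namespace `analytics`, and port `7077` are assumptions, so confirm them with `kubectl get svc` before use.

```python
# Sketch: building the in-cluster spark:// URL for the TDP Spark master.
# Service name, namespace, and port below are ASSUMPTIONS -- verify with
# `kubectl get svc` in your environment.

def spark_master_url(service: str, namespace: str, port: int = 7077) -> str:
    """Build the in-cluster spark:// URL for a Spark master Service."""
    return f"spark://{service}.{namespace}.svc.cluster.local:{port}"

master = spark_master_url("tdp-spark-master", "analytics")
print(master)  # -> spark://tdp-spark-master.analytics.svc.cluster.local:7077

# In a notebook you would then create a session (requires pyspark and a
# reachable master; shown as a comment since it needs a live cluster):
#
#     from pyspark.sql import SparkSession
#     spark = (SparkSession.builder
#              .master(master)
#              .config("spark.executor.instances", "2")
#              .config("spark.executor.memory", "4g")
#              .getOrCreate())
```

The `spark.executor.*` values mirror the `sparkConfig` keys above, which the integration ConfigMap makes available to notebooks.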
## External database (PostgreSQL)

```yaml
TDPConfigurations:
  externalDatabase:
    enabled: true
    recreate: false
    externalSecret:
      releaseName: "<postgresql-release-name>"

tdp-jupyter:
  hub:
    db:
      type: postgres
      url: "postgresql://<db-user>@<postgresql-host>.<namespace>.svc.cluster.local:5432/<database>"
      password: null
      upgrade: false
    extraEnv:
      PGPASSWORD:
        valueFrom:
          secretKeyRef:
            name: "<jupyter-db-secret-name>"
            key: password
```
## Service: NodePort / LoadBalancer / ClusterIP

- NodePort (default): set `tdp-jupyter.proxy.service.type: NodePort` and the `nodePorts` as per the values file (see `helm show values`).
- LoadBalancer / ClusterIP: set `proxy.service.type` to `LoadBalancer` or `ClusterIP`, depending on the desired exposure.
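For example, a values override switching the proxy to a cloud load balancer could look like the fragment below (key path taken from the parameter table above; verify against `helm show values`):

```yaml
tdp-jupyter:
  proxy:
    service:
      type: LoadBalancer
```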
Storage
singleuser:
storage:
dynamic:
storageClass: <storage-class>
capacity: 5Gi
## Network policies

The chart exposes `singleuser.networkPolicy` with typical egress for Spark and S3 storage. Use it as a template and adjust `namespaceSelector`, labels, and ports to your cluster (do not copy tenant/namespace names from another environment).
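A hedged sketch of what such an egress rule might look like, using standard Kubernetes NetworkPolicy syntax. The namespace label and port are placeholders (Spark's standalone master commonly listens on 7077); adapt both to your cluster.

```yaml
tdp-jupyter:
  singleuser:
    networkPolicy:
      egress:
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: <spark-namespace>
          ports:
            - protocol: TCP
              port: 7077  # Spark master port; verify in your cluster
```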
## Access

### NodePort

```shell
kubectl get nodes -o wide
```

Use `http://<node-ip>:<http-nodeport>` (port as per your values).

### Port-forward

```shell
kubectl port-forward -n <namespace> svc/<jupyter-proxy-service-name> 8080:80
```

Open `http://localhost:8080`. The Service name typically derives from the release (verify with `kubectl get svc`).

### LoadBalancer

```shell
kubectl get svc -n <namespace>
```

Use the external IP of the proxy Service when applicable.
## Default authentication

- User: `admin`
- The chart sets an example initial password; you must change it before using it in production.
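If the chart follows the upstream JupyterHub Helm chart convention of forwarding `hub.config` to JupyterHub, the dummy authenticator's password can be overridden in your values file. The key path below is an assumption; confirm it with `helm show values`.

```yaml
tdp-jupyter:
  hub:
    config:
      DummyAuthenticator:
        password: "<strong-password>"
```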
## Security / LDAP
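As a starting point, JupyterHub's `ldapauthenticator` can typically be enabled through `hub.config`, assuming the chart forwards `hub.config` to JupyterHub as upstream JupyterHub Helm charts do and the hub image includes the `jupyterhub-ldapauthenticator` package. All values below are placeholders for illustration.

```yaml
tdp-jupyter:
  hub:
    config:
      JupyterHub:
        authenticator_class: ldapauthenticator.LDAPAuthenticator
      LDAPAuthenticator:
        server_address: "<ldap-server-host>"
        bind_dn_template:
          - "uid={username},ou=people,dc=example,dc=com"
```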
## Troubleshooting

```shell
kubectl -n <namespace> get pods -l app.kubernetes.io/instance=<release>
kubectl -n <namespace> logs -l app.kubernetes.io/component=hub
kubectl -n <namespace> logs -l app.kubernetes.io/component=proxy
kubectl -n <namespace> get pvc -l app.kubernetes.io/instance=<release>
```
## Uninstallation

```shell
helm uninstall <release> -n <namespace>
kubectl delete configmap tdp-spark-jupyter-integration -n <namespace>
```

For the complete list of keys, use `helm show values` with the chart version you installed.