Adding OAuth2 to Jupyter Notebooks on Kubernetes

February 1, 2024

TrueFoundry users can deploy Jupyter Notebooks on their personal cloud accounts, such as AWS, Azure, or GCP. This feature allows them to conduct machine learning experiments and training jobs on their own machines with ease. Initially, notebooks deployed through TrueFoundry were secured using a username-password combination. However, in response to widespread client requests, we have integrated Single Sign-On. This means users can now conveniently access their notebooks with the same login they use for TrueFoundry. This blog post delves into the specifics of how we implemented this feature.

Figure: Launching a Jupyter Notebook on TrueFoundry

Notebooks on TrueFoundry

TrueFoundry internally uses a fork of the Kubeflow Notebook Controller to orchestrate the deployment of notebooks. The controller provides several features that we rely on, such as:

  1. Simplified notebook spec: The Kubeflow Notebook APIs are simple and the controller orchestrates the creation of the Jupyter Notebook deployments.
  2. Automatic Culling: The controller automatically shuts down the notebook after a certain period of inactivity. This is incredibly useful to our clients who run experiments on notebooks backed by GPU machines.
  3. Persistent Home Directory: The controller takes care of creating a persistent volume that saves user progress on the notebook across sessions.
  4. Extensible base images: The controller supports a suite of base notebook images of Jupyter Notebook and VS Code maintained by TrueFoundry. The user can extend the features on these Docker images by adding a startup script or installing specific libraries.
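
To make the automatic culling idea from the list above concrete, here is a minimal Python sketch of the decision a culler has to make. It is not the controller's actual code: the function name `should_cull` and the 60-minute default are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def should_cull(last_activity, now, idle_limit=timedelta(minutes=60)):
    """Return True when a notebook has been idle for at least idle_limit.

    Hypothetical helper: the 60-minute default is an illustrative
    assumption, not a TrueFoundry or Kubeflow default.
    """
    return now - last_activity >= idle_limit


if __name__ == "__main__":
    now = datetime(2024, 2, 1, 12, 0, tzinfo=timezone.utc)
    # Idle for two hours, so this notebook is a candidate for shutdown.
    print(should_cull(now - timedelta(hours=2), now))
    # Active ten minutes ago, so it keeps running.
    print(should_cull(now - timedelta(minutes=10), now))
```

A real culler also needs a trustworthy source for the last-activity timestamp; that part is left out of the sketch.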

For context, here’s what a simple Kubeflow Notebook object looks like:

apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
 name: my-notebook
spec:
 template:
   spec:
     containers:
       - name: my-notebook
         image: kubeflownotebookswg/jupyter:master
         args:
           [
             "start.sh",
             "lab",
             "--LabApp.token=''",
             "--LabApp.allow_remote_access='True'",
             "--LabApp.allow_root='True'",
             "--LabApp.ip='*'",
             "--LabApp.base_url=/test/my-notebook/",
             "--port=8888",
             "--no-browser",
           ]

Basic Auth for Notebooks

Before implementing OAuth2, TrueFoundry let users secure their public notebooks with basic authentication, so that only authorized individuals could access the sensitive content of these notebooks. To implement this feature, we used WebAssembly (Wasm) plugins running in Istio's proxy, which is Envoy.

Istio, an open-source service mesh, offers a framework for managing network communications between various service workloads. With Istio, TrueFoundry was empowered to inject custom logic directly into the network layer, which is managed by the Envoy proxy. This approach allowed for effective control and security of the traffic flowing to and from their Jupyter Notebooks. The key to the implementation of basic auth was the WasmPlugin, a feature of Istio that facilitates the deployment of WebAssembly modules within the Envoy proxy.

The basic authentication WasmPlugin is inserted into the filter chain of the Envoy proxy. Filters in this chain handle concerns such as access control, transformation, data enrichment, and auditing. Here’s a simplified version of the spec that adds the basic auth filter to the Envoy filter chain:

apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
 name: basic-auth
 namespace: istio-ingress
spec:
 phase: AUTHN
 pluginConfig:
   basic_auth_rules:
     - credentials:
         - user:pass
       hosts: www.example.com
       prefix: /secret/
 selector:
   matchLabels:
     istio: ingressgateway
 url: oci://ghcr.io/istio-ecosystem/wasm-extensions/basic_auth:1.12.0
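
For readers unfamiliar with what a basic auth rule checks, the following Python sketch mimics the logic of a single rule with a `prefix` and a list of `user:pass` credentials. It is an illustration only, not the plugin's implementation; the function name `is_authorized` is hypothetical and the `hosts` field is ignored for brevity.

```python
import base64
import hmac


def is_authorized(path, authorization_header, prefix, credentials):
    """Evaluate one basic-auth rule (hypothetical helper, not the Wasm plugin)."""
    # Paths outside the protected prefix are not subject to this rule.
    if not path.startswith(prefix):
        return True
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    token = authorization_header[len("Basic "):]
    try:
        supplied = base64.b64decode(token, validate=True)
    except ValueError:
        # Malformed base64 is treated as a failed login.
        return False
    # Compare in constant time against each configured "user:pass" pair.
    return any(
        hmac.compare_digest(supplied, cred.encode("utf-8")) for cred in credentials
    )


if __name__ == "__main__":
    header = "Basic " + base64.b64encode(b"user:pass").decode("ascii")
    print(is_authorized("/secret/notebook", header, "/secret/", ["user:pass"]))
```

A request to a path under `/secret/` without a matching `Authorization: Basic ...` header would be rejected, which is the behavior the rule in the spec above is meant to enforce.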

OAuth2 for Notebooks

To implement OAuth2 for our notebooks, we again used an Envoy filter, but the approach differed from that of basic authentication. Unlike basic auth, where we could conveniently insert a pre-built WasmPlugin into the filter chain, OAuth2 required a more tailored solution. To achieve this, we used Envoy's HTTP filter designed specifically for OAuth2. At TrueFoundry, our Single Sign-On system integrates with FusionAuth, which serves as our OAuth provider.

Here’s what the EnvoyFilter spec looks like; refer to the comments in the file for more details:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
 name: truefoundry-notebook-tfy-oauth2  # Name of the EnvoyFilter
 namespace: auth-test  # Namespace where the EnvoyFilter is deployed
spec:
 workloadSelector:
   labels:
     truefoundry.com/application: truefoundry-notebook  # Selector targeting workloads with specific labels
 configPatches:
 - applyTo: CLUSTER
   match:
     context: SIDECAR_OUTBOUND
   patch:
     operation: ADD
     value:
       name: tfy-oauth2  # Name of the cluster for OAuth2 authentication service
       type: LOGICAL_DNS  # Type of service discovery (DNS)
       connect_timeout: 5s  # Timeout for establishing a connection
       lb_policy: ROUND_ROBIN  # Load balancing policy
       # other load balancing config
 - applyTo: HTTP_FILTER
   match:
     context: SIDECAR_INBOUND
     listener:
       filterChain:
         filter:
           name: "envoy.filters.network.http_connection_manager"
           subFilter:
             name: envoy.filters.http.jwt_authn
   patch:
     operation: INSERT_BEFORE  # Inserting this filter before the JWT auth filter
     value:
      name: envoy.filters.http.tfy-oauth  # Name of the OAuth filter
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.oauth2.v3.OAuth2
        config:
          use_refresh_token: false  # Whether to use a refresh token
          pass_through_matcher:
            - name: Authorization
              present_match: true  # Pass through if Authorization header is present
          forward_bearer_token: true  # Forward bearer token to upstream
          auth_type: BASIC_AUTH  # Type of authentication used
          token_endpoint:
            cluster: tfy-oauth2  # Cluster for token endpoint
            uri: <token-endpoint-uri-of-oauth-provider>
            timeout: 5s  # Timeout for token endpoint
          authorization_endpoint: <authorization-endpoint-uri-of-oauth-provider>
          redirect_uri: https://%REQ(:authority)%/truefoundry-notebook/_auth/callback  # Redirect URI for callback
          redirect_path_matcher:
            path:
              exact: /truefoundry-notebook/_auth/callback  # Path for redirect URI
          signout_path:
            path:
              exact: /truefoundry-notebook/_auth/signout  # Path for signout
          credentials:
            client_id: <client-id-for-oauth>
            token_secret:
               # configuration to fetch token secret
               # read more about how we fetch secrets here:
               # https://www.envoyproxy.io/docs/envoy/latest/configuration/security/secret
            hmac_secret:
               # configuration to fetch hmac

When a user attempts to access a service protected by the OAuth2 filter for the first time, they are redirected to the authorization_endpoint. This endpoint is the URL of our external OAuth Provider, which, in our implementation, is the FusionAuth-based TrueFoundry login modal. This redirection is a critical step in the OAuth process, guiding users to a secure location where they can authenticate and consequently grant the necessary permissions for access to the service.

Once the login is complete, FusionAuth redirects the user to the redirect_uri (configured in the filter specification), appending a secret, temporary authorization code as a query parameter. The filter intercepts this request and calls the token_endpoint to exchange the code for a JWT. Finally, the filter sets cookies containing the JWT.
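
The redirect and code exchange described above follow the standard OAuth2 authorization code flow. The Python sketch below shows the three pieces of data involved: the URL the user is redirected to, the code extracted from the callback, and the form body sent to the token endpoint. The Envoy filter does all of this internally; the function names, the `state` handling, the `openid` scope, and the example URLs are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            state, scope="openid"):
    """Build the URL the browser is redirected to for login."""
    query = urlencode({
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,           # opaque value echoed back on the callback
        "scope": scope,           # assumed scope, for illustration only
    })
    return f"{authorization_endpoint}?{query}"


def extract_code(callback_url, expected_state):
    """Pull the authorization code out of the callback request."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch")
    return params["code"][0]


def build_token_request(code, redirect_uri):
    """Form-encoded body for exchanging the code at the token endpoint."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    })


if __name__ == "__main__":
    callback = "https://nb.example.com/truefoundry-notebook/_auth/callback"
    print(build_authorization_url(
        "https://auth.example.com/oauth2/authorize", "my-client", callback, "xyz"))
    code = extract_code(callback + "?code=abc&state=xyz", "xyz")
    print(build_token_request(code, callback))
```

With `auth_type: BASIC_AUTH`, as in the spec above, the client ID and secret travel in an `Authorization: Basic` header on the token request rather than in this body.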

On subsequent requests, the browser sends the cookie and, because forward_bearer_token is enabled, the JWT is set as the value of the Authorization header. Requests that already carry an Authorization header skip the OAuth2 flow (see pass_through_matcher in the spec). To confirm that the JWT is valid, we create a RequestAuthentication policy that checks it against the OAuth provider:

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
 # ...
spec:
 selector:
   # ...
 jwtRules:
 - issuer: "truefoundry.com"
   fromHeaders:
   - name: Authorization
     prefix: "Bearer "
   audiences:
     - <client-id>
   jwksUri: <oauth-provider-jwks-uri>
   forwardOriginalToken: true
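
To show which parts of the token the jwtRules above refer to, here is a small Python sketch that extracts the bearer token from the header and compares the `iss` and `aud` claims. It deliberately skips signature verification against the JWKS, which the real policy performs, so it must not be used as an authentication check; all function names are hypothetical.

```python
import base64
import json


def extract_bearer(authorization_header, prefix="Bearer "):
    """Return the token after the prefix, or None if the header does not match."""
    if authorization_header and authorization_header.startswith(prefix):
        return authorization_header[len(prefix):]
    return None


def decode_claims(jwt):
    """Decode the JWT payload WITHOUT verifying the signature (illustration only)."""
    _header, payload_b64, _signature = jwt.split(".")
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_match(claims, issuer, audience):
    """Check the issuer and audience claims against the expected values."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return claims.get("iss") == issuer and audience in audiences


if __name__ == "__main__":
    payload = base64.urlsafe_b64encode(
        json.dumps({"iss": "truefoundry.com", "aud": "my-client"}).encode()
    ).decode().rstrip("=")
    token = extract_bearer("Bearer e30." + payload + ".signature")
    print(claims_match(decode_claims(token), "truefoundry.com", "my-client"))
```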

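Finally, we add an AuthorizationPolicy that specifies which requests must carry a valid JWT. RequestAuthentication on its own only rejects requests with an invalid token; requests with no token at all are still let through. The policy below therefore denies any request on port 8888 that has no authenticated request principal: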

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: best-notebook-tfy-oauth2
 namespace: auth-test
spec:
 selector:
   matchLabels:
     truefoundry.com/application: best-notebook
 action: DENY
 rules:
 - from:
   - source:
       notRequestPrincipals: ["*"]
   to:
     - operation:
         ports:
           - "8888"
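
The DENY rule above can be read as a simple predicate: a request is rejected when it has no request principal and targets a protected port. The Python sketch below restates that predicate; it illustrates the rule's logic and does not show how Istio evaluates policies. The function name and the sample principal are assumptions.

```python
def is_denied(request_principal, destination_port, protected_ports=("8888",)):
    """Return True when the DENY rule would match the request.

    request_principal is None or empty when no valid JWT was presented.
    """
    has_principal = bool(request_principal)
    on_protected_port = str(destination_port) in protected_ports
    return (not has_principal) and on_protected_port


if __name__ == "__main__":
    # No JWT on the notebook port: denied.
    print(is_denied(None, 8888))
    # Valid JWT (principal set) on the notebook port: allowed.
    print(is_denied("truefoundry.com/user-123", 8888))
```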
