1. Check Prerequisites

1.1. Substra version

Substra is a set of microservices that are released together under a single version number. Since we install the services one by one, however, we need to know the version of each individual component.

Look up, in the Compatibility table, the Helm chart versions required for the orchestrator, the backend and the frontend. Each chart already references the matching Docker image version, so the chart version is all you need.

1.2. Local tools

Install:

kubectl
helm

Add the Substra Helm repository:

helm repo add substra https://substra.github.io/charts/
helm repo update
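With the repository added, you can list the chart versions it publishes and keep the ones from the Compatibility table in variables, so that every later Helm command installs exactly those versions. This is a minimal sketch: the variable and function names are hypothetical, and the `x.y.z` values are placeholders for the versions given in the table.

```shell
# Hypothetical variable names. Replace each placeholder with the chart version
# listed in the Compatibility table for the Substra release you are deploying.
ORCHESTRATOR_CHART_VERSION="x.y.z"
BACKEND_CHART_VERSION="x.y.z"
FRONTEND_CHART_VERSION="x.y.z"

# List every published version of every chart in the substra repository,
# to confirm the pinned versions exist before installing them.
list_substra_chart_versions() {
  helm search repo substra --versions
}
```

After `helm repo update`, run `list_substra_chart_versions`, then pass for example `--version "$ORCHESTRATOR_CHART_VERSION"` to the corresponding `helm upgrade --install` command; `--version` is Helm's standard flag for pinning a chart version.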

Also install:

curl, to check that the HTTP endpoints work
gRPCurl, to check that the gRPC endpoint works
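A quick way to confirm that all four tools are on your PATH before going further is a small loop over `command -v`. This is a convenience sketch, not part of the Substra tooling.

```shell
# Report which of the tools used in this guide are available on PATH.
missing=0
for tool in kubectl helm curl grpcurl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing tool(s) missing"
```

For the two Kubernetes tools, `kubectl version --client` and `helm version` additionally print the installed versions.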

1.3. Infrastructure

Substra is a federated learning tool; as such, it makes little sense to run only one node, or to run several nodes on the same cluster separated only by namespaces.

Therefore, this guide deploys on two separate Kubernetes clusters that communicate over the internet.

Throughout the guide, endpoints are given hostnames. On the internet, this means owning a domain name and setting up DNS: every time you see DOMAIN, replace it with the domain you are deploying under.

Exposing endpoints on the internet also requires TLS certificates from a certificate authority; this guide uses Let’s Encrypt.

Note

It is entirely possible to host several Substra nodes on the same cluster, to have them communicate over a private network with a private CA, or to assign hostnames differently.

1.3.1. In practice

1.3.1.1. Clusters

Set up two clusters. They must be able to provision PersistentVolumeClaims (PVCs) on the fly and to expose ingresses to the internet. For this, we recommend a managed Kubernetes service such as Google GKE, Azure AKS or Amazon EKS.

From here on, we refer to the two clusters as cluster-1 and cluster-2.
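Every cluster-level command in this guide has to be run once per cluster. One way to do that, assuming your kubeconfig has contexts named cluster-1 and cluster-2 (rename them to match yours), is to pass `--context` to kubectl and `--kube-context` to helm. The hypothetical helper below uses this to check the first requirement above: a StorageClass marked `(default)` is what lets PVCs that do not name a class be provisioned on the fly.

```shell
# Hypothetical helper: list the StorageClasses of the cluster behind a
# kubeconfig context. One of them should be marked "(default)".
check_storage_classes() {
  local ctx="$1"
  echo "== $ctx =="
  kubectl --context "$ctx" get storageclass
}
```

Usage: `for ctx in cluster-1 cluster-2; do check_storage_classes "$ctx"; done`.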

Each cluster also needs an ingress controller for routing (ingress-nginx) and a certificate manager (cert-manager). Install both on each cluster, replacing YOUR_EMAIL_HERE with your email address:

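# Ingress controller: routes incoming HTTP(S) traffic to services by hostname
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# cert-manager: obtains and renews TLS certificates; installCRDs=true also installs its CRDs.
# --wait returns only once the cert-manager Deployments are ready, so the
# ClusterIssuer resources applied next are less likely to be rejected by its webhook.
helm upgrade --install \
  cert-manager cert-manager \
  --repo https://charts.jetstack.io \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true \
  --wait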

kubectl apply -f - << "EOF"
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: YOUR_EMAIL_HERE
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - http01:
          ingress:
            class: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: YOUR_EMAIL_HERE
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
EOF
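The commands above act on the current kubectl context; to target each cluster explicitly, add `--kube-context cluster-1` (then `cluster-2`) to the helm commands and `--context` to kubectl. Once both installs are done, a hypothetical helper like the one below, assuming kubeconfig contexts named after the clusters, shows whether the stack is up on a given cluster.

```shell
# Hypothetical helper: show the ingress and certificate stack on one cluster.
check_ingress_stack() {
  local ctx="$1"
  # The ingress-nginx controller and the cert-manager pods should be Running.
  kubectl --context "$ctx" -n ingress-nginx get pods
  kubectl --context "$ctx" -n cert-manager get pods
  # READY should become True once each issuer has registered its ACME account.
  kubectl --context "$ctx" get clusterissuer letsencrypt-staging letsencrypt-prod
}
```

Usage: `check_ingress_stack cluster-1`, then `check_ingress_stack cluster-2`.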

The last command creates two certificate issuers: letsencrypt-prod, which issues trusted certificates for endpoints exposed on the internet, and letsencrypt-staging, which issues untrusted test certificates for development and has much looser rate limits.

1.3.1.2. DNS

The most convenient way to handle DNS is probably to set one wildcard record per cluster and forget about it. Once ingress-nginx is installed, its controller Service should receive an external IP address, which you can then put in the DNS records:

DNS zone file for DOMAIN

*.cluster-1 300 IN A NGINX_1_IP
*.cluster-2 300 IN A NGINX_2_IP

This way, any hostname such as whatever.cluster-1.DOMAIN resolves to that cluster's ingress controller, which then routes the traffic to the correct service based on the hostname (this is what Ingress objects are for).
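Both DNS steps can be sketched as shell helpers, assuming kubeconfig contexts named cluster-1 and cluster-2 and using example.com as a stand-in for DOMAIN. `ingress_address` reads the address to put in each record; on some providers, notably Amazon EKS, the load balancer exposes a hostname rather than an IP, in which case a CNAME record takes the place of the A record. `probe_wildcard` then checks that an arbitrary name resolves and reaches ingress-nginx; it assumes `dig` is installed. Both function names are hypothetical.

```shell
# Placeholder for your own domain (the DOMAIN of the zone file).
DOMAIN="example.com"

# Print the external address of the ingress-nginx controller on one cluster.
# Assumes the Service is named ingress-nginx-controller, as created by the
# ingress-nginx chart for a release named ingress-nginx.
ingress_address() {
  local ctx="$1"
  # Concatenates the ip and hostname fields; normally only one of them is set.
  kubectl --context "$ctx" -n ingress-nginx get service ingress-nginx-controller \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
  echo
}

# Check that an arbitrary name under a cluster's wildcard record resolves
# and is answered by ingress-nginx.
probe_wildcard() {
  local host="whatever.$1.$DOMAIN"
  # Should print the address you put in the record for this cluster.
  dig +short "$host"
  # No Ingress claims this hostname, so ingress-nginx's default backend
  # should answer with HTTP 404.
  curl -s -o /dev/null -w '%{http_code}\n' "http://$host/"
}
```

Usage: run `ingress_address cluster-1` to fill in NGINX_1_IP, and once the record is published, `probe_wildcard cluster-1` should print that address followed by `404`.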