1. Consider Prerequisites
1.1. Substra version
Substra is a set of microservices that are released together under a single version number. Since we are installing the services one by one, we need to know the actual version of each one.
Check the Compatibility table for the Helm chart version needed for the orchestrator, backend and frontend. The corresponding Docker app version is already configured in there, so it’s all you need.
1.2. Local tools
Add the Substra Helm repository:

.. code-block:: shell

   helm repo add substra https://substra.github.io/charts/
   helm repo update

Also install:

- curl, for making sure the HTTP endpoints work
- gRPCurl, for making sure the gRPC endpoint works
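Before moving on, it can help to confirm that the tools are actually on your ``PATH``. The following is a minimal sketch: ``missing_tools`` is a hypothetical helper name, ``kubectl`` is included on the assumption that you use it to talk to the clusters later in this guide, and gRPCurl is assumed to be installed under its usual binary name ``grpcurl``.

```shell
# Print the name of every given command that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not part of Substra.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool list is an assumption based on the commands used in this guide.
missing=$(missing_tools helm kubectl curl grpcurl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All tools found"
fi
```

If anything is reported as missing, install it before continuing.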
Substra is a federated learning tool and as such it makes little sense to have only one node running, or nodes running on the same cluster merely separated by a namespace.
Therefore, in this guide we are deploying on two separate Kubernetes clusters, connecting them through the internet.
Throughout the guide we are giving hostnames to endpoints. On the internet, this means owning a domain name and setting up DNS. Every time you see ``DOMAIN``, it stands for your own domain, under which you are setting this up.
Exposing on the internet also means dealing with a certificate authority – here we’re using Let’s Encrypt.
It is entirely possible to host multiple Substra nodes on the same cluster, and/or to have them communicate on a private network with a private CA, and/or to attribute hostnames differently.
1.3.1. In practice
Set up two clusters – they have to support allocating PVCs on the fly and opening ingresses to the Internet. For this, we’d recommend using a managed Kubernetes service such as Google GKE, Azure AKS, or Amazon EKS.
We’ll henceforth refer to the clusters we have set up as ``cluster-1`` and ``cluster-2``.
We also need some software for routing (ingress-nginx) and certificate management (cert-manager); install both on each cluster (insert your email address in place of ``YOUR_EMAIL_HERE``):

.. code-block:: shell

   helm upgrade --install ingress-nginx ingress-nginx \
     --repo https://kubernetes.github.io/ingress-nginx \
     --namespace ingress-nginx --create-namespace
   helm upgrade --install \
     cert-manager cert-manager \
     --repo https://charts.jetstack.io \
     --namespace cert-manager \
     --create-namespace \
     --set installCRDs=true
   kubectl apply -f - << "EOF"
   apiVersion: cert-manager.io/v1
   kind: ClusterIssuer
   metadata:
     name: letsencrypt-staging
   spec:
     acme:
       server: https://acme-staging-v02.api.letsencrypt.org/directory
       email: YOUR_EMAIL_HERE
       privateKeySecretRef:
         name: letsencrypt-staging
       solvers:
       - http01:
           ingress:
             class: nginx
   ---
   apiVersion: cert-manager.io/v1
   kind: ClusterIssuer
   metadata:
     name: letsencrypt-prod
   spec:
     acme:
       server: https://acme-v02.api.letsencrypt.org/directory
       email: YOUR_EMAIL_HERE
       privateKeySecretRef:
         name: letsencrypt-prod
       solvers:
       - http01:
           ingress:
             class: nginx
   EOF
This also sets up ``letsencrypt-prod`` as an issuer of certificates (for endpoints exposed on the internet) and ``letsencrypt-staging`` to issue development certificates.
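To check that both issuers registered with Let’s Encrypt, you can wait for their ``Ready`` condition. This is a minimal sketch: the function name ``issuers_ready`` and the 120-second timeout are assumptions, and it acts on whichever cluster your current ``kubectl`` context points to.

```shell
# Wait until both ClusterIssuers report the Ready condition.
# (issuers_ready is a hypothetical helper; the timeout is an arbitrary choice.)
issuers_ready() {
  kubectl wait --for=condition=Ready --timeout=120s \
    clusterissuer/letsencrypt-staging \
    clusterissuer/letsencrypt-prod
}

# Usage, once per cluster:
#   issuers_ready
```

If the wait times out, ``kubectl describe clusterissuer letsencrypt-prod`` usually shows the reason in the status conditions.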
Probably the most convenient way to handle DNS is to set a wildcard record for each cluster and forget about it. Once you have installed ingress-nginx, its controller service should have received an IP address that you can then set in the DNS:

.. code-block:: text

   *.cluster-1 300 IN A NGINX_1_IP
   *.cluster-2 300 IN A NGINX_2_IP

This way, any hostname such as ``whatever.cluster-1.DOMAIN`` resolves to the same endpoint, which in turn routes the traffic to the correct service based on the hostname (this is what the Ingress objects are for).
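The IP addresses for the wildcard records can be read from the ingress controller service. The sketch below makes several assumptions: the Helm release is named ``ingress-nginx`` as in the install command above (so the service is ``ingress-nginx-controller``), your ``kubectl`` contexts are named ``cluster-1`` and ``cluster-2``, and the helper names ``ingress_ip`` and ``wildcard_record`` are hypothetical. Some providers assign a hostname instead of an IP, in which case the ``.ip`` field is empty and ``.hostname`` is the one to read.

```shell
# Print the external IP of the ingress-nginx controller service
# for the kubectl context given as $1 (context names are an assumption).
ingress_ip() {
  kubectl --context "$1" get service ingress-nginx-controller \
    --namespace ingress-nginx \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Format a wildcard A record for cluster name $1 and IP address $2.
wildcard_record() {
  printf '*.%s 300 IN A %s\n' "$1" "$2"
}

# Usage:
#   wildcard_record cluster-1 "$(ingress_ip cluster-1)"
#   wildcard_record cluster-2 "$(ingress_ip cluster-2)"
```

Once the records have propagated, looking up an arbitrary name such as ``whatever.cluster-1.DOMAIN`` should return the same address.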