1. Consider Prerequisites
1.1. Substra version
Substra is a set of microservices that are released together under a single version number. Because this guide installs the services one by one, we need to know the actual version of each one.
Check the Compatibility table for the Helm chart version needed for the orchestrator, backend and frontend. The corresponding Docker app version is already configured in each chart, so the chart version is all you need.
1.2. Local tools
- Install:

  - kubectl
  - helm

  Add the Substra helm repository:

  .. code-block:: shell

     helm repo add substra https://substra.github.io/charts/
     helm repo update

- Also install:

  - curl, for making sure the HTTP endpoints work
  - gRPCurl, for making sure the gRPC endpoint works
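Before going further, it can help to confirm that all four tools are on the ``PATH``. The following is a minimal sketch: ``check_tools`` is a hypothetical helper name, and it assumes gRPCurl is installed under the binary name ``grpcurl``.

```shell
# check_tools: report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this guide, the call would be ``check_tools kubectl helm curl grpcurl``.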
1.3. Infrastructure
Substra is a federated learning tool and as such it makes little sense to have only one node running, or nodes running on the same cluster merely separated by a namespace.
Therefore, in this guide we are deploying on two separate Kubernetes clusters, connecting them through the internet.
Throughout the guide we give hostnames to endpoints. On the internet, this means owning a domain name and setting up DNS: every time you see ``DOMAIN``, substitute the domain you are setting this up under.
Exposing on the internet also means dealing with a certificate authority – here we’re using Let’s Encrypt.
Note
It is entirely possible to host multiple Substra nodes on the same cluster, and/or to have them communicate on a private network with a private CA, and/or to assign hostnames differently.
1.3.1. In practice
1.3.1.1. Clusters
Set up two clusters. They must support allocating PVCs (persistent volume claims) on the fly and opening ingresses to the internet. For this, we recommend using a managed Kubernetes service such as Google GKE, Azure AKS, or Amazon EKS.
We’ll henceforth refer to the clusters we have set up as ``cluster-1`` and ``cluster-2``.
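Since every cluster-level step has to be repeated on both clusters, a small loop over kubeconfig contexts can save some typing. This sketch assumes your kubeconfig has contexts literally named ``cluster-1`` and ``cluster-2`` (override ``CLUSTER_CONTEXTS`` otherwise); ``on_each_cluster`` is a hypothetical helper name, not part of Substra or kubectl.

```shell
# Kubeconfig context names; assumed here to match the names used in this guide.
CLUSTER_CONTEXTS="${CLUSTER_CONTEXTS:-cluster-1 cluster-2}"

# on_each_cluster: run the given kubectl arguments once per context,
# stopping at the first failure.
on_each_cluster() {
  for ctx in $CLUSTER_CONTEXTS; do
    echo "== $ctx =="
    kubectl --context "$ctx" "$@" || return 1
  done
}
```

For example, ``on_each_cluster get storageclass`` lists the storage classes of both clusters, which is a quick way to see whether a default class is available for allocating PVCs on the fly.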
We also need some software for routing (ingress-nginx) and certificate management (cert-manager); install both on each cluster (insert your email address in place of ``YOUR_EMAIL_HERE``):
.. code-block:: shell

   helm upgrade --install ingress-nginx ingress-nginx \
     --repo https://kubernetes.github.io/ingress-nginx \
     --namespace ingress-nginx --create-namespace

   helm upgrade --install \
     cert-manager cert-manager \
     --repo https://charts.jetstack.io \
     --namespace cert-manager \
     --create-namespace \
     --set installCRDs=true

   kubectl apply -f - << "EOF"
   apiVersion: cert-manager.io/v1
   kind: ClusterIssuer
   metadata:
     name: letsencrypt-staging
   spec:
     acme:
       server: https://acme-staging-v02.api.letsencrypt.org/directory
       email: YOUR_EMAIL_HERE
       privateKeySecretRef:
         name: letsencrypt-staging
       solvers:
         - http01:
             ingress:
               class: nginx
   ---
   apiVersion: cert-manager.io/v1
   kind: ClusterIssuer
   metadata:
     name: letsencrypt-prod
   spec:
     acme:
       server: https://acme-v02.api.letsencrypt.org/directory
       email: YOUR_EMAIL_HERE
       privateKeySecretRef:
         name: letsencrypt-prod
       solvers:
         - http01:
             ingress:
               class: nginx
   EOF
This sets up two certificate issuers: ``letsencrypt-prod`` for endpoints exposed on the internet, and ``letsencrypt-staging`` for development certificates.
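To check that both issuers registered successfully with Let’s Encrypt, one option is to wait for their ``Ready`` condition. This is a sketch rather than a required step: ``wait_for_issuers`` and ``ISSUER_TIMEOUT`` are hypothetical names, and the 60-second default is an arbitrary choice.

```shell
# wait_for_issuers: block until each named ClusterIssuer reports Ready,
# or fail once the timeout (default 60s) expires for any of them.
wait_for_issuers() {
  timeout="${ISSUER_TIMEOUT:-60s}"
  for issuer in "$@"; do
    kubectl wait --for=condition=Ready "clusterissuer/$issuer" \
      --timeout="$timeout" || return 1
  done
}
```

Run ``wait_for_issuers letsencrypt-staging letsencrypt-prod`` against each cluster; if it fails, ``kubectl describe clusterissuer`` usually shows the reason in the status conditions.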
1.3.1.2. DNS
Probably the most convenient way to handle DNS is to set a wildcard record for each cluster and forget about it. Once you have installed ingress-nginx, the corresponding service should have received an external IP address, which you can then set in the DNS:
*.cluster-1 300 IN A NGINX_1_IP
*.cluster-2 300 IN A NGINX_2_IP
This way, any hostname such as ``whatever.cluster-1.DOMAIN`` resolves to the same endpoint, which itself routes the traffic to the correct service based on the hostname (this is what the Ingress objects are for).
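The IP addresses for the records above can be read from the ingress-nginx controller service. The sketch below assumes the chart was installed as shown earlier (release ``ingress-nginx`` in namespace ``ingress-nginx``, which yields a service named ``ingress-nginx-controller``) and that kubeconfig contexts are named after the clusters; ``ingress_ip`` and ``wildcard_record`` are hypothetical helper names.

```shell
# ingress_ip: print the external IP of the ingress-nginx controller service
# for the given kubeconfig context. Some providers publish a hostname
# instead of an IP, in which case this prints nothing.
ingress_ip() {
  kubectl --context "$1" --namespace ingress-nginx \
    get service ingress-nginx-controller \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# wildcard_record: print a zone-file A record in the format used above.
wildcard_record() {
  printf '*.%s 300 IN A %s\n' "$1" "$2"
}
```

For example, ``wildcard_record cluster-1 "$(ingress_ip cluster-1)"`` prints the record for the first cluster, ready to be added to the zone for ``DOMAIN``.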