Substra documentation

Substra is an open-source federated learning (FL) software framework. It provides a flexible Python library and a web application to run federated learning training at scale.

Substra is primarily intended for production environments. It has already been deployed and used by hospitals and biotech companies, for example in the MELLODDY and HealthChain projects.

The key Substra differentiators are:

Framework agnostic: any Python library can be used, such as PyTorch, TensorFlow, sklearn, etc.
Flexible: any kind of computation can be run, such as machine learning, analytics, etc.
Scalable: supports vertical scaling (several trainings on one machine) and horizontal scaling (training on several machines).
Traceable: all machine learning operations are logged in an auditable, read-only database.
Web application: a web application to monitor long-running computations and explore model performance.
Production ready: packaged for Kubernetes and regularly audited.
Debugging made easy: remote error logs are accessible to data scientists, and the same code can run either in a deployed production environment or on a single machine for debugging.

Substra was created by Owkin and is now hosted by the Linux Foundation for AI and Data.

How does it work?

Interfaces

Substra has three user interfaces:

Substra: a low-level Python library (also called the SDK), used to create datasets, functions, and machine learning tasks on the platform.
SubstraFL: a high-level federated learning Python library built on Substra, used to run complex federated learning experiments at scale.
Web application: used to monitor the training of experiments and explore their results.

Installation

Client side: install the Substra and SubstraFL Python libraries with the command pip install substrafl. The Substra Python library is a dependency of SubstraFL, so it is installed automatically. More information on the installation can be found here.
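As a quick sanity check after running pip install substrafl, the sketch below uses only Python's standard importlib.metadata module to report which of the two libraries are present. It is an illustration, not part of Substra itself; it assumes the installed distribution names are substrafl and substra, matching the pip command above, and the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("substrafl", "substra")):
    """Map each distribution name to its installed version string, or None if missing.

    Hypothetical helper; the default names assume the pip distributions
    are called "substrafl" and "substra".
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        status = version if version is not None else "not installed"
        print(f"{name}: {status}")
```

Because Substra is pulled in as a dependency of SubstraFL, a successful installation should report a version for both names.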

Server side: there are two options to deploy the server side of Substra (backend, frontend and orchestrator):

Local deployment: deploys everything on a single machine. Useful for quick tests and for development.
Production deployment: for real-world deployments.

Note

You can start doing local FL experiments with Substra by installing only the client side.

Links

Some quick links: