Substra documentation
Substra is open source federated learning (FL) software. It provides a flexible Python library and a web application to run federated learning training at scale.
Substra is designed primarily for production environments. It has already been deployed and used by hospitals and biotech companies: see the MELLODDY and HealthChain projects.
The key Substra differentiators are:
Framework agnostic — Any Python library can be used: PyTorch, TensorFlow, sklearn, etc.
Flexible — Any kind of computation can be run: machine learning, analytics, etc.
Scalable — Support for vertical scaling (several trainings on one machine) and horizontal scaling (training on several machines).
Traceable — All machine learning operations are logged in an auditable read-only database.
Web application — A web application to monitor long-running computations and explore model performance.
Production ready — Packaged in Kubernetes and regularly audited.
Debugging made easy — Remote error logs are accessible to data scientists. The same code can be run in a deployed production environment or on a single machine to debug.
Substra was created by Owkin and is now hosted by the Linux Foundation for AI and Data.
How does it work?
Interfaces
Substra has three user interfaces:
Substra: a low-level Python library (also called the SDK). Substra is used to create datasets, functions and machine learning tasks on the platform.
SubstraFL: a high-level federated learning Python library based on Substra. SubstraFL is used to run complex federated learning experiments at scale.
A web application used to monitor experiment training and explore the results.
Installation
Client side: Install the Substra and SubstraFL Python libraries with the following command:
pip install substrafl
The Substra Python library is a dependency of SubstraFL, so it is installed automatically. More information on the installation can be found here.
Server side: There are two options to deploy the server side of Substra (backend, frontend and orchestrator):
Local deployment: to deploy locally on a single machine. Useful for quick tests and for development.
Production deployment: for real deployments.
Note
You can start doing local FL experiments with Substra by installing only the client side.
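After running the pip command above, it can be useful to confirm that both client libraries are present in the current Python environment. The following is a minimal sketch that uses only the Python standard library. The helper names installed_version and check_client_install are hypothetical and are not part of Substra or SubstraFL. The sketch assumes the distributions are published under the names substrafl and substra.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_client_install(
    packages: Sequence[str] = ("substrafl", "substra"),
) -> Dict[str, Optional[str]]:
    """Map each client-side package name (assumed names) to its version or None."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print one line per package so a missing install is easy to spot.
    for name, version in check_client_install().items():
        status = version if version is not None else "not installed"
        print(f"{name}: {status}")
```

If either package is reported as not installed, rerunning the pip command in the same environment is the first thing to try.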
Links
Some quick links: