
Federated Averaging

class substrafl.strategies.FedAvg

Bases: substrafl.strategies.strategy.Strategy

Federated averaging strategy.

Federated averaging is the simplest federated strategy. A round consists of performing a predefined number of forward/backward passes on each client, aggregating the updates by computing their weighted mean, and distributing the consensus update to all clients. In FedAvg, the strategy is performed in a centralized way: a single server (the AggregationNode) communicates with a number of clients (TrainDataNode and TestDataNode).

Formally, if \(w_t\) denotes the parameters of the model at round \(t\), a single round consists of the following steps:

\[\begin{aligned}
\Delta w_t^{k} &= \mathcal{O}^k_t(w_t \mid X_t^k, y_t^k, m) \\
\Delta w_t &= \sum_{k=1}^K \frac{n_k}{n} \Delta w_t^k \\
w_{t + 1} &= w_t + \Delta w_t
\end{aligned}\]

where \(\mathcal{O}^k_t\) is the local optimizer algorithm of client \(k\), which takes as arguments the averaged weights \(w_t\), the \(t\)-th batch of data of client \(k\) and the number \(m\) of local updates to perform; \(n_k\) is the number of samples of client \(k\) and \(n = \sum_{k=1}^K n_k\) is the total number of samples.
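
The three update equations can be checked numerically. The sketch below assumes each client has already run its local optimizer \(\mathcal{O}^k_t\) and returned its update \(\Delta w_t^k\); `fedavg_update` is a hypothetical NumPy helper, not part of the substrafl API.

```python
import numpy as np


def fedavg_update(w_t, client_updates, n_samples):
    """Hypothetical helper: apply one FedAvg aggregation step.

    Computes Delta w_t = sum_k (n_k / n) * Delta w_t^k and returns w_t + Delta w_t.
    """
    n = sum(n_samples)  # total number of samples n = sum_k n_k
    delta_w_t = sum(
        (n_k / n) * np.asarray(update, dtype=float)
        for n_k, update in zip(n_samples, client_updates)
    )
    return np.asarray(w_t, dtype=float) + delta_w_t


# Two clients holding 20 and 40 samples respectively
w_t = np.zeros(3)
updates = [np.array([3.0, 3.0, 3.0]), np.array([6.0, 6.0, 6.0])]
w_next = fedavg_update(w_t, updates, n_samples=[20, 40])
print(w_next)
```

Here the client holding 40 of the 60 samples contributes two thirds of the consensus update.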

avg_shared_states(shared_states: Optional[List[substrafl.schemas.FedAvgSharedState]]) → substrafl.schemas.FedAvgAveragedState

Compute the weighted average of all elements returned by the train methods of the user-defined algorithm. The average is weighted by the proportion of the number of samples.


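Example:

import numpy as np
shared_states = [
    {"weights": np.array([3, 3, 3]), "gradient": np.array([4, 4, 4]), "n_samples": 20},
    {"weights": np.array([6, 6, 6]), "gradient": np.array([1, 1, 1]), "n_samples": 40},
]
# weights: (3 * 20 + 6 * 40) / 60 = 5, gradient: (4 * 20 + 1 * 40) / 60 = 2
result = {"weights": np.array([5, 5, 5]), "gradient": np.array([2, 2, 2])}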

Parameters

shared_states (List[FedAvgSharedState]) – The list of shared states returned by the train method of the algorithm, one per organization.

Raises

  • EmptySharedStatesError – The train method of your algorithm must return a shared_state

  • TypeError – Each shared_state must contain the key n_samples

  • TypeError – Each shared_state must contain at least one element to average

  • TypeError – All the elements of shared_states must be similar (same keys)

  • TypeError – All elements to average must be of type np.ndarray


Returns

A dict containing the weighted average of each input parameter, excluding the key “n_samples”.

Return type

substrafl.schemas.FedAvgAveragedState
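
The documented checks and the weighted average can be sketched in plain NumPy. This is a hypothetical re-implementation for illustration only: `weighted_average_sketch` and the local `EmptySharedStatesError` class are stand-ins, and the order in which the checks run is an assumption.

```python
import numpy as np


class EmptySharedStatesError(Exception):
    """Hypothetical local stand-in for the error named in the documentation."""


def weighted_average_sketch(shared_states):
    """Hypothetical re-implementation of the documented behaviour (not substrafl's code)."""
    if not shared_states:
        raise EmptySharedStatesError("The train method of your algorithm must return a shared_state")
    keys = set(shared_states[0])
    for state in shared_states:
        if "n_samples" not in state:
            raise TypeError("Each shared_state must contain the key n_samples")
        if set(state) != keys:
            raise TypeError("All the elements of shared_states must be similar (same keys)")
    keys_to_average = keys - {"n_samples"}
    if not keys_to_average:
        raise TypeError("Each shared_state must contain at least one element to average")
    for state in shared_states:
        for key in keys_to_average:
            if not isinstance(state[key], np.ndarray):
                raise TypeError("All elements to average must be of type np.ndarray")
    n_total = sum(state["n_samples"] for state in shared_states)
    # Weight each organization's values by its share of the total number of samples
    return {
        key: sum(state[key] * (state["n_samples"] / n_total) for state in shared_states)
        for key in keys_to_average
    }


shared_states = [
    {"weights": np.array([3, 3, 3]), "gradient": np.array([4, 4, 4]), "n_samples": 20},
    {"weights": np.array([6, 6, 6]), "gradient": np.array([1, 1, 1]), "n_samples": 40},
]
result = weighted_average_sketch(shared_states)
```

The returned dict drops n_samples, matching the documented return value.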


property name: substrafl.schemas.StrategyName

The name of the strategy


Returns

Name of the strategy

Return type

substrafl.schemas.StrategyName


perform_round(algo: algorithms.algo.Algo, train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], aggregation_node: organizations.aggregation_node.AggregationNode, round_idx: int, clean_models: bool)
One round of the Federated Averaging strategy consists of:

  • if round_idx==0: initialize the strategy by performing a local update (train on n mini-batches) of the models on each train data node

  • aggregate the model shared_states

  • set the model weights to the aggregated weights on each train data node

  • perform a local update (train on n mini-batches) of the models on each train data node

Parameters

  • algo (Algo) – User-defined algorithm: describes the model train and predict methods

  • train_data_nodes (List[TrainDataNode]) – List of the nodes on which to perform local updates.

  • aggregation_node (AggregationNode) – Node without data, used to perform operations on the shared states of the models

  • round_idx (int) – Round number, starting at 0.

  • clean_models (bool) – Clean the intermediary models of this round on the Substra platform. Set it to False if you want to download or re-use intermediary models. Since keeping them fills the disk space quickly, it should be set to True unless needed.
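
The round structure listed above can be simulated in memory. This is a minimal sketch under stated assumptions: clients are plain (X, y) NumPy arrays, the local optimizer is m full-batch gradient steps on a least-squares loss, and `local_update` / `fedavg_round_sketch` are hypothetical names. In substrafl the local step is whatever the user-defined Algo implements, and the orchestration goes through the Substra platform.

```python
import numpy as np


def local_update(w, X, y, m, lr=0.1):
    """Hypothetical local optimizer O_t^k: m full-batch gradient steps on a least-squares loss.

    Returns the update Delta w_t^k = w_local - w that the client shares.
    """
    w_local = w.copy()
    for _ in range(m):
        grad = X.T @ (X @ w_local - y) / len(y)  # gradient of 0.5 * mean squared error
        w_local -= lr * grad
    return w_local - w


def fedavg_round_sketch(state, clients, round_idx, m):
    """Mirror the documented steps of one FedAvg round on in-memory (X, y) client datasets."""
    if round_idx == 0:
        # Initialization: a first local update on every client
        state["updates"] = [local_update(state["w"], X, y, m) for X, y in clients]
    n = sum(len(y) for _, y in clients)
    # Aggregate the shared updates, weighted by each client's number of samples
    delta = sum((len(y) / n) * upd for (_, y), upd in zip(clients, state["updates"]))
    # Set the model weights to the aggregated weights
    state["w"] = state["w"] + delta
    # Local update starting from the new consensus weights
    state["updates"] = [local_update(state["w"], X, y, m) for X, y in clients]


rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []
for n_k in (30, 70):
    X = rng.normal(size=(n_k, 2))
    clients.append((X, X @ w_true))  # noiseless targets from the same linear model

state = {"w": np.zeros(2)}
for round_idx in range(50):
    fedavg_round_sketch(state, clients, round_idx, m=5)
print(state["w"])
```

Because both clients' data come from the same noiseless linear model, the consensus weights approach w_true as rounds accumulate.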

predict(algo: algorithms.algo.Algo, test_data_nodes: List[substrafl.nodes.test_data_node.TestDataNode], train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], round_idx: int)

Predict function of the strategy: evaluate the model. Gets the model for a train organization and evaluates it on the test nodes.

Parameters

  • algo (Algo) – algo with the code to execute on the organization

  • test_data_nodes (List[TestDataNode]) – list of nodes on which to evaluate

  • train_data_nodes (List[TrainDataNode]) – list of nodes on which the model has been trained

  • round_idx (int) – index of the round


Scaffold

class substrafl.strategies.Scaffold(aggregation_lr: float = 1)

Bases: substrafl.strategies.strategy.Strategy

Scaffold strategy.

Paper:

Scaffold is Federated Averaging with control variates. By adding auxiliary variables (control variates) on the clients and on the server, the authors of the paper prove better convergence bounds under certain hypotheses, in particular with non-IID data.

A round consists of performing a predefined number of forward/backward passes on each client, aggregating the updates by computing their weighted mean, and distributing the consensus update to all clients. In Scaffold, the strategy is performed in a centralized way: a single server (the AggregationNode) communicates with a number of clients (TrainDataNode and TestDataNode).


Parameters

aggregation_lr (float, Optional) – Global aggregation rate applied to the averaged weight updates (\(\eta_g\) in the paper). Must be >= 0. Defaults to 1.

avg_shared_states(shared_states: Optional[List[substrafl.schemas.ScaffoldSharedState]]) → substrafl.schemas.ScaffoldAveragedStates

Performs the aggregation of the shared states returned by the train methods of the user-defined algorithm, according to the server operations of the Scaffold Algo.

  1. Computes the weighted average of the weight updates and applies the aggregation learning rate

  2. Updates the server control variate with the weighted average of the control variate updates

The average is weighted by the proportion of the number of samples.


Parameters

shared_states (List[ScaffoldSharedState]) – Shared states returned by the train method of the algorithm for each client (e.g. algorithms.pytorch.scaffold.train)


Returns

Averaged weight updates and updated server control variate

Return type

substrafl.schemas.ScaffoldAveragedStates
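
The two server operations can be written out with NumPy. This sketch assumes, as in the Scaffold paper with all clients participating, that the server control variate is updated additively (c ← c + weighted mean of the client control variate updates); the function and argument names are hypothetical, so check the substrafl source for the exact update.

```python
import numpy as np


def scaffold_aggregate_sketch(weight_updates, control_variate_updates, n_samples,
                              server_control_variate, aggregation_lr=1.0):
    """Hypothetical server step of Scaffold on NumPy arrays (not the substrafl implementation)."""
    if aggregation_lr < 0:
        raise ValueError("aggregation_lr must be >= 0")
    n = sum(n_samples)
    fractions = [n_k / n for n_k in n_samples]  # proportion of samples held by each client
    # 1. Weighted average of the weight updates, scaled by the aggregation learning rate (eta_g)
    avg_weight_update = aggregation_lr * sum(
        f * np.asarray(dw, dtype=float) for f, dw in zip(fractions, weight_updates)
    )
    # 2. Server control variate moved by the weighted average of the control variate updates
    new_server_control_variate = np.asarray(server_control_variate, dtype=float) + sum(
        f * np.asarray(dc, dtype=float) for f, dc in zip(fractions, control_variate_updates)
    )
    return avg_weight_update, new_server_control_variate


avg_update, server_cv = scaffold_aggregate_sketch(
    weight_updates=[np.array([2.0, 2.0]), np.array([4.0, 4.0])],
    control_variate_updates=[np.array([1.0, 0.0]), np.array([3.0, 0.0])],
    n_samples=[10, 10],
    server_control_variate=np.array([1.0, 1.0]),
    aggregation_lr=0.5,
)
```

With equal sample counts the averaged weight update is 3, which aggregation_lr=0.5 halves to 1.5.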


property name: substrafl.schemas.StrategyName

The name of the strategy


Returns

Name of the strategy

Return type

substrafl.schemas.StrategyName


perform_round(algo: algorithms.algo.Algo, train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], aggregation_node: organizations.aggregation_node.AggregationNode, round_idx: int, clean_models: bool)
One round of the Scaffold strategy consists of:

  • if round_idx==0: initialize the strategy by performing a local update (train on n mini-batches) of the models on each train data node

  • aggregate the model shared_states

  • set the model weights to the aggregated weights on each train data node

  • perform a local update (train on n mini-batches) of the models on each train data node

Parameters

  • algo (Algo) – User-defined algorithm: describes the model train and predict methods

  • train_data_nodes (List[TrainDataNode]) – List of the organizations on which to perform local updates

  • aggregation_node (organizations.aggregation_node.AggregationNode) – Node without data, used to perform operations on the shared states of the models

  • round_idx (int) – Round number, starting at 0.

  • clean_models (bool) – Clean the intermediary models of this round on the Substra platform. Set it to False if you want to download or re-use intermediary models. Since keeping them fills the disk space quickly, it should be set to True unless needed.

predict(algo: algorithms.algo.Algo, test_data_nodes: List[substrafl.nodes.test_data_node.TestDataNode], train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], round_idx: int)

Predict function of the strategy: evaluate the model. Gets the model for a train organization and evaluates it on the test nodes.

Parameters

  • algo (Algo) – algo with the code to execute on the organization

  • test_data_nodes (List[TestDataNode]) – list of nodes on which to evaluate

  • train_data_nodes (List[TrainDataNode]) – list of nodes on which the model has been trained

  • round_idx (int) – index of the round

Newton Raphson

class substrafl.strategies.NewtonRaphson(damping_factor: float)

Bases: substrafl.strategies.strategy.Strategy

Newton-Raphson strategy.

The Newton-Raphson strategy is based on the distributed Newton-Raphson method. It converges faster than a standard FedAvg strategy; however, it can only be used on convex problems.

In a local step, the first-order derivative (gradients) and the second-order derivative (Hessian matrix) of the loss with respect to the weights are computed on each node.

Hessians and gradients are averaged on the aggregation node.

Updates of the weights are then calculated using the formula:

\[\text{update} = -\eta \, H^{-1}_\theta \Delta_\theta\]

where \(H_\theta\) is the Hessian of the loss with respect to \(\theta\), \(\Delta_\theta\) is the gradient of the loss with respect to \(\theta\), and \(0 < \eta \leq 1\) is the damping factor.


Parameters

damping_factor (float) – Multiplicative coefficient \(\eta\) of the parameter update; must be in \((0, 1]\). Smaller values of \(\eta\) increase stability but slow down convergence. Recommended value: damping_factor=0.8.
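
The update formula can be tried on a convex quadratic loss, whose Hessian is constant. `newton_raphson_update` is a hypothetical helper; solving the linear system \(H_\theta x = \Delta_\theta\) instead of inverting \(H_\theta\) is a design choice of this sketch, not necessarily what substrafl does.

```python
import numpy as np


def newton_raphson_update(gradients, hessian, damping_factor=0.8):
    """Hypothetical helper: update = -eta * H^{-1} g for a convex problem (H positive definite)."""
    if not 0 < damping_factor <= 1:
        raise ValueError("damping_factor must be in (0, 1]")
    # Solving H x = g is cheaper and more stable than forming H^{-1} explicitly
    return -damping_factor * np.linalg.solve(hessian, gradients)


# Convex quadratic loss L(theta) = 0.5 * theta^T A theta - b^T theta, minimized at theta* = A^{-1} b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
theta = np.zeros(2)
grad = A @ theta - b  # gradient of L at theta; the Hessian of L is A everywhere

theta_full = theta + newton_raphson_update(grad, A, damping_factor=1.0)
theta_damped = theta + newton_raphson_update(grad, A, damping_factor=0.8)
```

With \(\eta = 1\) a single step lands on the minimizer of the quadratic; with \(\eta = 0.8\) the step covers 80% of that distance, which is the stability/speed trade-off described for damping_factor.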

compute_averaged_states(shared_states: Optional[List[substrafl.schemas.NewtonRaphsonSharedState]]) → substrafl.schemas.NewtonRaphsonAveragedStates

Given the gradients and the Hessian (the second order derivative of loss with respect to weights), updates of the weights are calculated using the formula:

\[\text{update} = -\eta \, H^{-1}_\theta \Delta_\theta\]

where \(H_\theta\) is the Hessian of the loss with respect to \(\theta\), \(\Delta_\theta\) is the gradient of the loss with respect to \(\theta\), and \(0 < \eta \leq 1\) is the damping factor.

The average is weighted by the number of samples.


Example:

import numpy as np
shared_states = [
    {"gradients": np.array([1, 1, 1]), "hessian": np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]), "n_samples": 2},
    {"gradients": np.array([2, 2, 2]), "hessian": np.array([[2, 0, 0], [0, 2, 0], [0, 0, 2]]), "n_samples": 1},
]
damping_factor = 1

# Sample-weighted averages: (1 * 2 + 2 * 1) / 3 = 4/3
average = {"gradients": np.array([4/3, 4/3, 4/3]), "hessian": np.array([[4/3, 0, 0], [0, 4/3, 0], [0, 0, 4/3]])}

Parameters

shared_states (List[NewtonRaphsonSharedState]) – The list of shared states returned by the train method of the algorithm, one per node.

Raises

  • EmptySharedStatesError – The train method of your algorithm must return a shared_state

  • TypeError – Each shared_state must contain the keys n_samples, gradients and hessian

  • TypeError – Each shared_state must contain at least one element to average

  • TypeError – All the elements of shared_states must be similar (same keys)

  • TypeError – All elements to average must be of type np.ndarray


Returns

A dict containing the parameter updates for each input parameter.

Return type

substrafl.schemas.NewtonRaphsonAveragedStates
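
The example above can be reproduced end to end with the sketch below, which averages the shared states by number of samples and then applies the damped Newton step. `compute_averaged_states_sketch` and the keys of the dict it returns are hypothetical; substrafl's method returns a NewtonRaphsonAveragedStates object.

```python
import numpy as np


def compute_averaged_states_sketch(shared_states, damping_factor):
    """Hypothetical re-implementation: sample-weighted averages, then a damped Newton step."""
    n_total = sum(state["n_samples"] for state in shared_states)
    gradients = sum(
        np.asarray(state["gradients"], dtype=float) * state["n_samples"] / n_total
        for state in shared_states
    )
    hessian = sum(
        np.asarray(state["hessian"], dtype=float) * state["n_samples"] / n_total
        for state in shared_states
    )
    # update = -eta * H^{-1} g, computed with a linear solve rather than an explicit inverse
    update = -damping_factor * np.linalg.solve(hessian, gradients)
    return {"gradients": gradients, "hessian": hessian, "update": update}


shared_states = [
    {"gradients": np.array([1, 1, 1]), "hessian": np.eye(3), "n_samples": 2},
    {"gradients": np.array([2, 2, 2]), "hessian": 2 * np.eye(3), "n_samples": 1},
]
result = compute_averaged_states_sketch(shared_states, damping_factor=1)
```

With damping_factor=1, the averaged Hessian \((4/3) I\) and averaged gradient \((4/3, 4/3, 4/3)\) give an update of \(-1\) for every parameter.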


property name: substrafl.schemas.StrategyName

The name of the strategy


Returns

Name of the strategy

Return type

substrafl.schemas.StrategyName


perform_round(algo: algorithms.algo.Algo, train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], aggregation_node: organizations.aggregation_node.AggregationNode, round_idx: int, clean_models: bool)

One round of the Newton-Raphson strategy consists of:

  • if round_idx==0: initialize the strategy by performing a local update of the models on each train data node

  • aggregate the model shared_states

  • set the model weights to the aggregated weights on each train data node

  • perform a local update of the models on each train data node

Parameters

  • algo (Algo) – User-defined algorithm: describes the model train and predict methods

  • train_data_nodes (List[TrainDataNode]) – List of the nodes on which to perform local updates

  • aggregation_node (AggregationNode) – Node without data, used to perform operations on the shared states of the models

  • round_idx (int) – Round number, starting at 0.

  • clean_models (bool) – Clean the intermediary models of this round on the Substra platform. Set it to False if you want to download or re-use intermediary models. Since keeping them fills the disk space quickly, it should be set to True unless needed.

predict(algo: algorithms.algo.Algo, test_data_nodes: List[substrafl.nodes.test_data_node.TestDataNode], train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], round_idx: int)

Predict function for the test data nodes on which the model has been trained.

Parameters

  • algo (Algo) – algo to use for computing the predictions.

  • test_data_nodes (List[TestDataNode]) – test data nodes to intersect with the train data nodes to evaluate the model on.

  • train_data_nodes (List[TrainDataNode]) – train data nodes the model has been trained on.

  • round_idx (int) – round index.

Raises

NotImplementedError – Testing on a node that was not used for training is not supported yet.
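
The intersection of test and train nodes can be sketched as follows, assuming nodes are matched by organization. `NodeStub`, its `organization_id` field and `pair_test_with_train_nodes` are hypothetical stand-ins, not substrafl classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStub:
    """Hypothetical stand-in for TrainDataNode / TestDataNode, identified by organization."""
    organization_id: str


def pair_test_with_train_nodes(test_data_nodes, train_data_nodes):
    """Match each test node with the train node of the same organization."""
    train_by_org = {node.organization_id: node for node in train_data_nodes}
    pairs = []
    for test_node in test_data_nodes:
        train_node = train_by_org.get(test_node.organization_id)
        if train_node is None:
            # Mirrors the documented limitation: no evaluation on an untrained node
            raise NotImplementedError("Cannot test on a node we did not train on for now.")
        pairs.append((test_node, train_node))
    return pairs


pairs = pair_test_with_train_nodes(
    test_data_nodes=[NodeStub("org-1")],
    train_data_nodes=[NodeStub("org-1"), NodeStub("org-2")],
)

try:
    pair_test_with_train_nodes([NodeStub("org-3")], [NodeStub("org-1")])
    unsupported_raised = False
except NotImplementedError:
    unsupported_raised = True
```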

Single Organization

class substrafl.strategies.SingleOrganization

Bases: substrafl.strategies.strategy.Strategy

Single organization strategy.

Single organization is not a real federated strategy; it is mostly used for testing, since it is faster than the other ‘real’ strategies. Training and prediction are performed on a single node. However, the number of passes over that node (num_rounds) is still defined, in order to mimic the actual federated setting. In the SingleOrganization strategy, a single client (TrainDataNode and TestDataNode) performs all the model executions.

property name: substrafl.schemas.StrategyName

The name of the strategy


Returns

Name of the strategy

Return type

substrafl.schemas.StrategyName


perform_round(algo: algorithms.algo.Algo, train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], round_idx: int, clean_models: bool, aggregation_node: Optional[organizations.aggregation_node.AggregationNode] = None)

One round of the SingleOrganization strategy: perform a local update (train on n mini-batches) of the models on the given data node.

Parameters

  • algo (Algo) – User-defined algorithm: describes the model train and predict methods

  • train_data_nodes (List[TrainDataNode]) – List of the nodes on which to perform local updates; it should contain exactly one item.

  • aggregation_node (AggregationNode) – Should be None; any other value is ignored.

  • round_idx (int) – Round number, starting at 0.

  • clean_models (bool) – Clean the intermediary models of this round on the Substra platform. Set it to False if you want to download or re-use intermediary models. Since keeping them fills the disk space quickly, it should be set to True unless needed.
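
With a single node there is nothing to aggregate, so a round reduces to a local update. The minimal sketch below uses hypothetical names, a least-squares loss and in-memory NumPy data; raising ValueError on a wrong node count is an assumption, since the documentation only says there should be exactly one item. It shows that num_rounds rounds of m steps give the same weights as num_rounds * m consecutive steps.

```python
import numpy as np


def local_update(w, X, y, m, lr=0.1):
    """Hypothetical local training: m full-batch gradient steps on a least-squares loss."""
    w = w.copy()
    for _ in range(m):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def single_organization_round_sketch(w, train_data, m, aggregation_node=None):
    """One SingleOrganization-style round: local training only, no aggregation step.

    train_data should hold exactly one (X, y) pair; aggregation_node is ignored.
    """
    if len(train_data) != 1:
        raise ValueError("expected exactly one train data node")  # assumed error type
    X, y = train_data[0]
    return local_update(w, X, y, m)


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.5, -2.0])

w_rounds = np.zeros(3)
for round_idx in range(4):  # num_rounds = 4, m = 5 steps per round
    w_rounds = single_organization_round_sketch(w_rounds, [(X, y)], m=5)

w_direct = local_update(np.zeros(3), X, y, m=20)  # the same 20 steps in one go
```

The rounds only structure the computation here, which is why this strategy is convenient for testing the federated pipeline.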

predict(algo: algorithms.algo.Algo, test_data_nodes: List[substrafl.nodes.test_data_node.TestDataNode], train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], round_idx: int)

Predict function of the strategy: evaluate the model. Gets the model for a train organization and evaluates it on the test nodes.

Parameters

  • algo (Algo) – algo with the code to execute on the organization

  • test_data_nodes (List[TestDataNode]) – list of nodes on which to evaluate

  • train_data_nodes (List[TrainDataNode]) – list of nodes on which the model has been trained

  • round_idx (int) – index of the round

Strategies Base Class

class substrafl.strategies.strategy.Strategy(*args, **kwargs)

Bases: abc.ABC

Base class from which the substrafl strategies inherit.

abstract property name: substrafl.schemas.StrategyName

The name of the strategy


Returns

Name of the strategy

Return type

substrafl.schemas.StrategyName


abstract perform_round(algo: algorithms.algo.Algo, train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], aggregation_node: Optional[organizations.aggregation_node.AggregationNode], round_idx: int, clean_models: bool)

Perform one round of the strategy.

Parameters

  • algo (Algo) – algo with the code to execute on the organization

  • train_data_nodes (List[TrainDataNode]) – list of the train organizations

  • aggregation_node (Optional[AggregationNode]) – aggregation node, required for centralized strategies, unused otherwise

  • round_idx (int) – index of the round

  • clean_models (bool) – Clean the intermediary models of this round on the Substra platform. Set it to False if you want to download or re-use intermediary models. Since keeping them fills the disk space quickly, it should be set to True unless needed.

abstract predict(algo: algorithms.algo.Algo, test_data_nodes: List[substrafl.nodes.test_data_node.TestDataNode], train_data_nodes: List[substrafl.nodes.train_data_node.TrainDataNode], round_idx: int)

Predict function of the strategy: evaluate the model. Gets the model for a train organization and evaluates it on the test nodes.

Parameters

  • algo (Algo) – algo with the code to execute on the organization

  • test_data_nodes (List[TestDataNode]) – list of nodes on which to evaluate

  • train_data_nodes (List[TrainDataNode]) – list of nodes on which the model has been trained

  • round_idx (int) – index of the round
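
A custom strategy has to provide the three abstract members listed above. The sketch below uses Python's abc module to mirror that interface with a hypothetical `StrategySketch` base class (it does not import substrafl, and `name` returns a plain string instead of a StrategyName), plus a toy subclass that only records the calls it receives.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class StrategySketch(ABC):
    """Hypothetical mirror of the documented abstract interface (not substrafl's class)."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Name of the strategy (substrafl returns a StrategyName here)."""

    @abstractmethod
    def perform_round(self, algo: Any, train_data_nodes: List[Any],
                      aggregation_node: Optional[Any], round_idx: int, clean_models: bool) -> None:
        """Perform one round of the strategy."""

    @abstractmethod
    def predict(self, algo: Any, test_data_nodes: List[Any],
                train_data_nodes: List[Any], round_idx: int) -> None:
        """Evaluate the model on the test nodes."""


class RecordingStrategy(StrategySketch):
    """Toy subclass that only records which calls were made."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    @property
    def name(self) -> str:
        return "recording_strategy"

    def perform_round(self, algo, train_data_nodes, aggregation_node, round_idx, clean_models):
        self.calls.append(("perform_round", round_idx, len(train_data_nodes)))

    def predict(self, algo, test_data_nodes, train_data_nodes, round_idx):
        self.calls.append(("predict", round_idx, len(test_data_nodes)))


try:
    StrategySketch()  # abstract members are missing, so instantiation fails
    base_is_abstract = False
except TypeError:
    base_is_abstract = True

strategy = RecordingStrategy()
for round_idx in range(2):
    strategy.perform_round(None, ["node-a", "node-b"], None, round_idx, clean_models=True)
strategy.predict(None, ["node-a"], ["node-a", "node-b"], round_idx=1)
```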