Experiment

substrafl.experiment.execute_experiment(*, client: Client, strategy: ComputePlanBuilder, train_data_nodes: List[TrainDataNodeProtocol], experiment_folder: str | Path, num_rounds: int | None = None, aggregation_node: AggregationNodeProtocol | None = None, evaluation_strategy: EvaluationStrategy | None = None, dependencies: Dependency | None = None, clean_models: bool = True, name: str | None = None, additional_metadata: Dict | None = None, task_submission_batch_size: int = 500) → ComputePlan

Run a complete experiment. This trains (on the train_data_nodes) and tests (on the test data nodes of the evaluation_strategy, if one is given) the specified strategy num_rounds times and returns the compute plan object from the Substra platform.

In SubstraFL, operations are linked to each other statically before being submitted to Substra.

The static graph of operations is generated by executing:

the self.perform_round method of the passed strategy, num_rounds times

the self.predict methods of the passed strategy

Each element necessary for those operations (Tasks and Functions) is registered on the Substra platform through the specified client.

Finally, the compute plan is sent and executed.

The experiment summary is saved in experiment_folder, with the name format {timestamp}_{compute_plan.key}.json
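As an illustration of the naming scheme above, here is a minimal sketch for locating and loading the summary of a given compute plan. The helper names find_experiment_summaries and load_experiment_summary are hypothetical (they are not part of SubstraFL), and the timestamp is matched with a wildcard because its exact format is not specified here.

```python
import json
from pathlib import Path
from typing import Any, List, Union


def find_experiment_summaries(
    experiment_folder: Union[str, Path], compute_plan_key: str
) -> List[Path]:
    """Return the files named `{timestamp}_{compute_plan_key}.json` in the folder.

    Hypothetical helper: the timestamp part is matched with `*` since its
    exact format is not assumed.
    """
    folder = Path(experiment_folder)
    return sorted(folder.glob(f"*_{compute_plan_key}.json"))


def load_experiment_summary(path: Union[str, Path]) -> Any:
    """Parse one summary file as JSON, without assuming anything about its schema."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)
```

With a ComputePlan returned by execute_experiment, the key to search for would be compute_plan.key.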

Parameters:

client (substra.Client) – A substra client to interact with the Substra platform
strategy (Strategy) – The strategy that will be executed
train_data_nodes (List[TrainDataNodeProtocol]) – List of the nodes where training on data occurs.
aggregation_node (Optional[AggregationNodeProtocol]) – For a centralized strategy, the aggregation node where all the shared tasks occur; otherwise None.
evaluation_strategy (Optional[EvaluationStrategy]) – If None, performance will not be measured at all. Otherwise, performance is measured according to the EvaluationStrategy. Defaults to None.
num_rounds (int) – The number of times your strategy will be executed.
dependencies (Optional[Dependency]) – Dependencies of the experiment. It must be defined from the SubstraFL Dependency class. Defaults to None.
experiment_folder (Union[str, pathlib.Path]) – path to the folder where the experiment summary is saved.
clean_models (bool) – Clean the intermediary models on the Substra platform. Set it to False if you want to download or re-use intermediary models. Keeping them causes the disk space to fill quickly, so leave it set to True unless needed. Defaults to True.
name (Optional[str]) – Optional name chosen by the user to identify the compute plan. If None, the compute plan name is set to the timestamp.
additional_metadata (Optional[dict]) – Optional dictionary of metadata to be passed to the Substra WebApp.
task_submission_batch_size (int) – The compute plan tasks are submitted in batches. The higher the batch size, the faster the submission, but a batch size that is too high makes the submission fail. Rule of thumb: batch_size = math.floor(120000 / number_of_samples_per_task). Defaults to 500.
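The rule of thumb for task_submission_batch_size can be written as a small helper. This is only a sketch: the function name suggested_batch_size is hypothetical, and clamping the result to at least 1 is an added assumption so that very large tasks still yield a usable batch size.

```python
import math


def suggested_batch_size(number_of_samples_per_task: int) -> int:
    """Apply the rule of thumb: floor(120000 / number_of_samples_per_task)."""
    if number_of_samples_per_task <= 0:
        raise ValueError("number_of_samples_per_task must be a positive integer")
    batch_size = math.floor(120000 / number_of_samples_per_task)
    # Assumption (not in the rule of thumb): never return less than one task per batch.
    return max(1, batch_size)


print(suggested_batch_size(1000))  # 120
```

The returned value can then be passed as task_submission_batch_size.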

Returns:

The generated compute plan

Return type:

ComputePlan
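A minimal sketch of a call to execute_experiment, based only on the signature documented above. The wrapper name run_experiment is hypothetical, and the client, strategy, nodes, and dependencies are assumed to have been built elsewhere. Every argument is passed by keyword because the signature is keyword-only.

```python
from pathlib import Path
from typing import Any, List, Optional, Union


def run_experiment(
    client: Any,
    strategy: Any,
    train_data_nodes: List[Any],
    experiment_folder: Union[str, Path],
    num_rounds: int,
    aggregation_node: Optional[Any] = None,
    evaluation_strategy: Optional[Any] = None,
    dependencies: Optional[Any] = None,
    name: Optional[str] = None,
) -> Any:
    """Hypothetical wrapper that forwards its arguments to execute_experiment."""
    # Imported lazily so this sketch can be loaded without SubstraFL installed.
    from substrafl.experiment import execute_experiment

    return execute_experiment(
        client=client,
        strategy=strategy,
        train_data_nodes=train_data_nodes,
        experiment_folder=experiment_folder,
        num_rounds=num_rounds,
        aggregation_node=aggregation_node,
        evaluation_strategy=evaluation_strategy,  # None: no performance is measured
        dependencies=dependencies,
        clean_models=True,  # documented default; False keeps intermediary models
        name=name,  # None: the compute plan name is set to the timestamp
    )
```

The returned object is the ComputePlan described above.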

substrafl.experiment.simulate_experiment(*, client: Client, strategy: ComputePlanBuilder, experiment_folder: str | Path, train_data_nodes: List[TrainDataNodeProtocol], aggregation_node: AggregationNodeProtocol | None = None, evaluation_strategy: EvaluationStrategy | None = None, num_rounds: int | None = None, additional_metadata: Dict | None = None, clean_models: bool = True, **kwargs) → Tuple[SimuPerformancesMemory, SimuStatesMemory, SimuStatesMemory]

Simulate an experiment by computing all operations in RAM. No tasks are sent to the Client, which means that this function should not be used to check that your experiment will run as wanted on Substra. The remote client backend type is not supported by this function.

The intermediate states always contain the state computed in the last round. Set clean_models to False to keep all intermediate states.

Parameters:

client (substra.Client) – A substra client to interact with the Substra platform, in order to retrieve the registered data. The remote client backend type is not supported by this function.
strategy (Strategy) – The strategy that will be executed.
experiment_folder (Union[str, pathlib.Path]) – path to the folder where the experiment summary is saved.
train_data_nodes (List[TrainDataNodeProtocol]) – List of the nodes where training on data occurs.
aggregation_node (Optional[AggregationNodeProtocol]) – For a centralized strategy, the aggregation node where all the shared tasks occur; otherwise None.
evaluation_strategy (Optional[EvaluationStrategy]) – If None, performance will not be measured at all. Otherwise, performance is measured according to the EvaluationStrategy. Defaults to None.
num_rounds (int) – The number of times your strategy will be executed.
additional_metadata (Optional[dict]) – Optional dictionary of metadata to be passed to the experiment summary.
clean_models (bool) – Intermediary models are cleaned from RAM. Set it to False if you want to return intermediary states. Defaults to True.

Raises:

UnsupportedClientBackendTypeError – remote client backend type is not supported by simulate_experiment.

Returns:

SimuPerformancesMemory: Object containing all performances computed during the simulation. Set to None if no EvaluationStrategy is given.
SimuStatesMemory: Object containing all intermediate states saved on the TrainDataNodes.
SimuStatesMemory: Object containing all intermediate states saved on the AggregationNode. Set to None if no AggregationNode is given.

Return type:

Tuple[SimuPerformancesMemory, SimuStatesMemory, SimuStatesMemory]
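Since simulate_experiment returns three objects, two of which may be None, a caller will typically unpack the result and check each part before using it. The sketch below assumes only the tuple shape from the signature. The helper name describe_simulation_result is hypothetical, and no attributes of SimuPerformancesMemory or SimuStatesMemory are assumed.

```python
from typing import Any, Dict, Optional, Tuple


def describe_simulation_result(
    result: Tuple[Optional[Any], Any, Optional[Any]],
) -> Dict[str, bool]:
    """Report which parts of a simulate_experiment result are available."""
    # Order taken from the documented signature:
    # performances, states on the TrainDataNodes, states on the AggregationNode.
    performances, train_states, aggregated_states = result
    return {
        "has_performances": performances is not None,  # None without an EvaluationStrategy
        "has_train_states": train_states is not None,
        "has_aggregated_states": aggregated_states is not None,  # None without an AggregationNode
    }
```

For example, a simulation run without an evaluation_strategy and without an aggregation_node would report only has_train_states as True.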