Organizations

AggregationNode

class substrafl.nodes.aggregation_node.AggregationNode(organization_id: str)

Bases: organizations.organization.Node

The node that applies operations to the shared states received from TrainDataNode data operations. The result is sent to the TrainDataNode and/or TestDataNode data operations.

Parameters

organization_id (str) –

register_operations(client: substra.sdk.client.Client, permissions: substra.sdk.schemas.Permissions, cache: Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey], dependencies: substrafl.dependency.Dependency) Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey]

Define the algorithms for each operation and submit the aggregated tuple to substra.

Go through every operation in the computation graph, check which algorithm it uses (identified by its RemoteStruct id), submit it to substra and save RemoteStruct : algo_key into the cache (where algo_key is the algo key returned by substra). If two tuples depend on the same algorithm, the algorithm won’t be added twice to substra, as this method checks whether an algo has already been submitted to substra before adding it.

Parameters
  • client (substra.Client) – Substra defined client used to register the operation.

  • permissions (substra.sdk.schemas.Permissions) – Substra permissions attached to the registered operation.

  • cache (Dict[RemoteStruct, OperationKey]) – Already registered algorithm identifications. The key of each element is the RemoteStruct id (generated by substrafl) and the value is the key generated by substra.

  • dependencies (Dependency) – Dependencies of the given operation.

Returns

updated cache

Return type

Dict[RemoteStruct, OperationKey]
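
The caching behaviour described above can be illustrated with a minimal sketch. It does not use the substrafl or substra packages: RemoteStructId, fake_submit and the free function register_operations below are hypothetical stand-ins chosen for illustration, and the real method also takes a client, permissions and dependencies.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: in SubstraFL the cache keys are RemoteStruct
# objects and the values are the keys returned by Substra.
RemoteStructId = str
OperationKey = str


def register_operations(
    operations: List[RemoteStructId],
    cache: Dict[RemoteStructId, OperationKey],
    submit: Callable[[RemoteStructId], OperationKey],
) -> Dict[RemoteStructId, OperationKey]:
    """Submit each distinct algorithm once and return the updated cache."""
    for remote_struct in operations:
        if remote_struct not in cache:
            # Only algorithms missing from the cache are submitted.
            cache[remote_struct] = submit(remote_struct)
    return cache


submitted: List[RemoteStructId] = []


def fake_submit(remote_struct: RemoteStructId) -> OperationKey:
    """Record the submission and return a made-up key (assumption)."""
    submitted.append(remote_struct)
    return f"key-{len(submitted)}"


# Two operations share the "avg" algorithm, so it is submitted only once.
cache = register_operations(["avg", "avg", "sum"], {}, fake_submit)
```

Passing the returned cache to the next node's registration call is what would let several nodes share already registered algorithms.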

update_states(operation: substrafl.remote.operations.AggregateOperation, round_idx: int, authorized_ids: List[str], clean_models: bool = False) substrafl.nodes.references.shared_state.SharedStateRef

Add an aggregate tuple to the list of operations to be executed by the node during the compute plan. This is done statically: nothing is submitted to substra. This is why the algo key is a RemoteStruct (the substrafl local reference to the algorithm) and not a substra algo_key.

Parameters
  • operation (AggregateOperation) – Automatically generated structure returned by the remote() decorator. It allows an operation to be registered now and executed later.

  • round_idx (int) – Round number, starting at 1.

  • authorized_ids (List[str]) – IDs of the organizations authorized to access the output model.

  • clean_models (bool) – Whether outputs of this operation are transient (deleted when they are not used anymore) or not. Defaults to False.

Raises

TypeError – operation must be an AggregateOperation. Make sure to decorate the (user-defined) aggregate function of your strategy with @remote.

Returns

Identification for the result of this operation.

Return type

SharedStateRef
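
As a rough illustration of what registering an operation "in a static way" means, the sketch below queues a description of the operation locally and hands back a reference to its future result. SketchAggregationNode, the stand-in AggregateOperation class and the dictionary layout are assumptions made for this example, not the library's implementation.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedStateRef:
    """Reference to the future result of a queued operation."""
    key: str


class AggregateOperation:
    """Hypothetical stand-in for the structure produced by @remote."""


@dataclass
class SketchAggregationNode:
    organization_id: str
    tuples: List[Dict] = field(default_factory=list)

    def update_states(
        self,
        operation: AggregateOperation,
        round_idx: int,
        authorized_ids: List[str],
        clean_models: bool = False,
    ) -> SharedStateRef:
        if not isinstance(operation, AggregateOperation):
            raise TypeError("operation must be an AggregateOperation")
        op_id = str(uuid.uuid4())
        # Nothing is submitted here: the operation is only recorded locally.
        self.tuples.append(
            {
                "id": op_id,
                "operation": operation,
                "round_idx": round_idx,
                "authorized_ids": authorized_ids,
                "transient": clean_models,
            }
        )
        return SharedStateRef(op_id)


node = SketchAggregationNode("org-1")
ref = node.update_states(AggregateOperation(), round_idx=1, authorized_ids=["org-1"])
```

The returned reference can be passed to later operations before anything has been executed, which is how a whole compute plan can be described up front.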

TrainDataNode

class substrafl.nodes.train_data_node.TrainDataNode(organization_id: str, data_manager_key: str, data_sample_keys: List[str])

Bases: organizations.organization.Node

A predefined structure that allows you to register operations on your train node in a static way before submitting them to substra.

Parameters
  • organization_id (str) – The substra organization ID (shared with other organizations if permissions are needed)

  • data_manager_key (str) – Substra data_manager_key opening data samples used by the strategy

  • data_sample_keys (List[str]) – Substra data_sample_keys used for the training on this node

register_operations(client: substra.sdk.client.Client, permissions: substra.sdk.schemas.Permissions, cache: Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey], dependencies: substrafl.dependency.Dependency) Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey]

Define the algorithms for each operation and submit the composite traintuple to substra.

Go through every operation in the computation graph, check which algorithm it uses (identified by its RemoteStruct id), submit it to substra and save RemoteStruct : algo_key into the cache (where algo_key is the algo key returned by substra). If two tuples depend on the same algorithm, the algorithm won’t be added twice to substra, as this method checks whether an algo has already been submitted to substra before adding it.

Parameters
  • client (substra.Client) – Substra client for the organization.

  • permissions (substra.sdk.schemas.Permissions) – Permissions for the algorithm.

  • cache (Dict[RemoteStruct, OperationKey]) – Already registered algorithm identifications. The key of each element is the RemoteStruct id (generated by substrafl) and the value is the key generated by substra.

  • dependencies (Dependency) – Algorithm dependencies.

Returns

updated cache

Return type

Dict[RemoteStruct, OperationKey]

summary() dict

Summary of the class to be exposed in the experiment summary file

Returns

a json-serializable dict with the attributes the user wants to store

Return type

dict

update_states(operation: substrafl.remote.operations.DataOperation, round_idx: int, authorized_ids: List[str], clean_models: bool = False, local_state: Optional[substrafl.nodes.references.local_state.LocalStateRef] = None) Tuple[substrafl.nodes.references.local_state.LocalStateRef, substrafl.nodes.references.shared_state.SharedStateRef]

Add a composite train tuple to the list of operations to be executed by the node during the compute plan. This is done statically: nothing is submitted to substra. This is why the algo key is a RemoteStruct (the substrafl local reference to the algorithm) and not a substra algo_key.

Parameters
  • operation (DataOperation) – Automatically generated structure returned by the remote_data() decorator. It allows an operation to be registered now and executed later.

  • round_idx (int) – Round number, starting at 1. For a centralized strategy, it is preceded by an initialization round tagged 0.

  • authorized_ids (List[str]) – IDs of the organizations authorized to access the output model.

  • clean_models (bool) – Whether outputs of this operation are transient (deleted when they are not used anymore) or not. Defaults to False.

  • local_state (Optional[LocalStateRef]) – The parent task LocalStateRef. Defaults to None.

Raises

TypeError – operation must be a DataOperation. Make sure to decorate the train and predict methods of your algorithm with @remote_data.

Returns

Identifications for the results of this operation.

Return type

Tuple[LocalStateRef, SharedStateRef]
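
The role of the local_state parameter can be sketched as follows: each round's operation names the previous round's LocalStateRef as its parent, so the node's local state is chained from round to round. SketchTrainDataNode, the string operation name and the readable id format are hypothetical simplifications, and reusing the same id for both references is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LocalStateRef:
    key: str


@dataclass
class SharedStateRef:
    key: str


@dataclass
class SketchTrainDataNode:
    organization_id: str
    tuples: List[Dict] = field(default_factory=list)

    def update_states(
        self,
        operation_name: str,
        round_idx: int,
        local_state: Optional[LocalStateRef] = None,
    ) -> Tuple[LocalStateRef, SharedStateRef]:
        # A readable id for the sketch; a real id would be generated.
        op_id = f"{self.organization_id}-round-{round_idx}"
        self.tuples.append(
            {
                "id": op_id,
                "operation": operation_name,
                "round_idx": round_idx,
                # The parent task is the previous local state, if any.
                "parent": local_state.key if local_state else None,
            }
        )
        return LocalStateRef(op_id), SharedStateRef(op_id)


node = SketchTrainDataNode("org-1")
local_state = None
for round_idx in range(1, 4):
    # Feed each round's local state into the next round.
    local_state, shared_state = node.update_states("train", round_idx, local_state)

parents = [t["parent"] for t in node.tuples]
```

In this sketch the shared state reference would be handed to another node, while the local state reference stays with the node that produced it.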

TestDataNode

class substrafl.nodes.test_data_node.TestDataNode(organization_id: str, data_manager_key: str, test_data_sample_keys: List[str], metric_keys: List[str])

Bases: organizations.organization.Node

A node on which you will test your algorithm. A TestDataNode must also be a train data node for now.

Parameters
  • organization_id (str) – The substra organization ID (shared with other organizations if permissions are needed)

  • data_manager_key (str) – Substra data_manager_key opening data samples used by the strategy

  • test_data_sample_keys (List[str]) – Substra data_sample_keys used for testing on this node

  • metric_keys (List[str]) – Keys of the Substra metrics to compute. Use substra.Client().add_algo() to register a metric.

register_predict_operations(client: substra.sdk.client.Client, permissions: substra.sdk.schemas.Permissions, traintuples: List[Dict], cache: Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey], dependencies: substrafl.dependency.Dependency) Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey]

Find the algorithms from the parent traintuple of each predicttuple and submit them to substra with the predict algo category.

Go through every operation in the predict algo cache, submit it to substra and save RemoteStruct : algo_key into the cache (where algo_key is the algo key returned by substra). If two predicttuples depend on the same algorithm, the algorithm won’t be added twice to substra, as this method checks whether an algo has already been submitted as a predicttuple to substra before adding it.

Parameters
  • client (substra.Client) – Substra client for the organization.

  • permissions (substra.sdk.schemas.Permissions) – Permissions for the algorithm.

  • traintuples (List[Dict]) – List of traintuples in which to search for the parents of the predicttuples.

  • cache (Dict[RemoteStruct, OperationKey]) – Already registered algorithm identifications. The key of each element is the RemoteStruct id (generated by substrafl) and the value is the key generated by substra.

  • dependencies (Dependency) – Algorithm dependencies.

Returns

updated cache

Return type

Dict[RemoteStruct, OperationKey]

summary() dict

Summary of the class to be exposed in the experiment summary file

Returns

a json-serializable dict with the attributes the user wants to store

Return type

dict

update_states(traintuple_id: str, operation: substrafl.remote.operations.DataOperation, round_idx: int)

Create a test tuple based on the node's characteristics.

Parameters
  • traintuple_id (str) – The substra parent id

  • operation (DataOperation) – Automatically generated structure returned by the remote_data() decorator. It allows an operation to be registered now and executed later.

  • round_idx (int) – Round number of the strategy, starting at 1.

Node

class substrafl.nodes.node.Node(organization_id: str)

Bases: object

Parameters

organization_id (str) –

summary() dict

Summary of the class to be exposed in the experiment summary file. For inherited classes, override this function and call super().summary().

Example

def summary(self):
    summary = super().summary()
    summary.update(
        {
            "attribute": self.attribute,
            # add any other attribute to expose here
        }
    )
    return summary

Returns

a json-serializable dict with the attributes the user wants to store

Return type

dict
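
To show the override pattern end to end, here is a self-contained sketch. The Node class below is a simplified stand-in (what the real base summary() contains is not stated here, so returning organization_id is an assumption), and SketchDataNode is a hypothetical subclass.

```python
import json


class Node:
    """Simplified stand-in for the base node class."""

    def __init__(self, organization_id: str):
        self.organization_id = organization_id

    def summary(self) -> dict:
        # Assumption: the base summary exposes the organization id.
        return {"organization_id": self.organization_id}


class SketchDataNode(Node):
    """Hypothetical subclass that exposes one extra attribute."""

    def __init__(self, organization_id: str, data_manager_key: str):
        super().__init__(organization_id)
        self.data_manager_key = data_manager_key

    def summary(self) -> dict:
        # Extend the parent's summary instead of replacing it.
        summary = super().summary()
        summary.update({"data_manager_key": self.data_manager_key})
        return summary


# The result must survive json.dumps to be written to a summary file.
serialized = json.dumps(SketchDataNode("org-1", "dm-key").summary(), sort_keys=True)
```

Keeping every value a plain string, number, list or dict is what keeps the summary json-serializable.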

References

class substrafl.nodes.node.OperationKey(x)

Bases:

class substrafl.nodes.references.local_state.LocalStateRef(key: str)

Bases: object

Parameters

key (str) –

Return type

None

class substrafl.nodes.references.shared_state.SharedStateRef(key: str)

Bases: object

Parameters

key (str) –

Return type

None