Organizations
AggregationNode
- class substrafl.nodes.aggregation_node.AggregationNode(organization_id: str)
Bases:
organizations.organization.Node
The node that applies operations to the shared states received from TrainDataNode data operations. The result is sent to the TrainDataNode and/or TestDataNode data operations.
- Parameters
organization_id (str) –
- register_operations(client: substra.sdk.client.Client, permissions: substra.sdk.schemas.Permissions, cache: Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey], dependencies: substrafl.dependency.Dependency) → Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey]
Define the algorithms for each operation and submit the aggregated tuple to substra.
Go through every operation in the computation graph, check which algorithm it uses (identified by its RemoteStruct id), submit it to substra and save RemoteStruct : algo_key into the cache (where algo_key is the algo key returned by substra). If two tuples depend on the same algorithm, the algorithm won’t be added twice to substra, as this method checks whether an algo has already been submitted to substra before adding it.
- Parameters
client (substra.Client) – Substra defined client used to register the operation.
permissions (substra.sdk.schemas.Permissions) – Substra permissions attached to the registered operation.
cache (Dict[RemoteStruct, OperationKey]) – Already registered algorithm identifications. The key of each element is the RemoteStruct id (generated by substrafl) and the value is the key generated by substra.
dependencies (Dependency) – Dependencies of the given operation.
- Returns
The updated cache.
- Return type
Dict[RemoteStruct, OperationKey]
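The deduplication described above can be illustrated with a minimal, self-contained sketch. It does not use the real substra client: `FakeClient`, its `submit_algo` method, and the string identifiers are hypothetical stand-ins chosen only to show how a cache keyed by the RemoteStruct id prevents the same algorithm from being submitted twice.

```python
from typing import Dict, List


class FakeClient:
    """Hypothetical stand-in for a client; it only records what was submitted."""

    def __init__(self) -> None:
        self.submitted: List[str] = []

    def submit_algo(self, remote_struct_id: str) -> str:
        # Pretend the backend returns a new key for every submitted algorithm.
        self.submitted.append(remote_struct_id)
        return f"algo-key-{len(self.submitted)}"


def register_operations(
    client: FakeClient, operations: List[str], cache: Dict[str, str]
) -> Dict[str, str]:
    """Submit each distinct algorithm once and return the updated cache."""
    for remote_struct_id in operations:
        # Only submit algorithms that are not in the cache yet.
        if remote_struct_id not in cache:
            cache[remote_struct_id] = client.submit_algo(remote_struct_id)
    return cache


client = FakeClient()
# Three operations, two of which rely on the same algorithm.
cache = register_operations(client, ["avg", "avg", "sum"], cache={})
```

Passing the returned cache to the next node's registration step is what lets several nodes share already registered algorithms in this sketch.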
- update_states(operation: substrafl.remote.operations.AggregateOperation, round_idx: int, authorized_ids: List[str]) → substrafl.nodes.references.shared_state.SharedStateRef
Adds an aggregated tuple to the list of operations to be executed by the node during the compute plan. This is done statically: nothing is submitted to substra. This is why the algo key is a RemoteStruct (the substrafl local reference of the algorithm) and not a substra algo_key, as nothing has been submitted yet.
- Parameters
operation (AggregateOperation) – Automatically generated structure returned by the remote() decorator. This allows an operation to be registered and executed later on.
round_idx (int) – Round number, starting at 1.
authorized_ids (List[str]) – IDs of the organizations authorized to access the output model.
- Raises
TypeError – operation must be an AggregateOperation. Make sure to decorate the (user-defined) aggregate function of the strategy with @remote.
- Returns
Identification for the result of this operation.
- Return type
SharedStateRef
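As a rough illustration of this static registration, the sketch below records tasks in a local list and hands back a reference to a result that does not exist yet. All classes here are simplified, hypothetical stand-ins (they are not the substrafl implementations), and the task dictionary layout is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class AggregateOperation:
    """Hypothetical stand-in for the structure built by a decorated function."""

    remote_struct_id: str


@dataclass(frozen=True)
class SharedStateRef:
    """Hypothetical reference to the future result of a recorded task."""

    key: str


@dataclass
class SketchAggregationNode:
    organization_id: str
    tuples: List[Dict] = field(default_factory=list)

    def update_states(
        self, operation: object, round_idx: int, authorized_ids: List[str]
    ) -> SharedStateRef:
        if not isinstance(operation, AggregateOperation):
            raise TypeError("operation must be an AggregateOperation")
        op_id = f"{self.organization_id}-aggregate-{round_idx}"
        # Record the task locally; nothing is submitted at this point.
        self.tuples.append(
            {
                "id": op_id,
                "algo": operation.remote_struct_id,  # local reference, not a backend key
                "round_idx": round_idx,
                "authorized_ids": list(authorized_ids),
            }
        )
        return SharedStateRef(op_id)


node = SketchAggregationNode("org-agg")
ref = node.update_states(AggregateOperation("avg"), round_idx=1, authorized_ids=["org-1"])
```

The returned reference can be passed to later operations before any task has actually run, which is the point of keeping this step static.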
TrainDataNode
- class substrafl.nodes.train_data_node.TrainDataNode(organization_id: str, data_manager_key: str, data_sample_keys: List[str])
Bases:
organizations.organization.Node
A predefined structure that allows you to register operations on your train node in a static way before submitting them to substra.
- Parameters
- register_operations(client: substra.sdk.client.Client, permissions: substra.sdk.schemas.Permissions, cache: Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey], dependencies: substrafl.dependency.Dependency) → Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey]
Define the algorithms for each operation and submit the composite traintuple to substra.
Go through every operation in the computation graph, check which algorithm it uses (identified by its RemoteStruct id), submit it to substra and save RemoteStruct : algo_key into the cache (where algo_key is the algo key returned by substra). If two tuples depend on the same algorithm, the algorithm won’t be added twice to substra, as this method checks whether an algo has already been submitted to substra before adding it.
- Parameters
client (substra.Client) – Substra client for the organization.
permissions (substra.sdk.schemas.Permissions) – Permissions for the algorithm.
cache (Dict[RemoteStruct, OperationKey]) – Already registered algorithm identifications. The key of each element is the RemoteStruct id (generated by substrafl) and the value is the key generated by substra.
dependencies (Dependency) – Algorithm dependencies.
- Returns
The updated cache.
- Return type
Dict[RemoteStruct, OperationKey]
- summary() → dict
Summary of the class to be exposed in the experiment summary file.
- Returns
A JSON-serializable dict with the attributes the user wants to store.
- Return type
dict
- update_states(operation: substrafl.remote.operations.DataOperation, round_idx: int, authorized_ids: List[str], local_state: Optional[substrafl.nodes.references.local_state.LocalStateRef] = None) → Tuple[substrafl.nodes.references.local_state.LocalStateRef, substrafl.nodes.references.shared_state.SharedStateRef]
Adds a composite train tuple to the list of operations to be executed by the node during the compute plan. This is done statically: nothing is submitted to substra. This is why the algo key is a RemoteStruct (the substrafl local reference of the algorithm) and not a substra algo_key, as nothing has been submitted yet.
- Parameters
operation (DataOperation) – Automatically generated structure returned by the remote_data() decorator. This allows an operation to be registered and executed later on.
round_idx (int) – Round number, starting at 1. In the case of a centralized strategy, it is preceded by an initialization round tagged 0.
authorized_ids (List[str]) – IDs of the organizations authorized to access the output model.
local_state (Optional[LocalStateRef]) – The LocalStateRef of the parent task. Defaults to None.
- Raises
TypeError – operation must be a DataOperation. Make sure to decorate the train and predict methods of your algorithm with @remote_data.
- Returns
Identifications for the results of this operation.
- Return type
Tuple[LocalStateRef, SharedStateRef]
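The role of the local_state argument can be sketched as follows: each round returns a local reference that is fed back as the parent of the next round on the same node. The classes below are simplified, hypothetical stand-ins rather than the substrafl ones, the operation is reduced to a plain string, and the task dictionary layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LocalStateRef:
    """Hypothetical reference to a state that stays on the node."""

    key: str


@dataclass(frozen=True)
class SharedStateRef:
    """Hypothetical reference to a state shared with other nodes."""

    key: str


@dataclass
class SketchTrainDataNode:
    organization_id: str
    tuples: List[Dict] = field(default_factory=list)

    def update_states(
        self,
        operation: str,
        round_idx: int,
        authorized_ids: List[str],
        local_state: Optional[LocalStateRef] = None,
    ) -> Tuple[LocalStateRef, SharedStateRef]:
        op_id = f"{self.organization_id}-train-{round_idx}"
        # Record the task and remember which earlier local state it continues from.
        self.tuples.append(
            {
                "id": op_id,
                "algo": operation,
                "round_idx": round_idx,
                "parent_local_state": local_state.key if local_state else None,
                "authorized_ids": list(authorized_ids),
            }
        )
        return LocalStateRef(op_id), SharedStateRef(op_id)


node = SketchTrainDataNode("org-1")
local_state = None  # first round: no parent task
for round_idx in (1, 2, 3):
    local_state, shared_state = node.update_states(
        "train", round_idx, ["org-1"], local_state=local_state
    )
```

In this sketch the shared reference would go to an aggregation step, while the local reference only ever chains tasks on the same node.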
TestDataNode
- class substrafl.nodes.test_data_node.TestDataNode(organization_id: str, data_manager_key: str, test_data_sample_keys: List[str], metric_keys: List[str])
Bases:
organizations.organization.Node
A node on which you will test your algorithm. A TestDataNode must also be a train data node for now.
- Parameters
organization_id (str) – The substra organization ID (shared with other organizations if permissions are needed)
data_manager_key (str) – Substra data_manager_key opening data samples used by the strategy
test_data_sample_keys (List[str]) – Substra data_sample_keys used for testing on this node
metric_keys (List[str]) – Substra keys of the metrics, obtained with substra.Client().add_algo()
- register_predict_operations(client: substra.sdk.client.Client, permissions: substra.sdk.schemas.Permissions, traintuples: List[Dict], cache: Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey], dependencies: substrafl.dependency.Dependency) → Dict[substrafl.remote.remote_struct.RemoteStruct, OperationKey]
Find the algorithms from the parent traintuple of each predicttuple and submit them to substra with the predict algo category.
Go through every operation in the predict algo cache, submit it to substra and save RemoteStruct : algo_key into the cache (where algo_key is the algo key returned by substra). If two predicttuples depend on the same algorithm, the algorithm won’t be added twice to substra, as this method checks whether an algo has already been submitted as a predicttuple to substra before adding it.
- Parameters
client (substra.Client) – Substra client for the organization.
permissions (substra.sdk.schemas.Permissions) – Permissions for the algorithm.
traintuples (List[Dict]) – List of traintuples in which to search for the parents of the predicttuples.
cache (Dict[RemoteStruct, OperationKey]) – Already registered algorithm identifications. The key of each element is the RemoteStruct id (generated by substrafl) and the value is the key generated by substra.
dependencies (Dependency) – Algorithm dependencies.
- Returns
The updated cache.
- Return type
Dict[RemoteStruct, OperationKey]
- summary() → dict
Summary of the class to be exposed in the experiment summary file.
- Returns
A JSON-serializable dict with the attributes the user wants to store.
- Return type
dict
- update_states(traintuple_id: str, operation: substrafl.remote.operations.DataOperation, round_idx: int)
Creates a test tuple based on the characteristics of the node.
- Parameters
traintuple_id (str) – The substra parent id
operation (DataOperation) – Automatically generated structure returned by the remote_data() decorator. This allows an operation to be registered and executed later on.
round_idx (int) – Round number of the strategy, starting at 1.
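One way to picture a test tuple "based on the characteristics of the node" is a record that points at its parent training task and reuses the data manager, test samples and metrics configured on the node. The sketch below is a hypothetical simplification: the class, the omission of the operation argument, and the dictionary layout are assumptions, not the substrafl implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchTestDataNode:
    organization_id: str
    data_manager_key: str
    test_data_sample_keys: List[str]
    metric_keys: List[str]
    testtuples: List[Dict] = field(default_factory=list)

    def update_states(self, traintuple_id: str, round_idx: int) -> None:
        # Record a test task linked to its parent training task, filled in
        # with the data and metric keys stored on the node.
        self.testtuples.append(
            {
                "traintuple_id": traintuple_id,
                "round_idx": round_idx,
                "data_manager_key": self.data_manager_key,
                "test_data_sample_keys": list(self.test_data_sample_keys),
                "metric_keys": list(self.metric_keys),
            }
        )


node = SketchTestDataNode("org-1", "dm-key", ["sample-a", "sample-b"], ["metric-key"])
node.update_states(traintuple_id="train-round-1", round_idx=1)
```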
Node
- class substrafl.nodes.node.Node(organization_id: str)
Bases:
object
- Parameters
organization_id (str) –
- summary() → dict
Summary of the class to be exposed in the experiment summary file. For inherited classes, override this function and call super().summary().
Example
    def summary(self):
        summary = super().summary()
        summary.update(
            {
                "attribute": self.attribute,
                # ...
            }
        )
        return summary
- Returns
A JSON-serializable dict with the attributes the user wants to store.
- Return type
dict
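Since the summary has to be JSON-serializable, a quick way to check an override is to pass its result through `json.dumps`. The sketch below uses a minimal stand-in for the base class; the assumption that the base summary contains only the organization id, as well as the `CustomNode` class and its `region` attribute, are hypothetical and serve only to illustrate the pattern.

```python
import json


class Node:
    """Minimal stand-in for the base class (assumed summary content)."""

    def __init__(self, organization_id: str) -> None:
        self.organization_id = organization_id

    def summary(self) -> dict:
        return {"organization_id": self.organization_id}


class CustomNode(Node):
    """Hypothetical subclass that stores one extra attribute."""

    def __init__(self, organization_id: str, region: str) -> None:
        super().__init__(organization_id)
        self.region = region

    def summary(self) -> dict:
        # Start from the parent summary, then add the subclass attributes.
        summary = super().summary()
        summary.update({"region": self.region})
        return summary


# json.dumps raises TypeError if the summary contains non-serializable values.
serialized = json.dumps(CustomNode("org-1", "eu").summary(), sort_keys=True)
```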