Client¶
Client(url: Optional[str] = None, token: Optional[str] = None, retry_timeout: int = 300, insecure: bool = False, backend_type: substra.sdk.schemas.BackendType = <BackendType.REMOTE: 'remote'>)
Create a client
Arguments:
url (str, optional)
: URL of the Substra platform. Mandatory to connect to a Substra platform. If no URL is given, the backend type must be `docker` or `subprocess` and all assets must be created locally. Defaults to None.
token (str, optional)
: Token to authenticate to the Substra platform. If no token is given, use the `login` function to authenticate. Defaults to None.
retry_timeout (int, optional)
: Number of seconds before attempting a retry call in case of timeout. Defaults to 300 (5 minutes).
insecure (bool, optional)
: If True, the client can call a non-certified backend. This is for development purposes. Defaults to False.
backend_type (schemas.BackendType, optional)
: Which mode to use. Possible values are `remote`, `docker` and `subprocess`. Defaults to `remote`. In `remote` mode, assets are registered on a deployed platform, which also executes the tasks. In `subprocess` or `docker` mode, if no URL is given, all assets are created locally and tasks are executed locally. If a URL is given, the mode is a hybrid one: new assets are created locally but can access assets from the deployed Substra platform. The platform is in read-only mode and tasks are executed locally.
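As an illustration of the three modes, the constructor arguments might be assembled as in the sketch below. The URL and token are placeholders, and passing the mode as a plain string assumes the BackendType enum accepts its string value.

```python
# Sketch of constructor arguments for the three backend modes.
# "https://substra.example.org" and the token are placeholders, not real credentials.

# Remote mode: assets are registered and tasks executed on the deployed platform.
remote_kwargs = {
    "url": "https://substra.example.org",
    "token": "my-api-token",  # or authenticate afterwards with client.login(...)
    "backend_type": "remote",
}

# Local mode: no URL, so all assets are created and all tasks executed locally.
local_kwargs = {
    "backend_type": "subprocess",
}

# Hybrid mode: a URL together with a local backend type; remote assets are
# readable, but new assets and task execution stay local.
hybrid_kwargs = {
    "url": "https://substra.example.org",
    "backend_type": "docker",
}

# With the substra package installed, each dict would be used as, e.g.:
# client = Client(**remote_kwargs)
```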
temp_directory¶
This is a property.
Temporary directory for storing assets in debug mode.
Deleted when the client is deleted.
add_algo¶
add_algo(self, data: Union[dict, substra.sdk.schemas.AlgoSpec]) -> str
Create new algo asset.
Arguments:
data (Union[dict, schemas.AlgoSpec], required)
: If it is a dict, it must have the same keys as specified in schemas.AlgoSpec.
Returns:
str
: Key of the algo
add_compute_plan¶
add_compute_plan(self, data: Union[dict, substra.sdk.schemas.ComputePlanSpec], auto_batching: bool = True, batch_size: int = 500) -> substra.sdk.models.ComputePlan
Create new compute plan asset.
Arguments:
data (Union[dict, schemas.ComputePlanSpec], required)
: If it is a dict, it must have the same keys as specified in schemas.ComputePlanSpec.
auto_batching (bool, optional)
: Set `auto_batching` to False to upload all the tasks of the compute plan at once. Defaults to True.
batch_size (int, optional)
: If `auto_batching` is True, change `batch_size` to define the number of tasks uploaded in each batch. Defaults to 500.
Returns:
models.ComputePlan
: Created compute plan
add_compute_plan_tasks¶
add_compute_plan_tasks(self, key: str, tasks: Union[dict, substra.sdk.schemas.UpdateComputePlanTasksSpec], auto_batching: bool = True, batch_size: int = 500) -> substra.sdk.models.ComputePlan
Update compute plan.
Arguments:
key (str, required)
: Compute plan key
tasks (Union[dict, schemas.UpdateComputePlanTasksSpec], required)
: If it is a dict, it must have the same keys as specified in schemas.UpdateComputePlanTasksSpec.
auto_batching (bool, optional)
: Set `auto_batching` to False to upload all the tasks of the compute plan at once. Defaults to True.
batch_size (int, optional)
: If `auto_batching` is True, change `batch_size` to define the number of tasks uploaded in each batch. Defaults to 500.
Returns:
models.ComputePlan
: Updated compute plan, as described in the models.ComputePlan model
add_data_sample¶
add_data_sample(self, data: Union[dict, substra.sdk.schemas.DataSampleSpec], local: bool = True) -> str
Create a new data sample asset and return its key.
Arguments:
data (Union[dict, schemas.DataSampleSpec], required)
: Data sample to add. If it is a dict, it must follow the DataSampleSpec schema.
local (bool, optional)
: If `local` is true, `path` must refer to a directory located on the local filesystem. The file content will be transferred to the server through an HTTP query, so this mode should be used for relatively small files (under 10 MB). If `local` is false, `path` must refer to a directory located on the server filesystem. This directory must be accessible (readable) by the server. This mode is well suited for all file sizes. Defaults to True.
Returns:
str
: key of the data sample
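As a sketch, a data sample passed as a dict might look like the following. Only `path` appears in the documentation above; `data_manager_keys` is an assumed field name, so refer to schemas.DataSampleSpec for the authoritative keys.

```python
# Hypothetical DataSampleSpec-style dict. "path" is documented above;
# "data_manager_keys" is an assumption about the schema, and the values
# are placeholders.
data_sample = {
    "path": "/data/samples/sample_0001",      # directory holding one data sample
    "data_manager_keys": ["<dataset-key>"],   # assumed: dataset(s) the sample belongs to
}

# With a configured client, the call would be:
# key = client.add_data_sample(data_sample, local=True)
```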
add_data_samples¶
add_data_samples(self, data: Union[dict, substra.sdk.schemas.DataSampleSpec], local: bool = True) -> List[str]
Create many data sample assets and return a list of keys. All data samples are created through a single HTTP request, so this method is well suited for adding multiple small files only. For a large amount of data, it is recommended to add samples one by one, which gives better control in case of failure.
Arguments:
data (Union[dict, schemas.DataSampleSpec], required)
: Data samples to add. If it is a dict, it must follow the DataSampleSpec schema. The `paths` field in the data dictionary must be a list of paths, where each path points to a directory representing one data sample.
local (bool, optional)
: Please refer to the method `Client.add_data_sample`. Defaults to True.
Returns:
List[str]
: List of the data sample keys
add_dataset¶
add_dataset(self, data: Union[dict, substra.sdk.schemas.DatasetSpec])
Create new dataset asset and return its key.
Arguments:
data (Union[dict, schemas.DatasetSpec], required)
: If it is a dict, it must have the same keys as specified in schemas.DatasetSpec.
Returns:
str
: Key of the dataset
add_task¶
add_task(self, data: Union[dict, substra.sdk.schemas.TaskSpec]) -> str
Create new task asset.
Arguments:
data (Union[dict, schemas.TaskSpec], required)
: If it is a dict, it must have the same keys as specified in schemas.TaskSpec.
Returns:
str
: Key of the asset
cancel_compute_plan¶
cancel_compute_plan(self, key: str) -> None
Cancel the execution of a compute plan. Nothing is returned by this method.
download_algo¶
download_algo(self, key: str, destination_folder: str) -> None
Download algo resource. Download algo package in destination folder.
Arguments:
key (str, required)
: Algo key to download
destination_folder (str, required)
: Destination folder
Returns:
pathlib.Path
: Path of the downloaded algo
download_dataset¶
download_dataset(self, key: str, destination_folder: str) -> None
Download data manager resource. Download opener script in destination folder.
Arguments:
key (str, required)
: Dataset key to download
destination_folder (str, required)
: Destination folder
Returns:
pathlib.Path
: Path of the downloaded dataset
download_logs¶
download_logs(self, task_key: str, folder: str) -> str
Download the execution logs of a failed task to a destination file. The logs are saved in the folder, in a file named 'task_logs_{task_key}.txt'.
Logs are only available for tasks that experienced an execution failure. Attempting to retrieve logs for tasks in any other state or for non-existing tasks will result in a NotFound error.
Arguments:
task_key (str, required)
: the key of the task that produced the logs
folder (str, required)
: the destination directory
Returns:
str
: The logs as a str
download_model¶
download_model(self, key: str, destination_folder) -> None
Download model to destination file. This model was saved using the ‘save_model’ function of the algorithm. To load and use the model, please refer to the ‘load_model’ and ‘predict’ functions of the algorithm.
Arguments:
key (str, required)
: Model key to download
destination_folder (str, required)
: Destination folder
Returns:
pathlib.Path
: Path of the downloaded model
download_model_from_task¶
download_model_from_task(self, task_key: str, identifier: str, folder: os.PathLike) -> pathlib.Path
Download task model to destination file. This model was saved using the ‘save_model’ function of the algorithm. To load and use the model, please refer to the ‘load_model’ and ‘predict’ functions of the algorithm.
Arguments:
task_key (str, required)
: Task key to download
identifier (str, required)
: Output identifier
folder (os.PathLike, required)
: Destination folder
Returns:
pathlib.Path
: Path of the downloaded model
from_config_file¶
from_config_file(profile_name: str = 'default', config_path: Union[str, pathlib.Path] = '~/.substra', tokens_path: Union[str, pathlib.Path] = '~/.substra-tokens', token: Optional[str] = None, retry_timeout: int = 300, backend_type: substra.sdk.schemas.BackendType = <BackendType.REMOTE: 'remote'>)
Returns a new Client configured with profile data from configuration files.
Arguments:
profile_name (str, optional)
: Name of the profile to load. Defaults to 'default'.
config_path (Union[str, pathlib.Path], optional)
: Path to the configuration file. Defaults to '~/.substra'.
tokens_path (Union[str, pathlib.Path], optional)
: Path to the tokens file. Defaults to '~/.substra-tokens'.
token (str, optional)
: Token to use for authentication (will be used instead of any token found at tokens_path). Defaults to None.
retry_timeout (int, optional)
: Number of seconds before attempting a retry call in case of timeout. Defaults to 300 (5 minutes).
backend_type (schemas.BackendType, optional)
: Which mode to use. Possible values are `remote`, `docker` and `subprocess`. Defaults to `remote`. In `remote` mode, assets are registered on a deployed platform, which also executes the tasks. In `subprocess` or `docker` mode, if no URL is given, all assets are created locally and tasks are executed locally. If a URL is given, the mode is a hybrid one: new assets are created locally but can access assets from the deployed Substra platform. The platform is in read-only mode and tasks are executed locally.
Returns:
Client
: The new client.
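The layout of the configuration file is not specified above. A plausible sketch, assuming a JSON object that maps profile names to connection settings (the keys and URLs are assumptions, not the documented format), might be:

```python
import json
import tempfile
from pathlib import Path

# Assumed layout of a '~/.substra'-style configuration file: a JSON object
# mapping profile names to connection settings. The exact keys are an
# assumption here; the URLs are placeholders.
config = {
    "default": {"url": "https://substra.example.org"},
    "staging": {"url": "https://staging.substra.example.org"},
}

# Write the config to a temporary location and read it back, as the SDK would.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / ".substra"
    config_path.write_text(json.dumps(config))
    loaded = json.loads(config_path.read_text())

# With the substra package installed:
# client = Client.from_config_file(profile_name="staging", config_path=config_path)
```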
get_algo¶
get_algo(self, key: str) -> substra.sdk.models.Algo
Get algo by key. The returned object is described in the models.Algo model.
get_compute_plan¶
get_compute_plan(self, key: str) -> substra.sdk.models.ComputePlan
Get compute plan by key. The returned object is described in the models.ComputePlan model.
get_data_sample¶
get_data_sample(self, key: str) -> substra.sdk.models.DataSample
Get data sample by key. The returned object is described in the models.DataSample model.
get_dataset¶
get_dataset(self, key: str) -> substra.sdk.models.Dataset
Get dataset by key. The returned object is described in the models.Dataset model.
get_logs¶
get_logs(self, task_key: str) -> str
Get task logs by task key. The returned object is a string containing the logs.
Logs are only available for tasks that experienced an execution failure. Attempting to retrieve logs for tasks in any other state or for non-existing tasks will result in a NotFound error.
get_performances¶
get_performances(self, key: str) -> substra.sdk.models.Performances
Get the compute plan performances by key. The returned object is described in the models.Performances model and is easily convertible to a pandas DataFrame.
Examples:
import pandas as pd

perf = client.get_performances(cp_key)
df = pd.DataFrame(perf.dict())
print(df)
get_task¶
get_task(self, key: str) -> substra.sdk.models.Task
Get task by key. The returned object is described in the models.Task model.
link_dataset_with_data_samples¶
link_dataset_with_data_samples(self, dataset_key: str, data_sample_keys: List[str]) -> List[str]
Link dataset with data samples.
list_algo¶
list_algo(self, filters: dict = None, ascending: bool = False) -> List[substra.sdk.models.Algo]
List algos.
The `filters` argument is a dictionary, with those possible keys:
key (List[str]): list algo with given keys.
name (str): list algo with name partially matching given string. Remote mode only.
owner (List[str]): list algo with given owners.
metadata (dict)
{
"key": str # the key of the metadata to filter on
"type": "is", "contains" or "exists" # the type of query that will be used
"value": str # the value that the key must be (if type is "is") or contain (if type is "contains")
}: list algo matching provided conditions in metadata.
permissions (List[str]): list algo which can be used by any of the listed nodes. Remote mode only.
compute_plan_key (str): list algo that are in the given compute plan. Remote mode only.
dataset_key (str): list algo linked or using this dataset. Remote mode only.
data_sample_key (List[str]): list algo linked or that used this data sample(s). Remote mode only.
Arguments:
filters (dict, optional)
: List of key-value pairs to filter on. Default None.
ascending (bool, optional)
: Sorts results by oldest creation_date first. Default False (descending order).
Returns:
models.Algo
: the returned object is described in the models.Algo model
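Putting the filter keys above together, a `filters` dict might be built as in this sketch. The name, owner, and metadata values are placeholders, not real assets.

```python
# Example filters dict for list_algo, using keys documented above.
# All values are placeholders.
filters = {
    "name": "mnist",          # partial name match (remote mode only)
    "owner": ["MyOrg1MSP"],   # placeholder organization identifier
    "metadata": {             # shape documented above: key / type / value
        "key": "version",
        "type": "is",
        "value": "2",
    },
}

# With a configured client:
# algos = client.list_algo(filters=filters, ascending=True)
```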
list_compute_plan¶
list_compute_plan(self, filters: dict = None, order_by: str = 'creation_date', ascending: bool = False) -> List[substra.sdk.models.ComputePlan]
List compute plans.
The `filters` argument is a dictionary, with those possible keys:
key (List[str]): list compute plans with listed keys.
name (str): list compute plans with name partially matching given string. Remote mode only.
owner (List[str]): list compute plans with listed owners.
worker (List[str]): list compute plans which ran on listed workers. Remote mode only.
status (List[str]): list compute plans with given status.
The possible values are the values of `substra.models.ComputePlanStatus`
metadata (dict)
{
"key": str # the key of the metadata to filter on
"type": "is", "contains" or "exists" # the type of query that will be used
"value": str # the value that the key must be (if type is "is") or contain (if type is "contains")
}: list compute plans matching provided conditions in metadata. Remote mode only.
algo_key (str): list compute plans that used the given algo. Remote mode only.
dataset_key (str): list compute plans linked or using this dataset. Remote mode only.
data_sample_key (List[str]): list compute plans linked or that used this data sample(s). Remote mode only.
Arguments:
filters (dict, optional)
: List of key-value pairs to filter on. Default None.
order_by (str, optional)
: Field to sort results by. Possible values: `creation_date`, `start_date`, `end_date`. Default creation_date.
ascending (bool, optional)
: Sorts results on order_by in ascending order. Default False (descending order).
Returns:
models.ComputePlan
: the returned object is described in the models.ComputePlan model
list_data_sample¶
list_data_sample(self, filters: dict = None, ascending: bool = False) -> List[substra.sdk.models.DataSample]
List data samples.
The `filters` argument is a dictionary, with those possible keys:
key (List[str]): list data samples with listed keys.
owner (List[str]): list data samples with listed owners.
compute_plan_key (str): list data samples that are in the given compute plan. Remote mode only.
algo_key (str): list data samples that used the given algo. Remote mode only.
dataset_key (str): list data samples linked or using this dataset. Remote mode only.
Arguments:
filters (dict, optional)
: List of key-value pairs to filter on. Default None.
ascending (bool, optional)
: Sorts results by oldest creation_date first. Default False (descending order).
Returns:
models.DataSample
: the returned object is described in the models.DataSample model
list_dataset¶
list_dataset(self, filters: dict = None, ascending: bool = False) -> List[substra.sdk.models.Dataset]
List datasets.
The `filters` argument is a dictionary, with those possible keys:
key (List[str]): list dataset with given keys.
name (str): list dataset with name partially matching given string. Remote mode only.
owner (List[str]): list dataset with given owners.
metadata (dict)
{
"key": str # the key of the metadata to filter on
"type": "is", "contains" or "exists" # the type of query that will be used
"value": str # the value that the key must be (if type is "is") or contain (if type is "contains")
}: list dataset matching provided conditions in metadata.
permissions (List[str]) : list dataset which can be used by any of the listed nodes. Remote mode only.
compute_plan_key (str): list dataset that are in the given compute plan. Remote mode only.
algo_key (str): list dataset that used the given algo. Remote mode only.
data_sample_key (List[str]): list dataset linked or that used this data sample(s). Remote mode only.
Arguments:
filters (dict, optional)
: List of key-value pairs to filter on. Default None.
ascending (bool, optional)
: Sorts results by oldest creation_date first. Default False (descending order).
Returns:
models.Dataset
: the returned object is described in the models.Dataset model
list_model¶
list_model(self, filters: dict = None, ascending: bool = False) -> List[substra.sdk.models.OutModel]
List models.
The `filters` argument is a dictionary, with those possible keys:
key (List[str]): list model with given keys.
compute_task_key (List[str]): list model produced by this compute task.
owner (List[str]): list model with given owners.
permissions (List[str]): list models which can be used by any of the listed nodes. Remote mode only.
Arguments:
filters (dict, optional)
: List of key-value pairs to filter on. Default None.
ascending (bool, optional)
: Sorts results by oldest creation_date first. Default False (descending order).
Returns:
models.OutModel
: the returned object is described in the models.OutModel model
list_organization¶
list_organization(self, *args, **kwargs) -> List[substra.sdk.models.Organization]
List organizations. The returned object is described in the models.Organization model.
list_task¶
list_task(self, filters: dict = None, order_by: str = 'creation_date', ascending: bool = False) -> List[substra.sdk.models.Task]
List tasks.
The `filters` argument is a dictionary, with those possible keys:
key (List[str]): list tasks with listed keys.
owner (List[str]): list tasks with listed owners.
worker (List[str]): list tasks which ran on listed workers. Remote mode only.
rank (List[int]): list tasks which are at given ranks.
status (List[str]): list tasks with given status.
The possible values are the values of `substra.models.Status`
metadata (dict)
{
"key": str # the key of the metadata to filter on
"type": "is", "contains" or "exists" # the type of query that will be used
"value": str # the value that the key must be (if type is "is") or contain (if type is "contains")
}: list tasks matching provided conditions in metadata. Remote mode only.
compute_plan_key (str): list tasks that are in the given compute plan. Remote mode only.
algo_key (str): list tasks that used the given algo. Remote mode only.
Arguments:
filters (dict, optional)
: List of key-value pairs to filter on. Default None.
order_by (str, optional)
: Field to sort results by. Possible values: `creation_date`, `start_date`, `end_date`. Default creation_date.
ascending (bool, optional)
: Sorts results on order_by in ascending order. Default False (descending order).
Returns:
models.Task
: the returned object is described in the models.Task model
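A combined `filters` dict for tasks might look like the following sketch. The status and worker values are placeholders; use the values of `substra.models.Status` for `status`.

```python
# Example filters dict for list_task, combining keys documented above.
# All values are placeholders.
filters = {
    "status": ["STATUS_DONE"],  # placeholder; use values of substra.models.Status
    "rank": [0],                # only tasks at rank 0
    "worker": ["MyOrg2MSP"],    # placeholder worker name (remote mode only)
}

# With a configured client, newest end date first:
# tasks = client.list_task(filters=filters, order_by="end_date", ascending=False)
```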
organization_info¶
organization_info(self) -> substra.sdk.models.OrganizationInfo
Get organization information.
retry_on_exception¶
retry_on_exception(exceptions, timeout=300)
Retry function in case of exception(s).
Arguments:
exceptions (list, required)
: list of exception types that trigger a retry
timeout (int, optional)
: timeout in seconds. Defaults to 300.
Examples:
from substra.sdk import exceptions, retry_on_exception

def my_function(arg1, arg2):
    pass

retry = retry_on_exception(
    exceptions=(exceptions.RequestTimeout,),
    timeout=300,
)
retry(my_function)(arg1, arg2)