Client

Client(url: Optional[str] = None, token: Optional[str] = None, retry_timeout: int = 300, insecure: bool = False, backend_type: substra.sdk.schemas.BackendType = <BackendType.REMOTE: 'remote'>)

Create a client

Arguments:

  • url (str, optional): URL of the Substra platform. Mandatory to connect to a Substra platform. If no URL is given, the client must use a local backend_type (docker or subprocess) and all assets are created locally. Defaults to None.

  • token (str, optional): Token to authenticate to the Substra platform. If no token is given, use the ‘login’ method to authenticate. Defaults to None.

  • retry_timeout (int, optional): Number of seconds before attempting a retry call in case of timeout. Defaults to 5 minutes.

  • insecure (bool, optional): If True, the client can call a not-certified backend. This is for development purposes. Defaults to False.

  • backend_type (schemas.BackendType, optional): Which mode to use. Possible values are remote, docker and subprocess. Defaults to remote. In remote mode, assets are registered on a deployed platform, which also executes the tasks. In subprocess or docker mode, if no URL is given, all assets are created locally and tasks are executed locally. If a URL is given, the mode is a hybrid one: new assets are created locally but can access assets from the deployed Substra platform; the platform is in read-only mode and tasks are executed locally.
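The mode rules above can be sketched in a few lines of plain Python (illustrative only — this helper is not part of the SDK):

```python
def resolve_mode(url, backend_type):
    """Illustrative sketch of the mode rules described above.

    backend_type is one of "remote", "docker", "subprocess".
    """
    if backend_type == "remote":
        if url is None:
            raise ValueError("remote mode requires a URL")
        return "remote"   # assets registered and tasks executed on the platform
    if url is None:
        return "local"    # everything created and executed locally
    return "hybrid"       # local assets, read-only access to the platform


# The three outcomes:
print(resolve_mode("https://org-1.example.org", "remote"))  # remote
print(resolve_mode(None, "subprocess"))                     # local
print(resolve_mode("https://org-1.example.org", "docker"))  # hybrid
```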

backend_mode

This is a property.
Get the backend mode: deployed (remote) or local, and, if local, which type of local mode.

    Returns:
        str: Backend mode

temp_directory

This is a property.
Temporary directory for storing assets in debug mode. Deleted when the client is deleted.

add_algo

add_algo(self, data: Union[dict, substra.sdk.schemas.AlgoSpec]) -> str

Create new algo asset.

Arguments:

  • data (Union[dict, schemas.AlgoSpec], required): If it is a dict, it must have the same keys as specified in schemas.AlgoSpec.

Returns:

  • str: Key of the algo

add_compute_plan

add_compute_plan(self, data: Union[dict, substra.sdk.schemas.ComputePlanSpec], auto_batching: bool = True, batch_size: int = 500) -> substra.sdk.models.ComputePlan

Create new compute plan asset.

Arguments:

  • data (Union[dict, schemas.ComputePlanSpec], required): If it is a dict, it must have the same keys as specified in schemas.ComputePlanSpec.

  • auto_batching (bool, optional): Set ‘auto_batching’ to False to upload all the tuples of the compute plan at once. Defaults to True.

  • batch_size (int, optional): If ‘auto_batching’ is True, change batch_size to define the number of tuples uploaded in each batch (default 500).

Returns:

  • models.ComputePlan: Created compute plan
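Auto-batching amounts to chunking the plan's tuples into slices of batch_size before upload. A sketch of the idea (not the SDK's actual upload code):

```python
def chunk(items, batch_size=500):
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


tuples = list(range(1200))          # stand-in for compute plan tuples
batches = list(chunk(tuples, 500))
print([len(b) for b in batches])    # [500, 500, 200]
```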

add_compute_plan_tuples

add_compute_plan_tuples(self, key: str, tuples: Union[dict, substra.sdk.schemas.UpdateComputePlanTuplesSpec], auto_batching: bool = True, batch_size: int = 500) -> substra.sdk.models.ComputePlan

Update compute plan.

Arguments:

  • key (str, required): Compute plan key

  • tuples (Union[dict, schemas.UpdateComputePlanTuplesSpec], required): If it is a dict, it must have the same keys as specified in schemas.UpdateComputePlanTuplesSpec.

  • auto_batching (bool, optional): Set ‘auto_batching’ to False to upload all the tuples of the compute plan at once. Defaults to True.

  • batch_size (int, optional): If ‘auto_batching’ is True, change batch_size to define the number of tuples uploaded in each batch (default 500).

Returns:

  • models.ComputePlan: The updated compute plan

add_data_sample

add_data_sample(self, data: Union[dict, substra.sdk.schemas.DataSampleSpec], local: bool = True) -> str

Create a new data sample asset and return its key.

Arguments:

  • data (Union[dict, schemas.DataSampleSpec], required): data sample to add. If it is a dict, it must follow the DataSampleSpec schema.

  • local (bool, optional): If local is True, path must refer to a directory located on the local filesystem. The file content will be transferred to the server through an HTTP query, so this mode should be used for relatively small files (< 10 MB).

If local is False, path must refer to a directory located on the server filesystem. This directory must be accessible (readable) by the server. This mode is well suited for all file sizes. Defaults to True.

Returns:

  • str: key of the data sample

add_data_samples

add_data_samples(self, data: Union[dict, substra.sdk.schemas.DataSampleSpec], local: bool = True) -> List[str]

Create many data sample assets and return a list of keys. All data samples are created through a single HTTP request, so this method is well suited for adding multiple small files only. For large amounts of data, it is recommended to add samples one by one, as this allows better control in case of failure.

Arguments:

  • data (Union[dict, schemas.DataSampleSpec], required): data samples to add. If it is a dict, it must follow the DataSampleSpec schema. The paths in the data dictionary must be a list of paths where each path points to a directory representing one data sample.

  • local (bool, optional): Please refer to the method Client.add_data_sample. Defaults to True.

Returns:

  • List[str]: List of the data sample keys
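For illustration, a data dict might look like the following. The field names follow the DataSampleSpec layout described above, but treat them and all paths and keys as placeholders:

```python
# Hypothetical dict following the DataSampleSpec layout:
# one path per data sample, each path pointing at a directory.
data = {
    "paths": [
        "/data/sample_0",
        "/data/sample_1",
    ],
    "data_manager_keys": ["<dataset-key>"],  # placeholder key
}

# keys = client.add_data_samples(data)  # would return one key per path
print(len(data["paths"]))  # 2
```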

add_dataset

add_dataset(self, data: Union[dict, substra.sdk.schemas.DatasetSpec])

Create new dataset asset and return its key.

Arguments:

  • data (Union[dict, schemas.DatasetSpec], required): If it is a dict, it must have the same keys as specified in schemas.DatasetSpec.

Returns:

  • str: Key of the dataset

add_task

add_task(self, data: Union[dict, substra.sdk.schemas.TaskSpec]) -> str

Create new task asset.

Arguments:

  • data (Union[dict, schemas.TaskSpec], required): If it is a dict, it must have the same keys as specified in schemas.TaskSpec.

Returns:

  • str: Key of the asset

cancel_compute_plan

cancel_compute_plan(self, key: str) -> None

Cancel execution of a compute plan. Nothing is returned by this method.

describe_algo

describe_algo(self, key: str) -> str

Get algo description.

describe_dataset

describe_dataset(self, key: str) -> str

Get dataset description.

download_algo

download_algo(self, key: str, destination_folder: str) -> None

Download an algo resource: the algo package is saved in the destination folder.

Arguments:

  • key (str, required): Algo key to download

  • destination_folder (str, required): Destination folder

Returns:

  • pathlib.Path: Path of the downloaded algo

download_dataset

download_dataset(self, key: str, destination_folder: str) -> None

Download a dataset resource: the opener script is saved in the destination folder.

Arguments:

  • key (str, required): Dataset key to download

  • destination_folder (str, required): Destination folder

Returns:

  • pathlib.Path: Path of the downloaded dataset

download_logs

download_logs(self, tuple_key: str, folder: str) -> str

Download the execution logs of a failed tuple to a destination file. The logs are saved in the folder to a file named ‘tuple_logs_{tuple_key}.txt’.

Logs are only available for tuples that experienced an execution failure. Attempting to retrieve logs for tuples in any other state, or for non-existing tuples, will result in a NotFound error.

Arguments:

  • tuple_key (str, required): the key of the tuple that produced the logs

  • folder (str, required): the destination directory

Returns:

  • str: The logs as a str
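The destination filename follows the pattern quoted above; a quick sketch of where the file ends up:

```python
import pathlib


def logs_destination(folder, tuple_key):
    """Path where the logs are saved, per the naming rule above."""
    return pathlib.Path(folder) / f"tuple_logs_{tuple_key}.txt"


print(logs_destination("/tmp/logs", "abc123"))  # e.g. /tmp/logs/tuple_logs_abc123.txt
```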

download_model

download_model(self, key: str, destination_folder) -> None

Download model to destination file. This model was saved using the ‘save_model’ function of the algorithm. To load and use the model, please refer to the ‘load_model’ and ‘predict’ functions of the algorithm.

Arguments:

  • key (str, required): Model key to download

  • destination_folder (str, required): Destination folder

Returns:

  • pathlib.Path: Path of the downloaded model

download_model_from_task

download_model_from_task(self, task_key: str, identifier: str, folder: os.PathLike) -> pathlib.Path

Download task model to destination file. This model was saved using the ‘save_model’ function of the algorithm. To load and use the model, please refer to the ‘load_model’ and ‘predict’ functions of the algorithm.

Arguments:

  • task_key (str, required): Task key to download

  • identifier (str, required): output identifier

  • folder (os.PathLike, required): Destination folder

Returns:

  • pathlib.Path: Path of the downloaded model

from_config_file

from_config_file(profile_name: str = 'default', config_path: Union[str, pathlib.Path] = '~/.substra', tokens_path: Union[str, pathlib.Path] = '~/.substra-tokens', token: Optional[str] = None, retry_timeout: int = 300, backend_type: substra.sdk.schemas.BackendType = <BackendType.REMOTE: 'remote'>)

Returns a new Client configured with profile data from configuration files.

Arguments:

  • profile_name (str, optional): Name of the profile to load. Defaults to ‘default’.

  • config_path (Union[str, pathlib.Path], optional): Path to the configuration file. Defaults to ‘~/.substra’.

  • tokens_path (Union[str, pathlib.Path], optional): Path to the tokens file. Defaults to ‘~/.substra-tokens’.

  • token (str, optional): Token to use for authentication (will be used instead of any token found at tokens_path). Defaults to None.

  • retry_timeout (int, optional): Number of seconds before attempting a retry call in case of timeout. Defaults to 5 minutes.

  • backend_type (schemas.BackendType, optional): Which mode to use. Defaults to remote. In remote mode, assets are registered on a deployed platform, which also executes the tasks. In local mode (subprocess or docker), if no URL is given, all assets are created locally and tasks are executed locally; if a URL is given, the mode is a hybrid one: new assets are created locally but can access assets from the deployed Substra platform, which is in read-only mode, while tasks are executed locally. The local mode is either docker or subprocess: docker to execute the tasks in containers (default), or subprocess to execute them in Python subprocesses (faster, experimental: the Dockerfile commands are not executed and dependencies must be installed locally).

Returns:

  • Client: The new client.
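The configuration file maps profile names to connection settings. A hypothetical example of that layout, written and read back with the standard library — the exact schema may differ between Substra releases, so treat the field names as assumptions:

```python
import json
import pathlib
import tempfile

# Hypothetical profile file layout: profile name -> connection settings.
config = {"default": {"url": "https://org-1.example.org", "insecure": False}}

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / ".substra"
    path.write_text(json.dumps(config))

    loaded = json.loads(path.read_text())
    print(loaded["default"]["url"])  # https://org-1.example.org
```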

get_algo

get_algo(self, key: str) -> substra.sdk.models.Algo

Get an algo by key. The returned object is described in the models.Algo model.

get_compute_plan

get_compute_plan(self, key: str) -> substra.sdk.models.ComputePlan

Get a compute plan by key. The returned object is described in the models.ComputePlan model.

get_data_sample

get_data_sample(self, key: str) -> substra.sdk.models.DataSample

Get a data sample by key. The returned object is described in the models.DataSample model.

get_dataset

get_dataset(self, key: str) -> substra.sdk.models.Dataset

Get a dataset by key. The returned object is described in the models.Dataset model.

get_logs

get_logs(self, tuple_key: str) -> str

Get tuple logs by tuple key. The returned object is a string containing the logs.

Logs are only available for tuples that experienced an execution failure. Attempting to retrieve logs for tuples in any other state, or for non-existing tuples, will result in a NotFound error.

get_model

get_model(self, key: str) -> substra.sdk.models.OutModel

Get a model by key. The returned object is described in the models.OutModel model.

get_performances

get_performances(self, key: str) -> substra.sdk.models.Performances

Get the compute plan performances by key. The returned object is described in models.Performances and is easily convertible to a pandas DataFrame.

Examples:

import pandas as pd

perf = client.get_performances(cp_key)
df = pd.DataFrame(perf.dict())
print(df)

get_task

get_task(self, key: str) -> substra.sdk.models.Task

Get a task by key. The returned object is described in the models.Task model.

list_algo

list_algo(self, filters: dict = None, ascending: bool = False) -> List[substra.sdk.models.Algo]

List algos. The filters argument is a dictionary, with these possible keys:

    key (List[str]): list algos with the given keys.
    name (str): list algos whose name partially matches the given string. Remote mode only.
    owner (List[str]): list algos with the given owners.
    metadata (dict): list algos matching the provided conditions in metadata. The dict has the form
        {
            "key": str     # the key of the metadata to filter on
            "type": str    # "is", "contains" or "exists": the type of query that will be used
            "value": str   # the value that the key must be (if type is "is") or contain (if type is "contains")
        }
    permissions (List[str]): list algos which can be used by any of the listed nodes. Remote mode only.
    compute_plan_key (str): list algos that are in the given compute plan. Remote mode only.
    dataset_key (str): list algos linked to or using this dataset. Remote mode only.
    data_sample_key (List[str]): list algos linked to or that used the given data sample(s). Remote mode only.

Arguments:

  • filters (dict, optional): List of key values pair to filter on. Default None.

  • ascending (bool, optional): Sorts results by oldest creation_date first. Default False (descending order).

Returns:

  • models.Algo: the returned object is described in the models.Algo model
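Putting the filter keys together, a filters dict for list_algo might look like this — all values are illustrative placeholders:

```python
# Illustrative filters dict combining several of the keys described above.
filters = {
    "name": "my-algo",            # partial name match (remote mode only)
    "owner": ["MyOrg1MSP"],       # placeholder owner identifier
    "metadata": {
        "key": "epochs",          # filter on a hypothetical metadata entry
        "type": "is",
        "value": "10",
    },
}

# algos = client.list_algo(filters=filters)
print(sorted(filters))  # ['metadata', 'name', 'owner']
```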

list_compute_plan

list_compute_plan(self, filters: dict = None, order_by: str = 'creation_date', ascending: bool = False) -> List[substra.sdk.models.ComputePlan]

List compute plans. The filters argument is a dictionary, with these possible keys:

    key (List[str]): list compute plans with the given keys.
    name (str): list compute plans whose name partially matches the given string. Remote mode only.
    owner (List[str]): list compute plans with the given owners.
    worker (List[str]): list compute plans which ran on the listed workers. Remote mode only.
    status (List[str]): list compute plans with the given status. The possible values are the values of `substra.models.ComputePlanStatus`.
    metadata (dict): list compute plans matching the provided conditions in metadata. Remote mode only. The dict has the form
        {
            "key": str     # the key of the metadata to filter on
            "type": str    # "is", "contains" or "exists": the type of query that will be used
            "value": str   # the value that the key must be (if type is "is") or contain (if type is "contains")
        }
    algo_key (str): list compute plans that used the given algo. Remote mode only.
    dataset_key (str): list compute plans linked to or using this dataset. Remote mode only.
    data_sample_key (List[str]): list compute plans linked to or that used the given data sample(s). Remote mode only.

Arguments:

  • filters (dict, optional): List of key values pair to filter on. Default None.

  • order_by (str, optional): Field to sort results by. Possible values: creation_date, start_date, end_date. Default creation_date.

  • ascending (bool, optional): Sorts results on order_by by ascending order. Default False (descending order).

Returns:

  • models.ComputePlan: the returned object is described in the models.ComputePlan model

list_data_sample

list_data_sample(self, filters: dict = None, ascending: bool = False) -> List[substra.sdk.models.DataSample]

List data samples. The filters argument is a dictionary, with those possible keys:

    key (List[str]): list data samples with listed keys.
    owner (List[str]): list data samples with listed owners.
    compute_plan_key (str): list data samples that are in the given compute plan. Remote mode only.
    algo_key (str): list data samples that used the given algo. Remote mode only.
    dataset_key (str): list data samples linked or using this dataset. Remote mode only.

Arguments:

  • filters (dict, optional): List of key values pair to filter on. Default None.

  • ascending (bool, optional): Sorts results by oldest creation_date first. Default False (descending order).

Returns:

  • models.DataSample: the returned object is described in the models.DataSample model

list_dataset

list_dataset(self, filters: dict = None, ascending: bool = False) -> List[substra.sdk.models.Dataset]

List datasets. The filters argument is a dictionary, with these possible keys:

    key (List[str]): list datasets with the given keys.
    name (str): list datasets whose name partially matches the given string. Remote mode only.
    owner (List[str]): list datasets with the given owners.
    metadata (dict): list datasets matching the provided conditions in metadata. The dict has the form
        {
            "key": str     # the key of the metadata to filter on
            "type": str    # "is", "contains" or "exists": the type of query that will be used
            "value": str   # the value that the key must be (if type is "is") or contain (if type is "contains")
        }
    permissions (List[str]): list datasets which can be used by any of the listed nodes. Remote mode only.
    compute_plan_key (str): list datasets that are in the given compute plan. Remote mode only.
    algo_key (str): list datasets that used the given algo. Remote mode only.
    data_sample_key (List[str]): list datasets linked to or that used the given data sample(s). Remote mode only.

Arguments:

  • filters (dict, optional): List of key values pair to filter on. Default None.

  • ascending (bool, optional): Sorts results by oldest creation_date first. Default False (descending order).

Returns:

  • models.Dataset: the returned object is described in the models.Dataset model

list_model

list_model(self, filters: dict = None, ascending: bool = False) -> List[substra.sdk.models.OutModel]

List models. The filters argument is a dictionary, with these possible keys:

    key (List[str]): list models with the given keys.
    compute_task_key (List[str]): list models produced by the given compute tasks.
    owner (List[str]): list models with the given owners.
    permissions (List[str]): list models which can be used by any of the listed nodes. Remote mode only.

Arguments:

  • filters (dict, optional): List of key values pair to filter on. Default None.

  • ascending (bool, optional): Sorts results by oldest creation_date first. Default False (descending order).

Returns:

  • models.OutModel: the returned object is described in the models.OutModel model

list_organization

list_organization(self, *args, **kwargs) -> List[substra.sdk.models.Organization]

List organizations. The returned object is described in the models.Organization model.

list_task

list_task(self, filters: dict = None, order_by: str = 'creation_date', ascending: bool = False) -> List[substra.sdk.models.Task]

List tasks. The filters argument is a dictionary, with these possible keys:

    key (List[str]): list tasks with the given keys.
    owner (List[str]): list tasks with the given owners.
    worker (List[str]): list tasks which ran on the listed workers. Remote mode only.
    rank (List[int]): list tasks which are at the given ranks.
    status (List[str]): list tasks with the given status. The possible values are the values of `substra.models.Status`.
    metadata (dict): list tasks matching the provided conditions in metadata. Remote mode only. The dict has the form
        {
            "key": str     # the key of the metadata to filter on
            "type": str    # "is", "contains" or "exists": the type of query that will be used
            "value": str   # the value that the key must be (if type is "is") or contain (if type is "contains")
        }
    compute_plan_key (str): list tasks that are in the given compute plan. Remote mode only.
    algo_key (str): list tasks that used the given algo. Remote mode only.

Arguments:

  • filters (dict, optional): List of key values pair to filter on. Default None.

  • order_by (str, optional): Field to sort results by. Possible values: creation_date, start_date, end_date. Default creation_date.

  • ascending (bool, optional): Sorts results on order_by by ascending order. Default False (descending order).

Returns:

  • models.Task: the returned object is described in the models.Task model

login

login(self, username, password)

Login to a remote server.

organization_info

organization_info(self) -> substra.sdk.models.OrganizationInfo

Get organization information.

update_algo

update_algo(self, key: str, name: str)

Update the name of an existing algo.

update_compute_plan

update_compute_plan(self, key: str, name: str)

Update the name of an existing compute plan.

update_dataset

update_dataset(self, key: str, name: str)

Update the name of an existing dataset.

retry_on_exception

retry_on_exception(exceptions, timeout=300)

Retry function in case of exception(s).

Arguments:

  • exceptions (list, required): list of exception types that trigger a retry

  • timeout (int, optional): timeout in seconds

Examples:

from substra.sdk import exceptions, retry_on_exception

def my_function(arg1, arg2):
    pass

retry = retry_on_exception(
    exceptions=(exceptions.RequestTimeout,),
    timeout=300,
)
retry(my_function)(arg1, arg2)
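For intuition, the retry pattern can be sketched in plain Python. This simplified stand-in retries a fixed number of attempts rather than honoring a timeout like the real retry_on_exception:

```python
import time


def retry_on(exceptions, attempts=3, delay=0.0):
    """Minimal retry decorator sketch: re-run fn on the given exceptions."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of attempts: re-raise the last error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"n": 0}

@retry_on((ValueError,), attempts=3)
def flaky():
    """Fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("transient failure")
    return "ok"

print(flaky())     # ok (after two retried failures)
print(calls["n"])  # 3
```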