mlcompare.data.DatasetFactory

class mlcompare.data.DatasetFactory(params_list)

Creates Dataset objects (such as LocalDataset or KaggleDataset) from a list of parameter dictionaries.

Attributes:

params_list (list[dict[str, Any]] | Path): List of dictionaries containing dataset parameters, or a path to a .json file containing such a list. The keys required in each dictionary are listed below:

Required keys for all dataset types:

dataset_type (Literal["kaggle", "local"]): Type of dataset. Accepts ‘kaggle’ or ‘local’.

target (str): Name of the target column in the dataset.

Additional required keys for ‘local’ datasets:

file_path (str | Path): Path to the local dataset file. It can be relative or absolute.

Additional required keys for ‘kaggle’ datasets:

user (str): Kaggle username of the dataset owner.

dataset (str): Name of the Kaggle dataset.

file (str): Name of the file to download from the dataset.

Optional keys:

save_name (str): Name to use for files saved from this dataset. Should be unique across datasets.

drop (list[str]): List of column names to drop from the downloaded data.

one_hot_encode (list[str]): List of column names to one-hot encode.
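The key requirements above can be illustrated with a minimal sketch. The helper `missing_keys` and all column, file, and account names are hypothetical and not part of mlcompare; the helper only mirrors the documented key lists and is not the library's own validation.

```python
from pathlib import Path
from typing import Any

# Required keys, copied from the lists above.
COMMON_KEYS = {"dataset_type", "target"}
EXTRA_KEYS = {
    "local": {"file_path"},
    "kaggle": {"user", "dataset", "file"},
}


def missing_keys(params: dict[str, Any]) -> set[str]:
    """Hypothetical helper (not part of mlcompare): required keys absent from params."""
    required = COMMON_KEYS | EXTRA_KEYS.get(params.get("dataset_type"), set())
    return required - set(params)


# One dictionary per dataset; all values are placeholders.
params_list = [
    {
        "dataset_type": "local",
        "target": "price",
        "file_path": Path("data") / "houses.csv",
        "drop": ["id"],  # optional
    },
    {
        "dataset_type": "kaggle",
        "target": "label",
        "user": "some-user",
        "dataset": "some-dataset",
        "file": "train.csv",
        "save_name": "kaggle_train",  # optional
        "one_hot_encode": ["color"],  # optional
    },
]

for params in params_list:
    assert not missing_keys(params)

# With mlcompare installed, the list would then be passed to the factory:
# from mlcompare.data import DatasetFactory
# factory = DatasetFactory(params_list)
```

Checking keys before constructing the factory is optional; it simply surfaces a missing key with a clearer message than a failure later in the pipeline might.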

Raises:

AssertionError: If params_list is not a list of dictionaries or a path to a .json file containing one.
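A rough sketch of the kind of input check that would produce the documented AssertionError is shown below. The function `load_params` is a hypothetical stand-in written for illustration; it is an assumption about the behavior described here, not mlcompare's actual code.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


def load_params(params_list: Union[list, Path]) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the documented check; not mlcompare's code."""
    if isinstance(params_list, Path):
        assert params_list.suffix == ".json", "expected a path to a .json file"
        with params_list.open() as f:
            params_list = json.load(f)
    assert isinstance(params_list, list) and all(
        isinstance(p, dict) for p in params_list
    ), "params_list must be a list of dictionaries"
    return params_list


# A .json file holding a list of dictionaries is accepted.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "datasets.json"
    json_path.write_text(
        json.dumps([{"dataset_type": "local", "target": "y", "file_path": "data.csv"}])
    )
    loaded = load_params(json_path)

# A bare dictionary (not wrapped in a list) is rejected.
try:
    load_params({"dataset_type": "local"})
except AssertionError as err:
    error_message = str(err)
```

In this sketch a single dataset must still be wrapped in a list, which matches the "list of dictionaries" wording above.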

Parameters:

params_list (ParamsInput)