mlcompare.data.DatasetFactory
- class mlcompare.data.DatasetFactory(params_list)
Creates Dataset objects such as LocalDataset, KaggleDataset, etc. from a list of dictionaries.
Attributes:
params_list (list[dict[str, Any]] | Path): List of dictionaries containing dataset parameters, or a path to a .json file containing such a list. The keys required in each dictionary are listed below:
- Required keys for all dataset types:
dataset_type (Literal["kaggle", "local"]): Type of dataset. Accepts 'kaggle' or 'local'.
target (str): Name of the target column in the dataset.
- Additional required keys for ‘local’ datasets:
file_path (str | Path): Path to the local dataset file. It can be relative or absolute.
- Additional required keys for ‘kaggle’ datasets:
user (str): Kaggle username of the dataset owner.
dataset (str): Name of the Kaggle dataset.
file (str): Name of the file to download from the dataset.
- Optional Keys:
save_name (str): Name to use for files saved from this dataset. Should be unique across datasets.
drop (list[str]): List of column names to drop from the downloaded data.
one_hot_encode (list[str]): List of column names to one-hot encode.
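As a rough illustration of the keys listed above, the following sketch builds a params_list with one 'local' and one 'kaggle' entry. The file path, Kaggle user, dataset name, file name, and column names are made-up placeholders, and missing_keys is a hypothetical helper written for this example only, not part of mlcompare; it simply mirrors the required keys documented here.

```python
from typing import Any

# Required keys as documented above; these constants are illustrative only.
COMMON_KEYS = {"dataset_type", "target"}
EXTRA_KEYS = {
    "local": {"file_path"},
    "kaggle": {"user", "dataset", "file"},
}


def missing_keys(params: dict[str, Any]) -> set[str]:
    """Return the documented required keys that are absent from params."""
    required = COMMON_KEYS | EXTRA_KEYS.get(params.get("dataset_type"), set())
    return required - set(params)


# Placeholder values throughout; replace them with real paths and names.
params_list = [
    {
        "dataset_type": "local",
        "file_path": "data/housing.csv",
        "target": "price",
        "save_name": "housing",
        "drop": ["id"],
    },
    {
        "dataset_type": "kaggle",
        "user": "some_user",
        "dataset": "some-dataset",
        "file": "train.csv",
        "target": "label",
        "one_hot_encode": ["category"],
    },
]

for params in params_list:
    print(params["dataset_type"], "missing:", sorted(missing_keys(params)))

# With mlcompare installed, the list would then be passed to the constructor:
# from mlcompare.data import DatasetFactory
# factory = DatasetFactory(params_list)
```

Checking the keys before constructing the factory is optional; it is shown here only to make the per-type requirements concrete.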
Raises:
AssertionError: If params_list is not a list of dictionaries or a path to a .json file containing one.
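The condition described under Raises can be pictured with a small stand-in check. This is a sketch of the kind of validation the description implies, not mlcompare's actual code: check_params_input, the file name datasets.json, and the example values are all hypothetical. It also shows one way to store the parameters in a .json file so that a path can be supplied instead of a list.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def check_params_input(params_input: Any) -> list[dict[str, Any]]:
    """Hypothetical check: accept a list of dicts or a Path to a .json file holding one."""
    if isinstance(params_input, Path):
        assert params_input.suffix == ".json", "expected a path to a .json file"
        params_input = json.loads(params_input.read_text())
    assert isinstance(params_input, list), "expected a list of dictionaries"
    assert all(isinstance(p, dict) for p in params_input), "every entry must be a dictionary"
    return params_input


# Placeholder parameters for a single local dataset.
example = [{"dataset_type": "local", "file_path": "data/housing.csv", "target": "price"}]

# Write the list to a temporary .json file and read it back through the check.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "datasets.json"
    json_path.write_text(json.dumps(example, indent=2))
    loaded = check_params_input(json_path)

# A bare dictionary (not wrapped in a list) fails the check.
try:
    check_params_input({"dataset_type": "local"})
    rejected = False
except AssertionError:
    rejected = True
```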
- Parameters:
params_list (ParamsInput)