autoprognosis.plugins.imputers.plugin_sinkhorn module

class SinkhornPlugin(random_state: int = 0, **kwargs: Any)

Bases: ImputerPlugin

Sinkhorn imputation can be used to impute quantitative data. It relies on the idea that two batches drawn at random from the same dataset should share the same distribution, and it fills in missing values by minimizing optimal transport distances between such batches.
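As a rough illustration of the quantity being minimized, the sketch below computes an entropy-regularized optimal transport cost between two small batches with log-domain Sinkhorn iterations, then debiases it into a Sinkhorn divergence. It is a pure-Python sketch under stated assumptions, not the plugin's implementation: the helper names (`entropic_ot`, `sinkhorn_divergence`), the half squared Euclidean cost, the uniform batch weights, and the fixed iteration count are all choices made here for illustration.

```python
import math


def _logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def entropic_ot(xs, ys, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two uniformly weighted batches.

    Illustrative helper (hypothetical name). Returns the dual objective
    after `n_iters` rounds of log-domain Sinkhorn updates.
    """
    n, m = len(xs), len(ys)
    # Assumed ground cost: half the squared Euclidean distance.
    cost = [
        [0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys]
        for x in xs
    ]
    log_a, log_b = -math.log(n), -math.log(m)  # uniform weights
    f = [0.0] * n  # dual potential on xs
    g = [0.0] * m  # dual potential on ys
    for _ in range(n_iters):
        # Alternating soft-min updates of the two potentials.
        f = [
            -eps * _logsumexp([log_b + (g[j] - cost[i][j]) / eps for j in range(m)])
            for i in range(n)
        ]
        g = [
            -eps * _logsumexp([log_a + (f[i] - cost[i][j]) / eps for i in range(n)])
            for j in range(m)
        ]
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(xs, ys, eps=0.1, n_iters=200):
    """Debiased divergence: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    return (
        entropic_ot(xs, ys, eps, n_iters)
        - 0.5 * entropic_ot(xs, xs, eps, n_iters)
        - 0.5 * entropic_ot(ys, ys, eps, n_iters)
    )


# Toy batches: `near` resembles `base`, `far` is shifted away from it.
base = [[0.0], [1.0], [2.0]]
near = [[0.1], [1.1], [2.1]]
far = [[5.0], [6.0], [7.0]]
d_near = sinkhorn_divergence(base, near)
d_far = sinkhorn_divergence(base, far)
```

Batches that look alike give a divergence close to zero, while the shifted batch gives a much larger one. Imputation then adjusts the missing entries so that randomly drawn batches look alike in this sense.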

Args:

eps: float, default=0.01
Sinkhorn regularization parameter.

lr: float, default=0.01
Learning rate.

opt: torch.optim.Optimizer, default=torch.optim.Adam
Optimizer class to use for fitting.

n_epochs: int, default=15
Number of gradient updates for each model within a cycle.

batch_size: int, default=256
Size of the batches on which the Sinkhorn divergence is evaluated.

n_pairs: int, default=10
Number of batch pairs used per gradient update.

noise: float, default=0.1
Noise level used to initialize the missing values.

scaling: float, default=0.9
Scaling parameter in Sinkhorn iterations.
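To make the roles of `noise`, `batch_size`, and `n_pairs` more concrete, here is a small standard-library sketch of the two bookkeeping steps they suggest: filling missing entries with a noisy starting value, and drawing pairs of row batches for one gradient update. The function names, the per-column mean as the starting point, and the capping of the batch size at the number of rows are assumptions made for this illustration; the plugin's actual internals may differ.

```python
import math
import random


def init_missing(column, noise=0.1, rng=random):
    """Replace NaNs in one column with the observed mean plus Gaussian noise.

    Hypothetical helper. Assumes the column has at least one observed value.
    """
    observed = [v for v in column if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [
        mean + noise * rng.gauss(0.0, 1.0) if math.isnan(v) else v
        for v in column
    ]


def sample_batch_pairs(n_rows, batch_size=256, n_pairs=10, rng=random):
    """Draw `n_pairs` pairs of row-index batches, each without replacement.

    Hypothetical helper. The batch size is capped at `n_rows` so that
    sampling without replacement is always possible.
    """
    size = min(batch_size, n_rows)
    return [
        (rng.sample(range(n_rows), size), rng.sample(range(n_rows), size))
        for _ in range(n_pairs)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
filled = init_missing([1.0, float("nan"), 3.0], noise=0.1, rng=rng)
pairs = sample_batch_pairs(n_rows=20, batch_size=8, n_pairs=3, rng=rng)
```

In a full imputer, a divergence between the two batches of each pair would be summed over the `n_pairs` pairs, and the optimizer would then update only the entries that were originally missing.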

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("sinkhorn")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0         1         2         3
0  1.000000  1.000000  1.000000  1.000000
1  1.404637  1.651113  1.651093  1.404638
2  1.000000  2.000000  2.000000  1.000000
3  2.000000  2.000000  2.000000  2.000000

Reference: “Missing Data Imputation using Optimal Transport”, Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi. Original code: https://github.com/BorisMuzellec/MissingDataOT

change_output(output: str) → None

fit(X: DataFrame, *args: Any, **kwargs: Any) → Plugin

Train the plugin

Parameters: X – pd.DataFrame

fit_predict(X: DataFrame, *args: Any, **kwargs: Any) → DataFrame: Fit the model and predict the training data. Used by predictors.

fit_transform(X: DataFrame, *args: Any, **kwargs: Any) → DataFrame: Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() → str: The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) → List[Params]: The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) → List[Params]: The hyperparameter domain, using their fully-qualified names.

is_fitted() → bool: Check if the model was trained

classmethod load(buff: bytes) → ImputerPlugin: Load the plugin from bytes

static name() → str: The name of the plugin, e.g.: xgboost

predict(X: DataFrame, *args: Any, **kwargs: Any) → DataFrame

Run predictions for the input. Used by predictors.

Parameters: X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) → Dict[str, Any]: Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) → Dict[str, Any]: Sample hyperparameters using their fully-qualified names.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) → Dict[str, Any]: Sample hyperparameters as a dict.

save() → bytes: Save the plugin to bytes

static subtype() → str: The subtype of the plugin, e.g.: classifier

transform(X: DataFrame) → DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters: X – pd.DataFrame

static type() → str: The type of the plugin, e.g.: prediction

plugin: alias of SinkhornPlugin