autoprognosis.plugins.imputers.plugin_sinkhorn module

class SinkhornPlugin(random_state: int = 0, **kwargs: Any)

Bases: ImputerPlugin

Sinkhorn imputation fills in missing values in quantitative data. It relies on the idea that two batches drawn at random from the same dataset should share the same distribution, and it imputes the missing entries by minimizing optimal transport distances between such batches.
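The quantity being minimized can be illustrated with a small, standard-library-only sketch of an entropy-regularized optimal transport cost and the debiased Sinkhorn divergence built from it. This is not the plugin's implementation: the helper names (entropic_ot, sinkhorn_divergence), the uniform batch weights, the half squared Euclidean cost, and the fixed iteration count are all assumptions made for illustration. Only eps mirrors the regularization parameter documented for the plugin.

```python
import math


def _logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a non-empty list.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def entropic_ot(xs, ys, eps=0.01, n_iters=200):
    """Dual value of entropy-regularized OT between two uniformly weighted batches.

    Hypothetical helper: uses log-domain Sinkhorn updates on the dual potentials.
    """
    n, m = len(xs), len(ys)
    log_a, log_b = -math.log(n), -math.log(m)
    # Assumed ground cost: half the squared Euclidean distance.
    cost = [[0.5 * sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iters):
        # Alternate the two dual updates (one Sinkhorn iteration).
        f = [-eps * _logsumexp([log_b + (g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [-eps * _logsumexp([log_a + (f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(xs, ys, eps=0.01, n_iters=200):
    """Debiased divergence: OT(x, y) - OT(x, x) / 2 - OT(y, y) / 2."""
    return (entropic_ot(xs, ys, eps, n_iters)
            - 0.5 * entropic_ot(xs, xs, eps, n_iters)
            - 0.5 * entropic_ot(ys, ys, eps, n_iters))


batch = [[0.0, 0.0], [1.0, 0.0]]
shifted = [[0.1, 0.0], [1.1, 0.0]]
# A batch compared with itself has zero divergence; a shifted copy does not.
print(sinkhorn_divergence(batch, batch), sinkhorn_divergence(batch, shifted))
```

In an imputation setting, the imputed entries would be treated as free parameters and updated by gradient steps so that this divergence between random batch pairs decreases.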

Args:
eps: float, default=0.01

Sinkhorn regularization parameter.

lr: float, default=0.01

Learning rate.

opt: torch.optim.Optimizer, default=torch.optim.Adam

Optimizer class to use for fitting.

n_epochs: int, default=15

Number of gradient updates for each model within a cycle.

batch_size: int, default=256

Size of the batches on which the Sinkhorn divergence is evaluated.

n_pairs: int, default=10

Number of batch pairs used per gradient update.

noise: float, default=0.1

Noise used for the missing values initialization.

scaling: float, default=0.9

Scaling parameter in Sinkhorn iterations.
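To make the role of the noise argument concrete, here is a minimal sketch of one plausible initialization of the missing values: each missing entry starts at the mean of the observed values in its column plus Gaussian noise scaled by noise. The function name init_missing, the use of the column mean, and the seed argument are assumptions for illustration, not the plugin's confirmed behaviour.

```python
import math
import random


def init_missing(rows, noise=0.1, seed=0):
    """Fill NaN entries with column mean + noise * N(0, 1) (hypothetical helper).

    Assumes every column has at least one observed (non-NaN) value.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    # Mean of the observed values in each column.
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        means.append(sum(observed) / len(observed))
    # Observed entries are kept; missing entries get a noisy starting value.
    return [
        [means[j] + noise * rng.gauss(0.0, 1.0) if math.isnan(v) else v
         for j, v in enumerate(r)]
        for r in rows
    ]


nan = float("nan")
X = [[1, 1, 1, 1], [nan, nan, nan, nan], [1, 2, 2, 1], [2, 2, 2, 2]]
print(init_missing(X, noise=0.1))
```

The noise keeps rows with identical missingness patterns from starting at exactly the same point; the gradient updates then move these starting values.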

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("sinkhorn")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0         1         2         3
0  1.000000  1.000000  1.000000  1.000000
1  1.404637  1.651113  1.651093  1.404638
2  1.000000  2.000000  2.000000  1.000000
3  2.000000  2.000000  2.000000  2.000000

Reference: “Missing Data Imputation using Optimal Transport”, Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

Original code: https://github.com/BorisMuzellec/MissingDataOT

change_output(output: str) None
fit(X: DataFrame, *args: Any, **kwargs: Any) Plugin

Train the plugin

Parameters:

X – pd.DataFrame

fit_predict(X: DataFrame, *args: Any, **kwargs: Any) DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: DataFrame, *args: Any, **kwargs: Any) DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: DataFrame, *args: Any, **kwargs: Any) DataFrame

Run predictions for the input. Used by predictors.

Parameters:

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: DataFrame) DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters:

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of SinkhornPlugin