autoprognosis.plugins.imputers.plugin_EM module

class EMPlugin(random_state: int = 0, **kwargs: Any)

Bases: ImputerPlugin

The EM algorithm is an optimization algorithm that assumes a distribution for the partially missing data and tries to maximize the expected complete data log-likelihood under that distribution.

Steps:

For an input dataset X with missing values, we assume that the values are sampled from distribution N(Mu, Sigma).

We generate the “observed” and “missing” masks from X, and choose some initial values for Mu = Mu0 and Sigma = Sigma0.

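The EM loop iteratively refines the (Mu, Sigma) estimate by alternating the E and M steps, using the conditional distribution of the missing components given the observed ones, until the estimates stop changing or the iteration limit is reached.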

The E step finds the conditional expectation of the “missing” data, given the observed values and current estimates of the parameters. These expectations are then substituted for the “missing” data.

In the M step, maximum likelihood estimates of the parameters are computed as though the missing data had been filled in.

X_reconstructed holds the current approximation of the completed dataset and is updated after each iteration.
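The steps above can be sketched in plain NumPy. This is a minimal illustration, not the plugin's actual implementation: the function name `em_impute`, the choice of column means and variances for Mu0 and Sigma0, the small diagonal jitter, and the absolute-change stopping rule are all assumptions made for the example. It also assumes every column has at least one observed value.

```python
import numpy as np


def em_impute(X, maxit=500, convergence_threshold=1e-8):
    """Hypothetical sketch: impute NaNs assuming rows are drawn from N(mu, sigma)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    missing = np.isnan(X)  # "missing" mask; ~missing is the "observed" mask

    # Assumed initial values Mu0 and Sigma0: observed column means and variances.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0) + 1e-6)
    X_reconstructed = np.where(missing, mu, X)

    for _ in range(maxit):
        # Accumulates the conditional covariance of the missing entries.
        cov_correction = np.zeros((d, d))
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: the expectation is mu itself.
                X_reconstructed[i] = mu
                cov_correction += sigma
                continue
            # E step: E[x_m | x_o] = mu_m + S_mo S_oo^-1 (x_o - mu_o)
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            gain = s_mo @ np.linalg.pinv(s_oo)
            X_reconstructed[i, m] = mu[m] + gain @ (X[i, o] - mu[o])
            cov_correction[np.ix_(m, m)] += sigma[np.ix_(m, m)] - gain @ s_mo.T

        # M step: maximum likelihood estimates from the filled-in data.
        mu_new = X_reconstructed.mean(axis=0)
        centered = X_reconstructed - mu_new
        sigma_new = (centered.T @ centered + cov_correction) / n

        # Assumed stopping rule: largest absolute parameter change.
        change = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if change < convergence_threshold:
            break

    return X_reconstructed, mu, sigma
```

Calling `em_impute([[1, 1, 1, 1], [np.nan] * 4, [1, 2, 2, 1], [2, 2, 2, 2]])` leaves the observed rows untouched and fills the fully missing row with the estimated mean, since a row with no observed values carries no information beyond Mu. The plugin's own stopping criterion may differ from the one assumed here.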

Args:

maxit: int, default=500
Maximum number of imputation rounds to perform.

convergence_threshold: float, default=1e-08
Minimum ratio difference between iterations before stopping.

random_state: int
Random seed

Paper: “Maximum Likelihood from Incomplete Data via the EM Algorithm”, A. P. Dempster, N. M. Laird and D. B. Rubin

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("EM")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])

change_output(output: str) → None

fit(X: DataFrame, *args: Any, **kwargs: Any) → Plugin

Train the plugin

Parameters: X – pd.DataFrame

fit_predict(X: DataFrame, *args: Any, **kwargs: Any) → DataFrame: Fit the model and predict the training data. Used by predictors.

fit_transform(X: DataFrame, *args: Any, **kwargs: Any) → DataFrame: Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() → str: The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) → List[Params]: The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) → List[Params]: The hyperparameter domain, using the fully-qualified name.

is_fitted() → bool: Check if the model was trained

classmethod load(buff: bytes) → ImputerPlugin: Load the plugin from bytes

static name() → str: The name of the plugin, e.g.: xgboost

predict(X: DataFrame, *args: Any, **kwargs: Any) → DataFrame

Run predictions for the input. Used by predictors.

Parameters: X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) → Dict[str, Any]: Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) → Dict[str, Any]: Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) → Dict[str, Any]: Sample hyperparameters as a dict.

save() → bytes: Save the plugin to bytes

static subtype() → str: The subtype of the plugin, e.g.: classifier

transform(X: DataFrame) → DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters: X – pd.DataFrame

static type() → str: The type of the plugin, e.g.: prediction

plugin: alias of EMPlugin