autoprognosis.plugins.imputers.plugin_EM module

class EMPlugin(random_state: int = 0, **kwargs: Any)

Bases: ImputerPlugin

The EM algorithm is an optimization algorithm that assumes a distribution for the partially missing data and tries to maximize the expected complete data log-likelihood under that distribution.

Steps:
  1. For an input dataset X with missing values, we assume that the values are sampled from distribution N(Mu, Sigma).

  2. We generate the “observed” and “missing” masks from X, and choose some initial values for Mu = Mu0 and Sigma = Sigma0.

  3. The EM loop tries to approximate the (Mu, Sigma) pair by some iterative means under the conditional distribution of missing components.

  4. The E step finds the conditional expectation of the “missing” data, given the observed values and current estimates of the parameters. These expectations are then substituted for the “missing” data.

  5. In the M step, maximum likelihood estimates of the parameters are computed as though the missing data had been filled in.

  6. The X_reconstructed contains the approximation after each iteration.

Args:
maxit: int, default=500

maximum number of imputation rounds to perform.

convergence_thresholdfloat, default=1e-08

Minimum ration difference between iterations before stopping.

random_state: int

Random seed

Paper: “Maximum Likelihood from Incomplete Data via the EM Algorithm”, A. P. Dempster, N. M. Laird and D. B. Rubin

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("EM")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
change_output(output: str) None
fit(X: DataFrame, *args: Any, **kwargs: Any) Plugin

Train the plugin

Parameters:

X – pd.DataFrame

fit_predict(X: DataFrame, *args: Any, **kwargs: Any) DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: DataFrame, *args: Any, **kwargs: Any) DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[Params]

The hyperparameter domain using they fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: DataFrame, *args: Any, **kwargs: Any) DataFrame

Run predictions for the input. Used by predictors.

Parameters:

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using they fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: DataFrame) DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters:

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of EMPlugin