autoprognosis.plugins.imputers.plugin_EM module
- class EMPlugin(random_state: int = 0, **kwargs: Any)
Bases: ImputerPlugin
The EM algorithm is an optimization algorithm that assumes a distribution for the partially missing data and tries to maximize the expected complete-data log-likelihood under that distribution.
- Steps:
For an input dataset X with missing values, we assume that the rows are sampled from a multivariate normal distribution N(Mu, Sigma).
We generate the “observed” and “missing” masks from X, and choose some initial values for Mu = Mu0 and Sigma = Sigma0.
The EM loop then refines (Mu, Sigma) iteratively, alternating an E step and an M step that use the conditional distribution of the missing components given the observed ones.
The E step finds the conditional expectation of the “missing” data, given the observed values and current estimates of the parameters. These expectations are then substituted for the “missing” data.
In the M step, maximum likelihood estimates of the parameters are computed as though the missing data had been filled in.
X_reconstructed holds the imputed dataset produced by each iteration.
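For reference, the standard Gaussian E and M steps can be written as below. This is the textbook form rather than a transcription of the plugin's code: o and m denote the observed and missing coordinates of row i, and whether the plugin keeps the conditional-covariance correction C_i in its M step is not stated on this page.

```latex
% E step for row i; \mu and \Sigma on the right are the iteration-t estimates
\hat{x}_{i,m} = \mu_m + \Sigma_{mo}\,\Sigma_{oo}^{-1}\,(x_{i,o} - \mu_o),
\qquad
C_i = \Sigma_{mm} - \Sigma_{mo}\,\Sigma_{oo}^{-1}\,\Sigma_{om}

% M step; C_i is zero-padded to d \times d, and \hat{x}_{i,o} = x_{i,o}
\mu^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n} \hat{x}_i,
\qquad
\Sigma^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n}
  \Big[(\hat{x}_i - \mu^{(t+1)})(\hat{x}_i - \mu^{(t+1)})^{\top} + C_i\Big]
```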
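The steps above can be sketched in NumPy as follows. This is a minimal illustration of Gaussian EM imputation, not the autoprognosis implementation: the function name em_impute, the initialization (column means for Mu0, covariance of the mean-filled data for Sigma0), the pseudo-inverse used for the observed block of Sigma, and the relative-change stopping rule are all assumptions made for the sketch.

```python
import numpy as np


def em_impute(X, maxit=500, convergence_threshold=1e-8):
    """Hypothetical sketch of EM imputation under a multivariate normal N(mu, sigma).

    Illustrative only; not the autoprognosis EMPlugin code.
    Returns (X_reconstructed, mu, sigma).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    observed = ~np.isnan(X)  # observed mask; ~observed is the missing mask

    # Assumed initialization: Mu0 = column means, Sigma0 = covariance of mean-filled data.
    mu = np.nanmean(X, axis=0)
    X_reconstructed = np.where(observed, X, mu)
    sigma = np.atleast_2d(np.cov(X_reconstructed, rowvar=False, bias=True))

    for _ in range(maxit):
        X_new = X_reconstructed.copy()  # observed entries never change
        C = np.zeros((d, d))  # summed conditional covariance of the missing parts

        # E step: replace missing entries by E[x_m | x_o] under the current (mu, sigma).
        for i in range(n):
            m = ~observed[i]
            if not m.any():
                continue
            o = observed[i]
            if o.any():
                # pinv instead of inv, since sigma_oo can be singular on small data.
                gain = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
                X_new[i, m] = mu[m] + gain @ (X[i, o] - mu[o])
                C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - gain @ sigma[np.ix_(o, m)]
            else:
                # Nothing observed in this row: the conditional mean is mu itself.
                X_new[i, m] = mu[m]
                C[np.ix_(m, m)] += sigma[np.ix_(m, m)]

        # M step: maximum likelihood (mu, sigma) as if the filled-in data were complete.
        mu = X_new.mean(axis=0)
        centered = X_new - mu
        sigma = (centered.T @ centered + C) / n

        # Assumed stopping rule: relative change of the reconstruction.
        change = np.linalg.norm(X_new - X_reconstructed) / max(
            np.linalg.norm(X_reconstructed), np.finfo(float).eps
        )
        X_reconstructed = X_new
        if change < convergence_threshold:
            break

    return X_reconstructed, mu, sigma


if __name__ == "__main__":
    data = [[1, 1, 1, 1], [np.nan] * 4, [1, 2, 2, 1], [2, 2, 2, 2]]
    X_imputed, mu, sigma = em_impute(data)
    print(X_imputed[1])  # the all-missing row is filled with the estimated mean
```

On the same data as the doctest example, the fully missing row ends up at the column means of the observed rows, because an empty observed set leaves nothing to condition on.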
- Args:
- maxit: int, default=500
Maximum number of imputation rounds to perform.
- convergence_threshold: float, default=1e-08
Minimum ratio difference between iterations before stopping.
- random_state: int, default=0
Random seed.
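The exact quantity behind "ratio difference" is not spelled out here. One plausible reading, sketched below as an assumption, is the norm of the change between successive reconstructions divided by the norm of the previous one, with maxit acting as a hard cap on the number of rounds. Both helper names, has_converged and run_until_converged, are hypothetical.

```python
import numpy as np


def has_converged(X_prev, X_curr, convergence_threshold=1e-8):
    """Hypothetical stopping rule: relative change between successive reconstructions."""
    X_prev = np.asarray(X_prev, dtype=float)
    X_curr = np.asarray(X_curr, dtype=float)
    # Size of the update relative to the size of the previous estimate.
    denom = max(np.linalg.norm(X_prev), np.finfo(float).eps)
    ratio = np.linalg.norm(X_curr - X_prev) / denom
    return ratio < convergence_threshold


def run_until_converged(step, X0, maxit=500, convergence_threshold=1e-8):
    """Apply `step` at most `maxit` times; stop early once the ratio drops below the threshold.

    Returns (final estimate, number of rounds performed).
    """
    X = np.asarray(X0, dtype=float)
    for it in range(1, maxit + 1):
        X_next = step(X)
        if has_converged(X, X_next, convergence_threshold):
            return X_next, it
        X = X_next
    return X, maxit
```

Here `step` stands in for one full E step plus M step; a smaller threshold trades extra rounds for a tighter fixed point, while maxit bounds the cost when the ratio never falls that low.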
Paper: “Maximum Likelihood from Incomplete Data via the EM Algorithm”, A. P. Dempster, N. M. Laird and D. B. Rubin
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("EM")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
- change_output(output: str) None
- fit_predict(X: DataFrame, *args: Any, **kwargs: Any) DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: DataFrame, *args: Any, **kwargs: Any) DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: DataFrame, *args: Any, **kwargs: Any) DataFrame
Run predictions for the input. Used by predictors.
- Parameters:
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: DataFrame) DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters:
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
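How transform treats new data is not described on this page. One common pattern, sketched here as an assumption, is to keep the (Mu, Sigma) estimated during fitting and fill each new row's missing entries with their conditional mean given that row's observed entries. The helper impute_with_params is hypothetical, and the plugin may instead re-run EM on the input it receives.

```python
import numpy as np


def impute_with_params(X, mu, sigma):
    """Hypothetical helper: fill NaNs with E[x_missing | x_observed] under fixed N(mu, sigma)."""
    X = np.array(X, dtype=float)  # copy, so the caller's data is left untouched
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    for row in X:  # each row is a view, so assignments write into X
        m = np.isnan(row)
        if not m.any():
            continue
        o = ~m
        if o.any():
            # Regression of the missing block on the observed block.
            gain = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
            row[m] = mu[m] + gain @ (row[o] - mu[o])
        else:
            # No observed values: fall back to the fitted mean.
            row[m] = mu[m]
    return X
```

With mu = [0, 0] and a unit-variance sigma whose off-diagonal is 0.8, a row [1.0, nan] is completed to [1.0, 0.8]: the missing coordinate moves toward the observed one in proportion to their covariance.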