AutoPrognosis documentation!

AutoPrognosis - A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.

🔑 Features

  • 🚀 Automatically learns ensembles of pipelines for classification, regression or survival analysis tasks.

  • 🌀 Easily extensible, pluggable architecture.

  • 🔥 Interpretability and uncertainty quantification tools.

  • 🩹 Data imputation using HyperImpute.

  • ⚡ Build demonstrators using Streamlit.

  • 📓 Python and R tutorials available.

🚀 Installation

Using pip

The library can be installed from PyPI using

$ pip install autoprognosis

or from source, using

$ pip install .
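
To install from a local checkout, first clone the repository (assuming the public GitHub repository):

$ git clone https://github.com/vanderschaarlab/autoprognosis.git
$ cd autoprognosis
$ pip install .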

Environment variables

The library can be configured from a set of environment variables.

  • N_OPT_JOBS: Number of cores to use for hyperparameter search. Default: 1

  • N_LEARNER_JOBS: Number of cores to use by individual learners. Default: all CPUs

  • REDIS_HOST: IP address for the Redis database. Default: 127.0.0.1

  • REDIS_PORT: Redis port. Default: 6379

Example: run export N_OPT_JOBS=2 to use 2 cores for the hyperparameter search.
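
The same configuration can also be applied from Python, provided the variables are set before the library is imported. A minimal sketch, assuming the variables are read at import time:

# Configure AutoPrognosis before importing it, so the settings take effect.
import os

os.environ["N_OPT_JOBS"] = "2"  # 2 cores for hyperparameter search
os.environ["N_LEARNER_JOBS"] = "4"  # 4 cores per individual learner

import autoprognosis  # noqa: E402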

💥 Sample Usage

Advanced Python tutorials can be found in the Python tutorials section.

R examples can be found in the R tutorials section.

List the available classifiers

from autoprognosis.plugins.prediction.classifiers import Classifiers
print(Classifiers().list_available())

Create a study for classifiers

from sklearn.datasets import load_breast_cancer

from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator


X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

df = X.copy()
df["target"] = Y

study_name = "example"

study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
)
model = study.fit()

# Predict the probabilities of each class using the model
model.predict_proba(X)

(Advanced) Customize the study for classifiers

from pathlib import Path

from sklearn.datasets import load_breast_cancer

from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator


X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

df = X.copy()
df["target"] = Y

workspace = Path("workspace")
study_name = "example"

study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=100,  # how many trials to do for each candidate
    timeout=60,  # seconds
    classifiers=["logistic_regression", "lda", "qda"],
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"
model = load_model_from_file(output)

# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.
metrics = evaluate_estimator(model, X, Y)

print(f"model {model.name()} -> {metrics['clf']}")

# Train the model
model.fit(X, Y)

# Predict the probabilities of each class using the model
model.predict_proba(X)

List the available regressors

from autoprognosis.plugins.prediction.regression import Regression
print(Regression().list_available())

Create a Regression study

# third party
import pandas as pd

# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression
from autoprognosis.studies.regression import RegressionStudy

# Load dataset
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\\t",
)
last_col = df.columns[-1]
y = df[last_col]
X = df.drop(columns=[last_col])

df = X.copy()
df["target"] = y

# Search the model
study_name="regression_example"
study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
)
model = study.fit()

# Predict using the model
model.predict(X)

(Advanced) Customize the Regression study

# stdlib
from pathlib import Path

# third party
import pandas as pd

# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression
from autoprognosis.studies.regression import RegressionStudy

# Load dataset
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\\t",
)
last_col = df.columns[-1]
y = df[last_col]
X = df.drop(columns=[last_col])

df = X.copy()
df["target"] = y

# Search the model
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name="regression_example"
study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=10,  # how many trials to do for each candidate. Default: 50
    num_study_iter=2,  # how many outer iterations to do. Default: 5
    timeout=50,  # timeout in seconds for the optimization of each regressor. Default: 360 seconds
    regressors=["linear_regression", "xgboost_regressor"],
    workspace=workspace,
)

study.run()

# Test the model
output = workspace / study_name / "model.p"

model = load_model_from_file(output)
# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.

metrics = evaluate_regression(model, X, y)

print(f"Model {model.name()} score: {metrics['str']}")

# Train the model
model.fit(X, y)


# Predict using the model
model.predict(X)

List available survival analysis estimators

from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation
print(RiskEstimation().list_available())

Create a Survival analysis study

# third party
import numpy as np
from pycox import datasets

# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

X = df.drop(columns=["duration"])
T = df["duration"]
Y = df["event"]

eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]

study_name = "example_risks"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
)

model = study.fit()

# Predict using the model
model.predict(X, eval_time_horizons)

(Advanced) Customize the Survival analysis study

# stdlib
import os
from pathlib import Path

# third party
import numpy as np
from pycox import datasets

# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

X = df.drop(columns=["duration"])
T = df["duration"]
Y = df["event"]

eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]

workspace = Path("workspace")
study_name = "example_risks"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
    num_iter=10,
    num_study_iter=1,
    timeout=10,
    risk_estimators=["cox_ph", "survival_xgboost"],
    score_threshold=0.5,
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"

model = load_model_from_file(output)
# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.

metrics = evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)

print(f"Model {model.name()} score: {metrics['clf']}")

# Train the model
model.fit(X, T, Y)

# Predict using the model
model.predict(X, eval_time_horizons)

⚡ Plugins

from autoprognosis.plugins.imputers import Imputers

imputer = Imputers().get(<NAME>)

from autoprognosis.plugins.preprocessors import Preprocessors

preprocessor = Preprocessors().get(<NAME>)
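
For example, a minimal sketch chaining the "mean" imputer (listed under the imputation plugins in the API reference) with the "scaler" preprocessor from the table below, assuming the default Preprocessors() category includes the feature scaling plugins:

import numpy as np
import pandas as pd

from autoprognosis.plugins.imputers import Imputers
from autoprognosis.plugins.preprocessors import Preprocessors

df = pd.DataFrame([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0]])

# Fill the missing value with the column mean, then standardize the features.
imputed = Imputers().get("mean").fit_transform(df)
scaled = Preprocessors().get("scaler").fit_transform(imputed)

print(scaled)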

  • maxabs_scaler: Scale each feature by its maximum absolute value. MaxAbsScaler: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html

  • scaler: Standardize features by removing the mean and scaling to unit variance. StandardScaler: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler

  • feature_normalizer: Normalize samples individually to unit norm. Normalizer: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer

  • normal_transform: Transform features using quantile information, mapping to a normal distribution. QuantileTransformer: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer

  • uniform_transform: Transform features using quantile information, mapping to a uniform distribution. QuantileTransformer: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer

  • minmax_scaler: Transform features by scaling each feature to a given range. MinMaxScaler: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler

from autoprognosis.plugins.prediction.classifiers import Classifiers

classifier = Classifiers().get(<NAME>)

  • neural_nets: PyTorch-based neural net classifier.

  • logistic_regression: LogisticRegression: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

  • catboost: Gradient boosting on decision trees. CatBoost: https://catboost.ai/

  • random_forest: A random forest classifier. RandomForestClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

  • tabnet: TabNet, attentive interpretable tabular learning: https://github.com/dreamquark-ai/tabnet

  • xgboost: XGBoost classifier: https://xgboost.readthedocs.io/en/stable/
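
A minimal sketch training one of the classifiers above directly, assuming the sklearn-style fit/predict_proba interface used by the study models in the examples above:

from sklearn.datasets import load_breast_cancer

from autoprognosis.plugins.prediction.classifiers import Classifiers

X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

# Instantiate and train a single plugin, outside of any study.
classifier = Classifiers().get("logistic_regression")
classifier.fit(X, Y)

print(classifier.predict_proba(X))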

from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation

predictor = RiskEstimation().get(<NAME>)

  • survival_xgboost: XGBoost Survival Embeddings: https://github.com/loft-br/xgboost-survival-embeddings

  • loglogistic_aft: Log-Logistic AFT model: https://lifelines.readthedocs.io/en/latest/fitters/regression/LogLogisticAFTFitter.html

  • deephit: DeepHit, a deep learning approach to survival analysis with competing risks: https://github.com/chl8856/DeepHit

  • cox_ph: Cox’s proportional hazards model: https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxPHFitter.html

  • weibull_aft: Weibull AFT model: https://lifelines.readthedocs.io/en/latest/fitters/regression/WeibullAFTFitter.html

  • lognormal_aft: Log-Normal AFT model: https://lifelines.readthedocs.io/en/latest/fitters/regression/LogNormalAFTFitter.html

  • coxnet: CoxNet, a Cox proportional hazards model also referred to as DeepSurv: https://github.com/havakv/pycox
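
A minimal sketch using the cox_ph plugin, assuming the fit(X, T, Y) / predict(X, horizons) convention shown in the survival analysis examples above:

from pycox import datasets

from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

X = df.drop(columns=["duration", "event"])
T = df["duration"]
Y = df["event"]

# Train a single Cox PH plugin and predict the risk at the median event time.
model = RiskEstimation().get("cox_ph")
model.fit(X, T, Y)

print(model.predict(X, [int(T.median())]))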

from autoprognosis.plugins.prediction.regression import Regression

regressor = Regression().get(<NAME>)

  • tabnet_regressor: TabNet, attentive interpretable tabular learning: https://github.com/dreamquark-ai/tabnet

  • catboost_regressor: Gradient boosting on decision trees. CatBoost: https://catboost.ai/

  • random_forest_regressor: RandomForestRegressor: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

  • xgboost_regressor: XGBoost regressor: https://xgboost.readthedocs.io/en/stable/

  • neural_nets_regression: PyTorch-based neural net regressor.

  • linear_regression: LinearRegression: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

from autoprognosis.plugins.explainers import Explainers

explainer = Explainers().get(<NAME>)

  • risk_effect_size: Feature importance using Cohen’s distance between probabilities.

  • lime: LIME, explaining the predictions of any machine learning classifier: https://github.com/marcotcr/lime

  • symbolic_pursuit: Symbolic Pursuit, from the paper "Learning outside the black-box: the pursuit of interpretable models".

  • shap_permutation_sampler: SHAP permutation sampler: https://shap.readthedocs.io/en/latest/generated/shap.explainers.Permutation.html

  • kernel_shap: SHAP KernelExplainer: https://shap-lrjball.readthedocs.io/en/latest/generated/shap.KernelExplainer.html

  • invase: INVASE, instance-wise variable selection: https://github.com/vanderschaarlab/invase
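
For example, a fitted model can be explained with the risk_effect_size plugin, mirroring the interpretability tutorial below:

from autoprognosis.plugins.explainers import Explainers

# <model>, X and Y as in the classification examples above; the model must be fitted.
explainer = Explainers().get(
    "risk_effect_size",
    model,
    X,
    Y,
    task_type="classification",
)

explainer.plot(X)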

from autoprognosis.plugins.uncertainty import UncertaintyQuantification
model = UncertaintyQuantification().get(<NAME>)
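
The available uncertainty quantification plugins can be listed the same way as the other plugin categories, assuming the shared list_available() interface:

from autoprognosis.plugins.uncertainty import UncertaintyQuantification

print(UncertaintyQuantification().list_available())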

🔨 Test

After installing the library, the tests can be executed using pytest

$ pip install .[testing]
$ pytest -vxs -m "not slow"

Citing

If you use this code, please cite the associated paper:

@misc{https://doi.org/10.48550/arxiv.2210.12090,
  doi = {10.48550/ARXIV.2210.12090},
  url = {https://arxiv.org/abs/2210.12090},
  author = {Imrie, Fergus and Cebere, Bogdan and McKinney, Eoin F. and van der Schaar, Mihaela},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

References

  1. AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning

  2. Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning

  3. Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants

Examples

Tutorials

AutoPrognosis classification

Welcome! This tutorial will walk you through the steps of selecting a model for a classification task using AutoPrognosis.

Setup

[ ]:
# stdlib
import json
import warnings

# third party
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

warnings.filterwarnings("ignore")

Import ClassifierStudy

ClassifierStudy is the engine that learns an ensemble of pipelines and their hyperparameters automatically.

[ ]:
# autoprognosis absolute
from autoprognosis.studies.classifiers import ClassifierStudy

Load the target dataset

AutoPrognosis expects pandas.DataFrames as input.

For this example, we will use the Breast Cancer Wisconsin Dataset.

[ ]:
# stdlib
from pathlib import Path

X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

df = X.copy()
df["target"] = Y

Create the classifier

While AutoPrognosis provides default plugins, it allows the user to customize the plugins for the pipelines.

You can see the supported plugins below:

[ ]:
# List the available plugins

# autoprognosis absolute
from autoprognosis.plugins import Plugins

print(json.dumps(Plugins().list_available(), indent=2))

We will set a few custom plugins for the pipelines and create the classifier study.

[ ]:
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name = "classification_example"

study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=2,  # DELETE THIS LINE FOR BETTER RESULTS. how many trials to do for each candidate. Default: 50
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS. how many outer iterations to do. Default: 5
    classifiers=[
        "logistic_regression",
        "lda",
        "qda",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)

Search for the optimal ensemble

[ ]:
study.run()
[ ]:
# stdlib
import pprint

# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator

output = workspace / study_name / "model.p"

model = load_model_from_file(output)

metrics = evaluate_estimator(model, X, Y)

print(f"Model {model.name()} ")
print("Score: ")

pprint.pprint(metrics)

Serialization

[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file

out = workspace / "tmp.bkp"
# Fit the model
model.fit(X, Y)

# Save
save_to_file(out, model)

# Reload
loaded_model = load_from_file(out)

print(loaded_model.name())

assert loaded_model.name() == model.name()

out.unlink()

Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement towards Machine learning and AI for medicine, you can do so in the following ways!

Star AutoPrognosis on GitHub

The easiest way to help our community is just by starring the Repos! This helps raise awareness of the tools we’re building.

Tutorial: Classification AutoML with imputation

Welcome to the classification AutoML tutorial!

This tutorial will show how to use AutoPrognosis to learn a model for datasets with missing data. We show how to use a predefined imputer or how to use AutoPrognosis to select the optimal imputer.

[ ]:
# stdlib
import json
import sys
import warnings

# third party
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

warnings.filterwarnings("ignore")

# autoprognosis absolute
import autoprognosis.logger as log
from autoprognosis.studies.classifiers import ClassifierStudy
[ ]:
log.add(sink=sys.stderr, level="INFO")

Load toy dataset

[ ]:
# stdlib
from pathlib import Path


def get_dataset() -> pd.DataFrame:
    Path("data").mkdir(parents=True, exist_ok=True)
    bkp_file = Path("data") / "anneal.csv"

    if bkp_file.exists():
        return pd.read_csv(bkp_file)

    df = pd.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/annealing/anneal.data",
        header=None,
    )
    df.to_csv(bkp_file, index=False)

    return df


df = get_dataset()

df = df.replace("?", np.nan)

X = df.drop(columns=[df.columns[-1]])
y = df[df.columns[-1]]

X
[ ]:
dataset = X.copy()
dataset["target"] = y
[ ]:
for col in X.columns:
    if X[col].isna().sum() == 0:
        continue

    col_type = "categorical" if len(X[col].unique()) < 10 else "cont"
    print(
        f"NaNs ratio in col = {col} col_type = {col_type} miss ratio = {X[col].isna().sum() / len(X[col])}"
    )
[ ]:
# List available classifiers

# autoprognosis absolute
from autoprognosis.plugins.prediction import Classifiers

Classifiers().list_available()

Option 1: Predefined imputer

[ ]:
# stdlib
from pathlib import Path

workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name = "test_classification_studies"

study = ClassifierStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    imputers=["mean"],
    classifiers=["logistic_regression", "lda"],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.plugins.imputers import Imputers
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator

model_path = workspace / study_name / "model.p"

model = load_model_from_file(model_path)

evaluate_estimator(model, X, y)
[ ]:
model.name()

Option 2: Let the optimizer find the optimal imputer

[ ]:
# stdlib
from pathlib import Path

workspace = Path("workspace")
study_name = "test_classification_studies_v2"

study = ClassifierStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    classifiers=[
        "logistic_regression",
        "lda",
        "xgboost",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.plugins.imputers import Imputers
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator

model_path = workspace / study_name / "model.p"

model = load_model_from_file(model_path)

evaluate_estimator(model, X, y)
[ ]:
model.name()

Serialization

[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file

out = workspace / "tmp.bkp"
# Fit the model
model.fit(X, y)

# Save
save_to_file(out, model)

# Reload
loaded_model = load_from_file(out)

print(loaded_model.name())

assert loaded_model.name() == model.name()

out.unlink()

Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement towards Machine learning and AI for medicine, you can do so in the following ways!

Star AutoPrognosis on GitHub

The easiest way to help our community is just by starring the Repos! This helps raise awareness of the tools we’re building.

AutoPrognosis - Tutorial on using classifiers with explainers

[ ]:
# Install AutoPrognosis
!pip install autoprognosis
[ ]:
# stdlib
import json
import sys
import warnings

# third party
import numpy as np
import pandas as pd

warnings.filterwarnings("ignore")

# autoprognosis absolute
import autoprognosis.logger as log
from autoprognosis.studies.classifiers import ClassifierStudy

log.add(sink=sys.stderr, level="INFO")

Load dataset

AutoPrognosis expects pandas.DataFrames as input.

For this example, we will use the Breast Cancer Wisconsin Dataset.

[ ]:
# third party
# Load dataset
from sklearn.datasets import load_breast_cancer

X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

X

Run a study with AutoPrognosis

[ ]:
dataset = X.copy()
dataset["target"] = Y
[ ]:
# List available classifiers

# autoprognosis absolute
from autoprognosis.plugins.prediction import Classifiers

Classifiers().list_available()
[ ]:
# stdlib
from pathlib import Path

workspace = Path("workspace")
study_name = "test_classification_studies"

study = ClassifierStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    num_iter=100,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    imputers=[],  # Dataset is complete, so imputation not necessary
    classifiers=[
        "logistic_regression",
        "perceptron",
        "xgboost",
        "decision_trees",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    feature_scaling=[],
    score_threshold=0.4,
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator

model_path = workspace / study_name / "model.p"

model = load_model_from_file(model_path)
[ ]:
model.name()
[ ]:
evaluate_estimator(model, X, Y)

Interpretability

[ ]:
# autoprognosis absolute
from autoprognosis.plugins.explainers import Explainers
[ ]:
# Explain using Kernel SHAP
explainer = Explainers().get(
    "kernel_shap",
    model,
    X,
    Y,
    feature_names=X.columns,
    task_type="classification",
)
explainer.plot(X.sample(frac=0.1))
[ ]:
# Explain using Risk Effect Size
explainer = Explainers().get(
    "risk_effect_size",
    model,
    X,
    Y,
    task_type="classification",
)

explainer.plot(X)

Value of information

[ ]:
def evaluate_for_effect_size(effect_size):
    exp = Explainers().get(
        "risk_effect_size",
        model,
        X,
        Y,
        task_type="classification",
        effect_size=effect_size,
    )

    important_features = exp.explain(X, effect_size).index.tolist()

    return important_features


def evaluate_using_important_feature(effect_size):
    filtered_model = load_model_from_file(model_path)

    important_features = evaluate_for_effect_size(effect_size)
    X_filtered = X[important_features]

    metrics = evaluate_estimator(
        filtered_model,
        X_filtered,
        Y,
    )

    print("\033[1mEvaluation for effect size \033[0m", effect_size)
    print(
        "    >>> \033[1mSelected features for effect size\033[0m ", important_features
    )
    print("    >>> \033[1mSelected features count\033[0m ", len(important_features))
    print("    >>> \033[1mEvaluation:\033[0m ")
    print(f"        >>>> score =  {metrics['str']}")
    print("========================================")
[ ]:
# Evaluate performance for different feature subsets defined by effect size
for effect_size in [0.5, 1.0, 1.5, 2.0]:
    evaluate_using_important_feature(effect_size)

Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to learn more about machine learning and AI for medicine, you can do so in the following ways!

Star AutoPrognosis on GitHub

The easiest way to help our community is just by starring the Repos! This helps raise awareness of the tools we’re building.

Check out our website and paper for AutoPrognosis
Learn more about our lab and other work

AutoPrognosis survival analysis

Welcome! This tutorial will walk you through the steps of selecting a model for a survival analysis task using AutoPrognosis.

Setup

[ ]:
# stdlib
import json
import warnings

# third party
from lifelines.datasets import load_rossi
import pandas as pd
from sklearn.model_selection import train_test_split

warnings.filterwarnings("ignore")

Import RiskEstimationStudy

RiskEstimationStudy is the engine that learns an ensemble of survival analysis pipelines and their hyperparameters automatically.

[ ]:
# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy

Load the target dataset

AutoPrognosis expects pandas.DataFrames as input.

For this example, we will use the Rossi dataset.

[ ]:
# third party
from lifelines.datasets import load_rossi

rossi = load_rossi()

X = rossi.drop(["week", "arrest"], axis=1)
Y = rossi["arrest"]
T = rossi["week"]

eval_time_horizons = [
    int(T[Y.iloc[:] == 1].quantile(0.25)),
    int(T[Y.iloc[:] == 1].quantile(0.50)),
    int(T[Y.iloc[:] == 1].quantile(0.75)),
]

Create the risk estimation study

While AutoPrognosis provides default plugins, it allows the user to customize the plugins for the pipelines.

You can see the supported plugins below:

[ ]:
# stdlib
# List the available plugins
import json
from pathlib import Path

# autoprognosis absolute
from autoprognosis.plugins import Plugins

print(json.dumps(Plugins().list_available(), indent=2))

We will set a few custom plugins for the pipelines and create the risk estimation study.

[ ]:
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name = "test_risk_estimation_studies"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=rossi,
    target="arrest",
    time_to_event="week",
    time_horizons=eval_time_horizons,
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS.  number of BO iterations per estimator. Default: 50
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.  number of outer optimization iterations. Default: 5
    risk_estimators=[
        "cox_ph",
        "lognormal_aft",
        "loglogistic_aft",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
    score_threshold=0.4,
)

Search for the best ensemble

[ ]:
study.run()
[ ]:
# stdlib
import pprint

# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

output = workspace / study_name / "model.p"

model = load_model_from_file(output)

metrics = evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)

print(f"Model {model.name()}")
print(f"Score: ")

pprint.pprint(metrics)

Serialization

[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file

out = workspace / "tmp.bkp"

# Fit the model
model.fit(X, T, Y)

# Save
save_to_file(out, model)

# Reload
loaded_model = load_from_file(out)

print(loaded_model.name())

assert loaded_model.name() == model.name()

out.unlink()

Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement towards Machine learning and AI for medicine, you can do so in the following ways!

Star AutoPrognosis on GitHub

The easiest way to help our community is just by starring the Repos! This helps raise awareness of the tools we’re building.

Tutorial: Survival Analysis AutoML with imputation

Welcome to the Survival analysis AutoML tutorial!

This tutorial will show how to use AutoPrognosis to learn a model for datasets with missing data. We show how to use a predefined imputer or how to use AutoPrognosis to select the optimal imputer.

[ ]:
# stdlib
import sys
import warnings

# third party
import numpy as np
import pandas as pd

warnings.filterwarnings("ignore")

# autoprognosis absolute
import autoprognosis.logger as log
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
[ ]:
log.add(sink=sys.stderr, level="INFO")

Load dataset

[ ]:
# third party
from pycox import datasets

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

X = df.drop(columns=["duration", "event"])
T = df["duration"]
Y = df["event"]

eval_time_horizons = [
    int(T[Y.iloc[:] == 1].quantile(0.50)),
]
[ ]:
# stdlib
import random

total_len = len(X)

for col in ["x3", "x4"]:
    indices = random.sample(range(0, total_len), 10)
    X.loc[indices, col] = np.nan

X.isnull().any()
[ ]:
dataset = X.copy()
dataset["target"] = Y
dataset["time_to_event"] = T

Option 1: Predefined imputer

[ ]:
# stdlib
from pathlib import Path

workspace = Path("workspace")
study_name = "test_risk_estimation_studies"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    time_to_event="time_to_event",
    time_horizons=eval_time_horizons,
    num_iter=2,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    risk_estimators=[
        "cox_ph",
        "lognormal_aft",
        "survival_xgboost",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    imputers=["mean"],
    feature_scaling=["minmax_scaler", "nop"],  # DELETE THIS LINE FOR BETTER RESULTS.
    score_threshold=0.4,
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.plugins.imputers import Imputers
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

model_path = workspace / study_name / "model.p"

model = load_model_from_file(model_path)

X_imp = Imputers().get("mean").fit_transform(X)

evaluate_survival_estimator(model, X_imp, T, Y, eval_time_horizons)

Option 2: Let the optimizer find the best imputer

[ ]:
# stdlib
from pathlib import Path

workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name = "test_risk_estimation_studies_v2"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    time_to_event="time_to_event",
    time_horizons=eval_time_horizons,
    num_iter=2,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    risk_estimators=[
        "cox_ph",
        "lognormal_aft",
        "survival_xgboost",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    imputers=["mean", "ice", "median"],  # DELETE THIS LINE FOR BETTER RESULTS.
    feature_scaling=["minmax_scaler", "nop"],  # DELETE THIS LINE FOR BETTER RESULTS.
    score_threshold=0.4,
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

model_path = workspace / study_name / "model.p"

model = load_model_from_file(model_path)

evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)

Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement towards Machine learning and AI for medicine, you can do so in the following ways!

Star AutoPrognosis on GitHub

The easiest way to help our community is just by starring the Repos! This helps raise awareness of the tools we’re building.

AutoPrognosis regression

Welcome! This tutorial will walk you through the steps of selecting a model for a regression task using AutoPrognosis.

Setup

[ ]:
# stdlib
import json
import warnings

# third party
import pandas as pd
from sklearn.model_selection import train_test_split

warnings.filterwarnings("ignore")

Import RegressionStudy

RegressionStudy is the engine that learns an ensemble of regression pipelines and their hyperparameters automatically.

[ ]:
# autoprognosis absolute
from autoprognosis.studies.regression import RegressionStudy

Load the target dataset

AutoPrognosis expects pandas.DataFrames as input.

For this example, we will use the Airfoil Self-Noise Data Set.

[ ]:
# third party
import pandas as pd

df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\\t",
)


last_col = df.columns[-1]

y = df[last_col]
X = df.drop(columns=[last_col])


df = X.copy()
df["target"] = y

df

Create the regressor

While AutoPrognosis provides default plugins, it allows the user to customize the plugins for the pipelines.

You can see the supported plugins below:

[ ]:
# stdlib
# List the available plugins
import json

# autoprognosis absolute
from autoprognosis.plugins import Plugins

print(json.dumps(Plugins().list_available(), indent=2))

We will set a few custom plugins for the pipelines and create the regression study.

[ ]:
# stdlib
from pathlib import Path

workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name = "regression_example"

study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS.  how many trials to do for each candidate. Default: 50
    num_study_iter=2,  # DELETE THIS LINE FOR BETTER RESULTS.  how many outer iterations to do. Default: 5
    regressors=[
        "linear_regression",
        "xgboost_regressor",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)

Search for the optimal ensemble

[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression

output = workspace / study_name / "model.p"

model = load_model_from_file(output)

metrics = evaluate_regression(model, X, y)

f"Model {model.name()} score: {metrics['raw']}"

Serialization

[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file

out = workspace / "tmp.bkp"

# Fit the model
model.fit(X, y)

# Save
save_to_file(out, model)

# Reload
loaded_model = load_from_file(out)

print(loaded_model.name())

assert loaded_model.name() == model.name()

out.unlink()

Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement towards Machine learning and AI for medicine, you can do so in the following ways!

Star AutoPrognosis on GitHub

The easiest way to help our community is just by starring the Repos! This helps raise awareness of the tools we’re building.

Tutorial: Simulating multiple imputation (MICE) using AutoPrognosis

Welcome to the classification AutoML tutorial!

This tutorial will show how to use AutoPrognosis and multiple imputation to learn a model for datasets with missing data.

[ ]:
# stdlib
import json
import sys
import warnings

# third party
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

warnings.filterwarnings("ignore")

# autoprognosis absolute
import autoprognosis.logger as log
from autoprognosis.studies.classifiers import ClassifierStudy
[ ]:
log.add(sink=sys.stderr, level="INFO")

Load toy dataset

[ ]:
# stdlib
from pathlib import Path


def get_dataset() -> pd.DataFrame:
    Path("data").mkdir(parents=True, exist_ok=True)
    bkp_file = Path("data") / "anneal.csv"

    if bkp_file.exists():
        return pd.read_csv(bkp_file)

    df = pd.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/annealing/anneal.data",
        header=None,
    )
    df.to_csv(bkp_file, index=False)

    return df


df = get_dataset()

df = df.replace("?", np.nan)

X = df.drop(columns=[df.columns[-1]])
y = df[df.columns[-1]]

X
[ ]:
dataset = X.copy()
dataset["target"] = y
[ ]:
for col in X.columns:
    if X[col].isna().sum() == 0:
        continue

    col_type = "categorical" if len(X[col].unique()) < 10 else "cont"
    print(
        f"NaNs ratio in col = {col} col_type = {col_type} miss ratio = {X[col].isna().sum() / len(X[col])}"
    )
[ ]:
# List available classifiers

# autoprognosis absolute
from autoprognosis.plugins.prediction import Classifiers

Classifiers().list_available()

Search model with the ICE imputer

[ ]:
# stdlib
from pathlib import Path

workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name = "test_classification_studies_mice"

study = ClassifierStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    imputers=[
        "ice"
    ],  # Using chained equations. The "missforest" or "hyperimpute" plugins can be used here as well.
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    classifiers=["logistic_regression", "lda"],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)
study.run()

Train the model template using multiple random seeds

[ ]:
# autoprognosis absolute
from autoprognosis.plugins.imputers import Imputers
from autoprognosis.utils.serialization import load_model_from_file

model_path = workspace / study_name / "model.p"

model = load_model_from_file(model_path)

model.name()
[ ]:
# autoprognosis absolute
from autoprognosis.utils.distributions import enable_reproducible_results
from autoprognosis.utils.tester import evaluate_estimator_multiple_seeds

score = evaluate_estimator_multiple_seeds(model, X, y, seeds=list(range(5)))
[ ]:
score

Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement towards Machine learning and AI for medicine, you can do so in the following ways!

Star AutoPrognosis on GitHub

The easiest way to help our community is just by starring the Repos! This helps raise awareness of the tools we’re building.

AutoML studies

AutoML studies

autoprognosis.studies.classifiers module

class ClassifierStudy(dataset: pandas.core.frame.DataFrame, target: str, num_iter: int = 20, num_study_iter: int = 5, num_ensemble_iter: int = 15, timeout: int = 360, metric: str = 'aucroc', study_name: Optional[str] = None, feature_scaling: List[str] = ['normal_transform', 'maxabs_scaler', 'feature_normalizer', 'minmax_scaler', 'nop', 'scaler', 'uniform_transform'], feature_selection: List[str] = ['nop', 'pca', 'fast_ica'], classifiers: List[str] = ['random_forest', 'xgboost', 'catboost', 'lgbm', 'logistic_regression'], imputers: List[str] = ['ice'], workspace: pathlib.Path = PosixPath('tmp'), hooks: autoprognosis.hooks.base.Hooks = <autoprognosis.hooks.default.DefaultHooks object>, score_threshold: float = 0.65, group_id: Optional[str] = None, nan_placeholder: Optional[Any] = None, random_state: int = 0, sample_for_search: bool = True, max_search_sample_size: int = 10000, ensemble_size: int = 3, n_folds_cv: int = 5)

Bases: autoprognosis.studies._base.Study

Core logic for classification studies.

A study automatically handles imputation, preprocessing and model selection for a certain dataset. The output is an optimal model architecture, selected by the AutoML logic.

Parameters
  • dataset – DataFrame. The dataset to analyze.

  • target – str. The target column in the dataset.

  • num_iter – int. Maximum number of optimization trials. This is the limit of trials for each base estimator in the “classifiers” list, used in combination with the “timeout” parameter. For each estimator, the search will end after “num_iter” trials or “timeout” seconds.

  • num_study_iter – int. The number of study iterations. This is the limit for the outer optimization loop. After each outer loop, an intermediary model is cached and can be used by another process, while the outer loop continues to improve the result.

  • timeout – int. Maximum wait time (seconds) for each estimator hyperparameter search. This timeout will apply to each estimator in the “classifiers” list.

  • metric

    str. The metric to use for optimization (see the sketch after this parameter list). Available objective metrics:

    • ”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

    • ”aucprc” : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.

    • ”accuracy” : Accuracy classification score.

    • ”f1_score_micro”: F1 score is a harmonic mean of the precision and recall. This version uses the “micro” average: calculate metrics globally by counting the total true positives, false negatives and false positives.

    • ”f1_score_macro”: F1 score is a harmonic mean of the precision and recall. This version uses the “macro” average: calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

    • ”f1_score_weighted”: F1 score is a harmonic mean of the precision and recall. This version uses the “weighted” average: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).

    • ”mcc”: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.

    • ”kappa”, “kappa_quadratic”: computes Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.

  • study_name – str. The name of the study, to be used in the caches.

  • feature_scaling

    list. Plugin search pool to use in the pipeline for scaling. Defaults to : [‘maxabs_scaler’, ‘scaler’, ‘feature_normalizer’, ‘normal_transform’, ‘uniform_transform’, ‘nop’, ‘minmax_scaler’] Available plugins, retrieved using Preprocessors(category=”feature_scaling”).list_available():

    • ’maxabs_scaler’

    • ’scaler’

    • ’feature_normalizer’

    • ’normal_transform’

    • ’uniform_transform’

    • ’nop’ # empty operation

    • ’minmax_scaler’

  • feature_selection

    list. Plugin search pool to use in the pipeline for feature selection. Defaults [“nop”, “variance_threshold”, “pca”, “fast_ica”] Available plugins, retrieved using Preprocessors(category=”dimensionality_reduction”).list_available():

    • ’feature_agglomeration’

    • ’fast_ica’

    • ’variance_threshold’

    • ’gauss_projection’

    • ’pca’

    • ’nop’ # no operation

  • classifiers

    list. Plugin search pool to use in the pipeline for prediction. Defaults to [“random_forest”, “xgboost”, “logistic_regression”, “catboost”]. Available plugins, retrieved using Classifiers().list_available():

    • ’adaboost’

    • ’bernoulli_naive_bayes’

    • ’neural_nets’

    • ’linear_svm’

    • ’qda’

    • ’decision_trees’

    • ’logistic_regression’

    • ’hist_gradient_boosting’

    • ’extra_tree_classifier’

    • ’bagging’

    • ’gradient_boosting’

    • ’ridge_classifier’

    • ’gaussian_process’

    • ’perceptron’

    • ’lgbm’

    • ’catboost’

    • ’random_forest’

    • ’tabnet’

    • ’multinomial_naive_bayes’

    • ’lda’

    • ’gaussian_naive_bayes’

    • ’knn’

    • ’xgboost’

  • imputers

    list. Plugin search pool to use in the pipeline for imputation. Defaults to [“mean”, “ice”, “missforest”, “hyperimpute”]. Available plugins, retrieved using Imputers().list_available():

    • ’sinkhorn’

    • ’EM’

    • ’mice’

    • ’ice’

    • ’hyperimpute’

    • ’most_frequent’

    • ’median’

    • ’missforest’

    • ’softimpute’

    • ’nop’

    • ’mean’

    • ’gain’

  • hooks – Hooks. Custom callbacks to be notified about the search progress.

  • workspace – Path. Where to store the output model.

  • score_threshold – float. The minimum metric score for a candidate.

  • group_id – str. The group id column in the dataset, if any.

  • random_state – int. Random seed.

  • sample_for_search – bool. Subsample the evaluation dataset in the search pipeline. Improves the speed of the search.

  • max_search_sample_size – int. Subsample size for the evaluation dataset, if sample_for_search is True.

  • n_folds_cv – int. Number of cross-validation folds to use for study evaluation.

  • ensemble_size – int. Maximum number of models to include in the ensemble.
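
A minimal sketch customizing the optimization metric and the number of cross-validation folds, using only parameters documented above:

from autoprognosis.studies.classifiers import ClassifierStudy

# df: pandas DataFrame with a "target" column, as in the example below.
study = ClassifierStudy(
    study_name="example_aucprc",
    dataset=df,
    target="target",
    metric="aucprc",  # optimize average precision instead of the default "aucroc"
    n_folds_cv=3,  # fewer cross-validation folds for a faster study evaluation
)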

Example

>>> from sklearn.datasets import load_breast_cancer
>>>
>>> from autoprognosis.studies.classifiers import ClassifierStudy
>>> from autoprognosis.utils.serialization import load_model_from_file
>>> from autoprognosis.utils.tester import evaluate_estimator
>>>
>>> X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
>>>
>>> df = X.copy()
>>> df["target"] = Y
>>>
>>> study_name = "example"
>>>
>>> study = ClassifierStudy(
>>>     study_name=study_name,
>>>     dataset=df,  # pandas DataFrame
>>>     target="target",  # the label column in the dataset
>>> )
>>> model = study.fit()
>>>
>>> # Predict the probabilities of each class using the model
>>> model.predict_proba(X)
fit() -> Any

Run the study and train the model. The call returns the fitted model.

run() -> Any

Run the study. The call returns the optimal model architecture - not fitted.
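
For example, a sketch of the run() workflow, mirroring the advanced example in the sample usage section (workspace, study_name, X and Y as defined there):

study.run()

# run() caches the selected architecture; it still needs training before use.
model = load_model_from_file(workspace / study_name / "model.p")
model.fit(X, Y)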

autoprognosis.studies.regression module

class RegressionStudy(dataset: pandas.core.frame.DataFrame, target: str, num_iter: int = 20, num_study_iter: int = 5, num_ensemble_iter: int = 15, timeout: int = 360, metric: str = 'r2', study_name: Optional[str] = None, feature_scaling: List[str] = ['normal_transform', 'maxabs_scaler', 'feature_normalizer', 'minmax_scaler', 'nop', 'scaler', 'uniform_transform'], feature_selection: List[str] = ['nop', 'pca', 'fast_ica'], regressors: List[str] = ['random_forest_regressor', 'xgboost_regressor', 'linear_regression', 'catboost_regressor'], imputers: List[str] = ['ice'], workspace: pathlib.Path = PosixPath('tmp'), hooks: autoprognosis.hooks.base.Hooks = <autoprognosis.hooks.default.DefaultHooks object>, score_threshold: float = 0.65, nan_placeholder: Optional[Any] = None, group_id: Optional[str] = None, random_state: int = 0, sample_for_search: bool = True, max_search_sample_size: int = 10000, ensemble_size: int = 3, n_folds_cv: int = 5)

Bases: autoprognosis.studies._base.Study

Core logic for regression studies.

A study automatically handles imputation, preprocessing and model selection for a certain dataset. The output is an optimal model architecture, selected by the AutoML logic.

Parameters
  • dataset – DataFrame. The dataset to analyze.

  • target – str. The target column in the dataset.

  • num_iter – int. Maximum number of optimization trials. This is the limit of trials for each base estimator in the “regressors” list, used in combination with the “timeout” parameter. For each estimator, the search will end after “num_iter” trials or “timeout” seconds.

  • num_study_iter – int. The number of study iterations. This is the limit for the outer optimization loop. After each outer loop, an intermediary model is cached and can be used by another process, while the outer loop continues to improve the result.

  • timeout – int. Maximum wait time (seconds) for each estimator hyperparameter search. This timeout will apply to each estimator in the “regressors” list.

  • metric

    str. The metric to use for optimization. Available metric:

    • ”r2”

  • study_name – str. The name of the study, to be used in the caches.

  • feature_scaling

    list. Plugin search pool to use in the pipeline for scaling. Defaults to : [‘maxabs_scaler’, ‘scaler’, ‘feature_normalizer’, ‘normal_transform’, ‘uniform_transform’, ‘nop’, ‘minmax_scaler’] Available plugins, retrieved using Preprocessors(category=”feature_scaling”).list_available():

    • ’maxabs_scaler’

    • ’scaler’

    • ’feature_normalizer’

    • ’normal_transform’

    • ’uniform_transform’

    • ’nop’ # empty operation

    • ’minmax_scaler’

  • feature_selection

    list. Plugin search pool to use in the pipeline for feature selection. Defaults [“nop”, “variance_threshold”, “pca”, “fast_ica”] Available plugins, retrieved using Preprocessors(category=”dimensionality_reduction”).list_available():

    • ’feature_agglomeration’

    • ’fast_ica’

    • ’variance_threshold’

    • ’gauss_projection’

    • ’pca’

    • ’nop’ # no operation

  • imputers

    list. Plugin search pool to use in the pipeline for imputation. Defaults to [“mean”, “ice”, “missforest”, “hyperimpute”]. Available plugins, retrieved using Imputers().list_available():

    • ’sinkhorn’

    • ’EM’

    • ’mice’

    • ’ice’

    • ’hyperimpute’

    • ’most_frequent’

    • ’median’

    • ’missforest’

    • ’softimpute’

    • ’nop’

    • ’mean’

    • ’gain’

  • regressors

    list. Plugin search pool to use in the pipeline for prediction. Defaults to [“random_forest_regressor”,”xgboost_regressor”, “linear_regression”, “catboost_regressor”] Available plugins, retrieved using Regression().list_available():

    • ’kneighbors_regressor’

    • ’bayesian_ridge’

    • ’tabnet_regressor’

    • ’catboost_regressor’

    • ’random_forest_regressor’

    • ’mlp_regressor’

    • ’xgboost_regressor’

    • ’neural_nets_regression’

    • ’linear_regression’

  • hooks – Hooks. Custom callbacks to be notified about the search progress.

  • workspace – Path. Where to store the output model.

  • score_threshold – float. The minimum metric score for a candidate.

  • group_id – str. The group id column in the dataset, if any.

  • random_state – int. Random seed.

  • sample_for_search – bool. Subsample the evaluation dataset in the search pipeline. Improves the speed of the search.

  • max_search_sample_size – int. Subsample size for the evaluation dataset, if sample_for_search is True.

Example

>>> import pandas as pd
>>> from autoprognosis.utils.serialization import load_model_from_file
>>> from autoprognosis.utils.tester import evaluate_regression
>>> from autoprognosis.studies.regression import RegressionStudy
>>>
>>> # Load dataset
>>> df = pd.read_csv(
>>>     "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
>>>     header=None,
>>>     sep="\t",
>>> )
>>> last_col = df.columns[-1]
>>> y = df[last_col]
>>> X = df.drop(columns=[last_col])
>>>
>>> df = X.copy()
>>> df["target"] = y
>>>
>>> # Search the model
>>>
>>> study_name="regression_example"
>>> study = RegressionStudy(
>>>     study_name=study_name,
>>>     dataset=df,  # pandas DataFrame
>>>     target="target",  # the label column in the dataset
>>> )
>>> model = study.fit()
>>>
>>> # Predict using the model
>>> model.predict(X)
fit() -> Any

Run the study and train the model. The call returns the fitted model.

run() -> Any

Run the study. The call returns the optimal model architecture - not fitted.

autoprognosis.studies.risk_estimation module

class RiskEstimationStudy(dataset: pandas.core.frame.DataFrame, target: str, time_to_event: str, time_horizons: List[int], num_iter: int = 20, num_study_iter: int = 5, num_ensemble_iter: int = 15, timeout: int = 360, study_name: Optional[str] = None, workspace: pathlib.Path = PosixPath('tmp'), risk_estimators: List[str] = ['survival_xgboost', 'loglogistic_aft', 'deephit', 'cox_ph', 'weibull_aft', 'lognormal_aft', 'coxnet'], imputers: List[str] = ['ice'], feature_scaling: List[str] = ['normal_transform', 'maxabs_scaler', 'feature_normalizer', 'minmax_scaler', 'nop', 'scaler', 'uniform_transform'], feature_selection: List[str] = ['nop', 'pca', 'fast_ica'], hooks: autoprognosis.hooks.base.Hooks = <autoprognosis.hooks.default.DefaultHooks object>, score_threshold: float = 0.65, nan_placeholder: Optional[Any] = None, group_id: Optional[str] = None, random_state: int = 0, sample_for_search: bool = True, max_search_sample_size: int = 10000, ensemble_size: int = 3, n_folds_cv: int = 5)

Bases: autoprognosis.studies._base.Study

Core logic for risk estimation studies.

A study automatically handles imputation, preprocessing and model selection for a certain dataset. The output is an optimal model architecture, selected by the AutoML logic.

Parameters
  • dataset – DataFrame. The dataset to analyze.

  • target – str. The target column in the dataset.

  • time_to_event – str. The time_to_event column in the dataset.

  • num_iter – int. Maximum number of optimization trials. This is the limit of trials for each base estimator in the “risk_estimators” list, used in combination with the “timeout” parameter. For each estimator, the search will end after “num_iter” trials or “timeout” seconds.

  • num_study_iter – int. The number of study iterations. This is the limit for the outer optimization loop. After each outer loop, an intermediary model is cached and can be used by another process, while the outer loop continues to improve the result.

  • timeout – int. Maximum wait time (seconds) for each estimator hyperparameter search. This timeout will apply to each estimator in the “risk_estimators” list.

  • study_name – str. The name of the study, to be used in the caches.

  • feature_scaling

    list. Plugin search pool to use in the pipeline for scaling. Defaults to : [‘maxabs_scaler’, ‘scaler’, ‘feature_normalizer’, ‘normal_transform’, ‘uniform_transform’, ‘nop’, ‘minmax_scaler’] Available plugins, retrieved using Preprocessors(category=”feature_scaling”).list_available():

    • ’maxabs_scaler’

    • ’scaler’

    • ’feature_normalizer’

    • ’normal_transform’

    • ’uniform_transform’

    • ’nop’ # empty operation

    • ’minmax_scaler’

  • feature_selection

    list. Plugin search pool to use in the pipeline for feature selection. Defaults [“nop”, “variance_threshold”, “pca”, “fast_ica”] Available plugins, retrieved using Preprocessors(category=”dimensionality_reduction”).list_available():

    • ’feature_agglomeration’

    • ’fast_ica’

    • ’variance_threshold’

    • ’gauss_projection’

    • ’pca’

    • ’nop’ # no operation

  • imputers

    list. Plugin search pool to use in the pipeline for imputation. Defaults to [“mean”, “ice”, “missforest”, “hyperimpute”]. Available plugins, retrieved using Imputers().list_available():

    • ’sinkhorn’

    • ’EM’

    • ’mice’

    • ’ice’

    • ’hyperimpute’

    • ’most_frequent’

    • ’median’

    • ’missforest’

    • ’softimpute’

    • ’nop’

    • ’mean’

    • ’gain’

  • risk_estimators

    list. Plugin search pool to use in the pipeline for risk estimation. Defaults to [“survival_xgboost”, “loglogistic_aft”, “deephit”, “cox_ph”, “weibull_aft”, “lognormal_aft”, “coxnet”] Available plugins:

    • ’survival_xgboost’

    • ’loglogistic_aft’

    • ’deephit’

    • ’cox_ph’

    • ’weibull_aft’

    • ’lognormal_aft’

    • ’coxnet’

  • hooks – Hooks. Custom callbacks to be notified about the search progress.

  • workspace – Path. Where to store the output model.

  • score_threshold – float. The minimum metric score for a candidate.

  • random_state – int. Random seed.

  • sample_for_search – bool. Subsample the evaluation dataset in the search pipeline. Improves the speed of the search.

  • max_search_sample_size – int. Subsample size for the evaluation dataset, if sample_for_search is True.

Example

>>> import numpy as np
>>> from pycox import datasets
>>> from autoprognosis.studies.risk_estimation import RiskEstimationStudy
>>> from autoprognosis.utils.serialization import load_model_from_file
>>> from autoprognosis.utils.tester import evaluate_survival_estimator
>>>
>>> df = datasets.gbsg.read_df()
>>> df = df[df["duration"] > 0]
>>>
>>> X = df.drop(columns = ["duration"])
>>> T = df["duration"]
>>> Y = df["event"]
>>>
>>> eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]
>>>
>>> study_name = "example_risks"
>>> study = RiskEstimationStudy(
>>>     study_name=study_name,
>>>     dataset=df,
>>>     target="event",
>>>     time_to_event="duration",
>>>     time_horizons=eval_time_horizons,
>>> )
>>>
>>> model = study.fit()
>>> # Predict using the model
>>> model.predict(X, eval_time_horizons)
fit() Any

Run the study and train the model. The call returns the fitted model.

run() Any

Run the study. The call returns the optimal model architecture - not fitted.
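A minimal sketch of the run() workflow, mirroring the classifier study workflow and assuming the study caches its best candidate under workspace / study_name / "model.p" (the default workspace is "tmp"):

from pathlib import Path

from autoprognosis.utils.serialization import load_model_from_file

study.run()  # search only: caches the optimal architecture, unfitted

output = Path("tmp") / study_name / "model.p"
model = load_model_from_file(output)

model.fit(X, T, Y)  # assumed signature: features, time-to-event, event indicator
model.predict(X, eval_time_horizons)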

Imputation plugins

autoprognosis.plugins.imputers.plugin_hyperimpute module

class HyperImputePlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

HyperImpute strategy: a generalized iterative imputation framework for adaptively and automatically configuring column-wise models and their hyperparameters.

Parameters
  • classifier_seed – list. List of ClassifierPlugin names for the search pool.

  • regression_seed – list. List of RegressionPlugin names for the search pool.

  • imputation_order – int. 0 - ascending, 1 - descending, 2 - random.

  • baseline_imputer – int. 0 - mean, 1 - median, 2 - most_frequent.

  • optimizer – str. Hyperparameter search strategy. Options: simple, hyperband, bayesian.

  • class_threshold – int. Maximum number of unique items in a categorical column.

  • optimize_thresh – int. The number of subsamples used for the model search.

  • n_inner_iter – int. Number of imputation iterations.

  • select_model_by_column – bool. If False, reuse the first model selected in the current iteration for all columns. Otherwise, search for the best model for each column.

  • select_model_by_iteration – bool. If False, reuse the models selected in the first iteration. Otherwise, refresh the models on each iteration.

  • select_lazy – bool. If True and the search trends towards a certain model architecture, the loop reuses that model for all columns instead of calling the optimizer.

  • inner_loop_hook – Callable. Debug hook, called before each iteration.

  • random_state – int. Random seed.

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("hyperimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])

Reference: “HyperImpute: Generalized Iterative Imputation with Automatic Model Selection”
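The parameters above can be supplied at construction time. A hedged configuration sketch, assuming Imputers().get() forwards its keyword arguments to the plugin constructor:

from autoprognosis.plugins.imputers import Imputers

plugin = Imputers().get(
    "hyperimpute",
    optimizer="hyperband",   # one of: simple, hyperband, bayesian
    baseline_imputer=0,      # 0 - mean
    imputation_order=2,      # 2 - random
)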

change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_hyperimpute.HyperImputePlugin

autoprognosis.plugins.imputers.plugin_EM module

class EMPlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

The EM algorithm is an optimization algorithm that assumes a distribution for the partially missing data and tries to maximize the expected complete data log-likelihood under that distribution.

Steps:
  1. For an input dataset X with missing values, we assume that the values are sampled from distribution N(Mu, Sigma).

  2. We generate the “observed” and “missing” masks from X, and choose some initial values for Mu = Mu0 and Sigma = Sigma0.

  3. The EM loop tries to approximate the (Mu, Sigma) pair by some iterative means under the conditional distribution of missing components.

  4. The E step finds the conditional expectation of the “missing” data, given the observed values and current estimates of the parameters. These expectations are then substituted for the “missing” data.

  5. In the M step, maximum likelihood estimates of the parameters are computed as though the missing data had been filled in.

  6. The X_reconstructed contains the approximation after each iteration.
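To make the loop concrete, here is an illustrative NumPy sketch of the E/M steps above for a single multivariate Gaussian. It is a simplified stand-in, not the plugin's actual implementation:

import numpy as np

def em_impute(X, n_iter=50):
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 2: choose initial values - start from the column means
    X_rec = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        # M step: maximum likelihood estimates, as if the data were complete
        mu = X_rec.mean(axis=0)
        sigma = np.cov(X_rec, rowvar=False)
        # E step: conditional expectation of the missing block given the observed one
        for i in range(X.shape[0]):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                X_rec[i] = mu  # nothing observed in this row: fall back to the mean
                continue
            coef = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
            X_rec[i, m] = mu[m] + coef @ (X_rec[i, o] - mu[o])
    return X_rec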

Args:
maxit: int, default=500

maximum number of imputation rounds to perform.

convergence_threshold: float, default=1e-08

Minimum ratio difference between iterations before stopping.

random_state: int

Random seed

Paper: “Maximum Likelihood from Incomplete Data via the EM Algorithm”, A. P. Dempster, N. M. Laird and D. B. Rubin

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("EM")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_EM.EMPlugin

autoprognosis.plugins.imputers.plugin_gain module

class GainPlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

GAIN Imputation for static data using Generative Adversarial Nets.
The training steps are:
  • The generator imputes the missing components, conditioned on what is actually observed, and outputs a completed vector.

  • The discriminator takes a completed vector and attempts to determine which components were actually observed and which were imputed.

Args:

batch_size: int

The batch size for the training steps.

n_epochs: int

Number of epochs for training.

hint_rate: float

Percentage of additional information for the discriminator.

loss_alpha: int

Hyperparameter for the generator loss.

Paper: J. Yoon, J. Jordon, M. van der Schaar, “GAIN: Missing Data Imputation using Generative Adversarial Nets,” ICML, 2018. Original code: https://github.com/jsyoon0823/GAIN

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("gain")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
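The training hyperparameters listed above can be passed when the plugin is created. A hedged sketch, assuming Imputers().get() forwards its keyword arguments to the GAIN model:

plugin = Imputers().get(
    "gain",
    batch_size=128,
    n_epochs=100,
    hint_rate=0.9,
)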
change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_gain.GainPlugin

autoprognosis.plugins.imputers.plugin_ice module

class IterativeChainedEquationsPlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

Imputation plugin for completing missing values using the Multivariate Iterative chained equations Imputation strategy.

Method:

Multivariate iterative chained equations (MICE) methods model each feature with missing values as a function of the other features in a round-robin fashion. At each step of the round-robin imputation, we use a BayesianRidge estimator, which performs regularized linear regression.
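The round-robin scheme can be reproduced with scikit-learn directly; a rough equivalent of the method described above (illustrative, not the plugin's exact configuration):

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

X = np.array([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]], dtype=float)
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=500, random_state=0)
print(imputer.fit_transform(X))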

Parameters
  • max_iter – int, default=500. Maximum number of imputation rounds to perform.

  • random_state – int, default set to the current time. Seed of the pseudo-random number generator to use.

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("ice")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0         1         2         3
0  1.000000  1.000000  1.000000  1.000000
1  1.333333  1.666667  1.666667  1.333333
2  1.000000  2.000000  2.000000  1.000000
3  2.000000  2.000000  2.000000  2.000000

Reference: “mice: Multivariate Imputation by Chained Equations in R”, Stef van Buuren, Karin Groothuis-Oudshoorn

change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_ice.IterativeChainedEquationsPlugin

autoprognosis.plugins.imputers.plugin_mice module

class MicePlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

Imputation plugin for completing missing values using the Multivariate Iterative chained equations and multiple imputations.

Method:

Multivariate iterative chained equations (MICE) methods model each feature with missing values as a function of the other features in a round-robin fashion. At each step of the round-robin imputation, we use a BayesianRidge estimator, which performs regularized linear regression. The class sklearn.impute.IterativeImputer can generate multiple imputations of the same incomplete dataset; we can then learn a regression or classification model on different imputations of the same dataset. Setting sample_posterior=True for the IterativeImputer randomly draws values to fill each missing value from the Gaussian posterior of the predictions. If each IterativeImputer uses a different random_state, this results in multiple imputations, each of which can be used to train a predictive model. The final result is the average of all the n_imputations estimates.
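A sketch of the averaging scheme described above: several IterativeImputer runs with sample_posterior=True and different seeds, averaged into a single completed matrix (illustrative, not the plugin's exact configuration):

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def mice_average(X, n_imputations=5, max_iter=100):
    X = np.asarray(X, dtype=float)
    fills = [
        IterativeImputer(
            sample_posterior=True, max_iter=max_iter, random_state=seed
        ).fit_transform(X)
        for seed in range(n_imputations)
    ]
    return np.mean(fills, axis=0)  # average the n_imputations estimates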

Parameters
  • n_imputations – int, default=5. Number of multiple imputations to perform.

  • max_iter – int, default=500. Maximum number of imputation rounds to perform.

  • random_state – int, default set to the current time. Seed of the pseudo-random number generator to use.

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("mice")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0        1         2         3
0  1.000000  1.00000  1.000000  1.000000
1  1.222412  1.68686  1.687483  1.221473
2  1.000000  2.00000  2.000000  1.000000
3  2.000000  2.00000  2.000000  2.000000
change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_mice.MicePlugin

autoprognosis.plugins.imputers.plugin_missforest module

class MissForestPlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

Imputation plugin for completing missing values using the MissForest strategy.

Method:

Iterative chained equations (ICE) methods model each feature with missing values as a function of the other features in a round-robin fashion. At each step of the round-robin imputation, we use an ExtraTreesRegressor, which fits a number of randomized extra trees and averages the results.
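The same scheme, approximated with scikit-learn; compared to the ICE sketch earlier, only the per-column estimator changes (illustrative, not the plugin's exact configuration):

from sklearn.ensemble import ExtraTreesRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

imputer = IterativeImputer(
    estimator=ExtraTreesRegressor(n_estimators=10, random_state=0),
    max_iter=500,
    random_state=0,
)
# usage: imputer.fit_transform(X)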

Parameters
  • n_estimators – int, default=10. The number of trees in the forest.

  • max_iter – int, default=500. Maximum number of imputation rounds to perform.

  • random_state – int, default set to the current time. Seed of the pseudo-random number generator to use.

AutoPrognosis Hyperparameters:

n_estimators: The number of trees in the forest.

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("missforest")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
     0    1    2    3
0  1.0  1.0  1.0  1.0
1  1.0  1.9  1.9  1.0
2  1.0  2.0  2.0  1.0
3  2.0  2.0  2.0  2.0
change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_missforest.MissForestPlugin

autoprognosis.plugins.imputers.plugin_sinkhorn module

class SinkhornPlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

Sinkhorn imputation can be used to impute quantitative data. It relies on the idea that two batches extracted randomly from the same dataset should share the same distribution, and imputes the missing values by minimizing the optimal transport distance between batches.

Args:
eps: float, default=0.01

Sinkhorn regularization parameter.

lr: float, default=0.01

Learning rate.

opt: torch.optim.Optimizer, default=torch.optim.Adam

Optimizer class to use for fitting.

n_epochs: int, default=15

Number of gradient updates for each model within a cycle.

batch_size: int, default=256

Size of the batches on which the Sinkhorn divergence is evaluated.

n_pairs: int, default=10

Number of batch pairs used per gradient update.

noise: float, default=0.1

Noise used for the missing values initialization.

scaling: float, default=0.9

Scaling parameter in the Sinkhorn iterations.

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("sinkhorn")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0         1         2         3
0  1.000000  1.000000  1.000000  1.000000
1  1.404637  1.651113  1.651093  1.404638
2  1.000000  2.000000  2.000000  1.000000
3  2.000000  2.000000  2.000000  2.000000
Reference: “Missing Data Imputation using Optimal Transport”, Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

Original code: https://github.com/BorisMuzellec/MissingDataOT
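The hyperparameters above can be supplied at construction time. A hedged sketch, assuming Imputers().get() forwards its keyword arguments to the Sinkhorn model:

plugin = Imputers().get(
    "sinkhorn",
    eps=0.01,
    lr=0.01,
    n_epochs=15,
    batch_size=256,
)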

change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_sinkhorn.SinkhornPlugin

autoprognosis.plugins.imputers.plugin_softimpute module

class SoftImputePlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

The SoftImpute algorithm fits a low-rank matrix approximation to a matrix with missing values via nuclear-norm regularization. The algorithm can be used to impute quantitative data.

To calibrate the nuclear-norm regularization parameter (shrink_lambda), we perform cross-validation (_cv_softimpute).
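A minimal NumPy sketch of one SoftImpute-style loop: fill the missing entries, take an SVD, soft-threshold the singular values, and restore the observed entries. Illustrative only; the actual plugin also truncates the SVD (max_rank) and calibrates shrink_lambda by cross-validation:

import numpy as np

def soft_impute(X, shrink_lambda=0.5, maxit=100):
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    X_rec = np.where(missing, 0.0, X)  # initial fill
    for _ in range(maxit):
        U, s, Vt = np.linalg.svd(X_rec, full_matrices=False)
        s = np.maximum(s - shrink_lambda, 0.0)  # shrink the spectrum
        X_low_rank = (U * s) @ Vt
        X_rec = np.where(missing, X_low_rank, X)  # keep observed entries fixed
    return X_rec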

Args:
maxit: int, default=500

maximum number of imputation rounds to perform.

convergence_threshold: float, default=1e-5

Minimum ratio difference between iterations before stopping.

max_rank: int, default=2

Perform a truncated SVD on each iteration with this value as its rank.

shrink_lambda: float, default=0

Value by which we shrink singular values on each iteration. If it is missing, it is calibrated using cross-validation.

cv_len: int, default=15

The length of the grid on which the cross-validation is performed.

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("softimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
              0             1             2             3
0  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00
1  3.820605e-16  1.708249e-16  1.708249e-16  3.820605e-16
2  1.000000e+00  2.000000e+00  2.000000e+00  1.000000e+00
3  2.000000e+00  2.000000e+00  2.000000e+00  2.000000e+00

Reference: “Spectral Regularization Algorithms for Learning Large Incomplete Matrices”, by Mazumder, Hastie, and Tibshirani.

change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_softimpute.SoftImputePlugin

autoprognosis.plugins.imputers.plugin_mean module

class MeanPlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

Imputation plugin for completing missing values using the Mean Imputation strategy.

Method:

The Mean Imputation strategy replaces the missing values using the mean along each column.
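In spirit, this matches scikit-learn's SimpleImputer with the mean strategy:

import numpy as np
from sklearn.impute import SimpleImputer

X = [[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]]
print(SimpleImputer(strategy="mean").fit_transform(X))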

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("mean")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0         1         2         3
0  1.000000  1.000000  1.000000  1.000000
1  1.333333  1.666667  1.666667  1.333333
2  1.000000  2.000000  2.000000  1.000000
3  2.000000  2.000000  2.000000  2.000000
change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_mean.MeanPlugin

autoprognosis.plugins.imputers.plugin_median module

class MedianPlugin(random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.imputers.base.ImputerPlugin

Imputation plugin for completing missing values using the Median Imputation strategy.

Method:

The Median Imputation strategy replaces the missing values using the median along each column.

Example

>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("median")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
     0    1    2    3
0  1.0  1.0  1.0  1.0
1  1.0  2.0  2.0  1.0
2  1.0  2.0  2.0  1.0
3  2.0  2.0  2.0  2.0
change_output(output: str) None
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.imputers.plugin_median.MedianPlugin

Preprocessing plugins

autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_data_cleanup module

autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_fast_ica module

class FastICAPlugin(model: Optional[Any] = None, random_state: int = 0, n_components: int = 2, max_iter=10000)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for dimensionality reduction based on the Independent Component Analysis algorithm.

Method:

Independent component analysis separates a multivariate signal into additive subcomponents that are maximally independent.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html

Parameters

n_components – int Number of components to use.

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("fast_ica")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_fast_ica.FastICAPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_fast_ica.FastICAPlugin

autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_feature_agglomeration module

class FeatureAgglomerationPlugin(model: Optional[Any] = None, random_state: int = 0, n_clusters: int = 2)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for dimensionality reduction based on the Feature Agglomeration algorithm.

Method:

FeatureAgglomeration uses agglomerative clustering to group together features that look very similar, thus decreasing the number of features.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.FeatureAgglomeration.html

Parameters

n_clusters – int Number of clusters to find.

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("feature_agglomeration")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_feature_agglomeration.FeatureAgglomerationPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_feature_agglomeration.FeatureAgglomerationPlugin

autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_gauss_projection module

class GaussianRandomProjectionPlugin(random_state: int = 0, model: Optional[Any] = None, n_components: int = 2)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for dimensionality reduction based on the Gaussian random projection algorithm.

Method:

The Gaussian random projection reduces the dimensionality by projecting the original input space on a randomly generated matrix where components are drawn from N(0, 1 / n_components).
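Written out directly, the projection draws a matrix R with entries from N(0, 1 / n_components) and maps X to X @ R.T; a small NumPy illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

n_components = 2
# entries ~ N(0, 1 / n_components), i.e. standard deviation 1 / sqrt(n_components)
R = rng.normal(scale=1.0 / np.sqrt(n_components), size=(n_components, X.shape[1]))
X_proj = X @ R.T  # shape (100, 2)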

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.random_projection.GaussianRandomProjection.html

Parameters

n_components – int Number of components to use.

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("gauss_projection")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_gauss_projection.GaussianRandomProjectionPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_gauss_projection.GaussianRandomProjectionPlugin

autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_pca module

class PCAPlugin(random_state: int = 0, model: Optional[Any] = None, n_components: int = 2)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for dimensionality reduction based on the PCA method.

Method:

PCA is used to decompose a multivariate dataset into a set of successive orthogonal components that explain a maximum amount of the variance.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

Parameters

n_components – int Number of components to use.

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("pca")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_pca.PCAPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_pca.PCAPlugin

autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_variance_threshold module

class VarianceThresholdPlugin(random_state: int = 0, model: Optional[Any] = None, threshold: float = 0.001)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for dimensionality reduction based on removing features with low variance.

Method:

VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html

Parameters

threshold – float Features with a training-set variance lower than this threshold will be removed.

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("variance_threshold", threshold=1.0)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_variance_threshold.VarianceThresholdPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_variance_threshold.VarianceThresholdPlugin

autoprognosis.plugins.preprocessors.feature_scaling.plugin_feature_normalizer module

class FeatureNormalizerPlugin(random_state: int = 0, model: Optional[Any] = None)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for sample normalization based on L2 normalization.

Method:

Normalization is the process of scaling individual samples to have unit norm.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("feature_normalizer")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain, using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_feature_normalizer.FeatureNormalizerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.feature_scaling.plugin_feature_normalizer.FeatureNormalizerPlugin

autoprognosis.plugins.preprocessors.feature_scaling.plugin_maxabs_scaler module

class MaxAbsScalerPlugin(random_state: int = 0, model: Optional[Any] = None)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for feature scaling based on maximum absolute value.

Method:

The MaxAbs estimator scales each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift or center the data.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("maxabs_scaler")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
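
Because MaxAbs scaling divides each feature by its maximal absolute training value, the fitted output is easy to sanity-check:

>>> out = plugin.fit_transform(X, y)
>>> out.abs().max()  # the maximal absolute value of every feature is 1.0
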
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_maxabs_scaler.MaxAbsScalerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.feature_scaling.plugin_maxabs_scaler.MaxAbsScalerPlugin

autoprognosis.plugins.preprocessors.feature_scaling.plugin_minmax_scaler module

class MinMaxScalerPlugin(random_state: int = 0, model: Optional[Any] = None)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for feature scaling to a given range.

Method:

The MinMax estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("minmax_scaler")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
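
On the training set, every feature ends up inside the target range. A quick hedged check:

>>> out = plugin.fit_transform(X, y)
>>> out.min().min(), out.max().max()  # (0.0, 1.0) on the training data
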
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_minmax_scaler.MinMaxScalerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.feature_scaling.plugin_minmax_scaler.MinMaxScalerPlugin

autoprognosis.plugins.preprocessors.feature_scaling.plugin_scaler module

class ScalerPlugin(random_state: int = 0, model: Optional[Any] = None)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for feature scaling based on StandardScaler implementation.

Method:

The Scaler plugin standardizes the features by removing the mean and scaling to unit variance.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("scaler")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
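
Standardization removes the per-feature mean and scales to unit variance, so the transformed training data should be centered. A quick hedged check:

>>> out = plugin.fit_transform(X, y)
>>> out.mean()  # close to zero for every feature
>>> out.std()   # close to one for every feature (up to the ddof convention)
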
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_scaler.ScalerPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.feature_scaling.plugin_scaler.ScalerPlugin

autoprognosis.plugins.preprocessors.feature_scaling.plugin_normal_transform module

class NormalTransformPlugin(random_state: int = 0, n_quantiles: int = 100, model: Optional[Any] = None)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for feature scaling based on quantile information.

Method:

This method transforms the features to follow a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("normal_transform")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
            0         1         2         3
0   -0.701131  1.061219 -1.205040 -1.138208
1   -1.154434 -0.084214 -1.205040 -1.138208
2   -1.523968  0.443066 -1.674870 -1.138208
3   -1.710095  0.229099 -0.836836 -1.138208
4   -0.923581  1.222611 -1.205040 -1.138208
..        ...       ...       ...       ...
145  1.017901 -0.084214  0.778555  1.523968
146  0.509020 -1.297001  0.547708  0.813193
147  0.778555 -0.084214  0.778555  0.949666
148  0.378986  0.824957  0.869109  1.523968
149  0.109568 -0.084214  0.669219  0.627699

[150 rows x 4 columns]
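
The n_quantiles constructor argument controls the resolution of the estimated quantile grid. A hedged sketch, assuming Preprocessors().get() forwards keyword arguments to the plugin constructor:

>>> plugin = Preprocessors().get("normal_transform", n_quantiles=50)
>>> plugin.fit_transform(X, y)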

change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_normal_transform.NormalTransformPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.feature_scaling.plugin_normal_transform.NormalTransformPlugin

autoprognosis.plugins.preprocessors.feature_scaling.plugin_uniform_transform module

class UniformTransformPlugin(random_state: int = 0, n_quantiles: int = 100, model: Optional[Any] = None)

Bases: autoprognosis.plugins.preprocessors.base.PreprocessorPlugin

Preprocessing plugin for feature scaling based on quantile information.

Method:

This method transforms the features to follow a uniform distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values.

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html

Example

>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("uniform_transform")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
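
The transformed training features are mapped onto [0, 1]. A quick hedged check:

>>> out = plugin.fit_transform(X, y)
>>> ((out >= 0) & (out <= 1)).all().all()  # all values fall inside [0, 1]
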
change_output(output: str) None
static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_uniform_transform.UniformTransformPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.preprocessors.feature_scaling.plugin_uniform_transform.UniformTransformPlugin

Prediction plugins

Classifiers

autoprognosis.plugins.prediction.classifiers.plugin_adaboost module

class AdaBoostPlugin(estimator: int = 0, n_estimators: int = 10, learning_rate: float = 0.1, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the AdaBoost estimator.

Method:

An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

Parameters
  • estimator – int Base learner to use. 0: HistGradientBoostingClassifier, 1: CatBoostClassifier, 2: LGBM, 3: LogisticRegression

  • n_estimators – int The maximum number of estimators at which boosting is terminated.

  • learning_rate – float Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. There is a trade-off between the learning_rate and n_estimators parameters.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("adaboost")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
base_estimators = [sklearn.ensemble.HistGradientBoostingClassifier, catboost.CatBoostClassifier, sklearn.base.ClassifierMixin, sklearn.linear_model.LogisticRegression]
calibrations = ['none', 'sigmoid', 'isotonic']
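
The integer-coded estimator and calibration arguments index into the base_estimators and calibrations lists above. A hedged sketch, assuming Predictions().get() forwards keyword arguments to the plugin constructor:

>>> plugin = Predictions(category="classifiers").get("adaboost", calibration=1)  # 1: sigmoid
>>> plugin.fit_predict(X, y)
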
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.
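
The search domain can be inspected directly. A hedged sketch, assuming each Params object exposes a name attribute:

>>> for hp in plugin.hyperparameter_space():
...     print(hp.name)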

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_adaboost.AdaBoostPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.
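
A one-line sketch; the exact keys depend on the plugin's hyperparameter space:

>>> plugin.sample_hyperparameters_np(random_state=42)  # dict of sampled hyperparameter values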

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
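
Evaluate a fitted plugin against labels with the given metric. A minimal hedged sketch on a binary task (in-sample scoring, for sanity checks only):

>>> from sklearn.datasets import load_breast_cancer
>>> X_bin, y_bin = load_breast_cancer(return_X_y=True)
>>> plugin.fit(X_bin, y_bin)
>>> plugin.score(X_bin, y_bin, metric="aucroc")
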
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_adaboost.AdaBoostPlugin

autoprognosis.plugins.prediction.classifiers.plugin_bagging module

class BaggingPlugin(n_estimators: int = 10, max_samples: float = 1.0, max_features: float = 1.0, estimator: int = 0, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the Bagging estimator.

Method:

A Bagging classifier is an ensemble meta-estimator that fits base classifiers on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.

Parameters
  • n_estimators – int The number of base estimators in the ensemble.

  • max_samples – float The number of samples to draw from X to train each base estimator.

  • max_features – float The number of features to draw from X to train each base estimator.

  • estimator – int Base estimator to use. 0: HistGradientBoostingClassifier, 1: CatBoostClassifier, 2: LGBM, 3: LogisticRegression.

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("bagging")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
base_estimators = [sklearn.ensemble.HistGradientBoostingClassifier, catboost.CatBoostClassifier, sklearn.base.ClassifierMixin, sklearn.linear_model.LogisticRegression]
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_bagging.BaggingPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_bagging.BaggingPlugin

autoprognosis.plugins.prediction.classifiers.plugin_catboost module

class CatBoostPlugin(n_estimators: int = 100, depth: int = 5, grow_policy: int = 0, l2_leaf_reg: float = 3, learning_rate: float = 0.001, min_data_in_leaf: int = 1, random_strength: float = 1, random_state: int = 0, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the CatBoost framework.

Method:

CatBoost provides a gradient boosting framework which handles categorical features using a permutation-driven alternative to the classical algorithm. It uses Ordered Boosting to overcome overfitting and Symmetric Trees for faster execution.

Parameters
  • n_estimators – int Number of gradient boosted trees. Equivalent to number of boosting rounds.

  • depth – int Depth of the tree.

  • grow_policy – int The tree growing policy. Defines how to perform greedy tree construction. Index into ['Depthwise', 'SymmetricTree', 'Lossguide'].

  • l2_leaf_reg – float Coefficient at the L2 regularization term of the cost function.

  • learning_rate – float The learning rate used for reducing the gradient step.

  • min_data_in_leaf – int The minimum number of training samples in a leaf.

  • random_strength – float The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("catboost")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
grow_policies = ['Depthwise', 'SymmetricTree', 'Lossguide']
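
The integer grow_policy argument indexes into this list. A hedged sketch, assuming Predictions().get() forwards keyword arguments to the plugin constructor:

>>> plugin = Predictions(category="classifiers").get("catboost", grow_policy=1)  # 1: 'SymmetricTree'
>>> plugin.fit_predict(X, y)
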
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_catboost.CatBoostPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_catboost.CatBoostPlugin

autoprognosis.plugins.prediction.classifiers.plugin_decision_trees module

class DecisionTreePlugin(criterion: int = 0, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on decision trees.

Method:

Decision Trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.

Parameters
  • criterion – int The function to measure the quality of a split. Supported criteria are “gini”(0) for the Gini impurity and “entropy”(1) for the information gain.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("decision_trees")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
change_output(output: str) None
criterions = ['gini', 'entropy']
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_decision_trees.DecisionTreePlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_decision_trees.DecisionTreePlugin

autoprognosis.plugins.prediction.classifiers.plugin_extra_tree_classifier module

class ExtraTreeClassifierPlugin(criterion: int = 0, calibration: int = 0, random_state: int = 0, model: Optional[Any] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the extra-trees classifier.

Method:

The Extra-Trees classifier implements a meta-estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Parameters
  • criterion – int The function to measure the quality of a split. Supported criteria are “gini”(0) for the Gini impurity and “entropy”(1) for the information gain.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("extra_tree_classifier")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
change_output(output: str) None
criterions = ['gini', 'entropy']
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_extra_tree_classifier.ExtraTreeClassifierPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_extra_tree_classifier.ExtraTreeClassifierPlugin

autoprognosis.plugins.prediction.classifiers.plugin_gaussian_naive_bayes module

class GaussianNaiveBayesPlugin(calibration: int = 0, random_state: int = 0, model: Optional[Any] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the Gaussian Naive Bayes algorithm.

Method:

The plugin implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian.

Parameters
  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("gaussian_naive_bayes")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_gaussian_naive_bayes.GaussianNaiveBayesPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_gaussian_naive_bayes.GaussianNaiveBayesPlugin

autoprognosis.plugins.prediction.classifiers.plugin_gradient_boosting module

class GradientBoostingPlugin(n_estimators: int = 100, learning_rate: float = 0.1, max_depth: int = 6, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the Gradient boosting method.

Method:

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest.

Parameters
  • n_estimators – int The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.

  • learning_rate – float Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

  • max_depth – int The maximum depth of the individual regression estimators.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("gradient_boosting")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_gradient_boosting.GradientBoostingPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_gradient_boosting.GradientBoostingPlugin

autoprognosis.plugins.prediction.classifiers.plugin_knn module

class KNNPlugin(n_neighbors: int = 5, weights: int = 0, algorithm: int = 0, leaf_size: int = 30, p: int = 2, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the k-nearest neighbors vote.

Method:

Neighbors-based classification is a type of instance-based learning or non-generalizing learning: it does not attempt to construct a general internal model, but simply stores instances of the training data. Classification is computed from a simple majority vote of the nearest neighbors of each point: a query point is assigned the data class which has the most representatives within the nearest neighbors of the point.

Parameters
  • n_neighbors – int Number of neighbors to use

  • weights – int Weight function used in prediction. Index into ['uniform', 'distance'].

  • algorithm – int Algorithm used to compute the nearest neighbors. Index into ['auto', 'ball_tree', 'kd_tree', 'brute'].

  • leaf_size – int Leaf size passed to BallTree or KDTree.

  • p – int Power parameter for the Minkowski metric.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("knn")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
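
The weights and algorithm arguments are integer indexes into the weights and algorithms lists documented on this class. A hedged sketch, assuming Predictions().get() forwards keyword arguments to the plugin constructor:

>>> plugin = Predictions(category="classifiers").get("knn", n_neighbors=10, weights=1)  # 1: 'distance'
>>> plugin.fit_predict(X, y)
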
algorithms = ['auto', 'ball_tree', 'kd_tree', 'brute']
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_knn.KNNPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

weights = ['uniform', 'distance']
plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_knn.KNNPlugin

autoprognosis.plugins.prediction.classifiers.plugin_lda module

class LinearDiscriminantAnalysisPlugin(calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on Linear Discriminant Analysis.

Method:

The plugin is based on Linear Discriminant Analysis, a classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

Parameters
  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("lda")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_lda.LinearDiscriminantAnalysisPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The subtype of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_lda.LinearDiscriminantAnalysisPlugin

autoprognosis.plugins.prediction.classifiers.plugin_lgbm module

class LightGBMPlugin(n_estimators: int = 100, boosting_type: str = 'gbdt', learning_rate: float = 0.01, max_depth: int = 6, reg_lambda: float = 0.001, reg_alpha: float = 0.001, colsample_bytree: float = 0.1, subsample: float = 0.1, num_leaves: int = 31, min_child_samples: int = 1, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on LightGBM.

Method:

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest.

Parameters
  • n_estimators – int The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.

  • learning_rate – float Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

  • max_depth – int The maximum depth of the individual regression estimators.

  • boosting_type – str ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.

  • objective – str Specify the learning task and the corresponding learning objective or a custom objective function to be used.

  • reg_lambda – float L2 regularization term on weights.

  • reg_alpha – float L1 regularization term on weights.

  • colsample_bytree – float Subsample ratio of columns when constructing each tree.

  • subsample – float Subsample ratio of the training instance.

  • num_leaves – int Maximum tree leaves for base learners.

  • min_child_samples – int Minimum number of samples needed in a child (leaf).

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("lgbm")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_lgbm.LightGBMPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes
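
Together with load() above, this supports a byte-level round trip. A minimal sketch using only the methods documented here:

>>> buff = plugin.save()
>>> restored = plugin.load(buff)  # classmethod: rebuilds the plugin from bytes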

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_lgbm.LightGBMPlugin

autoprognosis.plugins.prediction.classifiers.plugin_linear_svm module

class LinearSVMPlugin(penalty: int = 1, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the Linear Support Vector Classification algorithm.

Method:

The plugin is based on LinearSVC, an implementation of Support Vector Classification for the case of a linear kernel.

Parameters
  • penalty – int Specifies the norm used in the penalization. 0: l1, 1: l2

  • calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("linear_svm")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_linear_svm.LinearSVMPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

penalties = ['l1', 'l2']
predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_linear_svm.LinearSVMPlugin

autoprognosis.plugins.prediction.classifiers.plugin_logistic_regression module

class LogisticRegressionPlugin(C: float = 1.0, solver: int = 1, multi_class: int = 0, class_weight: int = 0, max_iter: int = 10000, penalty: str = 'l2', calibration: int = 0, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, n_jobs: int = 2, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the Logistic Regression classifier.

Method:

Logistic regression is a linear model for classification rather than regression. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.

Parameters
  • C – float Inverse of regularization strength; must be a positive float.

  • solver – int index Algorithm to use in the optimization problem: [‘newton-cg’, ‘lbfgs’, ‘sag’, ‘saga’] (see the solvers attribute below).

  • multi_class – int If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.

  • class_weight – int index Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

  • max_iter – int Maximum number of iterations taken for the solvers to converge.

  • penalty – str Specify the norm of the penalty. Default: ‘l2’.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("logistic_regression")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
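
Note that solver, multi_class and class_weight are integer indices into the solvers, classes and weights attributes listed below. A sketch with illustrative values:

>>> plugin = Predictions(category="classifiers").get(
...     "logistic_regression", C=0.5, solver=1, class_weight=0
... )  # solver=1 -> 'lbfgs', class_weight=0 -> 'balanced'
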
change_output(output: str) None
classes = ['auto', 'ovr', 'multinomial']
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_logistic_regression.LogisticRegressionPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
solvers = ['newton-cg', 'lbfgs', 'sag', 'saga']
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

weights = ['balanced', None]
plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_logistic_regression.LogisticRegressionPlugin

autoprognosis.plugins.prediction.classifiers.plugin_multinomial_naive_bayes module

class MultinomialNaiveBayesPlugin(alpha: float = 1.0, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the Multinomial Naive Bayes algorithm.

Method:

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification).

Parameters
  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("multinomial_naive_bayes")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)  # iris features are non-negative, as multinomial NB requires
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_multinomial_naive_bayes.MultinomialNaiveBayesPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_multinomial_naive_bayes.MultinomialNaiveBayesPlugin

autoprognosis.plugins.prediction.classifiers.plugin_neural_nets module

class BasicNet(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

Basic neural net.

Parameters
  • n_unit_in (int) – Number of features

  • categories (int) –

  • n_layers_hidden (int) – Number of hypothesis layers (n_layers_hidden x n_units_hidden + 1 x Linear layer)

  • n_units_hidden (int) – Number of hidden units in each hypothesis layer

  • nonlin (string, default 'elu') – Nonlinearity to use in NN. Can be ‘elu’, ‘relu’, ‘selu’ or ‘leaky_relu’.

  • lr (float) – Learning rate for the optimizer (step_size equivalent in the JAX version).

  • weight_decay (float) – l2 (ridge) penalty for the weights.

  • n_iter (int) – Maximum number of iterations.

  • batch_size (int) – Batch size

  • n_iter_print (int) – Number of iterations after which to print updates and check the validation loss.

  • val_split_prop (float) – Proportion of samples used for validation split (can be 0)

  • patience (int) – Number of iterations to wait before early stopping after decrease in validation loss

  • n_iter_min (int) – Minimum number of iterations to go through before starting early stopping

  • clipping_value (int, default 1) – Gradient clipping value

forward(X: torch.Tensor) torch.Tensor
train(X: torch.Tensor, y: torch.Tensor) autoprognosis.plugins.prediction.classifiers.plugin_neural_nets.BasicNet
class NeuralNetsPlugin(n_layers_hidden: int = 1, n_units_hidden: int = 100, nonlin: str = 'relu', lr: float = 0.001, weight_decay: float = 0.001, n_iter: int = 1000, batch_size: int = 128, n_iter_print: int = 10, patience: int = 10, n_iter_min: int = 100, dropout: float = 0.1, clipping_value: int = 1, batch_norm: bool = True, early_stopping: bool = True, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on Neural networks.

Parameters
  • n_layers_hidden (int) – Number of hypothesis layers (n_layers_hidden x n_units_hidden + 1 x Linear layer)

  • n_units_hidden (int) – Number of hidden units in each hypothesis layer

  • nonlin (string, default 'relu') – Nonlinearity to use in NN. Can be ‘elu’, ‘relu’, ‘selu’ or ‘leaky_relu’.

  • lr (float) – Learning rate for the optimizer (step_size equivalent in the JAX version).

  • weight_decay (float) – l2 (ridge) penalty for the weights.

  • n_iter (int) – Maximum number of iterations.

  • batch_size (int) – Batch size

  • n_iter_print (int) – Number of iterations after which to print updates and check the validation loss.

  • val_split_prop (float) – Proportion of samples used for validation split (can be 0)

  • patience (int) – Number of iterations to wait before early stopping after decrease in validation loss

  • n_iter_min (int) – Minimum number of iterations to go through before starting early stopping

  • clipping_value (int, default 1) – Gradient clipping value

  • random_state (int, default 0) – Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("neural_nets", n_layers_hidden = 2)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
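
The training-control parameters can be combined in the same way. A sketch with illustrative, untuned values:

>>> plugin = Predictions(category="classifiers").get(
...     "neural_nets", n_layers_hidden=2, n_units_hidden=64, early_stopping=True, patience=20
... )
>>> plugin.fit_predict(X, y)
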
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_neural_nets.NeuralNetsPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_neural_nets.NeuralNetsPlugin

autoprognosis.plugins.prediction.classifiers.plugin_perceptron module

class PerceptronPlugin(penalty: int = 1, alpha: float = 0.0001, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on perceptrons.

Method:

The perceptron is a simple classification algorithm suitable for large-scale learning. By default, it does not require a learning rate, and it updates its model only on mistakes.

Parameters
  • penalty – int index The penalty to be used: [‘l1’, ‘l2’, ‘elasticnet’] (see the penalties attribute below).

  • alpha – float Constant that multiplies the regularization term if regularization is used.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("perceptron")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_perceptron.PerceptronPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

penalties = ['l1', 'l2', 'elasticnet']
predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_perceptron.PerceptronPlugin

autoprognosis.plugins.prediction.classifiers.plugin_qda module

class QuadraticDiscriminantAnalysisPlugin(calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on Quadratic Discriminant Analysis.

Method:

The plugin is based on Quadratic Discriminant Analysis, a classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.

Parameters
  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("qda")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_qda.QuadraticDiscriminantAnalysisPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_qda.QuadraticDiscriminantAnalysisPlugin

autoprognosis.plugins.prediction.classifiers.plugin_random_forest module

class RandomForestPlugin(n_estimators: int = 100, criterion: int = 0, min_samples_split: int = 2, bootstrap: bool = True, min_samples_leaf: int = 2, calibration: int = 0, max_depth: int = 4, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on Random forests.

Method:

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Parameters
  • n_estimators – int The number of trees in the forest.

  • criterion – str The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.

  • min_samples_split – int The minimum number of samples required to split an internal node.

  • bootstrap – bool Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

  • min_samples_leaf – int The minimum number of samples required to be at a leaf node.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("random_forest")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
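
Here criterion is an integer index into the criterions attribute below. A minimal sketch with illustrative values:

>>> plugin = Predictions(category="classifiers").get(
...     "random_forest", n_estimators=50, criterion=1
... )  # criterion=1 -> 'entropy'
>>> plugin.fit_predict(X, y)
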
change_output(output: str) None
criterions = ['gini', 'entropy']
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_random_forest.RandomForestPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_random_forest.RandomForestPlugin

autoprognosis.plugins.prediction.classifiers.plugin_ridge_classifier module

class RidgeClassifierPlugin(solver: int = 0, calibration: int = 0, random_state: int = 0, model: Optional[Any] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the Ridge classifier.

Method:

The RidgeClassifier converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).

Parameters
  • solver – int index Algorithm to use in the optimization problem: [‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’] (see the solvers attribute below).

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("ridge_classifier")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_ridge_classifier.RidgeClassifierPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
solvers = ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg']
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_ridge_classifier.RidgeClassifierPlugin

autoprognosis.plugins.prediction.classifiers.plugin_tabnet module

class TabNetPlugin(n_d: int = 64, n_a: int = 64, lr: float = 0.001, n_steps: int = 3, gamma: float = 1.5, n_independent: int = 2, n_shared: int = 2, lambda_sparse: float = 0.0001, momentum: float = 0.3, clip_value: float = 2.0, max_epochs: int = 1000, patience: int = 20, batch_size: int = 50, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features.

Parameters
  • n_d – int Width of the decision prediction layer. Bigger values give more capacity to the model, at the risk of overfitting. Values typically range from 8 to 64.

  • n_a – int Width of the attention embedding for each mask. According to the paper, n_d = n_a is usually a good choice.

  • lr – float Learning rate

  • n_steps – int Number of steps in the architecture (usually between 3 and 10)

  • gamma – float This is the coefficient for feature reusage in the masks. A value close to 1 will make mask selection least correlated between layers. Values range from 1.0 to 2.0.

  • n_independent – int Number of independent Gated Linear Units layers at each step. Usual values range from 1 to 5.

  • n_shared – int Number of shared Gated Linear Units at each step. Usual values range from 1 to 5.

  • lambda_sparse – float This is the extra sparsity loss coefficient as proposed in the original paper. The bigger this coefficient is, the sparser your model will be in terms of feature selection. Depending on the difficulty of your problem, reducing this value could help.

  • momentum – float Momentum for batch normalization; typically ranges from 0.01 to 0.4.

  • clip_value – float If a float is given this will clip the gradient at clip_value.

  • max_epochs – int Maximum number of epochs for training.

  • patience – int Number of consecutive epochs without improvement before performing early stopping.

  • batch_size – int Batch size

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("tabnet", max_epochs = 100)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class

Original implementation: https://github.com/dreamquark-ai/tabnet
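
Following the note above that n_d = n_a is usually a good choice, a minimal sketch (the values are illustrative, not tuned):

>>> plugin = Predictions(category="classifiers").get(
...     "tabnet", n_d=8, n_a=8, max_epochs=100
... )
>>> plugin.fit_predict(X, y)
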

change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_tabnet.TabNetPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_tabnet.TabNetPlugin

autoprognosis.plugins.prediction.classifiers.plugin_xgboost module

class XGBoostPlugin(n_estimators: int = 100, reg_lambda: float = 0.001, reg_alpha: float = 0.001, colsample_bytree: float = 0.1, colsample_bynode: float = 0.1, colsample_bylevel: float = 0.1, max_depth: int = 6, subsample: float = 0.1, learning_rate: float = 0.01, min_child_weight: int = 0, max_bin: int = 256, booster: int = 0, grow_policy: int = 0, random_state: int = 0, calibration: int = 0, gamma: float = 0, model: Optional[Any] = None, nthread: int = 2, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin

Classification plugin based on the XGBoost classifier.

Method:

Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models. XGBoost handles a wide variety of data types, relationships, and distributions robustly, and exposes a large number of hyperparameters that can be fine-tuned.

Parameters
  • n_estimators – int The maximum number of estimators at which boosting is terminated.

  • max_depth – int Maximum depth of a tree.

  • reg_lambda – float L2 regularization term on weights (xgb’s lambda).

  • reg_alpha – float L1 regularization term on weights (xgb’s alpha).

  • colsample_bytree – float Subsample ratio of columns when constructing each tree.

  • colsample_bynode – float Subsample ratio of columns for each split.

  • colsample_bylevel – float Subsample ratio of columns for each level.

  • subsample – float Subsample ratio of the training instance.

  • learning_rate – float Boosting learning rate

  • booster – int index Specify which booster to use: gbtree, gblinear or dart.

  • min_child_weight – int Minimum sum of instance weight (hessian) needed in a child.

  • max_bin – int Number of bins for histogram construction.

  • grow_policy – int index Controls the way new nodes are added to the tree. 0: “depthwise”, 1: “lossguide”

  • random_state – int Random number seed.

  • calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("xgboost", n_estimators = 20)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
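
booster and grow_policy are integer indices into the attributes listed below. A sketch with illustrative values:

>>> plugin = Predictions(category="classifiers").get(
...     "xgboost", n_estimators=50, booster=2, grow_policy=1
... )  # booster=2 -> 'dart', grow_policy=1 -> 'lossguide'
>>> plugin.fit_predict(X, y)
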
booster = ['gbtree', 'gblinear', 'dart']
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
grow_policy = ['depthwise', 'lossguide']
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_xgboost.XGBoostPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.classifiers.plugin_xgboost.XGBoostPlugin

Risk estimation

autoprognosis.plugins.prediction.risk_estimation.plugin_coxnet module

class CoxnetRiskEstimationPlugin(hidden_dim: int = 100, hidden_len: int = 2, batch_norm: bool = True, dropout: float = 0.1, lr: float = 0.001, epochs: int = 5000, patience: int = 50, batch_size: int = 128, verbose: bool = False, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

CoxPH neural net plugin for survival analysis.

Parameters
  • hidden_dim – int Number of neurons in the hidden layers

  • hidden_len – int Number of hidden layers

  • batch_norm – bool. Batch norm on/off.

  • dropout – float. Dropout value.

  • lr – float. Learning rate.

  • epochs – int. Number of training epochs

  • patience – int. Number of iterations without validation improvement.

  • batch_size – int. Batch size

  • verbose – bool. Enable debug logs

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("coxnet")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
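
The evaluation-horizon pattern above generalizes to several quantiles at once. A sketch reusing X, Y and T from the example:

>>> eval_time_horizons = [
...     int(T[Y.iloc[:] == 1].quantile(q)) for q in (0.25, 0.50, 0.75)
... ]
>>> plugin.predict(X, eval_time_horizons)  # one risk estimate per horizon
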
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

Return the hyperparameter space for the current model.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_coxnet.CoxnetRiskEstimationPlugin

Load the plugin from bytes

static name() str

Return the name of the current model.

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.risk_estimation.plugin_coxnet.CoxnetRiskEstimationPlugin

autoprognosis.plugins.prediction.risk_estimation.plugin_deephit module

class DeepHitRiskEstimationPlugin(model: Optional[Any] = None, num_durations: int = 10, batch_size: int = 100, epochs: int = 5000, lr: float = 0.01, dim_hidden: int = 300, alpha: float = 0.28, sigma: float = 0.38, dropout: float = 0.2, patience: int = 20, batch_norm: bool = False, random_state: int = 0, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

DeepHit plugin for survival analysis. DeepHit uses a deep neural network to learn the distribution of survival times directly. It makes no assumptions about the underlying stochastic process and allows for the possibility that the relationship between covariates and risk(s) changes over time. Most importantly, DeepHit smoothly handles competing risks, i.e. settings in which there is more than one possible event of interest.

Parameters
  • num_durations – int Number of points in the survival function

  • batch_size – int Batch size

  • epochs – int Number of iterations

  • lr – float learning rate

  • dim_hidden – int Number of neurons in the hidden layers

  • alpha – float Weighting in (0, 1) between the likelihood and the rank loss (L2 in the paper): 1 gives only the likelihood, 0 gives only the rank loss.

  • sigma – float The eta scale parameter in the rank loss (L2 in the paper).

  • dropout – float Dropout value

  • patience – int Number of epochs without improvement.

  • batch_norm – bool Enable/Disable batch_norm

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("deephit")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)

References:

[1] Changhee Lee, William R. Zame, Jinsung Yoon, and Mihaela van der Schaar. DeepHit: A deep learning approach to survival analysis with competing risks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. http://medianetlab.ee.ucla.edu/papers/AAAI_2018_DeepHit
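
The alpha parameter above trades off the likelihood and rank losses. A minimal sketch with illustrative, untuned values:

>>> plugin = Predictions(category="risk_estimation").get(
...     "deephit", num_durations=20, alpha=0.5, sigma=0.1
... )
>>> plugin.fit(X, T, Y)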

change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_deephit.DeepHitRiskEstimationPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.risk_estimation.plugin_deephit.DeepHitRiskEstimationPlugin

autoprognosis.plugins.prediction.risk_estimation.plugin_loglogistic_aft module

class LogLogisticAFTPlugin(alpha: float = 0.05, l1_ratio: float = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Log-Logistic AFT plugin for survival analysis.

Parameters
  • alpha – float The level used for the confidence intervals.

  • l1_ratio – float The penalizer coefficient applied to the size of the coefficients.

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("loglogistic_aft")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_loglogistic_aft.LogLogisticAFTPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.risk_estimation.plugin_loglogistic_aft.LogLogisticAFTPlugin

autoprognosis.plugins.prediction.risk_estimation.plugin_lognormal_aft module

class LogNormalAFTPlugin(alpha: float = 0.05, l1_ratio: float = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Log-Normal AFT plugin for survival analysis.

Parameters
  • alpha – float The level used for the confidence intervals.

  • l1_ratio – float The penalizer coefficient applied to the size of the coefficients.

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("lognormal_aft")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_lognormal_aft.LogNormalAFTPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.risk_estimation.plugin_lognormal_aft.LogNormalAFTPlugin
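
The tuning hooks documented above can also be exercised directly. A short sketch, reusing the Predictions loader from the Example:

from autoprognosis.plugins.prediction import Predictions

plugin = Predictions(category="risk_estimation").get("lognormal_aft")
print(plugin.hyperparameter_space())                      # the search domain, as List[Params]
print(plugin.sample_hyperparameters_np(random_state=42))  # one sampled configuration, as a dict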

autoprognosis.plugins.prediction.risk_estimation.plugin_survival_xgboost module

class XGBoostRiskEstimationPlugin(n_estimators: int = 100, reg_lambda: float = 0.001, reg_alpha: float = 0.001, colsample_bytree: float = 0.1, colsample_bynode: float = 0.1, colsample_bylevel: float = 0.1, max_depth: int = 6, subsample: float = 0.1, learning_rate: float = 0.01, min_child_weight: int = 0, max_bin: int = 256, booster: int = 0, grow_policy: int = 0, objective: str = 'aft', strategy: str = 'weibull', model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Survival XGBoost plugin for survival analysis.

Parameters
  • n_estimators – int The maximum number of estimators at which boosting is terminated.

  • max_depth – int Maximum depth of a tree.

  • reg_lambda – float L2 regularization term on weights (xgb’s lambda).

  • reg_alpha – float L1 regularization term on weights (xgb’s alpha).

  • colsample_bytree – float Subsample ratio of columns when constructing each tree.

  • colsample_bynode – float Subsample ratio of columns for each split.

  • colsample_bylevel – float Subsample ratio of columns for each level.

  • subsample – float Subsample ratio of the training instance.

  • learning_rate – float Boosting learning rate

  • booster – int index Specify which booster to use: gbtree, gblinear or dart.

  • min_child_weight – int Minimum sum of instance weight (hessian) needed in a child.

  • max_bin – int Number of bins for histogram construction.

  • grow_policy – int index Controls the way new nodes are added to the tree. 0: “depthwise”, 1: “lossguide”

  • random_state – int Random number seed.

  • objective – str Survival analysis objective. Can be “aft” or “cox”

  • strategy – str Survival analysis model. Can be “weibull” or “debiased_bce”

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("survival_xgboost")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
booster = ['gbtree', 'gblinear', 'dart']
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
grow_policy = ['depthwise', 'lossguide']
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_survival_xgboost.XGBoostRiskEstimationPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.risk_estimation.plugin_survival_xgboost.XGBoostRiskEstimationPlugin
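
Note that booster and grow_policy are integer indices into the values listed above. An instantiation sketch (keyword arguments to get() follow the pattern used elsewhere in these docs):

from autoprognosis.plugins.prediction import Predictions

plugin = Predictions(category="risk_estimation").get(
    "survival_xgboost",
    objective="cox",  # instead of the default "aft"
    booster=0,        # index into ['gbtree', 'gblinear', 'dart'] -> 'gbtree'
    grow_policy=1,    # 0: 'depthwise', 1: 'lossguide'
)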

autoprognosis.plugins.prediction.risk_estimation.plugin_weibull_aft module

class WeibullAFTPlugin(alpha: float = 0.05, l1_ratio: float = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Weibull AFT plugin for survival analysis.

Parameters
  • alpha – float the level in the confidence intervals.

  • l1_ratio – float the penalizer coefficient to the size of the coefficients.

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("weibull_aft")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_weibull_aft.WeibullAFTPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.risk_estimation.plugin_weibull_aft.WeibullAFTPlugin
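
As a quick illustration of the naming helpers shared by all plugins, a sketch (the printed values are inferred from the module path, not verified output):

from autoprognosis.plugins.prediction import Predictions

plugin = Predictions(category="risk_estimation").get("weibull_aft")
print(plugin.type())     # e.g. "prediction"
print(plugin.subtype())  # presumably "risk_estimation"
print(plugin.name())     # presumably "weibull_aft"
print(plugin.fqdn())     # combines the three: type->subtype->name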

Regression

autoprognosis.plugins.prediction.regression.plugin_bayesian_ridge module

class BayesianRidgePlugin(n_iter: int = 1000, tol: float = 0.001, hyperparam_search_iterations: Optional[int] = None, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Bayesian ridge regression.

Parameters
  • n_iter – int Maximum number of iterations. Should be greater than or equal to 1.

  • tol – float Stop the algorithm if w has converged.

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("bayesian_ridge")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_bayesian_ridge.BayesianRidgePlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.regression.plugin_bayesian_ridge.BayesianRidgePlugin
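
The Example above uses fit_predict; fit and predict can also be called separately. A minimal sketch:

from sklearn.datasets import load_iris
from autoprognosis.plugins.prediction import Predictions

X, y = load_iris(return_X_y=True)
plugin = Predictions(category="regression").get("bayesian_ridge")
plugin.fit(X, y)
print(plugin.is_fitted())  # True once trained
preds = plugin.predict(X)  # predictions, as a pd.DataFrame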

autoprognosis.plugins.prediction.regression.plugin_catboost_regressor module

class CatBoostRegressorPlugin(depth: int = 5, grow_policy: int = 0, n_estimators: int = 100, l2_leaf_reg: float = 3, learning_rate: float = 0.001, min_data_in_leaf: int = 1, random_strength: float = 1, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Regression plugin based on the CatBoost framework.

Method:

CatBoost provides a gradient boosting framework that handles categorical features using a permutation-driven alternative to the classical algorithm. It uses Ordered Boosting to reduce overfitting and Symmetric Trees for faster execution.

Parameters
  • n_estimators – int Number of gradient boosted trees. Equivalent to number of boosting rounds.

  • depth – int Depth of the tree.

  • grow_policy – int index The tree growing policy. Defines how to perform greedy tree construction; an index into [Depthwise, SymmetricTree, Lossguide].

  • l2_leaf_reg – float Coefficient at the L2 regularization term of the cost function.

  • learning_rate – float The learning rate used for reducing the gradient step.

  • min_data_in_leaf – int The minimum number of training samples in a leaf.

  • random_strength – float The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("catboost_regressor")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the predictions
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
grow_policies = ['Depthwise', 'SymmetricTree', 'Lossguide']
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_catboost_regressor.CatBoostRegressorPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.regression.plugin_catboost_regressor.CatBoostRegressorPlugin
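
grow_policy is an integer index into the grow_policies attribute listed above. For instance (a sketch):

from autoprognosis.plugins.prediction import Predictions

# grow_policy=1 selects 'SymmetricTree' from
# grow_policies = ['Depthwise', 'SymmetricTree', 'Lossguide']
plugin = Predictions(category="regression").get("catboost_regressor", grow_policy=1)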

autoprognosis.plugins.prediction.regression.plugin_kneighbors_regressor module

class KNeighborsRegressorPlugin(n_neighbors: int = 5, weights: int = 0, algorithm: int = 0, leaf_size: int = 30, p: int = 2, random_state: int = 0, hyperparam_search_iterations: Optional[int] = None, model: Optional[Any] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Regression plugin based on the KNeighborsRegressor.

Parameters
  • n_neighbors – int Number of neighbors to use

  • weights – str Weight function used in prediction. Possible values: “uniform”, “distance”

  • algorithm – int index Algorithm used to compute the nearest neighbors: “ball_tree”, “kd_tree”, “brute” or “auto”.

  • leaf_size – int Leaf size passed to BallTree or KDTree.

  • p – int Power parameter for the Minkowski metric.

  • random_state – int, default 0 Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("kneighbors_regressor")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
algorithm = ['auto', 'ball_tree', 'kd_tree', 'brute']
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_kneighbors_regressor.KNeighborsRegressorPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

weights = ['uniform', 'distance']
plugin

alias of autoprognosis.plugins.prediction.regression.plugin_kneighbors_regressor.KNeighborsRegressorPlugin
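
weights and algorithm are likewise integer indices into the class attributes listed above. A sketch:

from autoprognosis.plugins.prediction import Predictions

plugin = Predictions(category="regression").get(
    "kneighbors_regressor",
    n_neighbors=10,
    weights=1,    # ['uniform', 'distance'] -> 'distance'
    algorithm=2,  # ['auto', 'ball_tree', 'kd_tree', 'brute'] -> 'kd_tree'
)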

autoprognosis.plugins.prediction.regression.plugin_linear_regression module

class LinearRegressionPlugin(model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Regression plugin based on the Linear Regression.

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("linear_regression")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the predictions
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_linear_regression.LinearRegressionPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
solvers = ['auto', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.regression.plugin_linear_regression.LinearRegressionPlugin

autoprognosis.plugins.prediction.regression.plugin_neural_nets_regression module

class BasicNet(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

Basic neural net.

Parameters
  • n_unit_in (int) – Number of features

  • n_layers_hidden (int) – Number of hypothesis layers (n_layers_hidden x n_units_hidden + 1 x Linear layer)

  • n_units_hidden (int) – Number of hidden units in each hypothesis layer

  • nonlin (string, default 'elu') – Nonlinearity to use in NN. Can be ‘elu’, ‘relu’, ‘selu’ or ‘leaky_relu’.

  • lr (float) – learning rate for optimizer. step_size equivalent in the JAX version.

  • weight_decay (float) – l2 (ridge) penalty for the weights.

  • n_iter (int) – Maximum number of iterations.

  • batch_size (int) – Batch size

  • n_iter_print (int) – Number of iterations after which to print updates and check the validation loss.

  • val_split_prop (float) – Proportion of samples used for validation split (can be 0)

  • patience (int) – Number of iterations to wait before early stopping after decrease in validation loss

  • n_iter_min (int) – Minimum number of iterations to go through before starting early stopping

  • clipping_value (int, default 1) – Gradients clipping value

forward(X: torch.Tensor) torch.Tensor
train(X: torch.Tensor, y: torch.Tensor) autoprognosis.plugins.prediction.regression.plugin_neural_nets_regression.BasicNet
class NeuralNetsRegressionPlugin(n_layers_hidden: int = 1, n_units_hidden: int = 100, nonlin: str = 'relu', lr: float = 0.001, weight_decay: float = 0.001, n_iter: int = 1000, batch_size: int = 512, n_iter_print: int = 10, patience: int = 10, n_iter_min: int = 100, dropout: float = 0.1, clipping_value: int = 1, batch_norm: bool = True, early_stopping: bool = True, random_state: int = 0, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Regression plugin based on Neural networks.

Parameters
  • n_layers_hidden (int) – Number of hypothesis layers (n_layers_hidden x n_units_hidden + 1 x Linear layer)

  • n_units_hidden (int) – Number of hidden units in each hypothesis layer

  • nonlin (string, default 'elu') – Nonlinearity to use in NN. Can be ‘elu’, ‘relu’, ‘selu’ or ‘leaky_relu’.

  • lr (float) – learning rate for optimizer. step_size equivalent in the JAX version.

  • weight_decay (float) – l2 (ridge) penalty for the weights.

  • n_iter (int) – Maximum number of iterations.

  • batch_size (int) – Batch size

  • n_iter_print (int) – Number of iterations after which to print updates and check the validation loss.

  • val_split_prop (float) – Proportion of samples used for validation split (can be 0)

  • patience (int) – Number of iterations to wait before early stopping after decrease in validation loss

  • n_iter_min (int) – Minimum number of iterations to go through before starting early stopping

  • clipping_value (int, default 1) – Gradients clipping value

  • random_state (int) – Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("neural_nets_regression", n_iter = 100)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the predictions

change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_neural_nets_regression.NeuralNetsRegressionPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.regression.plugin_neural_nets_regression.NeuralNetsRegressionPlugin
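
The early-stopping behaviour described by patience and n_iter_min above is configured at construction time. A sketch:

from autoprognosis.plugins.prediction import Predictions

plugin = Predictions(category="regression").get(
    "neural_nets_regression",
    n_iter=500,      # maximum number of iterations
    n_iter_min=100,  # minimum iterations before early stopping can trigger
    patience=20,     # iterations to wait once the validation loss stops improving
    batch_size=256,
)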

autoprognosis.plugins.prediction.regression.plugin_random_forest_regressor module

class RandomForestRegressionPlugin(n_estimators: int = 50, criterion: int = 0, min_samples_split: int = 2, bootstrap: bool = True, min_samples_leaf: int = 2, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Regression plugin based on Random forests.

Method:

A random forest is a meta estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Parameters
  • n_estimators – int The number of trees in the forest.

  • criterion – int index The function to measure the quality of a split; an index into the criterions list: “squared_error”, “absolute_error”, “friedman_mse” or “poisson”.

  • min_samples_split – int The minimum number of samples required to split an internal node.

  • bootstrap – bool Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

  • min_samples_leaf – int The minimum number of samples required to be at a leaf node.

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("random_forest")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
change_output(output: str) None
criterions = ['squared_error', 'absolute_error', 'friedman_mse', 'poisson']
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_random_forest_regressor.RandomForestRegressionPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.regression.plugin_random_forest_regressor.RandomForestRegressionPlugin
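
criterion is an integer index into the criterions attribute listed above. A sketch:

from autoprognosis.plugins.prediction import Predictions

# criterion=1 selects 'absolute_error' from
# criterions = ['squared_error', 'absolute_error', 'friedman_mse', 'poisson']
plugin = Predictions(category="regression").get("random_forest", criterion=1, n_estimators=100)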

autoprognosis.plugins.prediction.regression.plugin_tabnet_regressor module

class TabNetRegressorPlugin(n_d: int = 64, n_a: int = 64, lr: float = 0.001, n_steps: int = 3, gamma: float = 1.5, n_independent: int = 2, n_shared: int = 2, lambda_sparse: float = 0.0001, momentum: float = 0.3, clip_value: float = 2.0, epsilon: float = 1e-15, n_iter: int = 1000, patience: int = 50, batch_size: int = 50, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Regression plugin based on TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features.

Parameters
  • n_d – int Width of the decision prediction layer. Bigger values give the model more capacity, at the risk of overfitting. Values typically range from 8 to 64.

  • n_a – int Width of the attention embedding for each mask. According to the paper, n_d = n_a is usually a good choice.

  • lr – float Learning rate

  • n_steps – int Number of steps in the architecture (usually between 3 and 10)

  • gamma – float This is the coefficient for feature reusage in the masks. A value close to 1 will make mask selection least correlated between layers. Values range from 1.0 to 2.0.

  • n_independent – int Number of independent Gated Linear Units layers at each step. Usual values range from 1 to 5.

  • n_shared – int Number of shared Gated Linear Units at each step. Usual values range from 1 to 5.

  • lambda_sparse – float This is the extra sparsity loss coefficient as proposed in the original paper. The bigger this coefficient is, the sparser your model will be in terms of feature selection. Depending on the difficulty of your problem, reducing this value could help.

  • momentum – float Momentum for batch normalization, typically ranging from 0.01 to 0.4.

  • clip_value – float If a float is given this will clip the gradient at clip_value.

  • max_epochs – int Maximum number of epochs for training.

  • patience – int Number of consecutive epochs without improvement before performing early stopping.

  • batch_size – int Batch size

  • random_state – int Random seed

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("tabnet")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the predictions
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_tabnet_regressor.TabNetRegressorPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.regression.plugin_tabnet_regressor.TabNetRegressorPlugin
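
Following the paper's guidance quoted above (n_d = n_a is usually a good choice), a configuration sketch:

from autoprognosis.plugins.prediction import Predictions

plugin = Predictions(category="regression").get(
    "tabnet",
    n_d=16,     # width of the decision prediction layer
    n_a=16,     # attention embedding width; the paper suggests n_d = n_a
    n_steps=5,  # usually between 3 and 10
)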

autoprognosis.plugins.prediction.regression.plugin_xgboost_regressor module

class XGBoostRegressorPlugin(reg_lambda: Optional[float] = None, reg_alpha: Optional[float] = None, colsample_bytree: Optional[float] = None, colsample_bynode: Optional[float] = None, colsample_bylevel: Optional[float] = None, n_estimators: int = 100, max_depth: Optional[int] = 3, lr: Optional[float] = None, subsample: Optional[float] = None, min_child_weight: Optional[int] = None, max_bin: int = 256, booster: int = 0, grow_policy: int = 0, eta: float = 0.3, model: Optional[Any] = None, random_state: int = 0, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)

Bases: autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Regression plugin based on the XGBoost.

Method:

Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models. XGBoost handles a variety of data types, relationships, and distributions robustly, and exposes many hyperparameters that can be fine-tuned.

Parameters
  • n_estimators – int The maximum number of estimators at which boosting is terminated.

  • max_depth – int Maximum depth of a tree.

  • reg_lambda – float L2 regularization term on weights (xgb’s lambda).

  • reg_alpha – float L1 regularization term on weights (xgb’s alpha).

  • colsample_bytree – float Subsample ratio of columns when constructing each tree.

  • colsample_bynode – float Subsample ratio of columns for each split.

  • colsample_bylevel – float Subsample ratio of columns for each level.

  • subsample – float Subsample ratio of the training instance.

  • learning_rate – float Boosting learning rate

  • booster – int index Specify which booster to use: gbtree, gblinear or dart.

  • min_child_weight – int Minimum sum of instance weight (hessian) needed in a child.

  • max_bin – int Number of bins for histogram construction.

  • random_state – int Random number seed.

Example

>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("xgboost_regressor")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
booster = ['gbtree', 'gblinear', 'dart']
change_output(output: str) None
explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin

Train the plugin

Parameters

X – pd.DataFrame

fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and predict the training data. Used by predictors.

fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Fit the model and transform the training data. Used by imputers and preprocessors.

classmethod fqdn() str

The fully-qualified name of the plugin: type->subtype->name

get_args() dict
grow_policy = ['depthwise', 'lossguide']
static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter search domain, used for tuning.

classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]

The hyperparameter domain using the fully-qualified name.

is_fitted() bool

Check if the model was trained

classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_xgboost_regressor.XGBoostRegressorPlugin

Load the plugin from bytes

static name() str

The name of the plugin, e.g.: xgboost

predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame

Run predictions for the input. Used by predictors.

Parameters

X – pd.DataFrame

predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters for Optuna.

classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters using the fully-qualified name.

classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]

Sample hyperparameters as a dict.

save() bytes

Save the plugin to bytes

score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
static subtype() str

The type of the plugin, e.g.: classifier

transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Transform the input. Used by imputers and preprocessors.

Parameters

X – pd.DataFrame

static type() str

The type of the plugin, e.g.: prediction

plugin

alias of autoprognosis.plugins.prediction.regression.plugin_xgboost_regressor.XGBoostRegressorPlugin
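
The sample_hyperparameters hook documented on every plugin plugs directly into an Optuna objective. A rough sketch, reusing the iris setup from the Example above; the mean-squared-error scoring is illustrative, not prescribed by the API:

import optuna
from sklearn.datasets import load_iris
from sklearn.metrics import mean_squared_error
from autoprognosis.plugins.prediction import Predictions

X, y = load_iris(return_X_y=True)
base = Predictions(category="regression").get("xgboost_regressor")

def objective(trial: optuna.trial.Trial) -> float:
    params = base.sample_hyperparameters(trial)  # maps the trial into the search domain
    model = Predictions(category="regression").get("xgboost_regressor", **params)
    model.fit(X, y)
    return mean_squared_error(y, model.predict(X))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=5)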

Explainability plugins

Explainability plugins

autoprognosis.plugins.explainers.plugin_invase module

class INVASEPlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, feature_names: Optional[List] = None, n_epoch: int = 10000, n_epoch_inner: int = 2, n_folds: int = 5, task_type: str = 'classification', samples: int = 2000, prefit: bool = False, random_state: int = 0)

Bases: autoprognosis.plugins.explainers.base.ExplainerPlugin

Interpretability plugin based on the INVASE algorithm.

Parameters
  • estimator – model. The model to explain.

  • X – dataframe. Training set

  • y – dataframe. Training labels

  • time_to_event – dataframe. Used for risk estimation tasks.

  • eval_times – list. Used for risk estimation tasks.

  • n_epoch – int. training epochs

  • task_type – str. classification or risk_estimation

  • samples – int. Number of samples to use.

  • prefit – bool. If true, the estimator won’t be trained.

Example

>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
>>>     "invase",
>>>     model,
>>>     X_train,
>>>     y_train,
>>>     task_type="classification",
>>> )
>>>
>>> explainer.explain(X_test)
explain(X: pandas.core.frame.DataFrame) numpy.ndarray
static name() str
plot(values: pandas.core.frame.DataFrame) None
static pretty_name() str
static type() str
class Masking(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

forward(tensors: List[torch.Tensor]) torch.Tensor
bitmask_intervals(n: int, low: int, high: int) Generator
bitmasks(n: int, m: int) Generator
class invaseBase(estimator: Any, X: numpy.ndarray, n_epoch: int = 10000, n_epoch_inner: int = 1, patience: int = 5, min_epochs: int = 100, n_epoch_print: int = 50, batch_size: int = 300, learning_rate: float = 0.001, penalty_l2: float = 0.001, feature_names: List = [])

Bases: object

abstract explain(X: numpy.ndarray, *args: Any, **kwargs: Any) numpy.ndarray
class invaseCV(estimator: Any, X: numpy.ndarray, critic_latent_dim: int = 200, n_epoch: int = 10000, n_epoch_inner: int = 2, patience: int = 5, min_epochs: int = 100, n_epoch_print: int = 50, n_folds: int = 5, seed: int = 42, feature_names: List = [])

Bases: object

explain(x: numpy.ndarray) numpy.ndarray
class invaseClassifier(estimator: Any, X: numpy.ndarray, critic_latent_dim: int = 200, n_epoch: int = 10000, n_epoch_inner: int = 2, patience: int = 5, min_epochs: int = 100, n_epoch_print: int = 50, batch_size: int = 300, learning_rate: float = 0.001, penalty_l2: float = 0.001, feature_names: List = [])

Bases: autoprognosis.plugins.explainers.plugin_invase.invaseBase

explain(X: numpy.ndarray, *args: Any, **kwargs: Any) numpy.ndarray
class invaseRiskEstimation(estimator: Any, X: numpy.ndarray, eval_times: List, critic_latent_dim: int = 200, n_epoch: int = 10000, n_epoch_inner: int = 2, patience: int = 5, min_epochs: int = 100, n_epoch_print: int = 10, batch_size: int = 500, learning_rate: float = 0.001, penalty_l2: float = 0.001, samples: int = 20000, feature_names: List = [])

Bases: autoprognosis.plugins.explainers.plugin_invase.invaseBase

explain(X: numpy.ndarray, *args: Any, **kwargs: Any) numpy.ndarray
plugin

alias of autoprognosis.plugins.explainers.plugin_invase.INVASEPlugin

sample(X: numpy.ndarray, nsamples: int = 100, random_state: int = 0) numpy.ndarray
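
When the estimator is already trained, prefit=True (documented above) skips retraining inside the explainer. A sketch continuing the Example:

model.fit(X_train, y_train)
explainer = Explainers().get(
    "invase",
    model,
    X_train,
    y_train,
    task_type="classification",
    prefit=True,  # the estimator won't be trained again
)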

autoprognosis.plugins.explainers.plugin_kernel_shap module

class KernelSHAPPlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, task_type: str = 'classification', feature_names: Optional[List] = None, subsample: int = 10, prefit: bool = False, n_epoch: int = 10000, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.explainers.base.ExplainerPlugin

Interpretability plugin based on KernelSHAP.

Parameters
  • estimator – model. The model to explain.

  • X – dataframe. Training set

  • y – dataframe. Training labels

  • task_type – str. classification or risk_estimation

  • prefit – bool. If true, the estimator won’t be trained.

  • n_epoch – int. training epochs

  • subsample – int. Number of samples to use.

  • time_to_event – dataframe. Used for risk estimation tasks.

  • eval_times – list. Used for risk estimation tasks.

Example

>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
>>>     "kernel_shap",
>>>     model,
>>>     X_train,
>>>     y_train,
>>>     task_type="classification",
>>> )
>>>
>>> explainer.explain(X_test)
explain(X: pandas.core.frame.DataFrame) numpy.ndarray
static name() str
plot(X: pandas.core.frame.DataFrame) None
static pretty_name() str
static type() str
plugin

alias of autoprognosis.plugins.explainers.plugin_kernel_shap.KernelSHAPPlugin
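
For task_type="risk_estimation", time_to_event and eval_times (documented above) must be supplied. A hedged sketch, reusing X, Y, T and eval_time_horizons from the metabric survival examples earlier; surv_plugin stands for any fitted risk-estimation plugin:

explainer = Explainers().get(
    "kernel_shap",
    surv_plugin,  # a fitted risk-estimation plugin (assumed)
    X,
    Y,
    time_to_event=T,
    eval_times=eval_time_horizons,
    task_type="risk_estimation",
    prefit=True,
)
explainer.explain(X)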

autoprognosis.plugins.explainers.plugin_lime module

class LimePlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, feature_names: Optional[List] = None, task_type: str = 'classification', prefit: bool = False, n_epoch: int = 10000, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.explainers.base.ExplainerPlugin

Interpretability plugin based on LIME.

Parameters
  • estimator – model. The model to explain.

  • X – dataframe. Training set

  • y – dataframe. Training labels

  • task_type – str. classification or risk_estimation

  • prefit – bool. If true, the estimator won’t be trained.

  • n_epoch – int. training epochs

  • time_to_event – dataframe. Used for risk estimation tasks.

  • eval_times – list. Used for risk estimation tasks.

Example

>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
>>>     "lime",
>>>     model,
>>>     X_train,
>>>     y_train,
>>>     task_type="classification",
>>> )
>>>
>>> explainer.explain(X_test)
explain(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
static name() str
plot(importances: pandas.core.frame.DataFrame, feature_names: Optional[list] = None) None
static pretty_name() str
static type() str
plugin

alias of autoprognosis.plugins.explainers.plugin_lime.LimePlugin
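
explain() returns a DataFrame of importances that can be passed straight to plot(). A short sketch continuing the Example:

importances = explainer.explain(X_test)
explainer.plot(importances, feature_names=list(X_test.columns))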

autoprognosis.plugins.explainers.plugin_risk_effect_size module

class RiskEffectSizePlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, task_type: str = 'classification', feature_names: Optional[List] = None, subsample: int = 10, prefit: bool = False, effect_size: float = 0.5, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.explainers.base.ExplainerPlugin

Interpretability plugin based on Risk Effect size and Cohen’s D.

Parameters
  • estimator – model. The model to explain.

  • X – dataframe. Training set

  • y – dataframe. Training labels

  • task_type – str. classification or risk_estimation

  • prefit – bool. If true, the estimator won’t be trained.

  • n_epoch – int. training epochs

  • time_to_event – dataframe. Used for risk estimation tasks.

  • eval_times – list. Used for risk estimation tasks.

Example

>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
>>>     "risk_effect_size",
>>>     model,
>>>     X_train,
>>>     y_train,
>>>     task_type="classification",
>>> )
>>>
>>> explainer.explain(X_test)
explain(X: pandas.core.frame.DataFrame, effect_size: Optional[float] = None) numpy.ndarray
static name() str
plot(X: pandas.core.frame.DataFrame, ax: Optional[Any] = None) None
static pretty_name() str
static type() str
plugin

alias of autoprognosis.plugins.explainers.plugin_risk_effect_size.RiskEffectSizePlugin
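
The effect_size threshold can be overridden per call (see the explain() signature above). A sketch continuing the Example:

explainer.explain(X_test, effect_size=0.8)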

autoprognosis.plugins.explainers.plugin_shap_permutation_sampler module

class ShapPermutationSamplerPlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, feature_names: Optional[List] = None, task_type: str = 'classification', n_epoch: int = 10000, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, prefit: bool = False, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.explainers.base.ExplainerPlugin

Interpretability plugin based on ShapPermutation sampler.

Parameters
  • estimator – model. The model to explain.

  • X – dataframe. Training set.

  • y – dataframe. Training labels.

  • task_type – str. classification or risk_estimation.

  • prefit – bool. If true, the estimator won’t be trained.

  • n_epoch – int. Training epochs.

  • subsample – int. Number of samples to use.

  • time_to_event – dataframe. Used for risk estimation tasks.

  • eval_times – list. Used for risk estimation tasks.

Example

>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
...     "shap_permutation_sampler",
...     model,
...     X_train,
...     y_train,
...     task_type="classification",
... )
>>>
>>> explainer.explain(X_test)
explain(X: pandas.core.frame.DataFrame, max_evals: Union[int, str] = 'auto') → Any
static name() → str
plot(importances: pandas.core.frame.DataFrame, feature_names: Optional[list] = None) → None
static pretty_name() → str
static type() → str
plugin

alias of autoprognosis.plugins.explainers.plugin_shap_permutation_sampler.ShapPermutationSamplerPlugin

autoprognosis.plugins.explainers.plugin_symbolic_pursuit module

class SymbolicPursuitPlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, task_type: str = 'classification', feature_names: Optional[List] = None, subsample: int = 10, prefit: bool = False, n_epoch: int = 10000, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, loss_tol: float = 0.001, ratio_tol: float = 0.9, maxiter: int = 100, eps: float = 1e-05, patience: int = 10, random_state: int = 0, **kwargs: Any)

Bases: autoprognosis.plugins.explainers.base.ExplainerPlugin

Interpretability plugin based on Symbolic Pursuit.

Based on the NeurIPS 2020 paper “Learning outside the black-box: at the pursuit of interpretable models”.

Parameters
  • estimator – model. The model to explain.

  • X – dataframe. Training set.

  • y – dataframe. Training labels.

  • task_type – str. classification or risk_estimation.

  • prefit – bool. If true, the estimator won’t be trained.

  • n_epoch – int. Training epochs.

  • subsample – int. Number of samples to use.

  • time_to_event – dataframe. Used for risk estimation tasks.

  • eval_times – list. Used for risk estimation tasks.

  • loss_tol – float. The tolerance for the loss under which the pursuit stops.

  • ratio_tol – float. A new term is added only if new_loss / old_loss < ratio_tol.

  • maxiter – int. Maximum number of iterations for the optimization.

  • eps – float. Number used for numerical stability.

  • random_state – int. Random seed for reproducibility.

Example

>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
...     "symbolic_pursuit",
...     model,
...     X_train,
...     y_train,
...     task_type="classification",
... )
>>>
>>> explainer.explain(X_test)
explain(X: pandas.core.frame.DataFrame) → numpy.ndarray
static name() → str
plot(X: pandas.core.frame.DataFrame) → tuple
static pretty_name() → str
static type() → str
plugin

alias of autoprognosis.plugins.explainers.plugin_symbolic_pursuit.SymbolicPursuitPlugin

Benchmarks

autoprognosis.utils.tester module

class classifier_metrics(metric: Union[str, list] = ['aucroc', 'aucprc', 'accuracy', 'f1_score_micro', 'f1_score_macro', 'f1_score_weighted', 'kappa', 'kappa_quadratic', 'precision_micro', 'precision_macro', 'precision_weighted', 'recall_micro', 'recall_macro', 'recall_weighted', 'mcc'])

Bases: object

Helper class for evaluating the performance of a classifier; a usage sketch follows the method list below.

Parameters

metric

list, default=[“aucroc”, “aucprc”, “accuracy”, “f1_score_micro”, “f1_score_macro”, “f1_score_weighted”, “kappa”, “kappa_quadratic”, “precision_micro”, “precision_macro”, “precision_weighted”, “recall_micro”, “recall_macro”, “recall_weighted”, “mcc”]. The type of metric to use for evaluation. Potential values:

  • ”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

  • ”aucprc” : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.

  • ”accuracy” : Accuracy classification score.

  • ”f1_score_micro”: F1 score is a harmonic mean of the precision and recall. This version uses the “micro” average: calculate metrics globally by counting the total true positives, false negatives and false positives.

  • ”f1_score_macro”: F1 score is a harmonic mean of the precision and recall. This version uses the “macro” average: calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

  • ”f1_score_weighted”: F1 score is a harmonic mean of the precision and recall. This version uses the “weighted” average: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).

  • ”kappa”, “kappa_quadratic”: compute Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.

  • ”precision_micro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (micro) calculates metrics globally by counting the total true positives.

  • ”precision_macro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (macro) calculates metrics for each label, and finds their unweighted mean.

  • ”precision_weighted”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.

  • ”recall_micro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (micro) calculates metrics globally by counting the total true positives.

  • ”recall_macro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (macro) calculates metrics for each label, and finds their unweighted mean.

  • ”recall_weighted”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.

  • ”mcc”: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.

average_precision_score(y_test: numpy.ndarray, y_pred_proba: numpy.ndarray) → float
get_metric() → Union[str, list]
roc_auc_score(y_test: numpy.ndarray, y_pred_proba: numpy.ndarray) → float
score_proba(y_test: numpy.ndarray, y_pred_proba: numpy.ndarray) → Dict[str, float]
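
Example

A minimal sketch of scoring held-out predicted probabilities directly with classifier_metrics. Only score_proba and the metric argument come from this module; the sklearn model and data split are illustrative:

>>> import numpy as np
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.utils.tester import classifier_metrics
>>>
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
>>>
>>> model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
>>> y_pred_proba = model.predict_proba(X_test)  # shape (n_samples, n_classes)
>>>
>>> # restrict the evaluator to a subset of the supported metrics
>>> evaluator = classifier_metrics(metric=["aucroc", "accuracy", "f1_score_macro"])
>>> evaluator.score_proba(np.asarray(y_test), y_pred_proba)  # dict of metric -> float
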
evaluate_estimator(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], Y: Union[pandas.core.series.Series, numpy.ndarray, List], n_folds: int = 3, seed: int = 0, pretrained: bool = False, group_ids: Optional[pandas.core.series.Series] = None, *args: Any, **kwargs: Any) → Dict

Helper for evaluating classifiers; a usage sketch follows the list of returned metrics below.

Parameters
  • estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.

  • X – pd.DataFrame or np.ndarray: The covariates

  • Y – pd.Series or np.ndarray or list: The labels

  • n_folds – int cross-validation folds

  • seed – int Random seed

  • pretrained – bool If the estimator was already trained or not.

  • group_ids – pd.Series The group_ids to use for stratified cross-validation

Returns

Dict containing “raw” and “str” nodes. The “str” node contains prettified metrics, while the “raw” node contains tuples of the form (mean, std) for each metric. Both “raw” and “str” nodes contain the following metrics:

  • ”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

  • ”aucprc” : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.

  • ”accuracy” : Accuracy classification score.

  • ”f1_score_micro”: F1 score is a harmonic mean of the precision and recall. This version uses the “micro” average: calculate metrics globally by counting the total true positives, false negatives and false positives.

  • ”f1_score_macro”: F1 score is a harmonic mean of the precision and recall. This version uses the “macro” average: calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

  • ”f1_score_weighted”: F1 score is a harmonic mean of the precision and recall. This version uses the “weighted” average: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).

  • ”kappa”: computes Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.

  • ”precision_micro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (micro) calculates metrics globally by counting the total true positives.

  • ”precision_macro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (macro) calculates metrics for each label, and finds their unweighted mean.

  • ”precision_weighted”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.

  • ”recall_micro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (micro) calculates metrics globally by counting the total true positives.

  • ”recall_macro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (macro) calculates metrics for each label, and finds their unweighted mean.

  • ”recall_weighted”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.

  • ”mcc”: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
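
Example

A minimal sketch of cross-validating an untrained AutoPrognosis classifier plugin; the dataset and plugin name are illustrative:

>>> from sklearn.datasets import load_breast_cancer
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>> from autoprognosis.utils.tester import evaluate_estimator
>>>
>>> X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> results = evaluate_estimator(model, X, Y, n_folds=3, seed=0)
>>> results["str"]["aucroc"]  # prettified score
>>> results["raw"]["aucroc"]  # (mean, std) across folds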

evaluate_estimator_multiple_seeds(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], Y: Union[pandas.core.series.Series, numpy.ndarray, List], n_folds: int = 3, seeds: List[int] = [0, 1, 2], pretrained: bool = False, group_ids: Optional[pandas.core.series.Series] = None) → Dict

Helper for evaluating classifiers with multiple random seeds; a usage sketch follows the list of returned metrics below.

Parameters
  • estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.

  • X – pd.DataFrame or np.ndarray: The covariates

  • Y – pd.Series or np.ndarray or list: The labels

  • n_folds – int cross-validation folds

  • seeds – List Random seeds

  • pretrained – bool If the estimator was already trained or not.

  • group_ids – pd.Series The group_ids to use for stratified cross-validation

Returns

Dict containing “seeds”, “agg” and “str” nodes. The “str” node contains the aggregated prettified metrics, the “agg” node contains tuples of the form (mean, std) for each metric, and the “seeds” node contains the results for each random seed. Both “agg” and “str” nodes contain the following metrics:

  • ”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

  • ”aucprc” : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.

  • ”accuracy” : Accuracy classification score.

  • ”f1_score_micro”: F1 score is a harmonic mean of the precision and recall. This version uses the “micro” average: calculate metrics globally by counting the total true positives, false negatives and false positives.

  • ”f1_score_macro”: F1 score is a harmonic mean of the precision and recall. This version uses the “macro” average: calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

  • ”f1_score_weighted”: F1 score is a harmonic mean of the precision and recall. This version uses the “weighted” average: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).

  • ”kappa”: computes Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.

  • ”precision_micro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (micro) calculates metrics globally by counting the total true positives.

  • ”precision_macro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (macro) calculates metrics for each label, and finds their unweighted mean.

  • ”precision_weighted”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.

  • ”recall_micro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (micro) calculates metrics globally by counting the total true positives.

  • ”recall_macro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (macro) calculates metrics for each label, and finds their unweighted mean.

  • ”recall_weighted”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.

  • ”mcc”: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
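
Example

The multi-seed variant mirrors evaluate_estimator; a sketch with the same illustrative dataset and plugin as above:

>>> from sklearn.datasets import load_breast_cancer
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>> from autoprognosis.utils.tester import evaluate_estimator_multiple_seeds
>>>
>>> X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> results = evaluate_estimator_multiple_seeds(model, X, Y, n_folds=3, seeds=[0, 1, 2])
>>> results["agg"]["aucroc"]  # (mean, std) aggregated across seeds
>>> results["seeds"]          # per-seed results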

evaluate_regression(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], Y: Union[pandas.core.series.Series, numpy.ndarray, List], n_folds: int = 3, seed: int = 0, pretrained: bool = False, group_ids: Optional[pandas.core.series.Series] = None, *args: Any, **kwargs: Any) → Dict

Helper for evaluating regression tasks; a usage sketch follows the list of returned metrics below.

Parameters
  • estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.

  • X – pd.DataFrame or np.ndarray covariates

  • Y – pd.Series or np.ndarray or list outcomes

  • n_folds – int Number of cross-validation folds

  • seed – int Random seed

  • pretrained – bool If the estimator was already trained or not.

  • group_ids – pd.Series Optional group_ids for stratified cross-validation

Returns

Dict containing “raw” and “str” nodes. The “str” node contains prettified metrics, while the “raw” node contains tuples of the form (mean, std) for each metric. Both “raw” and “str” nodes contain the following metrics:

  • ”r2”: R^2(coefficient of determination) regression score function.

  • ”mse”: Mean squared error regression loss.

  • ”mae”: Mean absolute error regression loss.
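
Example

A minimal sketch for a regression plugin. The diabetes dataset and the linear_regression plugin name are illustrative; check Regression().list_available() for the plugins present in your installation:

>>> from sklearn.datasets import load_diabetes
>>> from autoprognosis.plugins.prediction.regression import Regression
>>> from autoprognosis.utils.tester import evaluate_regression
>>>
>>> X, Y = load_diabetes(return_X_y=True, as_frame=True)
>>> model = Regression().get("linear_regression")
>>>
>>> results = evaluate_regression(model, X, Y, n_folds=3, seed=0)
>>> results["str"]["r2"]  # prettified R^2 across folds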

evaluate_regression_multiple_seeds(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], Y: Union[pandas.core.series.Series, numpy.ndarray, List], n_folds: int = 3, pretrained: bool = False, group_ids: Optional[pandas.core.series.Series] = None, seeds: List[int] = [0, 1, 2]) → Dict

Helper for evaluating regression tasks with multiple seeds.

Parameters
  • estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.

  • X – pd.DataFrame or np.ndarray covariates

  • Y – pd.Series or np.ndarray or list outcomes

  • n_folds – int Number of cross-validation folds

  • seeds – list Random seeds

  • pretrained – bool If the estimator was already trained or not.

  • group_ids – pd.Series Optional group_ids for stratified cross-validation

Returns

Dict containing “seeds”, “agg” and “str” nodes. The “str” node contains the aggregated prettified metrics, the “agg” node contains tuples of the form (mean, std) for each metric, and the “seeds” node contains the results for each random seed. Both “agg” and “str” nodes contain the following metrics:

  • ”r2”: R^2(coefficient of determination) regression score function.

  • ”mse”: Mean squared error regression loss.

  • ”mae”: Mean absolute error regression loss.

evaluate_survival_estimator(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], T: Union[pandas.core.series.Series, numpy.ndarray, List], Y: Union[pandas.core.series.Series, numpy.ndarray, List], time_horizons: Union[List[float], numpy.ndarray], n_folds: int = 3, seed: int = 0, pretrained: bool = False, risk_threshold: float = 0.5, group_ids: Optional[pandas.core.series.Series] = None) → Dict

Helper for evaluating survival analysis tasks; a usage sketch follows the list of returned metrics below.

Parameters
  • estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.

  • X – DataFrame or np.ndarray The covariates

  • T – Series or np.ndarray or list time to event/censoring values

  • Y – Series or np.ndarray or list event or censored

  • time_horizons – list or np.ndarray Horizons at which to evaluate the performance.

  • n_folds – int Number of folds for cross validation

  • seed – int Random seed

  • pretrained – bool If the estimator was trained or not

  • group_ids – Group labels for the samples used while splitting the dataset into train/test set.

Returns

Dict containing “raw”, “str” and “horizons” nodes. The “str” node contains prettified metrics, while the “raw” node contains tuples of the form (mean, std) for each metric. The “horizons” node splits the metrics by time horizon. Each node contains the following metrics:

  • ”c_index” : The concordance index or c-index is a metric to evaluate the predictions made by a survival algorithm. It is defined as the proportion of concordant pairs divided by the total number of possible evaluation pairs.

  • ”brier_score”: The Brier Score is a strictly proper score function or strictly proper scoring rule that measures the accuracy of probabilistic predictions.

  • ”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

  • ”sensitivity”: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.

  • ”specificity”: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.

  • ”PPV”: The positive predictive value (PPV) is the probability that an individual with a positive test result truly has the disease.

  • ”NPV”: The negative predictive value (NPV) is the probability that an individual with a negative test result truly does not have the disease.
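
Example

A sketch using a Cox proportional hazards plugin on the Rossi recidivism dataset shipped with lifelines. The RiskEstimation loader, the cox_ph plugin name, and the horizons are assumptions; adapt them to your installation:

>>> from lifelines.datasets import load_rossi
>>> from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation
>>> from autoprognosis.utils.tester import evaluate_survival_estimator
>>>
>>> df = load_rossi()
>>> X = df.drop(columns=["week", "arrest"])
>>> T = df["week"]    # time to event / censoring
>>> Y = df["arrest"]  # 1 = event observed, 0 = censored
>>>
>>> model = RiskEstimation().get("cox_ph")  # assumed plugin name
>>> results = evaluate_survival_estimator(model, X, T, Y, time_horizons=[26, 52])
>>> results["str"]["c_index"]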

evaluate_survival_estimator_multiple_seeds(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], T: Union[pandas.core.series.Series, numpy.ndarray, List], Y: Union[pandas.core.series.Series, numpy.ndarray, List], time_horizons: Union[List[float], numpy.ndarray], n_folds: int = 3, pretrained: bool = False, risk_threshold: float = 0.5, group_ids: Optional[pandas.core.series.Series] = None, seeds: List[int] = [0, 1, 2]) → Dict

Helper for evaluating survival analysis tasks with multiple random seeds.

Parameters
  • estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.

  • X – DataFrame or np.ndarray The covariates

  • T – Series or np.ndarray or list time to event

  • Y – Series or np.ndarray or list event or censored

  • time_horizons – list or np.ndarray Horizons at which to evaluate the performance.

  • n_folds – int Number of folds for cross validation

  • seeds – List Random seeds

  • pretrained – bool If the estimator was trained or not

  • group_ids – Group labels for the samples used while splitting the dataset into train/test set.

Returns

Dict containing “seeds”, “agg” and “str” nodes. The “str” node contains the aggregated prettified metrics, the “agg” node contains tuples of the form (mean, std) for each metric, and the “seeds” node contains the results for each random seed. Both “agg” and “str” nodes contain the following metrics:

  • ”c_index” : The concordance index or c-index is a metric to evaluate the predictions made by a survival algorithm. It is defined as the proportion of concordant pairs divided by the total number of possible evaluation pairs.

  • ”brier_score”: The Brier Score is a strictly proper score function or strictly proper scoring rule that measures the accuracy of probabilistic predictions.

  • ”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

  • ”sensitivity”: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.

  • ”specificity”: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.

  • ”PPV”: The positive predictive value (PPV) is the probability that an individual with a positive test result truly has the disease.

  • ”NPV”: The negative predictive value (NPV) is the probability that an individual with a negative test result truly does not have the disease.

score_classification_model(estimator: Any, X_train: pandas.core.frame.DataFrame, X_test: pandas.core.frame.DataFrame, y_train: pandas.core.frame.DataFrame, y_test: pandas.core.frame.DataFrame) → float
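
Example

A minimal sketch matching the signature above, which returns a single scalar score for the held-out split; the plugin choice and split are illustrative:

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>> from autoprognosis.utils.tester import score_classification_model
>>>
>>> X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
>>>
>>> model = Classifiers().get("logistic_regression")
>>> score_classification_model(model, X_train, X_test, y_train, y_test)  # float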