AutoPrognosis documentation!
AutoPrognosis - A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.

🔑 Features
🚀 Automatically learns ensembles of pipelines for classification, regression or survival analysis tasks.
🌀 Easy-to-extend pluggable architecture.
🔥 Interpretability and uncertainty quantification tools.
🩹 Data imputation using HyperImpute.
⚡ Build demonstrators using Streamlit.
📓 Python and R tutorials available.
🚀 Installation
Using pip
The library can be installed from PyPI using
$ pip install autoprognosis
or from source, using
$ pip install .
Redis (Optional, but recommended)
AutoPrognosis can use Redis as a backend to improve the performance and quality of the searches.
For that, install the redis-server package following the steps described on the official site.
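Before starting a long search, it can help to confirm that a Redis server is actually listening. The following is a minimal sketch using only the Python standard library. It only checks that a TCP connection can be opened on the default host and port listed on this page (127.0.0.1:6379) and does not speak the Redis protocol. The helper name `redis_reachable` is hypothetical and is not part of AutoPrognosis.

```python
import socket


def redis_reachable(host: str = "127.0.0.1", port: int = 6379, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port can be opened (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if redis_reachable():
    print("A server is listening on 127.0.0.1:6379")
else:
    print("No server reachable on 127.0.0.1:6379")
```

A successful connection only shows that something accepts connections on that port; it does not prove the service is Redis.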
Environment variables
The library can be configured from a set of environment variables.
Variable | Description |
---|---|
N_OPT_JOBS | Number of cores to use for hyperparameter search. Default: 1 |
| Number of cores to use by individual learners. Default: all CPUs |
| IP address for the Redis database. Default: 127.0.0.1 |
| Redis port. Default: 6379 |
Example: export N_OPT_JOBS=2
to use 2 cores for the hyperparameter search.
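The same variable can also be set from Python before the library is imported. The snippet below is a small sketch of reading an integer setting from the environment with a fallback default. The helper `read_int_env` is hypothetical and only illustrates the pattern; how AutoPrognosis itself parses these variables is not shown on this page.

```python
import os


def read_int_env(name: str, default: int) -> int:
    """Return an integer environment variable, or the default when it is unset or empty."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)


# Equivalent to `export N_OPT_JOBS=2` for the current process only.
os.environ["N_OPT_JOBS"] = "2"

n_opt_jobs = read_int_env("N_OPT_JOBS", default=1)
print(f"Cores for hyperparameter search: {n_opt_jobs}")
```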
💥 Sample Usage
Advanced Python tutorials can be found in the Python tutorials section.
R examples can be found in the R tutorials section.
List the available classifiers
from autoprognosis.plugins.prediction.classifiers import Classifiers
print(Classifiers().list_available())
Create a study for classifiers
from sklearn.datasets import load_breast_cancer
from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator
X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
df = X.copy()
df["target"] = Y
study_name = "example"
study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
)
model = study.fit()
# Predict the probabilities of each class using the model
model.predict_proba(X)
(Advanced) Customize the study for classifiers
from pathlib import Path
from sklearn.datasets import load_breast_cancer
from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator
X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
df = X.copy()
df["target"] = Y
workspace = Path("workspace")
study_name = "example"
study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=100,  # how many trials to do for each candidate
    timeout=60,  # seconds
    classifiers=["logistic_regression", "lda", "qda"],
    workspace=workspace,
)
study.run()
output = workspace / study_name / "model.p"
model = load_model_from_file(output)
# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.
metrics = evaluate_estimator(model, X, Y)
print(f"model {model.name()} -> {metrics['clf']}")
# Train the model
model.fit(X, Y)
# Predict the probabilities of each class using the model
model.predict_proba(X)
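In the advanced example above, the selected model is loaded from workspace / study_name / "model.p". The sketch below only shows how that path is composed and how one might check that the file exists before loading it. The helper name `study_model_path` is hypothetical; the directory layout is taken from the example above.

```python
from pathlib import Path


def study_model_path(workspace: Path, study_name: str) -> Path:
    """Build the path used in the examples above: <workspace>/<study_name>/model.p."""
    return workspace / study_name / "model.p"


path = study_model_path(Path("workspace"), "example")
print(path.as_posix())

if not path.exists():
    # Nothing has been saved yet, for example because study.run() has not been called.
    print("No saved model found at this location.")
```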
List the available regressors
from autoprognosis.plugins.prediction.regression import Regression
print(Regression().list_available())
Create a Regression study
# third party
import pandas as pd
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression
from autoprognosis.studies.regression import RegressionStudy
# Load dataset
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\\t",
)
last_col = df.columns[-1]
y = df[last_col]
X = df.drop(columns=[last_col])
df = X.copy()
df["target"] = y
# Search the model
study_name="regression_example"
study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
)
model = study.fit()
# Predict using the model
model.predict(X)
(Advanced) Customize the Regression study
# stdlib
from pathlib import Path
# third party
import pandas as pd
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression
from autoprognosis.studies.regression import RegressionStudy
# Load dataset
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\\t",
)
last_col = df.columns[-1]
y = df[last_col]
X = df.drop(columns=[last_col])
df = X.copy()
df["target"] = y
# Search the model
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)
study_name="regression_example"
study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=10,  # how many trials to do for each candidate. Default: 50
    num_study_iter=2,  # how many outer iterations to do. Default: 5
    timeout=50,  # timeout for optimization for each regressor. Default: 600 seconds
    regressors=["linear_regression", "xgboost_regressor"],
    workspace=workspace,
)
study.run()
# Test the model
output = workspace / study_name / "model.p"
model = load_model_from_file(output)
# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.
metrics = evaluate_regression(model, X, y)
print(f"Model {model.name()} score: {metrics['str']}")
# Train the model
model.fit(X, y)
# Predict using the model
model.predict(X)
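evaluate_regression returns a dictionary of scores; the exact keys and the set of metrics depend on the library version and are not listed on this page. As a reminder of what common regression scores measure, here is a standard-library sketch of MSE, MAE and R² on toy numbers. It is an illustration only, not the library's implementation, and the function name `regression_scores` is hypothetical.

```python
from statistics import mean


def regression_scores(y_true, y_pred):
    """Compute MSE, MAE and R^2 for two equally long sequences of numbers."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = mean([r * r for r in residuals])
    mae = mean([abs(r) for r in residuals])
    y_bar = mean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {"mse": mse, "mae": mae, "r2": 1.0 - ss_res / ss_tot}


scores = regression_scores([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(scores)
```

Scores computed on the data the model was trained on are optimistic; a held-out split gives a fairer estimate.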
List available survival analysis estimators
from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation
print(RiskEstimation().list_available())
Create a Survival analysis study
# third party
import numpy as np
from pycox import datasets
# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator
df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]
X = df.drop(columns=["duration", "event"])  # keep only the covariates
T = df["duration"]
Y = df["event"]
eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]
study_name = "example_risks"
study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
)
model = study.fit()
# Predict using the model
model.predict(X, eval_time_horizons)
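The expression np.linspace(T.min(), T.max(), 5)[1:-1] in the example above takes five evenly spaced points between the shortest and longest duration and drops the two endpoints, leaving three interior evaluation horizons. A pure-Python sketch of the same arithmetic, with a hypothetical helper name and toy bounds:

```python
def interior_horizons(t_min: float, t_max: float, num: int = 5):
    """Return num evenly spaced points in [t_min, t_max] without the two endpoints."""
    step = (t_max - t_min) / (num - 1)
    points = [t_min + i * step for i in range(num)]
    return points[1:-1]


print(interior_horizons(0.0, 80.0))  # [20.0, 40.0, 60.0]
```

Dropping the endpoints avoids evaluating at the minimum duration, where almost no events have occurred, and at the maximum, where almost no subjects remain at risk.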
(Advanced) Customize the Survival analysis study
# stdlib
import os
from pathlib import Path
# third party
import numpy as np
from pycox import datasets
# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator
df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]
X = df.drop(columns=["duration", "event"])  # keep only the covariates
T = df["duration"]
Y = df["event"]
eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]
workspace = Path("workspace")
study_name = "example_risks"
study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
    num_iter=10,
    num_study_iter=1,
    timeout=10,
    risk_estimators=["cox_ph", "survival_xgboost"],
    score_threshold=0.5,
    workspace=workspace,
)
study.run()
output = workspace / study_name / "model.p"
model = load_model_from_file(output)
# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.
metrics = evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)
print(f"Model {model.name()} score: {metrics['clf']}")
# Train the model
model.fit(X, T, Y)
# Predict using the model
model.predict(X, eval_time_horizons)
⚡ Plugins
from autoprognosis.plugins.imputers import Imputers
imputer = Imputers().get(<NAME>)
from autoprognosis.plugins.preprocessors import Preprocessors
preprocessor = Preprocessors().get(<NAME>)
from autoprognosis.plugins.prediction.classifiers import Classifiers
classifier = Classifiers().get(<NAME>)
Name | Description |
---|---|
neural_nets | PyTorch-based neural net classifier. |
logistic_regression | [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) |
catboost | Gradient boosting on decision trees - [CatBoost](https://catboost.ai/) |
random_forest | A random forest classifier. [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) |
tabnet | [TabNet : Attentive Interpretable Tabular Learning](https://github.com/dreamquark-ai/tabnet) |
xgboost | [XGBoostClassifier](https://xgboost.readthedocs.io/en/stable/) |
from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation
predictor = RiskEstimation().get(<NAME>)
Name | Description |
---|---|
survival_xgboost | [XGBoost Survival Embeddings](https://github.com/loft-br/xgboost-survival-embeddings) |
loglogistic_aft | [Log-Logistic AFT model](https://lifelines.readthedocs.io/en/latest/fitters/regression/LogLogisticAFTFitter.html) |
deephit | [DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks](https://github.com/chl8856/DeepHit) |
cox_ph | [Cox’s proportional hazard model](https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxPHFitter.html) |
weibull_aft | [Weibull AFT model](https://lifelines.readthedocs.io/en/latest/fitters/regression/WeibullAFTFitter.html) |
lognormal_aft | [Log-Normal AFT model](https://lifelines.readthedocs.io/en/latest/fitters/regression/LogNormalAFTFitter.html) |
coxnet | [CoxNet is a Cox proportional hazards model also referred to as DeepSurv](https://github.com/havakv/pycox) |
from autoprognosis.plugins.prediction.regression import Regression
regressor = Regression().get(<NAME>)
Name | Description |
---|---|
tabnet_regressor | [TabNet : Attentive Interpretable Tabular Learning](https://github.com/dreamquark-ai/tabnet) |
catboost_regressor | Gradient boosting on decision trees - [CatBoost](https://catboost.ai/) |
random_forest_regressor | [RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) |
xgboost_regressor | [XGBoost regressor](https://xgboost.readthedocs.io/en/stable/) |
neural_nets_regression | PyTorch-based neural net regressor. |
linear_regression | [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) |
from autoprognosis.plugins.explainers import Explainers
explainer = Explainers().get(<NAME>)
Name | Description |
---|---|
risk_effect_size | Feature importance using Cohen’s distance between probabilities |
lime | [Lime: Explaining the predictions of any machine learning classifier](https://github.com/marcotcr/lime) |
symbolic_pursuit | Symbolic Pursuit, from the paper "Learning outside the Black-Box: The pursuit of interpretable models" |
shap_permutation_sampler | [SHAP Permutation Sampler](https://shap.readthedocs.io/en/latest/generated/shap.explainers.Permutation.html) |
kernel_shap | [SHAP KernelExplainer](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.KernelExplainer.html) |
invase | [INVASE: Instance-wise Variable Selection](https://github.com/vanderschaarlab/invase) |
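The risk_effect_size explainer is described above as using Cohen's distance between probabilities. For orientation, the sketch below computes the textbook Cohen's d (difference of means divided by the pooled standard deviation) for two small samples. This is the standard statistical definition and an assumption about what the plugin is based on; the plugin's exact computation is not shown on this page, and the function name `cohens_d` is only illustrative.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Textbook Cohen's d: mean difference over the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Toy values, e.g. a feature measured in a high-risk and a low-risk group.
high_risk = [2.0, 4.0, 6.0]
low_risk = [1.0, 3.0, 5.0]
print(cohens_d(high_risk, low_risk))  # 0.5
```

Larger absolute values mean the two groups are further apart relative to their spread, which is why a threshold on the effect size can be used to rank or filter features.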
from autoprognosis.plugins.uncertainty import UncertaintyQuantification
model = UncertaintyQuantification().get(<NAME>)
🔨 Test
After installing the library, the tests can be executed using pytest
$ pip install .[testing]
$ pytest -vxs -m "not slow"
Citing
If you use this code, please cite the associated paper:
@misc{https://doi.org/10.48550/arxiv.2210.12090,
doi = {10.48550/ARXIV.2210.12090},
url = {https://arxiv.org/abs/2210.12090},
author = {Imrie, Fergus and Cebere, Bogdan and McKinney, Eoin F. and van der Schaar, Mihaela},
keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
References
Examples
Tutorials
AutoPrognosis classification
Welcome! This tutorial will walk you through the steps of selecting a model for a classification task using AutoPrognosis.
Setup
[ ]:
# stdlib
import json
import warnings
# third party
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
warnings.filterwarnings("ignore")
Import ClassifierStudy
ClassifierStudy is the engine that learns an ensemble of pipelines and their hyperparameters automatically.
[ ]:
# autoprognosis absolute
from autoprognosis.studies.classifiers import ClassifierStudy
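ClassifierStudy is described above as learning an ensemble of pipelines. One common way to combine several classifiers is a weighted average of their predicted class probabilities. The sketch below shows that idea on toy numbers; it is a conceptual illustration under that assumption, not the ensembling code used by AutoPrognosis, and the function name and weights are hypothetical.

```python
def weighted_ensemble(probabilities, weights):
    """Average per-model class probabilities using weights normalized to sum to 1."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[k] for w, p in zip(weights, probabilities)) / total
        for k in range(n_classes)
    ]


# Two hypothetical models scoring one sample over two classes.
model_probs = [[0.8, 0.2], [0.4, 0.6]]
model_weights = [3.0, 1.0]  # the first model is trusted three times as much

combined = weighted_ensemble(model_probs, model_weights)
print(combined)
```

Because each input row sums to 1 and the weights are normalized, the combined vector also sums to 1.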
Load the target dataset
AutoPrognosis expects pandas.DataFrames as input.
For this example, we will use the Breast Cancer Wisconsin Dataset.
[ ]:
# stdlib
from pathlib import Path
X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
df = X.copy()
df["target"] = Y
Create the classifier
While AutoPrognosis provides default plugins, it allows the user to customize the plugins for the pipelines.
You can see the supported plugins below:
[ ]:
# List the available plugins
# autoprognosis absolute
from autoprognosis.plugins import Plugins
print(json.dumps(Plugins().list_available(), indent=2))
We will set a few custom plugins for the pipelines and create the classifier study.
[ ]:
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)
study_name = "classification_example"
study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=2,  # DELETE THIS LINE FOR BETTER RESULTS. how many trials to do for each candidate. Default: 50
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS. how many outer iterations to do. Default: 5
    classifiers=[
        "logistic_regression",
        "lda",
        "qda",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)
Search for the optimal ensemble
[ ]:
study.run()
[ ]:
# stdlib
import pprint
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator
output = workspace / study_name / "model.p"
model = load_model_from_file(output)
metrics = evaluate_estimator(model, X, Y)
print(f"Model {model.name()} ")
print("Score: ")
pprint.pprint(metrics)
Serialization
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file
out = workspace / "tmp.bkp"
# Fit the model
model.fit(X, Y)
# Save
save_to_file(out, model)
# Reload
loaded_model = load_from_file(out)
print(loaded_model.name())
assert loaded_model.name() == model.name()
out.unlink()
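The serialization cell above follows a save, reload, compare pattern. The sketch below reproduces that pattern with the standard-library pickle module on a plain dictionary, so it can be run without a trained model. For real models, the library's own save_to_file and load_from_file helpers shown above should be used; whether they rely on pickle internally is not stated on this page, and the `roundtrip` helper is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def roundtrip(obj, path: Path):
    """Write obj to path with pickle and read it back."""
    with path.open("wb") as f:
        pickle.dump(obj, f)
    with path.open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "tmp.bkp"
    original = {"name": "toy_model", "params": {"C": 1.0}}
    restored = roundtrip(original, out)
    assert restored == original  # same content after reloading
    out.unlink()  # clean up, as in the cell above

print(restored["name"])
```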
Congratulations!
Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement towards Machine learning and AI for medicine, you can do so in the following ways!
Star AutoPrognosis on GitHub
The easiest way to help our community is to star the repos. This helps raise awareness of the tools we’re building.
Tutorial: Classification AutoML with imputation
Welcome to the classification AutoML tutorial!
This tutorial will show how to use AutoPrognosis to learn a model for datasets with missing data. We show how to use a predefined imputer or how to use AutoPrognosis to select the optimal imputer.
[ ]:
# stdlib
import json
import sys
import warnings
# third party
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
warnings.filterwarnings("ignore")
# autoprognosis absolute
import autoprognosis.logger as log
from autoprognosis.studies.classifiers import ClassifierStudy
[ ]:
log.add(sink=sys.stderr, level="INFO")
Load toy dataset
[ ]:
# stdlib
from pathlib import Path
def get_dataset() -> pd.DataFrame:
    Path("data").mkdir(parents=True, exist_ok=True)
    bkp_file = Path("data") / "anneal.csv"
    if bkp_file.exists():
        return pd.read_csv(bkp_file)
    df = pd.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/annealing/anneal.data",
        header=None,
    )
    df.to_csv(bkp_file, index=None)
    return df
df = get_dataset()
df = df.replace("?", np.nan)
X = df.drop(columns=[df.columns[-1]])
y = df[df.columns[-1]]
X
[ ]:
dataset = X.copy()
dataset["target"] = y
[ ]:
for col in X.columns:
    if X[col].isna().sum() == 0:
        continue
    col_type = "categorical" if len(X[col].unique()) < 10 else "cont"
    print(
        f"NaNs ratio in col = {col} col_type = {col_type} miss ratio = {X[col].isna().sum() / len(X[col])}"
    )
[ ]:
# List available classifiers
# autoprognosis absolute
from autoprognosis.plugins.prediction import Classifiers
Classifiers().list_available()
Option 1: Predefined imputer
[ ]:
# stdlib
from pathlib import Path
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)
study_name = "test_classification_studies"
study = ClassifierStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    imputers=["mean"],
    classifiers=["logistic_regression", "lda"],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.plugins.imputers import Imputers
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator
model_path = workspace / study_name / "model.p"
model = load_model_from_file(model_path)
evaluate_estimator(model, X, y)
[ ]:
model.name()
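Option 1 fixes the imputer to "mean". As a reminder of what mean imputation does, the sketch below fills the missing entries of a single numeric column with the mean of its observed values. It illustrates the general technique only; the implementation of the AutoPrognosis "mean" plugin is not shown on this page, and `mean_impute` is a hypothetical helper.

```python
from statistics import mean


def mean_impute(column):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


print(mean_impute([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```

Mean imputation is simple and fast, but it ignores relationships between columns, which is one reason to let the optimizer compare several imputers as in Option 2.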
Option 2: Let the optimizer find the optimal imputer
[ ]:
# stdlib
from pathlib import Path
workspace = Path("workspace")
study_name = "test_classification_studies_v2"
study = ClassifierStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    classifiers=[
        "logistic_regression",
        "lda",
        "xgboost",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.plugins.imputers import Imputers
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator
model_path = workspace / study_name / "model.p"
model = load_model_from_file(model_path)
evaluate_estimator(model, X, y)
[ ]:
model.name()
Serialization
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file
out = workspace / "tmp.bkp"
# Fit the model
model.fit(X, y)
# Save
save_to_file(out, model)
# Reload
loaded_model = load_from_file(out)
print(loaded_model.name())
assert loaded_model.name() == model.name()
out.unlink()
AutoPrognosis - Tutorial on using classifiers with explainers
[ ]:
# Install AutoPrognosis
!pip install autoprognosis
[ ]:
# stdlib
import json
import sys
import warnings
# third party
import numpy as np
import pandas as pd
warnings.filterwarnings("ignore")
# autoprognosis absolute
# autoprognosis
import autoprognosis.logger as log
from autoprognosis.studies.classifiers import ClassifierStudy
log.add(sink=sys.stderr, level="INFO")
Load dataset
AutoPrognosis expects pandas.DataFrames as input.
For this example, we will use the Breast Cancer Wisconsin Dataset.
[ ]:
# third party
# Load dataset
from sklearn.datasets import load_breast_cancer
X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
X
Run a study with AutoPrognosis
[ ]:
dataset = X.copy()
dataset["target"] = Y
[ ]:
# List available classifiers
# autoprognosis absolute
from autoprognosis.plugins.prediction import Classifiers
Classifiers().list_available()
[ ]:
# stdlib
from pathlib import Path
workspace = Path("workspace")
study_name = "test_classification_studies"
study = ClassifierStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    num_iter=100,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    imputers=[],  # Dataset is complete, so imputation not necessary
    classifiers=[
        "logistic_regression",
        "perceptron",
        "xgboost",
        "decision_trees",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    feature_scaling=[],
    score_threshold=0.4,
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator
model_path = workspace / study_name / "model.p"
model = load_model_from_file(model_path)
[ ]:
model.name()
[ ]:
evaluate_estimator(model, X, Y)
Interpretability
[ ]:
# autoprognosis absolute
from autoprognosis.plugins.explainers import Explainers
[ ]:
# Explain using Kernel SHAP
explainer = Explainers().get(
    "kernel_shap",
    model,
    X,
    Y,
    feature_names=X.columns,
    task_type="classification",
)
explainer.plot(X.sample(frac=0.1))
[ ]:
# Explain using Risk Effect Size
explainer = Explainers().get(
    "risk_effect_size",
    model,
    X,
    Y,
    task_type="classification",
)
explainer.plot(X)
Value of information
[ ]:
def evaluate_for_effect_size(effect_size):
    exp = Explainers().get(
        "risk_effect_size",
        model,
        X,
        Y,
        task_type="classification",
        effect_size=effect_size,
    )
    important_features = exp.explain(X, effect_size).index.tolist()
    return important_features


def evaluate_using_important_feature(effect_size):
    filtered_model = load_model_from_file(model_path)
    important_features = evaluate_for_effect_size(effect_size)
    X_filtered = X[important_features]
    metrics = evaluate_estimator(
        filtered_model,
        X_filtered,
        Y,
    )
    print("\033[1mEvaluation for effect size \033[0m", effect_size)
    print(
        " >>> \033[1mSelected features for effect size\033[0m ", important_features
    )
    print(" >>> \033[1mSelected features count\033[0m ", len(important_features))
    print(" >>> \033[1mEvaluation:\033[0m ")
    print(f" >>>> score = {metrics['str']}")
    print("========================================")
[ ]:
# Evaluate performance for different feature subsets defined by effect size
for effect_size in [0.5, 1.0, 1.5, 2.0]:
    evaluate_using_important_feature(effect_size)
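The loop above re-evaluates the model on progressively smaller feature subsets as the effect-size threshold grows. The sketch below isolates the filtering step on made-up numbers: the feature names and effect sizes are toy values, not output from the explainer, and the comparison rule (absolute value greater than or equal to the threshold) is an assumption chosen for illustration.

```python
def select_features(effect_sizes, threshold):
    """Keep the features whose absolute effect size reaches the threshold."""
    return [name for name, d in effect_sizes.items() if abs(d) >= threshold]


# Hypothetical effect sizes, for illustration only.
toy_effect_sizes = {"feature_a": 2.1, "feature_b": 0.7, "feature_c": 1.2}

for threshold in [0.5, 1.0, 1.5, 2.0]:
    print(threshold, select_features(toy_effect_sizes, threshold))
```

Comparing the score at each threshold shows how much predictive value is lost when only the strongest features are collected.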
Congratulations!
Congratulations on completing this notebook tutorial! If you enjoyed this and would like to learn more about machine learning and AI for medicine, you can do so in the following ways!
Star AutoPrognosis on GitHub
The easiest way to help our community is to star the repos. This helps raise awareness of the tools we’re building.
Check out our website and paper for AutoPrognosis
Learn more about our lab and other work
AutoPrognosis survival analysis
Welcome! This tutorial will walk you through the steps of selecting a model for a survival analysis task using AutoPrognosis.
Setup
[ ]:
# stdlib
import json
import warnings
# third party
from lifelines.datasets import load_rossi
import pandas as pd
from sklearn.model_selection import train_test_split
warnings.filterwarnings("ignore")
Import RiskEstimationStudy
RiskEstimationStudy is the engine that learns an ensemble of survival analysis pipelines and their hyperparameters automatically.
[ ]:
# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
Load the target dataset
AutoPrognosis expects pandas.DataFrames as input.
For this example, we will use the Rossi dataset.
[ ]:
# third party
from lifelines.datasets import load_rossi
rossi = load_rossi()
X = rossi.drop(["week", "arrest"], axis=1)
Y = rossi["arrest"]
T = rossi["week"]
eval_time_horizons = [
    int(T[Y.iloc[:] == 1].quantile(0.25)),
    int(T[Y.iloc[:] == 1].quantile(0.50)),
    int(T[Y.iloc[:] == 1].quantile(0.75)),
]
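The cell above picks the evaluation horizons as the quartiles of the follow-up time among subjects who experienced the event. The standard-library sketch below repeats that logic on a tiny toy sample. It assumes that statistics.quantiles with method="inclusive" interpolates linearly in the same way as the default pandas quantile; the data and the helper name `event_time_horizons` are hypothetical.

```python
from statistics import quantiles


def event_time_horizons(times, events):
    """Quartiles (truncated to int) of the times at which an event was observed."""
    event_times = [t for t, e in zip(times, events) if e == 1]
    return [int(q) for q in quantiles(event_times, n=4, method="inclusive")]


toy_times = [5, 10, 20, 30, 52]
toy_events = [1, 0, 1, 1, 1]  # the second subject is censored and is ignored

print(event_time_horizons(toy_times, toy_events))  # [16, 25, 35]
```

Restricting the quantiles to observed events keeps the horizons in the range where the model can actually be scored.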
Create the risk estimation study
While AutoPrognosis provides default plugins, it allows the user to customize the plugins for the pipelines.
You can see the supported plugins below:
[ ]:
# stdlib
# List the available plugins
import json
from pathlib import Path
# autoprognosis absolute
from autoprognosis.plugins import Plugins
print(json.dumps(Plugins().list_available(), indent=2))
We will set a few custom plugins for the pipelines and create the risk estimation study.
[ ]:
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)
study_name = "test_risk_estimation_studies"
study = RiskEstimationStudy(
    study_name=study_name,
    dataset=rossi,
    target="arrest",
    time_to_event="week",
    time_horizons=eval_time_horizons,
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS. number of BO iterations per estimator. Default: 50
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS. number of outer optimization iterations. Default: 5
    risk_estimators=[
        "cox_ph",
        "lognormal_aft",
        "loglogistic_aft",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
    score_threshold=0.4,
)
Search for the best ensemble
[ ]:
study.run()
[ ]:
# stdlib
import pprint
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator
output = workspace / study_name / "model.p"
model = load_model_from_file(output)
metrics = evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)
print(f"Model {model.name()}")
print("Score: ")
pprint.pprint(metrics)
Serialization
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file
out = workspace / "tmp.bkp"
# Fit the model
model.fit(X, T, Y)
# Save
save_to_file(out, model)
# Reload
loaded_model = load_from_file(out)
print(loaded_model.name())
assert loaded_model.name() == model.name()
out.unlink()
Tutorial: Survival Analysis AutoML with imputation
Welcome to the Survival analysis AutoML tutorial!
This tutorial will show how to use AutoPrognosis to learn a model for datasets with missing data. We show how to use a predefined imputer or how to use AutoPrognosis to select the optimal imputer.
[ ]:
# stdlib
import sys
import warnings
# third party
import numpy as np
import pandas as pd
warnings.filterwarnings("ignore")
# autoprognosis absolute
import autoprognosis.logger as log
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
[ ]:
log.add(sink=sys.stderr, level="INFO")
Load dataset
[ ]:
# third party
from pycox import datasets
df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]
X = df.drop(columns=["duration", "event"])
T = df["duration"]
Y = df["event"]
eval_time_horizons = [
    int(T[Y.iloc[:] == 1].quantile(0.50)),
]
[ ]:
# stdlib
import random
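# Sample row labels from the index itself, since rows were filtered above
# and the labels are not guaranteed to be contiguous.
for col in ["x3", "x4"]:
    indices = random.sample(list(X.index), 10)
    X.loc[indices, col] = np.nan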
X.isnull().any()
[ ]:
dataset = X.copy()
dataset["target"] = Y
dataset["time_to_event"] = T
Option 1: Predefined imputer
[ ]:
# stdlib
from pathlib import Path
workspace = Path("workspace")
study_name = "test_risk_estimation_studies"
study = RiskEstimationStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    time_to_event="time_to_event",
    time_horizons=eval_time_horizons,
    num_iter=2,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    risk_estimators=[
        "cox_ph",
        "lognormal_aft",
        "survival_xgboost",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    imputers=["mean"],
    feature_scaling=["minmax_scaler", "nop"],  # DELETE THIS LINE FOR BETTER RESULTS.
    score_threshold=0.4,
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.plugins.imputers import Imputers
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator
model_path = workspace / study_name / "model.p"
model = load_model_from_file(model_path)
X_imp = Imputers().get("mean").fit_transform(X)
evaluate_survival_estimator(model, X_imp, T, Y, eval_time_horizons)
Option 2: Let the optimizer find the best imputer
[ ]:
# stdlib
from pathlib import Path
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)
study_name = "test_risk_estimation_studies_v2"
study = RiskEstimationStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    time_to_event="time_to_event",
    time_horizons=eval_time_horizons,
    num_iter=2,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    risk_estimators=[
        "cox_ph",
        "lognormal_aft",
        "survival_xgboost",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    imputers=["mean", "ice", "median"],  # DELETE THIS LINE FOR BETTER RESULTS.
    feature_scaling=["minmax_scaler", "nop"],  # DELETE THIS LINE FOR BETTER RESULTS.
    score_threshold=0.4,
    workspace=workspace,
)
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator
model_path = workspace / study_name / "model.p"
model = load_model_from_file(model_path)
evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)
AutoPrognosis regression
Welcome! This tutorial will walk you through the steps of selecting a model for a regression task using AutoPrognosis.
Setup
[ ]:
# stdlib
import json
import warnings
# third party
import pandas as pd
from sklearn.model_selection import train_test_split
warnings.filterwarnings("ignore")
Import RegressionStudy
RegressionStudy is the engine that learns an ensemble of regression pipelines and their hyperparameters automatically.
[ ]:
# autoprognosis absolute
from autoprognosis.studies.regression import RegressionStudy
Load the target dataset
AutoPrognosis expects pandas.DataFrames as input.
For this example, we will use the Airfoil Self-Noise Data Set.
[ ]:
# third party
import pandas as pd
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\\t",
)
last_col = df.columns[-1]
y = df[last_col]
X = df.drop(columns=[last_col])
df = X.copy()
df["target"] = y
df
Create the regressor
While AutoPrognosis provides default plugins, it allows the user to customize the plugins for the pipelines.
You can see the supported plugins below:
[ ]:
# stdlib
# List the available plugins
import json
# autoprognosis absolute
from autoprognosis.plugins import Plugins
print(json.dumps(Plugins().list_available(), indent=2))
We will set a few custom plugins for the pipelines and create the regression study.
[ ]:
# stdlib
from pathlib import Path
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)
study_name = "regression_example"
study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS. how many trials to do for each candidate. Default: 50
    num_study_iter=2,  # DELETE THIS LINE FOR BETTER RESULTS. how many outer iterations to do. Default: 5
    regressors=[
        "linear_regression",
        "xgboost_regressor",
    ],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)
Search for the optimal ensemble
[ ]:
study.run()
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression
output = workspace / study_name / "model.p"
model = load_model_from_file(output)
metrics = evaluate_regression(model, X, y)
f"Model {model.name()} score: {metrics['raw']}"
Serialization
[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file
out = workspace / "tmp.bkp"
# Fit the model
model.fit(X, y)
# Save
save_to_file(out, model)
# Reload
loaded_model = load_from_file(out)
print(loaded_model.name())
assert loaded_model.name() == model.name()
out.unlink()
Tutorial: Simulating multiple imputation (MICE) using AutoPrognosis
Welcome to the classification AutoML tutorial!
This tutorial will show how to use AutoPrognosis and multiple imputation to learn a model for datasets with missing data.
[ ]:
# stdlib
import json
import sys
import warnings
# third party
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
warnings.filterwarnings("ignore")
# autoprognosis absolute
import autoprognosis.logger as log
from autoprognosis.studies.classifiers import ClassifierStudy
[ ]:
log.add(sink=sys.stderr, level="INFO")
Load toy dataset
[ ]:
# stdlib
from pathlib import Path
def get_dataset() -> pd.DataFrame:
    Path("data").mkdir(parents=True, exist_ok=True)
    bkp_file = Path("data") / "anneal.csv"
    if bkp_file.exists():
        return pd.read_csv(bkp_file)
    df = pd.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/annealing/anneal.data",
        header=None,
    )
    df.to_csv(bkp_file, index=None)
    return df
df = get_dataset()
df = df.replace("?", np.nan)
X = df.drop(columns=[df.columns[-1]])
y = df[df.columns[-1]]
X
[ ]:
dataset = X.copy()
dataset["target"] = y
[ ]:
for col in X.columns:
    if X[col].isna().sum() == 0:
        continue
    col_type = "categorical" if len(X[col].unique()) < 10 else "cont"
    print(
        f"NaNs ratio in col = {col} col_type = {col_type} miss ratio = {X[col].isna().sum() / len(X[col])}"
    )
[ ]:
# List available classifiers
# autoprognosis absolute
from autoprognosis.plugins.prediction import Classifiers
Classifiers().list_available()
Search model with the ICE imputer
[ ]:
# stdlib
from pathlib import Path
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)
study_name = "test_classification_studies_mice"
study = ClassifierStudy(
    study_name=study_name,
    dataset=dataset,
    target="target",
    imputers=[
        "ice"
    ],  # Use chained equations. The "missforest" or "hyperimpute" plugins can be used here as well.
    num_iter=10,  # DELETE THIS LINE FOR BETTER RESULTS.
    num_study_iter=1,  # DELETE THIS LINE FOR BETTER RESULTS.
    classifiers=["logistic_regression", "lda"],  # DELETE THIS LINE FOR BETTER RESULTS.
    workspace=workspace,
)
study.run()
Train the model template using multiple random seeds
[ ]:
# autoprognosis absolute
from autoprognosis.plugins.imputers import Imputers
from autoprognosis.utils.serialization import load_model_from_file
model_path = workspace / study_name / "model.p"
model = load_model_from_file(model_path)
model.name()
[ ]:
# autoprognosis absolute
from autoprognosis.utils.distributions import enable_reproducible_results
from autoprognosis.utils.tester import evaluate_estimator_multiple_seeds
score = evaluate_estimator_multiple_seeds(model, X, y, seeds=list(range(5)))
[ ]:
score
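Evaluating over several seeds makes the reported score less dependent on one particular random split. The block below is a minimal, library-independent sketch of that idea. It assumes a hypothetical `score_once(seed)` callable that trains and scores a model for one seed, and it is not the implementation of `evaluate_estimator_multiple_seeds`.

```python
import random
import statistics
from typing import Callable, Dict, Sequence


def aggregate_over_seeds(
    score_once: Callable[[int], float], seeds: Sequence[int]
) -> Dict[str, float]:
    """Run a scoring function once per seed and summarize the results."""
    scores = [score_once(seed) for seed in seeds]
    return {
        "mean": statistics.mean(scores),
        # Sample standard deviation; it needs at least two scores.
        "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


def toy_score(seed: int) -> float:
    """Hypothetical stand-in for 'train and evaluate with this seed'."""
    rng = random.Random(seed)
    return 0.9 + rng.uniform(-0.02, 0.02)


summary = aggregate_over_seeds(toy_score, seeds=list(range(5)))
print(summary)
```

A large spread between `min` and `max` would suggest that the selected pipeline is sensitive to the random seed.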
AutoML studies
autoprognosis.studies.classifiers module
- class ClassifierStudy(dataset: pandas.core.frame.DataFrame, target: str, num_iter: int = 20, num_study_iter: int = 5, num_ensemble_iter: int = 15, timeout: int = 360, metric: str = 'aucroc', study_name: Optional[str] = None, feature_scaling: List[str] = ['normal_transform', 'maxabs_scaler', 'feature_normalizer', 'minmax_scaler', 'nop', 'scaler', 'uniform_transform'], feature_selection: List[str] = ['nop', 'pca', 'fast_ica'], classifiers: List[str] = ['random_forest', 'xgboost', 'catboost', 'lgbm', 'logistic_regression'], imputers: List[str] = ['ice'], workspace: pathlib.Path = PosixPath('tmp'), hooks: autoprognosis.hooks.base.Hooks = <autoprognosis.hooks.default.DefaultHooks object>, score_threshold: float = 0.65, group_id: Optional[str] = None, nan_placeholder: Optional[Any] = None, random_state: int = 0, sample_for_search: bool = True, max_search_sample_size: int = 10000, ensemble_size: int = 3, n_folds_cv: int = 5)
Bases:
autoprognosis.studies._base.Study
Core logic for classification studies.
A study automatically handles imputation, preprocessing and model selection for a certain dataset. The output is an optimal model architecture, selected by the AutoML logic.
- Parameters
dataset – DataFrame. The dataset to analyze.
target – str. The target column in the dataset.
num_iter – int. Maximum number of optimization trials. This is the limit of trials for each base estimator in the “classifiers” list, used in combination with the “timeout” parameter. For each estimator, the search ends after “num_iter” trials or “timeout” seconds, whichever comes first.
num_study_iter – int. The number of study iterations. This is the limit for the outer optimization loop. After each outer loop, an intermediary model is cached and can be used by another process, while the outer loop continues to improve the result.
timeout – int. Maximum wait time (seconds) for each estimator hyperparameter search. This timeout applies to each estimator in the “classifiers” list.
metric –
str. The metric to use for optimization. Available objective metrics:
“aucroc”: the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
“aucprc”: the average precision, which summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
“accuracy”: accuracy classification score.
“f1_score_micro”: F1 score is the harmonic mean of precision and recall. This version uses the “micro” average: calculate metrics globally by counting the total true positives, false negatives and false positives.
“f1_score_macro”: F1 score is the harmonic mean of precision and recall. This version uses the “macro” average: calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
“f1_score_weighted”: F1 score is the harmonic mean of precision and recall. This version uses the “weighted” average: calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).
“mcc”: the Matthews correlation coefficient, a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
“kappa”, “kappa_quadratic”: Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.
study_name – str. The name of the study, to be used in the caches.
feature_scaling –
list. Plugin search pool to use in the pipeline for scaling. Defaults to : [‘maxabs_scaler’, ‘scaler’, ‘feature_normalizer’, ‘normal_transform’, ‘uniform_transform’, ‘nop’, ‘minmax_scaler’] Available plugins, retrieved using Preprocessors(category=”feature_scaling”).list_available():
’maxabs_scaler’
’scaler’
’feature_normalizer’
’normal_transform’
’uniform_transform’
’nop’ # empty operation
’minmax_scaler’
feature_selection –
list. Plugin search pool to use in the pipeline for feature selection. Defaults to [“nop”, “pca”, “fast_ica”]. Available plugins, retrieved using Preprocessors(category="dimensionality_reduction").list_available():
’feature_agglomeration’
’fast_ica’
’variance_threshold’
’gauss_projection’
’pca’
’nop’ # no operation
classifiers –
list. Plugin search pool to use in the pipeline for prediction. Defaults to [“random_forest”, “xgboost”, “catboost”, “lgbm”, “logistic_regression”]. Available plugins, retrieved using Classifiers().list_available():
’adaboost’
’bernoulli_naive_bayes’
’neural_nets’
’linear_svm’
’qda’
’decision_trees’
’logistic_regression’
’hist_gradient_boosting’
’extra_tree_classifier’
’bagging’
’gradient_boosting’
’ridge_classifier’
’gaussian_process’
’perceptron’
’lgbm’
’catboost’
’random_forest’
’tabnet’
’multinomial_naive_bayes’
’lda’
’gaussian_naive_bayes’
’knn’
’xgboost’
imputers –
list. Plugin search pool to use in the pipeline for imputation. Defaults to [“ice”]. Available plugins, retrieved using Imputers().list_available():
’sinkhorn’
’EM’
’mice’
’ice’
’hyperimpute’
’most_frequent’
’median’
’missforest’
’softimpute’
’nop’
’mean’
’gain’
hooks – Hooks. Custom callbacks to be notified about the search progress.
workspace – Path. Where to store the output model.
score_threshold – float. The minimum metric score for a candidate.
group_id – str. The id column in the dataset.
random_state – int. Random seed.
sample_for_search – bool. Subsample the evaluation dataset in the search pipeline. Improves the speed of the search.
max_search_sample_size – int. Subsample size for the evaluation dataset, if sample_for_search is True.
n_folds_cv – int. Number of cross-validation folds to use for study evaluation.
ensemble_size – int. Maximum number of models to include in the ensemble.
Example
>>> from sklearn.datasets import load_breast_cancer
>>>
>>> from autoprognosis.studies.classifiers import ClassifierStudy
>>> from autoprognosis.utils.serialization import load_model_from_file
>>> from autoprognosis.utils.tester import evaluate_estimator
>>>
>>> X, Y = load_breast_cancer(return_X_y=True, as_frame=True)
>>>
>>> df = X.copy()
>>> df["target"] = Y
>>>
>>> study_name = "example"
>>>
>>> study = ClassifierStudy(
...     study_name=study_name,
...     dataset=df,  # pandas DataFrame
...     target="target",  # the label column in the dataset
... )
>>> model = study.fit()
>>>
>>> # Predict the probabilities of each class using the model
>>> model.predict_proba(X)
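The three F1 averaging modes listed under the metric parameter differ only in how per-label counts are combined. The following is a minimal pure-Python sketch of those definitions, written for illustration; the function name `f1_averages` is an assumption and this is not the code AutoPrognosis uses internally.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def f1_averages(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[str, float]:
    """Compute micro, macro and weighted F1 from true and predicted labels."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_label = {label: f1(tp[label], fp[label], fn[label]) for label in labels}
    support = Counter(y_true)  # number of true instances per label
    return {
        # micro: pool the counts over all labels, then compute one F1
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # macro: unweighted mean of the per-label F1 scores
        "macro": sum(per_label.values()) / len(labels),
        # weighted: mean of the per-label F1 scores weighted by support
        "weighted": sum(per_label[l] * support[l] for l in labels) / len(y_true),
    }


scores = f1_averages([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
print(scores)
```

On this imbalanced toy input the macro average is lower than the other two, because the minority label counts as much as the majority label.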
- fit() Any
Run the study and train the model. The call returns the fitted model.
- run() Any
Run the study. The call returns the optimal model architecture - not fitted.
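The num_iter and timeout parameters together bound the hyperparameter search for each estimator. The sketch below illustrates that stopping rule with a plain random search over one hypothetical parameter in [0, 1]. The random search is only a stand-in for the library's optimizer, and `bounded_search` and `objective` are illustrative names, not AutoPrognosis APIs.

```python
import random
import time
from typing import Callable, Optional, Tuple


def bounded_search(
    objective: Callable[[float], float],
    num_iter: int,
    timeout: float,
    seed: int = 0,
) -> Tuple[Optional[float], float, int]:
    """Maximize `objective`, stopping after num_iter trials or timeout seconds."""
    rng = random.Random(seed)
    deadline = time.monotonic() + timeout
    best_param, best_score, trials = None, float("-inf"), 0
    # The loop ends as soon as either budget is exhausted.
    while trials < num_iter and time.monotonic() < deadline:
        candidate = rng.uniform(0.0, 1.0)
        score = objective(candidate)
        trials += 1
        if score > best_score:
            best_param, best_score = candidate, score
    return best_param, best_score, trials


# Toy objective with its maximum at 0.3.
best_param, best_score, trials = bounded_search(
    lambda p: -((p - 0.3) ** 2), num_iter=20, timeout=360
)
print(best_param, best_score, trials)
```

With a cheap objective the trial limit is reached first; with an expensive one the timeout becomes the effective limit.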
autoprognosis.studies.regression module
- class RegressionStudy(dataset: pandas.core.frame.DataFrame, target: str, num_iter: int = 20, num_study_iter: int = 5, num_ensemble_iter: int = 15, timeout: int = 360, metric: str = 'r2', study_name: Optional[str] = None, feature_scaling: List[str] = ['normal_transform', 'maxabs_scaler', 'feature_normalizer', 'minmax_scaler', 'nop', 'scaler', 'uniform_transform'], feature_selection: List[str] = ['nop', 'pca', 'fast_ica'], regressors: List[str] = ['random_forest_regressor', 'xgboost_regressor', 'linear_regression', 'catboost_regressor'], imputers: List[str] = ['ice'], workspace: pathlib.Path = PosixPath('tmp'), hooks: autoprognosis.hooks.base.Hooks = <autoprognosis.hooks.default.DefaultHooks object>, score_threshold: float = 0.65, nan_placeholder: Optional[Any] = None, group_id: Optional[str] = None, random_state: int = 0, sample_for_search: bool = True, max_search_sample_size: int = 10000, ensemble_size: int = 3, n_folds_cv: int = 5)
Bases:
autoprognosis.studies._base.Study
Core logic for regression studies.
A study automatically handles imputation, preprocessing and model selection for a certain dataset. The output is an optimal model architecture, selected by the AutoML logic.
- Parameters
dataset – DataFrame. The dataset to analyze.
target – str. The target column in the dataset.
num_iter – int. Maximum number of optimization trials. This is the limit of trials for each base estimator in the “regressors” list, used in combination with the “timeout” parameter. For each estimator, the search ends after “num_iter” trials or “timeout” seconds, whichever comes first.
num_study_iter – int. The number of study iterations. This is the limit for the outer optimization loop. After each outer loop, an intermediary model is cached and can be used by another process, while the outer loop continues to improve the result.
timeout – int. Maximum wait time (seconds) for each estimator hyperparameter search. This timeout applies to each estimator in the “regressors” list.
metric –
str. The metric to use for optimization. Available metrics:
“r2”
study_name – str. The name of the study, to be used in the caches.
feature_scaling –
list. Plugin search pool to use in the pipeline for scaling. Defaults to : [‘maxabs_scaler’, ‘scaler’, ‘feature_normalizer’, ‘normal_transform’, ‘uniform_transform’, ‘nop’, ‘minmax_scaler’] Available plugins, retrieved using Preprocessors(category=”feature_scaling”).list_available():
’maxabs_scaler’
’scaler’
’feature_normalizer’
’normal_transform’
’uniform_transform’
’nop’ # empty operation
’minmax_scaler’
feature_selection –
list. Plugin search pool to use in the pipeline for feature selection. Defaults to [“nop”, “pca”, “fast_ica”]. Available plugins, retrieved using Preprocessors(category="dimensionality_reduction").list_available():
’feature_agglomeration’
’fast_ica’
’variance_threshold’
’gauss_projection’
’pca’
’nop’ # no operation
imputers –
list. Plugin search pool to use in the pipeline for imputation. Defaults to [“ice”]. Available plugins, retrieved using Imputers().list_available():
’sinkhorn’
’EM’
’mice’
’ice’
’hyperimpute’
’most_frequent’
’median’
’missforest’
’softimpute’
’nop’
’mean’
’gain’
regressors –
list. Plugin search pool to use in the pipeline for prediction. Defaults to [“random_forest_regressor”,”xgboost_regressor”, “linear_regression”, “catboost_regressor”] Available plugins, retrieved using Regression().list_available():
’kneighbors_regressor’
’bayesian_ridge’
’tabnet_regressor’
’catboost_regressor’
’random_forest_regressor’
’mlp_regressor’
’xgboost_regressor’
’neural_nets_regression’
’linear_regression’
hooks – Hooks. Custom callbacks to be notified about the search progress.
workspace – Path. Where to store the output model.
score_threshold – float. The minimum metric score for a candidate.
group_id – str. The id column in the dataset.
random_state – int. Random seed.
sample_for_search – bool. Subsample the evaluation dataset in the search pipeline. Improves the speed of the search.
max_search_sample_size – int. Subsample size for the evaluation dataset, if sample_for_search is True.
Example
>>> import pandas as pd
>>> from autoprognosis.utils.serialization import load_model_from_file
>>> from autoprognosis.utils.tester import evaluate_regression
>>> from autoprognosis.studies.regression import RegressionStudy
>>>
>>> # Load dataset
>>> df = pd.read_csv(
...     "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
...     header=None,
...     sep="\t",
... )
>>> last_col = df.columns[-1]
>>> y = df[last_col]
>>> X = df.drop(columns=[last_col])
>>>
>>> df = X.copy()
>>> df["target"] = y
>>>
>>> # Search the model
>>> study_name = "regression_example"
>>> study = RegressionStudy(
...     study_name=study_name,
...     dataset=df,  # pandas DataFrame
...     target="target",  # the label column in the dataset
... )
>>> model = study.fit()
>>>
>>> # Predict using the model
>>> model.predict(X)
- fit() Any
Run the study and train the model. The call returns the fitted model.
- run() Any
Run the study. The call returns the optimal model architecture - not fitted.
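The only objective listed for regression studies is “r2”. As a reminder of what that score measures, here is a minimal pure-Python sketch of the standard coefficient of determination; the helper name `r2` is an assumption, and the library may compute the metric differently in detail.

```python
from typing import Sequence


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("r2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
print(r2(y_true, [2.5, 0.0, 2.0, 8.0]))  # close to 1: good fit
print(r2(y_true, [2.875] * 4))           # 0: no better than predicting the mean
```

A score of 1 means perfect predictions, 0 means the model does no better than the mean of the targets, and negative values mean it does worse.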
autoprognosis.studies.risk_estimation module
- class RiskEstimationStudy(dataset: pandas.core.frame.DataFrame, target: str, time_to_event: str, time_horizons: List[int], num_iter: int = 20, num_study_iter: int = 5, num_ensemble_iter: int = 15, timeout: int = 360, study_name: Optional[str] = None, workspace: pathlib.Path = PosixPath('tmp'), risk_estimators: List[str] = ['survival_xgboost', 'loglogistic_aft', 'deephit', 'cox_ph', 'weibull_aft', 'lognormal_aft', 'coxnet'], imputers: List[str] = ['ice'], feature_scaling: List[str] = ['normal_transform', 'maxabs_scaler', 'feature_normalizer', 'minmax_scaler', 'nop', 'scaler', 'uniform_transform'], feature_selection: List[str] = ['nop', 'pca', 'fast_ica'], hooks: autoprognosis.hooks.base.Hooks = <autoprognosis.hooks.default.DefaultHooks object>, score_threshold: float = 0.65, nan_placeholder: Optional[Any] = None, group_id: Optional[str] = None, random_state: int = 0, sample_for_search: bool = True, max_search_sample_size: int = 10000, ensemble_size: int = 3, n_folds_cv: int = 5)
Bases:
autoprognosis.studies._base.Study
Core logic for risk estimation studies.
A study automatically handles imputation, preprocessing and model selection for a certain dataset. The output is an optimal model architecture, selected by the AutoML logic.
- Parameters
dataset – DataFrame. The dataset to analyze.
target – str. The target column in the dataset.
time_to_event – str. The time_to_event column in the dataset.
num_iter – int. Maximum number of optimization trials. This is the limit of trials for each base estimator in the “risk_estimators” list, used in combination with the “timeout” parameter. For each estimator, the search ends after “num_iter” trials or “timeout” seconds, whichever comes first.
num_study_iter – int. The number of study iterations. This is the limit for the outer optimization loop. After each outer loop, an intermediary model is cached and can be used by another process, while the outer loop continues to improve the result.
timeout – int. Maximum wait time (seconds) for each estimator hyperparameter search. This timeout applies to each estimator in the “risk_estimators” list.
study_name – str. The name of the study, to be used in the caches.
feature_scaling –
list. Plugin search pool to use in the pipeline for scaling. Defaults to : [‘maxabs_scaler’, ‘scaler’, ‘feature_normalizer’, ‘normal_transform’, ‘uniform_transform’, ‘nop’, ‘minmax_scaler’] Available plugins, retrieved using Preprocessors(category=”feature_scaling”).list_available():
’maxabs_scaler’
’scaler’
’feature_normalizer’
’normal_transform’
’uniform_transform’
’nop’ # empty operation
’minmax_scaler’
feature_selection –
list. Plugin search pool to use in the pipeline for feature selection. Defaults to [“nop”, “pca”, “fast_ica”]. Available plugins, retrieved using Preprocessors(category="dimensionality_reduction").list_available():
’feature_agglomeration’
’fast_ica’
’variance_threshold’
’gauss_projection’
’pca’
’nop’ # no operation
imputers –
list. Plugin search pool to use in the pipeline for imputation. Defaults to [“ice”]. Available plugins, retrieved using Imputers().list_available():
’sinkhorn’
’EM’
’mice’
’ice’
’hyperimpute’
’most_frequent’
’median’
’missforest’
’softimpute’
’nop’
’mean’
’gain’
risk_estimators –
list. Plugin search pool to use in the pipeline for risk estimation. Defaults to [“survival_xgboost”, “loglogistic_aft”, “deephit”, “cox_ph”, “weibull_aft”, “lognormal_aft”, “coxnet”] Available plugins:
’survival_xgboost’
’loglogistic_aft’
’deephit’
’cox_ph’
’weibull_aft’
’lognormal_aft’
’coxnet’
hooks – Hooks. Custom callbacks to be notified about the search progress.
workspace – Path. Where to store the output model.
score_threshold – float. The minimum metric score for a candidate.
random_state – int. Random seed.
sample_for_search – bool. Subsample the evaluation dataset in the search pipeline. Improves the speed of the search.
max_search_sample_size – int. Subsample size for the evaluation dataset, if sample_for_search is True.
Example
>>> import numpy as np
>>> from pycox import datasets
>>> from autoprognosis.studies.risk_estimation import RiskEstimationStudy
>>> from autoprognosis.utils.serialization import load_model_from_file
>>> from autoprognosis.utils.tester import evaluate_survival_estimator
>>>
>>> df = datasets.gbsg.read_df()
>>> df = df[df["duration"] > 0]
>>>
>>> # Keep only the covariates in X; the event and duration columns are the labels.
>>> X = df.drop(columns=["duration", "event"])
>>> T = df["duration"]
>>> Y = df["event"]
>>>
>>> eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]
>>>
>>> study_name = "example_risks"
>>> study = RiskEstimationStudy(
...     study_name=study_name,
...     dataset=df,
...     target="event",
...     time_to_event="duration",
...     time_horizons=eval_time_horizons,
... )
>>>
>>> model = study.fit()
>>> # Predict using the model
>>> model.predict(X, eval_time_horizons)
- fit() Any
Run the study and train the model. The call returns the fitted model.
- run() Any
Run the study. The call returns the optimal model architecture - not fitted.
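The example for this class derives its evaluation horizons with np.linspace(T.min(), T.max(), 5)[1:-1], which keeps the three evenly spaced interior points between the shortest and longest observed duration. The same computation is sketched below in plain Python for clarity; `interior_horizons` is an illustrative helper, not part of AutoPrognosis.

```python
from typing import List, Sequence


def interior_horizons(durations: Sequence[float], n_points: int = 5) -> List[float]:
    """Evenly spaced time horizons strictly between min and max duration."""
    if n_points < 3:
        raise ValueError("n_points must be at least 3 to have interior points")
    lo, hi = min(durations), max(durations)
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    # Drop both endpoints: risk at the minimum or maximum observed
    # duration is rarely informative.
    return grid[1:-1]


print(interior_horizons([0, 10, 40, 100]))  # [25.0, 50.0, 75.0]
```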
Imputation plugins
autoprognosis.plugins.imputers.plugin_hyperimpute module
- class HyperImputePlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
HyperImpute strategy, a generalized iterative imputation framework for adaptively and automatically configuring column-wise models and their hyperparameters.
- Parameters
classifier_seed – list. List of ClassifierPlugin names for the search pool.
regression_seed – list. List of RegressionPlugin names for the search pool.
imputation_order – int. 0 - ascending, 1 - descending, 2 - random
baseline_imputer – int. 0 - mean, 1 - median, 2 - most_frequent
optimizer – str. Hyperparam search strategy. Options: simple, hyperband, bayesian
class_threshold – int. Maximum number of unique items in a categorical column.
optimize_thresh – int. The number of subsamples used for the model search.
n_inner_iter – int. Number of imputation iterations.
select_model_by_column – bool. If False, reuse the first model selected in the current iteration for all columns. Else, search the model for each column.
select_model_by_iteration – bool. If False, reuse the models selected in the first iteration. Otherwise, refresh the models on each iteration.
select_lazy – bool. If True and there is a trend towards a certain model architecture, the loop reuses that architecture for all columns instead of calling the optimizer.
inner_loop_hook – Callable. Debug hook, called before each iteration.
random_state – int. Random seed.
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("hyperimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
Reference: “HyperImpute: Generalized Iterative Imputation with Automatic Model Selection”
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.imputers.plugin_EM module
- class EMPlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
The EM algorithm is an optimization algorithm that assumes a distribution for the partially missing data and tries to maximize the expected complete data log-likelihood under that distribution.
- Steps:
For an input dataset X with missing values, we assume that the values are sampled from distribution N(Mu, Sigma).
We generate the “observed” and “missing” masks from X, and choose some initial values for Mu = Mu0 and Sigma = Sigma0.
The EM loop tries to approximate the (Mu, Sigma) pair by some iterative means under the conditional distribution of missing components.
The E step finds the conditional expectation of the “missing” data, given the observed values and current estimates of the parameters. These expectations are then substituted for the “missing” data.
In the M step, maximum likelihood estimates of the parameters are computed as though the missing data had been filled in.
The X_reconstructed contains the approximation after each iteration.
- Args:
- maxit: int, default=500
maximum number of imputation rounds to perform.
- convergence_threshold: float, default=1e-08
Minimum ratio difference between iterations before stopping.
- random_state: int
Random seed
Paper: “Maximum Likelihood from Incomplete Data via the EM Algorithm”, A. P. Dempster, N. M. Laird and D. B. Rubin
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("EM")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
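To make the E and M steps described above concrete, the sketch below runs EM on a deliberately simplified case: two variables, where x is fully observed and y is sometimes missing, under a bivariate normal assumption. It is an illustration only and not the plugin's implementation. The function name is hypothetical, and it stops on an absolute parameter change rather than the ratio criterion mentioned in the arguments.

```python
from typing import List, Optional, Tuple

Pair = Tuple[float, Optional[float]]


def em_impute_bivariate(
    pairs: List[Pair], maxit: int = 500, convergence_threshold: float = 1e-8
) -> List[Tuple[float, float]]:
    """Impute missing y in (x, y) pairs; x must be observed and non-constant."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    observed = [(x, y) for x, y in pairs if y is not None]

    # x is complete, so its mean and variance never change.
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n

    # Initial values (Mu0, Sigma0) from the rows where y is observed.
    m = len(observed)
    mu_y = sum(y for _, y in observed) / m
    s_xy = sum((x - mu_x) * (y - mu_y) for x, y in observed) / m
    s_yy = sum((y - mu_y) ** 2 for _, y in observed) / m

    for _ in range(maxit):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy  # Var(y | x) under current estimates

        # E step: expected y and y^2 for each row, given x.
        e_y, e_yy = [], []
        for x, y in pairs:
            if y is None:
                cond_mean = mu_y + beta * (x - mu_x)
                e_y.append(cond_mean)
                e_yy.append(cond_mean**2 + resid_var)
            else:
                e_y.append(y)
                e_yy.append(y * y)

        # M step: maximum likelihood estimates as if the data were complete.
        new_mu_y = sum(e_y) / n
        new_s_xy = sum(x * ey for x, ey in zip(xs, e_y)) / n - mu_x * new_mu_y
        new_s_yy = sum(e_yy) / n - new_mu_y**2

        change = abs(new_mu_y - mu_y) + abs(new_s_xy - s_xy) + abs(new_s_yy - s_yy)
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < convergence_threshold:
            break

    beta = s_xy / s_xx
    return [(x, y if y is not None else mu_y + beta * (x - mu_x)) for x, y in pairs]


data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, None)]
completed = em_impute_bivariate(data)
print(completed[-1])
```

In this one-sided missingness pattern the EM estimate converges to the value predicted by a least-squares line fitted on the complete rows, which gives a simple way to check the result.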
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- plugin
autoprognosis.plugins.imputers.plugin_gain module
- class GainPlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
- GAIN Imputation for static data using Generative Adversarial Nets.
- The training steps are:
The generator imputes the missing components conditioned on what is actually observed, and outputs a completed vector.
The discriminator takes a completed vector and attempts to determine which components were actually observed and which were imputed.
Args:
- batch_size: int
The batch size for the training steps.
- n_epochs: int
Number of epochs for training.
- hint_rate: float
Percentage of additional information for the discriminator.
- loss_alpha: int
Hyperparameter for the generator loss.
Paper: J. Yoon, J. Jordon, M. van der Schaar, “GAIN: Missing Data Imputation using Generative Adversarial Nets,” ICML, 2018. Original code: https://github.com/jsyoon0823/GAIN
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("gain")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
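The data flow around the generator and discriminator described above can be sketched without any neural network. The block below shows one common formulation: a mask marks observed entries, missing slots are filled with small noise before entering the generator, the hint reveals a random fraction (hint_rate) of the mask to the discriminator, and the completed vector keeps observed values while taking generated values elsewhere. The helper names and the noise range are assumptions for illustration, not the plugin's code.

```python
import random
from typing import List, Optional, Sequence, Tuple


def prepare_inputs(
    row: Sequence[Optional[float]], hint_rate: float, rng: random.Random
) -> Tuple[List[float], List[float], List[float]]:
    """Build the noisy input, the mask and the hint vector for one row."""
    # mask: 1.0 where the value is observed, 0.0 where it is missing
    mask = [0.0 if v is None else 1.0 for v in row]
    # missing slots get small random noise as generator input
    noisy = [rng.uniform(0.0, 0.01) if v is None else v for v in row]
    # hint: each mask entry is revealed with probability hint_rate
    hint = [m if rng.random() < hint_rate else 0.0 for m in mask]
    return noisy, mask, hint


def complete(
    noisy: Sequence[float], mask: Sequence[float], generated: Sequence[float]
) -> List[float]:
    """Keep observed values and take the generator output for missing ones."""
    return [m * x + (1.0 - m) * g for x, m, g in zip(noisy, mask, generated)]


rng = random.Random(0)
noisy, mask, hint = prepare_inputs([1.0, None, 3.0], hint_rate=0.9, rng=rng)
# A constant vector stands in for the generator output here.
completed = complete(noisy, mask, generated=[9.0, 9.0, 9.0])
print(mask, hint, completed)
```

The discriminator would receive `completed` together with `hint` and try to predict `mask`; that training loop is omitted here.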
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- plugin
alias of
autoprognosis.plugins.imputers.plugin_gain.GainPlugin
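The shared plugin interface listed above (type, subtype, name, fqdn, fit, transform, is_fitted, save, load) can be pictured with a toy class. This is only a sketch under stated assumptions: ToyImputer, its "imputer"/"default"/"toy_constant" labels, the "." separator in fqdn, and the JSON byte format are all hypothetical and are not taken from the AutoPrognosis source.

```python
import json


class ToyImputer:
    """Hypothetical stand-in that mimics the documented method names."""

    def __init__(self, fill_value=0.0):
        self.fill_value = fill_value
        self._fitted = False

    @staticmethod
    def type():
        return "imputer"  # made-up label

    @staticmethod
    def subtype():
        return "default"  # made-up label

    @staticmethod
    def name():
        return "toy_constant"  # made-up label

    @classmethod
    def fqdn(cls):
        # Joins type, subtype and name; the "." separator is an assumption.
        return ".".join([cls.type(), cls.subtype(), cls.name()])

    def fit(self, rows):
        self._fitted = True
        return self

    def transform(self, rows):
        # Replace missing entries (None) with the constant fill value.
        return [[self.fill_value if v is None else v for v in row] for row in rows]

    def is_fitted(self):
        return self._fitted

    def save(self):
        # Serialize the state to bytes (JSON is an illustrative choice).
        state = {"fill_value": self.fill_value, "fitted": self._fitted}
        return json.dumps(state).encode("utf-8")

    @classmethod
    def load(cls, buff):
        state = json.loads(buff.decode("utf-8"))
        obj = cls(fill_value=state["fill_value"])
        obj._fitted = state["fitted"]
        return obj


plugin = ToyImputer(fill_value=-1.0).fit([[1.0, None]])
restored = ToyImputer.load(plugin.save())
```

The round trip through save and load returns an object with the same state, which is the property the bytes-based interface is meant to provide.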
autoprognosis.plugins.imputers.plugin_ice module
- class IterativeChainedEquationsPlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
Imputation plugin for completing missing values using the Multivariate Iterative chained equations Imputation strategy.
- Method:
Multivariate Iterative Chained Equations (MICE) methods model each feature with missing values as a function of the other features in a round-robin fashion. For each step of the round-robin imputation, we use a BayesianRidge estimator, which performs a regularized linear regression.
- Parameters
max_iter – int, default=500. Maximum number of imputation rounds to perform.
random_state – int, default set to the current time. Seed of the pseudo random number generator to use.
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("ice")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0         1         2         3
0  1.000000  1.000000  1.000000  1.000000
1  1.333333  1.666667  1.666667  1.333333
2  1.000000  2.000000  2.000000  1.000000
3  2.000000  2.000000  2.000000  2.000000
Reference: “mice: Multivariate Imputation by Chained Equations in R”, Stef van Buuren, Karin Groothuis-Oudshoorn
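The round-robin idea described in the Method paragraph can be sketched in plain Python. This is a simplified illustration, not the plugin's implementation: it handles only two columns, marks missing values with None, and swaps the BayesianRidge estimator for ordinary least squares. The names fit_line and chained_equations are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def chained_equations(rows, max_iter=100):
    """Impute a two-column table by regressing each column on the other."""
    missing = [[v is None for v in row] for row in rows]
    filled = [list(row) for row in rows]
    # Start from a column-mean fill.
    for j in range(2):
        observed = [row[j] for row in rows if row[j] is not None]
        col_mean = sum(observed) / len(observed)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = col_mean
    # Round-robin: refit column j on the other column, then re-impute column j.
    for _ in range(max_iter):
        for j in range(2):
            k = 1 - j
            train = [row for i, row in enumerate(filled) if not missing[i][j]]
            slope, intercept = fit_line([r[k] for r in train], [r[j] for r in train])
            for i, row in enumerate(filled):
                if missing[i][j]:
                    row[j] = intercept + slope * row[k]
    return filled


rows = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [None, 8.0]]
completed = chained_equations(rows)
```

In this toy table the observed pairs follow y = 2x, so the imputed entries drift toward 6 and 4 as the rounds proceed, while observed values are never modified.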
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.imputers.plugin_mice module
- class MicePlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
Imputation plugin for completing missing values using the Multivariate Iterative chained equations and multiple imputations.
- Method:
Multivariate Iterative Chained Equations (MICE) methods model each feature with missing values as a function of the other features in a round-robin fashion. For each step of the round-robin imputation, we use a BayesianRidge estimator, which performs a regularized linear regression. The class sklearn.impute.IterativeImputer is able to generate multiple imputations of the same incomplete dataset. We can then learn a regression or classification model on different imputations of the same dataset. Setting sample_posterior=True for the IterativeImputer will randomly draw values to fill each missing value from the Gaussian posterior of the predictions. If each IterativeImputer uses a different random_state, this results in multiple imputations, each of which can be used to train a predictive model. The final result is the average of all the n_imputations estimates.
- Parameters
n_imputations – int, default=5. Number of multiple imputations to perform.
max_iter – int, default=500. Maximum number of imputation rounds to perform.
random_state – int, default set to the current time. Seed of the pseudo random number generator to use.
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("mice")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0        1         2         3
0  1.000000  1.00000  1.000000  1.000000
1  1.222412  1.68686  1.687483  1.221473
2  1.000000  2.00000  2.000000  1.000000
3  2.000000  2.00000  2.000000  2.000000
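The "draw several imputations with different seeds, then average" step can be sketched as follows. This is a deliberately reduced illustration, not the plugin's code: it works on a single column, marks missing values with None, and replaces the posterior of a fitted regression with a Gaussian fitted to the observed values. The function name multiple_imputation and the base_seed argument are hypothetical.

```python
import random
import statistics


def multiple_imputation(column, n_imputations=5, base_seed=0):
    """Return the averaged column and the individual completed columns."""
    observed = [v for v in column if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    completed = []
    for k in range(n_imputations):
        # One random generator per imputation, each with its own seed.
        rng = random.Random(base_seed + k)
        completed.append(
            [v if v is not None else rng.gauss(mu, sigma) for v in column]
        )
    # The final estimate is the element-wise average of all imputations.
    averaged = [statistics.mean(values) for values in zip(*completed)]
    return averaged, completed


column = [1.0, None, 3.0, 5.0]
averaged, completed = multiple_imputation(column, n_imputations=5)
```

Observed entries are identical in every completed column, so averaging leaves them unchanged; only the missing entry reflects the spread across draws.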
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- plugin
alias of
autoprognosis.plugins.imputers.plugin_mice.MicePlugin
autoprognosis.plugins.imputers.plugin_missforest module
- class MissForestPlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
Imputation plugin for completing missing values using the MissForest strategy.
- Method:
Iterative Chained Equations (ICE) methods model each feature with missing values as a function of the other features in a round-robin fashion. For each step of the round-robin imputation, we use an ExtraTreesRegressor, which fits a number of randomized extra-trees and averages the results.
- Parameters
n_estimators – int, default=10. The number of trees in the forest.
max_iter – int, default=500. Maximum number of imputation rounds to perform.
random_state – int, default set to the current time. Seed of the pseudo random number generator to use.
- AutoPrognosis Hyperparameters:
n_estimators: The number of trees in the forest.
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("missforest")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
     0    1    2    3
0  1.0  1.0  1.0  1.0
1  1.0  1.9  1.9  1.0
2  1.0  2.0  2.0  1.0
3  2.0  2.0  2.0  2.0
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.imputers.plugin_sinkhorn module
- class SinkhornPlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
Sinkhorn imputation can be used to impute quantitative data. It relies on the idea that two batches drawn at random from the same dataset should share the same distribution, and it imputes missing values by minimizing optimal transport distances between batches.
- Args:
- eps: float, default=0.01
Sinkhorn regularization parameter.
- lr: float, default=0.01
Learning rate.
- opt: torch.optim.Optimizer, default=torch.optim.Adam
Optimizer class to use for fitting.
- n_epochs: int, default=15
Number of gradient updates for each model within a cycle.
- batch_size: int, default=256
Size of the batches on which the Sinkhorn divergence is evaluated.
- n_pairs: int, default=10
Number of batch pairs used per gradient update.
- noise: float, default=0.1
Noise used for the missing values initialization.
- scaling: float, default=0.9
Scaling parameter in Sinkhorn iterations.
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("sinkhorn")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0         1         2         3
0  1.000000  1.000000  1.000000  1.000000
1  1.404637  1.651113  1.651093  1.404638
2  1.000000  2.000000  2.000000  1.000000
3  2.000000  2.000000  2.000000  2.000000
- Reference: “Missing Data Imputation using Optimal Transport”, Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi
Original code: https://github.com/BorisMuzellec/MissingDataOT
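The role of the eps regularization parameter can be illustrated with a bare Sinkhorn iteration between two small one-dimensional batches. This sketch only computes an entropy-regularized transport plan in plain Python; it does not perform the gradient-based imputation described above. The function name sinkhorn_plan is hypothetical, and eps=0.5 is chosen for this toy example rather than the documented default.

```python
import math


def sinkhorn_plan(xs, ys, eps=0.5, n_iter=200):
    """Entropy-regularized optimal transport between two batches of scalars,
    using uniform weights and a squared-distance cost."""
    n, m = len(xs), len(ys)
    a = [1.0 / n] * n
    b = [1.0 / m] * m
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternate scaling so that the plan matches each marginal in turn.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total_cost


plan, total_cost = sinkhorn_plan([0.0, 1.0], [0.5, 1.5])
row_sums = [sum(row) for row in plan]
col_sums = [sum(col) for col in zip(*plan)]
```

After enough iterations the rows and columns of the plan sum to the batch weights, and total_cost is the transport cost that a Sinkhorn-based loss would build on.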
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- plugin
alias of
autoprognosis.plugins.imputers.plugin_sinkhorn.SinkhornPlugin
autoprognosis.plugins.imputers.plugin_softimpute module
- class SoftImputePlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
- The SoftImpute algorithm fits a low-rank matrix approximation to a matrix with missing values via nuclear-norm regularization. The algorithm can be used to impute quantitative data.
To calibrate the nuclear-norm regularization parameter (shrink_lambda), we perform cross-validation (_cv_softimpute).
- Args:
- maxit: int, default=500
Maximum number of imputation rounds to perform.
- convergence_threshold: float, default=1e-5
Minimum ratio difference between iterations before stopping.
- max_rank: int, default=2
Perform a truncated SVD on each iteration with this value as its rank.
- shrink_lambda: float, default=0
Value by which we shrink singular values on each iteration. If it’s missing, it is calibrated using cross validation.
- cv_len: int, default=15
The length of the grid on which the cross-validation is performed.
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("softimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
              0             1             2             3
0  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00
1  3.820605e-16  1.708249e-16  1.708249e-16  3.820605e-16
2  1.000000e+00  2.000000e+00  2.000000e+00  1.000000e+00
3  2.000000e+00  2.000000e+00  2.000000e+00  2.000000e+00
Reference: “Spectral Regularization Algorithms for Learning Large Incomplete Matrices”, by Mazumder, Hastie, and Tibshirani.
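How max_rank and shrink_lambda act on the singular values in each iteration can be shown with a tiny sketch. It covers only the truncate-and-shrink step, not the SVD or the full iteration, and the helper name shrink_singular_values is hypothetical.

```python
def shrink_singular_values(singular_values, shrink_lambda, max_rank=None):
    """Keep the largest max_rank singular values, then soft-threshold them."""
    ordered = sorted(singular_values, reverse=True)
    if max_rank is not None:
        ordered = ordered[:max_rank]
    # Soft-thresholding: subtract lambda and clip at zero.
    return [max(s - shrink_lambda, 0.0) for s in ordered]


shrunk = shrink_singular_values([3.0, 1.5, 0.2], shrink_lambda=0.5, max_rank=2)
clipped = shrink_singular_values([0.2], shrink_lambda=0.5)
```

Larger values of shrink_lambda push more singular values to zero, which lowers the effective rank of the reconstructed matrix.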
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.imputers.plugin_mean module
- class MeanPlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
Imputation plugin for completing missing values using the Mean Imputation strategy.
- Method:
The Mean Imputation strategy replaces the missing values using the mean along each column.
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("mean")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
          0         1         2         3
0  1.000000  1.000000  1.000000  1.000000
1  1.333333  1.666667  1.666667  1.333333
2  1.000000  2.000000  2.000000  1.000000
3  2.000000  2.000000  2.000000  2.000000
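For readers who want to check the numbers in the example by hand, column-mean imputation can be written in a few lines of plain Python. This is an independent sketch on nested lists, not the plugin's implementation, and the function name mean_impute is hypothetical.

```python
import math


def mean_impute(rows):
    """Replace each NaN with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if math.isnan(v) else float(v) for j, v in enumerate(row)]
        for row in rows
    ]


nan = float("nan")
data = [[1, 1, 1, 1], [nan, nan, nan, nan], [1, 2, 2, 1], [2, 2, 2, 2]]
imputed = mean_impute(data)
```

The second row becomes the column means, 4/3 and 5/3, which matches the 1.333333 and 1.666667 shown in the example output.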
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- plugin
alias of
autoprognosis.plugins.imputers.plugin_mean.MeanPlugin
autoprognosis.plugins.imputers.plugin_median module
- class MedianPlugin(random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.imputers.base.ImputerPlugin
Imputation plugin for completing missing values using the Median Imputation strategy.
- Method:
The Median Imputation strategy replaces the missing values using the median along each column.
Example
>>> import numpy as np
>>> from autoprognosis.plugins.imputers import Imputers
>>> plugin = Imputers().get("median")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
     0    1    2    3
0  1.0  1.0  1.0  1.0
1  1.0  2.0  2.0  1.0
2  1.0  2.0  2.0  1.0
3  2.0  2.0  2.0  2.0
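The same example can be reproduced with the standard library's statistics.median, which makes the per-column rule explicit. This is a standalone sketch, not the plugin's code, and the function name median_impute is hypothetical.

```python
import math
import statistics


def median_impute(rows):
    """Replace each NaN with the median of the observed values in its column."""
    medians = [
        statistics.median(v for v in column if not math.isnan(v))
        for column in zip(*rows)
    ]
    return [
        [float(medians[j]) if math.isnan(v) else float(v) for j, v in enumerate(row)]
        for row in rows
    ]


nan = float("nan")
data = [[1, 1, 1, 1], [nan, nan, nan, nan], [1, 2, 2, 1], [2, 2, 2, 2]]
imputed = median_impute(data)
```

With observed column values (1, 1, 2) and (1, 2, 2), the medians are 1 and 2, so the imputed row is less sensitive to the single larger value than the mean would be.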
- change_output(output: str) None
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.imputers.base.ImputerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- plugin
alias of
autoprognosis.plugins.imputers.plugin_median.MedianPlugin
Preprocessing plugins
autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_data_cleanup module
autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_fast_ica module
- class FastICAPlugin(model: Optional[Any] = None, random_state: int = 0, n_components: int = 2, max_iter=10000)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for dimensionality reduction based on Independent Component Analysis algorithm.
- Method:
Independent component analysis separates a multivariate signal into additive subcomponents that are maximally independent.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html
- Parameters
n_components – int Number of components to use.
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("fast_ica")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_fast_ica.FastICAPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_feature_agglomeration module
- class FeatureAgglomerationPlugin(model: Optional[Any] = None, random_state: int = 0, n_clusters: int = 2)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for dimensionality reduction based on Feature Agglomeration algorithm.
- Method:
FeatureAgglomeration uses agglomerative clustering to group together features that look very similar, thus decreasing the number of features.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.FeatureAgglomeration.html
- Parameters
n_clusters – int Number of clusters to find.
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("feature_agglomeration")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_feature_agglomeration.FeatureAgglomerationPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_gauss_projection module
- class GaussianRandomProjectionPlugin(random_state: int = 0, model: Optional[Any] = None, n_components: int = 2)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for dimensionality reduction based on Gaussian random projection algorithm.
- Method:
The Gaussian random projection reduces the dimensionality by projecting the original input space on a randomly generated matrix where components are drawn from N(0, 1 / n_components).
- Reference:
- Parameters
n_components – int Number of components to use.
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("gauss_projection")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
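The projection described in the Method paragraph amounts to multiplying the data by a random matrix whose entries are drawn from N(0, 1 / n_components). A minimal plain-Python sketch of that idea follows; it is not the plugin's implementation, and the function name gaussian_random_projection and the seed argument are hypothetical.

```python
import math
import random


def gaussian_random_projection(rows, n_components=2, seed=0):
    """Project rows onto n_components random Gaussian directions."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Variance 1 / n_components means standard deviation sqrt(1 / n_components).
    scale = math.sqrt(1.0 / n_components)
    matrix = [
        [rng.gauss(0.0, scale) for _ in range(n_components)]
        for _ in range(n_features)
    ]
    return [
        [sum(row[f] * matrix[f][c] for f in range(n_features))
         for c in range(n_components)]
        for row in rows
    ]


samples = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [0.0, 0.0, 0.0, 0.0]]
projected = gaussian_random_projection(samples, n_components=2, seed=0)
```

Each four-feature row is mapped to two values, and fixing the seed makes the projection repeatable, which mirrors the role of random_state in the class signature.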
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_gauss_projection.GaussianRandomProjectionPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_pca module
- class PCAPlugin(random_state: int = 0, model: Optional[Any] = None, n_components: int = 2)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for dimensionality reduction based on the PCA method.
- Method:
PCA is used to decompose a multivariate dataset into a set of successive orthogonal components that explain a maximum amount of the variance.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
- Parameters
n_components – int Number of components to use.
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("pca")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
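To make the "direction of maximum variance" concrete, the sketch below finds the first principal component with power iteration on the sample covariance matrix. It is an illustration under stated assumptions: `first_principal_component` is a hypothetical helper, and power iteration is a simple stand-in for the decomposition a PCA implementation would normally use.

```python
import math


def first_principal_component(rows, n_iter=100):
    """Return the unit-length direction of maximum variance (illustrative helper)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(n_iter):
        # Multiply by the covariance matrix and renormalise; v converges to the top eigenvector.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# All points lie on the line y = 2x, so the first component points along (1, 2).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc1 = first_principal_component(rows)
print(pc1)
```

Projecting the centered data onto `pc1` gives the first of the `n_components` output columns; further components repeat the procedure in the orthogonal complement.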
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_pca.PCAPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_variance_threshold module
- class VarianceThresholdPlugin(random_state: int = 0, model: Optional[Any] = None, threshold: float = 0.001)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for dimensionality reduction based on removing features with low variance.
- Method:
VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. With scikit-learn's default threshold of 0, only zero-variance features (those that have the same value in all samples) are removed; this plugin defaults to threshold=0.001.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html
- Parameters
threshold – float Features with a training-set variance lower than this threshold will be removed.
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors(category="dimensionality_reduction").get("variance_threshold", threshold=1.0)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
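The selection rule can be sketched with the standard library alone. The helper `variance_threshold` below is hypothetical and assumes population variance with a strict "greater than" comparison, which mirrors the scikit-learn estimator the Reference points to.

```python
from statistics import pvariance


def variance_threshold(rows, threshold=0.001):
    """Keep only the columns whose population variance exceeds the threshold (illustrative helper)."""
    columns = list(zip(*rows))
    kept = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    return kept, [[row[j] for j in kept] for row in rows]


rows = [[1.0, 0.5, 3.0], [1.0, 0.7, 1.0], [1.0, 0.6, 2.0]]
kept, reduced = variance_threshold(rows, threshold=0.001)
print(kept)     # column 0 is constant, so only columns 1 and 2 survive
print(reduced)
```

Raising the threshold to 0.01 would also drop column 1, whose variance is about 0.0067.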
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.dimensionality_reduction.plugin_variance_threshold.VarianceThresholdPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.feature_scaling.plugin_feature_normalizer module
- class FeatureNormalizerPlugin(random_state: int = 0, model: Optional[Any] = None)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for sample normalization based on L2 normalization.
- Method:
Normalization is the process of scaling individual samples to have unit norm.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("feature_normalizer")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
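Note that this normalization works per sample (row), not per feature (column). A small stand-alone sketch, with a hypothetical helper `l2_normalize` that leaves all-zero rows unchanged:

```python
import math


def l2_normalize(rows):
    """Scale every row to unit Euclidean norm (illustrative helper)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # A zero row has no direction, so it is passed through untouched.
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


rows = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
normalized = l2_normalize(rows)
print(normalized)  # the first row becomes [0.6, 0.8]
```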
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_feature_normalizer.FeatureNormalizerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.feature_scaling.plugin_maxabs_scaler module
- class MaxAbsScalerPlugin(random_state: int = 0, model: Optional[Any] = None)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for feature scaling based on maximum absolute value.
- Method:
The MaxAbs estimator scales each feature individually, without shifting or centering the data, such that the maximal absolute value of each feature in the training set will be 1.0.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("maxabs_scaler")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
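As a worked example of the scaling rule, the hypothetical helper `maxabs_scale` divides each column by its largest absolute value. Falling back to a scale of 1.0 for all-zero columns is an assumption of this sketch.

```python
def maxabs_scale(rows):
    """Divide each column by its maximum absolute value (illustrative helper)."""
    columns = list(zip(*rows))
    # `or 1.0` avoids division by zero for columns that are entirely zero.
    scales = [max(abs(x) for x in col) or 1.0 for col in columns]
    return [[x / s for x, s in zip(row, scales)] for row in rows]


rows = [[1.0, -2.0], [2.0, 4.0], [-4.0, 1.0]]
scaled = maxabs_scale(rows)
print(scaled)  # [[0.25, -0.5], [0.5, 1.0], [-1.0, 0.25]]
```

Signs are preserved and zeros stay zero, which is why this scaler suits sparse data.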
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_maxabs_scaler.MaxAbsScalerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.feature_scaling.plugin_minmax_scaler module
- class MinMaxScalerPlugin(random_state: int = 0, model: Optional[Any] = None)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for feature scaling to a given range.
- Method:
The MinMax estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("minmax_scaler")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
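The mapping to a target range follows the formula `lo + (x - min) / (max - min) * (hi - lo)`, applied column by column. A minimal sketch with a hypothetical helper `minmax_scale`; the `feature_range` argument name is borrowed from scikit-learn and is not part of the plugin signature shown above.

```python
def minmax_scale(rows, feature_range=(0.0, 1.0)):
    """Rescale each column linearly into feature_range (illustrative helper)."""
    lo, hi = feature_range
    columns = list(zip(*rows))
    mins = [min(c) for c in columns]
    # `or 1.0` keeps constant columns from causing a division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        [lo + (x - m) / s * (hi - lo) for x, m, s in zip(row, mins, spans)]
        for row in rows
    ]


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]]
scaled = minmax_scale(rows)
print(scaled)  # column minima map to 0.0 and column maxima to 1.0
```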
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_minmax_scaler.MinMaxScalerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.feature_scaling.plugin_scaler module
- class ScalerPlugin(random_state: int = 0, model: Optional[Any] = None)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for feature scaling based on the StandardScaler implementation.
- Method:
The Scaler plugin standardizes the features by removing the mean and scaling to unit variance.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("scaler")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
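Standardization computes `(x - mean) / std` for every feature. The sketch below uses the population standard deviation; the helper name `standardize` and the choice to leave constant columns at zero are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def standardize(rows):
    """Centre each column on zero and scale it to unit variance (illustrative helper)."""
    columns = list(zip(*rows))
    means = [fmean(c) for c in columns]
    # A constant column has zero deviation; dividing by 1.0 leaves it at 0 after centring.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


rows = [[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]
standardized = standardize(rows)
print(standardized)
```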
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_scaler.ScalerPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.feature_scaling.plugin_normal_transform module
- class NormalTransformPlugin(random_state: int = 0, n_quantiles: int = 100, model: Optional[Any] = None)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for feature scaling based on quantile information.
- Method:
This method transforms the features to follow a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("normal_transform")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
            0         1         2         3
0   -0.701131  1.061219 -1.205040 -1.138208
1   -1.154434 -0.084214 -1.205040 -1.138208
2   -1.523968  0.443066 -1.674870 -1.138208
3   -1.710095  0.229099 -0.836836 -1.138208
4   -0.923581  1.222611 -1.205040 -1.138208
..        ...       ...       ...       ...
145  1.017901 -0.084214  0.778555  1.523968
146  0.509020 -1.297001  0.547708  0.813193
147  0.778555 -0.084214  0.778555  0.949666
148  0.378986  0.824957  0.869109  1.523968
149  0.109568 -0.084214  0.669219  0.627699

[150 rows x 4 columns]
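The idea of mapping a feature onto a normal distribution through its quantiles can be sketched as follows. This is a simplified rank-based version, not the interpolation scheme of the referenced QuantileTransformer: the helper `normal_quantile_transform` is hypothetical, and ties are broken by input order.

```python
from statistics import NormalDist


def normal_quantile_transform(values):
    """Map values to standard-normal scores through their empirical ranks (illustrative helper)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1),
        # where the inverse normal CDF is finite.
        out[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return out


values = [1.0, 100.0, 2.0, 3.0, 2.5]
scores = normal_quantile_transform(values)
print(scores)  # the outlier 100.0 ends up as an ordinary upper-tail score
```

Because only ranks matter, the outlier at 100.0 receives the same score it would get if it were 3.1, which is what makes this transform robust to extreme values.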
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_normal_transform.NormalTransformPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.preprocessors.feature_scaling.plugin_uniform_transform module
- class UniformTransformPlugin(random_state: int = 0, n_quantiles: int = 100, model: Optional[Any] = None)
Bases:
autoprognosis.plugins.preprocessors.base.PreprocessorPlugin
Preprocessing plugin for feature scaling based on quantile information.
- Method:
This method transforms the features to follow a uniform distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values.
- Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html
Example
>>> from autoprognosis.plugins.preprocessors import Preprocessors
>>> plugin = Preprocessors().get("uniform_transform")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_transform(X, y)
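For the uniform case, each value is replaced by its position in the empirical distribution. The sketch below assumes distinct values and a simple `rank / (n - 1)` mapping onto [0, 1]; `uniform_quantile_transform` is a hypothetical helper and does not reproduce the `n_quantiles` interpolation of the real estimator.

```python
def uniform_quantile_transform(values):
    """Replace each value by its normalised rank in [0, 1] (illustrative helper)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # The smallest value maps to 0.0 and the largest to 1.0.
        out[i] = rank / (n - 1) if n > 1 else 0.0
    return out


values = [1.0, 100.0, 2.0, 3.0, 2.5]
uniform = uniform_quantile_transform(values)
print(uniform)  # [0.0, 1.0, 0.25, 0.75, 0.5]
```

The output is evenly spaced regardless of how clustered the inputs were, which is the "spreading out" effect mentioned under Method.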
- change_output(output: str) None
- static components_interval(*args: Any, **kwargs: Any) Tuple[int, int]
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.preprocessors.feature_scaling.plugin_uniform_transform.UniformTransformPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
Prediction plugins
Classifiers
autoprognosis.plugins.prediction.classifiers.plugin_adaboost module
- class AdaBoostPlugin(estimator: int = 0, n_estimators: int = 10, learning_rate: float = 0.1, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the AdaBoost estimator.
- Method:
An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
- Parameters
estimator – int Base Learner to use. 0: HistGradientBoostingClassifier, 1: CatBoostClassifier, 2: LGBM, 3: LogisticRegression
n_estimators – int The maximum number of estimators at which boosting is terminated.
learning_rate – float Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. There is a trade-off between the learning_rate and n_estimators parameters.
calibration – int Enable/disable calibration. 0: disabled, 1: sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("adaboost")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
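The reweighting step mentioned under Method, and the role of learning_rate, can be shown with one boosting round for binary labels. This is a sketch of the classic discrete AdaBoost update, not the plugin's code; `adaboost_round` is a hypothetical helper, its default `learning_rate=1.0` differs from the plugin default, and it assumes the weighted error lies strictly between 0 and 1.

```python
import math


def adaboost_round(y_true, y_pred, weights, learning_rate=1.0):
    """Perform one reweighting step of discrete AdaBoost (illustrative helper)."""
    miss = [yt != yp for yt, yp in zip(y_true, y_pred)]
    # Weighted error rate of the current weak classifier.
    err = sum(w for w, m in zip(weights, miss) if m) / sum(weights)
    # Classifier weight; learning_rate shrinks each classifier's contribution.
    alpha = learning_rate * math.log((1.0 - err) / err)
    # Misclassified samples are up-weighted, then all weights are renormalised.
    new = [w * math.exp(alpha) if m else w for w, m in zip(weights, miss)]
    total = sum(new)
    return alpha, [w / total for w in new]


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]  # only the second sample is misclassified
weights = [0.2] * 5
alpha, new_weights = adaboost_round(y_true, y_pred, weights)
print(alpha, new_weights)
```

After this round the single misclassified sample carries half of the total weight, so the next classifier is pushed to get it right. A smaller learning_rate softens that shift, which is the trade-off with n_estimators noted in the parameter description.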
- base_estimators = [sklearn.ensemble.HistGradientBoostingClassifier, catboost.CatBoostClassifier, sklearn.base.ClassifierMixin, sklearn.linear_model.LogisticRegression]
- calibrations = ['none', 'sigmoid', 'isotonic']
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_adaboost.AdaBoostPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_bagging module
- class BaggingPlugin(n_estimators: int = 10, max_samples: float = 1.0, max_features: float = 1.0, estimator: int = 0, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the Bagging estimator.
- Method:
A Bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction.
- Parameters
n_estimators – int The number of base estimators in the ensemble.
max_samples – float The fraction of samples to draw from X to train each base estimator.
max_features – float The fraction of features to draw from X to train each base estimator.
estimator – int Base estimator to use. 0: HistGradientBoostingClassifier, 1: CatBoostClassifier, 2: LGBM, 3: LogisticRegression.
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("bagging")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
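The bagging idea described under Method can be sketched in plain Python. This is a minimal illustration, not the plugin's implementation: the nearest-centroid base classifier and the helper names (fit_centroid, bagging_fit, bagging_predict) are assumptions made for the example, and only majority voting is shown.

```python
import random
from collections import Counter


def fit_centroid(points, labels):
    """Tiny base classifier for 1-D data: store the mean of each class."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
    return {y: sums[y] / counts[y] for y in sums}


def predict_centroid(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


def bagging_fit(points, labels, n_estimators=10, max_samples=1.0, seed=0):
    """Fit each base classifier on a bootstrap sample of the data."""
    rng = random.Random(seed)
    k = max(1, int(max_samples * len(points)))
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(points)) for _ in range(k)]  # sample with replacement
        models.append(
            fit_centroid([points[i] for i in idx], [labels[i] for i in idx])
        )
    return models


def bagging_predict(models, x):
    """Aggregate the base predictions by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


points = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5]
labels = [0] * 6 + [1] * 6
models = bagging_fit(points, labels, n_estimators=10, max_samples=1.0)
print(bagging_predict(models, 0.25), bagging_predict(models, 5.25))
```

Here n_estimators and max_samples play the same role as the parameters of the same name above, with max_samples interpreted as a fraction of the training set.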
- base_estimators = [sklearn.ensemble.HistGradientBoostingClassifier, catboost.CatBoostClassifier, sklearn.base.ClassifierMixin, sklearn.linear_model.LogisticRegression]
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_bagging.BaggingPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
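The save() and load() pair above describes a round trip through bytes. A minimal sketch of that pattern follows; ToyModel is hypothetical and the use of pickle is an assumption made for illustration, since this reference does not say which serializer the library uses.

```python
import pickle


class ToyModel:
    """Hypothetical plugin-like object; not the AutoPrognosis implementation."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def save(self) -> bytes:
        # Assumption: pickle is used here purely for illustration.
        return pickle.dumps(self.__dict__)

    @classmethod
    def load(cls, buff: bytes) -> "ToyModel":
        obj = cls()
        obj.__dict__.update(pickle.loads(buff))
        return obj


buff = ToyModel(threshold=0.3).save()
restored = ToyModel.load(buff)
print(type(buff).__name__, restored.threshold)
```

Because load() is a classmethod, a plugin can be restored without constructing an instance first.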
autoprognosis.plugins.prediction.classifiers.plugin_catboost module
- class CatBoostPlugin(n_estimators: int = 100, depth: int = 5, grow_policy: int = 0, l2_leaf_reg: float = 3, learning_rate: float = 0.001, min_data_in_leaf: int = 1, random_strength: float = 1, random_state: int = 0, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the CatBoost framework.
- Method:
CatBoost provides a gradient boosting framework that handles categorical features using a permutation-driven alternative to the classical algorithm. It uses Ordered Boosting to reduce overfitting and Symmetric Trees for faster execution.
- Parameters
n_estimators – int Number of gradient boosted trees. Equivalent to number of boosting rounds.
depth – int Depth of the tree.
grow_policy – int The tree growing policy. Defines how to perform greedy tree construction. The supported values are listed in the grow_policies attribute.
l2_leaf_reg – float Coefficient at the L2 regularization term of the cost function.
learning_rate – float The learning rate used for reducing the gradient step.
min_data_in_leaf – int The minimum number of training samples in a leaf.
random_strength – float The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("catboost")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- grow_policies = ['Depthwise', 'SymmetricTree', 'Lossguide']
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_catboost.CatBoostPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_decision_trees module
- class DecisionTreePlugin(criterion: int = 0, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on decision trees.
- Method:
Decision Trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.
- Parameters
criterion – int The function to measure the quality of a split. Supported criteria are “gini”(0) for the Gini impurity and “entropy”(1) for the information gain.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("decision_trees")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
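The two split criteria named in the parameter list can be written out directly. The sketch below computes Gini impurity and entropy for a list of labels and selects between them with an integer code, as the criterion parameter does. The CRITERIONS list copies the class attribute criterions; the function names are hypothetical.

```python
import math
from collections import Counter

# Same ordering as the `criterions` class attribute: 0 -> gini, 1 -> entropy.
CRITERIONS = ["gini", "entropy"]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity(labels, criterion=0):
    """Dispatch on the integer criterion code."""
    functions = {"gini": gini, "entropy": entropy}
    return functions[CRITERIONS[criterion]](labels)


balanced = [0, 0, 1, 1]
print(impurity(balanced, criterion=0), impurity(balanced, criterion=1))
```

A split is preferred when it lowers the weighted impurity of the child nodes relative to the parent; both measures are zero for a node that contains a single class.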
- change_output(output: str) None
- criterions = ['gini', 'entropy']
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_decision_trees.DecisionTreePlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
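The sample_hyperparameters_np method above returns a dict of sampled values and takes a random_state. A rough sketch of that pattern is shown here; the search space SPACE and the function sample_from_space are hypothetical and do not reflect the plugin's actual search domain or sampling code.

```python
import random
from typing import Any, Dict, List

# Hypothetical search space: parameter name -> candidate values.
SPACE: Dict[str, List[Any]] = {
    "criterion": [0, 1],
    "calibration": [0, 1, 2],
}


def sample_from_space(random_state: int = 0) -> Dict[str, Any]:
    """Draw one value per parameter using a seeded generator."""
    rng = random.Random(random_state)
    return {name: rng.choice(values) for name, values in SPACE.items()}


first = sample_from_space(random_state=0)
second = sample_from_space(random_state=0)
print(first, first == second)
```

Seeding the generator makes a sampled configuration reproducible, which helps when comparing runs.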
autoprognosis.plugins.prediction.classifiers.plugin_extra_tree_classifier module
- class ExtraTreeClassifierPlugin(criterion: int = 0, calibration: int = 0, random_state: int = 0, model: Optional[Any] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on extra-trees classifier.
- Method:
The Extra-Trees classifier implements a meta-estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control overfitting.
- Parameters
criterion – int The function to measure the quality of a split. Supported criteria are “gini”(0) for the Gini impurity and “entropy”(1) for the information gain.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("extra_tree_classifier")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
- change_output(output: str) None
- criterions = ['gini', 'entropy']
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_extra_tree_classifier.ExtraTreeClassifierPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
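The score method above defaults to metric='aucroc', which this sketch assumes means the area under the ROC curve for a binary task. The auroc function is a hypothetical pure-Python version of that quantity and may not match the library's evaluation code: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def auroc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (binary labels 0/1)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.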
autoprognosis.plugins.prediction.classifiers.plugin_gaussian_naive_bayes module
- class GaussianNaiveBayesPlugin(calibration: int = 0, random_state: int = 0, model: Optional[Any] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the Gaussian Naive Bayes algorithm.
- Method:
The plugin implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian.
- Parameters
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("gaussian_naive_bayes")
>>> plugin.fit_predict(...)
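To make the Gaussian likelihood assumption concrete, the following sketch fits a per-class mean and variance for each feature and classifies by the largest log-posterior. It is a from-scratch illustration under the naive independence assumption, with hypothetical helper names (fit_gnb, predict_gnb), and is not the plugin's code.

```python
import math


def fit_gnb(rows, labels):
    """Estimate the prior, feature means and feature variances per class."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # floor avoids division by zero
            for col, m in zip(columns, means)
        ]
        model[c] = (n / len(rows), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gnb(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def score(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances)
        )
    return max(model, key=score)


rows = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
labels = [0, 0, 0, 1, 1, 1]
gnb = fit_gnb(rows, labels)
print(predict_gnb(gnb, [1.1, 2.1]), predict_gnb(gnb, [4.1, 5.1]))
```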
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_gaussian_naive_bayes.GaussianNaiveBayesPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_gradient_boosting module
- class GradientBoostingPlugin(n_estimators: int = 100, learning_rate: float = 0.1, max_depth: int = 6, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the Gradient boosting method.
- Method:
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest.
- Parameters
n_estimators – int The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
learning_rate – float Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
max_depth – int The maximum depth of the individual regression estimators.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("gradient_boosting")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
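The trade-off between learning_rate and n_estimators can be seen in a small sketch. For simplicity it boosts one-split regression stumps on squared error over 1-D data rather than building a classifier, so it illustrates shrinkage only and is not the plugin's algorithm; fit_stump, boost and sse are hypothetical helpers.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]


def boost(xs, ys, n_estimators, learning_rate):
    """Fit stumps to residuals in sequence, shrinking each by learning_rate."""
    preds = [sum(ys) / len(ys)] * len(ys)  # start from the mean
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        preds = [
            p + learning_rate * (lm if x <= t else rm) for x, p in zip(xs, preds)
        ]
    return preds


def sse(ys, preds):
    """Sum of squared errors on the training data."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
slow_few = sse(ys, boost(xs, ys, n_estimators=5, learning_rate=0.1))
slow_many = sse(ys, boost(xs, ys, n_estimators=50, learning_rate=0.1))
fast_few = sse(ys, boost(xs, ys, n_estimators=5, learning_rate=1.0))
print(slow_few, slow_many, fast_few)
```

With a small learning rate each stage contributes little, so more stages are needed to reach the same training error. The sketch measures training error only and says nothing about generalization.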
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_gradient_boosting.GradientBoostingPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_knn module
- class KNNPlugin(n_neighbors: int = 5, weights: int = 0, algorithm: int = 0, leaf_size: int = 30, p: int = 2, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the k-nearest neighbors vote.
- Method:
Neighbors-based classification is a type of instance-based learning or non-generalizing learning: it does not attempt to construct a general internal model, but simply stores instances of the training data. Classification is computed from a simple majority vote of the nearest neighbors of each point: a query point is assigned the data class which has the most representatives within the nearest neighbors of the point.
- Parameters
n_neighbors – int Number of neighbors to use
weights – int Weight function used in prediction, given as an index into the weights attribute: 0: “uniform”, 1: “distance”.
algorithm – int Algorithm used to compute the nearest neighbors, given as an index into the algorithms attribute: 0: “auto”, 1: “ball_tree”, 2: “kd_tree”, 3: “brute”.
leaf_size – int Leaf size passed to BallTree or KDTree.
p – int Power parameter for the Minkowski metric.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("knn")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
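The majority vote described under Method can be sketched in a few lines. This brute-force version uses uniform weights and the Minkowski distance with power p; the helper names are hypothetical and the code is an illustration, not the plugin's implementation.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_rows, train_labels, query, n_neighbors=5, p=2):
    """Assign the most common label among the n_neighbors closest points."""
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: minkowski(pair[0], query, p),
    )
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


train_rows = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_rows, train_labels, [0.5, 0.5], n_neighbors=3))
print(knn_predict(train_rows, train_labels, [5.5, 5.5], n_neighbors=3))
```

No model is built during training: the stored points are the model, which is why the method is called instance-based.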
- algorithms = ['auto', 'ball_tree', 'kd_tree', 'brute']
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_knn.KNNPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- weights = ['uniform', 'distance']
autoprognosis.plugins.prediction.classifiers.plugin_lda module
- class LinearDiscriminantAnalysisPlugin(calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on Linear Discriminant Analysis.
- Method:
The plugin is based on Linear Discriminant Analysis, a classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.
- Parameters
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("lda")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
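For a single feature, the model described under Method reduces to a short computation: one mean per class, one pooled variance shared by all classes, and a linear discriminant derived from Bayes' rule. The sketch below is a 1-D illustration with hypothetical helper names, not the plugin's implementation.

```python
import math


def fit_lda_1d(xs, labels):
    """Estimate class means, class priors and the pooled (shared) variance."""
    classes = sorted(set(labels))
    n = len(xs)
    means = {
        c: sum(x for x, y in zip(xs, labels) if y == c) / labels.count(c)
        for c in classes
    }
    priors = {c: labels.count(c) / n for c in classes}
    pooled = sum((x - means[y]) ** 2 for x, y in zip(xs, labels)) / (n - len(classes))
    return means, priors, pooled


def discriminant(x, mean, prior, var):
    """Linear discriminant: log posterior up to a class-independent constant."""
    return x * mean / var - mean ** 2 / (2 * var) + math.log(prior)


def predict_lda_1d(model, x):
    """Choose the class with the largest discriminant value."""
    means, priors, var = model
    return max(means, key=lambda c: discriminant(x, means[c], priors[c], var))


xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
labels = [0, 0, 0, 1, 1, 1]
lda = fit_lda_1d(xs, labels)
print(predict_lda_1d(lda, 3.0), predict_lda_1d(lda, 5.0))
```

Because the variance is shared, the quadratic term in x cancels between classes, which is what makes the decision boundary linear.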
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_lda.LinearDiscriminantAnalysisPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_lgbm module
- class LightGBMPlugin(n_estimators: int = 100, boosting_type: str = 'gbdt', learning_rate: float = 0.01, max_depth: int = 6, reg_lambda: float = 0.001, reg_alpha: float = 0.001, colsample_bytree: float = 0.1, subsample: float = 0.1, num_leaves: int = 31, min_child_samples: int = 1, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on LightGBM.
- Method:
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient boosted trees, which usually outperforms random forest.
- Parameters
n_estimators – int The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
learning_rate – float Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
max_depth – int The maximum depth of the individual regression estimators.
boosting_type – str ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.
objective – str Specify the learning task and the corresponding learning objective or a custom objective function to be used.
reg_lambda – float L2 regularization term on weights.
reg_alpha – float L1 regularization term on weights.
colsample_bytree – float Subsample ratio of columns when constructing each tree.
subsample – float Subsample ratio of the training instance.
num_leaves – int Maximum tree leaves for base learners.
min_child_samples – int Minimum number of data points needed in a child (leaf).
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("lgbm")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
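The trade-off between learning_rate and n_estimators noted in the parameter list can be illustrated with a minimal sketch. It assumes squared-error loss and one-split regression stumps as weak learners, and it is plain Python for illustration only, not LightGBM's implementation. The helpers fit_stump, boost and mse are hypothetical names.

```python
from typing import Callable, List


def fit_stump(x: List[float], residual: List[float]) -> Callable[[float], float]:
    """Least-squares regression stump with a single split (hypothetical helper)."""
    best_sse = float("inf")
    best = (x[0], 0.0, 0.0)  # (threshold, left value, right value)
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= threshold]
        right = [r for xi, r in zip(x, residual) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else left_mean
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if sse < best_sse:
            best_sse, best = sse, (threshold, left_mean, right_mean)
    threshold, left_value, right_value = best
    return lambda xi: left_value if xi <= threshold else right_value


def boost(x: List[float], y: List[float], n_estimators: int, learning_rate: float) -> List[float]:
    """Each stage fits a stump to the residuals; learning_rate shrinks its contribution."""
    pred = [0.0] * len(y)
    for _ in range(n_estimators):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return pred


def mse(y: List[float], pred: List[float]) -> float:
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

# A small learning rate needs many stages to reach the same training error.
few_small_steps = mse(y, boost(x, y, n_estimators=10, learning_rate=0.01))
many_small_steps = mse(y, boost(x, y, n_estimators=500, learning_rate=0.01))
few_large_steps = mse(y, boost(x, y, n_estimators=10, learning_rate=0.5))
```

On this toy data, both many_small_steps and few_large_steps end up well below few_small_steps, which is the trade-off in practice: lowering learning_rate generally calls for raising n_estimators.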
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_lgbm.LightGBMPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The type of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
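The save() and load() methods listed for this plugin describe a bytes round trip: save() returns bytes and the load() classmethod rebuilds a plugin from them. The sketch below shows that contract on a toy class. ToyPlugin is hypothetical, and using pickle is an assumption; the library's actual serialization may differ.

```python
import pickle
from typing import Any, Dict


class ToyPlugin:
    """Hypothetical stand-in for a plugin; not an AutoPrognosis class."""

    def __init__(self, **params: Any) -> None:
        self.params: Dict[str, Any] = dict(params)
        self.fitted = False

    def fit(self) -> "ToyPlugin":
        self.fitted = True  # a real plugin would train a model here
        return self

    def is_fitted(self) -> bool:
        return self.fitted

    def save(self) -> bytes:
        # Serialize the constructor arguments and the fitted flag (pickle is an assumption).
        return pickle.dumps({"params": self.params, "fitted": self.fitted})

    @classmethod
    def load(cls, buff: bytes) -> "ToyPlugin":
        state = pickle.loads(buff)
        plugin = cls(**state["params"])
        plugin.fitted = state["fitted"]
        return plugin


original = ToyPlugin(n_estimators=100, learning_rate=0.01).fit()
restored = ToyPlugin.load(original.save())
```

As with any pickle-based format, bytes should only be loaded when they come from a trusted source.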
autoprognosis.plugins.prediction.classifiers.plugin_linear_svm module
- class LinearSVMPlugin(penalty: int = 1, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the Linear Support Vector Classification algorithm.
- Method:
The plugin is based on LinearSVC, an implementation of Support Vector Classification for the case of a linear kernel.
- Parameters
penalty – int Specifies the norm used in the penalization. 0: l1, 1: l2
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("linear_svm")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
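The penalty and calibration arguments are integer codes rather than strings. A minimal sketch of how such index-encoded options could be resolved, assuming the class attribute penalties = ['l1', 'l2'] is the lookup table for penalty and that calibration follows the documented codes. The helper resolve_option is hypothetical and not part of AutoPrognosis.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

PENALTIES = ["l1", "l2"]  # mirrors the documented `penalties` class attribute
CALIBRATIONS: Sequence[Optional[str]] = [None, "sigmoid", "isotonic"]  # 0: disabled, 1: sigmoid, 2: isotonic


def resolve_option(index: int, options: Sequence[T]) -> T:
    """Map an integer code to the option it stands for, rejecting out-of-range codes."""
    if not 0 <= index < len(options):
        raise ValueError(f"index {index} is outside 0..{len(options) - 1}")
    return options[index]


default_penalty = resolve_option(1, PENALTIES)        # constructor default penalty=1
default_calibration = resolve_option(0, CALIBRATIONS)  # constructor default calibration=0
```

Encoding choices as integers keeps the search space purely numeric, which is convenient for hyperparameter samplers.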
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_linear_svm.LinearSVMPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- penalties = ['l1', 'l2']
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The type of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
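The sample_hyperparameters_np(random_state) classmethod listed for this plugin is documented as returning the sampled hyperparameters as a dict. The sketch below shows one way seeded sampling over a list of parameter domains could look. ToyIntegerParam, toy_hyperparameter_space and toy_sample_hyperparameters_np are hypothetical names. Which parameters the real plugin searches over is an assumption, and the stdlib random module stands in for whatever generator the library uses.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToyIntegerParam:
    """Hypothetical integer domain with inclusive bounds."""

    name: str
    low: int
    high: int

    def sample(self, rng: random.Random) -> int:
        return rng.randint(self.low, self.high)  # both bounds are inclusive


def toy_hyperparameter_space() -> List[ToyIntegerParam]:
    # Ranges follow the documented integer codes; the choice of parameters is illustrative.
    return [ToyIntegerParam("penalty", 0, 1), ToyIntegerParam("calibration", 0, 2)]


def toy_sample_hyperparameters_np(random_state: int = 0) -> Dict[str, Any]:
    """Draw one value per domain with a seeded generator and return them as a dict."""
    rng = random.Random(random_state)
    return {param.name: param.sample(rng) for param in toy_hyperparameter_space()}


sample_a = toy_sample_hyperparameters_np(random_state=0)
sample_b = toy_sample_hyperparameters_np(random_state=0)
```

Seeding by random_state makes the draw reproducible: the same seed yields the same dict, which could then be passed as keyword arguments to the plugin constructor.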
autoprognosis.plugins.prediction.classifiers.plugin_logistic_regression module
- class LogisticRegressionPlugin(C: float = 1.0, solver: int = 1, multi_class: int = 0, class_weight: int = 0, max_iter: int = 10000, penalty: str = 'l2', calibration: int = 0, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, n_jobs: int = 2, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the Logistic Regression classifier.
- Method:
Logistic regression is a linear model for classification rather than regression. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
- Parameters
C – float Inverse of regularization strength; must be a positive float.
solver – int Index into solvers = ['newton-cg', 'lbfgs', 'sag', 'saga'], selecting the algorithm to use in the optimization problem.
multi_class – int Index into classes = ['auto', 'ovr', 'multinomial']. With ‘ovr’, a binary problem is fit for each label. With ‘multinomial’, the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=‘liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=‘liblinear’, and otherwise selects ‘multinomial’.
class_weight – int Index into weights = ['balanced', None]. ‘balanced’ adjusts class weights inversely proportional to class frequencies; None gives all classes weight one.
max_iter – int Maximum number of iterations taken for the solvers to converge.
penalty – str Specify the norm of the penalty. Default: ‘l2’.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("logistic_regression")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
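To make the Method description concrete, here is a minimal sketch of the logistic function and of how C acts as an inverse regularization strength in an l2-penalized objective. It assumes a binary problem and the common formulation in which C multiplies the data term. The function names are hypothetical, and this is not the plugin's training code.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba_binary(weights: List[float], bias: float, features: List[float]) -> List[float]:
    """Return [P(class 0), P(class 1)] for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    return [1.0 - p, p]


def penalized_loss(weights: List[float], bias: float, X: List[List[float]], y: List[int], C: float) -> float:
    """l2 penalty plus C times the log loss; larger C means weaker regularization."""
    log_loss = 0.0
    for features, label in zip(X, y):
        p = predict_proba_binary(weights, bias, features)[1]
        log_loss -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return 0.5 * sum(w * w for w in weights) + C * log_loss


X_toy = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
y_toy = [0, 1, 1]
loss_at_zero = penalized_loss([0.0, 0.0], 0.0, X_toy, y_toy, C=1.0)
```

With all weights at zero every sample gets probability 0.5, so the loss reduces to C times the number of samples times log 2.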
- change_output(output: str) None
- classes = ['auto', 'ovr', 'multinomial']
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_logistic_regression.LogisticRegressionPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- solvers = ['newton-cg', 'lbfgs', 'sag', 'saga']
- static subtype() str
The type of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- weights = ['balanced', None]
autoprognosis.plugins.prediction.classifiers.plugin_multinomial_naive_bayes module
- class MultinomialNaiveBayesPlugin(alpha: float = 1.0, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the Multinomial Naive Bayes algorithm.
- Method:
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification).
- Parameters
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("multinomial_naive_bayes")
>>> plugin.fit_predict(...)
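The constructor accepts an alpha argument that the parameter list does not describe. Assuming it plays the usual additive (Laplace/Lidstone) smoothing role of multinomial Naive Bayes, the sketch below shows how smoothing turns per-class feature counts into probabilities. The helper names and the toy counts are hypothetical.

```python
import math
from typing import Dict, List


def smoothed_feature_probs(counts: List[int], alpha: float = 1.0) -> List[float]:
    """Additive smoothing: (count + alpha) / (total + alpha * n_features)."""
    total = sum(counts)
    return [(c + alpha) / (total + alpha * len(counts)) for c in counts]


def log_likelihood(sample: List[int], probs: List[float]) -> float:
    """Multinomial log-likelihood of a count vector, up to a class-independent constant."""
    return sum(n * math.log(p) for n, p in zip(sample, probs))


# Toy per-class counts for three discrete features (e.g. word counts).
class_counts: Dict[str, List[int]] = {"spam": [3, 0, 1], "ham": [1, 4, 0]}
class_probs = {label: smoothed_feature_probs(c, alpha=1.0) for label, c in class_counts.items()}

sample = [0, 2, 0]
scores = {label: log_likelihood(sample, p) for label, p in class_probs.items()}
best_label = max(scores, key=scores.get)
```

Without smoothing, the zero count for the second feature in the "spam" class would give a probability of zero and an undefined logarithm; with alpha = 1.0 it becomes 1/7 instead.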
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_multinomial_naive_bayes.MultinomialNaiveBayesPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The type of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_neural_nets module
- class BasicNet(*args: Any, **kwargs: Any)
Bases:
torch.nn.Module
Basic neural net.
- Parameters
n_unit_in (int) – Number of features
categories (int) –
n_layers_hidden (int) – Number of hypothesis layers (n_layers_hidden x n_units_hidden + 1 x Linear layer)
n_units_hidden (int) – Number of hidden units in each hypothesis layer
nonlin (string, default 'elu') – Nonlinearity to use in NN. Can be ‘elu’, ‘relu’, ‘selu’ or ‘leaky_relu’.
lr (float) – learning rate for optimizer. step_size equivalent in the JAX version.
weight_decay (float) – l2 (ridge) penalty for the weights.
n_iter (int) – Maximum number of iterations.
batch_size (int) – Batch size
n_iter_print (int) – Number of iterations after which to print updates and check the validation loss.
val_split_prop (float) – Proportion of samples used for validation split (can be 0)
patience (int) – Number of iterations to wait before early stopping after decrease in validation loss
n_iter_min (int) – Minimum number of iterations to go through before starting early stopping
clipping_value (int, default 1) – Gradients clipping value
- forward(X: torch.Tensor) torch.Tensor
- train(X: torch.Tensor, y: torch.Tensor) autoprognosis.plugins.prediction.classifiers.plugin_neural_nets.BasicNet
- class NeuralNetsPlugin(n_layers_hidden: int = 1, n_units_hidden: int = 100, nonlin: str = 'relu', lr: float = 0.001, weight_decay: float = 0.001, n_iter: int = 1000, batch_size: int = 128, n_iter_print: int = 10, patience: int = 10, n_iter_min: int = 100, dropout: float = 0.1, clipping_value: int = 1, batch_norm: bool = True, early_stopping: bool = True, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on Neural networks.
- Parameters
n_layers_hidden (int) – Number of hypothesis layers (n_layers_hidden x n_units_hidden + 1 x Linear layer)
n_units_hidden (int) – Number of hidden units in each hypothesis layer
nonlin (string, default 'relu') – Nonlinearity to use in NN. Can be ‘elu’, ‘relu’, ‘selu’ or ‘leaky_relu’.
lr (float) – learning rate for optimizer. step_size equivalent in the JAX version.
weight_decay (float) – l2 (ridge) penalty for the weights.
n_iter (int) – Maximum number of iterations.
batch_size (int) – Batch size
n_iter_print (int) – Number of iterations after which to print updates and check the validation loss.
val_split_prop (float) – Proportion of samples used for validation split (can be 0)
patience (int) – Number of iterations to wait before early stopping after decrease in validation loss
n_iter_min (int) – Minimum number of iterations to go through before starting early stopping
clipping_value (int, default 1) – Gradients clipping value
random_state (int, default 0) – Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("neural_nets", n_layers_hidden = 2)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
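The patience and n_iter_min parameters interact during early stopping. The sketch below shows one plausible reading: training stops once the validation loss has failed to improve for patience consecutive checks, but never before n_iter_min iterations. The function stopping_iteration is hypothetical, and the plugin's actual rule may differ in detail.

```python
from typing import List


def stopping_iteration(val_losses: List[float], patience: int, n_iter_min: int) -> int:
    """Return the 1-based iteration at which training would stop."""
    best = float("inf")
    stale = 0  # consecutive checks without improvement
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
        # Stop only after the minimum number of iterations has been reached.
        if i + 1 >= n_iter_min and stale >= patience:
            return i + 1
    return len(val_losses)  # ran to the end without triggering early stopping


losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]
stop_early = stopping_iteration(losses, patience=2, n_iter_min=1)
stop_after_minimum = stopping_iteration(losses, patience=2, n_iter_min=6)
never_stops = stopping_iteration(losses, patience=10, n_iter_min=1)
```

Here the loss stops improving after the third iteration, so patience=2 halts training at iteration 5, unless n_iter_min=6 holds it back until iteration 6.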
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_neural_nets.NeuralNetsPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The type of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_perceptron module
- class PerceptronPlugin(penalty: int = 1, alpha: float = 0.0001, calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on perceptrons.
- Method:
Perceptron is a simple classification algorithm suitable for large-scale learning. By default, it does not require a learning rate and it updates its model only on mistakes.
- Parameters
penalty – int Index into penalties = ['l1', 'l2', 'elasticnet'], selecting the penalty to be used.
alpha – float Constant that multiplies the regularization term if regularization is used.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("perceptron")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
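The phrase "updates its model only on mistakes" can be shown in a few lines. This is a minimal sketch of the classic perceptron rule for labels in {-1, +1} with a fixed step of 1 and no regularization. It is an illustration, not the plugin's implementation; perceptron_fit and perceptron_predict are hypothetical names.

```python
from typing import List, Tuple


def perceptron_fit(X: List[List[float]], y: List[int], n_epochs: int = 10) -> Tuple[List[float], float, int]:
    """Train on labels in {-1, +1}; returns (weights, bias, number of updates)."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    updates = 0
    for _ in range(n_epochs):
        for features, label in zip(X, y):
            score = bias + sum(w * x for w, x in zip(weights, features))
            if label * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                updates += 1
            # correctly classified samples leave the model untouched
    return weights, bias, updates


def perceptron_predict(weights: List[float], bias: float, features: List[float]) -> int:
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else -1


X_toy = [[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
y_toy = [1, 1, -1, -1]
weights, bias, updates = perceptron_fit(X_toy, y_toy)
predictions = [perceptron_predict(weights, bias, row) for row in X_toy]
```

On this linearly separable toy set a single update on the first sample is enough; every later sample is already classified correctly, so the model does not change again.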
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_perceptron.PerceptronPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- penalties = ['l1', 'l2', 'elasticnet']
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The type of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_qda module
- class QuadraticDiscriminantAnalysisPlugin(calibration: int = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on Quadratic Discriminant Analysis.
- Method:
The plugin is based on Quadratic Discriminant Analysis, a classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
- Parameters
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("qda")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
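The Method description ("fitting class conditional densities to the data and using Bayes' rule") can be sketched for a single feature. The example assumes one Gaussian per class, each with its own variance, which is what makes the decision boundary quadratic. The helper names are hypothetical and this is not the plugin's code.

```python
import math
from typing import Dict, List, Tuple


def fit_class_gaussians(x: List[float], y: List[int]) -> Dict[int, Tuple[float, float, float]]:
    """Per class: (mean, variance, prior) estimated from the training data."""
    params = {}
    for label in sorted(set(y)):
        values = [xi for xi, yi in zip(x, y) if yi == label]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        params[label] = (mean, variance, len(values) / len(x))
    return params


def posterior(x_new: float, params: Dict[int, Tuple[float, float, float]]) -> Dict[int, float]:
    """Bayes' rule: posterior is proportional to prior times class-conditional density."""
    joint = {}
    for label, (mean, variance, prior) in params.items():
        density = math.exp(-((x_new - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)
        joint[label] = prior * density
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


x_toy = [1.0, 1.2, 0.8, 3.0, 3.5, 2.5]
y_toy = [0, 0, 0, 1, 1, 1]
params = fit_class_gaussians(x_toy, y_toy)
near_class_0 = posterior(1.0, params)
near_class_1 = posterior(3.0, params)
```

Because each class keeps its own variance, the log-ratio of the two posteriors is a quadratic function of x_new rather than a linear one.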
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_qda.QuadraticDiscriminantAnalysisPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The type of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_random_forest module
- class RandomForestPlugin(n_estimators: int = 100, criterion: int = 0, min_samples_split: int = 2, bootstrap: bool = True, min_samples_leaf: int = 2, calibration: int = 0, max_depth: int = 4, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on Random forests.
- Method:
A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
- Parameters
n_estimators – int The number of trees in the forest.
criterion – int Index into criterions = ['gini', 'entropy'], selecting the function used to measure the quality of a split: “gini” for the Gini impurity and “entropy” for the information gain.
min_samples_split – int The minimum number of samples required to split an internal node.
bootstrap – bool Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
min_samples_leaf – int The minimum number of samples required to be at a leaf node.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("random_forest")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
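The two split criteria named in the parameter list can be computed directly from the labels in a node. The sketch below uses the standard definitions of Gini impurity and Shannon entropy (in bits). The function names are hypothetical and independent of the plugin.

```python
import math
from collections import Counter
from typing import Hashable, List


def class_proportions(labels: List[Hashable]) -> List[float]:
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def gini_impurity(labels: List[Hashable]) -> float:
    """1 - sum(p_k^2): zero for a pure node, larger for mixed nodes."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels: List[Hashable]) -> float:
    """-sum(p_k * log2(p_k)): the quantity that information gain reduces."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


mixed_node = [0, 0, 1, 1]
pure_node = [1, 1, 1, 1]
mixed_gini, mixed_entropy = gini_impurity(mixed_node), entropy(mixed_node)
pure_gini, pure_entropy = gini_impurity(pure_node), entropy(pure_node)
```

Both measures are zero for a pure node and maximal for an evenly mixed one, so they usually rank candidate splits similarly.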
- change_output(output: str) None
- criterions = ['gini', 'entropy']
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_random_forest.RandomForestPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The type of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_ridge_classifier module
- class RidgeClassifierPlugin(solver: int = 0, calibration: int = 0, random_state: int = 0, model: Optional[Any] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the Ridge classifier.
- Method:
The RidgeClassifier converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).
- Parameters
solver – int Index into the solvers list selecting the algorithm used in the optimization problem. 0: ‘auto’, 1: ‘svd’, 2: ‘cholesky’, 3: ‘lsqr’, 4: ‘sparse_cg’.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
random_state – int, default 0 Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("ridge_classifier")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
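The constructor signature above takes `solver` and `calibration` as integers. A minimal sketch of how such indices can be resolved to names, assuming `solver` indexes the class-level `solvers` list documented for this plugin and `calibration` follows the 0/1/2 convention from the parameter list. `resolve_ridge_options` is a hypothetical helper, not part of AutoPrognosis.

```python
from typing import Optional, Tuple

# Values copied from the `solvers` attribute listed for RidgeClassifierPlugin.
SOLVERS = ["auto", "svd", "cholesky", "lsqr", "sparse_cg"]
# 0: disabled, 1: sigmoid, 2: isotonic (from the parameter description).
CALIBRATIONS = [None, "sigmoid", "isotonic"]


def resolve_ridge_options(solver: int = 0, calibration: int = 0) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: turn integer options into readable names."""
    if not 0 <= solver < len(SOLVERS):
        raise ValueError(f"solver index must be in [0, {len(SOLVERS) - 1}]")
    if not 0 <= calibration < len(CALIBRATIONS):
        raise ValueError(f"calibration must be in [0, {len(CALIBRATIONS) - 1}]")
    return SOLVERS[solver], CALIBRATIONS[calibration]


print(resolve_ridge_options())      # ('auto', None)
print(resolve_ridge_options(3, 2))  # ('lsqr', 'isotonic')
```

Integer-coded options are convenient for hyperparameter search, since a categorical choice becomes a bounded integer range.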
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified parameter names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_ridge_classifier.RidgeClassifierPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- solvers = ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg']
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
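`fqdn()` is described above as `type->subtype->name`. The sketch below composes such a name and prefixes hyperparameter names with it, which is one plausible reading of the `*_fqdn` helpers. The `.` separator and both helper functions are assumptions made for illustration and are not confirmed by this reference.

```python
from typing import Any, Dict


def make_fqdn(type_: str, subtype: str, name: str, sep: str = ".") -> str:
    """Join type, subtype and name in that order (the separator is an assumption)."""
    return sep.join([type_, subtype, name])


def qualify_params(fqdn: str, params: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Prefix each hyperparameter name with the plugin's fully-qualified name."""
    return {f"{fqdn}{sep}{key}": value for key, value in params.items()}


fqdn = make_fqdn("prediction", "classifier", "ridge_classifier")
print(fqdn)
print(qualify_params(fqdn, {"solver": 0}))
```

Qualified names of this kind would keep identically named parameters from different plugins apart when several plugins are tuned in one search.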
autoprognosis.plugins.prediction.classifiers.plugin_tabnet module
- class TabNetPlugin(n_d: int = 64, n_a: int = 64, lr: float = 0.001, n_steps: int = 3, gamma: float = 1.5, n_independent: int = 2, n_shared: int = 2, lambda_sparse: float = 0.0001, momentum: float = 0.3, clip_value: float = 2.0, max_epochs: int = 1000, patience: int = 20, batch_size: int = 50, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features.
- Parameters
n_d – int Width of the decision prediction layer. Larger values give the model more capacity, at the risk of overfitting. Values typically range from 8 to 64.
n_a – int Width of the attention embedding for each mask. According to the paper, n_d = n_a is usually a good choice. (default=64)
lr – float Learning rate
n_steps – int Number of steps in the architecture (usually between 3 and 10)
gamma – float Coefficient for feature reuse in the masks. A value close to 1 makes mask selection least correlated between layers. Values range from 1.0 to 2.0.
n_independent – int Number of independent Gated Linear Units layers at each step. Usual values range from 1 to 5.
n_shared – int Number of shared Gated Linear Units at each step. Usual values range from 1 to 5.
lambda_sparse – float Extra sparsity loss coefficient, as proposed in the original paper. The larger this coefficient, the sparser the model will be in terms of feature selection. Depending on the difficulty of the problem, reducing this value could help.
momentum – float Momentum for batch normalization, typically ranges from 0.01 to 0.4 (default=0.3)
clip_value – float If a float is given, the gradient is clipped at clip_value.
max_epochs – int Maximum number of epochs for training.
patience – int Number of consecutive epochs without improvement before performing early stopping.
batch_size – int Batch size
random_state – int Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("tabnet", max_epochs = 100)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the probabilities for each class
Original implementation: https://github.com/dreamquark-ai/tabnet
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified parameter names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_tabnet.TabNetPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.classifiers.plugin_xgboost module
- class XGBoostPlugin(n_estimators: int = 100, reg_lambda: float = 0.001, reg_alpha: float = 0.001, colsample_bytree: float = 0.1, colsample_bynode: float = 0.1, colsample_bylevel: float = 0.1, max_depth: int = 6, subsample: float = 0.1, learning_rate: float = 0.01, min_child_weight: int = 0, max_bin: int = 256, booster: int = 0, grow_policy: int = 0, random_state: int = 0, calibration: int = 0, gamma: float = 0, model: Optional[Any] = None, nthread: int = 2, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.classifiers.base.ClassifierPlugin
Classification plugin based on the XGBoost classifier.
- Method:
Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models. XGBoost handles a variety of data types, relationships, and distributions robustly, and exposes many hyperparameters that can be fine-tuned.
- Parameters
n_estimators – int The maximum number of estimators at which boosting is terminated.
max_depth – int Maximum depth of a tree.
reg_lambda – float L2 regularization term on weights (xgb’s lambda).
reg_alpha – float L1 regularization term on weights (xgb’s alpha).
colsample_bytree – float Subsample ratio of columns when constructing each tree.
colsample_bynode – float Subsample ratio of columns for each split.
colsample_bylevel – float Subsample ratio of columns for each level.
subsample – float Subsample ratio of the training instance.
learning_rate – float Boosting learning rate
booster – int Index of the booster to use. 0: “gbtree”, 1: “gblinear”, 2: “dart”.
min_child_weight – int Minimum sum of instance weight (hessian) needed in a child.
max_bin – int Number of bins for histogram construction.
grow_policy – int Index of the policy controlling how new nodes are added to the tree. 0: “depthwise”, 1: “lossguide”.
random_state – int Random number seed.
calibration – int Enable/disable calibration. 0: disabled, 1 : sigmoid, 2: isotonic.
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="classifiers").get("xgboost", n_estimators = 20)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
- booster = ['gbtree', 'gblinear', 'dart']
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.core.base_plugin.Plugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- grow_policy = ['depthwise', 'lossguide']
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified parameter names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.classifiers.plugin_xgboost.XGBoostPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
Risk estimation
autoprognosis.plugins.prediction.risk_estimation.plugin_coxnet module
- class CoxnetRiskEstimationPlugin(hidden_dim: int = 100, hidden_len: int = 2, batch_norm: bool = True, dropout: float = 0.1, lr: float = 0.001, epochs: int = 5000, patience: int = 50, batch_size: int = 128, verbose: bool = False, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
CoxPH neural net plugin for survival analysis.
- Parameters
hidden_dim – int Number of neurons in the hidden layers
hidden_len – int Number of hidden layers
batch_norm – bool Batch norm on/off.
dropout – float Dropout value.
lr – float Learning rate.
epochs – int Number of training epochs.
patience – int Number of iterations without validation improvement.
batch_size – int Batch size.
verbose – bool Enable debug logs.
random_state – int Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("coxnet")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
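The example above evaluates risk at a single horizon: the median duration among subjects who experienced the event (`Y == 1`). The sketch below reproduces that computation on a tiny made-up dataset using only the standard library. The values are illustrative and are not taken from METABRIC.

```python
import statistics

# Made-up durations (T) and event indicators (Y): 1 = event observed, 0 = censored.
T = [5.0, 12.5, 30.0, 41.2, 60.0, 75.9]
Y = [1, 0, 1, 1, 0, 1]

# Keep only the durations of subjects with an observed event.
event_times = [t for t, y in zip(T, Y) if y == 1]

# The 0.50 quantile is the median; int() truncates it to a whole time unit.
eval_time_horizons = [int(statistics.median(event_times))]
print(eval_time_horizons)
```

Censored subjects are excluded here because their recorded duration is only a lower bound on the true event time.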
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
Return the hyperparameter space for the current model.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified parameter names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_coxnet.CoxnetRiskEstimationPlugin
Load the plugin from bytes
- static name() str
Return the name of the current model.
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.risk_estimation.plugin_deephit module
- class DeepHitRiskEstimationPlugin(model: Optional[Any] = None, num_durations: int = 10, batch_size: int = 100, epochs: int = 5000, lr: float = 0.01, dim_hidden: int = 300, alpha: float = 0.28, sigma: float = 0.38, dropout: float = 0.2, patience: int = 20, batch_norm: bool = False, random_state: int = 0, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
DeepHit plugin for survival analysis. DeepHit uses a deep neural network to learn the distribution of survival times directly. It makes no assumptions about the underlying stochastic process and allows for the possibility that the relationship between covariates and risk(s) changes over time. Most importantly, DeepHit smoothly handles competing risks, i.e. settings in which there is more than one possible event of interest.
- Parameters
num_durations – int Number of points in the survival function
batch_size – int Batch size
epochs – int Number of iterations
lr – float Learning rate
dim_hidden – int Number of neurons in the hidden layers
alpha – float Weight in (0, 1) balancing the likelihood loss and the rank loss (L2 in the paper). 1 gives only likelihood, and 0 gives only rank loss. (default: 0.28)
sigma – float Parameter of eta in the rank loss (L2 in the paper). (default: 0.38)
dropout – float Dropout value
patience – int Number of epochs without improvement.
batch_norm – bool Enable/Disable batch_norm
random_state – int Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("deephit")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
References:
[1] Changhee Lee, William R. Zame, Jinsung Yoon, and Mihaela van der Schaar. DeepHit: A deep learning approach to survival analysis with competing risks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. http://medianetlab.ee.ucla.edu/papers/AAAI_2018_DeepHit
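The `alpha` description above implies a convex combination of the two loss terms. A minimal sketch, assuming the total loss is `alpha * likelihood + (1 - alpha) * rank`. `combined_loss` and the numeric values are hypothetical and only illustrate the weighting, not the loss terms themselves.

```python
def combined_loss(likelihood_loss: float, rank_loss: float, alpha: float = 0.28) -> float:
    """Weight the likelihood loss by alpha and the rank loss by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * likelihood_loss + (1.0 - alpha) * rank_loss


# alpha = 1 keeps only the likelihood term; alpha = 0 keeps only the rank term.
print(combined_loss(2.0, 0.5, alpha=1.0))  # 2.0
print(combined_loss(2.0, 0.5, alpha=0.0))  # 0.5
```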
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified parameter names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_deephit.DeepHitRiskEstimationPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.risk_estimation.plugin_loglogistic_aft module
- class LogLogisticAFTPlugin(alpha: float = 0.05, l1_ratio: float = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Log-Logistic AFT plugin for survival analysis.
- Parameters
alpha – float The level in the confidence intervals.
l1_ratio – float The penalizer coefficient to the size of the coefficients.
random_state – int Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("loglogistic_aft")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified parameter names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_loglogistic_aft.LogLogisticAFTPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.risk_estimation.plugin_lognormal_aft module
- class LogNormalAFTPlugin(alpha: float = 0.05, l1_ratio: float = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Log-Normal AFT plugin for survival analysis.
- Parameters
alpha – float The level in the confidence intervals.
l1_ratio – float The penalizer coefficient to the size of the coefficients.
random_state – int Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("lognormal_aft")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain, using fully-qualified parameter names.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_lognormal_aft.LogNormalAFTPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using their fully-qualified names.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.risk_estimation.plugin_survival_xgboost module
- class XGBoostRiskEstimationPlugin(n_estimators: int = 100, reg_lambda: float = 0.001, reg_alpha: float = 0.001, colsample_bytree: float = 0.1, colsample_bynode: float = 0.1, colsample_bylevel: float = 0.1, max_depth: int = 6, subsample: float = 0.1, learning_rate: float = 0.01, min_child_weight: int = 0, max_bin: int = 256, booster: int = 0, grow_policy: int = 0, objective: str = 'aft', strategy: str = 'weibull', model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Survival XGBoost plugin for survival analysis.
- Parameters
n_estimators – int The maximum number of estimators at which boosting is terminated.
max_depth – int Maximum depth of a tree.
reg_lambda – float L2 regularization term on weights (xgb’s lambda).
reg_alpha – float L1 regularization term on weights (xgb’s alpha).
colsample_bytree – float Subsample ratio of columns when constructing each tree.
colsample_bynode – float Subsample ratio of columns for each split.
colsample_bylevel – float Subsample ratio of columns for each level.
subsample – float Subsample ratio of the training instance.
learning_rate – float Boosting learning rate
booster – int Index of the booster to use. 0: “gbtree”, 1: “gblinear”, 2: “dart”.
min_child_weight – int Minimum sum of instance weight (hessian) needed in a child.
max_bin – int Number of bins for histogram construction.
grow_policy – int Index of the policy controlling how new nodes are added to the tree. 0: “depthwise”, 1: “lossguide”.
random_state – int Random number seed.
objective – str Survival analysis objective. Can be “aft” or “cox”.
strategy – str Survival analysis model. Can be “weibull” or “debiased_bce”.
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("survival_xgboost")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
- booster = ['gbtree', 'gblinear', 'dart']
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- grow_policy = ['depthwise', 'lossguide']
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_survival_xgboost.XGBoostRiskEstimationPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.risk_estimation.plugin_weibull_aft module
- class WeibullAFTPlugin(alpha: float = 0.05, l1_ratio: float = 0, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Weibull AFT plugin for survival analysis.
- Parameters
alpha – float The level in the confidence intervals.
l1_ratio – float The penalizer coefficient applied to the size of the coefficients.
random_state – int Random seed
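For orientation, a Weibull accelerated failure time model is usually written as S(t | x) = exp(-(t / lam(x)) ** rho) with lam(x) = exp(intercept + coefs . x). The sketch below evaluates that formula in plain Python. The coefficient values are made up, the function name is hypothetical, and whether the plugin uses exactly this parameterization internally is an assumption.

```python
import math


def weibull_aft_survival(t, x, coefs, intercept, rho):
    """S(t | x) = exp(-(t / lam) ** rho) with lam = exp(intercept + coefs . x)."""
    lam = math.exp(intercept + sum(b * v for b, v in zip(coefs, x)))
    return math.exp(-((t / lam) ** rho))


# Hypothetical fitted values, for illustration only.
coefs = [0.5, -0.25]
intercept = 3.0
rho = 1.5
x = [1.0, 2.0]

# Risk at a horizon is one minus the survival probability at that time.
for horizon in (5, 20, 50):
    risk = 1.0 - weibull_aft_survival(horizon, x, coefs, intercept, rho)
    print(horizon, round(risk, 4))
```

Covariates act on the time scale: a larger lam(x) stretches the survival curve, so the same horizon carries a lower risk.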
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> from pycox.datasets import metabric
>>>
>>> df = metabric.read_df()
>>> X = df.drop(["duration", "event"], axis=1)
>>> Y = df["event"]
>>> T = df["duration"]
>>>
>>> plugin = Predictions(category="risk_estimation").get("weibull_aft")
>>> plugin.fit(X, T, Y)
>>>
>>> eval_time_horizons = [int(T[Y.iloc[:] == 1].quantile(0.50))]
>>> plugin.predict(X, eval_time_horizons)
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.risk_estimation.base.RiskEstimationPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.risk_estimation.plugin_weibull_aft.WeibullAFTPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
Regression
autoprognosis.plugins.prediction.regression.plugin_bayesian_ridge module
- class BayesianRidgePlugin(n_iter: int = 1000, tol: float = 0.001, hyperparam_search_iterations: Optional[int] = None, model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Bayesian ridge regression.
- Parameters
n_iter – int Maximum number of iterations. Should be greater than or equal to 1.
tol – float Stop the algorithm if w has converged.
random_state – int Random seed
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("bayesian_ridge")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_bayesian_ridge.BayesianRidgePlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.regression.plugin_catboost_regressor module
- class CatBoostRegressorPlugin(depth: int = 5, grow_policy: int = 0, n_estimators: int = 100, l2_leaf_reg: float = 3, learning_rate: float = 0.001, min_data_in_leaf: int = 1, random_strength: float = 1, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Regression plugin based on the CatBoost framework.
- Method:
CatBoost is a gradient boosting framework that handles categorical features with a permutation-driven alternative to the classical algorithm. It uses Ordered Boosting to reduce overfitting and Symmetric Trees for faster execution.
- Parameters
n_estimators – int Number of gradient boosted trees. Equivalent to number of boosting rounds.
depth – int Depth of the tree.
grow_policy – int index The tree growing policy. Defines how to perform greedy tree construction. 0: “Depthwise”, 1: “SymmetricTree”, 2: “Lossguide”.
l2_leaf_reg – float Coefficient at the L2 regularization term of the cost function.
learning_rate – float The learning rate used for reducing the gradient step.
min_data_in_leaf – int The minimum number of training samples in a leaf.
random_strength – float The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model.
random_state – int, default 0 Random seed
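n_estimators and learning_rate interact: each boosting round adds a correction scaled by the learning rate, so a small rate needs many rounds to reach the same fit. The toy sketch below shows this with constant predictors standing in for trees. It is a deliberately simplified illustration of gradient boosting on squared error, not CatBoost's algorithm, and boost_constant is a hypothetical name.

```python
def boost_constant(targets, n_estimators=100, learning_rate=0.1):
    """Fit a sequence of constant 'trees' to residuals, shrinking each by learning_rate."""
    prediction = 0.0
    for _ in range(n_estimators):
        # Each round fits the mean residual, the best constant under squared error.
        residual_mean = sum(t - prediction for t in targets) / len(targets)
        prediction += learning_rate * residual_mean
    return prediction


targets = [2.0, 4.0]  # illustrative data with mean 3.0
print(boost_constant(targets, n_estimators=100, learning_rate=0.1))
print(boost_constant(targets, n_estimators=100, learning_rate=0.001))
```

In this toy setting the prediction after m rounds is the target mean times 1 - (1 - learning_rate) ** m, which makes the trade-off between the two parameters explicit.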
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("catboost_regressor")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the predicted values for the training data
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- grow_policies = ['Depthwise', 'SymmetricTree', 'Lossguide']
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_catboost_regressor.CatBoostRegressorPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.regression.plugin_kneighbors_regressor module
- class KNeighborsRegressorPlugin(n_neighbors: int = 5, weights: int = 0, algorithm: int = 0, leaf_size: int = 30, p: int = 2, random_state: int = 0, hyperparam_search_iterations: Optional[int] = None, model: Optional[Any] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Regression plugin based on the KNeighborsRegressor.
- Parameters
n_neighbors – int Number of neighbors to use
weights – int index Weight function used in prediction. 0: “uniform”, 1: “distance”.
algorithm – int index Algorithm used to compute the nearest neighbors. 0: “auto”, 1: “ball_tree”, 2: “kd_tree”, 3: “brute”.
leaf_size – int Leaf size passed to BallTree or KDTree.
p – int Power parameter for the Minkowski metric.
random_state – int, default 0 Random seed
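To make n_neighbors, weights and p concrete, here is a brute-force nearest-neighbour regressor in plain Python. It is a sketch of the general technique under simple assumptions (small in-memory lists, no tree index), not the plugin's implementation, and both function names are hypothetical.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, n_neighbors=5, weights="uniform", p=2):
    """Predict a target for `query` from its n_neighbors nearest training points."""
    ranked = sorted(
        (minkowski(row, query, p), target) for row, target in zip(train_X, train_y)
    )[:n_neighbors]
    if weights == "uniform":
        return sum(target for _, target in ranked) / len(ranked)
    # "distance": closer neighbours count more; an exact match decides the answer.
    exact = [target for dist, target in ranked if dist == 0]
    if exact:
        return sum(exact) / len(exact)
    total = sum(1.0 / dist for dist, _ in ranked)
    return sum(target / dist for dist, target in ranked) / total


train_X = [[0.0], [1.0], [2.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 10.0]
print(knn_predict(train_X, train_y, [1.5], n_neighbors=2))
print(knn_predict(train_X, train_y, [1.25], n_neighbors=2, weights="distance"))
```

With p=1 the metric is the Manhattan distance and with p=2 the Euclidean distance. The algorithm and leaf_size options only change how neighbours are found, not which neighbours are returned.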
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("kneighbors_regressor")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
- algorithm = ['auto', 'ball_tree', 'kd_tree', 'brute']
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_kneighbors_regressor.KNeighborsRegressorPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
- weights = ['uniform', 'distance']
autoprognosis.plugins.prediction.regression.plugin_linear_regression module
- class LinearRegressionPlugin(model: Optional[Any] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Regression plugin based on linear regression.
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("linear_regression")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the predicted values for the training data
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_linear_regression.LinearRegressionPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- solvers = ['auto', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.regression.plugin_neural_nets_regression module
- class BasicNet(*args: Any, **kwargs: Any)
Bases:
torch.nn.Module
Basic neural net.
- Parameters
n_unit_in (int) – Number of features
n_layers_hidden (int) – Number of hypothesis layers (n_layers_hidden x n_units_hidden + 1 x Linear layer)
n_units_hidden (int) – Number of hidden units in each hypothesis layer
nonlin (string, default 'elu') – Nonlinearity to use in NN. Can be ‘elu’, ‘relu’, ‘selu’ or ‘leaky_relu’.
lr (float) – learning rate for optimizer. step_size equivalent in the JAX version.
weight_decay (float) – l2 (ridge) penalty for the weights.
n_iter (int) – Maximum number of iterations.
batch_size (int) – Batch size
n_iter_print (int) – Number of iterations after which to print updates and check the validation loss.
val_split_prop (float) – Proportion of samples used for validation split (can be 0)
patience (int) – Number of iterations to wait before early stopping after decrease in validation loss
n_iter_min (int) – Minimum number of iterations to go through before starting early stopping
clipping_value (int, default 1) – Gradients clipping value
- forward(X: torch.Tensor) torch.Tensor
- train(X: torch.Tensor, y: torch.Tensor) autoprognosis.plugins.prediction.regression.plugin_neural_nets_regression.BasicNet
- class NeuralNetsRegressionPlugin(n_layers_hidden: int = 1, n_units_hidden: int = 100, nonlin: str = 'relu', lr: float = 0.001, weight_decay: float = 0.001, n_iter: int = 1000, batch_size: int = 512, n_iter_print: int = 10, patience: int = 10, n_iter_min: int = 100, dropout: float = 0.1, clipping_value: int = 1, batch_norm: bool = True, early_stopping: bool = True, random_state: int = 0, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Regression plugin based on Neural networks.
- Parameters
n_layers_hidden (int) – Number of hypothesis layers (n_layers_hidden x n_units_hidden + 1 x Linear layer)
n_units_hidden (int) – Number of hidden units in each hypothesis layer
nonlin (string, default 'relu') – Nonlinearity to use in NN. Can be ‘elu’, ‘relu’, ‘selu’ or ‘leaky_relu’.
lr (float) – learning rate for optimizer. step_size equivalent in the JAX version.
weight_decay (float) – l2 (ridge) penalty for the weights.
n_iter (int) – Maximum number of iterations.
batch_size (int) – Batch size
n_iter_print (int) – Number of iterations after which to print updates and check the validation loss.
val_split_prop (float) – Proportion of samples used for validation split (can be 0)
patience (int) – Number of iterations to wait before early stopping after decrease in validation loss
n_iter_min (int) – Minimum number of iterations to go through before starting early stopping
clipping_value (int, default 1) – Gradients clipping value
random_state (int) – Random seed
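patience and n_iter_min work together. One plausible reading of the descriptions above is that training stops once the validation loss has gone patience checks without improving, but never before n_iter_min iterations have run. The sketch below encodes that reading on a synthetic loss curve. It is an assumption about the semantics, not the plugin's training loop, and early_stopping_iteration is a hypothetical name.

```python
def early_stopping_iteration(val_losses, patience=10, n_iter_min=100):
    """Return the iteration index at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive checks, but never before `n_iter_min` iterations.
    """
    best = float("inf")
    stale = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience and i + 1 >= n_iter_min:
            return i
    return len(val_losses) - 1  # ran out of iterations (n_iter reached)


# Synthetic curve: improves for 5 steps, then plateaus.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.45] * 20
print(early_stopping_iteration(losses, patience=3, n_iter_min=10))
```

Raising n_iter_min protects against stopping on a noisy early plateau, while patience controls how long a later plateau is tolerated.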
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("neural_nets_regression", n_iter = 100)
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the predicted values for the training data
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_neural_nets_regression.NeuralNetsRegressionPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.regression.plugin_random_forest_regressor module
- class RandomForestRegressionPlugin(n_estimators: int = 50, criterion: int = 0, min_samples_split: int = 2, bootstrap: bool = True, min_samples_leaf: int = 2, model: Optional[Any] = None, hyperparam_search_iterations: Optional[int] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Regression plugin based on Random forests.
- Method:
A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
- Parameters
n_estimators – int The number of trees in the forest.
criterion – int index The function to measure the quality of a split. 0: “squared_error”, 1: “absolute_error”, 2: “friedman_mse”, 3: “poisson”.
min_samples_split – int The minimum number of samples required to split an internal node.
bootstrap – bool Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
min_samples_leaf – int The minimum number of samples required to be at a leaf node.
random_state – int Random seed
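The two ingredients named above, bootstrap sub-samples and averaging, can be sketched in a few lines of plain Python. The helpers are hypothetical and the per-tree outputs are made-up numbers; fitting the trees themselves is left out.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def forest_predict(tree_predictions):
    """Average the per-tree predictions for a single input."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(0)  # fixed seed, playing the role of random_state
sample = bootstrap_indices(8, rng)
print(sample)                           # indices may repeat; rows never drawn are left out
print(forest_predict([2.0, 3.0, 4.0]))  # mean of three hypothetical tree outputs
```

Because each tree sees a different resample, their errors are only partly correlated, and averaging reduces the variance of the combined prediction.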
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("random_forest")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
- change_output(output: str) None
- criterions = ['squared_error', 'absolute_error', 'friedman_mse', 'poisson']
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_random_forest_regressor.RandomForestRegressionPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.regression.plugin_tabnet_regressor module
- class TabNetRegressorPlugin(n_d: int = 64, n_a: int = 64, lr: float = 0.001, n_steps: int = 3, gamma: float = 1.5, n_independent: int = 2, n_shared: int = 2, lambda_sparse: float = 0.0001, momentum: float = 0.3, clip_value: float = 2.0, epsilon: float = 1e-15, n_iter: int = 1000, patience: int = 50, batch_size: int = 50, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Regression plugin based on TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features.
- Parameters
n_d – int Width of the decision prediction layer. Bigger values give more capacity to the model with the risk of overfitting. Values typically range from 8 to 64.
n_a – int Width of the attention embedding for each mask. According to the paper n_d=n_a is usually a good choice. (default=64)
lr – float Learning rate
n_steps – int Number of steps in the architecture (usually between 3 and 10)
gamma – float This is the coefficient for feature reusage in the masks. A value close to 1 will make mask selection least correlated between layers. Values range from 1.0 to 2.0.
n_independent – int Number of independent Gated Linear Units layers at each step. Usual values range from 1 to 5.
n_shared – int Number of shared Gated Linear Units at each step. Usual values range from 1 to 5.
lambda_sparse – float This is the extra sparsity loss coefficient as proposed in the original paper. The bigger this coefficient is, the sparser your model will be in terms of feature selection. Depending on the difficulty of your problem, reducing this value could help.
momentum – float Momentum for batch normalization, typically ranges from 0.01 to 0.4 (default=0.3)
clip_value – float If a float is given this will clip the gradient at clip_value.
n_iter – int Maximum number of epochs for training.
patience – int Number of consecutive epochs without improvement before performing early stopping.
batch_size – int Batch size
random_state – int Random seed
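The effect of gamma is easiest to see through the prior scale described in the TabNet paper, which is updated at each decision step as P[i] = P[i-1] * (gamma - M[i]), where M[i] is the attention mask of that step. The sketch below applies one such update in plain Python. The mask values are made up and update_prior is a hypothetical helper; the plugin itself delegates this to its TabNet backend.

```python
def update_prior(prior, mask, gamma=1.5):
    """One step of P[i] = P[i-1] * (gamma - M[i]), applied per feature."""
    return [p * (gamma - m) for p, m in zip(prior, mask)]


prior = [1.0, 1.0, 1.0]  # every feature fully available before the first step
mask = [1.0, 0.0, 0.0]   # step 1 puts all attention on feature 0

print(update_prior(prior, mask, gamma=1.0))  # feature 0 can no longer be selected
print(update_prior(prior, mask, gamma=1.5))  # feature 0 stays partly available
```

With gamma equal to 1 a feature that received full attention is excluded from later steps, which is why values near 1 make the masks least correlated between steps.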
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("tabnet")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y) # returns the predicted values for the training data
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_tabnet_regressor.TabNetRegressorPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
autoprognosis.plugins.prediction.regression.plugin_xgboost_regressor module
- class XGBoostRegressorPlugin(reg_lambda: Optional[float] = None, reg_alpha: Optional[float] = None, colsample_bytree: Optional[float] = None, colsample_bynode: Optional[float] = None, colsample_bylevel: Optional[float] = None, n_estimators: int = 100, max_depth: Optional[int] = 3, lr: Optional[float] = None, subsample: Optional[float] = None, min_child_weight: Optional[int] = None, max_bin: int = 256, booster: int = 0, grow_policy: int = 0, eta: float = 0.3, model: Optional[Any] = None, random_state: int = 0, hyperparam_search_iterations: Optional[int] = None, **kwargs: Any)
Bases:
autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Regression plugin based on XGBoost.
- Method:
Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models. XGBoost handles a wide variety of data types, relationships and distributions robustly, and exposes many hyperparameters that can be fine-tuned.
- Parameters
n_estimators – int The maximum number of estimators at which boosting is terminated.
max_depth – int Maximum depth of a tree.
reg_lambda – float L2 regularization term on weights (xgb’s lambda).
reg_alpha – float L1 regularization term on weights (xgb’s alpha).
colsample_bytree – float Subsample ratio of columns when constructing each tree.
colsample_bynode – float Subsample ratio of columns for each split.
colsample_bylevel – float Subsample ratio of columns for each level.
subsample – float Subsample ratio of the training instance.
lr – float Boosting learning rate.
booster – int Index of the booster to use in the booster list: gbtree, gblinear or dart.
min_child_weight – int Minimum sum of instance weight (hessian) needed in a child.
max_bin – int Number of bins for histogram construction.
random_state – int Random number seed.
Example
>>> from autoprognosis.plugins.prediction import Predictions
>>> plugin = Predictions(category="regression").get("xgboost_regressor")
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> plugin.fit_predict(X, y)
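The idea in the Method note above, combining many weak models into a stronger one, can be sketched with plain gradient boosting on squared error using one-split "stumps". This is a teaching sketch under simplifying assumptions, not XGBoost's algorithm (no regularisation, second-order terms or column subsampling); all function names and the toy data are hypothetical.

```python
from typing import List, Tuple

# A stump is (threshold, value_if_x_le_threshold, value_if_x_gt_threshold).
Stump = Tuple[float, float, float]


def fit_stump(x: List[float], residuals: List[float]) -> Stump:
    """Pick the single threshold on x that best fits the residuals (squared error)."""
    best_error = float("inf")
    best_stump: Stump = (x[0], 0.0, 0.0)
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right) if right else 0.0
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if error < best_error:
            best_error, best_stump = error, (threshold, left_value, right_value)
    return best_stump


def predict_stump(stump: Stump, xi: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def fit_boosting(x: List[float], y: List[float], n_estimators: int = 100, lr: float = 0.3):
    """Fit mean(y) plus lr-scaled stumps, each trained on the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + lr * predict_stump(stump, xi) for pi, xi in zip(predictions, x)]
    return base, lr, stumps


def predict_boosting(model, xi: float) -> float:
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, xi) for s in stumps)


def mse(y_true: List[float], y_pred: List[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy one-dimensional regression problem.
x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]

model = fit_boosting(x, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict_boosting(model, xi) for xi in x])
print(baseline_mse, boosted_mse)
```

In this sketch n_estimators and lr play the same roles as the parameters of the same names in the class signature: more rounds and a larger step both fit the training data more closely.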
- booster = ['gbtree', 'gblinear', 'dart']
- change_output(output: str) None
- explain(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- fit(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) autoprognosis.plugins.prediction.regression.base.RegressionPlugin
Train the plugin
- Parameters
X – pd.DataFrame
- fit_predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and predict the training data. Used by predictors.
- fit_transform(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Fit the model and transform the training data. Used by imputers and preprocessors.
- classmethod fqdn() str
The fully-qualified name of the plugin: type->subtype->name
- get_args() dict
- grow_policy = ['depthwise', 'lossguide']
- static hyperparameter_space(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter search domain, used for tuning.
- classmethod hyperparameter_space_fqdn(*args: Any, **kwargs: Any) List[autoprognosis.plugins.core.params.Params]
The hyperparameter domain using the fully-qualified name.
- is_fitted() bool
Check if the model was trained
- classmethod load(buff: bytes) autoprognosis.plugins.prediction.regression.plugin_xgboost_regressor.XGBoostRegressorPlugin
Load the plugin from bytes
- static name() str
The name of the plugin, e.g.: xgboost
- predict(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
Run predictions for the input. Used by predictors.
- Parameters
X – pd.DataFrame
- predict_proba(X: pandas.core.frame.DataFrame, *args: Any, **kwargs: Any) pandas.core.frame.DataFrame
- classmethod sample_hyperparameters(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters for Optuna.
- classmethod sample_hyperparameters_fqdn(trial: optuna.trial.Trial, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters using the fully-qualified name.
- classmethod sample_hyperparameters_np(random_state: int = 0, *args: Any, **kwargs: Any) Dict[str, Any]
Sample hyperparameters as a dict.
- save() bytes
Save the plugin to bytes
- score(X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, metric: str = 'aucroc') float
- static subtype() str
The subtype of the plugin, e.g.: classifier
- transform(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Transform the input. Used by imputers and preprocessors.
- Parameters
X – pd.DataFrame
- static type() str
The type of the plugin, e.g.: prediction
Explainability plugins
autoprognosis.plugins.explainers.plugin_invase module
- class INVASEPlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, feature_names: Optional[List] = None, n_epoch: int = 10000, n_epoch_inner: int = 2, n_folds: int = 5, task_type: str = 'classification', samples: int = 2000, prefit: bool = False, random_state: int = 0)
Bases:
autoprognosis.plugins.explainers.base.ExplainerPlugin
Interpretability plugin based on the INVASE algorithm.
- Parameters
estimator – model. The model to explain.
X – dataframe. Training set
y – dataframe. Training labels
time_to_event – dataframe. Used for risk estimation tasks.
eval_times – list. Used for risk estimation tasks.
n_epoch – int. training epochs
task_type – str. classification or risk_estimation
samples – int. Number of samples to use.
prefit – bool. If true, the estimator won’t be trained.
Example
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
...     "invase",
...     model,
...     X_train,
...     y_train,
...     task_type="classification",
... )
>>>
>>> explainer.explain(X_test)
- explain(X: pandas.core.frame.DataFrame) numpy.ndarray
- static name() str
- plot(values: pandas.core.frame.DataFrame) None
- static pretty_name() str
- static type() str
- class Masking(*args: Any, **kwargs: Any)
Bases:
torch.nn.Module
- forward(tensors: List[torch.Tensor]) torch.Tensor
- bitmask_intervals(n: int, low: int, high: int) Generator
- bitmasks(n: int, m: int) Generator
- class invaseBase(estimator: Any, X: numpy.ndarray, n_epoch: int = 10000, n_epoch_inner: int = 1, patience: int = 5, min_epochs: int = 100, n_epoch_print: int = 50, batch_size: int = 300, learning_rate: float = 0.001, penalty_l2: float = 0.001, feature_names: List = [])
Bases:
object
- abstract explain(X: numpy.ndarray, *args: Any, **kwargs: Any) numpy.ndarray
- class invaseCV(estimator: Any, X: numpy.ndarray, critic_latent_dim: int = 200, n_epoch: int = 10000, n_epoch_inner: int = 2, patience: int = 5, min_epochs: int = 100, n_epoch_print: int = 50, n_folds: int = 5, seed: int = 42, feature_names: List = [])
Bases:
object
- explain(x: numpy.ndarray) numpy.ndarray
- class invaseClassifier(estimator: Any, X: numpy.ndarray, critic_latent_dim: int = 200, n_epoch: int = 10000, n_epoch_inner: int = 2, patience: int = 5, min_epochs: int = 100, n_epoch_print: int = 50, batch_size: int = 300, learning_rate: float = 0.001, penalty_l2: float = 0.001, feature_names: List = [])
Bases:
autoprognosis.plugins.explainers.plugin_invase.invaseBase
- explain(X: numpy.ndarray, *args: Any, **kwargs: Any) numpy.ndarray
- class invaseRiskEstimation(estimator: Any, X: numpy.ndarray, eval_times: List, critic_latent_dim: int = 200, n_epoch: int = 10000, n_epoch_inner: int = 2, patience: int = 5, min_epochs: int = 100, n_epoch_print: int = 10, batch_size: int = 500, learning_rate: float = 0.001, penalty_l2: float = 0.001, samples: int = 20000, feature_names: List = [])
Bases:
autoprognosis.plugins.explainers.plugin_invase.invaseBase
- explain(X: numpy.ndarray, *args: Any, **kwargs: Any) numpy.ndarray
- plugin
alias of
autoprognosis.plugins.explainers.plugin_invase.INVASEPlugin
- sample(X: numpy.ndarray, nsamples: int = 100, random_state: int = 0) numpy.ndarray
autoprognosis.plugins.explainers.plugin_kernel_shap module
- class KernelSHAPPlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, task_type: str = 'classification', feature_names: Optional[List] = None, subsample: int = 10, prefit: bool = False, n_epoch: int = 10000, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.explainers.base.ExplainerPlugin
Interpretability plugin based on KernelSHAP.
- Parameters
estimator – model. The model to explain.
X – dataframe. Training set
y – dataframe. Training labels
task_type – str. classification or risk_estimation
prefit – bool. If true, the estimator won’t be trained.
n_epoch – int. training epochs
subsample – int. Number of samples to use.
time_to_event – dataframe. Used for risk estimation tasks.
eval_times – list. Used for risk estimation tasks.
Example
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
...     "kernel_shap",
...     model,
...     X_train,
...     y_train,
...     task_type="classification",
... )
>>>
>>> explainer.explain(X_test)
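For background on KernelSHAP: the method fits a weighted linear model over feature coalitions, and the weight of a coalition depends only on its size. The sketch below computes that standard Shapley kernel weight from the SHAP paper. It is independent of the plugin; the function name and the four-feature example are assumptions.

```python
from math import comb


def shapley_kernel_weight(n_features: int, coalition_size: int) -> float:
    """Shapley kernel weight for a coalition of the given size.

    The empty and full coalitions have infinite weight in theory; in
    practice they are enforced as constraints rather than sampled.
    """
    if coalition_size in (0, n_features):
        return float("inf")
    return (n_features - 1) / (
        comb(n_features, coalition_size)
        * coalition_size
        * (n_features - coalition_size)
    )


# Weights for every non-trivial coalition size with 4 features.
weights = {size: shapley_kernel_weight(4, size) for size in range(1, 4)}
print(weights)
```

Very small and very large coalitions receive the most weight, because they isolate the effect of individual features best.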
- explain(X: pandas.core.frame.DataFrame) numpy.ndarray
- static name() str
- plot(X: pandas.core.frame.DataFrame) None
- static pretty_name() str
- static type() str
autoprognosis.plugins.explainers.plugin_lime module
- class LimePlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, feature_names: Optional[List] = None, task_type: str = 'classification', prefit: bool = False, n_epoch: int = 10000, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.explainers.base.ExplainerPlugin
Interpretability plugin based on LIME.
- Parameters
estimator – model. The model to explain.
X – dataframe. Training set
y – dataframe. Training labels
task_type – str. classification or risk_estimation
prefit – bool. If true, the estimator won’t be trained.
n_epoch – int. training epochs
time_to_event – dataframe. Used for risk estimation tasks.
eval_times – list. Used for risk estimation tasks.
Example
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
...     "lime",
...     model,
...     X_train,
...     y_train,
...     task_type="classification",
... )
>>>
>>> explainer.explain(X_test)
- explain(X: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
- static name() str
- plot(importances: pandas.core.frame.DataFrame, feature_names: Optional[list] = None) None
- static pretty_name() str
- static type() str
- plugin
alias of
autoprognosis.plugins.explainers.plugin_lime.LimePlugin
autoprognosis.plugins.explainers.plugin_risk_effect_size module
- class RiskEffectSizePlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, task_type: str = 'classification', feature_names: Optional[List] = None, subsample: int = 10, prefit: bool = False, effect_size: float = 0.5, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.explainers.base.ExplainerPlugin
Interpretability plugin based on Risk Effect size and Cohen’s D.
- Parameters
estimator – model. The model to explain.
X – dataframe. Training set
y – dataframe. Training labels
task_type – str. classification or risk_estimation
prefit – bool. If true, the estimator won’t be trained.
n_epoch – int. training epochs
time_to_event – dataframe. Used for risk estimation tasks.
eval_times – list. Used for risk estimation tasks.
Example
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
...     "risk_effect_size",
...     model,
...     X_train,
...     y_train,
...     task_type="classification",
... )
>>>
>>> explainer.explain(X_test)
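Cohen's d, named in the plugin description above, is the difference between two group means divided by their pooled standard deviation. A minimal sketch of that standard formula follows. The function name and the toy groups are assumptions, and how the plugin applies the statistic internally is not shown here.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardised mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical values of one feature in a high-risk and a low-risk group.
high_risk = [5.0, 6.0, 7.0]
low_risk = [2.0, 3.0, 4.0]

d = cohens_d(high_risk, low_risk)
print(d)
```

A feature whose absolute d is large separates the two groups well; comparing it against a threshold such as the effect_size default of 0.5 from the signature is one plausible use, stated here as an assumption.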
- explain(X: pandas.core.frame.DataFrame, effect_size: Optional[float] = None) numpy.ndarray
- static name() str
- plot(X: pandas.core.frame.DataFrame, ax: Optional[Any] = None) None
- static pretty_name() str
- static type() str
autoprognosis.plugins.explainers.plugin_shap_permutation_sampler module
- class ShapPermutationSamplerPlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, feature_names: Optional[List] = None, task_type: str = 'classification', n_epoch: int = 10000, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, prefit: bool = False, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.explainers.base.ExplainerPlugin
Interpretability plugin based on ShapPermutation sampler.
- Parameters
estimator – model. The model to explain.
X – dataframe. Training set
y – dataframe. Training labels
task_type – str. classification or risk_estimation
prefit – bool. If true, the estimator won’t be trained.
n_epoch – int. training epochs
subsample – int. Number of samples to use.
time_to_event – dataframe. Used for risk estimation tasks.
eval_times – list. Used for risk estimation tasks.
Example
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
...     "shap_permutation_sampler",
...     model,
...     X_train,
...     y_train,
...     task_type="classification",
... )
>>>
>>> explainer.explain(X_test)
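The general idea behind permutation-based Shapley estimation can be sketched in a few lines: features are switched from a baseline value to the explained value in a random order, and each feature is credited with the change in model output it causes, averaged over many orders. This is a generic illustration and not the plugin's or the SHAP library's code; permutation_shapley, the linear toy model and the zero baseline are all assumptions.

```python
import random
from typing import Callable, List, Sequence


def permutation_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    n_permutations: int = 50,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal contributions over random orders."""
    rng = random.Random(seed)
    n_features = len(x)
    contributions = [0.0] * n_features
    for _ in range(n_permutations):
        order = list(range(n_features))
        rng.shuffle(order)
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            # Switch this feature from its baseline value to the explained value.
            current[feature] = x[feature]
            output = model(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [c / n_permutations for c in contributions]


def linear_model(z: Sequence[float]) -> float:
    """Toy model: for a linear model the Shapley values are weight * (x - baseline)."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


x = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]
shapley_values = permutation_shapley(linear_model, x, baseline)
print(shapley_values)
```

The estimates always sum to the difference between the model output at x and at the baseline, because each permutation walks from one to the other.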
- explain(X: pandas.core.frame.DataFrame, max_evals: Union[int, str] = 'auto') Any
- static name() str
- plot(importances: pandas.core.frame.DataFrame, feature_names: Optional[list] = None) None
- static pretty_name() str
- static type() str
autoprognosis.plugins.explainers.plugin_symbolic_pursuit module
- class SymbolicPursuitPlugin(estimator: Any, X: pandas.core.frame.DataFrame, y: pandas.core.frame.DataFrame, task_type: str = 'classification', feature_names: Optional[List] = None, subsample: int = 10, prefit: bool = False, n_epoch: int = 10000, time_to_event: Optional[pandas.core.frame.DataFrame] = None, eval_times: Optional[List] = None, loss_tol: float = 0.001, ratio_tol: float = 0.9, maxiter: int = 100, eps: float = 1e-05, patience: int = 10, random_state: int = 0, **kwargs: Any)
Bases:
autoprognosis.plugins.explainers.base.ExplainerPlugin
Interpretability plugin based on Symbolic Pursuit.
Based on the NeurIPS 2020 paper “Learning outside the black-box: at the pursuit of interpretable models”.
- Parameters
estimator – model. The model to explain.
X – dataframe. Training set
y – dataframe. Training labels
task_type – str. classification or risk_estimation
prefit – bool. If true, the estimator won’t be trained.
n_epoch – int. training epochs
subsample – int. Number of samples to use.
time_to_event – dataframe. Used for risk estimation tasks.
eval_times – list. Used for risk estimation tasks.
loss_tol – float. The tolerance for the loss under which the pursuit stops
ratio_tol – float. A new term is added only if new_loss / old_loss < ratio_tol
maxiter – int. Maximum number of iterations for optimization
eps – float. Number used for numerical stability
random_state – int. Random seed for reproducibility
Example
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from autoprognosis.plugins.explainers import Explainers
>>> from autoprognosis.plugins.prediction.classifiers import Classifiers
>>>
>>> X, y = load_iris(return_X_y=True)
>>>
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> model = Classifiers().get("logistic_regression")
>>>
>>> explainer = Explainers().get(
...     "symbolic_pursuit",
...     model,
...     X_train,
...     y_train,
...     task_type="classification",
... )
>>>
>>> explainer.explain(X_test)
- explain(X: pandas.core.frame.DataFrame) numpy.ndarray
- static name() str
- plot(X: pandas.core.frame.DataFrame) tuple
- static pretty_name() str
- static type() str
Benchmarks
autoprognosis.utils.tester module
- class classifier_metrics(metric: Union[str, list] = ['aucroc', 'aucprc', 'accuracy', 'f1_score_micro', 'f1_score_macro', 'f1_score_weighted', 'kappa', 'kappa_quadratic', 'precision_micro', 'precision_macro', 'precision_weighted', 'recall_micro', 'recall_macro', 'recall_weighted', 'mcc'])
Bases:
object
Helper class for evaluating the performance of the classifier.
- Parameters
metric –
str or list, default=[“aucroc”, “aucprc”, “accuracy”, “f1_score_micro”, “f1_score_macro”, “f1_score_weighted”, “kappa”, “kappa_quadratic”, “precision_micro”, “precision_macro”, “precision_weighted”, “recall_micro”, “recall_macro”, “recall_weighted”, “mcc”] The metric or metrics to use for evaluation. Potential values:
”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
”aucprc” : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
”accuracy” : Accuracy classification score.
”f1_score_micro”: F1 score is a harmonic mean of the precision and recall. This version uses the “micro” average: calculate metrics globally by counting the total true positives, false negatives and false positives.
”f1_score_macro”: F1 score is a harmonic mean of the precision and recall. This version uses the “macro” average: calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
”f1_score_weighted”: F1 score is a harmonic mean of the precision and recall. This version uses the “weighted” average: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).
”kappa”, “kappa_quadratic”: computes Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.
”precision_micro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (micro) calculates metrics globally by counting the total true positives and false positives.
”precision_macro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (macro) calculates metrics for each label, and finds their unweighted mean.
”precision_weighted”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.
”recall_micro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (micro) calculates metrics globally by counting the total true positives and false negatives.
”recall_macro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (macro) calculates metrics for each label, and finds their unweighted mean.
”recall_weighted”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.
”mcc”: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
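The difference between the micro, macro and weighted averages described above is easiest to see on a small example. The sketch below computes all three F1 variants from scratch following those definitions. It is independent of the helper class; the function names and the toy labels are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_averages(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    labels = sorted(set(y_true) | set(y_pred))
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    support = Counter(y_true)
    per_class = {label: _f1(tp[label], fp[label], fn[label]) for label in labels}
    return {
        # micro: pool the counts over all labels, then compute one score
        "micro": _f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # macro: unweighted mean of the per-label scores
        "macro": sum(per_class.values()) / len(labels),
        # weighted: per-label scores weighted by the number of true instances
        "weighted": sum(per_class[label] * support[label] for label in labels) / len(y_true),
    }


# Imbalanced toy problem: four samples of class 0, two of class 1, one of class 2.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]
scores = f1_averages(y_true, y_pred)
print(scores)
```

On imbalanced data the three values differ: the macro average treats the rare classes the same as the common one, while the weighted average is pulled towards the score of the majority class.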
- average_precision_score(y_test: numpy.ndarray, y_pred_proba: numpy.ndarray) float
- get_metric() Union[str, list]
- roc_auc_score(y_test: numpy.ndarray, y_pred_proba: numpy.ndarray) float
- score_proba(y_test: numpy.ndarray, y_pred_proba: numpy.ndarray) Dict[str, float]
- evaluate_estimator(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], Y: Union[pandas.core.series.Series, numpy.ndarray, List], n_folds: int = 3, seed: int = 0, pretrained: bool = False, group_ids: Optional[pandas.core.series.Series] = None, *args: Any, **kwargs: Any) Dict
Helper for evaluating classifiers.
- Parameters
estimator – Baseline model to evaluate. If pretrained is False, it must not be fitted.
X – pd.DataFrame or np.ndarray: The covariates
Y – pd.Series or np.ndarray or list: The labels
n_folds – int cross-validation folds
seed – int Random seed
pretrained – bool If the estimator was already trained or not.
group_ids – pd.Series The group_ids to use for stratified cross-validation
- Returns
Dict containing “raw” and “str” nodes. The “str” node contains prettified metrics, while the “raw” node contains tuples of the form (mean, std) for each metric. Both “raw” and “str” nodes contain the following metrics:
”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
”aucprc” : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
”accuracy” : Accuracy classification score.
”f1_score_micro”: F1 score is a harmonic mean of the precision and recall. This version uses the “micro” average: calculate metrics globally by counting the total true positives, false negatives and false positives.
”f1_score_macro”: F1 score is a harmonic mean of the precision and recall. This version uses the “macro” average: calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
”f1_score_weighted”: F1 score is a harmonic mean of the precision and recall. This version uses the “weighted” average: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).
”kappa”: computes Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.
”precision_micro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (micro) calculates metrics globally by counting the total true positives and false positives.
”precision_macro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (macro) calculates metrics for each label, and finds their unweighted mean.
”precision_weighted”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.
”recall_micro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (micro) calculates metrics globally by counting the total true positives and false negatives.
”recall_macro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (macro) calculates metrics for each label, and finds their unweighted mean.
”recall_weighted”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.
”mcc”: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
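The shape of the returned dictionary can be pictured with a small sketch that turns per-fold scores into a “raw” node of (mean, std) tuples and a “str” node of formatted strings. The helper name, the fold scores, the use of the population standard deviation and the exact string format are all assumptions made for illustration; the library's own output may differ.

```python
from statistics import mean, pstdev
from typing import Dict, List


def summarise_folds(fold_scores: Dict[str, List[float]]) -> Dict[str, Dict]:
    """Aggregate per-fold metric values into "raw" tuples and "str" summaries."""
    raw = {metric: (mean(values), pstdev(values)) for metric, values in fold_scores.items()}
    pretty = {metric: f"{mu:.3f} +/- {sd:.3f}" for metric, (mu, sd) in raw.items()}
    return {"raw": raw, "str": pretty}


# Hypothetical scores from a 3-fold cross-validation run.
fold_scores = {
    "aucroc": [0.90, 0.92, 0.94],
    "accuracy": [0.80, 0.85, 0.90],
}
results = summarise_folds(fold_scores)
print(results["str"])
```

Keeping both nodes lets downstream code compare models numerically through “raw” while reports and logs use the readable “str” values.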
- evaluate_estimator_multiple_seeds(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], Y: Union[pandas.core.series.Series, numpy.ndarray, List], n_folds: int = 3, seeds: List[int] = [0, 1, 2], pretrained: bool = False, group_ids: Optional[pandas.core.series.Series] = None) Dict
Helper for evaluating classifiers with multiple seeds.
- Parameters
estimator – Baseline model to evaluate. If pretrained is False, it must not be fitted.
X – pd.DataFrame or np.ndarray: The covariates
Y – pd.Series or np.ndarray or list: The labels
n_folds – int cross-validation folds
seeds – List Random seeds
pretrained – bool If the estimator was already trained or not.
group_ids – pd.Series The group_ids to use for stratified cross-validation
- Returns
Dict containing “seeds”, “agg” and “str” nodes. The “str” node contains the aggregated prettified metrics, while the raw metrics include tuples of the form (mean, std) for each metric. The “seeds” node contains the results for each random seed. Both “agg” and “str” nodes contain the following metrics:
”aucroc” : the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
”aucprc” : The average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
”accuracy” : Accuracy classification score.
”f1_score_micro”: F1 score is a harmonic mean of the precision and recall. This version uses the “micro” average: calculate metrics globally by counting the total true positives, false negatives and false positives.
”f1_score_macro”: F1 score is a harmonic mean of the precision and recall. This version uses the “macro” average: calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
”f1_score_weighted”: F1 score is a harmonic mean of the precision and recall. This version uses the “weighted” average: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).
”kappa”: computes Cohen’s kappa, a score that expresses the level of agreement between two annotators on a classification problem.
”precision_micro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (micro) calculates metrics globally by counting the total true positives and false positives.
”precision_macro”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (macro) calculates metrics for each label, and finds their unweighted mean.
”precision_weighted”: Precision is defined as the number of true positives over the number of true positives plus the number of false positives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.
”recall_micro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (micro) calculates metrics globally by counting the total true positives and false negatives.
”recall_macro”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (macro) calculates metrics for each label, and finds their unweighted mean.
”recall_weighted”: Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. This version (weighted) calculates metrics for each label, and finds their average weighted by support.
”mcc”: The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
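Two of the metrics listed above, Cohen's kappa and the Matthews correlation coefficient, can both be derived from a confusion matrix. The sketch below uses the standard binary formulas only (the multiclass generalisations are not covered); the function names and the confusion counts are assumptions chosen for illustration.

```python
from math import sqrt


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Agreement between predictions and labels, corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of the marginal rates, summed over both classes.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def matthews_corrcoef(tp: int, fp: int, fn: int, tn: int) -> float:
    """Binary MCC; defined as 0.0 when any marginal total is zero."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical confusion counts for 100 test samples.
tp, fp, fn, tn = 40, 10, 5, 45
kappa = cohen_kappa(tp, fp, fn, tn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
print(kappa, mcc)
```

Both scores are 0 for chance-level predictions and 1 for perfect ones, which makes them easier to compare across class balances than plain accuracy.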
- evaluate_regression(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], Y: Union[pandas.core.series.Series, numpy.ndarray, List], n_folds: int = 3, seed: int = 0, pretrained: bool = False, group_ids: Optional[pandas.core.series.Series] = None, *args: Any, **kwargs: Any) Dict
Helper for evaluating regression tasks.
- Parameters
estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.
X – pd.DataFrame or np.ndarray covariates
Y – pd.Series or np.ndarray or list outcomes
n_folds – int Number of cross-validation folds
seed – int Random seed
pretrained – bool If the estimator was trained or not
group_ids – pd.Series Optional group_ids for stratified cross-validation
- Returns
Dict containing “raw” and “str” nodes. The “str” node contains prettified metrics, while the “raw” node contains a tuple of the form (mean, std) for each metric. Both “raw” and “str” nodes contain the following metrics:
”r2”: R^2 (coefficient of determination) regression score function.
”mse”: Mean squared error regression loss.
”mae”: Mean absolute error regression loss.
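As a rough illustration of the three regression metrics and of the (mean, std) tuples in the “raw” node, the following pure-Python sketch computes the metrics on two hand-made folds. The helper `regression_metrics`, the toy folds, and the use of the population standard deviation are assumptions made for this example; the way the library measures the spread across folds may differ.

```python
from statistics import mean, pstdev


def regression_metrics(y_true, y_pred):
    """Compute R^2, MSE and MAE for one fold."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": ss_res / n,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Two toy folds of (true outcomes, predictions).
folds = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]),
    ([2.0, 4.0, 6.0], [2.0, 5.0, 6.0]),
]
per_fold = [regression_metrics(t, p) for t, p in folds]

# Summarize each metric over the folds as a (mean, std) tuple.
raw = {}
for name in ("r2", "mse", "mae"):
    values = [m[name] for m in per_fold]
    raw[name] = (mean(values), pstdev(values))
print(raw)
```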
- evaluate_regression_multiple_seeds(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], Y: Union[pandas.core.series.Series, numpy.ndarray, List], n_folds: int = 3, pretrained: bool = False, group_ids: Optional[pandas.core.series.Series] = None, seeds: List[int] = [0, 1, 2]) Dict
Helper for evaluating regression tasks with multiple seeds.
- Parameters
estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.
X – pd.DataFrame or np.ndarray covariates
Y – pd.Series or np.ndarray or list outcomes
n_folds – int Number of cross-validation folds
seeds – list Random seeds
pretrained – bool If the estimator was trained or not
group_ids – pd.Series Optional group_ids for stratified cross-validation
- Returns
Dict containing “seeds”, “agg” and “str” nodes. The “str” node contains the aggregated prettified metrics, while the raw metrics are tuples of the form (mean, std) for each metric. The “seeds” node contains the results for each random seed. Both “agg” and “str” nodes contain the following metrics:
”r2”: R^2 (coefficient of determination) regression score function.
”mse”: Mean squared error regression loss.
”mae”: Mean absolute error regression loss.
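The shape of a multi-seed result can be sketched as follows. This is only an illustration of the “seeds”, “agg” and “str” layout described above: the helper `aggregate_over_seeds`, the per-seed numbers, the content stored under “seeds”, and the `mean +/- std` string format are all assumptions, not the library's actual code or output.

```python
from statistics import mean, pstdev


def aggregate_over_seeds(per_seed):
    """Combine per-seed metric dicts into "seeds", "agg" and "str" nodes."""
    metric_names = list(next(iter(per_seed.values())).keys())
    agg = {}
    pretty = {}
    for name in metric_names:
        values = [per_seed[seed][name] for seed in per_seed]
        agg[name] = (mean(values), pstdev(values))  # (mean, std) across seeds
        pretty[name] = f"{agg[name][0]:.3f} +/- {agg[name][1]:.3f}"
    return {"seeds": per_seed, "agg": agg, "str": pretty}


# Made-up scores for seeds 0, 1 and 2.
per_seed = {
    0: {"r2": 0.80, "mse": 0.20, "mae": 0.30},
    1: {"r2": 0.90, "mse": 0.10, "mae": 0.20},
    2: {"r2": 0.85, "mse": 0.15, "mae": 0.25},
}
summary = aggregate_over_seeds(per_seed)
print(summary["str"])
```

Reporting the spread over several seeds helps separate real differences between models from variation caused by the cross-validation split.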
- evaluate_survival_estimator(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], T: Union[pandas.core.series.Series, numpy.ndarray, List], Y: Union[pandas.core.series.Series, numpy.ndarray, List], time_horizons: Union[List[float], numpy.ndarray], n_folds: int = 3, seed: int = 0, pretrained: bool = False, risk_threshold: float = 0.5, group_ids: Optional[pandas.core.series.Series] = None) Dict
Helper for evaluating survival analysis tasks.
- Parameters
estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.
X – DataFrame or np.ndarray The covariates
T – Series or np.ndarray or list time to event/censoring values
Y – Series or np.ndarray or list event or censored
time_horizons – list or np.ndarray Time horizons at which to evaluate the performance.
n_folds – int Number of folds for cross validation
seed – int Random seed
pretrained – bool If the estimator was trained or not
group_ids – Group labels for the samples used while splitting the dataset into train/test set.
- Returns
Dict containing “raw”, “str” and “horizons” nodes. The “str” node contains prettified metrics, while the “raw” node contains a tuple of the form (mean, std) for each metric. The “horizons” node splits the metrics by horizon. Each node contains the following metrics:
”c_index”: The concordance index or c-index is a metric to evaluate the predictions made by a survival algorithm. It is defined as the number of concordant pairs divided by the total number of possible evaluation pairs.
”brier_score”: The Brier Score is a strictly proper scoring rule that measures the accuracy of probabilistic predictions.
”aucroc”: The Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
”sensitivity”: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.
”specificity”: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.
”PPV”: The positive predictive value (PPV) is the probability that an individual with a positive test result truly has the condition.
”NPV”: The negative predictive value (NPV) is the probability that an individual with a negative test result truly does not have the condition.
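The definition of the concordance index above can be made concrete with a minimal pure-Python sketch of Harrell's c-index. The function name `harrell_c_index` and the toy data are hypothetical. Treating a pair as comparable only when the subject with the shorter time had an observed event is the standard Harrell convention and is an assumption here; the library may use a different, for example time-dependent, estimator.

```python
def harrell_c_index(times, events, risks):
    """Harrell's c-index: concordant pairs over comparable pairs.

    times  -- time to event or censoring for each subject
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    return concordant / comparable


# Toy cohort: the third subject is censored at t = 6.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c = harrell_c_index(times, events, risks)
print(c)
```

In this toy cohort five pairs are comparable and four of them are ranked correctly. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.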
- evaluate_survival_estimator_multiple_seeds(estimator: Any, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], T: Union[pandas.core.series.Series, numpy.ndarray, List], Y: Union[pandas.core.series.Series, numpy.ndarray, List], time_horizons: Union[List[float], numpy.ndarray], n_folds: int = 3, pretrained: bool = False, risk_threshold: float = 0.5, group_ids: Optional[pandas.core.series.Series] = None, seeds: List[int] = [0, 1, 2]) Dict
Helper for evaluating survival analysis tasks with multiple random seeds.
- Parameters
estimator – Baseline model to evaluate. If pretrained == False, it must not be fitted.
X – DataFrame or np.ndarray The covariates
T – Series or np.ndarray or list time to event/censoring values
Y – Series or np.ndarray or list event or censored
time_horizons – list or np.ndarray Time horizons at which to evaluate the performance.
n_folds – int Number of folds for cross validation
seeds – List Random seeds
pretrained – bool If the estimator was trained or not
group_ids – Group labels for the samples used while splitting the dataset into train/test set.
- Returns
Dict containing “seeds”, “agg” and “str” nodes. The “str” node contains the aggregated prettified metrics, while the raw metrics are tuples of the form (mean, std) for each metric. The “seeds” node contains the results for each random seed. Both “agg” and “str” nodes contain the following metrics:
”c_index”: The concordance index or c-index is a metric to evaluate the predictions made by a survival algorithm. It is defined as the number of concordant pairs divided by the total number of possible evaluation pairs.
”brier_score”: The Brier Score is a strictly proper scoring rule that measures the accuracy of probabilistic predictions.
”aucroc”: The Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
”sensitivity”: Sensitivity (true positive rate) is the probability of a positive test result, conditioned on the individual truly being positive.
”specificity”: Specificity (true negative rate) is the probability of a negative test result, conditioned on the individual truly being negative.
”PPV”: The positive predictive value (PPV) is the probability that an individual with a positive test result truly has the condition.
”NPV”: The negative predictive value (NPV) is the probability that an individual with a negative test result truly does not have the condition.
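Sensitivity, specificity, PPV and NPV all derive from the same four confusion-matrix counts once predicted risks are turned into positive/negative calls. The sketch below assumes that a subject is called positive when the predicted risk is at least `risk_threshold`, and it ignores censoring. How the library applies `risk_threshold` is not documented above, so this is an assumption, as are the helper name `threshold_metrics` and the toy data.

```python
def threshold_metrics(y_true, risk, risk_threshold=0.5):
    """Sensitivity, specificity, PPV and NPV for thresholded risk scores."""
    tp = fp = tn = fn = 0
    for y, r in zip(y_true, risk):
        predicted_positive = r >= risk_threshold  # assumed thresholding rule
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, denom):
        return num / denom if denom else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


# Toy example: 1 = event occurred before the horizon, 0 = it did not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
risk = [0.9, 0.7, 0.2, 0.6, 0.8, 0.3, 0.1, 0.05]
metrics = threshold_metrics(y_true, risk, risk_threshold=0.5)
print(metrics)
```

Raising the threshold generally trades sensitivity for specificity, so these four metrics should be read together with the threshold that produced them.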
- score_classification_model(estimator: Any, X_train: pandas.core.frame.DataFrame, X_test: pandas.core.frame.DataFrame, y_train: pandas.core.frame.DataFrame, y_test: pandas.core.frame.DataFrame) float