AutoPrognosis regression

Welcome! This tutorial will walk you through the steps of selecting a model for a regression task using AutoPrognosis.

Setup

[ ]:
# stdlib
import json
import warnings

# third party
import pandas as pd
from sklearn.model_selection import train_test_split

warnings.filterwarnings("ignore")

Import RegressionStudy

RegressionStudy is the engine that learns an ensemble of regression pipelines and their hyperparameters automatically.

[ ]:
# autoprognosis absolute
from autoprognosis.studies.regression import RegressionStudy

Load the target dataset

AutoPrognosis expects pandas.DataFrames as input.

For this example, we will use the Airfoil Self-Noise Data Set.

[ ]:
# third party
import pandas as pd

df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\t",  # the file is tab-separated
)


last_col = df.columns[-1]

y = df[last_col]
X = df.drop(columns=[last_col])


df = X.copy()
df["target"] = y

df
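The same reshaping can be tried offline on a few rows held in memory. The sketch below is only an illustration: the sample values and the helper name `load_tab_separated` are made up for this example, and it assumes, as in the cell above, that the last column is the regression target.

```python
import io

import pandas as pd


def load_tab_separated(text: str, target_name: str = "target") -> pd.DataFrame:
    """Parse header-less, tab-separated text and rename the last column."""
    df = pd.read_csv(io.StringIO(text), header=None, sep="\t")
    last_col = df.columns[-1]
    # Renaming keeps the column order, so the target stays in last position.
    return df.rename(columns={last_col: target_name})


# Made-up sample rows (not taken from the real dataset).
sample = "1.0\t2.0\t3.0\t10.5\n4.0\t5.0\t6.0\t20.5\n"

frame = load_tab_separated(sample)
X_small = frame.drop(columns=["target"])
y_small = frame["target"]

print(frame.shape)
print(list(frame.columns))
```

The resulting frame has the same layout as `df` above: feature columns first, followed by a single `target` column.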

Create the regressor

AutoPrognosis ships with default plugins, but you can choose which plugins the pipelines are allowed to use.

You can see the supported plugins below:

[ ]:
# stdlib
# List the available plugins
import json

# autoprognosis absolute
from autoprognosis.plugins import Plugins

print(json.dumps(Plugins().list_available(), indent=2))

We will restrict the search to a few regressor plugins and create the regression study.

[ ]:
# stdlib
from pathlib import Path

workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name = "regression_example"

study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=10,  # Trials per candidate. Delete this line for better results (default: 50).
    num_study_iter=2,  # Outer iterations. Delete this line for better results (default: 5).
    regressors=[
        "linear_regression",
        "xgboost_regressor",
    ],  # Candidate regressors. Delete this argument for better results.
    workspace=workspace,
)
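To get a feel for how much the reduced settings shrink the search, here is a back-of-the-envelope calculation. It is only a rough sketch: it assumes each outer iteration runs up to `num_iter` trials for every listed regressor, which is one reading of the comments above rather than a documented guarantee. The helper `max_trials` is hypothetical, and the real count may be lower if the search stops early.

```python
def max_trials(num_iter: int, num_study_iter: int, n_candidates: int) -> int:
    """Rough upper bound: trials per candidate x outer iterations x candidates."""
    return num_iter * num_study_iter * n_candidates


# Settings used in this tutorial: 10 trials, 2 outer iterations, 2 regressors.
quick = max_trials(num_iter=10, num_study_iter=2, n_candidates=2)

# Same two regressors with the defaults quoted in the comments (50 and 5).
full = max_trials(num_iter=50, num_study_iter=5, n_candidates=2)

print(quick, full)
```

Under these assumptions the tutorial settings cap the search at a small fraction of the default budget, which is why the comments suggest removing them for better results.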

Search for the optimal ensemble

[ ]:
study.run()

Once the search finishes, load the saved model from the workspace and evaluate it.

[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression

output = workspace / study_name / "model.p"

model = load_model_from_file(output)

metrics = evaluate_regression(model, X, y)

f"Model {model.name()} score: {metrics['raw']}"
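The loading step assumes that the study has already written `model.p` under `workspace / study_name`. A small guard can turn a missing file into a clearer error. The helper name `resolve_model_path` is hypothetical and not part of AutoPrognosis; it only wraps the path used in the cell above.

```python
from pathlib import Path


def resolve_model_path(workspace: Path, study_name: str) -> Path:
    """Return workspace/study_name/model.p, or raise if the file is missing."""
    path = Path(workspace) / study_name / "model.p"
    if not path.is_file():
        raise FileNotFoundError(
            f"No saved model at {path}; has study.run() finished?"
        )
    return path
```

With this in place, the load call could be written as `load_model_from_file(resolve_model_path(workspace, study_name))`.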

Serialization

[ ]:
# autoprognosis absolute
from autoprognosis.utils.serialization import load_from_file, save_to_file

out = workspace / "tmp.bkp"

# Fit the model
model.fit(X, y)

# Save
save_to_file(out, model)

# Reload
loaded_model = load_from_file(out)

print(loaded_model.name())

assert loaded_model.name() == model.name()

out.unlink()

Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed it and would like to join the movement towards machine learning and AI for medicine, here is how you can help.

Star AutoPrognosis on GitHub

The easiest way to help our community is to star the repository! This helps raise awareness of the tools we’re building.