autoprognosis.plugins.ensemble.combos module

Stacking (meta ensembling). See http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/ for more information.

class BaseAggregator(base_estimators, pre_fitted=False)

Bases: ABC

Abstract class for all combination classes.

Stacking (meta ensembling). See http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/ for more information.

Parameters:

base_estimators (list, length must be greater than 1) – A list of base estimators. Certain methods must be present, e.g., fit and predict.
pre_fitted (bool, optional (default=False)) – Whether the base estimators are already trained. If True, the fitting step may be skipped.

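The following is a minimal, self-contained sketch of the interface described above, assuming only that base estimators expose fit and predict. It does not import autoprognosis: ToyAggregatorBase, ConstantEstimator and MajorityAggregator are hypothetical names, and implementing fit_predict once in the base class is a simplification (BaseAggregator leaves it abstract).

```python
from abc import ABC, abstractmethod

import numpy as np


class ToyAggregatorBase(ABC):
    """Hypothetical stand-in that mirrors the documented BaseAggregator contract."""

    def __init__(self, base_estimators, pre_fitted=False):
        # Documented requirement: more than one base estimator.
        if len(base_estimators) < 2:
            raise ValueError("base_estimators must contain more than one estimator")
        self.base_estimators = base_estimators
        self.pre_fitted = pre_fitted

    @abstractmethod
    def fit(self, X, y=None):
        """Train the base estimators (and any combiner); return self."""

    @abstractmethod
    def predict(self, X):
        """Return one combined label per sample."""

    def fit_predict(self, X, y=None):
        # Simplification: implemented once here instead of left abstract.
        return self.fit(X, y).predict(X)


class ConstantEstimator:
    """Hypothetical base estimator that always predicts the same label."""

    def __init__(self, label):
        self.label = label

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return np.full(len(X), self.label, dtype=int)


class MajorityAggregator(ToyAggregatorBase):
    """Hypothetical concrete aggregator: plain majority vote over base labels."""

    def fit(self, X, y=None):
        # Honour pre_fitted by skipping training of already-trained estimators.
        if not self.pre_fitted:
            for est in self.base_estimators:
                est.fit(X, y)
        return self

    def predict(self, X):
        # Shape (n_samples, n_estimators): one column per base estimator.
        preds = np.column_stack([est.predict(X) for est in self.base_estimators])
        # Most frequent (non-negative integer) label in each row.
        return np.array([np.bincount(row).argmax() for row in preds])


agg = MajorityAggregator([ConstantEstimator(1), ConstantEstimator(1), ConstantEstimator(0)])
print(agg.fit_predict(np.zeros((4, 2)), np.zeros(4, dtype=int)))  # [1 1 1 1]
```

Because fit and predict are declared abstract, instantiating ToyAggregatorBase directly raises TypeError; only subclasses that implement both can be created.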
abstract fit(X, y=None)

Fit estimator. y is optional for unsupervised methods.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,), optional (default=None)) – The ground truth of the input samples (labels).

Returns:

self

abstract fit_predict(X, y=None)

Fit estimator and predict on X. y is optional for unsupervised methods.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,), optional (default=None)) – The ground truth of the input samples (labels).

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters: deep (boolean, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any

abstract predict(X)

Predict the class labels for the provided data.

Parameters: X (numpy array of shape (n_samples, n_features)) – The input samples.
Returns: labels – Class labels for each data sample.
Return type: numpy array of shape (n_samples,)

abstract predict_proba(X)

Return probability estimates for the test data X.

Parameters: X (numpy array of shape (n_samples, n_features)) – The input samples.
Returns: p – The class probabilities of the input samples. Classes are in lexicographic order.
Return type: numpy array of shape (n_samples, n_classes)

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns: self
Return type: object
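To make the <component>__<parameter> naming convention concrete, here is a simplified, hypothetical helper that routes keys the way nested set_params calls need them. It is an illustration only, not scikit-learn's implementation (which also validates keys and calls set_params on each component); the key "meta_clf__C" is a made-up example of a parameter C on a component named meta_clf.

```python
def split_nested_params(params):
    """Route set_params-style keys: '<component>__<parameter>' goes to the
    named component, anything else belongs to the top-level object."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits on the first "__" only, so deeper keys such as
        # "a__b__c" are passed down as {"a": {"b__c": ...}}.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({"threshold": 0.6, "meta_clf__C": 0.1})
print(own)     # {'threshold': 0.6}
print(nested)  # {'meta_clf': {'C': 0.1}}
```

Splitting only on the first separator lets each level strip its own prefix and forward the remainder, which is how arbitrarily deep nesting stays addressable.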

class SimpleClassifierAggregator(base_estimators, method='average', threshold=0.5, weights=None, pre_fitted=False)

Bases: BaseAggregator

A collection of simple classifier combination methods.

Parameters:

base_estimators (list or numpy array (n_estimators,)) – A list of base classifiers.
method (str, optional (default='average')) – Combination method: {‘average’, ‘maximization’, ‘majority vote’, ‘median’}. Pass weights to use the weighted version of the averaging and majority-vote methods.
threshold (float in (0, 1), optional (default=0.5)) – Cut-off value to convert scores into binary labels.
weights (numpy array of shape (1, n_classifiers), optional (default=None)) – Classifier weights.
pre_fitted (bool, optional (default=False)) – Whether the base classifiers are already trained. If True, the fitting step may be skipped.
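As a hedged sketch of how such combination methods can act on class probabilities, the code below assumes the per-classifier outputs are stacked into an array of shape (n_samples, n_classes, n_estimators) and reduced over the last axis. combine_probas and to_binary_labels are hypothetical helpers, not the class's internals; majority vote is left out for brevity, and treating the threshold as inclusive (score >= threshold maps to 1) is an assumption.

```python
import numpy as np


def combine_probas(all_probas, method="average", weights=None):
    """Reduce (n_samples, n_classes, n_estimators) to (n_samples, n_classes)."""
    all_probas = np.asarray(all_probas, dtype=float)
    if method == "average":
        if weights is None:
            return all_probas.mean(axis=2)
        # Weights of shape (1, n_estimators) are flattened and broadcast along axis 2.
        w = np.asarray(weights, dtype=float).ravel()
        return (all_probas * w).sum(axis=2) / w.sum()
    if method == "maximization":
        return all_probas.max(axis=2)
    if method == "median":
        return np.median(all_probas, axis=2)
    raise ValueError(f"unsupported method: {method}")


def to_binary_labels(positive_scores, threshold=0.5):
    # Assumed inclusive cut-off: scores at or above the threshold become label 1.
    return (np.asarray(positive_scores) >= threshold).astype(int)


# Two samples, two classes, three classifiers; class-1 probabilities per classifier.
p = np.zeros((2, 2, 3))
p[:, 1, :] = [[0.2, 0.6, 0.7], [0.1, 0.2, 0.3]]
p[:, 0, :] = 1 - p[:, 1, :]
avg = combine_probas(p)
print(to_binary_labels(avg[:, 1]))  # [1 0]
```

The weighted average divides by the sum of the weights, so the combined rows still sum to 1 when every classifier's rows do.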

fit(X, y)

Fit classifier.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

fit_predict(X, y)

Fit estimator and predict on X.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters: deep (boolean, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any

predict(X)

Predict the class labels for the provided data.

Parameters: X (numpy array of shape (n_samples, n_features)) – The input samples.
Returns: labels – Class labels for each data sample.
Return type: numpy array of shape (n_samples,)

predict_proba(X)

Return probability estimates for the test data X.

Parameters: X (numpy array of shape (n_samples, n_features)) – The input samples.
Returns: p – The class probabilities of the input samples. Classes are in lexicographic order.
Return type: numpy array of shape (n_samples, n_classes)

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns: self
Return type: object

class Stacking(base_estimators, meta_clf=None, n_folds=3, keep_original=True, use_proba=False, shuffle_data=False, random_state=None, threshold=None, pre_fitted=None)

Bases: BaseAggregator

Meta ensembling, also known as stacking. See http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/ for more information.

Parameters:

base_estimators (list or numpy array (n_estimators,)) – A list of base classifiers.
meta_clf (object, optional (default=None)) – The meta classifier that makes the final prediction from the base classifiers' outputs. If None, a default meta classifier is used.
n_folds (int, optional (default=3)) – The number of splits of the training sample.
keep_original (bool, optional (default=True)) – If True, keep the original features for training and predicting.
use_proba (bool, optional (default=False)) – If True, use the predicted probabilities as the new features.
shuffle_data (bool, optional (default=False)) – If True, shuffle the input data.
random_state (int, RandomState or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
threshold (float in (0, 1), optional (default=None)) – Cut-off value to convert scores into binary labels.
pre_fitted (bool, optional (default=None)) – Whether the base classifiers are already trained. If True, the fitting step may be skipped.
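As a hedged illustration of the stacking technique itself (not the autoprognosis implementation), the sketch below builds out-of-fold meta-features with numpy only. NearestCentroid and stacking_features are hypothetical; base estimators are passed as zero-argument factories, hard label predictions are used (the use_proba=False case), and n_folds, keep_original, shuffle_data and random_state only mimic the parameters documented above.

```python
import numpy as np


class NearestCentroid:
    """Hypothetical numpy-only base classifier used to keep the sketch self-contained."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distances of shape (n_samples, n_classes); pick the closest centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]


def stacking_features(factories, X, y, X_test, n_folds=3, keep_original=True,
                      shuffle_data=False, random_state=None):
    """Return (train_meta, test_meta) feature matrices for the meta classifier."""
    n = len(X)
    order = np.arange(n)
    if shuffle_data:
        order = np.random.RandomState(random_state).permutation(n)
    folds = np.array_split(order, n_folds)

    # Out-of-fold predictions: each base model labels only rows it was not trained on.
    oof = np.zeros((n, len(factories)))
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        for j, make in enumerate(factories):
            oof[fold, j] = make().fit(X[train], y[train]).predict(X[fold])

    # Test-time features: refit every base model on the full training set.
    test = np.column_stack([make().fit(X, y).predict(X_test) for make in factories])

    if keep_original:
        # Append the base predictions to the original features.
        return np.hstack([X, oof]), np.hstack([X_test, test])
    return oof, test


X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05], [5.05]])
train_meta, test_meta = stacking_features([NearestCentroid, NearestCentroid], X, y, X_test)
meta_clf = NearestCentroid().fit(train_meta, y)
print(meta_clf.predict(test_meta))  # [0 1]
```

Using out-of-fold predictions for the training meta-features is the key design choice: the meta classifier never sees a base prediction made on a row that the base model was trained on, which limits leakage.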

fit(X, y)

Fit classifier.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

fit_predict(X, y)

Fit estimator and predict on X.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters: deep (boolean, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any

predict(X)

Predict the class labels for the provided data.

Parameters: X (numpy array of shape (n_samples, n_features)) – The input samples.
Returns: labels – Class labels for each data sample.
Return type: numpy array of shape (n_samples,)

predict_proba(X)

Return probability estimates for the test data X.

Parameters: X (numpy array of shape (n_samples, n_features)) – The input samples.
Returns: p – The class probabilities of the input samples. Classes are in lexicographic order.
Return type: numpy array of shape (n_samples, n_classes)

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns: self
Return type: object

average(scores, estimator_weights=None)

Combination method to merge the scores from multiple estimators by taking the average.

Parameters:

scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.
estimator_weights (numpy array of shape (1, n_estimators), optional (default=None)) – Estimator weights. If specified, a weighted average is computed.

Returns:

combined_scores – The combined scores.

Return type:

numpy array of shape (n_samples,)
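A minimal sketch of the (weighted) average, assuming the weighted version divides by the sum of the weights, i.e. combined_i = sum_j(w_j * s_ij) / sum_j(w_j). average_scores is a hypothetical name, not the module's function.

```python
import numpy as np


def average_scores(scores, estimator_weights=None):
    """Row-wise (optionally weighted) mean of an (n_samples, n_estimators) matrix."""
    scores = np.asarray(scores, dtype=float)
    if estimator_weights is None:
        return scores.mean(axis=1)
    # np.average computes sum_j(w_j * s_ij) / sum_j(w_j) along axis 1.
    w = np.asarray(estimator_weights, dtype=float).ravel()
    return np.average(scores, axis=1, weights=w)


scores = np.array([[1.0, 3.0], [2.0, 4.0]])
print(average_scores(scores))                # [2. 3.]
print(average_scores(scores, [[3.0, 1.0]]))  # [1.5 2.5]
```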

list_diff(first_list, second_list)

Utility function to compute the list difference (first_list - second_list).

Parameters:

first_list (list) – First list.
second_list (list) – Second list.

Returns: diff – The elements of first_list that are not in second_list.
Return type: list
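A short sketch of a list difference that keeps the order (and any repeats) of first_list, assuming the items are hashable. list_difference is a hypothetical name; whether the module's list_diff preserves order and duplicates in the same way is an assumption.

```python
def list_difference(first_list, second_list):
    """Elements of first_list that do not appear in second_list, in original order."""
    excluded = set(second_list)  # set membership is O(1); requires hashable items
    return [item for item in first_list if item not in excluded]


print(list_difference([3, 1, 2, 1], [1]))  # [3, 2]
```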

majority_vote(scores, n_classes=2, weights=None)

Combination method to merge the scores from multiple estimators by majority vote.

Parameters:

scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.
n_classes (int, optional (default=2)) – The number of classes in the scores matrix.
weights (numpy array of shape (1, n_estimators), optional (default=None)) – Estimator weights. If specified, weighted majority voting is used.

Returns:

combined_scores – The combined scores.

Return type:

numpy array of shape (n_samples,)
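The following sketch shows weighted majority voting, assuming the score matrix holds integer class labels in 0..n_classes-1 (hard predictions). weighted_majority_vote is a hypothetical helper; how the module treats non-integer scores or breaks ties is not documented here, and the tie rule below (smallest class index wins) is a choice of this sketch.

```python
import numpy as np


def weighted_majority_vote(labels, n_classes=2, weights=None):
    """Combine integer class labels of shape (n_samples, n_estimators) by voting."""
    labels = np.asarray(labels, dtype=int)
    n_samples, n_estimators = labels.shape
    w = np.ones(n_estimators) if weights is None else np.asarray(weights, dtype=float).ravel()
    votes = np.zeros((n_samples, n_classes))
    rows = np.arange(n_samples)
    for j in range(n_estimators):
        # Estimator j adds its weight to the class it predicted for each sample.
        votes[rows, labels[:, j]] += w[j]
    # argmax returns the first maximum, so ties go to the smaller class index.
    return votes.argmax(axis=1)


labels = np.array([[0, 1, 1], [0, 0, 1]])
print(weighted_majority_vote(labels))                       # [1 0]
print(weighted_majority_vote(labels, weights=[[5, 1, 1]]))  # [0 0]
```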

maximization(scores)

Combination method to merge the scores from multiple estimators by taking the maximum.

Parameters: scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.
Returns: combined_scores – The combined scores.
Return type: numpy array of shape (n_samples,)
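In numpy terms, maximization is a row-wise max over the estimator columns. max_scores is a hypothetical name used for this sketch.

```python
import numpy as np


def max_scores(scores):
    """Largest score each sample receives across estimators."""
    # axis=1 reduces over the estimator columns, giving shape (n_samples,).
    return np.max(np.asarray(scores, dtype=float), axis=1)


print(max_scores([[0.1, 0.9], [0.4, 0.3]]))  # [0.9 0.4]
```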

median(scores)

Combination method to merge the scores from multiple estimators by taking the median.

Parameters: scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.
Returns: combined_scores – The combined scores.
Return type: numpy array of shape (n_samples,)
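The median is a row-wise median over the estimator columns; unlike the mean or max, a single extreme estimator cannot move it far. median_scores is a hypothetical name used for this sketch.

```python
import numpy as np


def median_scores(scores):
    """Median score of each sample across estimators."""
    # axis=1 reduces over the estimator columns, giving shape (n_samples,).
    return np.median(np.asarray(scores, dtype=float), axis=1)


# The outlier 0.9 in the first row does not affect the result.
print(median_scores([[0.1, 0.2, 0.9], [0.5, 0.4, 0.0]]))  # [0.2 0.4]
```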

score_to_proba(scores)

Internal function to convert a raw score matrix into probabilities.

Parameters: scores (numpy array of shape (n_samples, n_classes)) – Raw score matrix.
Returns: proba – Scaled probability matrix.
Return type: numpy array of shape (n_samples, n_classes)
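One common way to turn raw scores into probabilities is to divide each row by its sum. The sketch below assumes that scaling (the documentation only says "scaled") and assumes non-negative scores with non-zero row sums; rows_to_proba is a hypothetical name.

```python
import numpy as np


def rows_to_proba(scores):
    """Scale each row of a non-negative score matrix so that it sums to 1."""
    scores = np.asarray(scores, dtype=float)
    # keepdims=True keeps shape (n_samples, 1) so the division broadcasts across columns.
    return scores / scores.sum(axis=1, keepdims=True)


print(rows_to_proba([[1.0, 3.0], [2.0, 2.0]]))  # [[0.25 0.75]
                                                #  [0.5  0.5 ]]
```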

split_datasets(X, y, n_folds=3, shuffle_data=False, random_state=None)

Utility function to split the data for stacking. The data is split into n_folds folds of roughly equal size.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).
n_folds (int, optional (default=3)) – The number of splits of the training sample.
shuffle_data (bool, optional (default=False)) – If True, shuffle the input data.
random_state (RandomState, optional (default=None)) – A random number generator instance to define the state of the random permutations generator.

Returns:

X (numpy array of shape (n_samples, n_features)) – The input samples. If shuffle_data is True, the shuffled data is returned.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels). If shuffle_data is True, the shuffled labels are returned.
index_lists (list of list) – The list of indexes of each fold regarding the returned X and y. For instance, index_lists[0] contains the indexes of fold 0.
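A hedged sketch of the splitting behaviour described above: optionally shuffle X and y together, then assign consecutive index ranges to folds. split_folds is a hypothetical name; it uses np.array_split (fold sizes differ by at most one), and the module's exact fold boundaries may differ.

```python
import numpy as np


def split_folds(X, y, n_folds=3, shuffle_data=False, random_state=None):
    """Return (X, y, index_lists), where index_lists[k] holds the row indices of fold k."""
    X, y = np.asarray(X), np.asarray(y)
    if shuffle_data:
        rng = (random_state if isinstance(random_state, np.random.RandomState)
               else np.random.RandomState(random_state))
        # Permute X and y with the same permutation so rows stay aligned with labels.
        perm = rng.permutation(len(X))
        X, y = X[perm], y[perm]
    # array_split handles n_samples not divisible by n_folds.
    index_lists = [fold.tolist() for fold in np.array_split(np.arange(len(X)), n_folds)]
    return X, y, index_lists


X = np.arange(7).reshape(7, 1)
y = np.arange(7)
_, _, index_lists = split_folds(X, y)
print(index_lists)  # [[0, 1, 2], [3, 4], [5, 6]]
```

Because the indices refer to the returned (possibly shuffled) X and y, index_lists[k] can be used directly to select fold k from those arrays.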