autoprognosis.plugins.ensemble.combos module

Stacking (meta ensembling). See http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/ for more information.

class BaseAggregator(base_estimators, pre_fitted=False)

Bases: ABC

Abstract class for all combination classes.

Parameters:
  • base_estimators (list, length must be greater than 1) – A list of base estimators. Certain methods must be present, e.g., fit and predict.

  • pre_fitted (bool, optional (default=False)) – Whether the base estimators are already trained. If True, the fit process may be skipped.

abstract fit(X, y=None)

Fit estimator. y is optional for unsupervised methods.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,), optional (default=None)) – The ground truth of the input samples (labels).

Return type:

self

abstract fit_predict(X, y=None)

Fit estimator and predict on X. y is optional for unsupervised methods.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,), optional (default=None)) – The ground truth of the input samples (labels).

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters:

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

abstract predict(X)

Predict the class labels for the provided data.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

abstract predict_proba(X)

Return probability estimates for the test data X.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns:

p – The class probabilities of the input samples. Classes are ordered by lexicographic order.

Return type:

numpy array of shape (n_samples, n_classes)

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns:

self

Return type:

object
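The <component>__<parameter> convention described above comes from scikit-learn. As a minimal sketch, the snippet below demonstrates it on scikit-learn's own Pipeline rather than on an aggregator from this module; the step names "scale" and "clf" are chosen only for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A nested object: two named components inside a pipeline.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Nested parameters are addressed as <component>__<parameter>.
pipe.set_params(clf__C=0.5, scale__with_mean=False)

# deep=True also returns the parameters of the contained estimators.
params = pipe.get_params(deep=True)
clf_C = params["clf__C"]
with_mean = params["scale__with_mean"]
```

Because set_params returns the estimator itself, calls can be chained, for example pipe.set_params(clf__C=1.0).fit(X, y).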

class SimpleClassifierAggregator(base_estimators, method='average', threshold=0.5, weights=None, pre_fitted=False)

Bases: BaseAggregator

A collection of simple classifier combination methods.

Parameters:
  • base_estimators (list or numpy array (n_estimators,)) – A list of base classifiers.

  • method (str, optional (default='average')) – Combination method: {'average', 'maximization', 'majority vote', 'median'}. Pass classifier weights to use the weighted version.

  • threshold (float in (0, 1), optional (default=0.5)) – Cut-off value to convert scores into binary labels.

  • weights (numpy array of shape (1, n_classifiers), optional (default=None)) – Classifier weights.

  • pre_fitted (bool, optional (default=False)) – Whether the base classifiers are already trained. If True, the fit process may be skipped.
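To illustrate how method and threshold interact, here is a minimal NumPy sketch of the 'average' case for a binary problem. It does not call this module; the score matrix is made up, and the strict > comparison against the threshold is an assumption about how the cut-off is applied.

```python
import numpy as np

# Hypothetical positive-class probabilities from three classifiers
# (rows are samples, columns are classifiers).
scores = np.array([
    [0.9, 0.8, 0.7],
    [0.2, 0.4, 0.3],
    [0.6, 0.3, 0.9],
])

threshold = 0.5

# 'average' combination: mean score per sample across classifiers.
combined = scores.mean(axis=1)

# Convert the combined scores into binary labels (assumed strict cut-off).
labels = (combined > threshold).astype(int)
```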

fit(X, y)

Fit classifier.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

fit_predict(X, y)

Fit estimator and predict on X.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters:

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

predict(X)

Predict the class labels for the provided data.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

predict_proba(X)

Return probability estimates for the test data X.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns:

p – The class probabilities of the input samples. Classes are ordered by lexicographic order.

Return type:

numpy array of shape (n_samples, n_classes)

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns:

self

Return type:

object

class Stacking(base_estimators, meta_clf=None, n_folds=3, keep_original=True, use_proba=False, shuffle_data=False, random_state=None, threshold=None, pre_fitted=None)

Bases: BaseAggregator

Meta ensembling, also known as stacking. See http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/ for more information.

Parameters:
  • base_estimators (list or numpy array (n_estimators,)) – A list of base classifiers.

  • n_folds (int, optional (default=3)) – The number of splits of the training sample.

  • keep_original (bool, optional (default=True)) – If True, keep the original features for training and predicting.

  • use_proba (bool, optional (default=False)) – If True, use the probability prediction as the new features.

  • shuffle_data (bool, optional (default=False)) – If True, shuffle the input data.

  • random_state (int, RandomState or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • threshold (float in (0, 1), optional (default=None)) – Cut-off value to convert scores into binary labels.

  • pre_fitted (bool, optional (default=None)) – Whether the base classifiers are already trained. If True, the fit process may be skipped.
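The roles of n_folds and keep_original can be sketched with scikit-learn alone. The snippet below is an illustration of the general stacking idea, not this class's implementation: the dataset, the two base classifiers, and the logistic-regression meta classifier are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

base_estimators = [
    DecisionTreeClassifier(max_depth=3, random_state=0),
    LogisticRegression(max_iter=1000),
]
n_folds = 3
keep_original = True

# Out-of-fold predictions: each sample is predicted by a model
# trained on the other folds, one column per base estimator.
oof = np.column_stack([
    cross_val_predict(est, X, y, cv=n_folds) for est in base_estimators
])

# With keep_original, the new features are appended to the original ones.
meta_features = np.hstack([X, oof]) if keep_original else oof

# The meta classifier is trained on the stacked feature matrix.
meta_clf = LogisticRegression(max_iter=1000).fit(meta_features, y)
```

Setting use_proba would correspond to passing method="predict_proba" to cross_val_predict, so that probabilities rather than labels become the new features.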

fit(X, y)

Fit classifier.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

fit_predict(X, y)

Fit estimator and predict on X.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters:

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

predict(X)

Predict the class labels for the provided data.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns:

labels – Class labels for each data sample.

Return type:

numpy array of shape (n_samples,)

predict_proba(X)

Return probability estimates for the test data X.

Parameters:

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns:

p – The class probabilities of the input samples. Classes are ordered by lexicographic order.

Return type:

numpy array of shape (n_samples, n_classes)

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns:

self

Return type:

object

average(scores, estimator_weights=None)

Combination method to merge the scores from multiple estimators by taking the average.

Parameters:
  • scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.

  • estimator_weights (numpy array of shape (1, n_estimators)) – If specified, use a weighted average.

Returns:

combined_scores – The combined scores.

Return type:

numpy array of shape (n_samples, )
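A minimal NumPy sketch of plain and weighted averaging over the estimator axis follows. It is an illustration with made-up scores, and normalizing by the sum of the weights is an assumption about how the weighted version is defined.

```python
import numpy as np

# Rows are samples, columns are estimators.
scores = np.array([[0.2, 0.4, 0.9],
                   [0.6, 0.8, 0.1]])
estimator_weights = np.array([[1.0, 1.0, 2.0]])  # shape (1, n_estimators)

# Unweighted: mean across estimators for each sample.
plain = scores.mean(axis=1)

# Weighted: weights broadcast across rows, normalized by their sum.
weighted = np.sum(scores * estimator_weights, axis=1) / estimator_weights.sum()
```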

list_diff(first_list, second_list)

Utility function to calculate the list difference (first_list - second_list).

Parameters:
  • first_list (list) – First list.

  • second_list (list) – Second list.

Returns:

diff – The different elements.

Return type:

list
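One way to compute such a difference is sketched below. The helper name list_diff_sketch is hypothetical, and preserving the order of first_list is an assumption; the elements must be hashable for the set lookup to work.

```python
def list_diff_sketch(first_list, second_list):
    """Return the elements of first_list that are not in second_list, keeping order."""
    excluded = set(second_list)  # set membership tests are O(1) on average
    return [item for item in first_list if item not in excluded]


diff = list_diff_sketch([1, 2, 3, 4], [2, 4, 5])
```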

majority_vote(scores, n_classes=2, weights=None)

Combination method to merge the scores from multiple estimators by majority vote.

Parameters:
  • scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.

  • n_classes (int, optional (default=2)) – The number of classes in the scores matrix.

  • weights (numpy array of shape (1, n_estimators)) – If specified, use a weighted majority vote.

Returns:

combined_scores – The combined scores.

Return type:

numpy array of shape (n_samples, )
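The following NumPy sketch shows one way a (weighted) majority vote can be tallied. It assumes the score matrix holds integer class labels 0..n_classes-1 and that ties go to the lowest class index; the helper name vote is hypothetical and none of this is confirmed to match the function's internals.

```python
import numpy as np

# Hypothetical predicted class labels: rows are samples, columns are estimators.
scores = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 1, 0]])
n_classes = 2
weights = np.array([[1.0, 1.0, 3.0]])  # shape (1, n_estimators)


def vote(scores, n_classes, weights=None):
    # Every estimator counts equally unless weights are given.
    w = np.ones(scores.shape[1]) if weights is None else np.ravel(weights)
    # Sum the weight behind each class, giving shape (n_samples, n_classes).
    tallies = np.stack(
        [((scores == c) * w).sum(axis=1) for c in range(n_classes)], axis=1
    )
    # Pick the class with the largest tally for each sample.
    return tallies.argmax(axis=1)


unweighted = vote(scores, n_classes)
weighted = vote(scores, n_classes, weights)
```

In the last sample two estimators vote for class 1, but the third estimator's weight of 3 outweighs them, so the weighted result flips to class 0.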

maximization(scores)

Combination method to merge the scores from multiple estimators by taking the maximum.

Parameters:

scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.

Returns:

combined_scores – The combined scores.

Return type:

numpy array of shape (n_samples, )

median(scores)

Combination method to merge the scores from multiple estimators by taking the median.

Parameters:

scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.

Returns:

combined_scores – The combined scores.

Return type:

numpy array of shape (n_samples, )
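As a brief illustration of the maximization and median combinations, both reduce the estimator axis of the score matrix. This NumPy sketch uses made-up scores and is not the module's code.

```python
import numpy as np

# Rows are samples, columns are estimators.
scores = np.array([[0.2, 0.4, 0.9],
                   [0.6, 0.8, 0.1]])

# Maximization: highest score across estimators for each sample.
max_scores = scores.max(axis=1)

# Median: middle score across estimators for each sample.
median_scores = np.median(scores, axis=1)
```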

score_to_proba(scores)

Internal function to convert a raw score matrix into probabilities.

Parameters:

scores (numpy array of shape (n_samples, n_classes)) – Raw score matrix.

Returns:

proba – Scaled probability matrix.

Return type:

numpy array of shape (n_samples, n_classes)
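One plausible scaling, shown here only as an assumption about what this helper does, divides each row by its sum so that the class scores of every sample add up to 1. It presumes non-negative scores and non-zero row sums.

```python
import numpy as np

# Hypothetical raw scores: rows are samples, columns are classes.
scores = np.array([[1.0, 3.0],
                   [2.0, 2.0]])

# Row-wise normalization; keepdims keeps shape (n_samples, 1) for broadcasting.
proba = scores / scores.sum(axis=1, keepdims=True)
```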

split_datasets(X, y, n_folds=3, shuffle_data=False, random_state=None)

Utility function to split the data for stacking. The data is split into n_folds folds of roughly equal size.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).

  • n_folds (int, optional (default=3)) – The number of splits of the training sample.

  • shuffle_data (bool, optional (default=False)) – If True, shuffle the input data.

  • random_state (RandomState, optional (default=None)) – A random number generator instance to define the state of the random permutations generator.

Returns:

  • X (numpy array of shape (n_samples, n_features)) – The input samples. If shuffle_data, return the shuffled data.

  • y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels). If shuffle_data, return the shuffled data.

  • index_lists (list of list) – The list of indexes of each fold regarding the returned X and y. For instance, index_lists[0] contains the indexes of fold 0.
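The return values can be pictured with the NumPy sketch below. The helper name split_sketch is hypothetical, it accepts an int seed or None for random_state (an assumption made for the example), and using np.array_split to form the folds is an illustrative choice rather than the function's confirmed behavior.

```python
import numpy as np


def split_sketch(X, y, n_folds=3, shuffle_data=False, random_state=None):
    n_samples = X.shape[0]
    if shuffle_data:
        # Apply the same permutation to X and y so rows stay aligned.
        rng = np.random.RandomState(random_state)
        order = rng.permutation(n_samples)
        X, y = X[order], y[order]
    # Consecutive index blocks of roughly equal size, one list per fold.
    index_lists = [
        fold.tolist() for fold in np.array_split(np.arange(n_samples), n_folds)
    ]
    return X, y, index_lists


X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2
X_out, y_out, index_lists = split_sketch(X, y, n_folds=3)
```

The indexes refer to the returned X and y, so X_out[index_lists[0]] selects the rows of fold 0 even when the data has been shuffled.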