autoprognosis.plugins.ensemble.combos module
Stacking (meta ensembling). See http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/ for more information.
- class BaseAggregator(base_estimators, pre_fitted=False)
Bases:
ABC
Abstract class for all combination classes.
- Parameters:
base_estimators (list, length must be greater than 1) – A list of base estimators. Certain methods must be present, e.g., fit and predict.
pre_fitted (bool, optional (default=False)) – Whether the base estimators are already trained. If True, the fit process may be skipped.
- abstract fit(X, y=None)
Fit estimator. y is optional for unsupervised methods.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,), optional (default=None)) – The ground truth of the input samples (labels).
- Return type:
self
- abstract fit_predict(X, y=None)
Fit estimator and predict on X. y is optional for unsupervised methods.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,), optional (default=None)) – The ground truth of the input samples (labels).
- Returns:
labels – Class labels for each data sample.
- Return type:
numpy array of shape (n_samples,)
- get_params(deep=True)
Get parameters for this estimator.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
- Parameters:
deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
mapping of string to any
- abstract predict(X)
Predict the class labels for the provided data.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
- Returns:
labels – Class labels for each data sample.
- Return type:
numpy array of shape (n_samples,)
- abstract predict_proba(X)
Return probability estimates for the test data X.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
- Returns:
p – The class probabilities of the input samples. Classes are ordered by lexicographic order.
- Return type:
numpy array of shape (n_samples, n_classes)
- set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
- Returns:
self
- Return type:
object
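The <component>__<parameter> naming used by set_params can be illustrated with a small, pure-Python sketch. The helper name group_nested_params and the example keys (threshold, meta_clf__C, meta_clf__penalty) are assumptions made for illustration; this is not the module's own implementation.

```python
def group_nested_params(params):
    """Group keys of the form '<component>__<parameter>' by component.

    Hypothetical helper for illustration only; not part of the module.
    """
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only, so a deeper key such as
        # 'a__b__c' is handed to component 'a' as 'b__c'.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = group_nested_params(
    {"threshold": 0.4, "meta_clf__C": 0.1, "meta_clf__penalty": "l2"}
)
print(own)     # parameters that belong to the estimator itself
print(nested)  # parameters routed to each named component
```

Keys without a double underscore stay with the estimator itself; the rest are routed to the named component, which is how a single call can update a nested object.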
- class SimpleClassifierAggregator(base_estimators, method='average', threshold=0.5, weights=None, pre_fitted=False)
Bases:
BaseAggregator
A collection of simple classifier combination methods.
- Parameters:
base_estimators (list or numpy array (n_estimators,)) – A list of base classifiers.
method (str, optional (default='average')) – Combination method, one of {‘average’, ‘maximization’, ‘majority vote’, ‘median’}. Pass classifier weights to use the weighted version.
threshold (float in (0, 1), optional (default=0.5)) – Cut-off value to convert scores into binary labels.
weights (numpy array of shape (1, n_classifiers), optional (default=None)) – Classifier weights.
pre_fitted (bool, optional (default=False)) – Whether the base classifiers are already trained. If True, the fit process may be skipped.
- fit(X, y)
Fit classifier.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).
- fit_predict(X, y)
Fit estimator and predict on X.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).
- Returns:
labels – Class labels for each data sample.
- Return type:
numpy array of shape (n_samples,)
- get_params(deep=True)
Get parameters for this estimator.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
- Parameters:
deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
mapping of string to any
- predict(X)
Predict the class labels for the provided data.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
- Returns:
labels – Class labels for each data sample.
- Return type:
numpy array of shape (n_samples,)
- predict_proba(X)
Return probability estimates for the test data X.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
- Returns:
p – The class probabilities of the input samples. Classes are ordered by lexicographic order.
- Return type:
numpy array of shape (n_samples, n_classes)
- set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
- Returns:
self
- Return type:
object
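The relationship between predict_proba and predict for a simple aggregator can be sketched with NumPy alone. The probability values below are made up, and the array layout (n_estimators, n_samples, n_classes) and the strict ">" comparison against threshold are assumptions for illustration, not a description of the class internals.

```python
import numpy as np

# Hypothetical predict_proba outputs of three already-fitted binary classifiers
# on three samples: shape (n_estimators, n_samples, n_classes).
probas = np.array([
    [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]],
    [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7]],
    [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]],
])
threshold = 0.5

# 'average' combination: mean over the estimator axis keeps one row per sample
# and one column per class.
avg_proba = probas.mean(axis=0)

# Binary labels: compare the positive-class column with the cut-off.
labels = (avg_proba[:, 1] > threshold).astype(int)

print(avg_proba)
print(labels)
```

Because every input row sums to 1, the averaged rows do as well, so the result can still be read as class probabilities.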
- class Stacking(base_estimators, meta_clf=None, n_folds=3, keep_original=True, use_proba=False, shuffle_data=False, random_state=None, threshold=None, pre_fitted=None)
Bases:
BaseAggregator
Meta ensembling, also known as stacking. See http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/ for more information.
- Parameters:
base_estimators (list or numpy array (n_estimators,)) – A list of base classifiers.
meta_clf (object, optional (default=None)) – The meta classifier trained on the predictions of the base classifiers.
n_folds (int, optional (default=3)) – The number of splits of the training sample.
keep_original (bool, optional (default=True)) – If True, keep the original features for training and predicting.
use_proba (bool, optional (default=False)) – If True, use the probability prediction as the new features.
shuffle_data (bool, optional (default=False)) – If True, shuffle the input data.
random_state (int, RandomState or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
threshold (float in (0, 1), optional (default=None)) – Cut-off value to convert scores into binary labels.
pre_fitted (bool, optional (default=None)) – Whether the base classifiers are already trained. If True, the fit process may be skipped.
- fit(X, y)
Fit classifier.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).
- fit_predict(X, y)
Fit estimator and predict on X.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).
- Returns:
labels – Class labels for each data sample.
- Return type:
numpy array of shape (n_samples,)
- get_params(deep=True)
Get parameters for this estimator.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
- Parameters:
deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
mapping of string to any
- predict(X)
Predict the class labels for the provided data.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
- Returns:
labels – Class labels for each data sample.
- Return type:
numpy array of shape (n_samples,)
- predict_proba(X)
Return probability estimates for the test data X.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
- Returns:
p – The class probabilities of the input samples. Classes are ordered by lexicographic order.
- Return type:
numpy array of shape (n_samples, n_classes)
- set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
- Returns:
self
- Return type:
object
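The core idea of stacking, building meta-features from out-of-fold predictions, can be sketched with NumPy only. Everything named here (CentroidClassifier, out_of_fold_features, the synthetic data) is a hypothetical stand-in chosen for illustration; the actual class may differ in how it splits, fits, and assembles features.

```python
import numpy as np


class CentroidClassifier:
    """Toy base learner (hypothetical): nearest class centroid on selected columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == c][:, self.columns].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # Distance from every sample to every centroid: (n_samples, n_classes).
        diffs = X[:, self.columns][:, None, :] - self.centroids_[None, :, :]
        dist = np.linalg.norm(diffs, axis=2)
        return self.classes_[dist.argmin(axis=1)]


def out_of_fold_features(base_estimators, X, y, n_folds=3, keep_original=True):
    """Each base estimator predicts only on the fold it was not trained on."""
    n_samples = X.shape[0]
    folds = np.array_split(np.arange(n_samples), n_folds)
    meta = np.zeros((n_samples, len(base_estimators)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n_samples), test_idx)
        for j, est in enumerate(base_estimators):
            est.fit(X[train_idx], y[train_idx])
            meta[test_idx, j] = est.predict(X[test_idx])
    # With keep_original=True the raw features are placed before the predictions.
    return np.hstack([X, meta]) if keep_original else meta


# Synthetic two-class data, shuffled so every fold contains both classes.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

bases = [CentroidClassifier([0]), CentroidClassifier([1])]
meta_X = out_of_fold_features(bases, X, y, n_folds=3, keep_original=True)
print(meta_X.shape)
```

A meta classifier would then be trained on meta_X and y. Predicting on held-out folds keeps the meta-features from leaking labels the base learners have already seen.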
- average(scores, estimator_weights=None)
Combination method to merge the scores from multiple estimators by taking the average.
- Parameters:
scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.
estimator_weights (numpy array of shape (1, n_estimators)) – If specified, a weighted average is used.
- Returns:
combined_scores – The combined scores.
- Return type:
numpy array of shape (n_samples, )
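As a rough illustration of the plain and weighted cases, the same computation can be written directly in NumPy. The score values are invented, and normalising by the sum of the weights (as np.average does) is an assumption about the weighted variant rather than a statement of how this function is implemented.

```python
import numpy as np

scores = np.array([
    [0.2, 0.4, 0.9],
    [0.5, 0.5, 0.8],
])  # shape (n_samples, n_estimators)
estimator_weights = np.array([[1.0, 1.0, 2.0]])  # shape (1, n_estimators)

# Unweighted: mean over the estimator axis.
plain = scores.mean(axis=1)

# Weighted: np.average expects 1-D weights along the chosen axis, so the
# (1, n_estimators) row is flattened first. The result is divided by the
# sum of the weights.
weighted = np.average(scores, axis=1, weights=estimator_weights.ravel())

print(plain)
print(weighted)
```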
- list_diff(first_list, second_list)
Utility function to calculate the list difference (first_list - second_list).
- Parameters:
first_list (list) – First list.
second_list (list) – Second list.
- Returns:
diff – The elements of first_list that are not in second_list.
- Return type:
list
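A minimal sketch of an order-preserving list difference is shown here. The name list_diff_sketch is hypothetical, and the use of a set (which requires hashable items) is an assumption about one reasonable way to do this, not the module's code.

```python
def list_diff_sketch(first_list, second_list):
    """Return the items of first_list that do not appear in second_list."""
    second_set = set(second_list)  # set membership tests are O(1) on average
    return [item for item in first_list if item not in second_set]


all_indexes = [0, 1, 2, 3, 4, 5]
fold_indexes = [2, 3]
train_indexes = list_diff_sketch(all_indexes, fold_indexes)
print(train_indexes)
```

In a stacking context this kind of helper is handy for deriving the training indexes of a fold from the full index list.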
- majority_vote(scores, n_classes=2, weights=None)
Combination method to merge the scores from multiple estimators by majority vote.
- Parameters:
scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.
n_classes (int, optional (default=2)) – The number of classes in the scores matrix.
weights (numpy array of shape (1, n_estimators)) – If specified, a weighted majority vote is used.
- Returns:
combined_scores – The combined scores.
- Return type:
numpy array of shape (n_samples, )
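Majority voting over predicted labels can be sketched as follows. The function name majority_vote_sketch is hypothetical, and the sketch assumes integer labels in the range 0 to n_classes - 1 with ties going to the lowest class index; the module may handle labels and ties differently.

```python
import numpy as np


def majority_vote_sketch(scores, n_classes=2, weights=None):
    """Return, per sample, the class with the largest (weighted) vote count."""
    scores = np.asarray(scores, dtype=int)
    n_samples, n_estimators = scores.shape
    if weights is None:
        weights = np.ones((1, n_estimators))
    w = np.asarray(weights, dtype=float).ravel()
    # votes[i, c] is the total weight of estimators that predicted class c
    # for sample i.
    votes = np.zeros((n_samples, n_classes))
    for c in range(n_classes):
        votes[:, c] = ((scores == c) * w).sum(axis=1)
    # argmax resolves ties in favour of the lowest class index.
    return votes.argmax(axis=1)


scores = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
])
unweighted = majority_vote_sketch(scores)
weighted = majority_vote_sketch(scores, weights=np.array([[3.0, 1.0, 1.0]]))
print(unweighted, weighted)
```

With a large weight on the first estimator, its vote outweighs the other two combined, which flips the first two samples.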
- maximization(scores)
Combination method to merge the scores from multiple estimators by taking the maximum.
- Parameters:
scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.
- Returns:
combined_scores – The combined scores.
- Return type:
numpy array of shape (n_samples, )
- median(scores)
Combination method to merge the scores from multiple estimators by taking the median.
- Parameters:
scores (numpy array of shape (n_samples, n_estimators)) – Score matrix from multiple estimators on the same samples.
- Returns:
combined_scores – The combined scores.
- Return type:
numpy array of shape (n_samples, )
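The difference between maximization and median is easiest to see when one estimator disagrees with the rest. The scores below are invented, and the NumPy reductions are a sketch of what each combination computes, not the module's own code.

```python
import numpy as np

scores = np.array([
    [0.1, 0.2, 0.9],  # the third estimator disagrees strongly
    [0.7, 0.8, 0.6],
])  # shape (n_samples, n_estimators)

# Maximization follows the single highest score per sample.
max_scores = scores.max(axis=1)

# The median ignores a lone extreme value.
median_scores = np.median(scores, axis=1)

print(max_scores)
print(median_scores)
```

For the first sample, maximization follows the outlying estimator while the median stays with the other two.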
- score_to_proba(scores)
Internal function to scale a raw score matrix into probabilities.
- Parameters:
scores (numpy array of shape (n_samples, n_classes)) – Raw score matrix.
- Returns:
proba – Scaled probability matrix.
- Return type:
numpy array of shape (n_samples, n_classes)
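One plausible way to scale raw scores into probabilities is to divide each row by its sum. This is only a sketch under that assumption: the name score_to_proba_sketch is hypothetical, and it presumes non-negative scores with a non-zero sum in every row.

```python
import numpy as np


def score_to_proba_sketch(scores):
    """Scale each row of a non-negative score matrix so that it sums to 1."""
    scores = np.asarray(scores, dtype=float)
    # keepdims=True keeps shape (n_samples, 1) so the division broadcasts per row.
    row_sums = scores.sum(axis=1, keepdims=True)
    return scores / row_sums


raw = np.array([
    [2.0, 1.0, 1.0],
    [0.0, 3.0, 1.0],
])
proba = score_to_proba_sketch(raw)
print(proba)
```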
- split_datasets(X, y, n_folds=3, shuffle_data=False, random_state=None)
Utility function to split the data for stacking. The data is split into n_folds folds of roughly equal size.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels).
n_folds (int, optional (default=3)) – The number of splits of the training sample.
shuffle_data (bool, optional (default=False)) – If True, shuffle the input data.
random_state (RandomState, optional (default=None)) – A random number generator instance to define the state of the random permutations generator.
- Returns:
X (numpy array of shape (n_samples, n_features)) – The input samples. If shuffle_data, return the shuffled data.
y (numpy array of shape (n_samples,)) – The ground truth of the input samples (labels). If shuffle_data, return the shuffled data.
index_lists (list of list) – The list of indexes of each fold regarding the returned X and y. For instance, index_lists[0] contains the indexes of fold 0.
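The return contract (optionally shuffled X and y plus one index list per fold) can be sketched with NumPy. The name split_datasets_sketch is hypothetical, and np.array_split places any remainder in the first folds, which is an assumption; the module may draw the fold boundaries differently.

```python
import numpy as np


def split_datasets_sketch(X, y, n_folds=3, shuffle_data=False, random_state=None):
    """Return X, y (shuffled together if requested) and per-fold index lists."""
    X, y = np.asarray(X), np.asarray(y)
    if shuffle_data:
        rng = random_state if random_state is not None else np.random.RandomState()
        perm = rng.permutation(len(y))
        # The same permutation is applied to X and y so rows stay aligned.
        X, y = X[perm], y[perm]
    index_lists = [
        fold.tolist() for fold in np.array_split(np.arange(len(y)), n_folds)
    ]
    return X, y, index_lists


X = np.arange(20).reshape(10, 2)
y = X[:, 0].copy()
X_s, y_s, index_lists = split_datasets_sketch(
    X, y, n_folds=3, shuffle_data=True, random_state=np.random.RandomState(0)
)
print([len(fold) for fold in index_lists])
```

The index lists refer to positions in the returned arrays, so index_lists[0] selects fold 0 from the shuffled X_s and y_s rather than from the inputs.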