auswahl.BiPLS
- class auswahl.BiPLS(n_intervals_to_select: int = 1, interval_width: Optional[Union[int, float]] = None, pls: Optional[PLSRegression] = None, n_cv_folds: int = 10, model_hyperparams: Optional[Union[Dict, List[Dict]]] = None, n_jobs: int = 1)
Feature Selection with Backward interval Partial Least Squares (BiPLS).
The method partitions the feature space into intervals of equal width and sequentially removes the worst interval until only the requested number of intervals remains. The last interval is smaller if the total number of features is not a whole multiple of the interval width.
The BiPLS method has been described in Xiaobo et al. [1].
Read more in the User Guide.
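The partitioning and elimination steps described above can be sketched in plain Python. This is a minimal illustration, not the auswahl implementation: the helper names `split_into_intervals` and `backward_interval_selection` are hypothetical, and the toy `missing_signal` scorer stands in for the cross-validated PLS error that the library would compute. The sketch assumes that the "worst" interval is the one whose removal leaves the lowest error.

```python
from typing import Callable, List, Tuple


def split_into_intervals(n_features: int, interval_width: int) -> List[range]:
    """Partition feature indices into consecutive intervals.

    The last interval is shorter when n_features is not a whole
    multiple of interval_width.
    """
    return [range(start, min(start + interval_width, n_features))
            for start in range(0, n_features, interval_width)]


def backward_interval_selection(
    intervals: List[range],
    n_intervals_to_select: int,
    score: Callable[[List[range]], float],
) -> Tuple[List[range], List[range]]:
    """Drop one interval per round until n_intervals_to_select remain.

    `score` is a hypothetical callable that returns an error (lower is
    better) for a list of retained intervals.
    """
    remaining = list(intervals)
    removed: List[range] = []
    while len(remaining) > n_intervals_to_select:
        # Error obtained when each interval in turn is left out.
        errors = [score(remaining[:i] + remaining[i + 1:])
                  for i in range(len(remaining))]
        # Remove the interval whose absence hurts the least.
        worst = min(range(len(remaining)), key=errors.__getitem__)
        removed.append(remaining.pop(worst))
    return remaining, removed


# Toy scorer (assumption): error = number of informative features lost.
INFORMATIVE = {0, 1, 4, 5}


def missing_signal(kept: List[range]) -> float:
    covered = {i for interval in kept for i in interval}
    return float(len(INFORMATIVE - covered))


intervals = split_into_intervals(n_features=10, interval_width=2)
kept, removed = backward_interval_selection(intervals, 2, missing_signal)
# Boolean mask over all features, analogous in spirit to support_.
support = [any(i in interval for interval in kept) for i in range(10)]
```

With ten features and a width of 2, the two intervals covering features 0, 1, 4 and 5 survive, which mirrors the setup of the example in the Examples section.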
- Parameters
- n_intervals_to_select: int, default=1
Number of intervals to select.
- interval_width: int or float, default=None
Number of features that form an interval.
- pls: PLSRegression, default=None
Estimator instance of the PLSRegression class. Use this to adjust the hyperparameters of the PLS method.
- n_cv_folds: int, default=10
Number of cross-validation folds used to evaluate intervals.
- n_jobs: int, default=1
Number of parallel processes that fit PLS models on the different intervals.
- Attributes
- support_: ndarray of shape (n_features,)
Mask of selected features.
- rank_: ndarray of shape (n_features,)
Relative rank of selection. The interval with the lowest relative rank has been removed first. The finally selected intervals have a relative rank of 1.
References
- [1] Zou Xiaobo, Zhao Jiewen, Li Yanxiao, ‘Selection of the efficient wavelength regions in FT-NIR spectroscopy for determination of SSC of ‘Fuji’ apple based on BiPLS and FiPLS models’, Vibrational Spectroscopy, vol. 44, no. 2, 220–227, 2007.
Examples
>>> import numpy as np
>>> from auswahl import BiPLS
>>> np.random.seed(1337)
>>> X = np.random.randn(100, 10)
>>> y = 5 * X[:, 0] - 4 * X[:, 1] - 2 * X[:, 4] + 3 * X[:, 5]  # y depends on two intervals
>>> selector = BiPLS(n_intervals_to_select=2, interval_width=2)
>>> selector.fit(X, y).get_support()
array([ True, True, False, False, True, True, False, False, False, False])
- __init__(n_intervals_to_select: int = 1, interval_width: Optional[Union[int, float]] = None, pls: Optional[PLSRegression] = None, n_cv_folds: int = 10, model_hyperparams: Optional[Union[Dict, List[Dict]]] = None, n_jobs: int = 1)
- evaluate(X, y, model, do_cv=True, *args)
Conduct a cross-validation and hyperparameter optimization of the underlying estimator model.
- Parameters
- X: array-like, shape (n_samples, n_features)
Spectral data to be fitted
- y: array-like, shape (n_samples,)
Regression targets
- model: BaseEstimator
Regression model
- do_cv: bool, default=True
If True, the model is fitted to the data and a cross validation score is provided
- *args: arbitrary payload
Arbitrary payload returned with the evaluation result. It can be used, for instance, to identify threads when multiple models are evaluated in parallel.
- Returns
- tuple: float, BaseEstimator
Cross-validation score if requested (otherwise None) and the fitted estimator.
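The contract of evaluate (optional cross-validation score, fitted model, payload passed through) can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `evaluate_sketch`, `k_fold_indices` and `MeanRegressor` are hypothetical names, the score is a mean squared error, the payload is appended to the returned tuple, and the hyperparameter optimization step is left out.

```python
import copy
from statistics import mean
from typing import List, Sequence


class MeanRegressor:
    """Hypothetical stand-in model that always predicts the training mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "MeanRegressor":
        self.mean_ = mean(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.mean_ for _ in X]


def k_fold_indices(n_samples: int, n_folds: int) -> List[List[int]]:
    """Split sample indices into n_folds contiguous test folds of near-equal size."""
    folds, start = [], 0
    for fold in range(n_folds):
        size = n_samples // n_folds + (1 if fold < n_samples % n_folds else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def evaluate_sketch(X, y, model, do_cv=True, *args, n_folds=5):
    """Return (cv_score_or_None, fitted_model, *payload)."""
    score = None
    if do_cv:
        fold_errors = []
        for test_idx in k_fold_indices(len(y), n_folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            # Fit a copy so the folds do not share state.
            fold_model = copy.deepcopy(model)
            fold_model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            predictions = fold_model.predict([X[i] for i in test_idx])
            fold_errors.append(
                mean((p - y[i]) ** 2 for p, i in zip(predictions, test_idx))
            )
        score = mean(fold_errors)
    model.fit(X, y)  # final fit on all samples
    return (score, model, *args)


X = [[float(i)] for i in range(6)]
y = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
score, fitted, tag = evaluate_sketch(X, y, MeanRegressor(), True, "job-3", n_folds=3)
```

Passing a tag such as "job-3" as payload lets a caller match results to submitted jobs when evaluations finish out of order.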
- fit(X, y, mask=None)
Run the feature selection process.
- Parameters
- X: array-like of shape (n_samples, n_features)
The input samples.
- y: array-like of shape (n_samples,)
The target values.
- mask: array-like of shape (n_features,), default=None
Mask marking which features to consider during feature selection. Features whose mask value is 0 are not taken into account.
- Returns
- self: SpectralSelector
Returns the instance itself.
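The effect of the mask argument can be pictured as a filter on the candidate feature indices. The helper below is a hypothetical sketch of that reading (mask value 0 means excluded, no mask means all features are candidates); it does not reproduce how auswahl handles masked features internally.

```python
from typing import List, Optional, Sequence


def candidate_features(n_features: int,
                       mask: Optional[Sequence[int]] = None) -> List[int]:
    """Return the indices of features that may take part in the selection."""
    if mask is None:
        return list(range(n_features))
    if len(mask) != n_features:
        raise ValueError("mask must have one entry per feature")
    # Features with a mask value of 0 are left out.
    return [i for i, value in enumerate(mask) if value != 0]


all_features = candidate_features(5)
without_edges = candidate_features(5, mask=[0, 1, 1, 1, 0])
```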
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns
- X_new: ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_best_estimator() -> BaseEstimator
Retrieve the best estimator model fitted on the selected features.
- Returns
- sklearn.base.BaseEstimator
Best model fitted on the selected features.
- get_feature_names_out(input_features=None)
Mask feature names according to selected features.
- Parameters
- input_features: array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns
- feature_names_out: ndarray of str objects
Transformed feature names.
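The naming rule above can be sketched as follows. `feature_names_out` is a hypothetical helper that only covers the two cases described here (generated names and caller-supplied names); it omits the consistency check against feature_names_in_.

```python
from typing import List, Optional, Sequence


def feature_names_out(support: Sequence[bool],
                      input_features: Optional[Sequence[str]] = None) -> List[str]:
    """Keep only the names of the selected features."""
    if input_features is None:
        # Generated default names: x0, x1, ..., x(n_features - 1).
        input_features = [f"x{i}" for i in range(len(support))]
    return [name for name, keep in zip(input_features, support) if keep]


support = [True, True, False, False, True, True, False, False, False, False]
default_names = feature_names_out(support)
wavelengths = [f"{900 + 10 * i}nm" for i in range(10)]  # made-up labels
named = feature_names_out(support, wavelengths)
```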
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- get_support(indices=False)
Get a mask, or integer index, of the features selected.
- Parameters
- indices: bool, default=False
If True, the return value will be an array of integers, rather than a boolean mask.
- Returns
- support: array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
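The two return forms carry the same information. The plain-Python sketch below uses a hypothetical helper, `support_as_indices`, to show the conversion for the mask from the Examples section; the library returns arrays rather than lists.

```python
from typing import List, Sequence


def support_as_indices(mask: Sequence[bool]) -> List[int]:
    """Convert a boolean support mask into the indices of selected features."""
    return [i for i, keep in enumerate(mask) if keep]


mask = [True, True, False, False, True, True, False, False, False, False]
indices = support_as_indices(mask)
```

The mask has one entry per input feature, while the index form has one entry per selected feature.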
- inverse_transform(X)
Reverse the transformation operation.
- Parameters
- X: array of shape [n_samples, n_selected_features]
The input samples.
- Returns
- X_r: array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by transform().
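A round trip through selection and its inverse can be sketched with nested lists. The functions `select_columns` and `restore_columns` are hypothetical stand-ins for transform() and inverse_transform(), written only to show where the zero columns end up.

```python
from typing import List, Sequence


def select_columns(X: Sequence[Sequence[float]],
                   support: Sequence[bool]) -> List[List[float]]:
    """Keep only the columns whose support entry is True."""
    return [[value for value, keep in zip(row, support) if keep] for row in X]


def restore_columns(X_reduced: Sequence[Sequence[float]],
                    support: Sequence[bool]) -> List[List[float]]:
    """Re-insert a zero column for every feature that was dropped."""
    restored = []
    for row in X_reduced:
        values = iter(row)
        restored.append([next(values) if keep else 0 for keep in support])
    return restored


support = [True, False, True]
X = [[1, 2, 3], [4, 5, 6]]
X_reduced = select_columns(X, support)
X_restored = restore_columns(X_reduced, support)
```

The dropped values are not recovered: the middle column comes back as zeros, so the round trip is lossy for unselected features.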
- reseed(seed: Union[int, RandomState])
Random state updating interface for benchmarking. Selector methods with more complex internal structure (such as methods wrapping other methods) are required to override this function accordingly.
- rethread(n_jobs: int)
n_jobs updating interface for benchmarking. Selector methods with more complex internal structure (such as methods wrapping other methods) are required to override this function accordingly.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
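The <component>__<parameter> convention can be illustrated with toy classes. `ParamsMixin`, `ToyPLS` and `ToySelector` are hypothetical and only mimic the key-splitting behaviour; they are not the scikit-learn or auswahl classes.

```python
class ParamsMixin:
    """Hypothetical base class mimicking nested parameter updates."""

    def set_params(self, **params):
        for key, value in params.items():
            # "pls__n_components" -> component "pls", parameter "n_components".
            name, separator, sub_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r}")
            if separator:
                # Delegate to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


class ToyPLS(ParamsMixin):
    def __init__(self, n_components=2):
        self.n_components = n_components


class ToySelector(ParamsMixin):
    def __init__(self, pls=None, n_cv_folds=10):
        self.pls = pls if pls is not None else ToyPLS()
        self.n_cv_folds = n_cv_folds


selector = ToySelector().set_params(n_cv_folds=5, pls__n_components=4)
```

Because set_params returns the instance, the call can be chained directly after construction, as in the last line.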
- transform(X)
Reduce X to the selected features.
- Parameters
- X: array of shape [n_samples, n_features]
The input samples.
- Returns
- X_r: array of shape [n_samples, n_selected_features]
The input samples with only the selected features.