auswahl.benchmarking.benchmark¶
- auswahl.benchmarking.benchmark(data: ~typing.List[~typing.Tuple[~numpy.array, ~numpy.array, str, float]], features: ~typing.List[~typing.Union[int, ~typing.Tuple[int, int]]], methods: ~typing.List[~typing.Union[~auswahl._base.SpectralSelector, ~typing.Tuple[~auswahl._base.SpectralSelector, str]]], n_runs: int = 10, reg_metrics: ~typing.List[~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float]] = <function mean_squared_error>, random_state: ~typing.Optional[~typing.Union[int, ~numpy.random.mtrand.RandomState]] = None, stab_metrics: ~typing.Optional[~typing.List[~auswahl.benchmarking.util.metrics.StabilityScore]] = None, n_jobs: int = 1, error_log_file: str = './error_log.txt', verbose: bool = True)[source]¶
Function performing benchmarking of Interval- and PointSelector feature selectors across different datasets and different parameterizations of the selectors.
- Parameters
- data: List of tuples (np.array, np.array, str, float)
list of tuples describing datasets (x, y, dataset_name, train_size)
- features: List of integers or tuple of integers (int, int)
Descriptor of the number of features to be selected. If an integer, the integer describes the number of features to be selected. If a tuple, the tuple is interpreted as (#intervals to select, interval width). If an IntervalSelector is included in the benchmarking, the features have to be described as tuples.
- n_runs: int, default=10
Number of runs per method, dataset and number of features to be selected. Used to elucidate method performance and selection stability.
- reg_metrics: List of Callable[[np.ndarray, np.ndarray], float], default=sklearn.metrics.mean_square_error
List of regression metrics to be evaluated and made available after the benchmarking
- stab_metrics: List of Callable[[DataHandler], float], default=None
List of stability metrics to be evaluated and made available after the benchmarking
- methods: List of SpectralSelector or tuples (SpectralSelector, str)
List of instances of classes subtyping
SpectralSelector. If the class names of the instances’ classes are not unique a tuple has to be passed specifying the name (instance, name)- random_state: int or numpy.random.RandomState, default=None
RandomState for reproducibility of the benchmarking results
- n_jobs: int, default=1
Number of jobs to be used during benchmarking. It is recommended to provide jobs to the benchmarking instead of individual selectors
- error_log_file: str, default=”./error_log.txt”
location and name of the file, in which errors are to be logged
- verbose: bool, default=True
If True, basic information of the state of benchmarking are plotted
- Returns
- benchmarking results:class:~auswahl.benchmarking.DataHandler
DataHandlerobject containing the results of the benchmarking. Data regarding regression, stability, selection and run time measurement.