auswahl.benchmarking.benchmark

auswahl.benchmarking.benchmark(data: typing.List[typing.Tuple[numpy.array, numpy.array, str, float]], features: typing.List[typing.Union[int, typing.Tuple[int, int]]], methods: typing.List[typing.Union[auswahl._base.SpectralSelector, typing.Tuple[auswahl._base.SpectralSelector, str]]], n_runs: int = 10, reg_metrics: typing.List[typing.Callable[[numpy.ndarray, numpy.ndarray], float]] = <function mean_squared_error>, random_state: typing.Optional[typing.Union[int, numpy.random.mtrand.RandomState]] = None, stab_metrics: typing.Optional[typing.List[auswahl.benchmarking.util.metrics.StabilityScore]] = None, n_jobs: int = 1, error_log_file: str = './error_log.txt', verbose: bool = True)

Benchmarks IntervalSelector and PointSelector feature selectors across different datasets and different parameterizations of the selectors.

Parameters
data: List of tuples (np.array, np.array, str, float)

List of tuples describing the datasets, each of the form (x, y, dataset_name, train_size).

features: List of integers or tuple of integers (int, int)

Descriptor of the number of features to be selected. An integer gives the number of features to be selected directly. A tuple is interpreted as (number of intervals to select, interval width). If an IntervalSelector is included in the benchmarking, the features have to be described as tuples.

n_runs: int, default=10

Number of runs per method, dataset and number of features to be selected. The repeated runs are used to assess method performance and selection stability.

reg_metrics: List of Callable[[np.ndarray, np.ndarray], float], default=sklearn.metrics.mean_squared_error

List of regression metrics to be evaluated and made available after the benchmarking.

stab_metrics: List of Callable[[DataHandler], float], default=None

List of stability metrics to be evaluated and made available after the benchmarking

methods: List of SpectralSelector or tuples (SpectralSelector, str)

List of instances of classes subtyping SpectralSelector. If the class names of the instances are not unique, a tuple (instance, name) has to be passed to specify a name.

random_state: int or numpy.random.RandomState, default=None

RandomState for reproducibility of the benchmarking results

n_jobs: int, default=1

Number of jobs to be used during benchmarking. It is recommended to assign jobs to the benchmarking rather than to the individual selectors.

error_log_file: str, default="./error_log.txt"

Location and name of the file in which errors are logged.

verbose: bool, default=True

If True, basic information on the state of the benchmarking is printed.
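The interplay of `features` and `n_runs` described above can be illustrated with a small stand-alone sketch. It does not call auswahl. The helper names `n_selected` and `run_grid` and the dataset and method labels are hypothetical. Two assumptions are made: an interval descriptor selects non-overlapping intervals (so it covers number of intervals times width features), and every combination of dataset, method and feature descriptor is repeated `n_runs` times.

```python
from itertools import product
from typing import List, Tuple, Union

# A feature descriptor as documented: an int, or (n_intervals, interval_width).
FeatureSpec = Union[int, Tuple[int, int]]


def n_selected(spec: FeatureSpec) -> int:
    """Total number of features requested by one descriptor.

    Assumption: intervals do not overlap, so a tuple descriptor
    covers n_intervals * interval_width features.
    """
    if isinstance(spec, tuple):
        n_intervals, width = spec
        return n_intervals * width
    return spec


def run_grid(datasets: List[str], methods: List[str],
             features: List[FeatureSpec], n_runs: int) -> list:
    """Enumerate (dataset, method, descriptor, run index) combinations.

    Assumption: the benchmark repeats every combination n_runs times.
    """
    return list(product(datasets, methods, features, range(n_runs)))


# Hypothetical labels; tuples are used because interval selectors require them.
features = [(2, 5), (3, 5)]
grid = run_grid(["dataset_a", "dataset_b"], ["method_a", "method_b"],
                features, n_runs=10)

print([n_selected(f) for f in features])  # [10, 15]
print(len(grid))                          # 80
```

Under these assumptions the number of selector fits grows multiplicatively with each argument, which is worth keeping in mind when choosing `n_runs` and `n_jobs`.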

Returns
benchmarking results: auswahl.benchmarking.DataHandler

DataHandler object containing the results of the benchmarking: data on regression, stability, selection and run time measurements.
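The naming rule for the `methods` parameter can be sketched as follows. This is not the library's implementation. `resolve_names` and `DummySelector` are hypothetical stand-ins used only to show when a (instance, name) tuple becomes necessary, assuming the default name of a method is the class name of the instance.

```python
from typing import Any, List, Tuple, Union

# An entry is either a selector instance or an (instance, name) tuple.
MethodSpec = Union[Any, Tuple[Any, str]]


def resolve_names(methods: List[MethodSpec]) -> List[str]:
    """Derive one name per method and reject duplicates.

    Assumption: without an explicit name, the class name is used.
    """
    names = []
    for entry in methods:
        if isinstance(entry, tuple):
            _, name = entry          # explicit (instance, name) tuple
        else:
            name = type(entry).__name__
        names.append(name)
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"Method names must be unique, duplicates: {duplicates}")
    return names


class DummySelector:
    """Hypothetical stand-in for a SpectralSelector subclass."""

    def __init__(self, n_components: int):
        self.n_components = n_components


# Two instances of the same class need explicit names to be distinguishable.
methods = [(DummySelector(2), "dummy_2"), (DummySelector(5), "dummy_5")]
names = resolve_names(methods)
print(names)  # ['dummy_2', 'dummy_5']
```

Passing the two `DummySelector` instances without names would raise a `ValueError` in this sketch, which mirrors the requirement that differently parameterized instances of one class be named explicitly.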

Examples using auswahl.benchmarking.benchmark

Benchmarking - Example
