Active Learning for experimental exploration

From SDQ-Institutsseminar
Speaker Steven Lorenz
Talk type Proposal
Supervisor Federico Matteucci
Date Fri, 12 May 2023
Presentation mode in person
Abstract In this thesis, we work with rankings. A ranking is obtained by applying a set of encoders to an experimental condition (dataset, model, tuning, scoring) and ranking the encoders by their average cross-validation (CV) score. Furthermore, we can aggregate a set of rankings into a single consensus ranking, e.g. by taking the mean or median rank of each encoder. The goal of the thesis is to explore the space of possible consensus rankings while running as few experiments as possible, because each experiment can be time-consuming.

To make predictions about the consensus rankings, we employ a model that predicts the ranking of encoders for a given experimental condition: (dataset, model, tuning, scoring) → associated ranking of encoders. We can use this model to predict a consensus ranking by taking a set of experimental conditions {E_1,...,E_N}, predicting their rankings, and aggregating the predictions into a consensus ranking. For this task, we evaluated different models (Decision Trees, Random Forests, SVM) using various encoding schemes (One-hot, BaseN, Label), with and without meta-features, and with Kendall's tau as the evaluation metric. The Decision Tree has achieved the best results so far.

To this model, we apply active learning to avoid running unnecessary experiments. In active learning, the model decides which data points should be labeled next and thereby determines the data it is trained on, in order to achieve greater accuracy with fewer labeled training instances. In our case, labeling a data point is equivalent to obtaining the ranking of encoders for an experimental condition. Thus, we minimize the number of experiments that have to be run.
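The ranking and aggregation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the CV scores are made-up numbers, the function names (`rank_encoders`, `consensus_ranking`, `kendall_tau`) are hypothetical, the consensus uses the mean rank, and Kendall's tau is computed in its tau-a form under the assumption that the rankings contain no ties. The abstract does not state which tau variant is used.

```python
from statistics import mean


def rank_encoders(scores):
    """Rank encoders by score; rank 1 is the highest score (assumes no tied scores)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {encoder: pos for pos, encoder in enumerate(ordered, start=1)}


def consensus_ranking(rankings):
    """Aggregate rankings by mean rank; a lower mean rank is better.

    Encoders with equal mean rank keep their input order (an arbitrary tie-break).
    """
    encoders = list(rankings[0])
    mean_rank = {enc: mean(r[enc] for r in rankings) for enc in encoders}
    ordered = sorted(encoders, key=mean_rank.get)
    return {encoder: pos for pos, encoder in enumerate(ordered, start=1)}


def kendall_tau(r1, r2):
    """Kendall's tau-a between two tie-free rankings of the same (at least two) items."""
    items = list(r1)
    concordant = discordant = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            a, b = items[i], items[j]
            sign = (r1[a] - r1[b]) * (r2[a] - r2[b])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical averaged CV scores, one dict per experimental condition.
cv_scores = [
    {"one_hot": 0.81, "base_n": 0.78, "label": 0.74},
    {"one_hot": 0.70, "base_n": 0.73, "label": 0.65},
    {"one_hot": 0.88, "base_n": 0.85, "label": 0.86},
]

rankings = [rank_encoders(scores) for scores in cv_scores]
consensus = consensus_ranking(rankings)
print(consensus)

# Agreement of each individual ranking with the consensus ranking.
for ranking in rankings:
    print(kendall_tau(consensus, ranking))
```

In the setting of the abstract, the same `kendall_tau` comparison could be applied between a predicted ranking and the ranking obtained by actually running the experiment. A value of 1 means the two rankings agree on every pair of encoders, and -1 means they disagree on every pair.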