Active Learning for experimental exploration: Difference between versions

From SDQ-Institutsseminar
(The page was newly created: „{{Vortrag |vortragender=Steven Lorenz |email=uxoyb@student.kit.edu |vortragstyp=Proposal |betreuer=Federico Matteucci |termin=Institutsseminar/2023-05-12 |vort…“)

No edit summary
|termin=Institutsseminar/2023-05-12
|vortragsmodus=in Präsenz
|kurzfassung=In this thesis, we work with rankings. A ranking is obtained by applying a set of
encoders to an experimental condition (dataset, model, tuning, scoring) and ranking them
according to their averaged cross-validation (CV) score. Furthermore, we can aggregate a set
of rankings into a single consensus ranking, e.g. by taking the mean or median rank for each
encoder. The goal of the thesis is to explore the space of possible consensus rankings while
running as few experiments as possible, since running experiments is time-consuming.
To make predictions on the consensus rankings, we employ a model capable of predicting
the ranking of encoders given an experimental condition:
(dataset, model, tuning, scoring) → associated ranking of encoders
We can use this model to make predictions on the consensus rankings by taking a set of
experimental conditions {E_1, ..., E_N}, predicting their rankings, and aggregating the
predictions into a consensus ranking.
For this task, we evaluated different models (decision trees, random forests, SVMs) using
various encoding schemes (One-hot, BaseN, Label), with and without the use of meta-features,
using Kendall's tau as the evaluation metric. The decision tree achieved the best results
thus far.
To this model, we apply active learning to avoid running unnecessary experiments. In active
learning, the model decides which data points should be labeled next, and thereby which data
it is trained on, in order to achieve greater accuracy with fewer labeled training instances.
In our case, labeling a data point is equivalent to obtaining the ranking of encoders for an
experimental condition. Thus, we minimize the number of experiments to be run.
}}
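The aggregation step described in the abstract — collapsing a set of per-condition rankings into one consensus ranking via mean ranks — can be sketched as follows. The encoder names and rank values are illustrative placeholders, not results from the thesis:

```python
from statistics import mean, median

def consensus_ranking(rankings):
    """Aggregate per-condition rankings (encoder -> rank, 1 = best)
    into a single consensus ranking by mean rank."""
    encoders = list(rankings[0])
    mean_ranks = {e: mean(r[e] for r in rankings) for e in encoders}
    # Order encoders by mean rank and re-assign consecutive ranks.
    ordered = sorted(encoders, key=mean_ranks.get)
    return {e: i + 1 for i, e in enumerate(ordered)}

# Hypothetical rankings from three experimental conditions.
rankings = [
    {"One-hot": 1, "BaseN": 2, "Label": 3},
    {"One-hot": 2, "BaseN": 1, "Label": 3},
    {"One-hot": 1, "BaseN": 3, "Label": 2},
]
print(consensus_ranking(rankings))  # {'One-hot': 1, 'BaseN': 2, 'Label': 3}
```

Substituting `median` for `mean` gives the median-rank variant the abstract also mentions.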

Revision as of 13:02, 5 May 2023

Speaker Steven Lorenz
Talk type Proposal
Advisor Federico Matteucci
Date Fri, 12 May 2023
Talk mode in person
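Two pieces of the pipeline described in the abstract lend themselves to a short sketch: Kendall's tau as the evaluation metric (equivalent to `scipy.stats.kendalltau` for tie-free rankings), and a generic pool-based active-learning loop. The `model.query`/`model.fit` interface and `run_experiment` are hypothetical placeholders, not the thesis implementation:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two tie-free rankings (item -> rank dicts):
    (concordant pairs - discordant pairs) / total pairs, in [-1, 1]."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs

def active_learning_loop(model, pool, run_experiment, budget):
    """Generic pool-based active learning: the model picks the next
    experimental condition to label; labeling means running the
    (expensive) experiment to obtain its encoder ranking.
    `model.query` and `model.fit` are assumed interfaces."""
    labeled = {}
    for _ in range(budget):
        condition = model.query(pool)                    # acquisition step
        labeled[condition] = run_experiment(condition)   # expensive labeling
        pool.remove(condition)
        model.fit(labeled)                               # retrain on labeled data
    return model, labeled

# Identical rankings agree perfectly; fully reversed ones disagree fully.
a = {"One-hot": 1, "BaseN": 2, "Label": 3}
print(kendall_tau(a, a))                                 # 1.0
print(kendall_tau(a, {k: 4 - r for k, r in a.items()}))  # -1.0
```

Each iteration trades one experiment for the data point the model considers most informative, which is how the approach minimizes the number of experiments run.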