Augmenting Bandit Algorithms with Domain Knowledge

From IPD-Institutsseminar
Revision as of 21 July 2021, 02:03, by Pawel Bielski (page created)
Speaker: Tom George
Talk type: Master's thesis
Advisor: Pawel Bielski
Date: Fri, 23 July 2021
Abstract: Bandit algorithms are a family of algorithms that efficiently solve sequential decision problems, such as monitoring in a cloud computing system, news recommendation, or clinical trials. Such problems involve a trade-off between exploring new options and exploiting presumably good ones; bandit algorithms handle this trade-off with theoretical guarantees while remaining practical.

While some approaches use additional information about the current state of the environment, bandit algorithms tend to ignore domain knowledge that cannot be extracted from data. It is unclear how to incorporate domain knowledge into bandit algorithms and how much improvement this yields.

In this master's thesis, we propose two ways to augment bandit algorithms with domain knowledge: a push approach, which influences the distribution of arms to deal with non-stationarity, and a group approach, which propagates feedback between similar arms. We conduct synthetic and real-world experiments to examine the usefulness of our approaches. Additionally, we evaluate the effect of incomplete and incorrect domain knowledge. We show that the group approach helps to reduce exploration time, especially for small numbers of iterations and plays, and that the push approach outperforms contextual and non-contextual baselines for large context spaces.
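
To make the idea of propagating feedback between similar arms more concrete, the following is a minimal illustrative sketch and not the method from the thesis. It assumes Bernoulli rewards, Thompson sampling with Beta posteriors, and a hand-written `groups` mapping standing in for domain knowledge. The names `propagate_update`, `run`, `groups`, and `weight` are hypothetical, and the choice of a fixed discount weight for group peers is an assumption made only for this example.

```python
import random


def propagate_update(alpha, beta, arm, reward, groups, weight=0.5):
    """Update the Beta posterior of `arm` and, with a reduced weight,
    the posteriors of the other arms in its group (assumed scheme)."""
    for other in groups[arm]:
        w = 1.0 if other == arm else weight
        alpha[other] += w * reward
        beta[other] += w * (1 - reward)


def run(true_means, groups, rounds=2000, weight=0.5, seed=0):
    """Thompson sampling on Bernoulli arms with group-wise feedback."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # pseudo-counts of successes (uniform prior)
    beta = [1.0] * k   # pseudo-counts of failures (uniform prior)
    total = 0
    for _ in range(rounds):
        # Sample a plausible mean per arm and play the best sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        propagate_update(alpha, beta, arm, reward, groups, weight)
        total += reward
    return total, alpha, beta


# Hypothetical domain knowledge: arms 0 and 1 are similar, as are 2 and 3.
# Each arm maps to the list of arms in its group, including itself.
groups = {0: [0, 1], 1: [0, 1], 2: [2, 3], 3: [2, 3]}
total, alpha, beta = run([0.8, 0.7, 0.2, 0.1], groups)
```

In this sketch, a single pull informs every arm in the group, so fewer pulls are spent on arms whose group already looks unpromising. If the grouping is wrong, the shared updates bias the estimates, which is one way to picture the effect of incorrect domain knowledge mentioned above.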