Main Page

The Institutsseminar of the Institute for Program Structures and Data Organization (IPD) is a standing course whose purpose is to inform about current research at the institute. In particular, it gives students at the institute the opportunity to present their bachelor's and master's theses to a larger audience. The focus is on the problem statement, the solution approaches, and the results achieved. The seminar is nevertheless open to all students and staff of KIT as well as to other interested parties.

Location: Building 50.34, Seminar Room 348, or online (see description)
Time: Fridays, 11:30–13:00 / 14:00–15:30

Talks must adhere to the following time limits:

  • Master's thesis: 30 minutes talk + 15 minutes discussion
  • Bachelor's thesis: 20 minutes talk + 10 minutes discussion
  • Proposal: 12 minutes talk + 8 minutes discussion

Further information: https://sdqweb.ipd.kit.edu/wiki/Institutsseminar. For questions and comments, you can send an e-mail to the Institutsseminar team.

Upcoming Talks

Friday, 16 April 2021, 11:30, https://conf.dfn.de/webapp/conference/979160755
Speaker: Tanja Fenn
Title: Change Detection in High Dimensional Data Streams
Talk type: Proposal
Supervisor: Edouard Fouché
Abstract: The data collected in many real-world scenarios such as environmental analysis, manufacturing, and e-commerce are high-dimensional and arrive as a stream, i.e., data properties evolve over time, a phenomenon known as "concept drift". This brings numerous challenges: data-driven models become outdated, and one is typically interested in detecting specific events, e.g., the critical wear and tear of industrial machines. Hence, it is crucial to detect change, i.e., concept drift, in order to design a reliable and adaptive predictive system for streaming data. However, existing techniques can only detect "when" a drift occurs and neglect the fact that various drifts may occur in different dimensions, i.e., they do not detect "where" a drift occurs. This is particularly problematic when data streams are high-dimensional.

The goal of this Master's thesis is to develop and evaluate a framework that efficiently and effectively detects "when" and "where" concept drift occurs in high-dimensional data streams. We introduce stream autoencoder windowing (SAW), an approach based on the online training of an autoencoder while monitoring its reconstruction error via a sliding window of adaptive size. We will evaluate the performance of our method on synthetic data, in which the characteristics of drifts are known. We then show how our method improves the accuracy of existing classifiers for predictive systems compared to benchmarks on real data streams.
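The idea of monitoring a per-dimension reconstruction error over a sliding window can be illustrated with a minimal sketch. This is not SAW itself: the autoencoder is replaced by a trivial stand-in (each dimension is "reconstructed" by its mean over a reference period), the window has a fixed rather than adaptive size, and the names `detect_drift`, `make_stream`, and all parameters are hypothetical.

```python
import random
from collections import deque
from statistics import mean


def detect_drift(stream, ref_size=100, win_size=50, threshold=3.0):
    """Return (time, dimension) of the first flagged change, or None.

    Assumption: a per-dimension reference mean stands in for the
    autoencoder; the error is the absolute deviation from that mean.
    """
    reference = []   # points used to fit the stand-in model
    centers = None   # per-dimension "reconstruction" (reference mean)
    baseline = None  # per-dimension mean error on the reference data
    windows = None   # sliding windows of recent per-dimension errors

    for t, x in enumerate(stream):
        if centers is None:
            reference.append(x)
            if len(reference) == ref_size:
                dims = range(len(x))
                centers = [mean(p[d] for p in reference) for d in dims]
                baseline = [
                    max(mean(abs(p[d] - centers[d]) for p in reference), 1e-12)
                    for d in dims
                ]
                windows = [deque(maxlen=win_size) for _ in dims]
            continue
        for d, value in enumerate(x):
            windows[d].append(abs(value - centers[d]))
        if len(windows[0]) == win_size:
            # Ratio of the recent error to the reference error, per dimension.
            ratios = [mean(w) / b for w, b in zip(windows, baseline)]
            worst = max(range(len(ratios)), key=ratios.__getitem__)
            if ratios[worst] > threshold:
                return t, worst  # "when" and "where"
    return None


def make_stream(n=600, dims=5, drift_at=300, drift_dim=2, shift=8.0, seed=0):
    """Synthetic stream: Gaussian noise with a mean shift in one dimension."""
    rng = random.Random(seed)
    for t in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        if t >= drift_at:
            x[drift_dim] += shift
        yield x


print(detect_drift(make_stream()))
```

On this synthetic stream the sketch should flag dimension 2 some steps after the shift at step 300. Tracking one window per dimension is what makes the "where" question answerable; a single aggregated error would only answer "when".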

Speaker: Patrick Ehrler
Title: Feature Selection using Bayesian Optimization
Talk type: Bachelor's thesis
Supervisor: Jakob Bach
Abstract: Datasets, such as gene profiles from cancer patients, can have a large number of features. Applying prediction techniques to them requires a lot of computing time and memory. One solution to this problem is to reduce the number of features, where the main challenge is to still achieve satisfactory prediction performance afterwards. There are many state-of-the-art feature selection techniques, but they all have their limitations. We use Bayesian optimization, a technique for optimizing expensive black-box functions, and apply it to the problem of feature selection. In doing so, we face the challenge of adapting the standard optimization procedure to a discrete-valued search space, and also of finding a way to optimize the acquisition function efficiently.

Overall, we propose 10 different Bayesian optimization feature selection approaches and evaluate their performance experimentally on 28 OpenML classification datasets. We compare the approaches not only among themselves, but also to 9 state-of-the-art feature selection approaches. Our results show that four of our approaches in particular perform well and can compete with most state-of-the-art approaches in terms of prediction performance. In terms of runtime, none of our approaches performs outstandingly well, but they are comparable to some filter and wrapper approaches.
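To make the setting concrete, the following is a minimal sketch of Bayesian optimization over binary feature masks. It is not one of the ten approaches from the thesis. It assumes a Gaussian-process surrogate with an exponential Hamming kernel, an upper-confidence-bound acquisition function, and brute-force enumeration of all masks, which is only feasible for a handful of features. `toy_score` is a hypothetical stand-in for an expensive cross-validated prediction score.

```python
import itertools
import math
import random


def kernel(a, b, gamma=0.5):
    """Exponential Hamming kernel on binary feature masks."""
    return math.exp(-gamma * sum(x != y for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def posterior(X, y, candidate, noise=1e-6):
    """GP posterior mean and standard deviation at one candidate mask."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [kernel(a, candidate) for a in X]
    mu0 = sum(y) / len(y)  # constant prior mean (assumption of this sketch)
    alpha = solve(K, [v - mu0 for v in y])
    mu = mu0 + sum(k * a for k, a in zip(k_star, alpha))
    w = solve(K, k_star)
    var = kernel(candidate, candidate) - sum(k * v for k, v in zip(k_star, w))
    return mu, math.sqrt(max(var, 0.0))


def bo_feature_selection(score, n_features, budget, kappa=2.0, seed=0):
    """Evaluate `budget` distinct masks; return (best mask, best score, history)."""
    rng = random.Random(seed)
    candidates = list(itertools.product((0, 1), repeat=n_features))
    X = rng.sample(candidates, min(3, budget))  # initial random design
    y = [score(m) for m in X]
    while len(X) < budget:
        unseen = [m for m in candidates if m not in X]

        def ucb(m):
            mu, std = posterior(X, y, m)
            return mu + kappa * std

        # Acquisition function optimized by enumerating all unseen masks.
        nxt = max(unseen, key=ucb)
        X.append(nxt)
        y.append(score(nxt))
    best = max(range(len(X)), key=y.__getitem__)
    return X[best], y[best], X


def toy_score(mask, relevant=(0, 2), penalty=0.1):
    """Hypothetical objective: reward relevant features, penalize mask size."""
    return sum(mask[i] for i in relevant) - penalty * sum(mask)


best_mask, best_value, history = bo_feature_selection(toy_score, n_features=5, budget=12)
print(best_mask, best_value, len(history))
```

Enumerating every mask scales as 2^n in the number of features, which is why optimizing the acquisition function efficiently over a discrete search space is a real challenge for realistic datasets.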