Standardized Real-World Change Detection Data

From SDQ-Institutsseminar

Revision as of 8 May 2022, 09:52

Speaker: Moritz Teichner
Talk type: Proposal
Advisor: Florian Kalinke
Date: Fri 13 May 2022 (additional session)
Presentation mode: in person
Abstract: Change point detection is a fundamental task with many applications in finance, bioinformatics and other areas. The basic assumption is that the distribution generating a data set may change over time at so-called change points. Detecting these points is crucial, and in practice it is an unsupervised problem. To analyse existing change point detection algorithms, however, labelled data are needed. Only a few labelled real-world data sets are publicly available, and many of them are too small, reused, preprocessed, or ambiguous. Recently, data sets annotated by data scientists and ML researchers were published together with an assessment of 14 algorithms on these data. Because the labelling was done by hand, this raises questions: can humans identify changes correctly, and do they label consistently?

The goal of this bachelor's thesis is to label this data set algorithmically and to extend it. To this end, a non-parametric hypothesis test is constructed that uses the Maximum Mean Discrepancy (MMD) as its test statistic and approximates the null distribution with a permutation test. The results will be analysed and compared with the human labelling. Furthermore, a new assessment of change point detection algorithms will be performed and compared with the published one.
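A minimal Python sketch of an MMD permutation test follows, for illustration only; it is not the thesis implementation. It assumes one-dimensional samples, a Gaussian kernel whose bandwidth is set by the median heuristic, and the biased estimate MMD^2 = mean k(x, x') + mean k(y, y') - 2 mean k(x, y). Under the null hypothesis that both samples come from the same distribution, the pooled observations are exchangeable, so randomly reassigning them to the two groups and recomputing the statistic approximates its null distribution. All function names, parameters, and the synthetic data are hypothetical. In a change point setting, x would hold the observations before a candidate change point and y those after it.

```python
import math
import random


def gaussian_kernel_matrix(z, bandwidth):
    """Gram matrix k(z_i, z_j) = exp(-(z_i - z_j)^2 / (2 * bandwidth^2)) for 1-D samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in z] for a in z]


def mmd2_biased(gram, idx_x, idx_y):
    """Biased (V-statistic) estimate of MMD^2 between the samples indexed by idx_x and idx_y."""
    n, m = len(idx_x), len(idx_y)
    k_xx = sum(gram[i][j] for i in idx_x for j in idx_x) / (n * n)
    k_yy = sum(gram[i][j] for i in idx_y for j in idx_y) / (m * m)
    k_xy = sum(gram[i][j] for i in idx_x for j in idx_y) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(x, y, bandwidth=None, num_permutations=200, seed=0):
    """Return (observed MMD^2, permutation p-value) for H0: x and y share one distribution.

    Hypothetical helper for illustration; kernel and bandwidth rule are assumptions.
    """
    z = list(x) + list(y)
    if bandwidth is None:
        # Median heuristic: median pairwise distance of the pooled sample (an assumption).
        dists = sorted(abs(a - b) for i, a in enumerate(z) for b in z[i + 1:])
        bandwidth = dists[len(dists) // 2] or 1.0
    # The Gram matrix is computed once; permutations only reassign indices.
    gram = gaussian_kernel_matrix(z, bandwidth)
    n = len(x)
    indices = list(range(len(z)))
    observed = mmd2_biased(gram, indices[:n], indices[n:])
    rng = random.Random(seed)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(indices)  # random relabelling of the pooled sample under H0
        if mmd2_biased(gram, indices[:n], indices[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed split as one of the permutations.
    p_value = (exceed + 1) / (num_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical synthetic series whose mean shifts from 0 to 2 at index 40.
    series = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
    series += [data_rng.gauss(2.0, 1.0) for _ in range(40)]
    stat, p = mmd_permutation_test(series[:40], series[40:])
    print(f"MMD^2 = {stat:.4f}, p-value = {p:.4f}")
```

The biased statistic keeps the sketch short; an unbiased U-statistic estimate or a different kernel would fit into the same permutation loop. Window sizes, the choice of candidate points, and corrections for testing many points are further design decisions that this sketch leaves open.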