Standardized Real-World Change Detection Data: Difference between revisions

Revision as of 8 May 2022, 09:52

Speaker	Moritz Teichner
Talk type	Proposal
Supervisor	Florian Kalinke
Date	Fri 13 May 2022
Presentation mode	in person
Abstract	Change point detection is a fundamental task with many applications in finance, bioinformatics and other areas. The basic assumption is that the distribution generating a data set may change over time at a so-called change point. Detecting these points is crucial and, in practice, an unsupervised problem. Evaluating change point detection algorithms therefore requires labelled data. However, only a few labelled real-world data sets are publicly available, and many of them are too small, reused, preprocessed or ambiguous. Recently, a collection of data sets annotated by data scientists and ML researchers was published, together with an assessment of 14 algorithms on these data. Because the labelling was done by hand, it raises a question: can humans identify changes correctly and consistently? The goal of this Bachelor's thesis is to label this data set algorithmically and to extend it. To this end, a non-parametric hypothesis test is constructed that uses Maximum Mean Discrepancy (MMD) as the test statistic and approximates the null distribution with a permutation test. The results are to be analysed and compared with the human labelling. Furthermore, a new assessment of change point detection algorithms is to be performed and compared with the published one.
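The test described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the Gaussian kernel, the fixed bandwidth, the biased MMD estimator, the number of permutations and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_from_kernel(k, n):
    """Biased squared-MMD estimate; the first n rows/columns of k belong to sample X."""
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()


def mmd_permutation_test(x, y, bandwidth=1.0, n_permutations=500, seed=0):
    """Two-sample test: x and y are 2-D arrays of shape (samples, features).

    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = len(x)
    # The kernel matrix is computed once; permuting the sample labels is
    # equivalent to permuting its rows and columns.
    k = gaussian_kernel(pooled, pooled, bandwidth)
    observed = mmd2_from_kernel(k, n)
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        if mmd2_from_kernel(k[np.ix_(perm, perm)], n) >= observed:
            exceed += 1
    # The +1 terms count the observed split itself, so the p-value is never 0.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical usage: synthetic segments before and after a candidate change point.
data_rng = np.random.default_rng(1)
before = data_rng.normal(0.0, 1.0, size=(100, 1))
after = data_rng.normal(2.0, 1.0, size=(100, 1))
stat, p = mmd_permutation_test(before, after)
print(f"MMD^2 = {stat:.4f}, p-value = {p:.4f}")
```

A small p-value suggests that the two segments come from different distributions, which is the evidence one would use to place a label at the candidate position. In practice the bandwidth and the significance level would have to be chosen with care; the values here are placeholders.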

@@ Line 6: / Line 6: @@
 |termin=Institutsseminar/2022-05-13 Zusatztermin
 |vortragsmodus=in Präsenz
-|kurzfassung=Kurzfassung
+|kurzfassung=Change point detection is a fundamental task with many applications in finance, bioinformatics and other areas. The basic assumption is that the distribution generating a data set may change over time at a so-called change point. Detecting these points is crucial and, in practice, an unsupervised problem. Evaluating change point detection algorithms therefore requires labelled data. However, only a few labelled real-world data sets are publicly available, and many of them are too small, reused, preprocessed or ambiguous. Recently, a collection of data sets annotated by data scientists and ML researchers was published, together with an assessment of 14 algorithms on these data. Because the labelling was done by hand, it raises a question: can humans identify changes correctly and consistently?
+The goal of this Bachelor's thesis is to label this data set algorithmically and to extend it. To this end, a non-parametric hypothesis test is constructed that uses Maximum Mean Discrepancy (MMD) as the test statistic and approximates the null distribution with a permutation test.
+The results are to be analysed and compared with the human labelling. Furthermore, a new assessment of change point detection algorithms is to be performed and compared with the published one.
 }}