Statistical Generation of High-Dimensional Data Streams with Complex Dependencies
|Date||Fri, 18 May 2018|
|Abstract||The extraction of knowledge from data streams is one of the most crucial tasks of modern-day data science. By their nature, data streams are ever-evolving, and knowledge derived at one point in time may be obsolete in the next period. The need for specialized algorithms that can deal with high-dimensional data streams and concept drift is prevalent.
A lot of research has gone into creating these kinds of algorithms. The problem is the lack of data sets with which to evaluate them: a ground truth for a common evaluation approach is missing. A solution could be the synthetic generation of data streams with controllable statistical properties, such as the placement of outliers and the subspaces in which special kinds of dependencies occur. The goal of this Bachelor thesis is the conceptualization and implementation of a framework that can create high-dimensional data streams with complex dependencies.
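
To illustrate what "controllable statistical properties" could mean in practice, the following is a minimal sketch and not the framework developed in the thesis. It assumes a very simple setting: the dimensions in one chosen subspace follow a shared latent value (a linear dependency), all other dimensions are independent uniform noise, and outliers are placed at chosen time steps by breaking the dependency inside the subspace. The function name `generate_stream` and all of its parameters are hypothetical.

```python
import random


def generate_stream(n_points, n_dims, subspace, outlier_steps, noise=0.05, seed=0):
    """Yield (point, is_outlier) pairs of a synthetic data stream.

    Hypothetical sketch: dimensions listed in `subspace` share one latent
    value plus Gaussian noise; every other dimension is uniform on [0, 1).
    At the time steps in `outlier_steps` the dependency is broken.
    """
    rng = random.Random(seed)
    outlier_steps = set(outlier_steps)
    for t in range(n_points):
        # Start with independent uniform noise in every dimension.
        point = [rng.random() for _ in range(n_dims)]
        is_outlier = t in outlier_steps
        if is_outlier:
            # Each value is unremarkable on its own, but the combination
            # (first dimension low, the rest high) violates the dependency.
            values = [0.1] + [0.9] * (len(subspace) - 1)
        else:
            latent = rng.random()
            values = [latent] * len(subspace)
        for dim, value in zip(subspace, values):
            point[dim] = value + rng.gauss(0.0, noise)
        # The label is the ground truth an evaluation could compare against.
        yield point, is_outlier


if __name__ == "__main__":
    stream = generate_stream(1000, 20, subspace=[3, 7, 11], outlier_steps=[250, 600])
    labels = [is_outlier for _, is_outlier in stream]
    print(sum(labels), "outliers placed in", len(labels), "points")
```

Because the outliers are only visible in the chosen subspace, a detector that inspects single dimensions would miss them, while the returned labels provide the ground truth for evaluation. Concept drift could be approximated in the same spirit by changing `subspace` over time, which this sketch does not do.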