|Date||Fri, 22 November 2019, 11:30|
|Location||Room 348 (Building 50.34)|
|Previous session||Fri, 15 November 2019|
|Next session||Fri, 29 November 2019|
|Title||Anytime Tradeoff Strategies with Multiple Targets|
|Abstract||Modern applications typically need to solve complex problems under limited time and resources. In settings where the exact computation of indicators is either infeasible or economically undesirable, “anytime” algorithms, which can return approximate results when interrupted, are particularly beneficial, since they offer a natural way to trade computational effort for result accuracy.
However, modern systems typically need to solve multiple problems simultaneously. For example, to find highly correlated pairs of variables in a dataset, one needs to examine every pair, which for d variables means d(d-1)/2 pairs. This is challenging, particularly when the number of variables is large and the data evolve dynamically.
This thesis addresses the following question: how should resources be distributed at any time in order to maximize the overall quality of multiple targets? First, we define the problem, considering various notions of quality and user requirements. Second, we propose a set of strategies to tackle it. Finally, we evaluate these strategies in extensive experiments.
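The abstract does not say which allocation strategies the thesis proposes. The sketch below is a hypothetical illustration of the problem setting only and is not one of the talk's strategies. Each variable pair from the correlation example becomes its own target, and its Pearson correlation is estimated from randomly sampled rows. After every step the generator yields the current estimates, so the computation can be interrupted at any time. The greedy rule spends the next sample on the pair with the largest approximate standard error (1 - r^2)/sqrt(n). That rule, along with the names `PairEstimate`, `anytime_correlations` and `min_samples`, is an assumption made for this example.

```python
import math
import random
from itertools import combinations


class PairEstimate:
    """Running Pearson correlation of one variable pair, built from sampled rows."""

    def __init__(self, i, j, min_samples=30):
        self.i, self.j = i, j
        self.min_samples = min_samples  # hypothetical warm-up before the error estimate is trusted
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, row):
        x, y = row[self.i], row[self.j]
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def corr(self):
        if self.n < 2:
            return 0.0
        cov = self.sxy - self.sx * self.sy / self.n
        var_x = self.sxx - self.sx ** 2 / self.n
        var_y = self.syy - self.sy ** 2 / self.n
        if var_x <= 0 or var_y <= 0:
            return 0.0
        return cov / math.sqrt(var_x * var_y)

    def std_error(self):
        # Large-sample approximation SE(r) ~ (1 - r^2) / sqrt(n); infinite during warm-up.
        if self.n < self.min_samples:
            return math.inf
        r = self.corr()
        return (1.0 - r * r) / math.sqrt(self.n)


def anytime_correlations(data, seed=0):
    """Yield all current estimates after every sampling step.

    The caller can stop consuming the generator at any time and keep the
    latest estimates. Allocation rule (an assumption for illustration):
    the next sampled row goes to the pair whose estimate is least certain.
    """
    rng = random.Random(seed)
    targets = [PairEstimate(i, j) for i, j in combinations(range(len(data[0])), 2)]
    while True:
        least_certain = max(targets, key=lambda t: t.std_error())
        least_certain.add(rng.choice(data))
        yield {(t.i, t.j): t.corr() for t in targets}


# Usage: three variables, where columns 0 and 1 are strongly correlated.
rng = random.Random(1)
rows = []
for _ in range(1000):
    a = rng.gauss(0, 1)
    rows.append((a, a + 0.1 * rng.gauss(0, 1), rng.gauss(0, 1)))

for step, estimates in enumerate(anytime_correlations(rows)):
    if step == 500:  # simulated interruption after a fixed budget
        break
print(estimates)
```

The strongly correlated pair has a small approximate standard error. It therefore typically stops receiving samples after its warm-up, and the remaining budget goes to the two uncertain pairs. The warm-up (`min_samples`) prevents the rule from trusting error estimates computed from only a handful of rows. Other notions of quality or user requirements would lead to different allocation rules.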
|Title||Subspace Search in Data Streams|
|Abstract||Modern data mining often operates on high-dimensional data streams that evolve at a very fast pace. On the one hand, the "curse of dimensionality" leads to a sparsely populated feature space, in which classical statistical methods perform poorly; patterns such as clusters or outliers often hide in a few low-dimensional subspaces. On the other hand, data streams are non-stationary and virtually unbounded, so algorithms operating on them must work incrementally and take concept drift into account.
While high dimensionality and the streaming setting each pose a distinct set of challenges, we observe that existing mining algorithms address them only separately. We therefore plan to propose a novel algorithm that keeps track of the subspaces of interest in high-dimensional data streams over time. We quantify the relevance of a subspace via a so-called "contrast" measure, which we can maintain incrementally and efficiently. Furthermore, we propose a set of heuristics that adapt the search for relevant subspaces as the data and the underlying distribution evolve.
We show that our approach is beneficial as a feature selection method and can thus be used to extend a range of knowledge discovery tasks, such as outlier detection, in high-dimensional data streams.
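The abstract does not define the contrast measure or how it is maintained. As a hedged illustration only, the sketch below scores a two-dimensional subspace {a, b} by comparing two distributions of attribute a. One is its marginal distribution. The other is its distribution inside random slices of attribute b. The comparison uses the two-sample Kolmogorov-Smirnov statistic, averaged over the slices. Only a sliding window of recent points is kept, so old data age out under concept drift. `WindowedContrast`, `ks_statistic`, and the parameters `window`, `alpha` and `iterations` are hypothetical names. The approach described above maintains its measure incrementally, whereas this sketch recomputes the score from the window on demand.

```python
import bisect
import random
from collections import deque


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_sample(v) - F_reference(v)|."""
    s, r = sorted(sample), sorted(reference)
    d = 0.0
    for v in s + r:  # the maximum is attained at one of the pooled points
        f_s = bisect.bisect_right(s, v) / len(s)
        f_r = bisect.bisect_right(r, v) / len(r)
        d = max(d, abs(f_s - f_r))
    return d


class WindowedContrast:
    """Hypothetical contrast score of the 2-D subspace {a, b} over a sliding window.

    Contrast = average KS deviation between the marginal distribution of a and
    its distribution conditioned on a random slice of b. High values suggest
    that a and b depend on each other within the current window.
    """

    def __init__(self, window=500, alpha=0.2, iterations=50, seed=0):
        self.points = deque(maxlen=window)  # oldest points drop out automatically
        self.alpha = alpha                  # fraction of the window in each slice
        self.iterations = iterations        # number of random slices to average
        self.rng = random.Random(seed)

    def insert(self, a, b):
        self.points.append((a, b))

    def contrast(self):
        # Recomputed from the window on demand; not an efficient incremental update.
        n = len(self.points)
        k = int(self.alpha * n)  # points per slice
        if k < 2 or k >= n:
            return 0.0
        marginal = [a for a, _ in self.points]
        by_b = sorted(self.points, key=lambda p: p[1])
        total = 0.0
        for _ in range(self.iterations):
            start = self.rng.randrange(n - k + 1)
            conditional = [a for a, _ in by_b[start:start + k]]
            total += ks_statistic(conditional, marginal)
        return total / self.iterations


# Usage: a depends on b at first, then the stream drifts to independent data.
rng = random.Random(42)
score = WindowedContrast(window=500)
for _ in range(500):
    b = rng.gauss(0, 1)
    score.insert(b + 0.05 * rng.gauss(0, 1), b)
before = score.contrast()
for _ in range(500):
    score.insert(rng.gauss(0, 1), rng.gauss(0, 1))
after = score.contrast()
print(round(before, 2), round(after, 2))
```

The score during the dependent phase should be clearly higher than after the window has filled with independent points. A stream-based subspace search has to detect this kind of change and react to it.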