From IPD-Institutsseminar
Session (All sessions)
Date Fri, 28 February 2020, 11:30
Duration 30 min
Room Room 348 (Building 50.34)
Previous session Fri, 21 February 2020
Next session Fri, 6 March 2020


Speaker Jonas Bernhard
Title Analyse von Zeitreihen-Kompressionsmethoden am Beispiel von Google N-Gram (Analysis of Time-Series Compression Methods Using Google N-Gram as an Example)
Talk type Bachelor's thesis
Advisor Martin Schäler
Abstract Temporal text corpora like the Google Ngram Data Set usually incorporate a vast number of words and expressions, called ngrams, and their respective usage frequencies over the years. The large number of entries complicates working with the data set, as transformations and queries are resource- and time-intensive. However, many use cases do not require the whole corpus: a subset is sufficient to achieve acceptable query results. We propose various compression methods to reduce the total number of ngrams in the corpus. Specifically, we propose compression methods that, given an input dictionary of target words, find a compression tailored to queries on a specific topic. Additionally, we utilize time-series compression methods for quick estimates of the properties of ngram usage frequencies. CHQL (Conceptual History Query Language) queries on the Google Ngram Data Set serve as the basis for our compression method design and experimental validation.