CGFLEX: A Flexible Framework for Causal Graph-based Data Synthesis

From SDQ-Institutsseminar

Current revision as of 8 December 2023, 11:16

Speaker: Paul Giza
Talk type: Master's thesis
Advisor: Bela Böhnke
Date: Fri, 14 April 2023
Talk mode: In person
Abstract: Algorithms that extract dependencies from data and represent them as causal graphs need to be tested. Such tests require data with a known ground truth, which is rarely available. Generating data under controlled conditions through simulations is expensive and time-consuming. One solution is to create synthetic datasets in which the dependencies are predefined, and to use these datasets to evaluate the results of the algorithms.

This work focuses on building a framework for the synthesis of data. In the framework, the synthesis process begins by generating a random dependency graph, specifically a directed acyclic graph. Each node in the graph represents a variable, and every node except the source nodes has parent nodes. In the next step, each node is populated with predefined random dependencies. A dependency is a model that determines the value of a variable from the values of its parent variables. Datasets can then be sampled from this structure. Users can control the properties of the causal graph through various parameters and choose from multiple types of dependencies, which represent different levels of complexity.
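
The three steps described above (random directed acyclic graph, dependencies per node, sampling) can be illustrated with a minimal Python sketch. This is not the CGFLEX implementation: the function names, the `edge_prob` parameter, and the linear-plus-noise dependency are all assumptions chosen only to make the idea concrete. Acyclicity is obtained here by allowing edges only from lower to higher node indices, which is one simple construction among many.

```python
import random


def random_dag(num_nodes, edge_prob, rng):
    """Return {node: [parents]}. Edges only run from a lower to a higher
    index, so the resulting graph cannot contain a cycle."""
    parents = {node: [] for node in range(num_nodes)}
    for child in range(num_nodes):
        for candidate in range(child):
            if rng.random() < edge_prob:
                parents[child].append(candidate)
    return parents


def linear_dependency(parent_ids, rng):
    """Hypothetical dependency type: weighted sum of the parent values plus
    Gaussian noise. A source node (no parents) yields pure noise."""
    weights = {p: rng.uniform(-1.0, 1.0) for p in parent_ids}

    def model(values, rng):
        return sum(weights[p] * values[p] for p in parent_ids) + rng.gauss(0.0, 0.1)

    return model


def assign_dependencies(parents, rng):
    """Populate every node with a randomly parameterised dependency."""
    return {node: linear_dependency(pids, rng) for node, pids in parents.items()}


def sample_row(parents, dependencies, rng):
    """Evaluate the nodes in index order, which is a topological order here,
    so every parent value exists before its children are computed."""
    values = {}
    for node in sorted(parents):
        values[node] = dependencies[node](values, rng)
    return values


def sample_dataset(parents, dependencies, num_rows, rng):
    return [sample_row(parents, dependencies, rng) for _ in range(num_rows)]


rng = random.Random(0)  # fixed seed so the run is reproducible
graph = random_dag(num_nodes=5, edge_prob=0.5, rng=rng)
deps = assign_dependencies(graph, rng)
data = sample_dataset(graph, deps, num_rows=3, rng=rng)
```

Because `graph` and `deps` are known exactly, they serve as the ground truth against which a causal discovery algorithm run on `data` could be scored.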

Additionally, sampling is interactive: dependencies can be exchanged while sampling is in progress. A dependency can be replaced by a fixed value, a probability distribution, or a time series function. This flexibility makes the framework a useful tool for improving and comparing such algorithms under various conditions.
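
Exchanging dependencies during sampling could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the framework's interface: the helper names (`fixed_value`, `distribution`, `time_series`, `sample_with_exchanges`), the `exchanges` schedule keyed by row index, and the two-node example graph are all hypothetical.

```python
import math
import random


# Each dependency is a callable taking (parent_values, step, rng).
def fixed_value(value):
    """Replacement that ignores the parents and always returns a constant."""
    return lambda parent_values, step, rng: value


def distribution(mean, std):
    """Replacement that draws from a Gaussian, independent of the parents."""
    return lambda parent_values, step, rng: rng.gauss(mean, std)


def time_series(period):
    """Replacement that depends only on the sample index (a sine wave)."""
    return lambda parent_values, step, rng: math.sin(2.0 * math.pi * step / period)


def sample_with_exchanges(parents, dependencies, num_rows, exchanges, rng):
    """Sample num_rows rows. `exchanges` maps a row index to
    {node: replacement}; a replacement stays active from that row onward."""
    active = dict(dependencies)  # copy, so the original models stay untouched
    rows = []
    for step in range(num_rows):
        active.update(exchanges.get(step, {}))
        values = {}
        for node in sorted(parents):  # assumes node ids are topologically ordered
            parent_values = [values[p] for p in parents[node]]
            values[node] = active[node](parent_values, step, rng)
        rows.append(values)
    return rows


# Hypothetical two-node graph: node 0 is a source, node 1 depends on node 0.
parents = {0: [], 1: [0]}
dependencies = {
    0: distribution(0.0, 1.0),
    1: lambda parent_values, step, rng: 2.0 * parent_values[0],
}
exchanges = {
    2: {1: fixed_value(5.0)},        # from row 2: node 1 is held at 5.0
    4: {1: time_series(period=4)},   # from row 4: node 1 follows a sine wave
}
rows = sample_with_exchanges(parents, dependencies, 6, exchanges, random.Random(1))
```

In rows 0 and 1, node 1 still follows its original dependency on node 0. From row 2 onward its value no longer depends on its parent, which is comparable to an intervention on that variable.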