https://sdq.kastel.kit.edu/index.php?title=Review_of_data_efficient_dependency_estimation&feed=atom&action=history Review of data efficient dependency estimation - Revision history 2024-03-28T15:12:58Z Revision history of this page in SDQ-Institutsseminar MediaWiki 1.39.6 https://sdq.kastel.kit.edu/mediawiki-institutsseminar/index.php?title=Review_of_data_efficient_dependency_estimation&diff=2082&oldid=prev Fj1267 on 16 February 2022 at 11:45 2022-02-16T11:45:16Z<p></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="de">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16 February 2022, 12:45</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l6">Line 6:</td>
<td colspan="2" class="diff-lineno">Line 6:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>|termin=Institutsseminar/2022-02-25 Zusatztermin</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>|termin=Institutsseminar/2022-02-25 Zusatztermin</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>|vortragsmodus=online</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>|vortragsmodus=online</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>|kurzfassung=The amount and complexity of data collected in the industry is increasing, and data analysis rises in importance.Dependency estimation is a significant part of knowledge discovery and allows strategic decisions based on this information.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>|kurzfassung=The amount and complexity of data collected in the industry is increasing, and data analysis rises in importance. Dependency estimation is a significant part of knowledge discovery and allows strategic decisions based on this information.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Several examples highlight the importance of dependency estimation. For instance, knowing that a correlation exists between the regular dose of a drug and the health of a patient helps to understand the impact of a newly manufactured drug.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Several examples highlight the importance of dependency estimation. For instance, knowing that a correlation exists between the regular dose of a drug and the health of a patient helps to understand the impact of a newly manufactured drug.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Knowing how the case material, brand, and condition of a watch influence the price on an online marketplace can help to buy watches at a good price.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Knowing how the case material, brand, and condition of a watch influence the price on an online marketplace can help to buy watches at a good price.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Material sciences can also use dependency estimation to predict many properties of a material before it is synthesized in the lab, so fewer experiments are necessary.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Material sciences can also use dependency estimation to predict many properties of a material before it is synthesized in the lab, so fewer experiments are necessary.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">Many dependency estimation algorithms perform poorly in a real-world setting because they do not consider multivariate dependencies. Multivariate dependencies are very common; they occur, for instance, in the material science example, where the properties of the synthesized material depend on many variables.</del></div></td><td colspan="2" class="diff-side-added"></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">Also, dependency estimation algorithms are often not robust against errors in the data. But data is error-prone: take, for instance, data about the health of a patient in a clinical study, which is hard to measure accurately.</del></div></td><td colspan="2" class="diff-side-added"></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Many dependency estimation algorithms require a large amount of data for a good estimation. But data can be expensive: for example, experiments in material sciences consume material and take time and energy.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Many dependency estimation algorithms require a large amount of data for a good estimation. But data can be expensive: for example, experiments in material sciences consume material and take time and energy.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>As we have the challenge of expensive data collection, algorithms need to be data efficient. But there is a trade-off between the amount of data and the quality of the estimation. With a lack of data comes an uncertainty of the estimation. However, the algorithms do not always quantify this uncertainty. As a result, we do not know if we can rely on the estimation or if we need more data for an accurate estimation.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>As we have the challenge of expensive data collection, algorithms need to be data efficient. But there is a trade-off between the amount of data and the quality of the estimation. With a lack of data comes an uncertainty of the estimation. However, the algorithms do not always quantify this uncertainty. As a result, we do not know if we can rely on the estimation or if we need more data for an accurate estimation.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">Furthermore, many algorithms are too complex to be used by a non-expert. The parameters of an algorithm need to be intuitive to use, and the result should be interpretable. Only then can people outside of academia apply the algorithm without mistakes.</del></div></td><td colspan="2" class="diff-side-added"></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>In this bachelor's thesis we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing <del style="font-weight: bold; text-decoration: none;">the above-mentioned </del>challenges. We developed some of the criteria ourselves and took others from relevant publications. <del style="font-weight: bold; text-decoration: none;">Many </del>of the <del style="font-weight: bold; text-decoration: none;">existing </del>criteria <del style="font-weight: bold; text-decoration: none;">were </del>only <del style="font-weight: bold; text-decoration: none;">formulated </del>qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible, and to come up with a systematic approach to comparison for the rest.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In this bachelor's thesis we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing <ins style="font-weight: bold; text-decoration: none;">these </ins>challenges <ins style="font-weight: bold; text-decoration: none;">and more</ins>. We developed some of the criteria ourselves and took others from relevant publications. <ins style="font-weight: bold; text-decoration: none;">The existing publications formulated many </ins>of the criteria only qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible, and to come up with a systematic approach to comparison for the rest.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>From 14 selected criteria, <del style="font-weight: bold; text-decoration: none;">the </del>focus <del style="font-weight: bold; text-decoration: none;">will be </del>on data efficiency and uncertainty estimation<del style="font-weight: bold; text-decoration: none;">. These criteria </del>are essential for lowering the cost of dependency estimation<del style="font-weight: bold; text-decoration: none;">. The expected result of this bachelor's thesis is to identify an algorithm that fulfils all 14 criteria.</del></div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>From 14 selected criteria, <ins style="font-weight: bold; text-decoration: none;">we </ins>focus on <ins style="font-weight: bold; text-decoration: none;">criteria concerning </ins>data efficiency and uncertainty estimation<ins style="font-weight: bold; text-decoration: none;">, because they </ins>are essential for lowering the cost of dependency estimation<ins style="font-weight: bold; text-decoration: none;">, but </ins>we <ins style="font-weight: bold; text-decoration: none;">will also check other </ins>criteria <ins style="font-weight: bold; text-decoration: none;">relevant </ins>for the <ins style="font-weight: bold; text-decoration: none;">application of algorithms</ins>.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">In the comparison </del>we <del style="font-weight: bold; text-decoration: none;">include a qualitative analysis by checking general </del>criteria<del style="font-weight: bold; text-decoration: none;">, that increase the usability </del>for <del style="font-weight: bold; text-decoration: none;">non experts, such criteria are interpretability, and intuitiveness.</del></div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">As a result</ins>, <ins style="font-weight: bold; text-decoration: none;">we will rank </ins>the algorithms <ins style="font-weight: bold; text-decoration: none;">in the </ins>different <ins style="font-weight: bold; text-decoration: none;">aspects given by the criteria, and thereby identify potential for improvement </ins>of the <ins style="font-weight: bold; text-decoration: none;">current </ins>algorithms.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">We also analyse if </del>the <del style="font-weight: bold; text-decoration: none;">algorithm is an anytime algorithm and if it uses incremental computation to enable early stopping and increase data efficiency</del>.</div></td><td colspan="2" class="diff-side-added"></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">Another criterion is guided sampling</del>, <del style="font-weight: bold; text-decoration: none;">which can lead to more data efficiency.</del></div></td><td colspan="2" class="diff-side-added"></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">To apply </del>the algorithms <del style="font-weight: bold; text-decoration: none;">to </del>different <del style="font-weight: bold; text-decoration: none;">kinds </del>of <del style="font-weight: bold; text-decoration: none;">datasets, we also analyse if </del>the algorithms <del style="font-weight: bold; text-decoration: none;">are multivariate, general-purpose, and non-parametric</del>.</div></td><td colspan="2" class="diff-side-added"></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>We also conduct a quantitative analysis <del style="font-weight: bold; text-decoration: none;">of </del>the dependency estimation algorithms that performed well in the qualitative analysis <del style="font-weight: bold; text-decoration: none;">by experiment on well-established and representative datasets</del>.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">We do this in two steps. First, we check general criteria in a qualitative analysis: whether the algorithm is capable of guided sampling, whether it is an anytime algorithm, and whether it uses incremental computation to enable early stopping, all of which lead to more data efficiency.</ins></div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>We also conduct a quantitative analysis <ins style="font-weight: bold; text-decoration: none;">on well-established and representative datasets for </ins>the dependency estimation algorithms<ins style="font-weight: bold; text-decoration: none;">, </ins>that performed well in the qualitative analysis.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In these experiments we evaluate more criteria:</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In these experiments we evaluate more criteria:</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Robustness, which is necessary for error-prone data; efficiency, which saves computation time; convergence, which guarantees that we get an accurate estimation given enough data; and consistency, which ensures that we can rely on an estimation.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Robustness, which is necessary for error-prone data; efficiency, which saves computation time; convergence, which guarantees that we get an accurate estimation given enough data; and consistency, which ensures that we can rely on an estimation.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>}}</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>}}</div></td></tr>
</table>Fj1267 https://sdq.kastel.kit.edu/mediawiki-institutsseminar/index.php?title=Review_of_data_efficient_dependency_estimation&diff=2081&oldid=prev Fj1267: Page created: “{{Vortrag |vortragender=Maximilian Georg |email=maximilian.georg@student.kit.edu |vortragstyp=Proposal |betreuer=Bela Böhnke |termin=Institutsseminar/2022-02-…” 2022-02-16T11:12:28Z<p>Page created: “{{Vortrag |vortragender=Maximilian Georg |email=maximilian.georg@student.kit.edu |vortragstyp=Proposal |betreuer=Bela Böhnke |termin=Institutsseminar/2022-02-…”</p>
<p><b>New page</b></p><div>{{Vortrag<br />
|vortragender=Maximilian Georg<br />
|email=maximilian.georg@student.kit.edu<br />
|vortragstyp=Proposal<br />
|betreuer=Bela Böhnke<br />
|termin=Institutsseminar/2022-02-25 Zusatztermin<br />
|vortragsmodus=online<br />
|kurzfassung=The amount and complexity of data collected in the industry is increasing, and data analysis rises in importance.Dependency estimation is a significant part of knowledge discovery and allows strategic decisions based on this information.<br />
Several examples highlight the importance of dependency estimation. For instance, knowing that a correlation exists between the regular dose of a drug and the health of a patient helps to understand the impact of a newly manufactured drug.<br />
Knowing how the case material, brand, and condition of a watch influence the price on an online marketplace can help to buy watches at a good price.<br />
Material sciences can also use dependency estimation to predict many properties of a material before it is synthesized in the lab, so fewer experiments are necessary.<br />
<br />
Many dependency estimation algorithms perform poorly in a real-world setting because they do not consider multivariate dependencies. Multivariate dependencies are very common; they occur, for instance, in the material science example, where the properties of the synthesized material depend on many variables.<br />
Also, dependency estimation algorithms are often not robust against errors in the data. But data is error-prone: take, for instance, data about the health of a patient in a clinical study, which is hard to measure accurately.<br />
Many dependency estimation algorithms require a large amount of data for a good estimation. But data can be expensive: for example, experiments in material sciences consume material and take time and energy.<br />
As we have the challenge of expensive data collection, algorithms need to be data efficient. But there is a trade-off between the amount of data and the quality of the estimation. With a lack of data comes an uncertainty of the estimation. However, the algorithms do not always quantify this uncertainty. As a result, we do not know if we can rely on the estimation or if we need more data for an accurate estimation.<br />
Furthermore, many algorithms are too complex to be used by a non-expert. The parameters of an algorithm need to be intuitive to use, and the result should be interpretable. Only then can people outside of academia apply the algorithm without mistakes.<br />
<br />
In this bachelor's thesis we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing the above-mentioned challenges. We developed some of the criteria ourselves and took others from relevant publications. Many of the existing criteria were only formulated qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible, and to come up with a systematic approach to comparison for the rest.<br />
<br />
From 14 selected criteria, the focus will be on data efficiency and uncertainty estimation. These criteria are essential for lowering the cost of dependency estimation. The expected result of this bachelor's thesis is to identify an algorithm that fulfils all 14 criteria.<br />
In the comparison, we include a qualitative analysis by checking general criteria that increase the usability for non-experts, such as interpretability and intuitiveness.<br />
We also analyse if the algorithm is an anytime algorithm and if it uses incremental computation to enable early stopping and increase data efficiency.<br />
Another criterion is guided sampling, which can lead to more data efficiency.<br />
To apply the algorithms to different kinds of datasets, we also analyse if the algorithms are multivariate, general-purpose, and non-parametric.<br />
<br />
We also conduct a quantitative analysis of the dependency estimation algorithms that performed well in the qualitative analysis by experiment on well-established and representative datasets.<br />
In these experiments we evaluate more criteria:<br />
Robustness, which is necessary for error-prone data; efficiency, which saves computation time; convergence, which guarantees that we get an accurate estimation given enough data; and consistency, which ensures that we can rely on an estimation.<br />
}}</div>Fj1267