Search by property

This page provides a simple search interface for finding entities that have a property with a given value. Other available search interfaces are the property search and the query builder.

A list of all pages that have the property "Kurzfassung" with the value "Kurzfassung". Because only a few results were found, similar values are listed as well.

Showing 183 results, starting with number 1.

List of results

    • Exploring The Robustness Of The Natural Language Inference Capabilties Of T5  + (Large language models like T5 perform excellently on various NLI benchmarks. However, it has been shown that even small changes in the structure of these tasks can significantly reduce accuracy. I build upon this insight and explore how robust the NLI skills of T5 are in three scenarios. First, I show that T5 is robust to some variations in the MNLI pattern, while others degrade performance significantly. Second, I observe that some other patterns that T5 was trained on can be substituted for the MNLI pattern and still achieve good results. Third, I demonstrate that the MNLI pattern translates well to other NLI datasets, even improving accuracy by 13% in the case of RTE. All things considered, I conclude that the robustness of the NLI skills of T5 depends on which alterations are applied.)
    • Theory-Guided Data Science for Lithium-Ion Battery Modeling  + (Lithium-ion batteries are driving innovation in the evolution of electromobility and renewable energy. These complex, dynamic systems require reliable and accurate monitoring through Battery Management Systems to ensure the safety and longevity of battery cells. Therefore, an accurate prediction of the battery voltage is essential, which is currently realized by so-called Equivalent Circuit (EC) Models. Although state-of-the-art approaches deliver good results, they are hard to train due to the high number of variables, lack the ability to generalize, and need to make many simplifying assumptions. In contrast to theory-based models, purely data-driven approaches require large datasets and are often unable to produce physically consistent results. Theory-Guided Data Science (TGDS) aims at using scientific knowledge to improve the effectiveness of Data Science models in scientific discovery. This concept has been very successful in several domains including climate science and material research. Our work is the first one to apply TGDS to battery systems by working together closely with domain experts. We compare the performance of different TGDS approaches against each other as well as against the two baselines using only theory-based EC-Models and black-box Machine Learning models.)
    • Attention Based Selection of Log Templates for Automatic Log Analysis  + (Log analysis serves as a crucial preprocessing step in text log data analysis, including anomaly detection in cloud system monitoring. However, selecting an optimal log parsing algorithm tailored to a specific task remains problematic. With many algorithms to choose from, each requiring proper parameterization, making an informed decision becomes difficult. Moreover, the selected algorithm is typically applied uniformly across the entire dataset, regardless of the specific data analysis task, often leading to suboptimal results. In this thesis, we evaluate a novel attention-based method for automating the selection of log parsing algorithms, aiming to improve data analysis outcomes. We build on the success of a recent Master Thesis, which introduced this attention-based method and demonstrated its promising results for a specific log parsing algorithm and dataset. The primary objective of our work is to evaluate the effectiveness of this approach across different algorithms and datasets.)
    • Metamodel Evolution in the Context of a MOF-Based Metamodeling Infrastructure  + (Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.)
    • Evaluation of Automated Feature Generation Methods  + (Manual feature engineering is a time-consuming and costly activity when developing new Machine Learning applications, as it involves manual labor of a domain expert. Therefore, efforts have been made to automate the feature generation process. However, there exists no large benchmark of these Automated Feature Generation methods. It is therefore not obvious which method performs well in combination with specific Machine Learning models and what the strengths and weaknesses of these methods are. In this thesis we present an evaluation framework for Automated Feature Generation methods that is integrated into the scikit-learn framework for Python. We integrate nine Automated Feature Generation methods into this framework. We further evaluate the methods on 91 datasets for classification problems. The datasets in our evaluation have up to 58 features and 12,958 observations. As Machine Learning models we investigate five models including state-of-the-art models like XGBoost.)
    • Surrogate Model Based Process Parameters Optimization of Textile Forming  + (Manufacturing optimization is crucial for organizations to remain competitive in the market. However, complex processes, such as textile forming, can be challenging to optimize, requiring significant resources. Surrogate-based optimization is an efficient method that uses simplified models to guide the search for optimal parameter combinations of manufacturing processes. Moreover, incorporating uncertainty estimates into the model can further speed up the optimization process, which can be achieved by using Bayesian deep neural networks. Additionally, convolutional neural networks can take advantage of spatial information in the images that are part of the textile forming parameters. In this work, a Bayesian deep convolutional surrogate model is proposed that uses all available process parameters to predict the shear angle of a textile element. By incorporating background information into the surrogate model, it is expected to predict detailed process results, leading to greater efficiency and increased product quality.)
    • Streaming Model Analysis - Synergies from Stream Processing and Incremental Model Analysis  + (Many modern applications take a potentially infinite stream of events as input to interpret and process the data. The established approach to handle such tasks is called Event Stream Processing. The underlying technologies are designed to process this stream efficiently, but applications based on this approach can become hard to maintain as the application grows. A model-driven approach can help to manage increasing complexity and changing requirements. This thesis examines how a combination of Event Stream Processing and Model-Driven Engineering can be used to handle an incoming stream of events. An architecture that combines these two technologies is proposed and two case studies have been performed. The DEBS grand challenges from 2015 and 2016 have been used to evaluate applications based on the proposed architecture with respect to their performance, scalability and maintainability. The results showed that they can be adapted to a variety of change scenarios with an acceptable cost, but that their processing speed is not competitive.)
    • Empirical Identification of Performance Influences of Configuration Options in High-Performance Applications  + (Many modern high-performance applications are highly configurable software systems that provide hundreds or even thousands of configuration options. System administrators or application users need to understand all these options and their impacts on the software performance to choose suitable configuration values. To understand the influence of configuration options on the run-time characteristics of a software system, users can use performance prediction models, but building performance prediction models for highly configurable high-performance applications is expensive. However, not all configuration options that a software system offers are performance-relevant. Removing these performance-irrelevant configuration options from the modeling process can reduce the construction cost. In this thesis, we explore and analyze two different approaches to empirically identify configuration options that are not performance-relevant and can be removed from the performance prediction model. The first approach reuses existing performance modeling methods to create much cheaper prediction models by using fewer samples and then analyzing the models to identify performance-irrelevant configuration options. The second approach uses white-box knowledge acquired through dynamic taint analysis to systematically construct the minimal number of required experiments to detect performance-irrelevant configuration options. In the evaluation with a case study, we show that the first approach identifies performance-irrelevant configuration options but also produces misclassifications. The second approach did not perform to our expectations. Further improvement is necessary.)
    • Enabling the Information Transfer between Architecture and Source Code for Security Analysis  + (Many software systems have to be designed and developed in a way that specific security requirements are guaranteed. Security can be specified on different views of the software system that contain different kinds of information about the software system. Therefore, a security analysis on one view must assume security properties of other views. A security analysis on another view can be used to verify these assumptions. We provide an approach for enabling the information transfer between a static architecture analysis and a static, lattice-based source code analysis. This approach can be used to reduce the assumptions in a component-based architecture model. In this approach, requirements under which information can be transferred between the two security analyses are provided. We consider the architecture and source code security analysis as black boxes. Therefore, the information transfer between the security analyses is based on a megamodel consisting of the architecture model, the source code model, and the source code analysis results. The feasibility of this approach is evaluated in a case study using Java Object-sensitive ANAlysis and Confidentiality4CBSE. The evaluation shows that information can be transferred between an architecture and a source code analysis. The information transfer reveals new security violations which are not found using only one security analysis.)
    • Auswirkungen von Metamodellen auf Modellanalysen  + (Metamodels are the central artifact in model-driven software development. Although many quality attributes and evaluation mechanisms for metamodels are known, it has not yet been empirically investigated what effects metamodels have on other artifacts. This thesis deals with the impact of metamodels on other artifacts of software development. More precisely, it investigates to what extent the quality attributes of metamodels influence model analyses and model transformations. For this purpose, various artifacts are analyzed: the results of metamodel metrics, code metrics of model analyses and ATL transformations, and manual assessments of metamodels. The data are analyzed, correlations are determined, and dependencies are uncovered.)
    • Enabling Architectural Performability Analyses for Microservices via Design Pattern Completions  + (Microservices architectures have gained popularity in recent years, especially since global players in the internet economy changed to this architectural style. Many architectural patterns for recurring problems were identified, such as the Service Discovery for service registration or Client-side Load Balancing for load distribution. Architectural analyses with the Palladio framework allow for the investigation of the attainment of these requirements during design time. The Architectural Templates method combines architecture models with architectural patterns and styles and allows for design-time analyses. In this thesis, we create a Microservices Architectural Templates catalog, containing microservices Architectural Templates. A selection of widely used patterns is analyzed and conceptually mapped to the Architectural Templates method. A case study, conducted with a sample application representing a customer relationship management application, shows that software architects can profit from the provided templates by automatic model completions and accurate analysis results.)
    • Differentially Private Event Sequences over Infinite Streams  + (Data streams recorded by smart meters pose a threat to privacy, so there is a need for privacy-preserving methods. The current state of the art for data streams is w-event differential privacy. So far, it has mainly been used for publishing histogram queries. The goal of this thesis is an in-depth experimental analysis of the mechanisms, with a focus on assessing how well they are suited for publishing sum queries, as needed in the smart meter scenario. The thesis consists of three parts: (1) reproducing the good results reported in the literature for the most important w-event DP mechanisms on histogram queries, (2) evaluating their quality when applied to smart meter data (sum queries), and (3) evaluating the quality of two mechanisms with respect to guaranteeing pan-privacy, an extended guarantee. While we were largely unable to reproduce the results in (1), we achieved good results in (2). Regarding (3), we were able to confirm the theoretical quality analysis from the literature.)
    • Modellierung und Simulation von dynamischen Container-basierten Software-Architekturen in Palladio  + (The Palladio Component Model (PCM) can be used to model and simulate software systems. Modern distributed software systems, however, are no longer simply deployed statically; instead, a desired state is defined, which a control loop is then supposed to maintain. This is done, for example, by starting or stopping containers and pods. In this thesis, an extension of the PCM with the concepts of container orchestration tools such as Kubernetes was developed and implemented. In addition, a concept for simulating dynamic container-based systems was developed. In particular, the allocation and reallocation of pods at simulation time was considered. Finally, the model extension was evaluated.)
    • Tradeoff zwischen Privacy und Utility für Short Term Load Forecasting  + (The introduction of smart meters comes with various advantages and disadvantages. On the one hand, smart meters offer new possibilities for predicting energy consumption more accurately (forecasting) and thus make the smart grid easier to plan. On the other hand, a great deal of private information can be extracted from energy consumption data, which implies new potential attack vectors on the privacy of end consumers. In the literature, privacy protection is implemented through various perturbation methods. However, since perturbation alters the data, it leads to less accurate forecasts. A tradeoff therefore has to be found. In this thesis, several given perturbation techniques are compared experimentally with respect to their privacy (protection of privacy) and utility (accuracy of the forecasts). For this purpose, various datasets, forecasting algorithms, and metrics for assessing privacy and utility are used. The thesis concludes that the so-called Denoise and WeakPeak technique is particularly well suited for tuning a tradeoff between privacy and utility.)
    • Einbindung eines EDA-Programms zur Erstellung elektronischer Leiterplatten in das Vitruvius-Framework  + (With model-driven software development, a software system, or its parts and abstractions, can be described by models during the development process. These models can depend on one another and contain redundant information. To avoid inconsistencies, tools for automated consistency preservation are used. In this thesis, the EDA program Eagle, which is used to create electronic schematics and printed circuit boards, is integrated into the Vitruvius framework. This comprises deriving an Ecore metamodel that describes Eagle's schematic file, establishing transformations between Ecore models and schematic files, and extracting changes between two chronologically consecutive schematic files. The extracted changes are fed into the Vitruvius framework, which propagates them to the Ecore models that stand in a consistency relation. In addition, a procedure is used to assign changes in the schematic file to a unique electronic component. This is necessary to track components across other programs, since the properties of a component can vary between programs.)
    • Automated Extraction of Stateful Power Models for Cyber Foraging Systems  + (Mobile devices are strongly resource-constrained in terms of computing and battery capacity. Cyber-foraging systems circumvent these constraints by offloading a task to a more powerful system in close proximity. Offloading itself induces additional workload and thus additional power consumption on the mobile device. Therefore, offloading systems must decide whether to offload or to execute locally. Power models, which estimate the power consumption for a given workload, can be helpful to make an informed decision. Recent research has shown that various hardware components such as wireless network interface cards (WNIC), cellular network interface cards or GPS modules have power states, that is, the power consumption behavior of a hardware component depends on the current state. Power models that consider power states (stateful power models) can be modeled as Power State Machines (PSM). For systems with multiple power states, stateful models proved to be more accurate than models that do not consider power states (stateless models). Manually generating PSMs is time-consuming and limits the practicability of PSMs. Therefore, in this thesis, we explore the possibility of automatically generating PSMs. The contribution of this thesis is twofold: (1) we introduce an automated measurement-based profiling approach, and (2) we introduce a step-based approach, which, provided with profiling data, automatically extracts PSMs along with tail states and state transitions. We evaluate the automated PSM extraction in a case study on an offloading speech recognition system. We compare the power consumption prediction accuracy of the generated PSM with the prediction accuracy of a stateless regression-based model. Because we measure the power consumption of the whole system, we use the same CPU power model alongside all WiFi power models in order to predict the power consumption of the whole system. We find that a slightly adapted version of the generated PSM predicts the power consumption with a mean error of approx. 3% and an error of approx. 2% in the best case. In contrast, the regression model produces a mean error of approx. 19% and an error of approx. 18% in the best case.)
    • Inkrementelle Modellreduktion zur Verkürzung der Testzyklen in der Transformationsentwicklung  + (Model-driven software development (MDD) is a software development paradigm in which the model plays a central role. In MDD, the problem domain is described abstractly and representatively by the model. In the course of development, the model is refined step by step through model transformations and finally converted into program code. The larger and more complex the problem domain, the greater the number of model elements and the more complex the relationships between them. For this reason, transforming such a large model is time-consuming and error-prone. During development, tests are run repeatedly to ensure the correctness of the model and the transformation. The large number of elements in the model slows down testing and makes it harder to find the cause of errors in the model and in the transformation. This bachelor's thesis therefore investigated whether a slice of the model exists that has the following properties: the slice contains only parts of the original model, and all errors of the complete model can be represented with it. The cause and correction of the faulty model and the faulty transformation are not examined in this thesis. The thesis focuses on creating and examining this slice of the model.)
    • Anytime Tradeoff Strategies with Multiple Targets  + (Modern applications typically need to find solutions to complex problems under limited time and resources. In settings in which the exact computation of indicators can either be infeasible or economically undesirable, the use of "anytime" algorithms, which can return approximate results when interrupted, is particularly beneficial, since they offer a natural way to trade computational power for result accuracy. However, modern systems typically need to solve multiple problems simultaneously. For example, in order to find high correlations in a dataset, one needs to examine each pair of variables. This is challenging, in particular if the number of variables is large and the data evolves dynamically. This thesis focuses on the following question: How should one distribute resources at any time in order to maximize the overall quality of multiple targets? First, we define the problem, considering various notions of quality and user requirements. Second, we propose a set of strategies to tackle this problem. Finally, we evaluate our strategies via extensive experiments.)
    • Outlier Analysis in Live Systems from Application Logs  + (Modern computer applications tend to generate massive amounts of logs and have become so complex that it is often difficult to explain why applications failed. Locating outliers in application logs can help explain application failures. Outlier detection in application logs is challenging because (1) the log is unstructured streaming text data and (2) labeling application logs is labor-intensive and inefficient. Logs are similar to natural languages. The Transformer neural network, a recent deep learning algorithm, has shown outstanding performance in Natural Language Processing (NLP) tasks. Based on this, we adapt the Transformer neural network to detect outliers in application logs in an unsupervised way. We compared our algorithm against state-of-the-art log outlier detection algorithms on three widely used benchmark datasets. Our algorithm outperformed state-of-the-art log outlier detection algorithms.)
    • Subspace Search in Data Streams  + (Modern data mining often takes place on high-dimensional data streams, which evolve at a very fast pace: On the one hand, the "curse of dimensionality" leads to a sparsely populated feature space, for which classical statistical methods perform poorly. Patterns, such as clusters or outliers, often hide in a few low-dimensional subspaces. On the other hand, data streams are non-stationary and virtually unbounded. Hence, algorithms operating on data streams must work incrementally and take concept drift into account. While "high-dimensionality" and the "streaming setting" provide two unique sets of challenges, we observe that the existing mining algorithms only address them separately. Thus, our plan is to propose a novel algorithm, which keeps track of the subspaces of interest in high-dimensional data streams over time. We quantify the relevance of subspaces via a so-called "contrast" measure, which we are able to maintain incrementally in an efficient way. Furthermore, we propose a set of heuristics to adapt the search for the relevant subspaces as the data and the underlying distribution evolve. We show that our approach is beneficial as a feature selection method and as such can be applied to extend a range of knowledge discovery tasks, e.g., "outlier detection", in high-dimensional data streams.)
    • Bewertung verschiedener Parallelisierungsstrategien im Hinblick auf Leistungsfähigkeit von paralleler Programmausführung  + (Modern processors achieve performance gains by adding multiple cores. As a result, software development must take care to parallelize program flows. Factors that can influence the performance of parallel program execution have already been categorized. The influence of the chosen parallelization strategy, however, is unknown. This bachelor's thesis investigated the influence of the chosen parallelization strategy on software performance. To this end, different hardware demands were used to generate individual work packages, which were then executed with different parallelization strategies. The strategies used are Java Threads, Java ParallelStreams, OpenMp, and Akka Actor. For each execution, the runtime and cache behavior were measured. In addition, the experiments were carried out on several dedicated servers and on the BwUniCluster. The evaluation used speedup curves and the cache miss rate. The results show that, for the work packages used, the parallelization strategies differ only slightly.)
    • Integrating Architecture-based Confidentiality Analysis with Code-based Information Flow Analysis  + (Modern software systems must meet a variety of security requirements. These requirements appear to become stricter over time. Today, a software system that fails to meet confidentiality requirements often leads to the unintended disclosure of sensitive data. This is often associated with financial costs, since the GDPR has introduced and increased fines, but it can also damage a company's reputation and lead to customer loss. Many security vulnerabilities can arise from discrepancies between the architectural design and the implementation of the code. For this reason, this thesis investigates the integration of a static, architecture-based confidentiality analysis with a static, code-based information flow analysis. By combining these two analyses, we aim to show that we can identify a discrepancy between design and implementation. The approach chosen in this thesis treats the architectural design as the intended behavior of the system. The artifacts required to run a code-based analysis are generated, and it is checked whether the properties defined on the architecture hold for the implementation. We evaluated the feasibility of the approach in a small study. In summary, this thesis aims to bridge the gap between the architectural view and the code view by linking confidentiality properties in both.)
    • Rekonstruktion von Komponentenmodellen für Qualitätsvorhersagen auf der Grundlage heterogener Artefakte in der Softwareentwicklung  + (Modern software systems are often no longer built as monolithic applications. Distributed architectures are on trend. The use of technologies such as Docker and Spring introduces additional configuration files alongside the source code. This makes it harder to reconstruct the software architecture from the source code alone. At the beginning of this thesis, several scientific works on the reconstruction of software architectures were examined. However, no work could be found that both supports heterogeneous software artifacts and generates a model suitable for quality prediction. This thesis therefore presents a new approach that takes multiple heterogeneous software artifacts into account when reconstructing an architecture model. More precisely, the approach is implemented as a prototype for the artifacts Java source code, Dockerfiles, Docker Compose files, and Spring configuration files. The target model is the Palladio Component Model, which is suitable for analyses and simulations with respect to performance and reliability. The thesis examines in more detail how the information from the artifacts can be merged. The approach first transforms the artifacts into models. Two different procedures are considered for these transformations. First, Java source code is transferred into an existing metamodel using JDT. For the remaining artifacts, an Xtext grammar is proposed that can generate a suitable metamodel. The architecture of the approach was also designed so that the supported artifacts can easily be adapted or extended. Finally, the prototypical implementation is described and evaluated. For this, two case studies were selected, and the architecture model of the projects was extracted using the prototype. The results were then examined using previously defined metrics. This showed that the approach works and that the heterogeneous artifacts add value to the reconstruction of the architecture model.)
    • Monitoring Complex Systems with Domain Knowledge: Adapting Contextual Bandits to Tracing Data  + (Monitoring in complex computing systems is crucial to detect malicious states or errors in program execution. Due to the computational complexity, it is not feasible to monitor all data streams in practice. We are interested in monitoring pairs of highly correlated data streams. However, we cannot compute the measure of correlation for every pair of data streams at each timestep. Picking highly correlated pairs while exploring potentially more highly correlated ones is an instance of the exploration/exploitation problem. Bandit algorithms are a family of online learning algorithms that aim to optimize sequential decision making and balance exploration and exploitation. A contextual bandit additionally uses contextual information to make better decisions. In our work, we want to use a contextual bandit algorithm to keep an overview of highly correlated pairs of data streams. The context in our work contains information about the state of the system, given as execution traces. A key part of our work is to explore and evaluate different representations of the knowledge encapsulated in traces. We also adapt state-of-the-art contextual bandit algorithms to the use case of correlation monitoring.)
    • Integrating Structured Background Information into Time-Series Data Monitoring of Complex Systems  + (Monitoring of time series data is increasingly important due to the massive data generated by complex systems, such as industrial production lines, meteorological sensor networks, or cloud computing centers. Typical time series monitoring tasks include future value forecasting, outlier detection, and computing dependencies. However, existing methods for time series monitoring tend to ignore background information, such as relationships between components or process structure, that is available for almost any complex system. Such background information gives context to the time series data and can potentially improve the performance of time series monitoring tasks. In this bachelor thesis, we show how to incorporate structured background information to improve three different time series monitoring tasks. We perform the experiments on data from a cloud computing center, where we extract background information from system traces. Additionally, we investigate different representations and quality of background information and conclude that its usefulness is independent of the concrete time series monitoring task.)
    • Pattern Matching for Microservices in a Container-Based Architecture  + (Multiple containers, as packages of software code, can interact with each other in a network and together form a container-based architecture. Large architectures are hard to understand without any knowledge about the application or the underlying technologies. Therefore, this master thesis uses design pattern detection to reduce the complexity of an architecture representation by breaking it down into multiple smaller pattern instances. A user who knows the general patterns in advance can thus understand the depicted pattern instances in a short period of time.)
    • Studienplanung mit Hilfe von Workflow-Verifikation: Fokus Dozentensicht  + (Following the development of an information system in a student team project at the chair "Systeme der Informationsverwaltung" that supports students in planning their studies, this system is to be extended so that it can also support lecturers in scheduling their courses within the course offerings of the respective module handbook. In this thesis, a requirements analysis was carried out and a concept was developed for extending the existing system. The chair already has extensive experience in data-driven verification of process flows using Petri nets. Since a study plan, as a sequence of its courses, can however be modeled as a process with involved data, this thesis examined and combined verification methods to enable data-value-based verification of Petri net models. Based on the results, tests were conducted to investigate to what extent such verification methods can check study plans for correctness. The tests and investigations showed that verification methods for Petri nets can be used to support such a system under certain restrictions.)
    • Modellierung und Simulation von verteilter und wiederverwendbarer nachrichtenbasierter Middleware  + (Message-oriented middleware (MOM) is used in various domains. There is a multitude of different MOMs, each with different goals or priorities. While some place particular emphasis on performance or availability, others aim to be usable in all settings. In addition, MOMs offer a high degree of configurability. The goal of this master's thesis is to support software architects in choosing and configuring a MOM as early as the design phase. Existing modeling and prediction techniques neglect the influence of queues. As a result, certain effects of the MOM cannot be represented, for example the increase in a message's latency when the queue is full. The contributions of the master's thesis are: selecting and measuring a MOM to investigate effects and resource demands; performance modeling of a MOM with queues, followed by calibration; and a model transformation to reuse already existing model elements. The approach was evaluated using the SPECjms2007 benchmark.)
    • Automatisierte Gewinnung von Nachverfolgbarkeitsverbindungen zwischen Softwarearchitektur und Quelltext  + (Traceability links between architecture and source code can extend the knowledge about a system. Because of the effort needed to create them, software projects often have no or only incomplete traceability information. This thesis investigates a two-step approach for automatically generating traceability links between architecture model elements and source code. To unify the creation of traceability links across different programming languages and architecture metamodels, the first step creates models from the available artifacts. The source code is transferred into a model that is independent of the concrete programming language. For this, a metamodel based on the KDM specified by the OMG is used. For the second step, heuristics and aggregations that operate on the created models are defined. These are used to generate the traceability links. The heuristics use, for example, package, path, name, and method information. The evaluation of the approach uses a gold standard created for this purpose with five case studies. Traceability links are generated for PCM, UML, Java, and Shell. The micro-average of the F1 score reaches 99.11%. If every component and interface contributes equally to the value, the F1 score is 93.71%. Overall, the approach of this thesis can therefore achieve very good results. For the TEAMMATES case study, the influence of consistency on the results is investigated using several source code versions. The micro-average of the F1 score is 6.05 percentage points higher for the more consistent version. Consistency can therefore affect the quality of the results.)
    • Entity Recognition in Software Documentation Using Trace Links to Informal Diagrams  + (Natural Language Software Architecture Documentation (NLSAD) and Software Architecture Models (SAMs) provide information about a software system's design and qualities. Inconsistencies between these artifacts can negatively impact the comprehension and evolution of the system. ArDoCo is an approach proposed in prior work by Keim et al. to find such inconsistencies; it relies on Traceability Link Recovery (TLR) between entities in the NLSAD and SAM. ArDoCo searches for Unmentioned Model Elements (UMEs) in the model and Missing Model Elements (MMEs) in the text using the linkage information. ArDoCo's approach shows promising results but has room for improvement regarding precision due to falsely identified textual entities. This work proposes using informal diagrams from the Software Architecture Documentation (SAD) to improve this. The approach performs an additional TLR between the textual entities and the diagram entities. Following heuristics, the linkage of textual entities and diagram entities is used to increase or decrease the confidence in textual entities. The Diagram Text TLR and its impact on ArDoCo's performance are evaluated separately using the same data set as previous work by Keim et al., extended to include informal diagrams. With Optical Character Recognition (OCR), the Diagram Text TLR achieves a good F1-score of 0.54. The approach improves the MME detection (0.77→0.94 accuracy) by lowering the number of falsely identified textual entities (0.39→0.69 precision) with a negligible impact on recall. The UME detection and ArDoCo's NLSAD to SAM TLR are slightly positively impacted and continue to perform excellently. The results show that using informal diagrams to improve entity recognition in the text is promising. Room for improvement exists in dealing with issues related to OCR and diagram element processing.)
    • Bestimmung von Aktionsidentität in gesprochener Sprache  + (Natural language contains actions that can be executed. Within a discourse, people often describe an action several times. This does not always mean that the action should also be executed several times. This bachelor's thesis investigates how to recognize whether a mention of an action refers to an action that has already been mentioned. A procedure is developed that determines whether multiple action mentions in spoken language refer to the same action identity. In this procedure, actions are compared pairwise. The procedure is implemented and evaluated as an agent for the framework architecture PARSE. The tool achieves an F1 score of 0.8 when the actions are recognized correctly and information about coreference between entities is available.)
    • Performanzmodellierung von Apache Cassandra im Palladio-Komponentenmodell  + (NoSQL database management systems are used as back ends for software in the big data domain because, compared with relational database management systems, they scale better, do not require a fixed database schema, and can easily be deployed in virtual systems. Apache Cassandra was chosen as an example of NoSQL database management systems because of its widespread use and its licensing as an open-source project. Existing models of Apache Cassandra consider only the maximum possible number of requests to Cassandra and their throughput and latency. Reducing this number increases the latency of the individual requests. The model created in this bachelor's thesis is intended, among other things, to capture this effect. The contributions of the thesis are creating and parameterizing a model of Cassandra in the Palladio Component Model and evaluating the model against benchmark results. In addition, a procedure is developed for this goal that structures the collection of the necessary data as well as its analysis and evaluation, and automates and simplifies it as far as possible. The model is evaluated through automated simulations whose results are compared with the benchmarks. This demonstrated the applicability of the model for one thread and an arbitrary number of requests with simultaneous use of one or several different operations, apart from the scan operation.)
    • Analysis of Classifier Performance on Aggregated Energy Status Data  + (Non-intrusive load monitoring (NILM) algorithms aim at disaggregating consumption curves of households to the level of single appliances. However, there is no conventional way of quantifying and representing the tradeoff between the quality of analyses, such as the accuracy of the disaggregated consumption curves, and the load on the available computing resources. Thus, it is hard to plan the underlying infrastructure and resources for the analysis system and to find the optimal configuration of the system. This thesis introduces a system that assesses the quality of different analyses and their runtime behavior. This assessment is done based on varying configuration parameters and changed characteristics of the input dataset. Varied characteristics are the granularity of the data and the noisiness of the data. We demonstrate that the collected runtime behavior data can be used to choose reasonable characteristics of the input data set.)
    • Performancevorhersage für Container-Anwendungen (PdF)  + (Nowadays, distributed applications are often not statically deployed on virtual machines. Instead, a desired state is defined declaratively, and a control loop then tries to create the desired state in the cluster. Predicting the impact on the performance of a system using these deployment techniques is difficult. This paper introduces a method to predict the performance impact of using containers and container orchestration in the deployment of a system. Our proposed approach enables system simulation and experimentation with various mechanisms of container orchestration, including autoscaling and container scheduling. We validated this approach using a microservice reference application across different scenarios. Our findings suggest that the simulation could effectively mimic most features of container orchestration tools, and that the performance prediction of containerized applications in dynamic scenarios could be improved significantly.)
    • Enabling Consistency between Software Artefacts for Software Adaption and Evolution  + (Nowadays, software systems are evolving at a pace never seen before. As a result, emerging inconsistencies between different software artifacts are almost inevitable. There are already approaches for automated consistency maintenance between source code and architecture models. However, these approaches have various limitations. Therefore, in this thesis, we present a comprehensive approach for supporting consistency preservation between software artifacts, with a special focus on software evolution and adaptation. At design time, source code analysis and consistency rules are used, while at run time, monitoring data is used as input for a transformation pipeline. In contrast to existing approaches, the automated derivation of the system composition is supported. Finally, self-validations were included as a central component of the approach. In a case-study-based evaluation, the accuracy of the models and the performance of the approach were measured. In addition, the scalability of the transformations within the pipeline was investigated.)
    • Injection Molding Simulation based on Graph Neural Networks (GNNs)  + (Numerical filling simulations are an important tool for the development of injection molding parts. Existing simulations rely on numerical solvers based on the finite element method. These solvers are reliable and precise, but very computationally expensive even on simple part geometries. In this thesis, we aim to develop a faster injection molding simulation based on Graph Neural Networks (GNNs) as a surrogate model. Our approach learns a simulation as a composition of three functions: an encoder, a processor, and a decoder. The encoder takes in a graph representation of a 3D geometry of an injection molding part and returns a numeric embedding of each node in the graph. The processor updates the embedding of each node multiple times based on its neighbors. The decoder then decodes the final embedding of each node into physically meaningful variables, for example the fill state of the node. Our model can predict the progression of the flow front during a time step with a fixed size. To simulate a full mold filling process, our model is applied sequentially until the entire mold is filled. Our architecture is applicable to any kind of material, geometry, and injection process parameters. We evaluate our architecture by its accuracy and runtime when predicting node properties. We also evaluate our model's transfer learning ability on a real-world injection molding part.)
    • Optimizing Parametric Dependencies for Incremental Performance Model Extraction  + (During the development phase of a software system, engineers often face different implementation alternatives. To test several options without investing the resources to implement each one of them, a so-called performance model is used in practice. By using a performance model, the developers can simulate the system in diverse scenarios and conditions. To minimize the differences between the real system and its model, i.e., to improve the accuracy of the model, parametric dependencies are introduced. They express a relation between the input arguments and the performance model parameters of the system. The latter could be loop iteration counts, branch transition probabilities, resource demands, or external service call arguments. Existing works in this field have two major shortcomings: they either do not perform incremental calibration of the performance model (updating only the parts of the source code changed since the last commit), or do not consider dependencies more complex than linear ones. This work is part of the approach for the continuous integration of performance models. Our aim is to identify parametric dependencies for external service calls, as well as to optimize the existing dependencies for the other types of performance model parameters. We propose using two machine learning algorithms for detecting initial dependencies and then refining the mathematical expressions with a genetic programming algorithm. Our contribution also includes feature selection of the candidates for a dependency and consideration not only of input service arguments but also of the data flow, i.e., the return values of previous external calls.)
    • Automatically detecting Performance Regressions  + (One of the most important aspects of software engineering is system performance. Common approaches to verify acceptable performance include running load tests on deployed software. However, complicated workflows and requirements, such as the necessity of deployments and extensive manual analysis of load test results, cause tests to be performed very late in the development process, so that feedback on potential performance regressions becomes available long after they were introduced. With this thesis, we propose PeReDeS, an approach that integrates into the development cycle of modern software projects and explicitly models an automated performance regression detection system that provides feedback quickly and reduces manual effort for setup and load test analysis. PeReDeS is embedded into pipelines for continuous integration, manages the load test execution and lifecycle, processes load test results, and makes feedback available to the authoring developer via reports on the coding platform. We further propose a method for detecting deviations in performance on load test results, based on Welch's t-test. The method is adapted to suit the context of performance regression detection and is integrated into the PeReDeS detection pipeline. We implemented our approach and assessed the usability and accuracy of our method with a user study and a data-driven study.)
    • Evaluating architecture-based performance prediction for MPI-based systems  + (One research field of High Performance Computing (HPC) is computing clusters. Computing clusters are distributed memory systems where different machines are connected through a network. To communicate with each other, the machines need the ability to pass messages to each other through the network. The Message Passing Interface (MPI) is the standard for implementing parallel systems on distributed memory systems. Several approaches have been proposed to enable software architects to predict the performance of MPI-based systems. However, those approaches either depend on an existing implementation of a program or are tailored to specific programming languages or use cases. In our approach, we use the Palladio Component Model (PCM), which allows us to model component-based architectures and to predict the performance of the modeled system. We modeled different MPI functions in the PCM that serve as reusable patterns, as well as a communicator that is required for the MPI functions. The expected benefit is to provide patterns for different MPI functions that allow precise modeling of MPI-based systems in the PCM and thereby a precise performance prediction of a PCM instance.)
    • Batch query strategies for one-class active learning  + (One-class classifiers learn to distinguish normal objects from outliers. These classifiers are therefore suitable for strongly imbalanced class distributions with only a small fraction of outliers. Extensions of one-class classifiers make use of labeled samples to improve classification quality. As this labeling process is often time-consuming, one may use active learning methods to detect samples where obtaining a label from the user is worthwhile, with the goal of reducing the labeling effort to a fraction of the original data set. In the case of one-class active learning, this labeling process consists of sequential queries, where the user labels one sample at a time. While batch queries, where the user labels multiple samples at a time, have potential advantages, for example parallelizing the labeling process, their application has so far been limited to binary and multi-class classification. In this thesis, we explore whether batch queries can be used for one-class classification. We strive towards a novel batch query strategy for one-class classification by applying concepts from multi-class classification to the requirements of one-class active learning.)
    • Performance Modeling of Distributed Computing  + (Optimizing resource allocation in distributed computing systems is crucial for enhancing system efficiency and reliability. Predicting job execution metadata, based on resource demands and platform characteristics, plays a key role in this optimization process. Distributed computing simulators are used for this purpose to model and predict system behaviors. Among the various simulators developed in recent decades, this thesis focuses on the state-of-the-art simulator DCSim. DCSim simulates the nodes and links of the configured platform, generates the workloads according to configured parameter distributions, and performs the simulations. The simulated job execution metadata is accurate, yet the simulations demand computational resources and time that increase superlinearly with the number of nodes simulated. In this thesis, we explore the application of Recurrent Neural Networks and Transformer models for predicting job execution metadata within distributed computing environments. We focus on data preparation, model training, and evaluation for handling numerical sequences of varying lengths. This approach enhances the scalability of predictive systems by using deep neural networks to interpret and forecast job execution metadata based on simulated or historical data. We assess the models across four scenarios of increasing complexity, evaluating their ability to generalize to unseen jobs and platforms. We examine the training duration and the amount of data necessary to achieve accurate predictions and discuss the applicability of such models to overcome the scalability challenges of DCSim. The key findings of this work demonstrate that the models are capable of generalizing across sequences of lengths encountered during training but fall short in generalizing across different platforms.)
    • Density-Based Outlier Detection Benchmark on Synthetic Data  + (Outlier detection algorithms are widely used in application fields such as image processing and fraud detection. Thus, during the past years, many different outlier detection algorithms were developed. While a lot of work has been put into comparing the efficiency of these algorithms, comparing methods in terms of effectiveness is rather difficult. One reason for that is the lack of commonly agreed-upon benchmark data. In this thesis, the effectiveness of density-based outlier detection algorithms (such as KNN, LOF, and related methods) is compared on entirely synthetically generated data, using its underlying density as ground truth.)
    • High-Dimensional Neural-Based Outlier Detection  + (Outlier detection in high-dimensional spaces is a challenging task because of the consequences of the curse of dimensionality. Neural networks have recently gained in popularity for a wide range of applications due to the availability of computational power and large training data sets. Several studies examine the application of different neural network models, such as autoencoders, self-organising maps, and restricted Boltzmann machines, for outlier detection in mainly low-dimensional data sets. In this diploma thesis, we investigate whether these neural network models can scale to high-dimensional spaces, adapt the useful neural network-based algorithms to the task of high-dimensional outlier detection, examine data-driven parameter selection strategies for these algorithms, develop suitable outlier score metrics for these models, and investigate the possibility of identifying the outlying dimensions for detected outliers.)
    • Bachelorarbeit: Local Outlier Factor for Feature‐evolving Data Streams  + (Outlier detection is a core task of data stream analysis. As such, many algorithms targeting this problem exist, but they tend to treat the data as a so-called row stream, i.e., observations arrive one at a time with a fixed number of features. However, real-world data often has the form of a feature-evolving stream: consider the task of analyzing network data in a data center, where nodes may be added and removed at any time, changing the features of the observed stream. Although this setting is highly relevant, most existing outlier detection algorithms are not applicable to it. Further, increasing the number of features, resulting in high-dimensional data, poses a different set of problems, usually summarized as "the curse of dimensionality". In this thesis, we propose FeLOF, which addresses this challenging setting of outlier detection in feature-evolving and high-dimensional data. Our algorithm extends the well-known Local Outlier Factor algorithm to the feature-evolving stream setting. We employ a variation of StreamHash random hashing projections to create a lower-dimensional feature space embedding, thereby mitigating the effects of the curse of dimensionality. To address non-stationary data distributions, we employ a sliding window approach. FeLOF uses efficient data structures to speed up search queries and data updates. Extensive experiments show that our algorithm achieves state-of-the-art outlier detection performance in the static, row stream, and feature-evolving stream settings on real-world benchmark data. Additionally, we provide an evaluation of our StreamHash adaptation, demonstrating its ability to cope with sparsely populated high-dimensional data.)
    • Density-Based Outlier Detection Benchmark on Synthetic Data (Thesis)  + (Outlier detection is a popular research topic, with a number of different approaches developed. Evaluating the effectiveness of these approaches, however, is rarely addressed. The lack of commonly accepted benchmark data is most likely one of the obstacles to running a fair comparison of unsupervised outlier detection algorithms. This thesis compares the effectiveness of twelve density-based outlier detection algorithms in nearly 800,000 experiments over a broad range of algorithm parameters, using the probability density as ground truth.)
    • Subspace Generative Adversarial Learning for Unsupervised Outlier Detection  + (Outlier detection is an important yet challenging task, especially for unlabeled, high-dimensional datasets. Due to their self-supervised generative nature, Generative Adversarial Networks (GANs) have proven themselves to be one of the most powerful deep learning methods for outlier detection. However, most state-of-the-art GANs for outlier detection share common limitations. Often, great results are achieved only if the model's hyperparameters are properly tuned or the underlying network structure is adjusted. This optimization is not possible in practice when the data is unlabeled. If not tuned properly, a state-of-the-art GAN method is not unusually outperformed by simpler shallow methods. We propose using a GAN architecture with feature ensemble learning to address hyperparameter sensitivity and architectural dependency. This follows the success of feature ensembling in mitigating these problems in other areas of deep learning. This thesis will study the optimization problem, training, and tuning of feature ensemble GANs in an unsupervised scenario, comparing them to other deep generative methods in a similar setting.)
    • Neural-Based Outlier Detection in Data Streams  + (Outlier detection often needs to be done unsupervised with high-dimensional data in data streams. "Deep structured energy-based models" (DSEBM) and "Variational Denoising Autoencoder" (VDA) are two promising approaches for outlier detection. They will be implemented and adapted for use in data streams. Finally, their performance will be shown in experiments, including a comparison with state-of-the-art approaches.)
    • Adaptive Variational Autoencoders for Outlier Detection in Data Streams  + (Outlier detection targets the discovery of abnormal data patterns. Typical scenarios, such as fraud detection and predictive maintenance, are particularly challenging, since the data is available as an infinite and ever-evolving stream. In this thesis, we propose Adaptive Variational Autoencoders (AVA), a novel approach for unsupervised outlier detection in data streams. Our contribution is two-fold: (1) We introduce a general streaming framework for training arbitrary generative models on data streams. Here, generative models are useful to capture the history of the stream. (2) We instantiate this framework with a Variational Autoencoder, which adapts its network architecture to the dimensionality of incoming data. Our experiments against several benchmark outlier data sets show that AVA outperforms the state of the art and successfully adapts to streams with concept drift.)
    • Scenario Discovery with Active Learning  + (PRIM (Patient Rule Induction Method) is an algorithm used for discovering scenarios by creating hyperboxes in the input space. Yet PRIM alone usually requires large datasets, and computational simulations can be expensive. Consequently, one wants to obtain scenarios while reducing the number of simulations. It has been shown that combining PRIM with machine learning models can reduce the number of necessary simulation runs by around 75%. In this thesis, I analyze nine different active learning sampling strategies together with several machine learning models in order to find out whether active learning can systematically improve PRIM even further, and whether, among those strategies and models, a most beneficial combination of sampling method and intermediate machine learning model exists for this purpose.)
    • Patient Rule Induction Method with Active Learning  + (PRIM (Patient Rule Induction Method) is an algorithm for discovering scenarios from simulations by creating human-comprehensible hyperboxes. Yet PRIM alone requires relatively large datasets, and computational simulations are usually quite expensive. Consequently, one wants to obtain a plausible scenario with a minimal number of simulations. It has been shown that combining PRIM with ML models, which generalize faster, can reduce the number of necessary simulation runs by around 75%. We will try to reduce the number of simulation runs even further, using an active learning (AL) approach to train an intermediate ML model. Additionally, we extend the previously proposed methodology to cover not only classification but also regression problems. A preliminary experiment indicated that the combination of these methods does indeed help reduce the necessary runs even further. In this thesis, I will analyze different AL sampling strategies together with several intermediate ML models to find out whether AL can systematically improve existing scenario discovery methods and whether a most beneficial combination of sampling method and intermediate ML model exists for this purpose.)
    • A Parallelizing Compiler for Adaptive Auto-Tuning  + (Parallelizing compilers and auto-tuners are two of the many technologies that can help developers write high-performance applications for modern heterogeneous systems. In this thesis, we present a parallelizing compiler that can detect parallelism in programs and generate parallel code for heterogeneous systems. In addition, the presented compiler uses auto-tuning to find, at run time, an optimal partitioning of the parallelized code sections across multiple platforms that minimizes execution time. However, instead of optimizing the parallelization once for each parallel section and keeping the configurations found for as long as the program runs, programs generated by our compiler are able to distinguish between different application contexts, so that context changes can be detected and the current configuration can be adapted individually for each context that occurs. To describe contexts, we use so-called indicators, which express certain run-time properties of the code and are inserted into the program code so that they can be evaluated during execution and used by the auto-tuner. Furthermore, we store the configurations found and their associated contexts in a database, so that we can reuse configurations from earlier runs when the application is executed again. We evaluate our approach with the Polybench benchmark suite. The results show that we are able to detect context changes at run time and adapt the configuration to the new context accordingly, which generally leads to lower execution times.)
    • Calibrating Performance Models for Particle Physics Workloads  + (Particle colliders are a primary method of conducting experiments in particle physics, as they make it possible both to create short-lived, high-energy particles and to observe their properties. The world's largest particle collider, the Large Hadron Collider (subsequently referred to as LHC), is operated by the European Organization for Nuclear Research (CERN) near Geneva. The operation of this kind of accelerator requires the storage and computationally intensive analysis of large amounts of data. The Worldwide LHC Computing Grid (WLCG), a global computing grid, is run alongside the LHC to serve this purpose. This Bachelor's thesis aims to support the creation of an architecture model and simulation for parts of the WLCG infrastructure, with the goal of being able to accurately simulate and predict the effects of changes in the infrastructure, such as the replacement of the load balancing strategies used to distribute the workload between available nodes.)
    • Adaptive Monitoring for Continuous Performance Model Integration  + (Performance Models (PMs) can be used to predict software performance and evaluate the alternatives at the design stage. Building such models manually is time-consuming and not suitable for agile development processes, where quick releases have to be generated in short cycles. To benefit from model-based performance prediction during agile software development, developers tend to extract PMs automatically. Existing approaches that extract PMs based on reverse engineering and/or measurement techniques require monitoring and analyzing the whole system after each iteration, which causes a high monitoring overhead. The Continuous Integration of Performance Models (CIPM) approach addresses this problem by updating the PMs and calibrating them incrementally based on adaptive monitoring of the changed parts of the code. In this work, we introduce an adaptive monitoring approach for performance model integration, which automatically instruments only the changed parts of the source code using specific predefined probe types. It then monitors the system adaptively. The resulting measurements are used by CIPM to estimate PM parameters incrementally. The evaluation confirmed that our approach can reduce the monitoring overhead to 50%.)
    • (Freiwillige Teilnahme) Abschlussvortrag Praxis der Forschung SS23 I  + (Performance Prediction for Container Applications. Abstract: Nowadays, distributed applications are often not statically deployed on virtual machines. Instead, a desired state is defined declaratively. A control loop then tries to create the desired state in the cluster. Predicting the impact on the performance of a system using these deployment techniques is difficult. This paper introduces a method to predict the performance impact of using containers and container orchestration in the deployment of a system. Our proposed approach enables system simulation and experimentation with various mechanisms of container orchestration, including autoscaling and container scheduling. We validated this approach using a microservice reference application across different scenarios. Our findings suggest that the simulation could effectively mimic most features of container orchestration tools, and that the performance prediction of containerized applications in dynamic scenarios could be improved significantly.)
    • Tuning of Explainable Artificial Intelligence (XAI) tools in the field of text analysis  + (Philipp Weinmann will present his plan for his Bachelor thesis with the title "Tuning of Explainable Artificial Intelligence (XAI) tools in the field of text analysis". He will give a general introduction to explainers for Artificial Intelligence in the context of NLP. We will then explore one of these tools in detail: SHAP, a perturbation-based local explainer, and talk about evaluating SHAP explanations.)
    • Explainable Artificial Intelligence for Decision Support  + (Policy makers face the difficult task of making far-reaching decisions that impact the life of the entire population based on uncertain parameters that they have little to no control over, such as environmental impacts. Often, they use scenarios in their decision-making process. Scenarios provide a common and intuitive way to communicate and characterize different uncertain outcomes in many decision support applications, especially in broad public debates. However, they often fall short of their potential, particularly when applied for groups with diverse interests and worldviews, due to the difficulty of choosing a small number of scenarios to summarize the entire range of uncertain future outcomes. Scenario discovery addresses these problems by using statistical or data-mining algorithms to find easy-to-interpret, policy-relevant regions in the space of uncertain input parameters of computer simulation models. One of many approaches to scenario discovery is subgroup discovery, an approach from the domain of explainable Artificial Intelligence. In this thesis, we test and evaluate multiple subgroup discovery methods for their applicability to scenario discovery applications.)
    • Symbolic Performance Modeling  + (Predicting software performance under different configurations is a challenging task due to the large number of possible configurations. Performance-influence models help stakeholders understand how configuration options and their interactions influence the performance of a program. A crucial part of the performance modeling process is the design of an experiment set that delivers performance measurements, which are used as input for a machine learning algorithm that learns the performance model. An optimal experiment set should contain the minimal number of experiments that produces a sufficiently accurate performance model. The topic of this thesis is Symbolic Performance Modeling, a new white-box approach to the analysis of the configuration options' influence on the software's performance. The approach utilizes taint analysis to determine where in the source code configuration options influence the software's performance, and symbolic execution to determine whether the influence is significant. We assume that only loop constructs with non-constant iteration counts change the asymptotic behavior of the program. The Feature Taint Analysis provided by VaRA is used to determine which configuration options influence loops, while the Path Tracing provided by PhASAR is used to construct all control-flow paths leading to the loops and their respective path conditions. The SMT solver Z3 is then used to derive value ranges from the path conditions for the configuration options that influence the loop constructs. We determine the significance of a configuration option's influence based on the size of its value range. We implement the proof-of-concept tool Symbolic Performance Modeling Value Generator to evaluate the approach with regard to its capabilities to analyze real-world applications and its performance. From the insights gained during the evaluation, we define limitations of the current implementation and propose improvements for future work.)
    • Enhancing Non-Invasive Human Activity Recognition by Fusioning Electrical Load and Vibrational Measurements  + (Professional installation of stationary sensors burdens the adoption of Activity Recognition Systems in households. This can be circumvented by utilizing sensors that are cheap, easy to set up and adaptable to a variety of homes. Since 72% of European consumers will have Smart Meters by 2020, they provide an omnipresent basis for Activity Recognition. This thesis investigates how a Smart Meter's limited recognition of appliance-involving activities can be extended by Vibration Sensors. We provide an experimental setup to aggregate a dedicated dataset with a sampling frequency of 25,600 Hz. We evaluate and quantify the impact of combining a Smart Meter and Vibration Sensors on a system's accuracy by means of four developed Activity Recognition Systems. We found that when these sensors are combined, the accuracy of an Activity Recognition System tends towards the highest accuracy of a single underlying sensor rather than jointly surpassing it.)
    • Evidence-based Token Abstraction for Software Plagiarism Detection  + (Programming assignments for students are a target of plagiarism. Especially for graded assignments, instructors want to detect plagiarism among the students. For larger courses, however, manual inspection of all submissions is a resource-intensive task. For this purpose, there are numerous tools that can help detect plagiarism in submissions. Many well-known plagiarism detection tools are token-based detectors. In an abstraction step, they map source code to a list of tokens, and such lists are then compared with each other. While there is much research in the area of comparison algorithms, the mapping is often only considered superficially. In this work, we conduct two experiments that address the issue of token abstraction. For that, we design different token abstractions and explain their differences. We then evaluate these abstractions using multiple datasets. We show that different abstractions have pros and cons, and that a higher abstraction level does not necessarily perform better. These findings are useful when adding support for new programming languages and for improving existing plagiarism detection tools. Furthermore, the results can be helpful for choosing abstractions tailored to specific requirements.)
    • Theory-Guided Data Science for Battery Voltage Prediction: A Systematic Guideline  + (Purely data-driven Data Science approaches tend to underperform when applied to scientific problems, especially when there is little data available. Theory-guided Data Science (TGDS) incorporates existing problem specific domain knowledge in order to increase the performance of Data Science models. It has already proved to be successful in scientific disciplines like climate science or material research. Although there exist many TGDS methods, they are often not comparable with each other, because they were originally applied to different types of problems. Also, it is not clear how much domain knowledge they require. There currently exist no clear guidelines on how to choose the most suitable TGDS method when confronted with a concrete problem. Our work is the first one to compare multiple TGDS methods on a time series prediction task. We establish a clear guideline by evaluating the performance and required domain knowledge of each method in the context of lithium-ion battery voltage prediction. As a result, our work could serve as a starting point on how to select the right TGDS method when confronted with a concrete problem.)
    • Using Architectural Design Space Exploration to Quantify Cost-to-Quality Relationship  + (QUPER is a method for making decisions easier during release planning in which a particular quality requirement is central. The method is particularly helpful when the software project has several competing products on the market and a particular quality requirement strongly influences the value of the software for the customer. However, QUPER requires estimates from the development team and therefore depends heavily on the team's experience. The Palladio Component Model in combination with PerOpteryx can help replace these rough estimates with more precise information for an upcoming release: given a Palladio model and a potential improvement to the software, PerOpteryx can provide the exact improvement of the quality requirement. In this thesis, the QUPER method is first applied on its own and then with the help of PerOpteryx to two exemplary software projects, and the results are compared.)
    • Modularization approaches in the context of monolithic simulations  + (Quality characteristics of a software system, such as performance or reliability, can determine its success or failure. In traditional software engineering, these characteristics can only be determined when parts of the system are already implemented and past the design process. Computer simulations make it possible to estimate quality characteristics of software systems already during the design process. Simulations are built to analyse certain aspects of systems. The representation of the system is specialised for the specific analysis. This specialisation often results in a monolithic design of the simulation. Monolithic structures, however, can reduce the maintainability of the simulation and decrease the understandability and reusability of the representations of the system. The drawbacks of monolithic structures can be countered by the concept of modularisation, where one problem is divided into several smaller sub-problems. This approach allows an easier understanding and handling of the sub-problems. In this thesis, an approach is provided to describe the coupling of newly developed and already existing simulations to a modular simulation. This approach consists of a Domain-Specific Language (DSL) developed with model-driven technologies. The DSL is applied in a case study to describe the coupling of two simulations. The coupling of these simulations with an existing coupling approach is implemented according to the created description. An evaluation of the DSL is conducted regarding its completeness to describe the coupling of several simulations to a modular simulation. Additionally, the modular simulation is examined regarding the accuracy of preserving the behaviour of the monolithic simulation. The results of the modular simulation and the monolithic version are compared for this purpose. The created modular simulation is additionally evaluated with regard to its scalability by analysing the execution times when multiple simulations are coupled. Furthermore, the effect of the modularisation on the simulation execution times is evaluated. The obtained evaluation results show that the DSL can describe the coupling of the two simulations used in the case study. Furthermore, the results of the accuracy evaluation suggest that problems in the interaction of the simulations with the coupling approach exist. However, the results also show that the overall behaviour of the monolithic simulation is preserved in its modular version. The analysis of the execution times suggests that the modular simulation experiences an increase in execution time compared to the monolithic version. Also, the results regarding the scalability show that the execution time of the modular simulation does not increase exponentially with the number of coupled simulations.)
    • Parametrisierung der Spezifikation von Qualitätsannotationen in Software-Architekturmodellen  + (Quality properties of component-based software systems depend both on the components used and on the context in which they are used. While context-dependent parameterization has already been thoroughly studied for individual quality analysis models, such as performance, this is still an open question for other quality attributes, in particular for qualitatively descriptive models. This thesis presents the Quality Effect Specification, which enables context-dependent analysis and transformation of arbitrary quality attributes. The approach includes a purpose-built domain-specific language for modeling effects depending on the context and a corresponding transformation of the quality annotations.)
    • Generalized Monte Carlo Dependency Estimation with improved Convergence  + (Quantifying dependencies among variables is a fundamental task in data analysis. It makes it possible to understand data and to identify the variables required to answer specific questions. Recent studies have positioned Monte Carlo Dependency Estimation (MCDE) as a state-of-the-art tool in this field. MCDE quantifies dependencies as the average discrepancy between marginal and conditional distributions. In practice, this value is approximated with a dependency estimator. However, the original implementation of this estimator converges rather slowly, which leads to suboptimal results in terms of statistical power. Moreover, MCDE is only able to quantify dependencies among univariate random variables, but not multivariate ones. In this thesis, we make two major improvements to MCDE. First, we propose four new dependency estimators with faster convergence. We show that MCDE equipped with these new estimators achieves higher statistical power. Second, we generalize MCDE to GMCDE (Generalized Monte Carlo Dependency Estimation) to quantify dependencies among multivariate random variables. We show that GMCDE inherits all the desirable properties of MCDE and demonstrate its superiority over state-of-the-art dependency measures with experiments.)
    • Adaptives Online-Tuning für kontinuierliche Zustandsräume  + (Ray tracing is a computationally intensive technique for generating photorealistic images. Image generation can be accelerated by automatically optimizing parameters that influence computation time. In this thesis, the auto-tuner libtuning was extended with a generalized reinforcement learning method that is able to take certain characteristics of the frames to be rendered into account when selecting suitable parameter configurations. The strategy used for this is an ε-greedy strategy, which uses the Nelder-Mead method for function minimization from libtuning for exploration. It was shown that this implementation achieves a speedup of up to 7.7% in the total computation time of a ray tracing application scenario compared with using libtuning.)
    • Integration of Reactions and Mappings in Vitruvius  + (Realizing complex software projects is often done by utilizing multiple programming or modelling languages. Separate parts of the software are relevant to certain development tasks or roles and differ in their representation. These separate representations are related and contain redundant information. Such redundancies exist, for example, between a component description and its implementation class, which has to implement methods with signatures as specified by the component. Whenever redundant information is affected by a development update, other representations that contain redundant information have to be updated as well. This additional development effort is required to keep the redundant information consistent and can be costly. Consistency preservation languages can be used to describe how consistency of representations can be preserved, so that, in use with further development tools, the process of updating redundant information is automated. However, such languages vary in their abstraction level and expressiveness. Consistency preservation languages with higher abstraction specify in a declarative manner which elements of representations are considered consistent. A language with less abstraction concerns how consistency is preserved after an update, using imperative instructions. A common trade-off when selecting a fitting language is between expressiveness and abstraction. Higher abstraction on the one hand implies less specification effort; on the other hand, it is restricted in expressiveness compared to a more specific language. In this thesis, we present a concept for combining two consistency specification languages of different abstraction levels. Imperative constructs of a less abstract language are derived from declarative consistency expressions of a language of higher abstraction and combined with additional imperative constructs integrated into the combined language. The combined language grants the benefits of the more abstract language and enables realizing parts of the specification without being restricted in expressiveness. As a consequence, a developer profits from the advantages of both languages, whereas previously a specification that could not be completely expressed with the more abstract language had to be realized entirely with the less abstract language. We realize the concepts by combining the Reactions and Mappings languages of the VITRUVIUS project. The imperative Reactions language enables developers to specify triggers for certain model changes and repair logic. As a more abstract language, Mappings specifies consistency with a declarative description between elements of two representations and the conditions that have to apply to the specific elements. We research the limits of expressiveness of the declarative description and show how scenarios that require complex consistency specifications are supported. An evaluation with a case study shows the applicability of the approach, because an existing project that previously used the Reactions language can be realized with the combination concept. Furthermore, the compactness of the preservation specification is increased.)
    • On the semantics of similarity in deep trajectory representations  + (Recently, a deep learning model (t2vec) for trajectory similarity computation has been proposed. Instead of using the trajectories, it uses their deep representations to compute the similarity between them. At present, we do not have a clear idea of how to interpret the t2vec similarity values, nor of what exactly they are based on. This thesis addresses these two issues by analyzing t2vec on its own and then systematically comparing it to the more familiar traditional models. Firstly, we examine how the model's parameters influence the probability distribution (PDF) of the t2vec similarity values. For this purpose, we conduct experiments with various parameter settings and inspect the abstract shape and statistical properties of their PDF. Secondly, we consider that we already have an intuitive understanding of the classical models, such as Dynamic Time Warping (DTW) and Longest Common Subsequence (LCSS). Therefore, we use this intuition to analyze t2vec by systematically comparing it to DTW and LCSS with the help of heat maps.)
    • Implementation and Evaluation of CHQL Operators in Relational Database Systems to Query Large Temporal Text Corpora  + (Relational database management systems have an important place in the information revolution. Their release on the market facilitated the storing and analysis of data. In recent years, with the release of large temporal text corpora, it was shown that domain experts in conceptual history could also benefit from the performance of relational databases. Since the relational algebra behind them lacks special functionality for this case, the Conceptual History Query Language (CHQL) was developed. The first result of this thesis is an original implementation of the CHQL operators in a relational database, written in both SQL and its procedural extension. Secondly, we substantially improved performance with trigram indexes. Lastly, the query plan analysis reveals the problem behind the query optimizer's choice of inefficient plans, namely its inability to correctly predict the results of a stored function.)
    • Analysis and Visualization of Semantics from Massive Document Directories  + (Research papers are commonly classified into categories, and we can see the existing contributions as a massive document directory with sub-folders. However, research typically evolves at an extremely fast pace; consider for instance the field of computer science. It can be difficult to categorize individual research papers or to understand how research communities relate to each other. In this thesis, we will analyze and visualize semantics from massive document directories. The results will be displayed using the arXiv corpus, which contains domain-specific (computer science) papers of the past thirty years. The analysis will illustrate and give insight into past trends of document directories and how their relationships evolve over time.)
    • Anforderung-zu- Quelltextrückverfolgbarkeit mittels Wort- und Quelltexteinbettungen  + (Traceability information helps developers understand software systems and serves as a basis for further techniques such as coverage analysis. This thesis investigates how embeddings can be used for automatic traceability between requirements and source code. To this end, it considers various ways of representing requirements and source code with embeddings and then mapping them onto each other to create traceability links between them. For a class, for example, there are many options as to which information or which class elements are taken into account when computing a source code embedding. For the mapping, a metric computes similarity values between the embeddings, which allow statements about the existence of a traceability link between the artifacts they represent. In the evaluation, the various options for embedding and mapping were compared with each other and with other work. In terms of F1 score, source code embeddings using class names, method signatures, and method comments, together with mapping methods that use the Word Mover’s Distance as the similarity metric, produce the best cross-project results. The best method achieves an F1 score of 60.1% on the LibEST project, which consists of 14 source code artifacts and 52 requirements artifacts. The best cross-project configuration achieves an average F1 score of 39%.)
    • Bestimmung der semantischen Funktion von Quelltextabschnitten  + (Traceability information between source code and requirements enables tools to better support programmers in navigating and editing source code. To establish such links automatically, the semantics of the requirements and the source code must be understood. This thesis develops a method for describing the shared semantics of groupings of program elements. The method is based on the statistical topic model LDA and produces a set of keywords as a description of these semantics. Natural-language content in the source code of the groupings is analyzed and used to train the model. To compensate for uncertainty in the choice of LDA parameters and to improve the robustness of the keyword set, several LDA models are combined. The method was evaluated in a user study. Overall, an average recall of 0.73 and an average F1 score of 0.56 were achieved.)
    • Improving Document Information Extraction with efficient Pre-Training  + (SAP Document Information Extraction (DOX) is a service that extracts logical entities from scanned documents, based on the well-known Transformer architecture. The entities comprise header information such as document date or sender name, and line items from tables on the document with fields such as line item quantity. The model currently needs to be trained on a huge number of labeled documents, which is impractical. This also hinders the deployment of the model at large scale, as it cannot easily adapt to new languages or document types. Recently, pretraining large language models with self-supervised learning techniques has shown good results as a preliminary step and allows reducing the number of labels required in follow-up steps. However, to generalize self-supervised learning to document understanding, we need to take into account different modalities: text, layout and image information of documents. How to do that efficiently and effectively is not yet clear. The goal of this thesis is to come up with a technique for self-supervised pretraining within SAP DOX. We will evaluate our method and design decisions against SAP data as well as public data sets. Besides the accuracy of the extracted entities, we will measure to what extent our method lets us lower label requirements.)
    • Wichtigkeit von Merkmalen für die Klassifikation von SAT-Instanzen (Proposal)  + (SAT is one of the most important NP-hard problems in theoretical computer science, which is why research is particularly interested in finding especially efficient solution methods for it. To this end, a classification is performed by grouping similar problem instances into instance families, a step that is to be automated using machine learning methods. The bachelor's thesis addresses, among others, the following questions: Which (most important) features can be used to assign an instance to a particular family? How does one build a good classifier for this problem? What do instances that are often misclassified have in common? What does a meaningful division into families look like?)
    • Verification of Access Control Policies in Software Architectures  + (Security in software systems becomes more important as systems become more complex and connected. Therefore, it is desirable to conduct security analysis on an architectural level. One possible approach in this direction is data-based privacy analysis. Such approaches are evaluated on case studies. Most exemplary systems for case studies are developed specifically for the approach under investigation. Therefore, it is not easy to find a fitting case study. The thesis introduces a method to create usable case studies for data-based privacy analyses. The method is applied to the Community Component Modeling Example (CoCoME). The evaluation is based on a GQM plan and shows that the method is applicable. It also shows that the created case study is able to check whether illegal information flow is present in CoCoME. Additionally, it is shown that the provided meta model extension is able to express the case study.)
    • Beyond Similarity - Dimensions of Semantics and How to Detect them  + (Semantic similarity estimation is a widely used and well-researched area. Current state-of-the-art approaches estimate text similarity with large language models. However, semantic similarity estimation often ignores fine-grained differences between semantically similar sentences. This thesis proposes the concept of semantic dimensions to represent fine-grained differences between two sentences. A workshop with domain experts identified ten semantic dimensions. From the workshop insights, a model for semantic dimensions was created. Afterward, 60 participants decided via a survey which semantic dimensions are useful to users. Detectors for the five most useful semantic dimensions were implemented in an extendable framework. To evaluate the semantic dimension detectors, a dataset of 200 sentence pairs was created. The detectors reached an average F1 score of 0.815.)
    • Faster Feedback Cycles via Integration Testing Strategies for Serverless Edge Computing  + (Serverless computing allows software engineers to develop applications in the cloud without having to manage the infrastructure. The infrastructure is managed by the cloud provider. Therefore, software engineers treat the underlying infrastructure as a black box and focus on the business logic of the application. This lack of inside knowledge leads to an increased testing difficulty as applications tend to be dependent on the infrastructure and other applications running in the cloud environment. While isolated unit and functional testing is possible, integration testing is a challenge, as reliable results are often only achieved after deploying to the deployment environment because infrastructure specifics and other cloud services are only available in the actual cloud environment. This leads to a laborious development process. For this reason, this thesis deals with creating testing strategies for serverless edge computing to reduce feedback cycles and speed up development time. For evaluation, the developed testing strategies are applied to Lambda@Edge in AWS.)
    • Influence of Load Profile Perturbation and Temporal Aggregation on Disaggregation Quality  + (Smart meters are becoming more and more popular. With smart meters, new privacy issues arise. A prominent privacy issue is disaggregation, i.e., the determination of appliance usage from aggregated smart meter data. The goal of this thesis is to evaluate load profile perturbation and temporal aggregation techniques regarding their ability to prevent disaggregation. To this end, we used a privacy operator framework for temporal aggregation and perturbation, and the NILM TK framework for disaggregation. We evaluated the influence of the framework's operators on disaggregation quality, individually and in combination. One main observation is that the de-noising operator from the framework prevents disaggregation best.)
    • Modelling and Enforcing Access Control Requirements for Smart Contracts  + (Smart contracts are software systems employing the underlying blockchain technology to handle transactions in a decentralized and immutable manner. Due to the immutability of the blockchain, smart contracts cannot be upgraded after their initial deployment. Therefore, reasoning about a contract’s security aspects needs to happen before the deployment. One common vulnerability for smart contracts is improper access control, which enables entities to modify data or employ functionality they are prohibited from accessing. Due to the nature of the blockchain, access to data, represented through state variables, can only be achieved by employing the contract’s functions. To correctly restrict access on the source code level, we improve the approach by Reiche et al., who enforce access control policies based on a model on the architectural level. This work aims at correctly enforcing role-based access control (RBAC) policies for Solidity smart contract systems on the architectural and source code level. We extend the standard RBAC model by Sandhu, Ferraiolo, and Kuhn to also incorporate insecure information flows and authorization constraints for roles. We create a metamodel to capture the concepts necessary to describe and enforce RBAC policies on the architectural level. The policies are enforced in the source code by translating the model elements to formal specifications. For this purpose, an automatic code generator is implemented. To reason about the implemented smart contracts on the source code level, tools like solc-verify and Slither are employed and extended. Furthermore, we outline the development process resulting from the presented approach. To evaluate our approach and uncover problems and limitations, we employ a case study using the three smart contract software systems Augur, Fizzy and Palinodia. Additionally, we apply a metamodel coverage analysis to reason about the metamodel’s and the generator’s completeness. Furthermore, we provide an argumentation concerning the approach’s correct enforcement. This evaluation shows how a correct enforcement can be achieved under certain assumptions and when information flows are not considered. In the case study, the presented approach detects 100% of the manually introduced violations of the underlying RBAC policies. Additionally, the metamodel is expressive enough to describe RBAC policies and contains no unnecessary elements, since approximately 90% of the created metamodel is covered by the implemented generator. We identify and describe limitations like oracles or public variables.)
    • Methodology for Evaluating a Domain-Specific Model Transformation Language  + (As soon as a system is described by multiple models, these different descriptions can also contradict each other. Model transformations are a suitable means of avoiding this even when the models are edited by several parties in parallel. There is by now a rich body of research on transforming changes between two models. However, the challenge of developing model transformations between more than two models has so far been insufficiently solved. The Commonalities language is a declarative, domain-specific programming language with which multidirectional model transformations can be programmed by combining bidirectional mapping specifications. Since it has not yet been empirically validated, however, it is an open question whether the language is suitable for developing realistic model transformations and what advantages it offers over an alternative programming language for model transformations. In this thesis, I design a case study with which the Commonalities language is evaluated. I discuss the methodology and the validity of this case study. Furthermore, I present congruence, a new property for bidirectional model transformations. It ensures that the two directions of a transformation are compatible with each other. I derive from practical examples why we can expect transformations to usually be congruent. I then discuss the design decisions behind a testing strategy with which two model transformation implementations that both realize the same consistency specification can be tested. The testing strategy also includes a practical use of congruence. Finally, I present improvements to the Commonalities language. Together, the contributions of this thesis make it possible to carry out a case study on programming languages for model transformations. This can yield a better understanding of the advantages of these languages. Congruence can improve the usability of arbitrary model transformations and could prove useful for constructing model transformation networks. The testing strategy can be applied to arbitrary acceptance tests for model transformations.)
    • Modeling of Security Patterns in Palladio  + (Software itself and the contexts it is used in typically evolve over time. Analyzing and ensuring the security of evolving software systems in contexts that are also evolving poses many difficulties. In my thesis I declare a number of goals and propose processes for the elicitation of attacks, their prerequisites, and mitigating security patterns for a given architecture model, and for annotating that model with security-relevant information. I show how this information can be used to analyze the system's security with regard to modeled attacks, using an attack validity algorithm I specify. The processes and the algorithm are used in a case study on CoCoME in order to show the applicability of each of them and to analyze the fulfillment of the previously stated goals. Security catalog meta-models and instances of catalogs containing a number of elements have been provided.)
    • Multi-model Consistency through Transitive Combination of Binary Transformations  + (Software systems are usually described through multiple models that address different development concerns. These models can contain shared information, which leads to redundant representations of the same information and dependencies between the models. These representations of shared information have to be kept consistent for the system description to be correct. The evolution of one model can cause inconsistencies with regard to other models of the same system. Therefore, some mechanism of consistency restoration has to be applied after changes occur. Manual consistency restoration is error-prone and time-consuming, which is why automated consistency restoration is necessary. Many existing approaches use binary transformations to restore consistency for a pair of models, but systems are generally described through more than two models. To achieve multi-model consistency preservation with binary transformations, they have to be combined through transitive execution. In this thesis, we explore the transitive combination of binary transformations and study the resulting problems. We develop a catalog of six failure potentials that can manifest in failures with regard to consistency between the models. The knowledge about these failure potentials can inform a transformation developer about possible problems arising from the combination of transformations. One failure potential is a consequence of the transformation network topology and the domain models used. It can only be avoided through topology adaptations. Another failure potential emerges when two transformations try to enforce conflicting consistency constraints. This can only be repaired through adaptation of the original consistency constraints. Both failure potentials are case-specific and cannot be solved without knowing which transformations will be combined. Furthermore, we develop two transformation implementation patterns to mitigate two other failure potentials. These patterns can be applied by the transformation developer to an individual transformation definition, independent of the combination scenario. For the remaining two failure potentials, no general solution has been found yet and further research is necessary. We evaluate the findings with a case study that involves two independently developed transformations between a component-based software architecture model, a UML class diagram and its Java implementation. All failures revealed by the evaluation could be classified with the identified failure potentials, which gives an initial indicator for the completeness of our failure potential catalog. The proposed patterns prevented all failures of their targeted failure potentials, which made up 70% of all observed failures. This shows that the developed implementation patterns are applicable and help to mitigate issues arising from transitively combining binary transformations.)
    • Abstrakte und konsistente Vertraulichkeitsspezifikation von der Architektur bis zum Code  + (Software systems may process sensitive information. To ensure its confidentiality, both the architecture model and its implementation can be examined with respect to information flow. For this purpose, a confidentiality specification is defined. Both model levels hold a representation of the same specification. As the system evolves, the specification can change on both levels and consequently contain contradictory statements. To verify the confidentiality of the information, the specification elements in the source code must be translated into another language in an additional step. This bachelor's thesis deals with the transformation between the different representations of a software system's confidentiality specification. This includes a mapping concept for keeping the confidentiality specification consistent and the translation into a language that can be used for verification.)
    • Automatisiertes GUI-basiertes Testen einer Passwortmanager-Applikation mit Neuroevolution  + (Software testing is essential for ensuring the quality and functionality of software products. Both manual and automated methods exist. However, automated methods as well as human and script-based tests have limitations in terms of cost efficiency and time. Monkey testing, characterized by random clicks on the user interface, often does not sufficiently take the application's logic into account. This bachelor's thesis focuses on the automated neuroevolutionary testing method, which uses neural networks as test agents and refines them over several generations by means of evolutionary algorithms. To evaluate these agents and compare them with monkey testing, a simulated version of a password manager application was used. A reward structure was implemented within the simulated application. The results show that the neuroevolutionary testing method performs significantly better than monkey testing in terms of rewards achieved. This leads to the application logic being better taken into account in the testing process.)
    • GUI-basiertes Testen einer Lernplattform-Anwendung durch Nutzung von Neuroevolution  + (Software testing is necessary to ensure the quality and functionality of software artifacts. There are both automated and manual testing methods. However, automated methods as well as human testing and script-based testing scale less well in terms of time and cost. Monkey testing, which is characterized by random clicks on the user interface, often does not sufficiently take the application logic into account. The focus of this bachelor's thesis is the automated neuroevolutionary testing method, which uses neural networks as test agents and improves them over several generations using evolutionary algorithms. To enable training of the agents and comparison with monkey testing, a simulated version of the learning platform Anki was implemented. To assess the test agents, a reward structure was developed in the simulated application. The results show that the neuroevolutionary testing method performs significantly better than monkey testing in terms of rewards achieved. As a result, the application logic is better taken into account in the testing process.)
    • Entity Linking für Softwarearchitekturdokumentation  + (Software architecture documentation contains technical terms from the software development domain. If these terms are found and linked to the matching terms in a database, people and text processing systems can use this information to better understand the documentation. The technical terms in documentation correspond to entity mentions in the text. In this thesis, we present our domain-specific entity linking system. The system links entity mentions within software architecture documentation to the corresponding entities within a knowledge base. The system comprises a domain-specific knowledge base, a preprocessing module, and an entity linking system.)
    • Entwicklung einer Entwurfszeit-DSL zur Formalisierung von Runtime Adaptationsstrategien für SAS zum Zweck der Strategie-Optimierung  + (Today's software systems are becoming increasingly complex and are subject to ever more varying conditions. As a result, self-adaptive systems are gaining importance, since they can adapt dynamically to new conditions by making changes to themselves. Domain-specific modeling languages (DSLs) for formalizing adaptation strategies are an important means of modeling and optimizing the design of feedback loops in self-adaptive software systems. This proposal puts forward a bachelor's thesis that addresses the question of how an optimization of adaptation strategies can be described in a DSL at design time.)
    • Preventing Code Insertion Attacks on Token-Based Software Plagiarism Detectors  + (Some students tasked with mandatory programming assignments lack the time or dedication to solve the assignment themselves. Instead, they plagiarize a peer’s solution by slightly modifying the code. However, there exist numerous tools that assist in detecting this kind of plagiarism. These tools can be used by instructors to identify plagiarized programs. The most widely used type of plagiarism detection tool is the token-based plagiarism detector. Such detectors are resilient against many types of obfuscation attacks, such as renaming variables or whitespace modifications. However, they are susceptible to the insertion of lines of code that do not affect the program flow or result. The working assumption so far was that successfully obfuscating plagiarism takes more effort and skill than solving the assignment itself. This assumption was broken by automated plagiarism generators, which exploit this weakness. This work aims to develop mechanisms against code insertions that can be directly integrated into existing token-based plagiarism detectors. For this, we first develop mechanisms to negate the negative effect of many types of code insertion. Then we implement these mechanisms prototypically in a state-of-the-art plagiarism detector. We evaluate our implementation by running it on a dataset consisting of real student submissions and automatically generated plagiarism. We show that with our mechanisms, the similarity rating of automatically generated plagiarism increases drastically. Consequently, the plagiarism generator we use fails to create usable plagiarisms.)
    • Software Plagiarism Detection on Intermediate Representation  + (Source code plagiarism is a widespread problem in computer science education. To counteract this, software plagiarism detectors can help identify plagiarized code. Most state-of-the-art plagiarism detectors are token-based. It is common to design and implement a new dedicated language module to support a new programming language. This process can be time-consuming; furthermore, it is unclear whether it is even necessary. In this thesis, we evaluate the necessity of dedicated language modules for Java and C/C++ and derive conclusions for designing new ones. To achieve this, we create a language module for the intermediate representation of LLVM. For the evaluation, we compare it to two existing dedicated language modules in JPlag. While our results show that dedicated language modules are better for plagiarism detection, language modules for intermediate representations show better resilience to obfuscation attacks.)
    • Portables Auto-Tuning paralleler Anwendungen  + (Both offline and online tuning are common solutions for the automatic optimization of parallel applications. Both approaches have their individual advantages and disadvantages: offline tuning has minimal negative impact on the application's run time, but the tuned parameter values are only usable on hardware known in advance. Online tuning, in contrast, provides dynamic parameter values that are determined at the application's run time and thus on the target hardware, but this can negatively affect the application's run time. We attempt to merge the advantages of both approaches by using parameter configurations optimized in advance on the target hardware and, where applicable, with a different application. We evaluate both the hardware portability and the application portability of the configurations using five example applications.)
    • DomainML: A modular framework for domain knowledge-guided machine learning  + (Standard, data-driven machine learning approaches learn relevant patterns solely from data. In some fields, however, learning only from data is not sufficient. A prominent example of this is healthcare, where the problem of data insufficiency for rare diseases is tackled by integrating high-quality domain knowledge into the machine learning process. Despite the existing work in the healthcare context, making general observations about the impact of domain knowledge is difficult, as different publications use different knowledge types, prediction tasks and model architectures. It further remains unclear whether the findings in healthcare are transferable to other use cases, as well as how much intellectual effort this requires. With this thesis we introduce DomainML, a modular framework to evaluate the impact of domain knowledge on different data science tasks. We demonstrate the transferability and flexibility of DomainML by applying the concepts from healthcare to cloud system monitoring. We then observe how domain knowledge impacts the model’s prediction performance across both domains, and suggest how DomainML could further be used to refine both the given domain knowledge and the quality of the underlying dataset.)
    • State of the Art: Multi Actor Behaviour and Dataflow Modelling for Dynamic Privacy  + (State-of-the-art talk as part of Praxis der Forschung.)
    • Data-Preparation for Machine-Learning Based Static Code Analysis  + (Static Code Analysis (SCA) has become an integral part of modern software development, especially since the rise of automation in the form of CI/CD. It is an open question how machine learning can best help improve the state of SCA and thus facilitate maintainable, correct, and secure software. However, machine learning needs a solid foundation to learn on. This thesis proposes an approach to build that foundation by mining data on software issues from real-world code. We show how we used that concept to analyze over 4000 software packages and generate over two million issue samples. Additionally, we propose a method for refining this data and apply it to an existing machine learning SCA approach.)
    • Creating Study Plans by Generating Workflow Models from Constraints in Temporal Logic  + (Students are confronted with a huge number of regulations when planning their studies at a university. It is challenging for them to create a personalized study plan while still complying with all official rules. The STUDYplan software aims to overcome these difficulties by enabling an intuitive and individual modeling of study plans. A study plan can be interpreted as a sequence of business process tasks that indicate courses, which makes it possible to build on existing work in the business process domain. This thesis focuses on the idea of synthesizing business process models from declarative specifications that indicate official and user-defined regulations for a study plan. We provide an elaborated approach for the modeling of study plan constraints and a generation concept specialized to study plans. This work motivates, discusses, partially implements and evaluates the proposed approach.)
    • A comparative study of subgroup discovery methods  + (Subgroup discovery (SD) is a data mining technique that is used to extract interesting relationships in a dataset related to a target variable. These relationships are described in the form of rules. Multiple SD techniques have been developed over the years. This thesis establishes a comparative study between a number of these techniques in order to identify the state-of-the-art methods. It also analyses the effects discretization has on them as a preprocessing step. Furthermore, it investigates the effect of hyperparameter optimization on these methods. Our analysis showed that PRIM, DSSD, Best Interval and FSSD outperformed the other subgroup discovery methods evaluated in this study and are to be considered state-of-the-art. It also shows that discretization offers an efficiency improvement on methods that do not employ internal discretization. It has a negative impact on the quality of subgroups generated by methods that perform it internally. Finally, the results demonstrate that Apriori-SD and SD-Algorithm were the most positively affected by the hyperparameter optimization.)
    • Software Testing  + (TBA)
    • Exploring Modern IDE Functionalities for Consistency Preservation  + (TBA)
    • Exploring the Traceability of Requirements and Source Code via LLMs  + (TBA)
    • Preventing Refactoring Attacks on Software Plagiarism Detection through Graph-Based Structural Normalization  + (TBD)
    • Generation of Checkpoints for Hardware Architecture Simulators  + (TBD)
    • Konzept und Integration eines Deltachain Prototyps  + (TBD)
    • Data-Driven Approaches to Predict Material Failure and Analyze Material Models  + (The prediction of material failure is useful in many industrial contexts such as predictive maintenance, where it helps reduce costs by preventing outages. However, failure prediction is a complex task. Typically, material scientists need to create a physical material model to run computer simulations. In real-world scenarios, the creation of such models is often not feasible, as the measurement of exact material parameters is too expensive. Material scientists can use material models to generate simulation data. These data sets are multivariate sensor value time series. In this thesis we develop data-driven models to predict upcoming failure of an observed material. We identify and implement recurrent neural network architectures, as recent research indicated that these are well suited for predictions on time series. We compare the prediction performance with traditional models that do not directly predict on time series but involve an additional step of feature calculation. Finally, we analyze the predictions to find abstractions in the underlying material model that lead to unrealistic simulation data and thus impede accurate failure prediction. Knowing such abstractions empowers material scientists to refine the simulation models. The updated models would then contain more relevant information and make failure prediction more precise.)
    • Improving SAP Document Information Extraction via Pretraining and Fine-Tuning  + (Techniques for extracting relevant information from documents have made significant progress in recent years and have become a key task in the digital transformation. With deep neural networks, it became possible to process documents without specifying hard-coded extraction rules or templates for each layout. However, such models typically have a very large number of parameters. As a result, they require many annotated samples and long training times. One solution is to create a basic pretrained model using self-supervised objectives and then to fine-tune it using a smaller document-specific annotated dataset. However, implementing and controlling the pretraining and fine-tuning procedures in a multi-modal setting is challenging. In this thesis, we propose a systematic method that consists of pretraining the model on large unlabeled data and then fine-tuning it with a virtual adversarial training procedure. For the pretraining stage, we implement an unsupervised informative masking method, which improves upon standard Masked-Language Modelling (MLM). In contrast to randomly masking tokens as in MLM, our method exploits Point-Wise Mutual Information (PMI) to calculate individual masking rates based on statistical properties of the data corpus, e.g., how often certain tokens appear together on a document page. We test our algorithm in a typical business context at SAP and report an overall improvement of 1.4% on the F1-score for extracted document entities. Additionally, we show that the implemented methods improve the training speed, robustness and data-efficiency of the algorithm.)
    • Analyse von Zeitreihen-Kompressionsmethoden am Beispiel von Google N-Grams  + (Temporal text corpora like the Google Ngram dataset usually incorporate a vast number of words and expressions, called ngrams, and their respective usage frequencies over the years. The large quantity of entries complicates working with the dataset, as transformations and queries are resource- and time-intensive. However, many use cases do not require the whole corpus to have a sufficient dataset and achieve acceptable results. We propose various compression methods to reduce the absolute number of ngrams in the corpus. Additionally, we utilize time-series compression methods for quick estimations about the properties of ngram usage frequencies. CHQL (Conceptual History Query Language) queries on the Google Ngram dataset serve as the basis for our compression method design and experimental validation. The goal is to find compression methods that reduce the complexity of queries on the corpus while still maintaining good results.)
    • Analyse von Zeitreihen-Kompressionsmethoden am Beispiel von Google N-Gram  + (Temporal text corpora like the Google Ngram Data Set usually incorporate a vast number of words and expressions, called ngrams, and their respective usage frequencies over the years. The large quantity of entries complicates working with the data set, as transformations and queries are resource- and time-intensive. However, many use cases do not require the whole corpus to have a sufficient data set and achieve acceptable query results. We propose various compression methods to reduce the total number of ngrams in the corpus. Specifically, we propose compression methods that, given an input dictionary of target words, find a compression tailored for queries on a specific topic. Additionally, we utilize time-series compression methods for quick estimations about the properties of ngram usage frequencies. CHQL (Conceptual History Query Language) queries on the Google Ngram Data Set serve as the basis for our compression method design and experimental validation.)
    • Implementation and Evaluation of CHQL Operators in Relational Database Systems  + (The IPD defined CHQL, a query algebra that makes it possible to formalize queries about conceptual history. CHQL is currently implemented in MapReduce, which offers less flexibility for query optimization than relational database systems do. The scope of this thesis is to implement the given operators in SQL and to analyze performance differences by identifying limiting factors and applying query optimization on the logical and physical levels. In the end, we will provide efficient query plans and fast operator implementations to execute CHQL queries in relational database systems.)
    • The Kconfig Variability Framework as a Feature Model  + (The Kconfig variability framework is used to develop highly variable software such as the Linux kernel, ZephyrOS and NuttX. Kconfig allows developers to break down their software into modules and define the dependencies between these modules, so that when a concrete configuration is created, the semantic dependencies between the selected modules are fulfilled, ensuring that the resulting software product can function. Kconfig has often been described as a tool to define software product lines (SPLs), which often occur within the context of feature-oriented programming (FOP). In this paper, we introduce methods to transform Kconfig files into feature models so that the semantics of the model defined in a Kconfig file are preserved. The resulting feature models can be viewed with FeatureIDE, which allows further analysis of the Kconfig file, such as the detection of redundant dependencies and cyclic dependencies.)
    • Review of data efficient dependency estimation  + (The amount and complexity of data collected in industry is increasing, and data analysis is growing in importance. Dependency estimation is a significant part of knowledge discovery and allows strategic decisions based on this information. Multiple examples highlight the importance of dependency estimation. Knowing that there is a correlation between the regular dose of a drug and the health of a patient helps to understand the impact of a newly manufactured drug. Knowing how the case material, brand, and condition of a watch influence the price on an online marketplace can help to buy watches at a good price. Material sciences can also use dependency estimation to predict many properties of a material before it is synthesized in the lab, so fewer experiments are necessary. Many dependency estimation algorithms require a large amount of data for a good estimation. But data can be expensive: experiments in material sciences, for example, consume material and take time and energy. Because data collection is expensive, algorithms need to be data efficient. But there is a trade-off between the amount of data and the quality of the estimation. With a lack of data comes uncertainty in the estimation. However, the algorithms do not always quantify this uncertainty. As a result, we do not know whether we can rely on the estimation or whether we need more data for an accurate estimation. In this bachelor's thesis we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing these challenges and more. We developed some of the criteria ourselves and took others from relevant publications. The existing publications formulated many of the criteria only qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible and to devise a systematic approach of comparison for the rest. From 14 selected criteria, we focus on criteria concerning data efficiency and uncertainty estimation, because they are essential for lowering the cost of dependency estimation, but we will also check other criteria relevant for the application of algorithms. As a result, we will rank the algorithms in the different aspects given by the criteria, and thereby identify potential for improvement of the current algorithms. We do this in two steps. First, we check general criteria in a qualitative analysis: whether the algorithm is capable of guided sampling, whether it is an anytime algorithm, and whether it uses incremental computation to enable early stopping, all of which lead to greater data efficiency. Second, we conduct a quantitative analysis on well-established and representative datasets for the dependency estimation algorithms that performed well in the qualitative analysis. In these experiments we evaluate further criteria: robustness, which is necessary for error-prone data; efficiency, which saves computation time; convergence, which guarantees that we get an accurate estimation with enough data; and consistency, which ensures we can rely on an estimation.)
    • Identifying Security Requirements in Natural Language Documents  + (The automatic identification of requirements, and their classification according to their security objectives, can be helpful to derive insights into the security of a given system. However, this task requires significant security expertise to perform. In this thesis, the capability of modern Large Language Models (such as GPT) to replicate this expertise is investigated. This requires the transfer of the model's understanding of language to the given specific task. In particular, different prompt engineering approaches are combined and compared, in order to gain insights into their effects on performance. GPT ultimately performs poorly for the main tasks of identification of requirements and of their classification according to security objectives. Conversely, the model performs well for the sub-task of classifying the security-relevance of requirements. Interestingly, prompt components influencing the format of the model's output seem to have a higher performance impact than components containing contextual information.)
    • Predicting System Dependencies from Tracing Data Instead of Computing Them  + (The concept of Artificial Intelligence for IT Operations combines big data and machine learning methods to replace a broad range of IT operations including availability and performance monitoring of services. In large-scale distributed cloud infrastructures a service is deployed on different separate nodes. As the size of the infrastructure increases in production, the analysis of metrics parameters becomes computationally expensive. We address the problem by proposing a method to predict dependencies between metrics parameters of system components instead of computing them. To predict the dependencies we use time windowing with different aggregation methods and distributed tracing data that contain detailed information for the system execution workflow. In this bachelor thesis, we inspect the different representations of distributed traces from simple counting of events to more complex graph representations. We compare them with each other and evaluate the performance of such methods.)
    • Change Detection in High Dimensional Data Streams  + (The data collected in many real-world scenarios such as environmental analysis, manufacturing, and e-commerce are high-dimensional and come as a stream, i.e., data properties evolve over time, a phenomenon known as "concept drift". This brings numerous challenges: data-driven models become outdated, and one is typically interested in detecting specific events, e.g., the critical wear and tear of industrial machines. Hence, it is crucial to detect change, i.e., concept drift, to design a reliable and adaptive predictive system for streaming data. However, existing techniques can only detect "when" a drift occurs and neglect the fact that various drifts may occur in different dimensions, i.e., they do not detect "where" a drift occurs. This is particularly problematic when data streams are high-dimensional. The goal of this Master’s thesis is to develop and evaluate a framework to efficiently and effectively detect “when” and “where” concept drift occurs in high-dimensional data streams. We introduce stream autoencoder windowing (SAW), an approach based on the online training of an autoencoder, while monitoring its reconstruction error via a sliding window of adaptive size. We will evaluate the performance of our method against synthetic data, in which the characteristics of drifts are known. We then show how our method improves the accuracy of existing classifiers for predictive systems compared to benchmarks on real data streams.)
    • Automated Test Selection for CI Feedback on Model Transformation Evolution  + (The development of the transformation model is accompanied by system-level testing to verify its changes. Due to the complex nature of the transformation model, the number of tests increases as the structure and feature description become more detailed. However, executing all test cases for every change is costly and time-consuming. Thus, it is necessary to select which transformation tests to run. This presentation introduces a change-based test prioritization and transformation test selection approach for early fault detection.)
    • Statistical Generation of High Dimensional Data Streams with Complex Dependencies  + (The evaluation of data stream mining algorithms is an important task in current research. The lack of a ground truth data corpus that covers a large number of desirable features (especially concept drift and outlier placement) is the reason why researchers resort to producing their own synthetic data. This thesis proposes a novel framework ("streamgenerator") that allows the creation of data streams with finely controlled characteristics. The focus of this work is the conceptualization of the framework; however, a prototypical implementation is provided as well. We evaluate the framework by testing our data streams against state-of-the-art dependency measures and outlier detection algorithms.)
    • Statistical Generation of High-Dimensional Data Streams with Complex Dependencies  + (The extraction of knowledge from data streams is one of the most crucial tasks of modern-day data science. Due to their nature, data streams are ever evolving, and knowledge derived at one point in time may be obsolete in the next period. The need for specialized algorithms that can deal with high-dimensional data streams and concept drift is prevalent. A lot of research has gone into creating these kinds of algorithms. The problem here is the lack of data sets with which to evaluate them. A ground truth for a common evaluation approach is missing. A solution to this could be the synthetic generation of data streams with controllable statistical properties, such as the placement of outliers and the subspaces in which special kinds of dependencies occur. The goal of this Bachelor thesis is the conceptualization and implementation of a framework which can create high-dimensional data streams with complex dependencies.)
    • Theory-guided Load Disaggregation in an Industrial Environment  + (The goal of Load Disaggregation (or Non-intrusive Load Monitoring) is to infer the energy consumption of individual appliances from their aggregated consumption. This facilitates energy savings and efficient energy management, especially in the industrial sector. However, previous research showed that Load Disaggregation underperforms in the industrial setting compared to the household setting. Also, the domain knowledge available about industrial processes remains unused. The objective of this thesis was to improve load disaggregation algorithms by incorporating domain knowledge in an industrial setting. First, we identified and formalized several domain knowledge types that exist in the industry. Then, we proposed various ways to incorporate them into the Load Disaggregation algorithms, including Theory-Guided Ensembling, Theory-Guided Postprocessing, and Theory-Guided Architecture. Finally, we implemented and evaluated the proposed methods.)
    • Tuning of Explainable Artificial Intelligence (XAI) tools in the field of text analysis  + (The goal of this bachelor thesis was to analyse classification results using SHAP, a method published in 2017. Explaining how an artificial neural network makes a decision is an interdisciplinary research subject combining computer science, math, psychology and philosophy. We analysed these explanations from a psychological standpoint, and after presenting our findings we will propose a method to improve the interpretability of text explanations using text hierarchies without losing much, if any, accuracy. Secondly, the goal was to test a framework developed to analyse a multitude of explanation methods. This framework will be presented alongside our findings, together with how to use it to create your own analysis. This Bachelor thesis is addressed to people familiar with artificial neural networks and other machine learning methods.)
    • Specifying and Maintaining the Correspondence between Architecture Models and Runtime Observations  + (The goal of this thesis is to provide a generic concept of a correspondence model (CM) to map high-level model elements to corresponding low-level model elements and to generate this mapping during implementation of the high-level model using a correspondence model generator (CMG). In order to evaluate our approach, we implement and integrate the CM for the iObserve project. Further, we implement the proposed CMG and integrate it into ProtoCom, the source code generator used by the iObserve project. We first evaluate the feasibility of this approach by checking whether such a correspondence model can be specified as desired and generated by the CMG. Secondly, we evaluate the accuracy of the approach by checking the generated correspondences against a reference model.)
    • Intelligent Match Merging to Prevent Obfuscation Attacks on Software Plagiarism Detectors  + (The increasing number of computer science students has prompted educators to rely on state-of-the-art source code plagiarism detection tools to deter the submission of plagiarized coding assignments. While these token-based plagiarism detectors are inherently resilient against simple obfuscation attempts, recent research has shown that obfuscation tools empower students to easily modify their submissions, thus evading detection. These tools automatically use dead code insertion and statement reordering to avoid discovery. The emergence of ChatGPT has further raised concerns about its obfuscation capabilities and the need for effective mitigation strategies. Existing defence mechanisms against obfuscation attempts are often limited by their specificity to certain attacks or dependence on programming languages, requiring tedious and error-prone reimplementation. In response to this challenge, this thesis introduces a novel defence mechanism against automatic obfuscation attacks called match merging. It leverages the fact that obfuscation attacks change the token sequence to split up matches between two submissions so that the plagiarism detector discards the broken matches. Match merging reverts the effects of these attacks by intelligently merging neighboring matches based on a heuristic designed to minimize false positives. Our method’s resilience against classic obfuscation attacks is demonstrated through evaluations on diverse real-world datasets, including undergrad assignments and competitive coding challenges, across six different attack scenarios. Moreover, it significantly improves detection performance against AI-based obfuscation. What sets our method apart is its language- and attack-independence, while its minimal runtime overhead makes it seamlessly compatible with other defence mechanisms.)
    • Efficient k-NN Search of Time Series in Arbitrary Time Intervals  + (The k nearest neighbors (k-NN) of a time series are the k closest sequences within a dataset with respect to a distance measure. Often, not the entire time series, but only specific time intervals are of interest, e.g., to examine phenomena around special events. While numerous indexing techniques support the k-NN search of time series, none of them is designed for an efficient interval-based search. This work presents the novel index structure Time Series Envelopes Index Tree (TSEIT), which significantly speeds up the k-NN search of time series in arbitrary user-defined time intervals.)
    • Reinforcement Learning for Solving the Knight’s Tour Problem  + (The knight’s tour problem is an instance of the Hamiltonian path problem, which is a typical NP-hard problem. A knight makes L-shaped moves on a chessboard and tries to visit every square exactly once. The tour is closed if the knight completes the tour and ends on a square that is one knight’s move from its starting square; otherwise, it is open. Many algorithms and heuristics have been proposed to solve this problem. The most well-known one is Warnsdorff’s heuristic. Warnsdorff’s idea is to move, in a greedy fashion, to the square with the fewest possible onward moves. Although this heuristic is fast, it does not always return a closed tour. Also, it only works on boards of certain dimensions. Due to its greedy behaviour, it can easily get stuck in a local optimum. The same holds for the other existing approaches. Our goal in this thesis is to come up with a new strategy based on reinforcement learning. Ideally, it should be able to find a closed tour on chessboards of any size. We will consider several approaches: value-based methods, policy optimization and actor-critic methods. Compared to previous work, our approach is non-deterministic and sees the problem as a single-player game with a tradeoff between exploration and exploitation. We will evaluate the effectiveness and efficiency of the existing methods and new heuristics.)
    • Discovering data-driven Explanations  + (The main goal of knowledge discovery is to increase knowledge using a given set of data. In many cases it is crucial that results are human-comprehensible. Subdividing the feature space into boxes with unique characteristics is a commonly used approach for achieving this goal. The patient-rule-induction method (PRIM) extracts such "interesting" hyperboxes from a dataset by generating boxes that maximize some class occurrence inside them. However, the quality of the results varies when applied to small datasets. This work will examine to what extent data generators can be used to artificially increase the amount of available data in order to improve the accuracy of the results. Secondly, it will be tested whether probabilistic classification can improve the results when using generated data.)
    • Conception and Design of Privacy-preserving Software Architecture Templates  + (The passing of new regulations like the European GDPR has made clear that in the future it will be necessary to build privacy-preserving systems that protect the personal data of their users. This thesis will introduce the concept of privacy templates to help software designers and architects in this matter. Privacy templates are at their core similar to design patterns and provide reusable and general architectural structures which can be used in the design of systems to improve privacy in early stages of design. In this thesis we will conceptualize a small collection of privacy templates to make it easier to design privacy-preserving software systems. Furthermore, the privacy templates will be categorized and evaluated to classify them and assess their quality across different quality dimensions.)
    • Modellierung und Verifikation von Mehrgüterauktionen als Workflows am Beispiel eines Auktionsdesigns  + (The presentation will be in English. The goal of this thesis was to develop a system for verifying multi-item auctions as workflows, using one auction design as an example. Building on various preliminary work, this thesis modeled the Clock-Proxy auction design as a workflow and prepared it for verification with process verification methods. A variety of analysis approaches for auction designs already exists, but they are ultimately based on models that allow little variation. For more complex auction procedures, such as the multi-item auctions considered in this thesis, these approaches do not offer satisfactory options. Based on the existing methods, an approach was developed that focuses on the data-centric extension of the modeling and of the verification approaches. In a first step, the rules and data were therefore integrated into the workflow model. The challenge was to extract the control and data flow as well as the data and rules from the workflow model by means of an algorithm and to sufficiently extend existing transformation algorithms. The evaluation of the approach shows that the work, with the software developed, achieved the overall goal of verifying a workflow against properties.)
    • Measuring the Privacy Loss with Smart Meters  + (The rapid growth of renewable energy sources and the increased sales of electric vehicles contribute to a more volatile power grid. Energy suppliers rely on data to predict the demand and to manage the grid accordingly. The rollout of smart meters could provide the necessary data. On the other hand, smart meters can leak sensitive information about the customer. Several solutions were proposed to mitigate this problem. Some depend on privacy measures to calculate the degree of privacy one could expect from a solution. This bachelor thesis constructs a set of experiments which help to analyse some privacy measures and thereby determine whether the value of a privacy measure increases or decreases with an increase in privacy.)
    • Standardized Real-World Change Detection Data  + (The reliable detection of change points is a fundamental task when analysing data across many fields, e.g., in finance, bioinformatics, and medicine. To define “change points”, we assume that there is a distribution, which may change over time, generating the data we observe. A change point then is a change in this underlying distribution, i.e., the distribution coming before a change point is different from the distribution coming after. The principled way to compare distributions, and to find change points, is to employ statistical tests. While change point detection is an unsupervised problem in practice, i.e., the data is unlabelled, the development and evaluation of data analysis algorithms requires labelled data. Only a few labelled real-world data sets are publicly available, and many of them are either too small or have ambiguous labels. Further issues are that reusing data sets may lead to overfitting, and preprocessing (e.g., removing outliers) may manipulate results. To address these issues, van den Burg et al. publish 37 data sets annotated by data scientists and ML researchers and use them for an assessment of 14 change detection algorithms. Yet, there remain concerns due to the fact that these are labelled by hand: Can humans correctly identify changes according to the definition, and can they be consistent in doing so? The goal of this Bachelor's thesis is to algorithmically label their data sets following the formal definition and to also identify and label larger and higher-dimensional data sets, thereby extending their work. To this end, we leverage a non-parametric hypothesis test which builds on Maximum Mean Discrepancy (MMD) as a test statistic, i.e., we identify changes in a principled way. We will analyse the labels thus obtained and compare them to the human annotations, measuring their consistency with the F1 score. To assess the influence of the algorithmic, definition-conformant annotations, we will use them to reevaluate the algorithms of van den Burg et al. and compare the respective performances.)
    • Standardized Real-World Change Detection Data Defense  + (The reliable detection of change points is a fundamental task when analyzing data across many fields, e.g., in finance, bioinformatics, and medicine. To define “change points”, we assume that there is a distribution, which may change over time, generating the data we observe. A change point then is a change in this underlying distribution, i.e., the distribution coming before a change point is different from the distribution coming after. The principled way to compare distributions, and thus to find change points, is to employ statistical tests. While change point detection is an unsupervised problem in practice, i.e., the data is unlabeled, the development and evaluation of data analysis algorithms requires labeled data. Only a few labeled real-world data sets are publicly available, and many of them are either too small or have ambiguous labels. Further issues are that reusing data sets may lead to overfitting, and preprocessing may manipulate results. To address these issues, van den Burg et al. publish 37 data sets annotated by data scientists and ML researchers and assess 14 change detection algorithms on them. Yet, there remain concerns due to the fact that these are labeled by hand: Can humans correctly identify changes according to the definition, and can they be consistent in doing so?)
    • Assessing Word Similarity Metrics For Traceability Link Recovery  + (The software development process usually involves different artifacts that each describe different parts of the whole software system. Traceability Link Recovery is a technique that aids the development process by establishing relationships between related parts from different artifacts. Artifacts that are expressed in natural language are more difficult for machines to understand and therefore pose a challenge to this link recovery process. A common approach to link elements from different artifacts is to identify similar words using word similarity measures. ArDoCo is a tool that uses word similarity measures to recover trace links between natural language software architecture documentation and formal architectural models. This thesis assesses the effect of different word similarity measures on ArDoCo. The measures are evaluated using multiple case studies. Precision, recall, and encountered challenges for the different measures are reported as part of the evaluation.)
    • Feedback Mechanisms for Smart Systems  + (The talk will be held remotely from Zurich at https://global.gotomeeting.com/join/935923965 and will be streamed to room 348. You can attend via GotoMeeting or in person in room 348. Feedback mechanisms have not yet been sufficiently researched in the context of smart systems. From both the research and the industrial perspective, this motivates investigations into how users could be supported in providing appropriate feedback in the context of smart systems. A key challenge in providing such means of feedback in the smart system context might be to understand and consider the needs of smart system users for communicating their feedback. Thesis Goal: The goal of this thesis is the creation of innovative feedback mechanisms that are tailored to a specific context within the domain of smart systems. Already existing feedback mechanisms for software in general and smart systems in particular will be assessed, and the users' needs regarding those mechanisms will be examined. Based on this, improved feedback mechanisms will be developed, either by improving on existing ones or by inventing and implementing new concepts. The overall aim of these innovative feedback mechanisms is to enable smart system users to effectively and efficiently give feedback in the context of smart systems.)
    • Flexible User-Friendly Trip Planning Queries  + (The users of location-based services often want to find short routes that pass through multiple Points-of-Interest (PoIs); consequently, developing trip planning queries that can find the shortest routes that pass through user-specified categories has attracted considerable attention. If multiple PoI categories, e.g., restaurant and shopping mall, are in an ordered list (i.e., a category sequence), the trip planning query searches for a sequenced route that passes PoIs that match the user-specified categories in order. Existing approaches find the shortest route based on the user query. A major problem with the existing approaches is that they only take the order of PoIs and output the routes which match the sequence perfectly. However, users who are interested in applying more constraints, like considering the hierarchy of the PoIs and the relationship among sequence points, cannot express their wishes in the form of user queries. The example below illustrates the problem: Example: A user is interested in visiting three department stores (DS) but she needs to have some food after each visit. It is important for the user to visit three different department stores, but the restaurants could be the same. How could the user express her needs to a trip planning system? The topic of this bachelor thesis is to design such a language for trip planning systems which enables the user to express her needs in the form of user queries in a flexible manner.)
    • Development and evaluation of efficient kNN search of time series subsequences using the example of the Google Ngram data set  + (There are many data structures and indices that speed up kNN queries on time series. The existing indices are designed to work on the full time series only. In this thesis we develop a data structure that allows speeding up kNN queries in an arbitrary time range, i.e. for an arbitrary subsequence.)
    • Evaluation of a Reverse Engineering Approach in the Context of Component-Based Software Systems  + (This thesis aims to evaluate the component architecture generated by component-based software systems after reverse engineering. The evaluation method involves performing a manual analysis of the respective software systems and then comparing the component architecture obtained through the manual analysis with the results of reverse engineering. The goal is to evaluate a number of parameters, with a focus on correctness, related to the results of reverse engineering. This thesis presents the specific steps and considerations involved in manual analysis. It will also perform manual analysis on selected software systems that have already undergone reverse engineering analysis and compare the results to evaluate the differences between reverse engineering and ground truth. In summary, this paper evaluates the accuracy of reverse engineering by contrasting manual analysis with reverse engineering in the analysis of software systems, and provides some direction and support for the future development of reverse engineering.)
    • Blueprint for the Transition from Static to Dynamic Deployment  + (This thesis defines a blueprint describing a successful ad-hoc deployment with generally applicable rules, thus providing a basis for further developments. The blueprint itself is based on the experience of developing a Continuous Deployment system, the subsequent tests and the continuous user feedback. In order to evaluate the blueprint, the blueprint-based dynamic system was compared with the previously static deployment and a user survey was conducted. The result of the study shows that the rules described in the blueprint have far-reaching consequences and generate an additional value for the users during deployment.)
    • Developing a Database Application to Compare the Google Books Ngram Corpus to German News Corpora  + (This thesis focuses on the development of a database application that enables a comparative analysis between the Google Books Ngram Corpus (GBNC) and German news corpora. The GBNC provides a vast collection of books spanning various time periods, while the German news corpora encompass up-to-date linguistic data from news sources. Such a comparison aims to uncover insights into language usage patterns, linguistic evolution, and cultural shifts within the German language. Extracting meaningful insights from the compared corpora requires various linguistic metrics, statistical analyses and visualization techniques. By identifying patterns, trends and linguistic changes we can uncover valuable information on language usage evolution over time. This thesis provides a comprehensive framework for comparing the GBNC to other corpora, showcasing the development of a database application that not only enables valuable linguistic analyses but also sheds light on the composition of the GBNC by highlighting linguistic similarities and differences.)
    • Feature-Based Time Series Generation  + (To build highly accurate and robust machine learning algorithms, practitioners require data in high quality, quantity and diversity. Available time series data sets often lack at least one of these attributes. In cases where collecting more data is not possible or too expensive, data-generating methods help to extend existing data. Generation methods are challenged to add diversity to existing data while providing control to the user over what type of data is generated. Modern methods only address one of these challenges. In this thesis we propose a novel generation algorithm that relies on characteristics of time series to enable control over the generation process. We combine classic interpretable features with unsupervised representation learning by modern neural network architectures. Further, we propose a measure and visualization for diversity in time series data sets. We show that our approach can create a controlled set of time series as well as add diversity by recombining characteristics across available instances.)
    • Assessing Human Understanding of Machine Learning Models  + (To deploy an ML model in practice, a stakeholder needs to understand the behaviour and implications of this model. To help stakeholders develop this understanding, researchers propose a variety of technical approaches, so-called eXplainable Artificial Intelligence (XAI). Current XAI approaches follow very task- or model-specific objectives. There is currently no consensus on a generic method to evaluate most of these technical solutions. This complicates comparing different XAI approaches and choosing an appropriate solution in practice. To address this problem, we formally define two generic experiments to measure human understanding of ML models. From these definitions we derive two technical strategies to improve understanding, namely (1) training a surrogate model and (2) translating inputs and outputs to effectively perceivable features. We think that most existing XAI approaches only focus on the first strategy. Moreover, we show that established methods to train ML models can also help stakeholders to better understand ML models. In particular, they help to mitigate cognitive biases. In a case study, we demonstrate that our experiments are practically feasible and useful. We suggest that future research on XAI should use our experiments as a template to design and evaluate technical solutions that actually improve human understanding.)
    • Cost-Efficient Evaluation of ML Classifiers With Feature Attribution Annotations (Final BA Presentation)  + (To evaluate the loss of cognitive ML models, e.g., text or image classifiers, accurately, one usually needs a lot of test data which are annotated manually by experts. In order to estimate accurately, the test data should be representative, or else it would be hard to assess whether a model overfits, i.e., whether it relies significantly on spurious features of the images to decide on its predictions. With techniques such as Feature Attribution, one can compare the important features that the model sees with one's own expectations and can therefore be more confident about whether or not to trust the model. In this work, we propose a method that estimates the loss of image classifiers based on Feature-Attribution techniques. We use the classic approach for loss estimation as our benchmark to evaluate our proposed method. Our analysis reveals that our proposed method seems to yield a loss estimate similar to that of the classic approach, given a good image classifier and representative test data. Based on our experiment, we expect that our proposed method could give a better loss estimate than the classic approach in cases where one has biased test data and an image classifier which overfits.)
    • Integrated Reliability Analysis of Business Processes and Information Systems  + (Today it is hardly possible to find a business process (BP) that does not involve working with an information system (IS). In order to better plan and improve such BPs, a lot of research has been done on modeling and analysis of BPs. Given the dependency between BPs and IS, such an assessment of BPs should take the IS into account. Furthermore, most assessments of BPs take only the functionality into account, but not the so-called non-functional requirements (NFR). This is not adequate, since NFRs influence BPs just as they influence IS. In particular, the NFR reliability is interesting for planning of BPs in business environments. Therefore, the presented approach provides an integrated reliability analysis of BPs and IS. The proposed analysis takes humans, device resources and the impact from the IS into account. In order to model reliability information, it has to be determined which metrics will be used for each BP element. Thus a structured literature search on reliability modeling and analysis is conducted in seven resources. Through the structured search, 40 papers on modeling and analysis of BP reliability were found. Ten of them were classified as relevant for the topic. The structured search revealed that no approach allows for modeling the reliability of activities and resources separately from each other. Moreover, there is no common answer on how to model human resources in BPs. In order to enable such an integrated approach, the reliability information of BPs is modeled as an extension of the IntBIIS approach. BP actions get a failure probability and the resources are extended with two reliability-related attributes. For device resources, the commonly used MTTF and MTTR are added in order to provide reliability information. Roles that are associated with actor resources are annotated with MTTF and a newly developed MTTRepl. The next step is a reliability analysis of a BP including the IS. Markov chains and reduction rules are used to analyze the BP reliability. This approach is implemented as an example in Java in the context of PCM, which already provides analysis for IS. The result of the analysis is the probability of successful execution of the BP including the IS. An evaluation of the implemented analysis shows that it is possible to analyze the reliability of a BP including all resources and the involved IS. The results show that the reliability prediction is more accurate when BP and IS are assessed through a combined analysis.)
    • Deriving Twitter Based Time Series Data for Correlation Analysis  + (Twitter has been identified as a relevant data source for modelling purposes in the last decade. In this work, our goal was to model the conversational dynamics of inflation development in Germany through Twitter data mining. To accomplish this, we summarized and compared Twitter data mining techniques for time series data from pertinent research. Then, we constructed five models for generating time series from topic-related tweets and user profiles of the last 15 years. Evaluating the models, we observed that several approaches, like modelling for user impact or adjusting for automated Twitter accounts, show promise. Yet, in the scenario of modelling inflation expectation dynamics, these more complex models could not contribute to a higher correlation between German CPI and the resulting time series compared to a baseline approach.)
    • Entwurf und Umsetzung von Zugriffskontrolle in der Sichtenbasierten Entwicklung  + (To cope with the increasing complexity of technical systems, view-based development processes are used in their development. The views defined in such a process show only the data about the system that is relevant to a particular information need, such as the architecture, the implementation, or an excerpt thereof. They thereby reduce the amount of information and simplify working with the system. Besides information reduction, restricting access because of missing access permissions can also be necessary. This need arises, for example, in cross-organizational collaboration to implement contractual agreements. Restricting access requires access control. Existing work uses access control to generate a view. Defining further views on top of that view is not supported. In addition, a general consideration of integrating access control into a view-based development process is missing. In this thesis, we therefore present a concept for integrating role-based access control into a view-based development process for arbitrary systems. The concept enables the fine-grained definition and evaluation of access rights for individual model elements for arbitrary metamodels. We implement the concept as a prototype in Vitruv, a framework for view-based development. We evaluate this prototype with respect to its functionality using case studies. We were able to apply the access control successfully in several case studies. We also discuss how the prototype can be integrated into a general view-based development process.)
    • Konzept eines Dokumentationsassistenten zur Erzeugung strukturierter Anforderungen basierend auf Satzschablonen  + (Maintaining the quality and credibility of a product requires systematic requirements management, in which the characteristics of a product are described by requirements. This thesis therefore developed a concept for a documentation assistant with which users can create structured requirements based on the SOPHIST sentence templates. The concept includes a linguistic preprocessing approach that extracts semantic roles from free text. During the documentation process, the semantic roles were used to identify the best-fitting sentence template and to show it to the user as guidance. A further aid was offered as well, namely autocompletion, which can predict the next word using Markov chains. In total, about 500 requirements from various sources were used to assess the integrity of the concept. The classification of text input into a sentence template achieves an F1 score of 0.559. The sentence template for functional requirements was identified best, with an F1 score of 0.908. In addition, the relationship between the aids was assessed in a workshop. It was shown that applying the concept improves the completeness of requirements and thus increases the quality of the requirements being documented.)
    • Überführen von Systemarchitekturmodellen in die datenschutzrechtliche Domäne durch Anwenden der DSGVO  + (To protect the personal data that is ubiquitous in the digital space from misuse, the EU introduced the General Data Protection Regulation (GDPR, German: DSGVO). All companies that handle personal data in the digital space must comply with it. Implementing it in software systems, however, proves laborious because the legal domain is involved. In this bachelor's thesis, a transformation from Palladio into a GDPR model was therefore developed to facilitate communication between the different disciplines.)
    • Kritische Workflows in der Fertigungsindustrie  + (To identify possible inconsistencies between technical models and the workflows that cause them in the manufacturing industry, the entire manufacturing process of an exemplary precision manufacturer was divided into individual workflows. Nine expert interviews were then conducted to identify possible inconsistencies between technical models and to assign them to the workflows that cause them. In total, 13 possible inconsistencies were presented and their respective origins explained. In a second interview iteration, the company's experts were asked again about each previously identified inconsistency in order to learn the estimated probabilities of occurrence of the inconsistencies and their possible effects on preceding or subsequent workflows.)
    • Modellierung von Annahmen in Softwarearchitekturen  + (Undocumented security assumptions can lead to software vulnerabilities being neglected, because the responsibility for and the reference points of security assumptions are often unclear. The goal of this thesis is therefore to integrate security assumptions into component-based design. Based on expert interviews and Constructive Grounded Theory, a model for this purpose was derived. A feasibility study demonstrates the use of the assumption model.)
    • Komplexe Abbildungen von Formularelementen zur Generierung von aktiven Ontologien  + (Our lives today are increasingly made easier by assistance systems. These include intelligent voice assistants such as Apple's Siri, which are used more and more frequently. Instead of the user tediously comparing flights on various internet portals, a voice assistant can do the same work. To process information and forward it to the appropriate web service, the assistance system must be able to understand natural language and represent it formally. For this purpose, Siri uses active ontologies (AOs), which currently have to be created manually with great effort. The framework architecture EASIER, developed at KIT, deals with the automatic generation of active ontologies from web forms. One challenge in creating AOs from web forms is matching differently realized form elements that have the same semantics, since concepts that are semantically identical but realized differently should be merged into one AO node. It is therefore necessary to be able to identify semantically similar form elements. This thesis deals with the automatic identification of such similarities and the construction of mappings between form elements.)
    • Erstellung eines Benchmarks zum Anfragen temporaler Textkorpora zur Untersuchung der Begriffsgeschichte und historischen Semantik  + (Studies in conceptual history (Begriffsgeschichte) are experiencing an upswing. New technological capabilities make it possible to search large amounts of text for important passages with machine support. To this end, the methodological working practices of historians and linguists were examined in order to satisfy their information needs as well as possible. On this basis, new query operators were developed and presented, in combination with existing operators, in a functional benchmark. In particular, a query language offers the parameterizability needed to support the variable way in which historians work.)
    • Detecting Outlying Time-Series with Global Alignment Kernels  + (Using outlier detection algorithms, e.g., Support Vector Data Description (SVDD), for detecting outlying time-series usually requires extracting domain-specific attributes. However, this indirect way needs expert knowledge, making SVDD impractical for many real-world use cases. Incorporating "Global Alignment Kernels" directly into SVDD to compute the distance between time-series data bypasses the attribute-extraction step and makes the application of SVDD independent of the underlying domain. In this work, we propose a new time-series outlier detection algorithm, combining "Global Alignment Kernels" and SVDD. Its outlier detection capabilities will be evaluated on synthetic data as well as on real-world data sets. Additionally, our approach's performance will be compared to state-of-the-art methods for outlier detection, especially with regard to the types of detected outliers.)
    • Efficient Verification of Data-Value-Aware Process Models  + (Verification methods detect unexpected behavior of business process models before their execution. In many process models, verification depends on data values. A data value is a value in the domain of a data object, e.g., $1000 as the price of a product. However, verification of process models with data values often leads to state-space explosion. This problem is more serious when the domain of data objects is large. Existing works that tackle this problem often abstract the domain of data objects. However, the abstraction may lead to a wrong diagnosis when process elements modify the value of data objects. In this thesis, we provide a novel approach to enable verification of process models with data values, so-called data-value-aware process models. A distinctive feature of our approach is that it supports modification of data values while preserving the verification results. We show the functionality of our approach by conducting the verification of a real-world application: the German 4G spectrum auction model.)
    • On the Interpretability of Anomaly Detection via Neural Networks  + (Verifying anomaly detection results when working on an unsupervised use case is challenging. For large datasets, manual labelling is economically infeasible. In this thesis we create explanations to help verify and understand the detected anomalies. We develop a rule generation algorithm that describes frequent patterns in the output of autoencoders. The number of rules is significantly lower than the number of anomalies. Thus, finding explanations for these rules takes much less effort than finding explanations for every single anomaly. The performance of the approach is evaluated on a real-world use case, where we achieve a significant reduction of the effort required for domain experts to understand the detected anomalies, but cannot quantify the usefulness in exact numbers due to the missing labels. Therefore, we also evaluate the approach on benchmark data.)
    • A Mobility Case Study Framework for Validating Uncertainty Impact Analyses regarding Confidentiality  + (Confidentiality is an important security requirement for information systems. Even in early design, uncertainties exist, both about the system and about its environment, that can affect confidentiality. There are approaches that support software architects in examining uncertainties and their impact on confidentiality and thus reduce the effort involved. However, these approaches have not yet been evaluated extensively. A uniform procedure is important in an evaluation in order to obtain consistent results. Although there is general work in this area, it is not specific enough to meet this requirement. In this thesis, we present a framework intended to close this gap. The framework consists of an investigation process and a case study protocol, which are meant to help researchers conduct further case studies for validating uncertainty impact analyses in a structured way and thereby also investigate uncertainties and their impact on confidentiality. We evaluate our approach by conducting a mobility case study.)
    • Erhaltung des Endanwenderflows in PREEvision durch asynchrone Job-Verarbeitung  + (Many model-driven development environments follow a purely sequential approach. Model transformations are executed sequentially, and only one model transformation may run at any time. On correspondingly large amounts of data, however, this leads to some limitations. Users may have to wait several minutes or even hours for a model transformation to finish, and the software is unavailable for user input during that time, even if the model transformation accesses only part of the model. This can interrupt user flow, a mental state in which the user is productive and which is perceived as rewarding. One way to minimize the risk of interrupting user flow is to shorten the user's waiting time by executing model transformations asynchronously in the background. The user can then continue working, with some restrictions, while the model transformation is carried out. In the context of model-driven software development, there is little research on concurrency. Although there are some efforts to parallelize model transformations, there is no research on executing model transformations asynchronously so that further model transformations can be carried out simultaneously. Using PREEvision, a software product by Vector Informatik GmbH that was developed in a model-driven manner, as an example, this thesis presents mechanisms and possible implementations with which simultaneous model transformations can be realized. For four operations in PREEvision, it also describes by way of example how the operations can be modified with the presented mechanisms so that they are executed asynchronously. The prototypes of the described modifications are then evaluated with regard to the interruption of user flow and to correctness. Finally, the thesis draws a conclusion about the applicability of the presented mechanisms and about whether the prototypes make the user wait on wait dialogs less often.)
    • Ein Ansatz zur Wiederherstellung von Nachverfolgbarkeitsverbindungen für natürlichsprachliche Softwaredokumentation und Quelltext  + (Maintainability plays a central role in the longevity of software projects. An important part of maintainability is that the natural-language documentation of the source code gives good insight into the project and its source code. To improve the maintainability of these two software artifacts, the task of this thesis is to establish links between the elements of the two artifacts. These links are called trace links and can be used for various maintainability purposes. For example, trace links enable inconsistency detection between the two software artifacts and can also be used for various analyses. To recover these trace links after the fact from the two software artifacts, natural-language documentation and source code, the existing ArDoCo framework is used and extended to the software artifact source code. ArDoCo's existing decision criteria are also adapted to the new context. The new context poses challenges regarding the amount of data, which are addressed by new decision criteria. The results of this thesis clearly show potential, which is why further work should build on them.)
    • Uncertainty-aware Confidentiality Analysis Using Architectural Variations  + (When software systems are examined for confidentiality violations, uncertainties lead to incorrect statements about the architecture. Statements about confidentiality can hardly be made at design time without addressing these uncertainties. We develop a combination algorithm that takes information about the uncertainties into account when analyzing the architecture scenarios and derives from it a statement about the confidentiality of the system. We evaluate whether it is possible to assess a system in a non-binary way using additional information, how accurate the combination algorithm is, and whether the additional information remains minimal enough that a software architect can actually use the combination algorithm.)
    • Surrogate models for crystal plasticity - predicting stress, strain and dislocation density over time  + (When engineers design structures, prior knowledge of how they will react to external forces is crucial. Applied forces introduce stress, leading to dislocations of individual molecules that ultimately may cause material failure, like cracks, if the internal strain of the material exceeds a certain threshold. We can observe this by applying increasing physical forces to a structure and measuring the stress, strain, and dislocation density curves. Finite Element Analysis (FEM) enables the simulation of a material deforming under external forces, but it comes with very high computational costs. This makes it infeasible to conduct a large number of simulations with varying parameters. In this thesis, we use neural-network-based sequence models to build a data-driven surrogate model that predicts the stress, strain, and dislocation density curves produced by an FEM simulation based on the simulation’s input parameters.)
    • Context Generation for Code and Architecture Changes Using Large Language Models  + (While large language models have succeeded in generating code, they struggle to modify large existing code bases. The Generated Code Alteration (GCA) process is designed, implemented, and evaluated in this thesis. The GCA process can automatically modify a large existing code base, given a natural language task. Different variations and instantiations of the process are evaluated in an industrial case study. The code generated by the GCA process is compared to code written by human developers. The language-model-based GCA process was able to generate 13.3 lines per error, while the human baseline generated 65.8 lines per error. While the generated code did not match the overall human performance in modifying large code bases, it could still provide assistance to human developers.)
    • Generating Causal Domain Knowledge for Cloud Systems Monitoring  + (While standard machine learning approaches rely solely on data to learn relevant patterns, in certain fields this may not be sufficient. Researchers in the healthcare domain have successfully applied causal domain knowledge to improve the prediction quality of machine learning models, especially for rare diseases. The causal domain knowledge informs the machine learning model about similar diseases, thus improving the quality of the predictions. However, some domains, such as Cloud Systems Monitoring, lack readily available causal domain knowledge, and thus the knowledge must be approximated. Therefore, it is important to systematically investigate the processes and design decisions that affect the knowledge generation process. In this study, we showed how causal discovery algorithms can be employed to generate causal domain knowledge from raw textual logs in the Cloud Systems Monitoring domain. We also investigated the impact of various design choices on the domain knowledge generation process through systematic testing across multiple datasets and shared the insights we gained. To our knowledge, this is the first time such an investigation has been conducted.)
    • Model-Based Rule Engine for the Reconstruction of Component-Based Software Architectures for Quality Prediction  + (With architecture models, software developers and architects can enhance their documentation and communication, perform architecture analyses, make design decisions and, with PCM, carry out quality predictions. However, the manual creation of component architecture models for complex systems is difficult and time-consuming. Generating architecture models automatically from existing projects instead saves time and effort. For this purpose, a new approach is proposed that uses technology-specific rule artifacts and a rule engine that transforms the source code of software projects into a model representation, applies the given rules, and then automatically generates a static software architecture model. The resulting architecture model can then be used for quality prediction within the PCM context. The concepts for this approach are presented, and a software system is developed that can easily be extended with new rule artifacts to be useful for a broader range of technologies used in different projects. With the implementation of a prototype, the collection of technology-specific rule sets, and an evaluation including different reference systems, the proposed functionality is demonstrated and a solid foundation for future improvements is provided.)
    • Enabling the Collaborative Collection of Uncertainty Sources Regarding Confidentiality  + (With digitalization in progress, the amount of sensitive data stored in software systems is increasing. However, the confidentiality of this data often cannot be guaranteed, as uncertainties with an impact on confidentiality exist, especially in the early stages of software development. As the consideration of uncertainties regarding confidentiality is still novel, there is a lack of awareness of the topic among software architects. Additionally, the existing knowledge is scattered among researchers and institutions, making it challenging for software architects to comprehend and utilize. Current research on uncertainties regarding confidentiality has focused on analyzing software systems to assess the possibilities of confidentiality violations, as well as on developing methods to classify uncertainties. However, these approaches are limited to the uncertainties the researchers observed, which restricts the generalizability of classification systems, the validity of analysis results, and the development of mitigation strategies. This thesis presents an approach to enable the collection and management of knowledge on uncertainties regarding confidentiality, enabling software architects to better comprehend and identify such uncertainties. Furthermore, the proposed approach strives to enable collaboration between researchers and practitioners to manage the effort of collecting and maintaining the knowledge. To validate this approach, a prototype was developed and evaluated in a user study with 17 participants from software engineering, including 7 students, 5 researchers, and 5 practitioners. Results show that the approach can support software architects in identifying and describing uncertainties regarding confidentiality, even with limited prior knowledge, as they could identify and describe uncertainties correctly in a close-to-real-world scenario in 94.4% of the cases.)
    • Automated Classification of Software Engineering Papers along Content Facets  + (With existing search strategies, specific paper contents can only be searched indirectly. Keywords are used to describe the searched content as accurately as possible, but many of the results are not related to what was searched for. A classification of these contents, if automated, could extend the search process, allow the searched content to be specified directly, and enhance the current state of scholarly communication. In this thesis, we investigated the automatic classification of scientific papers in the Software Engineering domain. In doing so, a classification scheme of paper contents with regard to Research Object, Statement, and Evidence was consolidated. We then investigated, in a comparative analysis, the machine learning algorithms Naïve Bayes, Support Vector Machine, Multi-Layer Perceptron, Logistic Regression, Decision Tree, and BERT applied to the classification task.)
    • Location sharing with secrecy guarantees in mobile social networks  + (With the increasing popularity of location-based services and mobile online social networks (mOSNs), secrecy concerns have become one of the main worries of their users due to location information exposure. Users are required to store their location, i.e., physical position, and the relationships that they have with other users, e.g., friends, to have access to the services offered by these networks. This information, however, is sensitive and has to be protected from unauthorized access. In this thesis, we aim to offer location-based services to users of mOSNs while guaranteeing that an adversary, including the service provider, will not be able to learn the locations of the users (location secrecy) or the relationships existing between them (relationship secrecy). We consider both linking attacks and collusion attacks. We propose two approaches, R-mobishare and V-mobishare, which combine existing cryptographic techniques. Both approaches use, among others, private broadcast encryption and homomorphic encryption. Private broadcast encryption is used to protect the relationships existing between users, and homomorphic encryption is used to protect the location of the users. Our system allows users to query their nearby friends. Next, we prove that our proposed approaches fulfill our secrecy guarantees, i.e., location and relationship secrecy. Finally, we evaluate the query performance of our proposed approaches and use real online social networks to compare their performance. The results of our experiments show that in a region with low population density, such as suburbs, our first approach, R-mobishare, performs better than V-mobishare. In contrast, in a region with high population density, such as downtown, our second approach, V-mobishare, performs better than R-mobishare.)
    • Traceability of Telemetry Data in Hybrid Architectures  + (With the rise of Software-as-a-Service products, the software development landscape has transformed into a more agile and data-driven environment. The amount of telemetry data collected from users' actions is rapidly increasing, and with it the possibilities but also the challenges of using the collected data for quality improvement purposes. LogMeIn Inc. is a global company offering Software-as-a-Service solutions for remote collaboration and IT management. An example product is GoToMeeting, which allows users to create and join virtual meeting rooms. This Master’s Thesis presents the JoinTracer approach, which enables telemetry-data-based traceability of join-flows in the GoToMeeting architecture. The approach combines new mechanisms and already existing traceability techniques from different traceability communities to leverage synergies and to enable the traceability of individual join-flows. In this work, the JoinTracer approach is designed, implemented, and evaluated regarding functionality, performance, and acceptance. The results are discussed to analyze the future development of this approach and its applicability to other contexts.)
    • Worteinbettungen für die Anforderungsdomäne  + (Word embeddings are used in many ways in tasks from the requirements domain. In this thesis, word embeddings for the requirements domain are built and tested for whether they achieve better results in such tasks than generic word embeddings. For this purpose, a corpus of documents common in the requirements domain is assembled. It comprises 21,458 requirements descriptions and 1,680 user stories. Several word embedding models are analyzed for their suitability for training on the corpus. The domain-specific word embeddings are built with the fastText model, which can better represent rare words by taking subwords into account. They are evaluated intrinsically by examining word similarities and through cluster analyses. The domain-specific word embeddings capture some domain-specific subtleties better, whereas the generic word embeddings examined represent some words better. To exploit the advantages of both, various combination methods are analyzed and evaluated. In a task of classifying sentences from requirements descriptions, weighted averaging with a weight of 0.7 in favor of the generic word embeddings achieves the best results. Its best value is an accuracy of 0.83, using an LSTM as the classifier and a train-test split as the test procedure. The domain-specific and generic word embeddings alone achieve only 0.75 and 0.72, respectively.)
    • Bayesian Optimization for Wrapper Feature Selection  + (Wrapper feature selection can lead to highly accurate classifications. However, its computational costs are generally very high. Bayesian Optimization, on the other hand, has already proven to be very efficient in optimizing black-box functions. This approach uses Bayesian Optimization to minimize the number of evaluations, i.e., the training of models with different feature subsets. We propose four different ways to set up the objective function for the Bayesian Optimization. On 14 different classification datasets, the approach is compared against 14 other established feature selection methods, including other wrapper methods as well as filter methods and embedded methods. We use Gaussian processes and random forests for the surrogate model. The classifiers applied to the selected feature subsets are logistic regression and naive Bayes. We compare all the feature selection methods against each other in terms of classification accuracy and runtime. Our approach proves able to keep up with the most established feature selection methods, but the evaluation also shows that the experimental setup does not value the feature selection enough. In conclusion, we give guidelines on how an experimental setup can be made more appropriate and present several concepts for developing Bayesian Optimization for wrapper feature selection further.)
    • Praktikumsbericht: Toolentwicklung zur Bearbeitung und Analyse von High-Speed-Videoaufnahmen  + (During the internship, my responsibilities, as part of the further development of a comprehensive software environment for video editing and synchronization of engine data, consisted of familiarizing myself with the MATLAB software environment and then with the existing software, in order to refresh it in many respects and extend it with new functions.)
    • Development of an Approach to Describe and Compare Simulators  + (The goal of this thesis is the description of simulators and their comparison. To describe simulators, it is necessary to identify the elements that together enable a complete description of a simulator. Based on the description, comparison methods are then developed so that described simulators can be compared with one another. The comparison serves to determine the similarity of simulators. Since similarity between simulators cannot be defined universally, defining and describing these similarity measures is also part of the thesis. The focus of this thesis is on discrete event-oriented simulators. The overarching goal is to find simulators again within already existing simulations in order to enable reuse. The comparison methods are therefore to be developed so that parts of simulations can also be found. The developed tool DesComp implements both the ability to describe simulators and the procedures needed to compare them. To evaluate the suitability of the developed procedures, a case study is conducted using the simulator EventSim.)
    • Vorhersage und Optimierung von Konzernsteuerquoten auf Basis der SEC-Edgar-Datenbank  + (The goal of this thesis is to predict group tax rates (effective tax rate, ETR) using data-mining forecasting models. Analyzing and comparing ETRs can improve competitiveness and thus plays a major role in corporate planning for the coming years. A prerequisite is a reliable base of example data drawn from the real key performance indicators (KPIs) of annual reports, to enable successful training of the models. The SEC EDGAR database provides such a data basis. All documents from 1994 onward are accessible through the EDGAR database (Electronic Data Gathering, Analysis, and Retrieval). An SEC filing is a formal, standardized document that American companies have had to submit to the SEC since the Securities Exchange Act of 1934. Following the tax expert's guidance, the KPIs are extracted with a Python script across several years (2012-2016). Owing to a lack of knowledge about the companies' uploads, the data from the 10-K reports is very sparse (more than 60% of the data is missing). Random sparsity arises when entries are populated irregularly or in a way that is hard to predict. A proven approach for missing values is Online Sparse Linear Regression (OSLR). OSLR is, on the one hand, a way of filling in missing values and, on the other hand, serves to remedy data quality deficiencies. The thesis examines the applicability of multiple linear regression, time series analysis (ARIMA), and artificial neural networks (ANNs) in the field of financial reporting, with the ETR taken into account. These data-mining methods are then used to predict the ETR from the KPIs.)
    • Modellierung und Export von Multicore-Eigenschaften für Simulationen während der Steuergeräteentwicklung für Fahrzeuge  + (Future applications in the automotive industry, such as autonomous driving or the advancing electrification of vehicles, result in a steadily growing number of functions and an ever-increasing demand for computing power in electronic control units. To make such applications feasible, development in safety-critical, real-time embedded systems has led to processors with multiple cores (multicore processors). On the one hand, this reduces the complexity of the in-vehicle network; on the other hand, it increases both the complexity of the hardware architecture of the control unit and the complexity of the software architecture, owing to the timing behavior of the system, shared resource usage, shared memory access, and so on. This also creates new requirements for the tools used in the development process of multicore systems. To design a seamless toolchain for this development process, it must be possible at an early phase of function development to model the required multicore properties of the system so that they can be evaluated later.)
    • Multiwort-Bedeutungsaufösung für Anforderungen  + (To generate traceability information automatically, the intent of the requirements must first be understood. The basic prerequisite is understanding the meanings of the words within requirements. Although classical word sense disambiguation systems already exist for this, they mostly work only at the word level and ignore so-called multiword expressions (MWEs), whose meaning differs from the meaning of their individual component words. Within the INDIRECT project, a system is therefore developed that recognizes MWEs using a linear-chain random field and then performs knowledge-based sense disambiguation with the knowledge bases DBpedia and WordNet 3.1. To evaluate the system, a dataset is created from freely available requirements. The subsystem for recognizing MWEs achieves a maximum F1 score of 0.81. Sense disambiguation with the DBpedia knowledge base achieves a maximum F1 score of 0.496. With the WordNet 3.1 knowledge base, a maximum F1 score of 0.547 is achieved.)
    • Test-Vortrag  + (erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet,)
    • Patrick Deubel  + (to follow)
    • Supporting a Knowledge Management System for Software Engineering Research with Large Language Models  + (tba)