Search by property

This page provides a simple search interface for finding objects that have a property with a given data value. Other available search interfaces are the property search and the query builder.

List of results

Improving Code Quality with Neural Networks + (This thesis investigates different approaches to detecting and fixing problems in code in order to improve code quality. Most related approaches describe only incompletely how the code is preprocessed to obtain a suitable representation with an appropriate vocabulary, and they rarely give reasons for particular preprocessing steps. It also remains unclear how neural network architectures perform with different representations. This thesis aims to close these gaps. Based on the different code components, several categories for modeling the vocabulary are created, and the effect of each modeling step is evaluated. Furthermore, different code representations are tested for how well neural networks can detect faults in them. The "SATE IV Juliet Test Suite" is used as the evaluation dataset because it is well maintained and clearly labeled; it can also be applied and preprocessed in many different ways. The neural networks are tested on binary and multi-class classification, a kind of evaluation not found in any related work. In addition, the different AST-based and sequential code representations are evaluated with the corresponding neural network architectures, and the vocabulary modeling steps are applied to both representations. Finally, a suitable combination of representation, network architecture, and vocabulary modeling is recommended.)
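The vocabulary modeling steps mentioned above can be illustrated with a minimal sketch. It assumes a simple regex tokenizer for C-like code and a hypothetical scheme that keeps keywords and known library calls, maps user identifiers to positional placeholders (VAR0, VAR1, ...), and collapses integer literals to NUM. The keyword set and the name `normalize_tokens` are illustrative assumptions, not the thesis's preprocessing.

```python
import re

# Hypothetical subset of C keywords and library names that are kept verbatim.
KEEP = {"int", "char", "if", "else", "for", "while", "return", "void",
        "sizeof", "strcpy", "malloc", "free"}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_tokens(code: str) -> list[str]:
    """Tokenize C-like code and shrink the vocabulary.

    User identifiers become VARn (the same name always maps to the same
    placeholder), integer literals become NUM, everything else is kept.
    """
    mapping: dict[str, str] = {}
    out: list[str] = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEEP:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            if tok not in mapping:
                mapping[tok] = f"VAR{len(mapping)}"
            out.append(mapping[tok])
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out
```

For example, `normalize_tokens("x = y + x;")` yields `["VAR0", "=", "VAR1", "+", "VAR0", ";"]`. Each such normalization step trades identifier information for a smaller, more general vocabulary.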
Processing Natural-Language Descriptions of Relationships Between Objects + (This thesis aims to let users of the system answer questions by exploiting recognized relationships between classes and objects. It builds on the dialog system JustLingo, which is designed as an extension of Excel. The work proceeds in two steps. First, JustLingo is enabled to interpret descriptions of relationships, which makes it possible to create models and work with them. Second, it is enabled to process questions that can be answered specifically from the generated models. In addition, JustLingo is enabled to recognize design patterns from software engineering, or rather their structure, and to search for them within a model. Finally, the resulting extension is evaluated with respect to two aspects: recognition and usage. For relationship recognition, with 13 participants and 15 elements (classes and relationships), on average 94.9% of the elements were inserted correctly into a model. Of 10 questions, one of which they could define themselves, the 13 participants answered 86.8% on average.)
A Multi-Tenant Natural-Language Dialog System for Customer Support + (This thesis aims to develop a natural-language dialog system that can handle requests arising in the context of a computing center. The system can be used by several users simultaneously without them affecting each other. A further major extension of the dialog system is a communication channel between end users and experts, which lets the system forward requests it can only answer insufficiently to an expert. The goal of this extension is to maximize the number of successfully resolved questions. The work comprises the following steps. After an initial rough design and consideration of a possible dialog flow, the system is implemented as two components: the backend, which forms the core of the dialog system and handles input processing and response generation, and the frontend, which enables interaction with the system. Finally, the system is evaluated by the number of correctly answered requests and compared with the websites that operate on the same data sources. For this purpose, 25 participants were invited to take part in the evaluation. Overall, the dialog system answered 135 of the 150 questions automatically. A further 13 requests were resolved by exploiting the system's multi-tenancy and consulting experts.)
Investigating Black-Box Models for Decision-Making in Sentiment Analysis + (This thesis addresses the explainability of sentiment analysis. Sentiment analysis is a current research topic concerned with automatically assessing the sentiment of texts, where a decision model classifies a text as positive or negative. However, most of the machine learning methods used for this are black boxes, i.e., not directly comprehensible to humans. The thesis aims to examine various sentiment analysis methods with respect to explainability. It uses a database of movie reviews together with word embeddings based on the word2vec model.)
Towards Differential Privacy for Correlated Time Series + (Differential privacy is the current standard framework for privacy-preserving data analysis. However, it assumes that data values are not correlated. Adversaries who are aware of data correlations can use this information to infer users' sensitive information from differentially private statistics. Yet data correlations are common; in particular, values of time series such as energy consumption measurements are often highly temporally correlated. In this thesis, we first introduce and critically review the notion of dependent differential privacy (DDP) introduced by Liu et al. (2016), a differential-privacy-like privacy definition for spatially correlated data. Second, we adapt this notion and the corresponding privacy mechanisms to temporally correlated data. We evaluate our adaptation on a real-world energy consumption time series and show that our mechanism outperforms the baseline approach. We conclude by outlining directions in which the mechanism might be improved.)
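For context, the kind of baseline such work starts from can be sketched as the standard Laplace mechanism applied to a sum of clipped energy readings. The sketch handles correlation only through the generic group-privacy argument: noise is scaled by the number k of mutually correlated records. This is a conservative textbook baseline, not the DDP mechanism of Liu et al. or the thesis's adaptation. The function names and the clipping bound `max_value` are illustrative assumptions.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float, group_size: int = 1) -> float:
    """Scale b of the Laplace noise for an epsilon-DP release.

    group_size > 1 applies the standard group-privacy argument: to cover k
    mutually correlated records at level epsilon, the noise is scaled by k.
    """
    if epsilon <= 0 or sensitivity < 0 or group_size < 1:
        raise ValueError("epsilon must be > 0, sensitivity >= 0, group_size >= 1")
    return group_size * sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two Exp(1/scale) draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(values, max_value: float, epsilon: float,
                group_size: int = 1, seed=None) -> float:
    """Release the sum of readings clipped to [0, max_value] with Laplace noise.

    Clipping bounds the contribution (sensitivity) of each record to max_value.
    """
    rng = random.Random(seed)
    clipped = [min(max(v, 0.0), max_value) for v in values]
    scale = laplace_scale(max_value, epsilon, group_size)
    return sum(clipped) + laplace_noise(scale, rng)
```

With strongly correlated time series, k can be large, so this baseline adds a lot of noise. That is the motivation for finer-grained notions such as DDP.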
Architecture Extraction for Message-Based Systems from Dynamic Analysis + (The architecture of distributed message-based microservice systems has evolved considerably in recent years, making such systems easier to extend, reuse, and manage. The challenge, however, is that these systems consist of components that are increasingly autonomous and distributed and are deployed with different technologies. On the one hand, their flexible architecture provides many advantages. On the other hand, they tend to change quickly, which makes their documented architecture unreliable and outdated. Architecture reconstruction methods can help obtain the current architecture at different phases of a software system's development life cycle. However, existing architecture reconstruction methods do not support extraction for message-based microservice systems. We address this problem by extending an existing approach that extracts architecture models from tracing data (obtained by instrumenting the source code) so that message-based microservice systems are supported. Our approach automatically extracts performance models for message-based microservice systems through dynamic analysis. We evaluate it by comparing the extracted model with a manually created model using statistical metrics such as precision, recall, and F1-score to determine the accuracy of the extracted model.)
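The comparison by precision, recall, and F1-score can be sketched by treating both models as sets of elements. The sketch assumes elements are (sender, receiver) message channels. The service names below are hypothetical, and how the thesis actually matches model elements is not specified here.

```python
def precision_recall_f1(extracted: set, reference: set) -> tuple[float, float, float]:
    """Compare extracted model elements against a manually built reference model."""
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical message channels (sender, receiver) between microservices.
manual = {("gateway", "orders"), ("orders", "billing"), ("orders", "shipping")}
extracted = {("gateway", "orders"), ("orders", "billing"), ("billing", "mailer")}
p, r, f = precision_recall_f1(extracted, manual)
```

Here two of three extracted channels are correct and two of three manual channels are found, so precision, recall, and F1 are all 2/3.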
Linking Text and Model Entities of Software Architecture Models Using Word Vectors + (Documentation of software architectures is important for the quality and longevity of software. Over a software system's life cycle, its architecture usually changes, which can lead to inconsistencies with the documentation texts that describe it. Finding and fixing such inconsistencies automatically, or ideally preventing them, requires linking text entities to model entities. This thesis addresses that problem, using word vectors to find similarities between words.)
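A common way to use word vectors for such links is cosine similarity. Each word from the documentation is linked to the most similar model entity name if the similarity exceeds a threshold. The sketch below assumes toy 3-dimensional vectors (real word vectors have hundreds of dimensions), a threshold of 0.7, and hypothetical entity names. It is not the linking procedure of the thesis.

```python
import math

# Hypothetical toy vectors standing in for trained word embeddings.
VECTORS = {
    "database": [0.9, 0.1, 0.0],
    "storage":  [0.8, 0.2, 0.1],
    "frontend": [0.0, 0.9, 0.3],
    "webui":    [0.1, 0.8, 0.4],
    "logger":   [0.1, 0.1, 0.9],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_entities(text_words, model_entities, vectors, threshold=0.7):
    """Link each text word to its most similar model entity above a threshold."""
    links = []
    for word in text_words:
        if word not in vectors:
            continue  # words without a vector cannot be linked
        best, best_sim = None, 0.0
        for entity in model_entities:
            if entity not in vectors:
                continue
            sim = cosine(vectors[word], vectors[entity])
            if sim >= threshold and (best is None or sim > best_sim):
                best, best_sim = entity, sim
        if best is not None:
            links.append((word, best, best_sim))
    return links
```

With these vectors, "database" links to "storage" and "frontend" links to "webui", although the words share no characters.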
Extending a Domain-Specific Language for Simulation Coupling with the Domain of Cloud Simulations + (Domain-specific languages for software are tailored to their particular purpose. The Modular Simulation Language models the coupling between different simulations; coupling simulations improves understanding and interchangeability. This thesis examines whether the language contains all the models needed to represent the communication, structure, and coupling of arbitrary simulations universally. To this end, the cloud simulation CloudSim Plus was selected and the communication of two of its features was modeled. During modeling, missing elements of the language were identified, and proposed solutions were integrated or discussed. The result shows that the Modular Simulation Language is currently not complete and that the topic requires further investigation.)
Feature-based Time Series Generation + (Due to privacy concerns and potentially high collection costs of real time series data, machine learning practitioners find it difficult to access high-quality datasets. Generating synthetic time series data makes it possible to study model robustness against edge cases and special conditions not found in the original data. Achieving such results with synthetic data requires fine-grained control over the generation, so that the specific needs of the user can be met. Classical approaches relying on autoregressive models, e.g., ARIMA, provide only basic control over components such as trend, cycles, season, and error. A promising current approach is to train LSTM autoencoders or GANs on a sample dataset and learn an unsupervised set of features, which can then be used and manipulated to generate new data. This approach is limited because the features are not human-interpretable, which restricts control. We propose several methods for combining handcrafted and unsupervised features to give the user more influence over various aspects of the time series data. To evaluate our work, we collected a range of metrics that have been proposed as suitable for synthetic data. We compare these metrics and apply them to different datasets to show whether we can achieve comparable or improved results.)
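The classical, interpretable control the abstract contrasts with learned features can be sketched as an additive composition of level, linear trend, sinusoidal season, and Gaussian error. Each keyword argument acts as one handcrafted feature a user can set directly. This is only the simple baseline idea, not the thesis's hybrid method. The sinusoidal season and Gaussian noise are assumptions.

```python
import math
import random


def generate_series(n: int, level: float = 0.0, trend: float = 0.0,
                    season_amplitude: float = 0.0, season_period: int = 12,
                    noise_std: float = 0.0, seed=None) -> list[float]:
    """Additive synthetic series: level + trend * t + seasonal sine + Gaussian error."""
    rng = random.Random(seed)
    series = []
    for t in range(n):
        value = (level
                 + trend * t
                 + season_amplitude * math.sin(2 * math.pi * t / season_period)
                 + rng.gauss(0.0, noise_std))
        series.append(value)
    return series
```

Changing one argument, such as doubling `season_amplitude`, changes exactly one visible property of the output. Features learned by autoencoders or GANs do not offer this kind of control.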
Reducing Measurements of Voltage Sensitivity via Uncertainty-Aware Predictions + (The energy transition toward weather-dependent electricity sources such as wind and solar, together with notable new loads such as electric vehicle charging, degrades the voltage quality of the electrical grid. So-called Smart Transformers (ST) can use Voltage Sensitivity (VS) information to control voltage, frequency, and phase in order to improve voltage quality. Acquiring this VS information is currently costly, because output variability has to be created synthetically in the grid, disturbing it even further. In this thesis, I propose a method based on Kalman filters and neural networks that predicts the VS while providing a confidence interval for the prediction at any given time. The data for the prediction comes from a grid simulation provided by Dr. De Carne of the research center Energy Lab 2.0.)
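The Kalman-filter part of such an uncertainty-aware prediction can be sketched for a single scalar. The sketch assumes the VS value is modeled as a slowly drifting random walk observed with Gaussian noise, with known variances `process_var` and `measurement_var`. The filter's posterior variance then gives a confidence interval at every step. The neural-network component of the thesis is not represented here.

```python
import math


def kalman_1d(measurements, process_var: float, measurement_var: float,
              x0: float = 0.0, p0: float = 1.0, z_score: float = 1.96):
    """Scalar random-walk Kalman filter.

    Returns a list of (estimate, lower, upper) per step, where the bounds are a
    normal-approximation interval (z_score = 1.96 gives about 95%).
    """
    x, p = x0, p0
    results = []
    for z in measurements:
        # Predict: a random walk keeps the mean, uncertainty grows by process_var.
        p = p + process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        half_width = z_score * math.sqrt(p)
        results.append((x, x - half_width, x + half_width))
    return results
```

As measurements accumulate, the interval narrows. In a measurement-reduction setting, a wide interval could signal when a new, costly VS measurement is worth taking; that decision rule is an assumption, not taken from the thesis.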
Representing dynamic context information in data-flow based privacy analysis + (Industry 4.0 enables organizations to produce smaller batches or individualized products more cost-efficiently. This is achieved through self-organizing production and supply chains in which the people, machines, and organizations involved in the process collaborate ad hoc. Current access control systems are not sufficient to control the data flow that arises from this ad hoc collaboration. This bachelor's thesis presents a metamodel that can represent the dynamically changing context information of the entities involved in the process and use it for access control. Individual contexts represent single properties, and sets of contexts define a state that an entity must be in to be allowed to access a data item. Furthermore, an analysis is described and evaluated that can identify forbidden data accesses in a modeled system state and data flow.)
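The idea that a set of contexts defines the state an entity must be in can be sketched as a subset check. The sketch applies that check to a modeled list of data flows to find forbidden accesses. The class names, the string encoding of contexts, and the example entities are hypothetical; the thesis's metamodel is richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A person, machine, or organization with its currently active contexts."""
    name: str
    contexts: frozenset[str]


@dataclass(frozen=True)
class DataItem:
    """A datum together with the contexts an accessor must currently hold."""
    name: str
    required_contexts: frozenset[str]


def may_access(entity: Entity, item: DataItem) -> bool:
    # Access is allowed only if every required context is currently active.
    return item.required_contexts <= entity.contexts


def forbidden_accesses(flows: list[tuple[Entity, DataItem]]) -> list[tuple[str, str]]:
    """Return the (entity, data item) pairs in a modeled data flow that violate the policy."""
    return [(e.name, d.name) for e, d in flows if not may_access(e, d)]


# Hypothetical example state.
robot = Entity("robot_7", frozenset({"location:plant_a", "shift:day"}))
supplier = Entity("supplier_x", frozenset({"role:partner"}))
recipe = DataItem("recipe", frozenset({"location:plant_a"}))
orders = DataItem("open_orders", frozenset({"role:partner", "contract:active"}))
```

Because contexts change dynamically, the same flow can become allowed or forbidden as an entity gains or loses contexts. In the example, the supplier lacks `contract:active` and is therefore flagged.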
Attacker Modeling for Smart Grids + (The transition to renewable energy, the accompanying decentralization, and the ongoing digitalization of the power grid create new challenges for grid operation. One of these challenges is the considerably larger attack surface resulting from the increased use of smart meters and Internet of Things devices and their substantial contribution to power distribution. To represent these attack possibilities in analyses, this bachelor's thesis extends the existing analysis of attacks on smart grids in the Smart Grid Resilience Framework. For this purpose, the existing model is transformed into a network topology, on which an attacker analysis is then run. This attacker analysis is evaluated against the existing attacker analysis of the Smart Grid Resilience Framework. In addition, the accuracy of the transformation and the scalability of both the transformation and the attacker analysis are evaluated.)
Modeling Quality-Tradeoffs for the Optimization of Li-Ion Storage Systems + (Targeted use of energy storage such as lithium-ion batteries can reduce peak loads in consumption profiles and thus, among other things, lower the peak-dependent energy costs of large consumers. Such storage systems are usually planned using historical data. This thesis examines how disturbances in such data (e.g., caused by sampling) affect peak-shaving approaches, using a production facility at KIT Campus North as an example. Based on the insights gained, several prediction models were built that predict how much the results on disturbed time series deviate from those on undisturbed time series. Combining the results with the models' predictions reduced the absolute deviation in most cases.)
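What peak shaving does to a load profile, and why sampling disturbs it, can be sketched with a hypothetical greedy battery dispatch. The battery discharges above a threshold and recharges below it, starts full, and ignores losses. The sketch also includes a mean-downsampling step that mimics a coarser sampling rate. Neither function is taken from the thesis; units (kW, kWh, hours) and all parameters are assumptions.

```python
def peak_shave(load, threshold, capacity, max_power, dt=1.0):
    """Greedy dispatch: discharge above threshold, recharge below it.

    load is in kW per step, capacity in kWh, max_power in kW, dt in hours.
    The battery starts full; conversion losses are ignored.
    Returns the grid load after shaving.
    """
    soc = capacity  # state of charge in kWh
    grid = []
    for p in load:
        if p > threshold:
            discharge = min(p - threshold, max_power, soc / dt)
            soc -= discharge * dt
            grid.append(p - discharge)
        else:
            charge = min(threshold - p, max_power, (capacity - soc) / dt)
            soc += charge * dt
            grid.append(p + charge)
    return grid


def downsample_mean(load, factor):
    """Average non-overlapping blocks, mimicking a coarser sampling rate."""
    return [sum(load[i:i + factor]) / factor
            for i in range(0, len(load) - factor + 1, factor)]
```

For the load `[50, 80, 120, 130, 90, 60]` with threshold 100, capacity 40, and max_power 30, the grid load becomes `[50, 80, 100, 110, 100, 90]`, so the peak drops from 130 to 110. Averaging pairs of samples reports a peak of only 125, which illustrates how sampling can bias the planning of such storage.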
Implementing an Authority Mechanism for UI Elements Based on Eclipse E4 + (Using software applications in different contexts leads to an overloaded user interface. An authority mechanism is used to partition the user interface. Existing concepts for an authority mechanism either cannot be implemented in Eclipse 4 RCP or do not meet the requirements that industry places on such a mechanism. This gap is closed by a concept for a dynamic authority mechanism. Implementing the mechanism in an existing industrial software application confirms that the concept meets these requirements.)
Concepts for Control-Flow- and Model-Based Security Analysis of an Industry 4.0 System + (Increasing networking and digitalization drastically enlarge the attack surface of industrial plants. It is therefore all the more important to consider security aspects as early as possible when designing Industry 4.0 plants. However, security analyses at design time are laborious and must always be performed manually by a security expert. Although approaches for model-based support of security analyses already exist, they are not tailored to the Industry 4.0 context. This bachelor's thesis presents two concepts for model-based support of security analyses in the early design of Industry 4.0 plants. They consider the security requirements of data flows across the entire plant and additionally propose a context-based security analysis as support.)
Modeling and Simulation of Message-Driven Self-Adaptive Systems + (Dynamic systems that reconfigure themselves commonly use message queues to decouple senders and receivers. Predicting the quality of systems at design time is crucial, since changes in later development phases are much more costly. At the moment, there is no method to represent message queues at the architectural level and predict their impact on system quality. This work proposes a metamodel enabling such a representation, together with a simulation interface between a simulation of a component-based architecture description language and a messaging simulation. The interface is implemented for the Palladio simulator SimuLizar and an AMQP simulation. This enables the architectural representation of messaging and the prediction of quality attributes of message-driven self-adaptive systems. An evaluation with a case study shows the applicability of the approach and its prediction accuracy for point-to-point communication.)
Extracting Code Changes from a Commit for Continuous Integration of Performance Models + (A performance model lets software developers analyze implemented components with respect to performance properties at an early stage. To avoid inconsistencies, the performance model should be updated as soon as developers change the source code. Updating performance models is not a trivial problem. The approach "continuous integration of performance models" (abbreviated KILM, from the German "kontinuierliche Integration von Leistungsmodellen") performs automatic incremental updates of performance models and thus offers a solution. A key advantage of this approach is that the performance model has to be neither adapted manually (laborious and error-prone) nor rebuilt after every change (inefficient and laborious). This bachelor's thesis implemented the first step of this solution: the KILM approach is connected to a Git repository, and changes are extracted from commits and applied to code and performance models. The implementation was evaluated in a case study in which different kinds of changes were simulated on a project and the correctness of the updated code and performance models was checked. The results confirm correct updates of code and performance models in 96.6% of the tests performed.)
Building and Consolidating a Concept Hierarchy for Requirements Descriptions from Different Knowledge Sources + (One problem in requirements traceability is that a syntactic connection between terms in requirements and in source code is often missing. One way to establish links correctly nonetheless is to incorporate background knowledge in order to gain an explicit understanding of the terms used. WordNet is a well-known source of such background knowledge about semantic relationships in computational linguistics. However, WordNet alone is not sufficient to achieve the most complete coverage possible, especially for technical terms. This thesis therefore develops an approach for building a consolidated concept hierarchy from several arbitrary knowledge sources.)
Development and Analysis of Autoencoders for Intelligent Agents Learning Atari Games + (A novel approach to learning computer games is to use neural networks with memory, specifically CTRNNs. However, the large amounts of raw pixel data make training difficult. Autoencoders can compress the pixel data of the game frames enough to make them usable by such networks. The goal of this thesis is to find an autoencoder architecture that compresses Atari frames enough while still allowing reconstruction that is as lossless as possible, so that Atari games become accessible to CTRNNs. For this purpose, two different Atari games were selected, large datasets of suitable game frames were generated, and various autoencoder architectures were evaluated. This thesis shows that sufficient compression with acceptable quality loss is possible.)
Design and Construction of a Semantic Representation of Source Code + (One challenge in tracing source code back to requirements is the analysis of the source code itself. Information about semantic relationships between program elements is not documented explicitly in it; it must be derived from available information such as natural language or structural dependencies. Within the research project INDIRECT, a semantic representation of source code is designed and implemented so that the information it contains can be used for requirements traceability. The representation covers both syntactic information and semantic relationships in the source code. To identify semantic relationships, the syntax and the lexical components of the source code are analyzed. Finally, a cluster analysis based on the relationships found identifies groups of semantically related program elements. In the evaluation, the coverage of the program element groups found reached up to 0.91, and the precision of the clusters found reached up to 0.9. The harmonic mean of cluster coverage and cluster precision reached a maximum of 0.73.)
Refining the Attacker Model and Capabilities in Attack Path Generation + (One way to preserve confidentiality in software development is to detect potential vulnerabilities early and then contain possible attack paths. Analyses based on software architecture models can find points of attack early and fix them before implementation. This not only improves confidentiality but also increases software quality and avoids costly rework in later phases. In this thesis, an existing confidentiality extension of the attacker model of the Palladio Component Model (PCM) is refined so that it can handle composite components, considers edge cases of attribute-based access control (ABAC), and allows further aspects of mitigation to be modeled and analyzed. The evaluation used an adapted case study that models a mobile application for booking flights and yielded a satisfactory F1 score.)
Impact of Aggregation Methods on Clustering of High-Resolution Energy Data + (Energy data can be used to gain insights into production processes. In the industrial domain, sensors have high sampling rates, resulting in large time series. Aggregation techniques are therefore used to reduce the computation time and memory requirements of data mining techniques such as clustering. However, it is unclear what effects aggregation has on clustering results and how these effects could be described. We propose measures to analyze the impact of aggregation on clustering and evaluate them experimentally. In particular, we aggregate with standard summary statistics and assess the impact using clustering structure measures, internal validity indices, external validity indices, and instance-based forecasting. We adapt these evaluation measures and other data mining techniques to our use case. Furthermore, we propose a decision framework for choosing an aggregation level and other experimental settings, considering the trade-off between clustering quality and computational cost. Our extensive experiments comprise the cross-product of 6 physical attributes, 7 clustering algorithms, 7 aggregation techniques, 9 aggregation levels, and 13 time series dissimilarities. We use real-world data from different machines and sensors of a production site at KIT Campus North, extracting time series of fixed and variable length. Overall, we find that clustering results become less similar the more the data is aggregated. However, the exact effect and the values of the evaluation measures depend on the type of aggregate, the clustering algorithm, the dataset, and the dissimilarity measure.)
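Aggregating with standard summary statistics can be sketched as computing one statistic per non-overlapping window. The window length plays the role of the aggregation level. The choice of statistics and the handling of a trailing partial window (dropped here) are assumptions, not the thesis's exact setup.

```python
import statistics

# Standard summary statistics usable as aggregates.
AGGREGATES = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def aggregate(series, window: int, how: str = "mean") -> list:
    """Aggregate non-overlapping windows; a trailing partial window is dropped."""
    if window < 1:
        raise ValueError("window must be >= 1")
    func = AGGREGATES[how]
    return [func(series[i:i + window])
            for i in range(0, len(series) - window + 1, window)]
```

This also shows why clusterings can change. The series `[0, 2, 0, 2]` and `[1, 1, 1, 1]` differ, but both become `[1.0, 1.0]` under mean aggregation with window 2, so any clustering on the aggregated data must treat them as identical. Max aggregation still separates them.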
Developing a Method for Comparing Multilingual and Time-Dependent Text Corpora Using the Google Books Ngram Dataset as an Example + (Developing a method for comparing multilingual and time-dependent text corpora, using the Google Books Ngram dataset as an example.)
Assessing the Local and Global Effects of Household Load Shifting + (Renewable energy sources such as photovoltaic systems give private households a way to produce their own electricity, saving money and protecting the environment. Such systems are also used jointly in larger apartment blocks with many parties. The desire to optimize their use motivates demand-side management strategies, in particular shifting the loads of individual household appliances to make better use of solar energy. This thesis evaluates various such load shifts and their local and global effects on households. To this end, several models for variable electricity pricing, household simulation, and load-shifting implementation are designed and applied in a purpose-built simulator. The goal is to capture, through various experiments, the effects on households in selected evaluation metrics. It turns out that load shifting offers moderate savings for private photovoltaic users, but that optimization poses a specific problem in both the local and the global setting.)
Implementing an Architectural Information Flow Analysis Based on the Palladio Component Model + (It is essential that software systems ensure the confidentiality of information. The Palladio Component Model (PCM) already provides tools for describing software architectures with the aim of predicting quality properties. However, it offers no direct support for examining the confidentiality of service descriptions at the architectural level. This thesis develops an analysis technique for checking an architecture modeled in PCM for confidentiality properties. The analysis is based on examining the service descriptions created in PCM. The design takes an existing analysis technique as its basis and adapts it for use with PCM. Through model transformation, the confidentiality question is reduced to a property that can be checked by model checking. The accuracy and performance of the approach are evaluated in a case study. The resulting analysis technique enables software architects to perform a confidentiality analysis on component-based models early, at the architectural level.)
Semi-Automatic Generation of Active Ontologies from Database Schemas + (It is predicted that half of companies' spending on mobile applications will go into developing chatbots or intelligent assistants in the future. In this area, a lot of manual work is currently needed to model example questions. These example questions are required to understand natural-language requests and translate them into database queries. This thesis presents an approach that reduces this manual work. An Active Ontology is generated from the data in the database and from phrasings, including synonyms, taken from Dialogflow (an intelligent assistant by Google). This ontology then processes the natural-language requests and extracts the parameters required for the database query. The results of the Active Ontology are compared with those of Dialogflow. The evaluation shows that the Active Ontologies are error-prone: additional, unwanted parameters are extracted, which worsen the result. The agreement rate in a one-to-one comparison with Dialogflow is about 40%. In the future, adding a selective layer within the Active Ontologies could improve parameter extraction.)
KAMP for Build Avoidance on Generation of Documentation + (Especially in large software systems, there are cases where only a subset of the dependents of a component needs to be built to produce sound build results for a certain change scenario; in the context of this work, this is called a build shortcut. Using build shortcuts shortens build times because rebuilding unaffected parts is avoided. This thesis addresses the question of how the benefits of build shortcuts can be made accessible to a whole team of developers in which not every member is a build expert. Our approach is to model the change-specific dependencies in a Palladio Component Model and to determine the components to be built for a given change with the change propagation algorithm of the KAMP approach, which also serves as an example of integrating it into an agile development process.)
On the Convergence of Monte Carlo Dependency Estimators + (Estimating dependency is essential for data analysis. For example, in biological analysis, knowing the correlation between groups of proteins and genes may help predict gene functions, which makes discovering cures easier. The recently introduced Monte Carlo Dependency Estimation (MCDE) framework defines the dependency between a set of variables as the expected value of a stochastic process performed on them. In practice, this expected value is approximated by an estimator that iteratively performs a set of Monte Carlo simulations. In this thesis, we propose several alternative estimators to approximate this expected value. They work more dynamically and also leverage information from previous approximation iterations. Using both probability theory and experiments, we show that our new estimators converge much faster than the original one.)
Quantitative Evaluation of the Expected Antagonism of Explainability and Privacy + (Explainable artificial intelligence (XAI) offers reasons for a model's behavior. For many explainers, this reasoning gives us more information about the inner workings of the model or even about the training data. Since data privacy is becoming an important issue, the question arises whether explainers can leak private data. It is unclear what private data can be obtained from different kinds of explanations. In this thesis, I adapt three privacy attacks in machine learning to the field of XAI: model extraction, membership inference, and training data extraction. I argue which of these categories the different kinds of explainers fall into and present specific use cases of how an attacker can obtain private data from an explanation. In experiments, I demonstrate membership inference and training data extraction for two specific explainers. This shows that privacy can be breached with the help of explainers.)
Meta-Modeling the Feature Space + (Feature selection is an important process in machine learning for reducing model training time and complexity. One state-of-the-art approach is wrapper feature selection, in which subsets of features are evaluated. Because not all 2^n subsets of n features can be evaluated, an appropriate search strategy is vital. Bayesian optimization has already been used successfully for hyperparameter optimization and in very specific feature selection contexts. We investigate how to use Bayesian optimization for feature selection and discuss its limitations and possible solutions.)
Meta-Learning Feature Importance + (Feature selection is a process that removes redundant features from data sets. This results in shorter training times and improves the performance of machine learning models, which is why feature selection is an important part of machine learning pipelines. However, computing feature importance is often very expensive and requires training models. The goal of this thesis is to develop a meta-learning approach that predicts the importance of different features for a classification problem without first training a model on the data. Meta-learning is an area of machine learning concerned with predicting the performance of different machine learning models. Predictions of this kind require a meta-data set whose entries represent individual data sets characterized by meta-features. The target variables of a meta-data set are often the performance values of different classification models on the respective data sets. In this thesis, meta-features are to be developed and implemented that characterize not just entire data sets but individual features of a data set. Feature importance values from various methods are used as target variables. First results show a positive correlation between actual and predicted feature importance values.)
Meta-Learning for Feature Importance + (Feature selection is essential to the field of machine learning, since applying it improves both the training time and the prediction error of machine learning models. The main problem of feature selection algorithms is their reliance on feature importance estimation, which requires training models and is therefore computationally expensive. To overcome this issue, we propose MetaLFI, a meta-learning system that predicts feature importance for classification tasks prior to model training. We design and implement MetaLFI by interpreting feature importance estimation as a regression task, where meta-models are trained on meta-data sets to predict feature importance for unseen classification tasks. MetaLFI calculates a meta-data set by characterizing base features with meta-features and quantifying their respective importance with model-agnostic feature importance measures as meta-targets. We evaluate our approach on 28 real-world data sets to answer essential research questions concerning the effectiveness of the proposed meta-features and the predictability of the meta-targets. Additionally, we compare the feature rankings produced by MetaLFI with those of other feature ranking methods by using them as feature selection methods. Based on our evaluation results, we conclude that predicting feature importance is a computationally cheap alternative to model-agnostic feature importance measures.)
Linking Architectural Analyses Based on Attacker Models + (Errors in software may sometimes be impossible to fix because their cause lies in the software's architecture. To prevent this, various approaches exist for detecting and eliminating such errors early. One approach is security analysis at the architectural level. These analyses specify the aspect of security in different ways and can therefore yield different insights into the security of the system. It would be more useful if the insights of these security analyses could be combined to produce a more meaningful result. This thesis presents an approach for combining two architectural security analyses. The first analysis detects physical vulnerabilities in the system with respect to an attacker. The second analysis detects an attacker's possible propagation steps in the system. The analyses are combined by using the results of the first analysis to create the input models for the second analysis. For this purpose, an output metamodel is created, and a parser is implemented that translates the results of the first analysis into an instance of the output metamodel. The information needed for the second analysis is then extracted from this instance. The feasibility and added value of the approach are evaluated in a case study, which showed that the transfer is feasible and that more meaningful results could be achieved.)
Design-Time Analysis of Error Propagation in Component-Based Self-Adaptive Software Systems + (Error states in software or hardware lead to deviations in the provided data and in processing time, or directly to the complete outage of a service of a software component. This deviation from correct service in turn prevents other components that use the incorrect service from providing their own service correctly. The resulting error propagates through the system, combines with other errors, transforms into other error types, and ultimately has more or less severe effects on the system context, unless the propagation is stopped by suitable measures. Especially for safety-critical systems, it is therefore necessary to analyze the effects of possible errors in the system. The extension of the Palladio approach developed in this thesis makes it possible to perform this analysis on a model at design time. The extension can be used to analyze how often and in what proportion an error occurred, which error occurrences correlate with each other, and how severe the effects of the occurring errors were for the system context. Besides analyzing error propagation, the extension enables the modeling of systems that react to the occurrence of an error by reconfiguring themselves. The concept was validated using a safety-critical system from the domain of autonomous vehicles.)
Towards More Effective Climate Similarity Measures + (Finding dependencies over large distances, known as teleconnections, is an important task in climate science. To find such teleconnections, climate scientists usually use Pearson's correlation but often ignore other available similarity measures, mostly because they are not easily comparable: their values usually have different, sometimes even inverted, ranges and distributions. This makes their results difficult to interpret. We hypothesize that providing climate scientists with comparable similarity measures would help them find climate dependencies that have not yet been captured. To achieve this, we propose a modular framework to present, compare, and combine different similarity measures for time series in a climate-related context. We test our framework on a dataset containing the horizontal component of the wind in order to find dependencies with the region around the equator, and we validate the results qualitatively with climate scientists.)
Performance Modeling of the Mechanics Solver Module in the Multiphysics Application Pace3D + (For users of the Pace3D mechanics solver module, it is difficult to predict how different configurations affect computation time. To understand how various configuration options influence the runtime, a performance model of the Pace3D mechanics solver module is created. The chosen performance modeling method has so far supported only numerical configuration options. The method is therefore extended so that binary configuration options can also be taken into account. To evaluate the performance models, we assess how well interpolated and extrapolated test points are predicted. Using only numerical input parameters, an accuracy of 87.99% is achieved. The model with numerical parameters and one binary parameter achieves an accuracy of 89.14%.)
Topic Extraction for Domain Selection in Natural Language Programming + (For humans, the context of instructions in natural language programming is easy to see; for computers, it is not. One kind of contextual knowledge is an understanding of the topics. For this purpose, an approach to topic extraction is presented within the PARSE project for natural language programming. This requires disambiguating ambiguous nouns, so a tool for this task was also developed in this thesis. As one use case for the extracted topics, the selection of suitable ontologies is addressed. This selection makes it possible to use several small domain-specific ontologies instead of one large ontology. To evaluate the topic extraction, a survey was conducted, which showed that the first extracted topic was accurate in up to 63.6% of cases. In 91% of cases, at least one of the first four extracted topics is suitable. The evaluation of the ontology selection yielded an F1 score of 90.67% and an F2 score of 89.94%.)
Gamify Your Learning Experience -- Ways of Using Gamification to Visualize Learning Processes and Learning Success (PdF) + (Gamification has emerged as a prominent approach in education, aiming to enhance students' motivation and foster productive and successful long-term learning. This paper examines the correlation between various game design elements and intrinsic motivation in educational contexts. Identifying the optimal game design elements to increase intrinsic motivation for individual students remains an open challenge. This study presents insights drawn from a systematic literature review of 24 scholarly papers. From this analysis, ten game elements were selected for examination: badges, points, leaderboards, virtual currency, progress bars, achievements, avatars, concept maps, storytelling, and feedback. To assess the potential of each game design element to increase intrinsic motivation, a survey using self-created mockups was conducted. The survey results reveal noteworthy trends: the combinations of concept maps with achievements and of avatars with virtual currency were highly popular among participants. Moreover, correlations were identified between how often participants play video games and specific game design elements. For instance, participants who frequently play video games tended to select leaderboards. Conversely, the study found no significant influence of player type on the preference for specific game design elements. Overall, this research contributes to the advancement of gamification in education by shedding light on the relationship between game design elements and intrinsic motivation.
These insights pave the way for tailored gamified approaches that can positively impact students' motivation.)
Point Systems in Online Courses: A Way to Foster Interest through Gamification + (Gamification is a novel approach to increasing learners' motivation. This study examined the effect of a point system on the motivation and interest of learners in an online course. Compared with an earlier study without a point system, the points alone did not significantly increase interest. Adjusting the point scale did not produce a positive effect either. Possible reasons and implications are discussed.)
(Voluntary attendance) Final presentation, Praxis der Forschung SS23 II + (Gamify Your Learning Experience -- Ways of Using Gamification to Visualize Learning Processes and Learning Success. Gamification enhances education by boosting motivation and fostering effective learning. This paper explores the link between game design elements and intrinsic motivation in education. Drawing on 24 scholarly papers, it identifies ten key game design elements: badges, points, leaderboards, virtual currency, progress bars, achievements, avatars, concept maps, storytelling, and feedback. To evaluate their impact, a survey using mockups was conducted. The results highlight the popularity of combinations such as concept maps with progress bars and points with feedback. The study also uncovers correlations between preferred elements and learner characteristics. These insights advance gamification in education and guide tailored approaches for boosting student motivation.)
Analyzing Different Approaches to Integrating Handwritten and Generated Object-oriented Code + (Generating source code from models is one of the major advantages of a model-driven development process, but most of the time the generated code does not suffice and developers still have to write code by hand. This raises the question of how best to integrate handwritten and generated code. Previous authors have suggested a number of possible solutions to this problem, but the possibilities for objectively comparing these alternatives are still limited. Therefore, we collected the analysis criteria suggested by other authors and complemented them with additional criteria proposed by senior developers. We then applied these criteria to the integration approaches presented by previous authors to create an overview that developers can use when choosing an integration approach for their model-driven project. Applying the results of this analysis, we chose the best-fitting integration approach for a large industrial development project and found that migrating to this approach would improve the overall software quality in terms of complexity, coupling, and cohesion.)
Automatic Classification of GitHub Projects by Application Domain + (GitHub is one of the most popular platforms for the collaborative development of software projects and is a valuable resource for software developers. However, the large number of projects hosted on this service makes it difficult to find relevant ones. To improve the discoverability of projects on GitHub, it would be useful if they were classified into categories. This information could be used in a search engine or a recommender system. Classifying all projects manually is impractical because of their large number, so an automatic classification system is desirable. This thesis addresses the problem of developing an automatic classification system for GitHub projects. The presented solution uses GitHub topics, which are manual classifications of GitHub projects made by the projects' owners. These classified projects serve as training data for a supervised classification system, which removes the need to create training data manually. This enables classification with flexible class hierarchies. As part of this thesis, a software project was developed that can generate training data via the GitHub API based on GitHub topics and then train a classification system on that data. Thanks to a modular design, a wide range of vectorization and prediction methods can be used for classification. New implementations of such methods can also be integrated easily.
The project also provides interfaces for external programs, allowing an already trained classifier to be reused for other purposes. For class hierarchies that map well onto GitHub topics, the investigated approach achieves better classification performance than previous work. For class hierarchies where this is not the case, however, its classification performance is worse.)
Graph Attention Network for Injection Molding Process Simulation + (Graph Neural Networks (GNNs) have demonstrated great potential for simulating physical systems that can be represented as graphs. However, training GNNs presents unique challenges due to the complex nature of graph data. This thesis examines their learning abilities by developing a GNN-based surrogate model for the injection molding process from materials science. While numerical simulations can accurately model the filling of a mold with molten plastic, they are computationally expensive and require significant trial and error for parameter optimization. We propose a GNN-based model that can predict the fill times and physical properties of the mold filling process. We model the mold geometry as a static graph and encode the process information into node, edge, and global features. We employ a self-attention mechanism to improve the learning of the direction and magnitude of the fluid flow. To further enforce the physical constraints and behavior of the process, we leverage domain knowledge to construct features and loss functions. We train our model on simulation data, using a multi-step loss to capture temporal dependencies and enable it to iteratively predict the filling of unseen molds. We compare our models with different distance-based heuristics and conventional machine learning models as baselines in terms of predictive performance, computational efficiency, and generalization ability. We evaluate our architectural and training choices and discuss both the potential applications and the challenges of using GNNs for surrogate modeling of injection molding.)
Efficient Training of Graph Neural Networks for Dynamic Phenomena (Proposal) + (Graph Neural Networks (GNNs) have shown great potential for use cases that can be described as graphs. However, training GNNs presents unique challenges due to the characteristics of graph data. This thesis examines their learning abilities by developing a GNN-based surrogate model for the injection molding process from materials science. While numerical simulations can model mold filling accurately, they are computationally expensive and require significant trial and error for parameter optimization. We propose representing the mold geometry as a static graph and constructing additional node and edge features from domain knowledge. We plan to enhance our model with a self-attention mechanism that dynamically weights a node's neighbors based on their current states. Further improvements may come from customizing the model's message passing function and exploring node sampling methods to reduce computational complexity. We compare our approach to conventional machine learning models with respect to predictive performance, generalizability to arbitrary mold geometries, and computational efficiency. This thesis is a follow-up to a bachelor's thesis written at the chair in 2022.)
Investigating Variational Autoencoders and Mixture Density Recurrent Neural Networks for Code Coverage Maximization + (Graphical user interfaces (GUIs) are a common way to control software. Testing the graphical elements of a GUI is time-consuming for a human tester because it requires interacting with each element in each possible state the GUI can be in. Automated approaches are therefore desirable, but they often require many interactions with the software to improve. For computationally intensive tasks, this can become infeasible. In this thesis, I investigate the use of a reinforcement learning (RL) framework for automatically maximizing the code coverage of desktop GUI software using mouse clicks. The framework uses two neural networks to construct a simulation of the software. A third neural network controls the software and is trained on the simulation, which avoids potentially costly interactions with the actual software. To evaluate the approach, I developed a desktop GUI application on which the trained networks try to maximize code coverage. The results show that the approach achieves higher coverage than a random tester when the number of interactions is limited. For longer interaction sequences, however, it stagnates, while the random tester continues to increase coverage and surpasses the investigated approach. Still, neither reaches a high coverage percentage. Only random testers that use a list of clickable widgets to select interactions achieved values of over 90% in my evaluation.)
Development of an Active Learning Approach for One Class Classification using Bayesian Uncertainty + (HYBRID: This proposal will be presented online AND in seminar room 348. When working with large data sets, one often has to deal with a large amount of data from a single class and only a few negative examples from other classes. Learning classifiers that assign data points to one of these groups is known as one-class classification (OCC) or outlier detection. The objective of this thesis is to develop and evaluate an active learning process for training an OCC. The process uses domain knowledge to choose a reasonable prior distribution. Given that prior distribution, query strategies will be evaluated that consider the certainty, or more precisely the uncertainty, of the estimated class membership scores. The prior distribution and the uncertainty estimation will be modeled using a Gaussian process.)
Interface Concept for Hardware Simulations in Co-Simulation with Software + (Hardware simulations serve to simulate hardware and thus to test the behavior of software running on that hardware. When testing software that runs on hardware, every simulation involves a trade-off between accuracy and speed. Various hardware simulations are available that offer higher accuracy but require longer execution times. If, however, the speed of the co-simulation, which combines several simulations, is paramount, one chooses a simulation that offers lower accuracy but runs faster. Depending on the goal, co-simulation therefore requires different hardware simulations. Exchanging hardware simulations, however, can be laborious and require changes to the co-simulation. This thesis aims to develop a general interface for hardware simulations that makes switching between them easier without affecting the co-simulation. A single general interface for all hardware simulations is not feasible, however. Therefore, a classification is performed to group similar simulations, and a general interface is developed for one class.)
Modeling and Simulation of Chained Failure Scenarios in Palladio + (Today's emergent and distributed software systems are expected to provide a certain minimum of functionality even during partial failures. Demonstrating how they react to failure scenarios is therefore essential even in early development phases, because it allows statements about reliability and resilience to be made on lightweight models instead of costly experiments. Previous performance analyses in the Palladio Component Model (PCM) model failures stochastically, which prevents specific failure occurrences from being investigated in a targeted way. The modeling of chained failure scenarios provided in this thesis allows scenarios to be defined explicitly and integrates probabilistically dependent failure occurrences into the PCM. Adaptations to the Palladio plugin SimuLizar now also make it possible to evaluate the created models in simulation. Using a load-balancing system as a case study, the evaluation validated the technical functionality of the implementation. In addition, it shows that the approach allows different LoadBalancer design alternatives to be ranked, which can support decision-making in system development.)
Hidden Outliers in Manifolds + (Hidden outliers represent instances of disagreement between a full-space detector and a subspace ensemble. This adversarial nature naturally replicates the subspace behavior that high-dimensional outliers exhibit in reality. Because of this, they have proven useful for representing complex phenomena such as fraud, critical infrastructure failures, and healthcare data, and as the positive class of a self-supervised learner in general outlier detection. However, the quality of hidden outliers depends heavily on how many of the possible subspaces are selected for the ensemble. Since the number of subspaces grows exponentially with the number of features, high-dimensional data analysis applications such as computer vision become computationally infeasible. To overcome this issue, this thesis will study the generation of hidden outliers on the embedded data manifold using deep learning techniques. More precisely, we will study the behavior, characteristics, and performance of hidden outliers in the data manifold across multiple use cases.)
Incremental Real-Time Personalization in Human Activity Recognition Using Domain Adaptive Batch Normalization + (Human Activity Recognition (HAR) from accelerometers is a fundamental problem in ubiquitous computing. Machine-learning-based recognition models often perform poorly when applied to new users who were not part of the training data. Previous work has addressed this challenge by personalizing general recognition models to the motion pattern of a new user in a static batch setting. The more challenging online setting has received less attention: no samples from the target user are available in advance; instead, they arrive sequentially. Additionally, the user's motion pattern may change over time, so adapting to new information must be traded off against forgetting old information. Finally, the target user should not have to label activities in order to use the recognition system. Our work addresses these challenges by proposing an unsupervised online domain adaptation algorithm. It aligns the feature distributions of all subjects, both sources and target, within the layers of a deep neural network.)
On the Utility of Privacy Measures for Battery-Based Load Hiding + (Hybrid presentation at https://kit-lecture.zoom.us/j/67744231815. Battery-based load hiding (BBLH) has gained a lot of popularity in recent years as a way to guarantee a certain degree of privacy for users in smart grids. Our work evaluates a set of the most common privacy measures for BBLH. For this purpose, we define natural, logical requirements and score how well each privacy measure complies with each requirement. We do this by scoring each measure's response to alterations of the load profile (e.g., noise addition) using measures of displacement. We also investigate the stability of the privacy measures with respect to load profile length and number of bins, using dedicated synthetic-data experiments. Results show that certain privacy measures fail badly on one or more requirements and should therefore be avoided.)
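The sensitivity to the number of bins can be illustrated with a histogram-based mutual information estimate between a user's demand and the load seen by the grid. Mutual information is used as a leakage measure in the battery-based load hiding literature, but the abstract does not say whether or how the thesis uses it. The profiles, bin edges, and function names below are hypothetical.

```python
import math
from collections import Counter


def to_bins(values, n_bins, lo, hi):
    """Map readings in [lo, hi] to equal-width bin indices 0..n_bins-1."""
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equally long sequences of bin indices."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical profiles: user demand vs. what the grid sees after the battery.
user_load = [0, 1, 2, 3] * 25
no_battery = list(user_load)   # grid sees the raw demand
perfect_hiding = [2] * 100     # battery flattens the grid load completely

for bins in (2, 4):
    u = to_bins(user_load, bins, 0, 4)
    print(bins, "bins:",
          mutual_information(u, to_bins(no_battery, bins, 0, 4)),
          mutual_information(u, to_bins(perfect_hiding, bins, 0, 4)))
```

For the raw profile, the estimate is 1 bit with 2 bins and 2 bits with 4 bins, although the data did not change. This is the kind of dependence on the number of bins that stability experiments probe. For the flat grid load, the estimate is 0 bits.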
Entwicklung einer domänenspezifischen Sprache zur Spezifikationsbeschreibung ereignisorientierter Simulationen + (In model-driven software development, simulations are an important concept. For example, the Palladio Component Model (PCM) enables the modeling, and the Palladio simulator the simulation, of software architectures through event-driven simulations, so that deficiencies in software architectures can be detected early. The tool DesComp makes it possible to model and compare event-driven simulations. Before a new simulation is implemented, similarities to existing simulations can thus be identified so that those simulations can be reused. The DesComp approach models simulation behavior through the satisfiability of predicate-logic formulas (Satisfiability Modulo Theories, SMT). However, specifying simulation behavior in SMT code is laborious and requires background knowledge of the SMT-LIB standard. In this thesis, a domain-specific language (DSL) for specifying simulation structure and simulation behavior was developed with the Xtext framework. To this end, a metamodel for modeling the structure and behavior of event-driven simulations was created on the basis of DesComp's structure specification. This metamodel served as the basis of the abstract syntax of the developed language.
The metamodel then serves as the starting point for further use and analysis of the simulation specification. For this purpose, a transformation of the behavior specification into SMT code was implemented. Together with the simulation structure, the result can be exported to a graph database so that the simulation can be analyzed in DesComp or compared with other simulations. The developed language was evaluated using the simulation BusSimulation and the Palladio simulator EventSim: the models of these simulations created with DesComp and with the language were compared against various criteria.)
Abschlusspräsentation BA "Bestimmung eines Quartiers anhand von Positionsdaten" + (In the research project QuartrBack, people with dementia were given GPS trackers so that their position could be determined at any time. The challenge of this bachelor's thesis was to derive, from the resulting (very sparse) data, as accurate a neighborhood ("Quartier", i.e., the paths a person frequently walks) as possible. This was implemented using OpenStreetMap data.)
Challenges for Service Integration into Third-Party Application + (Over time, software development has shifted from building complete systems to building software components that can be integrated into other applications. Here, the software components are services that extend another application, which is developed by a third party. This bachelor's thesis examines the problems that arise when integrating services. A survey is conducted among the LogMeIn development team responsible for service integration. Problems are identified from the team's experience, and solutions are developed for them. The problems and solutions are worked out and illustrated with a running example, the GoToMeeting add-on for Google Calendar. For the evaluation, a case study is carried out in which a GoToMeeting integration for Slack is developed. Not all of the identified problems occur during this development, but those that do can be solved with the developed solutions. In addition, a new problem arises, for which a new solution is developed. This problem and its solution are then added to the existing set of problems and solutions, which shows how the set can be extended when new problems arise in the future.)
Verschlüsselung von änderungsbasierten Modellen + (This bachelor's thesis presents a prototypical implementation of symmetric, asymmetric, and attribute-based encryption and decryption of model changes within Vitruvius. The advantages and disadvantages, scalability, and performance of these schemes are discussed.)
Architektur-basierte Wartbarkeitsvorhersage von Metamodellen mittels Evolutionsszenarien + (In the master's thesis "Architektur-basierte Wartbarkeitsvorhersage von Metamodellen mittels Evolutionsszenarien", metamodels were designed for modeling metamodel architectures and evolution scenarios, which describe changes to metamodel architectures. The metamodel for metamodel architectures allows complex metamodels to be modeled at a more abstract level, analogous to software architecture. Editors were developed for both metamodels. In addition, a tool was developed that predicts maintainability based on an evolution scenario. The developed tools were then analyzed for usability in a user study and for functionality in case studies.)
Analyse der Relation zwischen textueller Dokumentation und formellen Modellen in der Softwarearchitektur + (This bachelor's thesis examined the relation between natural-language software architecture documentation and formal models. The aim was to find out how the design decisions in the documentation affect the model. To this end, two case studies were conducted. First, a model of the implementation based on the Palladio Component Model was created. Then the statements in the documentation were classified, and it was examined which design decisions are reflected in the model and which are not represented. The results were used to draw conclusions about the relation between the artifacts.)
Conception and Implementation of a Runtime Model for Telemetry-Based Software Monitoring and Analysis + (In the age of cloud computing and big data, software telemetry data is abundant. However, the sheer volume of data and data platforms can make it difficult to handle. This master's thesis presents a runtime model that makes it possible to measure telemetry data on different data platforms. The model covers the full life cycle of a measurement, from its definition in a purpose-built domain-specific language to the visualization of the resulting values. The model was implemented and tested at the software-as-a-service company LogMeIn. A user study within the company evaluated how well the presumed target group accepted the implemented service.)
Eine Domänenspezifische Sprache für Änderungsausbreitungsregeln + (The goal of this master's thesis was to evaluate and extend a domain-specific language for change propagation rules. The language enables domain experts to create change propagation rules based on a metamodel within the change propagation framework, without in-depth knowledge of Java programming or of the framework. Java classes that perform the change propagation computations are generated automatically from rules formulated in the language. For the evaluation, the change propagation computations previously implemented as Java methods were examined. They could be organized into rule classes and partly expressed in the language. New language constructs were designed for the rules that could not be expressed. In addition, the transferability of the language between different application domains was investigated.)
Kontinuierliche Verfeinerung automatisch extrahierter Performance-Modelle + (More and more companies today face the problem that one or more of their legacy systems are based on a monolithic software architecture that has grown increasingly complex over the years. Evolving such a legacy system is laborious and correspondingly expensive. To reduce these costs in the long term, architectural patterns such as microservices can be adopted. However, migrating from a monolithic architecture to a microservices architecture is a complex and error-prone process. The goal of this master's thesis is to support such a migration by developing a concept for the continuous refinement of automatically extracted architectural performance models and implementing it in a prototype plug-in. The thesis contains a concept for performing and storing manual refinement steps on extracted performance models. It also enables automatically extracted performance models to be merged with a performance model that is being refined. An approach for integrating the concept into a continuous integration environment is also presented.)
Automated Cloud-to-Cloud Migration of Distributed Software Systems for Privacy Compliance + (In 2018, the new EU General Data Protection Regulation will come into force. This regulation includes severe penalties for data protection violations. One of the most important factors for complying with it is that master data of EU citizens is processed within the EU. We developed, formalized, implemented, and evaluated a privacy analysis for this rule. In addition, with iObserve Privacy we developed a system following the MAPE (Monitor, Analyze, Plan, Execute) principle that automatically detects data protection violations and computes an alternative, privacy-compliant system hosting. iObserve Privacy then automatically migrates the cloud application according to this alternative hosting.)
Context-based confidentiality analysis in dynamic Industry 4.0 scenarios + (In Industry 4.0 environments, highly dynamic and flexible access control strategies are needed. State-of-the-art strategies are often not included in the modeling process and must be considered afterwards, which makes it very difficult to analyze the security properties of a system. In the Trust 4.0 project, the confidentiality analysis tries to solve this problem with a context-based approach, for which a security model named “context metamodel” exists. Another important problem is that an instance of a security model often cannot be transformed into a widespread access control standard. This is also the case for the context metamodel. Another interesting transformation is one into an ensemble-based component system, which is also presented in the Trust 4.0 project. This thesis introduces an extension of the aforementioned context metamodel that makes it more extensible. Furthermore, the thesis develops a concept and an implementation for the transformations mentioned above: first, the transformation to the attribute-based access control standard XACML, and then the transformation from XACML to an ensemble-based component system. The evaluation indicated that the model can be used for use cases in Industry 4.0 scenarios and that the transformations produce adequately accurate access policies. Furthermore, the scalability evaluation indicated linear runtime behavior of both transformation implementations with respect to a growing number of input contexts or XACML rules, respectively.)
Merging and Versioning in a Multi-Modeling Environment + (In model-driven software development, a complex system is often modeled in different, specialized models. To keep them consistent, VITRUVIUS provides a mechanism for defining consistency-preserving actions between different models. Versioning is also an important task in software development. There are various concepts and implementations for versioning models, e.g., EMFStore, but none of them can guarantee the cross-model consistency provided by VITRUVIUS. As a result, conflicting changes in different models may not be identified as conflicting when different branches are merged. This thesis presents an approach that defines a versioning system and preserves the consistency of models of the same system. The approach is based on an analysis of the dependency graph of the changes that occurred. Besides a requires relation, the dependency graph includes a trigger relation. The dependency graphs of the two branches are then searched for a subgraph isomorphism; all changes outside the isomorphism are potentially conflicting. During manual conflict resolution, the trigger and requires relations are used to guarantee that a change is applicable and that, after its application, all models are consistent with each other. The approach is illustrated and validated with an application that combines component-based architectures and class diagrams.)
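During manual resolution, the requires and trigger relations determine which additional changes must be applied together with a selected change, and in which order. The following is a minimal sketch, assuming that changes are identified by hypothetical string IDs and that both relations are given as dictionaries. It does not perform the subgraph-isomorphism step. Python's `graphlib` raises `CycleError` if the relations form a cycle.

```python
from graphlib import TopologicalSorter


def resolution_order(selected, requires, triggers):
    """Changes to apply for `selected`, ordered so each comes after its prerequisites.

    requires[c]: changes that must already be applied before c.
    triggers[c]: consistency-preserving changes that c triggers in other models.
    """
    # 1. Close the selection: a change needs what it requires and
    #    drags along the changes it triggers.
    closure, stack = set(), list(selected)
    while stack:
        change = stack.pop()
        if change in closure:
            continue
        closure.add(change)
        stack.extend(requires.get(change, ()))
        stack.extend(triggers.get(change, ()))
    # 2. Predecessors: required changes, plus the change that triggered this one.
    preds = {c: set(requires.get(c, ())) for c in closure}
    for trigger, triggered in triggers.items():
        if trigger in closure:
            for t in triggered:
                preds[t].add(trigger)
    return list(TopologicalSorter(preds).static_order())


# Hypothetical changes: an attribute needs its class; the class triggers a component.
print(resolution_order(["add_attr"], {"add_attr": ["add_class"]}, {"add_class": ["add_component"]}))
```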
Assessing Hypotheses in Multi-Agent Systems for Natural Language Processing + (In multi-agent systems (MAS), different agents work on a common problem. Such systems are also used in natural language processing (NLP). Besides plain results, agents of an MAS for natural language can also generate results with confidences, so-called hypotheses. These hypotheses reflect the ambiguity of natural language. If agents depend on each other, a wrong hypothesis can quickly propagate errors into the hypotheses of the dependent agents. Exploring hypotheses offers a chance to improve the agents' results. This thesis improves the results of agents in an MAS for NLP through a controlled exploration of the hypothesis search space. For this purpose, a framework for exploring and assessing hypotheses is developed. An evaluation with three agents yielded promising improvements. For example, top-X exploration raised the average F1 score of the topic detection agent from 40% to 49%.)
Development of an Active Learning Approach for One Class Classification using Bayesian Uncertainty + (In one-class classification, the classifier decides whether points belong to a specific class. In this thesis, we propose a one-class classification approach, suitable for active learning, that models for each point a prediction range in which the model assumes the point's state to lie. The proposed classifier uses a Gaussian process. We use the Gaussian process's prediction range to derive a certainty measure that takes the available labeled points into account. We compare this approach against baseline classifiers and show the correlation between the classifier's uncertainty and its misclassification rate.)
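A Gaussian-process prediction range with a certainty rule of this kind can be sketched in plain Python. The sketch makes several assumptions:

- GP regression on labels +1 (member of the class) and -1 (non-member).
- An RBF kernel with a fixed length scale and fixed noise.
- A range of mean ± 2 standard deviations.

A point is labeled only if that range lies entirely on one side of 0. Otherwise it is reported as uncertain, which would make it a natural query candidate for active learning. These choices and all names are assumptions, not the thesis's certainty measure.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(X, y, x_star, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_star."""
    K = [[rbf(a, c) + (noise if i == j else 0.0) for j, c in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_star) for a in X]
    mean = sum(k * w for k, w in zip(k_star, solve(K, y)))
    var = rbf(x_star, x_star) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def classify(X, y, x_star, z=2.0):
    """+1 or -1 if mean +/- z*std lies on one side of 0; None if the range straddles 0."""
    mean, std = gp_predict(X, y, x_star)
    if mean - z * std > 0:
        return 1
    if mean + z * std < 0:
        return -1
    return None
```

With hypothetical labeled points [0.0], [0.5], [1.0] (members) and [5.0] (non-member), the point [0.5] is confidently a member and [5.0] a non-member. The point [3.0], far from both, is reported as uncertain.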
Rahmenwerk zur Generierung von Sichten aus dem Palladio-Komponentenmodell + (In Palladio, the models become large because today's software systems keep growing in complexity, and a good architecture can only be created with model-driven software development. The software system is split into several models so that these are independent of each other and can be replaced. As a result, the models become hard to survey; for example, several models have to be opened to follow a single flow. In distributed models, flows are more laborious to trace and the model is harder to understand. For this reason, a tool was developed that can display parts of Palladio as a view. Palladio is a tool for model-based performance analysis. The modeled software systems are split into four models, so the same repository specification can be simulated with different system models or hardware configurations. In Palladio, finding the flow of a system call within a system is laborious. Because the models are hard to survey, the flows found are often incorrect and inconsistent, which makes getting started with the software, as well as maintaining and extending the models, more difficult. The first part of this bachelor's thesis presents a framework for generating views. New views that help with creating and understanding models can be added to this framework. Using model transformations, the framework produces new perspectives on specific parts of the Palladio Component Model. A first view, presented in the second part of the thesis, shows the Palladio Component Model as a sequence diagram. The diagrams were generated with PlantUML.
The PlantUML source is generated by a model-to-text transformation. The framework thus provides new insights into a Palladio Component Model. New Palladio users do not have to work their way through the models; they can recognize flows directly with the framework. Palladio developers can add their own views. This extends Palladio's toolbox and makes it easier to get started with the software simulation program.)
Microgrid-Topologien für Smart Grids + (The transition to renewable energy and the use of smart meters to measure and control the grid pose new challenges for power supply. To enable communication in the smart grid, it has been proposed to divide the network into microgrids. This requires a sensible partitioning and a robust communication topology. This thesis works out the requirements for such a partitioning and topology, and proposes and compares different solution approaches. Based on the results, an adaptive algorithm is designed that decomposes a power grid into several microgrids and generates a communication topology.)
Semantic Interoperability in Decentralized Identity Ecosystems + (In an identity ecosystem, actors exchange digital proofs, so-called "credentials". Actors can take on different roles: "issuers" generate credentials and issue them to other actors; "holders" store them and present them to "verifiers", who verify the credential and accept or reject it. In decentralized identity ecosystems, actors can interact with each other on an equal footing, regardless of their current role. They are not subject to permanent hierarchies; instead, they are loosely coupled, and intermediaries are avoided wherever possible. This thesis examines the "semantic interoperability" of actors in decentralized identity ecosystems. Semantic interoperability aims at a common understanding of credentials among all actors. Two aspects have to be taken into account. The first is the understanding of the properties and statements evidenced in the credential, e.g., "What does the content say, and what does it not say? What level of trust is guaranteed? What kind of actor issued the credential?" The second is the context of the credential in its own environment, e.g., "Is the evidence of these properties adequate to continue this process? Is the level of trust sufficient?" Researchers and practitioners have already developed promising approaches, especially in the area of the "Semantic Web", which is closely connected to semantic interoperability. We therefore want to collect and classify existing technologies and standards for establishing semantic interoperability. These technologies and standards will also be evaluated for their suitability against requirements collected in the project "Schaufenster sichere digitale Identitäten Karlsruhe" (Showcase secure digital identities Karlsruhe).)
Collective Entity Matching for Linking Structures in Attributed Material Graphs + (In data analysis, entity matching (EM), or entity resolution, is the task of finding the same entity in different data sources. It is a required step when joining data sets in which the same entities do not always share a common identifier. For graph data such as knowledge graphs, ontologies, or abstractions of physical systems, entity relationships pose an additional challenge: not just the entities themselves but also their relationships, and therefore their neighborhoods, need to match. These relationships can also be used to our advantage, which is the foundation of collective entity matching (CEM). In this bachelor's thesis, we focus on a graph data set based on a material simulation, with the intent of matching entities between neighboring system states. The goal is to identify structures that evolve over time and link their states with a common identifier. Current CEM algorithms assume that perfect matches are possible, i.e., that every entity can be matched. We want to overcome this limitation and address the high imbalance between potential candidates and impossible matches. A third major challenge is the large volume of data, which requires our algorithm to be efficient.)
Online Nyström MMD Approximation + (In data analysis, the ability to detect and understand critical shifts in information patterns is of great importance. Whether monitoring real-time network traffic, identifying anomalies in financial markets, or tracking fluctuations in climate data, swiftly identifying change points is crucial for effective decision-making. However, because the default implementation of the maximum mean discrepancy (MMD) has quadratic runtime, algorithms based on it tend to exceed runtime limits in some contexts, such as those where data arrives quickly and in large volumes. Building on recent work that speeds up change point detection with estimators, notably RADMAN, we propose integrating the Nyström estimator into a similar exponential-bucketing scheme to address this. This thesis focuses on the concept, implementation, and testing of this construction and compares it with other recent approaches.)
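The quadratic cost and the idea behind a Nyström estimate can be contrasted in a few lines. The sketch assumes one-dimensional windows and an RBF kernel. `mmd2_exact` evaluates all O((m+n)^2) kernel pairs. `mmd2_nystrom` uses l landmark points Z and the approximated kernel k(x, Z) K_ZZ^-1 k(Z, y), under which the biased squared MMD reduces to (a - b)^T K_ZZ^-1 (a - b), with a and b the mean landmark-kernel vectors of the two windows. The ridge term, landmark choice, and names are assumptions; RADMAN and the thesis's exponential bucketing are not reproduced.

```python
import math


def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * (a - b) ** 2)


def mmd2_exact(xs, ys, gamma=1.0):
    """Biased squared MMD between two 1-D windows: O((m + n)^2) kernel evaluations."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def solve(A, b):
    """Gaussian elimination with partial pivoting; fine for a handful of landmarks."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mmd2_nystrom(xs, ys, landmarks, gamma=1.0, ridge=1e-9):
    """Nyström estimate (a - b)^T (K_ZZ + ridge*I)^-1 (a - b); O((m + n) * l + l^3)."""
    K = [[rbf(zi, zj, gamma) + (ridge if i == j else 0.0) for j, zj in enumerate(landmarks)]
         for i, zi in enumerate(landmarks)]
    a = [sum(rbf(x, z, gamma) for x in xs) / len(xs) for z in landmarks]
    b = [sum(rbf(y, z, gamma) for y in ys) / len(ys) for z in landmarks]
    d = [ai - bi for ai, bi in zip(a, b)]
    w = solve(K, d)
    return sum(di * wi for di, wi in zip(d, w))
```

When every point of both windows is used as a landmark, the two estimates agree up to the ridge term. With a fixed, small number of landmarks, the cost grows only linearly in the window size.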
Verfahren zur Reduktion von neuronalen Netzen - Analyse und Automatisierung + (In recent years, applications of neural networks (NNs) have become increasingly common. A current problem is their considerable demand for memory, computing capacity, and energy, not only during training but also during deployment. For this reason, deploying neural networks on resource-constrained, low-power platforms still poses numerous challenges. This thesis examines this problem and presents techniques for reducing the number of neurons and connections of fully trained neural networks while preserving their accuracy as far as possible. Experiments in TensorFlow and Keras show which of these techniques suit various practical examples. Furthermore, the thesis describes a new approach, SNARE (Score-based Neural Architecture REduction), which aims to automate reduction across entire networks rather than only individual layers. To do so, the SNARE tool implementation first analyzes the structure of trained Keras NNs with a TensorFlow backend. Taking into account various criteria such as FLOP contribution, it then iteratively selects layers, applies reduction operations, and compensates for the resulting errors by retraining. Results show that SNARE achieves a 35-fold parameter reduction on a LeNet5 architecture at an accuracy loss of 0.39%. In addition, on an NN that recognizes human movements from mobile sensor data, SNARE achieved a reduction rate of 245 at equal accuracy.)
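FLOP-based layer selection can be sketched for a plain stack of dense layers. This is a hypothetical simplification, not SNARE. It assumes that a multiply-add counts as 2 FLOPs and scores each hidden layer by the FLOPs of the two weight matrices its width affects. It then halves the top-scoring layer until a parameter budget is met. The concrete pruning and retraining steps are only indicated in a comment.

```python
def dense_layer_stats(widths):
    """Per-layer FLOPs (multiply-add counted as 2) and parameters of a dense stack."""
    return [
        {"flops": 2 * n_in * n_out, "params": n_in * n_out + n_out}
        for n_in, n_out in zip(widths, widths[1:])
    ]


def pick_layer_to_reduce(widths):
    """Index of the hidden layer whose width drives the most FLOPs, or None.

    A hidden width appears in the weight matrix into the layer and the one out of it,
    so both are counted. Input and output widths are fixed by the task.
    """
    stats = dense_layer_stats(widths)
    candidates = [h for h in range(1, len(widths) - 1) if widths[h] > 1]
    if not candidates:
        return None
    return max(candidates, key=lambda h: stats[h - 1]["flops"] + stats[h]["flops"])


def reduce_until(widths, max_params, factor=0.5):
    """Shrink the highest-scoring hidden layer until the parameter budget is met."""
    widths = list(widths)
    while sum(s["params"] for s in dense_layer_stats(widths)) > max_params:
        h = pick_layer_to_reduce(widths)
        if h is None:
            break
        widths[h] = max(1, int(widths[h] * factor))
        # A real pipeline would remove concrete neurons here (e.g. by weight
        # magnitude) and retrain to compensate for the accuracy loss.
    return widths


print(reduce_until([784, 300, 100, 10], max_params=100_000))  # [784, 75, 100, 10]
```

For the hypothetical widths [784, 300, 100, 10], the first hidden layer dominates the FLOP count. It is halved twice, to 75 neurons, before the model drops below 100,000 parameters.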
Ausgestaltung von Data-Science Methoden zur Bearbeitung ungelöster Mathematik-Probleme + (Mathematics has countless unsolved problems that occupy researchers. They represent an important task and challenge, and there are continual efforts to come closer to solving them step by step. One of these unsolved problems is the so-called Frankl conjecture (also known as the union-closed sets conjecture). It states that for every finite union-closed family of sets containing at least one non-empty set, some element is contained in at least half of the sets of the family. This thesis, too, aims to come a step closer to solving this problem, or at least to provide helpful new tools for a later solution. To this end, it approaches the problem with data science methods: first, as many examples for the conjecture as possible were generated at random; these examples were then examined and analyzed further.)
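The generate-then-analyze workflow can be sketched as follows. The generator and the function names are assumptions: it takes a few random subsets of a small ground set and closes them under union. For each family, the check counts how many member sets contain the most frequent element. The conjecture has been verified for small ground sets, so no counterexample is expected here. The point is the kind of data such a generator produces for further analysis.

```python
import random
from collections import Counter


def union_closure(sets):
    """Smallest union-closed family containing the given sets."""
    family = {frozenset(s) for s in sets}
    frontier = set(family)
    while frontier:
        new = {a | b for a in frontier for b in family} - family
        family |= new
        frontier = new
    return family


def random_union_closed_family(n_elements, n_generators, rng):
    """Union closure of a few random subsets of {0, ..., n_elements - 1} (hypothetical generator)."""
    generators = [
        {x for x in range(n_elements) if rng.random() < 0.5} for _ in range(n_generators)
    ]
    return union_closure(generators)


def max_frequency_ratio(family):
    """Largest fraction of member sets that share one common element."""
    counts = Counter(x for s in family for x in s)
    return max(counts.values()) / len(family) if counts else 0.0


def satisfies_frankl(family):
    """Frankl's condition; families without a non-empty set are excluded by the conjecture."""
    counts = Counter(x for s in family for x in s)
    if not counts:
        return True
    return 2 * max(counts.values()) >= len(family)


rng = random.Random(0)
samples = [random_union_closed_family(6, 4, rng) for _ in range(200)]
ratios = [max_frequency_ratio(f) for f in samples]
print(all(satisfies_frankl(f) for f in samples), min(ratios))
```

The collected ratios, together with features such as family size, are the kind of generated data that can then be examined further.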
Optimierung von Inkrementellen Modellanalysen + (In model-driven software development, analyses of the resulting models are necessary so that validation can already be performed at the model level, which prevents more expensive errors and saves costs. However, the models change constantly, and these changes can affect the analysis results, which should always be up to date. Since the models can become very large but only small parts of them change at a time, it makes sense to perform these analyses incrementally. One approach to incremental model analysis is NMF Expressions, which builds a dependency graph of the analysis in the background and updates it on every atomic change to the model. However, the efficiency of the analyses often depends on the exact formulation of the queries, so a clumsy formulation can lead to an inefficient analysis. In the database world, by contrast, the exact formulation of queries matters less, because queries are usually optimized automatically. This master's thesis investigates to what extent query optimization concepts from the database world can be transferred to incremental model analyses. Using NMF Expressions as an example, it shows how such optimizations can be implemented for incremental model analyses. The implemented optimizations are tested and evaluated on a set of defined model analyses.)
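Why incremental evaluation pays off can be shown with a single aggregate query. The sketch assumes a model represented as a dictionary that maps element IDs to attribute dictionaries, and a hypothetical query that counts the elements matching a predicate. Each atomic change updates the result with at most two predicate evaluations instead of rescanning the model. NMF Expressions itself is a .NET library that builds a full dependency graph; this Python sketch does not reflect its API.

```python
class IncrementalCount:
    """Keeps the number of model elements matching `predicate` up to date under atomic changes.

    Hypothetical sketch: the model is a dict of element IDs to attribute dicts.
    """

    def __init__(self, model, predicate):
        self.model = {eid: dict(attrs) for eid, attrs in model.items()}
        self.predicate = predicate
        # The only full scan: the initial evaluation.
        self.value = sum(self._hit(e) for e in self.model.values())

    def _hit(self, element):
        return 1 if self.predicate(element) else 0

    def add(self, eid, element):
        # Assumes eid is new; re-adding an existing ID would count it twice.
        self.model[eid] = dict(element)
        self.value += self._hit(self.model[eid])

    def remove(self, eid):
        self.value -= self._hit(self.model.pop(eid))

    def set_attribute(self, eid, key, new_value):
        before = self._hit(self.model[eid])
        self.model[eid][key] = new_value
        self.value += self._hit(self.model[eid]) - before
```

The result stays equal to a full recomputation after every change, but each update touches only the changed element. How cheaply such updates can be propagated depends on how the query is formulated, which is where query optimization comes in.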
Linking Software Architecture Documentation and Models + (In software development, consistency between artifacts is an important topic. This thesis proposes a framework for detecting corresponding and missing elements between documentation and a formal model. First, the approach identifies and extracts the model instances and relations described in the text. It then links these text elements to their counterparts in the model. These links are comparable to trace links, but the approach allows them to be graded. In addition, recommendations are generated for elements that are not contained in the model. The approach identifies model names and types with an F1 score above 54%, and 60% of the recommended instances match the instances found in the user study. For identifying relations and creating links, the approach achieved promising results. Future work can improve these results, which is feasible because the design allows the approach to be extended easily.)
Untersuchung des Einflusses von Kommunikationsmodellen auf die Zusammensetzbarkeit von Informationsflusseigenschaften + (In software development, a large system is frequently composed of smaller subsystems. This requires communication between the subsystems so that they can exchange information. However, the information flow through the overall system can become insecure and thereby violate confidentiality, one of the most important security properties of a system. To achieve secure information flow, so-called information flow properties must be satisfied. It is known from the literature that information flow properties can be violated when secure systems are composed: if two secure systems are composed, the overall system may become insecure. The kind of communication between the subsystems plays a decisive role here. The literature provides results showing that synchronous communication violates composability, whereas asynchronous communication ensures it. However, there are no concrete results in the literature on how gradations between synchronous and asynchronous communication affect composability. This thesis investigates how different forms of communication between synchronous and asynchronous communication affect the composability of information flow properties. For this purpose, generic concepts for modeling asynchronous forms of communication are developed. The investigation uses timed automata. An example is modeled in which two secure systems, each modeled as a timed automaton, are composed and form an insecure overall system under synchronous communication.
The synchronous communication is then replaced by asynchronous forms of communication using the developed modeling concepts, and the security of the composed system is checked for each form. The tool UPPAAL is used to model the overall system and to check whether the information flow properties are preserved. Besides the modeling concepts, this thesis contributes concrete results on how the forms of communication affect composability. Based on these results, the properties of a form of communication that are required for composability are derived, as well as properties that have a negative effect. In light of these properties, the thesis discusses how procedural communication affects composability, placing it in relation to synchronous and asynchronous communication.)
Quantitative Comparison of Metrics for Multidimensional Dependencies + (In data-driven research, analyzing high-dimensional data is of central importance. It is not always sufficient to detect dependencies only between pairs of attributes: often, dependencies exist among several attributes that cannot be observed in the two-dimensional pairs. For detecting monotonic relationships among an arbitrary number of dimensions, a multidimensional extension of Spearman's rank correlation coefficient already exists; for arbitrary dependencies, however, no such established measure exists. This thesis addresses this gap and compares two multivariate information-theoretic metrics, "general redundancy" and "interaction information". Spearman's rank correlation and the contrast measure of HiCS serve as baselines for this comparison.)
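The point that a dependency among several attributes can be invisible in every pair is easy to reproduce. In the sketch below, which uses plug-in entropy estimates on discrete samples and a hypothetical XOR data set, each pair of attributes has zero mutual information, yet the interaction information of all three is 1 bit. Sign conventions for interaction information differ between authors; this sketch uses I(X;Y|Z) - I(X;Y), which is one of them, and is not necessarily the definition used in the thesis.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def conditional_mi(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))


def interaction_information(x, y, z):
    # Convention used here: I(X;Y|Z) - I(X;Y); other authors flip the sign.
    return conditional_mi(x, y, z) - mutual_information(x, y)


# XOR: every pair is independent, but the three attributes are fully dependent.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]
print(mutual_information(x, y), mutual_information(x, z), interaction_information(x, y, z))
```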
Modeling Nested Degrees of Freedom for the Automated Evaluation of Software Architectures + (Modern software development reuses a large number of third-party subsystems, whose realizations and variants each have a specific influence on the quality properties of the overall system. However, not only the realization and variant of a subsystem but also its placement in the target architecture influence the resulting quality. This thesis extends the existing approach for modeling and simulating reusable subsystems in Palladio and PerOpteryx with a new inclusion mechanism that enables flexible, fine-grained modeling and subsequent automated quality optimization of the placement of reusable subsystems. To this end, a domain-specific language is defined that allows the weaving points in an architecture model to be described declaratively using aspect-oriented semantics. A model weaver weaves the reusable subsystems into an annotated target architecture. Finally, the approach is integrated into the automated quality optimization of PerOpteryx, so that the architect is supported in design decisions regarding these degrees of freedom. The approach was evaluated in a simulation-based case study using real application models. The results show that the approach is suitable for automatically generating and evaluating a large number of architecture candidates and thus for supporting an architect in design decisions.)
Identification and refactoring of bad smells in model-based analyses + (Model-based analyses are widespread in modern software development. Software metrics such as cache usage prediction have a broad range of applications today. Like traditional object-oriented programs, these analyses require maintenance. Bad smells and their effects in object-oriented source code have been researched thoroughly, but such research is lacking for model-based analyses. We studied object-oriented bad smells and searched for similar problems in model-based analyses. Bad smells in an analysis are one factor that affects the quality of the analysis software, and lower quality makes developing the analysis harder. We discovered ten new bad smells and developed algorithms for identifying and refactoring them. We provide implementations of the identification algorithms and evaluate them on real-world software. We tried to detect bad smells in existing analysis software such as Camunda and found these bad smells in the existing analyses.)
Automated GUI Testing of Web Applications with Large Language Models + (This thesis investigates the potential of large language models (LLMs) for automating GUI tests of web applications, an approach that offers several advantages over traditional monkey testing. Four capable LLMs, namely WizardLM, Vicuna (both based on LLaMA), GPT-3.5-Turbo, and GPT-4-Turbo, are evaluated on their ability to execute extensive and relevant parts of the code by interacting with the user interface. The evaluation comprises tests on a simple proof-of-concept application developed for this study and on PHPLiteAdmin, a more complex open-source database management tool. The results show that the GPT-based models in particular are more efficient than the traditional monkey tester in certain scenarios, especially when generating meaningful text inputs. This underlines the innovative potential of LLMs in software testing, but also shows the challenges and limitations to be expected when applying them to more complex systems. The thesis thus contributes to the discussion on the further development and optimization of automated testing in software development.)
A Language for Specifying Cross-Disciplinary Change Propagation Rules + (Change propagation analysis studies how changes propagate through systems. Among other things, it develops algorithms that identify which elements of a system are affected by a change. No dedicated language exists for adapting existing algorithms, so domain experts have to use general-purpose programming languages such as Java to formulate change propagations. Because Java is imperative, domain experts need more code and more knowledge of implementation details than they would with a language tailored to change propagation analysis. Such a language should always be adapted to the algorithm of the respective change propagation analysis. For the change propagation analysis approach considered in this thesis, Karlsruhe Architectural Maintainability Prediction (KAMP), no dedicated language exists yet. KAMP is an approach for evaluating architecture-based change requests and is implemented in a software tool of the same name. This thesis presents the Change Propagation Rule Language (CPRL), a dedicated language for the change propagation algorithm used in KAMP. Finally, the advantage of the developed language over three competing languages is determined. The thesis concludes that CPRL is more compact than the competing languages while still allowing most conceivable change propagations to be described.)
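To illustrate the difference between imperative propagation code and declarative rules, the sketch below keeps the propagation algorithm generic and expresses the rules as plain data: a change propagates along a relation only if that relation is listed as a rule. The architecture, relation names, and rule format are hypothetical; this is not CPRL syntax and not the KAMP algorithm, only a minimal worklist-based reachability search.

```python
from collections import deque

# Hypothetical architecture: element -> list of (relation, target element).
RELATIONS = {
    "InterfaceI": [("provided_by", "ComponentA"), ("required_by", "ComponentB")],
    "ComponentA": [("deployed_on", "ServerS")],
    "ComponentB": [],
    "ServerS": [],
}

# Hypothetical declarative rules: a change propagates along these relations only.
RULES = {"provided_by", "required_by"}


def affected_elements(changed, relations, rules):
    """Return all elements reachable from `changed` via relations named in `rules`."""
    affected = {changed}
    worklist = deque([changed])
    while worklist:
        element = worklist.popleft()
        for relation, target in relations.get(element, []):
            if relation in rules and target not in affected:
                affected.add(target)
                worklist.append(target)
    return affected


print(sorted(affected_elements("InterfaceI", RELATIONS, RULES)))
```

Adding a new propagation path then means adding a rule (for example `"deployed_on"`) rather than changing the traversal code.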
Investigating the Effects of Measurement Data Obfuscation on Disaggregation Quality + (This talk addresses privacy protection in the context of smart meter data. As part of a bachelor's thesis, approaches for obfuscating smart meter data are evaluated using well-known disaggregation algorithms. Disaggregation refers to extracting the usage of individual appliances from aggregated smart meter data.)
Static Extraction of Runtime Indicators + (This thesis analyzes LLVM code with the goal of finding an indicator for the number of CPU instructions. An indicator is a closed-form term that yields the number of CPU instructions executed by a piece of code for a given input; it therefore correlates with the input size of a program. We analyze the control flow graph and loop conditions to find variables in the code that serve as proxies for the input size. Determining such indicators lays the foundation for better online autotuners that can automatically adapt to inputs of varying size.)
Placing Hidden Outliers in User Data + (This thesis develops methods for placing hidden outliers in data sets. Hidden outliers are deviating data points that can be detected as outliers in the full space but appear as normal data points in certain subspaces. In addition, user-defined constraints are developed that allow a user to restrict the region in which hidden outliers are placed. The methods are evaluated in different scenarios with real and synthetic data.)
Model-Driven Consistency Preservation of Automation Systems + (This thesis develops techniques to support data exchange in factory plants by applying model-driven and change-driven consistency preservation as developed for software engineering. We focus in particular on the input of an erroneous (unresolvable) reference. To this end, we categorize the properties of references and the type of the respective error, and develop a set of rules based on this categorization. In addition, CAEX uses prototypes to instantiate objects. Whether prototypes and their clones should be kept consistent afterwards depends on their individual properties. For this, we again develop categories for the respective properties and a set of rules based on them. For example, for a robot prototype, a change to its hardware should not be propagated to clones that are already deployed in factories. We implemented this approach using the VITRUVIUS framework, a framework for model-driven and change-driven consistency preservation, and used it to demonstrate the functionality of our implementation. Using an example model, we showed that our categorizations of references, error types, properties, and clones are applicable in factory plant planning.)
Analysis of AI Approaches for Training Virtual Robots with Memory + (This thesis compares several recurrent neural networks: LSTMs, GRUs, CTRNNs, and Elman networks. The networks are evaluated on the task of memorizing a point and then reaching for it with a virtual robot arm. For LSTM, GRU, and Elman networks, the thesis also investigates how the networks solve the task when each neuron can access only its own memory. The results show that LSTMs and GRUs score considerably better in the experiments than CTRNNs and Elman networks. In addition, the computation time and the relationship between the number of trainable parameters and the experimental results are compared.)
Pattern-Based Heterogeneous Parallelization + (This thesis presents two new kinds of code generation for accelerated execution in the automatically parallelizing compiler Aphes. They are based on two additionally detected patterns of implicit parallelism in input programs: reductions in loops and recursive functions that implement the divide-and-conquer pattern. Aphes stands out from conventional parallelizing compilers in two respects that go beyond pure parallelization: first, it specializes in heterogeneous systems; second, it uses online autotuning. Both aspects were taken into account in this work. The code generators we implemented therefore provide both local acceleration via OpenMP and C++11 threads and remote acceleration via Nvidia's CUDA. Furthermore, the generated code builds on the existing Aphes infrastructure for autotuning the generated machine code at runtime. In our tests, programs with reductions in loops compiled with Aphes achieved speedups of up to a factor of 50 compared with programs compiled with Clang. Code with recursive functions transformed by Aphes achieved a speedup of 3.15 over executables of the same program generated conventionally with GCC and Clang. In all cases, the autotuner converged within the first 50 execution iterations of the kernel being optimized. However, the converged execution times sometimes differed considerably between test runs.)
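The two patterns named above can be illustrated in a few lines. The sketch below shows, in Python and purely structurally, how a loop-carried reduction can be rewritten into independent partial sums over chunks that are combined at the end (valid because addition is associative), and how a divide-and-conquer recursion splits work into independent halves. It is not Aphes output and does not model OpenMP or CUDA; because of Python's global interpreter lock, the thread pool here shows the structure of the transformation, not a speedup. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum_of_squares(values):
    # Loop-carried reduction: every iteration updates the same accumulator.
    total = 0
    for v in values:
        total += v * v
    return total


def parallel_sum_of_squares(values, workers=4):
    # Same reduction rewritten as independent partial sums over chunks,
    # combined at the end; valid because + is associative.
    chunk = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sequential_sum_of_squares, chunks)
    return sum(partials)


def dc_sum(values):
    # Divide and conquer: the two recursive calls are independent of each
    # other, which is what makes this pattern parallelizable.
    if len(values) <= 2:
        return sum(values)
    mid = len(values) // 2
    return dc_sum(values[:mid]) + dc_sum(values[mid:])


data = list(range(1, 11))
print(sequential_sum_of_squares(data), parallel_sum_of_squares(data), dc_sum(data))  # 385 385 55
```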
Dynamic Autotuning of Multiple Nominal Parameters + (This thesis simplifies this problem by exploiting knowledge about causal dependencies between different tuning tasks. Since the choice of some parameter values often only arises when other parameters take certain values, it makes no sense to always include the former in the optimization process. In particular, the developed method allows lossless, simultaneous autotuning of interdependent nominal and ratio parameters without giving up potentially valuable information about how they influence each other.)
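The benefit of knowing that some parameters only matter when others take certain values can be shown by counting configurations. In the minimal sketch below, the parameters, their values, and the single dependency ("pivot" is only relevant for quicksort) are hypothetical, and only single-level dependencies are handled; it is not the method developed in the thesis. Dropping inactive parameters shrinks the search space from 18 flat combinations to 12 distinct configurations.

```python
from itertools import product

# Hypothetical tuning parameters: name -> candidate values.
PARAMETERS = {
    "algorithm": ["quicksort", "mergesort"],
    "pivot": ["first", "median", "random"],  # only meaningful for quicksort
    "threads": [1, 2, 4],
}

# Hypothetical dependency: parameter -> (parent parameter, value that activates it).
# Assumption: single-level dependencies only (parents are never conditional themselves).
CONDITIONS = {"pivot": ("algorithm", "quicksort")}


def flat_space(params):
    """All combinations of parameter values, ignoring dependencies."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


def conditional_space(params, conditions):
    """Distinct configurations in which inactive parameters are dropped instead of varied."""
    configs = []
    for config in flat_space(params):
        active = {
            name: value
            for name, value in config.items()
            if name not in conditions
            or config[conditions[name][0]] == conditions[name][1]
        }
        if active not in configs:
            configs.append(active)
    return configs


print(len(flat_space(PARAMETERS)), len(conditional_space(PARAMETERS, CONDITIONS)))  # 18 12
```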
Schematizing Design Decisions in Natural-Language Software Architecture Documentation + (This thesis develops a schema for classifying architecture decisions from software architecture documentation, with the aim of making it easier to classify and reuse decisions in software architecture documentation. In my approach, the classification schema is based on current literature and distinguishes three basic kinds of decisions: existence decisions, property decisions, and environment decisions. For the evaluation, open-source software projects with natural-language software architecture documentation were examined, and the schema was iteratively checked for possible improvements. Finally, the thesis presents which of the decision classes can be represented in the Palladio Component Model.)
Development Methods for Product Families + (This master's thesis develops methods intended to support the development of product lines in model-based systems engineering (MBSE). Behavior descriptions of systems use, among other things, activity diagrams, which offer no explicit constructs for modeling variability. This thesis therefore presents an approach for modeling variability in activity diagrams that is metamodel-independent and can thus be used beyond activity diagrams. The approach is compared with common variability modeling approaches, investigating, among other things, to what extent it reduces element redundancy compared with the other approaches. Next, the thesis works out how activity diagrams and colored Petri nets can be kept consistent with each other. To this end, their commonalities and differences are identified in order to define consistency preservation rules and to find the limits of consistency preservation. Finally, the thesis outlines what is needed to combine the two approaches to obtain a behavior description of a product line from activity diagrams and colored Petri nets in which the activity diagrams and Petri nets of the individual product configurations are always consistent with each other.)
Modeling Dynamic Systems using Slope Constraints: An Application Analysis of Gas Turbines + (In energy studies, researchers build models of dynamic systems to predict the produced electrical output precisely. Since experiments are expensive, researchers rely on simulations of surrogate models. These models use differential equations that can provide decent results but are computationally expensive. Further, transition phases, which occur when an input change results in a delayed change in output, are modeled individually and therefore lack generalizability. Current research includes Data Science approaches that need large amounts of data, which are costly to obtain through scientific experiments. Theory-Guided Data Science aims to combine Data Science approaches with domain knowledge to reduce the amount of data needed while still predicting the output precisely. However, even state-of-the-art Theory-Guided Data Science approaches cannot model the slopes occurring in the transition phases. In this thesis, we aim to close this gap by proposing a new loss constraint that represents both transition and stationary phases. Our method is compared with theoretical and Data Science approaches on synthetic and real-world data.)
Local Outlier Factor for Feature-evolving Data Streams + (In high-volume data streams it is often impractical to monitor all observations; often we are only interested in deviations from normal operation. Detecting outlying observations in data streams is an active area of research. However, most approaches assume that the data's dimensionality, i.e., the number of attributes, stays constant over time. This assumption is unjustified in many real-world use cases, such as sensor networks or computer cluster monitoring. Feature-evolving data streams do not impose this restriction and thereby pose additional challenges. In this thesis, we extend the well-known Local Outlier Factor (LOF) algorithm for outlier detection from the static case to the feature-evolving setting. Our algorithm combines subspace projection techniques with an appropriate index structure and uses only bounded computational resources. By discarding old observations, our approach also deals with concept drift. We evaluate our approach against the respective state-of-the-art methods in the static case, the streaming case, and the feature-evolving case.)
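For reference, the static LOF that the thesis starts from can be written down compactly: the k-distance of a point, the reachability distance max(k-distance(o), d(p, o)), the local reachability density as the inverse mean reachability distance to the k nearest neighbors, and LOF as the ratio of the neighbors' densities to the point's own density. The sketch below is a brute-force O(n²) version of this static case only; the subspace projections, index structure, and forgetting of old observations described above are not shown. It assumes the input contains no duplicate points (otherwise a density can become infinite).

```python
import math


def k_distance_neighbors(points, i, k):
    """k nearest neighbors of points[i] (ties at the k-distance included) and its k-distance."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    k_dist = dists[k - 1][0]
    neighbors = [j for d, j in dists if d <= k_dist]
    return neighbors, k_dist


def local_outlier_factors(points, k):
    """Static LOF score for every point; values well above 1 indicate outliers."""
    n = len(points)
    info = [k_distance_neighbors(points, i, k) for i in range(n)]

    def reach_dist(i, j):
        # reach-dist_k(i, j) = max(k-distance(j), d(i, j))
        return max(info[j][1], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        neighbors = info[i][0]
        mean_reach = sum(reach_dist(i, j) for j in neighbors) / len(neighbors)
        lrd.append(1.0 / mean_reach)

    # LOF: average density of the neighbors relative to the point's own density.
    return [
        sum(lrd[j] for j in info[i][0]) / (len(info[i][0]) * lrd[i])
        for i in range(n)
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = local_outlier_factors(points, k=2)
print([round(s, 2) for s in scores])
```

The four clustered points receive a score of 1.0, while the isolated point (10, 10) receives a score above 13.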
Architectural Generation of Context-based Attack Paths + (In industrial processes (Industry 4.0) and other areas of our lives, such as the energy or health sector, the confidentiality of data is becoming increasingly important. To protect confidential information on critical systems, it is crucial to be able to find relevant attack paths to a critical element in different access-control contexts. To minimize costs, this issue should already be considered in the design phase of the software architecture. There are already approaches to attack path generation. However, they either do not consider software architecture modeling or do not consider both vulnerabilities and access control mechanisms. Hence, this thesis presents an approach for finding all potential attack paths in a software architecture model, taking access control and vulnerabilities into account. Since the set of all attack paths is often too large, the approach introduces and applies meaningful filter criteria based on widespread vulnerability classification standards.)
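The core idea of enumerating all potential attack paths and then filtering them can be sketched as a depth-first search over an architecture graph. In the minimal sketch below, the graph, the node names, the CWE-style vulnerability labels, and the filter rule ("every node after the entry point must carry a vulnerability from an allowed class") are all hypothetical illustrations; access-control contexts, which the approach above also considers, are not modeled.

```python
# Hypothetical architecture: node -> nodes an attacker can move to next.
GRAPH = {
    "Internet": ["WebServer"],
    "WebServer": ["AppServer", "FileServer"],
    "AppServer": ["Database"],
    "FileServer": ["Database"],
    "Database": [],
}

# Hypothetical vulnerability annotations: node -> set of vulnerability classes.
VULNERABILITIES = {
    "WebServer": {"CWE-79"},
    "AppServer": {"CWE-89"},
    "FileServer": {"CWE-22"},
    "Database": {"CWE-89"},
}


def attack_paths(graph, start, target, path=None):
    """Enumerate all simple paths from start to target (depth-first search)."""
    path = (path or []) + [start]
    if start == target:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # avoid cycles
            paths.extend(attack_paths(graph, nxt, target, path))
    return paths


def filter_paths(paths, vulnerabilities, allowed_classes):
    """Keep paths in which every node after the entry point has an allowed vulnerability class."""
    return [
        p for p in paths
        if all(vulnerabilities.get(node, set()) & allowed_classes for node in p[1:])
    ]


all_paths = attack_paths(GRAPH, "Internet", "Database")
print(all_paths)
print(filter_paths(all_paths, VULNERABILITIES, {"CWE-79", "CWE-89"}))
```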
Privacy Case Study in Connected-Car Systems + (In every software system that handles user data, the processing of that data must meet strict requirements. The strictest and most widespread of these laws to date is the European General Data Protection Regulation. To process data legally under this regulation, it is very beneficial for software developers to take it into account as early as possible in the development process. One way to detect data protection violations at design time is data flow analysis, which adds properties both to the conventional software model and to the modeled data. A data flow diagram can then be derived from the call graph, showing which data flow from which components to where. This thesis describes a case study that examines data flow analysis on a concrete system. First, requirements are established that a case study in the areas of mobility and data protection must meet. The scientific contribution of this thesis lies in these requirements and in the trial execution of the case study, in which a fictitious ride-pooling company is modeled. The model is examined using data flow analysis, and conclusions about the analysis are drawn from the results.)
Predictability of Classification Performance Measures with Meta-Learning + (In machine learning, classification is the problem of identifying to which of a set of categories a new instance belongs. Usually, we cannot tell how a model performs until it is trained. Meta-learning, which learns about the learning algorithms themselves, can predict the performance of a model without training it, based on meta-features of data sets and performance measures of previous runs. Although there is a rich variety of meta-features and performance measures in meta-learning, existing works usually focus on which meta-features are likely to correlate with model performance under one particular measure. The effect of different types of performance measures remains unclear, as it is hard to compare the results of existing works, which are based on different meta-data sets and meta-models. The goal of this thesis is to study whether certain types of performance measures can be predicted better than others and how much the choice of the meta-model matters, by constructing different meta-regression models on the same meta-features and different performance measures. We will use an experimental approach to evaluate our study.)
Benchmarking Tabular Data Synthesis Pipelines for Mixed Data + (In machine learning, simpler, interpretable models require significantly more training data than complex, opaque models to achieve reliable results. This is a problem when gathering data is a challenging, expensive, or time-consuming task. Data synthesis is a useful approach for mitigating these problems. An essential aspect of tabular data is its heterogeneous structure, as it often comes as "mixed data", i.e., it contains both categorical and numerical attributes. Most machine learning methods require the data to be purely numerical; the usual way to deal with this is a categorical encoding. In this thesis, we evaluate a proposed tabular data synthesis pipeline consisting of a categorical encoding, followed by data synthesis and an optional relabeling of the synthetic data by a complex model. This synthetic data is then used to train a simple model, whose performance quantifies the quality of the generated data. We surveyed the current state of research in categorical encoding and tabular data synthesis and performed an extensive benchmark on a motivated selection of encoders and generators.)
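As an example of the categorical-encoding step at the start of such a pipeline, the sketch below implements one-hot encoding, one common encoder among many (the abstract does not list which encoders were benchmarked). Each categorical column is replaced by 0/1 indicator columns, while numerical columns pass through unchanged. The column names and rows are hypothetical.

```python
def one_hot_encode(rows, categorical_columns):
    """Replace each categorical column by 0/1 indicator columns; numeric columns pass through."""
    # Collect the categories seen per categorical column (sorted for a stable column order).
    categories = {
        col: sorted({row[col] for row in rows}) for col in categorical_columns
    }
    encoded = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if col in categories:
                for cat in categories[col]:
                    new_row[f"{col}={cat}"] = 1.0 if value == cat else 0.0
            else:
                new_row[col] = float(value)
        encoded.append(new_row)
    return encoded


# Hypothetical mixed data: one categorical and one numerical attribute.
rows = [
    {"color": "red", "size": 3.5},
    {"color": "blue", "size": 1.0},
    {"color": "red", "size": 2.0},
]
print(one_hot_encode(rows, ["color"]))
```

After encoding, every attribute is numerical, so the rows can be passed to a generator that expects purely numerical input.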
Bad Smells and Antipatterns in Metamodeling + (In modern software development, metamodels play an important role, as they form the basis for domain-specific modeling languages, which are used for system design, simulation, and code generation. Like any artifact in a software development process, these languages and their respective models need to evolve over time. However, if the metamodels that define those languages are badly designed, the evolution process is complicated and additional effort has to be spent on maintenance. Such design problems are considered bad smells. Existing approaches to detecting smells in metamodels deal mainly with simple defects or focus only on a small number of smells. Therefore, we present a comprehensive investigation of bad smells and antipatterns by reviewing design smells of object-oriented programming and, where possible, transferring them to metamodeling. Some of these smells are automatically detectable, so we provide tool support with suitable detection methods as an extension of EMF Refactor. We evaluate this approach by testing every automatically detectable smell with appropriate models and by applying the tool support to an existing large metamodel to evaluate the suggested refactorings.)
Semi-automatic Consistency Preservation of Models + (To manage the high complexity of developing software systems, several models describing different aspects of the system under development are often employed. Models often contain redundant or dependent information, so changing one model without adjusting the others that represent the same concepts leads to inconsistencies, which need to be repaired automatically; otherwise, developers would have to know all dependencies to preserve consistency by hand. For automated consistency preservation, model transformations can be used to specify how elements of one model correspond to those of another and to define consistency preservation operations that fix inconsistencies. In this specification, it is not always possible to determine one generally correct way of preserving consistency without insight into the intentions of the developer responsible for the changes. To factor in these underlying intentions, user interactions are needed that clarify the course of consistency preservation in ambiguous cases. Existing approaches either do not consider user interactions during consistency preservation or provide an unstructured set of interaction options. In this thesis, we therefore identify a structured classification of user interaction types to employ during consistency preservation. By applying these types in preexisting case studies for consistency preservation between models in different application domains, we were able to show their applicability in terms of completeness and appropriateness. Furthermore, software projects are rarely developed by a single person, so multiple developers may work on the same models in different development branches and combine their work at some point using a merge operation.
One reasonable option for merging different development branches of models is to track model changes and merge the change sequences by applying one after another. Since the model state has changed due to the changes made in one branch, the changes in the other branch can require different user decisions for consistency preservation. Nevertheless, most necessary decisions will be the same, which is why it is useful to reuse previously made choices where possible. To achieve this, this thesis provides a concept for storing and reapplying decisions during consistency preservation. We establish which information is necessary and reasonable to represent a user interaction and to allow its correct reuse. By applying the reuse mechanism to a change scenario with several user interactions in one of the case studies mentioned above, we were able to show the feasibility of our overall concept for correctly reusing decisions.)
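The idea of storing user decisions and replaying them during a merge can be sketched as a lookup keyed by the interaction. In the minimal sketch below, the key fields (the consistency rule that asked, the affected element, the question), the class names, and the rule name `ClassToTableRule` are hypothetical assumptions, not the representation established in the thesis. A stored choice is reused only if it is still among the currently offered options; otherwise the user is asked again and the new answer is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionKey:
    """Hypothetical identification of a user interaction."""
    rule: str      # consistency preservation rule that triggered the question
    element: str   # model element the question is about
    question: str  # the question presented to the user


class DecisionStore:
    def __init__(self):
        self._decisions = {}

    def record(self, key, choice):
        self._decisions[key] = choice

    def decide(self, key, options, ask_user):
        """Reuse a stored choice if it is still a valid option; otherwise ask the user again."""
        stored = self._decisions.get(key)
        if stored in options:
            return stored
        choice = ask_user(key, options)
        self.record(key, choice)
        return choice


questions_asked = []


def ask_user(key, options):
    # Stand-in for a real dialog: records the question and picks the first option.
    questions_asked.append(key)
    return options[0]


store = DecisionStore()
key = InteractionKey("ClassToTableRule", "Customer", "Which table should the class map to?")
store.record(key, "CUSTOMERS")  # decision made while working in the first branch

# Replaying during the merge: the stored choice is still valid, so no question is asked.
print(store.decide(key, ["CUSTOMERS", "NEW_TABLE"], ask_user), len(questions_asked))  # CUSTOMERS 0
```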
Review of dependency estimation with focus on data efficiency + (In our data-driven world, large amounts of data are collected in all kinds of environments, which is why data analysis is growing in importance. How different variables influence each other is a significant part of knowledge discovery and allows strategic decisions based on this knowledge. Therefore, high-quality dependency estimation should be accessible to a variety of people. However, many dependency estimation algorithms are difficult to use in a real-world setting. In addition, most of them need large data sets to return a good estimate. In practice, gathering this amount of data may be costly, especially when the data is collected in experiments with high costs for materials or infrastructure. I will compare different state-of-the-art dependency estimation algorithms, using a list of 14 criteria that I put together to determine how promising each algorithm is. This study focuses especially on the data efficiency and the uncertainty of the dependency estimation algorithms. An algorithm with high data efficiency can give a good estimate from a small amount of data. A degree of uncertainty helps to interpret the result of the estimator, which allows better decision-making in practice. The comparison includes a theoretical analysis and experiments with the dependency estimation algorithms that performed well in the theoretical analysis.)
Relevance-Driven Feature Engineering + (In predictive maintenance scenarios, failure classification is challenging because modern factories continuously generate large volumes of high-dimensional data. In our industry use case, complex error analysis is currently performed manually based on recorded data, and the resulting misclassifications lead to longer rework times. Our goal is to perform automated failure detection. In particular, this thesis builds a classification model to detect faulty engines in the vehicle manufacturing process. The first part of the work focuses on the binary anomaly detection problem and aims to predict an engine's deficiency status; here, we manage to recognize more than 90% of the faulty engines. In the second part, we extend our analysis to the multi-class classification problem with highly imbalanced classes, where our objective is to forecast the exact type of failure. To some extent, this situation resembles microarray analysis: we observe high-dimensional data with few available instances. This thesis develops a relevance-driven feature engineering meta-algorithm framework. We study the integration of feature relevance evaluation into the construction of new features. We also use ensemble feature selection algorithms and define our own criteria to determine the relevance of feature subsets. These criteria are integrated into the feature engineering process to accelerate it by ignoring parts of the search space without significantly degrading data quality.)
Instrumentation with Runtime Monitors for Extraction of Performance Models during Software Evolution + (In recent times, companies are increasingly looking to migrate their legacy software systems to a microservice architecture. This large-scale refactoring is often motivated by concerns over high levels of interdependency, developer productivity problems and unknown boundaries for functionality. However, modernizing legacy software systems has proven to be a difficult and complex process to execute properly. This thesis intends to provide a means of decision support for this migration process in the form of an accurate and meaningful performance monitoring instrumentation and a performance model of said system. It specifically presents an instrumentation concept that incurs minimal performance overhead and is generally compatible with legacy systems implemented using object-oriented programming paradigms. In addition, the concept illustrates how performance model specifics are extracted from the monitoring data. This concept was developed on an enterprise legacy system provided by Capgemini and then implemented on this system. A subsequent case study was conducted to evaluate the quality of the concept.)
Traceability Link Recovery for Relations in Natural Language Software Architecture Documentation and Software Architecture Models + (In software development, software architecture plays a vital role in developing and maintaining software systems. It is communicated through artifacts such as software architecture documentation (SAD) and software architecture models (SAM). However, maintaining consistency and traceability between these artifacts can be challenging. If there are inconsistencies or missing links, it can lead to errors, misunderstandings, and increased maintenance costs. This thesis proposes an approach for recovering traceability links of software architecture relations between natural language SAD and SAM. The approach involves the use of Pre-trained Language Models (PLMs) such as BERT and ChatGPT and supports different extraction modes and prompt engineering techniques for ChatGPT, as well as different model variants and training strategies for BERT. The proposed approach is integrated with ArDoCo, a tool that detects inconsistencies and recovers trace links between software artifacts. ArDoCo is used for pre-processing the SAD text and parsing the SAM, thus facilitating the traceability link recovery process. In order to assess the performance of the framework, a gold standard of SAD and SAM created from open-source projects is utilized. The evaluation shows that the ChatGPT approach has promising results in relation extraction with a recall of 0.81 and in traceability link recovery with an F1-score of 0.83, while BERT-based models struggle due to the lack of domain-specific training data.)
Coreference Resolution for Software Architecture Documentation + (In software engineering, software architecture documentation plays an important role. It contains much essential information regarding reasoning and design decisions. Therefore, many activities have been proposed to deal with documentation for various reasons, e.g., extracting information or keeping different forms of documentation consistent. These activities often involve automatic processing of documentation, for example traceability link recovery (TLR). However, coreferences in documentation can cause problems for automatic processing. A coreference occurs when two or more mentions refer to the same entity. These mentions can differ and create ambiguities, for example when pronouns are used. To overcome this problem, this thesis makes two contributions to resolving coreferences in software architecture documentation. The first contribution is to explore the performance of existing coreference resolution models on software architecture documentation. The second is to divide coreference resolution into more specific types of resolution, such as pronoun resolution and abbreviation resolution.)
Automatic Context-Based Policy Generation from Usage- and Misusage-Diagrams + (In systems with a very dynamic process like Industry 4.0, the contexts of all participating entities often change, and a lot of data is exchanged with external organizations such as suppliers or producers, which raises concerns about unauthorized data access. This creates the need for access control systems that can handle such a combination of a highly dynamic system and the resulting concern about data security. In many situations, the access control decision depends on the context information of the requester. Another problem of dynamic systems is that the manual development of access policies can be time-consuming and expensive. Approaches using automated policy generation have been shown to reduce this effort. In this master's thesis, we introduce a concept that combines context-based model-driven security with automated policy generation, and we evaluate whether it is a suitable option for the creation of access control systems and whether it can reduce the effort of policy generation. The approach uses usage and misusage diagrams, which are on a high architectural abstraction level, to derive and combine access policies for data elements located on a lower abstraction level.)
Encryption-aware SQL query log rewriting for LIKE predicates + (In the area of workflow analysis, a workflow, e.g. a working process, can be analyzed by looking into the data that was used for or created during the working process. The main contribution of this work is to extend CoVER so that it supports LIKE predicates with order-preserving encryption.)
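One common way to make a prefix LIKE predicate usable under order-preserving encryption is to rewrite it into a range predicate, since an order-preserving scheme keeps the relative order of values. The sketch below shows only the plaintext rewriting step, assuming the pattern has the form 'prefix%' and that the encryption preserves lexicographic string order. The function name like_prefix_to_range is hypothetical, and the abstract does not state that CoVER uses this exact technique.

```python
def like_prefix_to_range(column: str, pattern: str) -> str:
    """Rewrite `column LIKE 'prefix%'` into an equivalent half-open range predicate.

    Hypothetical helper: only trailing-% prefix patterns are handled.
    """
    if not pattern.endswith("%"):
        raise ValueError("only trailing-% prefix patterns are supported")
    prefix = pattern[:-1]
    if not prefix or "%" in prefix or "_" in prefix:
        raise ValueError("prefix must be non-empty and free of wildcards")
    # Smallest string greater than every string that starts with `prefix`:
    # increment the last character (fails if it is already the maximum code point).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return f"{column} >= '{prefix}' AND {column} < '{upper}'"


print(like_prefix_to_range("name", "Mei%"))
```

Both bounds ('Mei' and 'Mej' for the call above) could then be encrypted and compared directly against order-preserving ciphertexts in the column. Quoting and escaping of the literals are deliberately left out of this sketch.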
Design Space Evaluation for Confidentiality under Architectural Uncertainty + (In the early stages of developing a software architecture, many properties of the final system are still unknown or difficult to determine. There may be multiple viable architectures, but uncertainty about which architecture performs best. Software architects can use Design Space Exploration (DSE) to evaluate quality properties of architecture candidates to find the optimal solution. Design Space Exploration can be a resource-intensive process. An architecture candidate may feature certain properties which disqualify it from consideration as an optimal candidate, regardless of its quality metrics. An example of this would be confidentiality violations in data flows introduced by certain components or combinations of components in the architecture. If these properties can be identified early, quality evaluation can be skipped and the candidate discarded, saving resources. Currently, analyses for identifying such properties are performed separately from the design space exploration process: optimal candidates are determined first, and the analyses are then applied to individual architecture candidates. Our approach augments the PerOpteryx design space exploration pipeline with an additional architecture candidate filter stage, which allows existing generic candidate analyses to be integrated into the DSE process. This enables automatic execution of analyses on architecture candidates during DSE and early discarding of unwanted candidates before quality evaluation takes place. We use our filter stage to perform data flow confidentiality analyses on architecture candidates, and further provide a set of example analyses that can be used with the filter. We evaluate our approach by running PerOpteryx on case studies with our filter enabled.
Our results indicate that the filter stage works as expected: it is able to analyze architecture candidates and skip quality evaluation for unwanted candidates.)
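The filter stage described in this abstract can be sketched as a loop that runs candidate analyses before the expensive quality evaluation. This is a simplified illustration, not PerOpteryx code: the Candidate type, the flow representation, the example rule no_secret_to_external and the evaluate callback are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    name: str
    # Hypothetical data flows as (data label, receiving component) pairs.
    flows: List[Tuple[str, str]] = field(default_factory=list)


# A candidate analysis returns True if the candidate may be kept.
Analysis = Callable[[Candidate], bool]


def no_secret_to_external(c: Candidate) -> bool:
    """Hypothetical confidentiality rule: 'secret' data must not reach 'ExternalCache'."""
    return not any(label == "secret" and target == "ExternalCache"
                   for label, target in c.flows)


def filter_then_evaluate(candidates: List[Candidate],
                         analyses: List[Analysis],
                         evaluate: Callable[[Candidate], float]
                         ) -> Tuple[Dict[str, float], List[str]]:
    """Run all analyses first; only candidates passing every analysis are evaluated."""
    evaluated: Dict[str, float] = {}
    discarded: List[str] = []
    for c in candidates:
        if all(analysis(c) for analysis in analyses):
            evaluated[c.name] = evaluate(c)  # expensive quality evaluation
        else:
            discarded.append(c.name)         # skipped before evaluation
    return evaluated, discarded


# Usage: candidate B leaks secret data and is discarded without evaluation.
a = Candidate("A", [("secret", "Database")])
b = Candidate("B", [("secret", "ExternalCache")])
evaluated, discarded = filter_then_evaluate(
    [a, b], [no_secret_to_external], lambda c: float(len(c.flows)))
print(evaluated, discarded)
```

Keeping analyses as plain predicates is what lets arbitrary existing checks be plugged into the loop without changing the evaluation step.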
Token-Based Plagiarism Detection for Statecharts + (In the field of software engineering, existing plagiarism detection systems have primarily focused on detecting cases of plagiarism in code. However, other artefacts such as models also play a crucial role in the development process. Statecharts, in particular, are used to model the behavior of a system. This thesis investigates the applicability and challenges of applying token-based plagiarism detection systems to statecharts. We extend the plagiarism detector JPlag to support detecting cases of plagiarism in statecharts. Our approach is evaluated using a dataset of student assignments from a modeling course, where we generate plagiarized statecharts by adopting common obfuscation attacks. We study the effects of the token-extraction strategy, sorting techniques and the minimum token match parameter. The results suggest that an approach tailored to the specific kind of model, such as statecharts, works better than a generic solution for models.)
Developing a Framework for Mining Temporal Data from Twitter as Basis for Time-Series Correlation Analysis + (In the last decade, ample research has been produced regarding the value of user-generated data from microblogs as a basis for time series analysis in various fields. In this context, the objective of this thesis is to develop a domain-agnostic framework for mining microblog data (i.e., Twitter). Taking the subject-related postings of a time series (e.g., inflation) as its input, the framework will generate temporal data sets that can serve as a basis for time series analysis of the given target time series (e.g., inflation rate). To accomplish this, we will analyze and summarize the prevalent research related to microblog data-based forecasting and analysis, with a focus on the data processing and mining approach. Based on the findings, one or several candidate frameworks are developed and evaluated by testing the correlation of their generated data sets against the target time series they are generated for. While summative research on microblog data-based correlation analysis exists, it is mainly focused on summarizing the state of the field. This thesis adds to the body of research by applying summarized findings and generating experimental evidence regarding the generalizability of microblog data mining approaches and their effectiveness.)
Evaluation of Architecture-Based Performance Prediction in the Context of Automated Vehicles + (In the past decades, there has been increased interest in the development of automated vehicles. Automated vehicles are vehicles that are able to drive without constant interaction by a human driver. Instead, they use multiple sensors to observe their environment and act according to the observed stimuli. To avoid accidents, the reaction to these stimuli needs to happen within a sufficiently short time. To keep implementation overhead and cost low, it is highly beneficial to know the reaction time of a system as early as possible. Thus, being able to assess performance already at design time allows system architects to make informed decisions when comparing software components for use in automated vehicles. In this thesis, I analysed the applicability of architecture-based performance prediction in the context of automated vehicles using the Palladio Approach. In particular, I focused on predicting the design-time worst-case reaction time, a crucial metric for the reaction ability of automated vehicles when assessing their performance.)
Meta-Learning for Encoder Selection + (In machine learning, the data to be analyzed is often not only numerical but also categorical. Therefore, encoders have been developed to convert categorical data into numerical data. However, different encoders may have different impacts on the performance of the machine learning process. To this end, this thesis is dedicated to understanding the best encoder selection using meta-learning approaches. Meta-learning, also known as learning to learn, serves as the primary tool for this study. First, using the concept of meta-learning, we find meta-features that represent the characteristics of the data sets. After that, an iterative machine learning process is performed to find the relationship between these meta-features and the best encoder selection. In the experiment, we analyzed 50 datasets collected from OpenML. We collected their meta-features and their performance with different encoders. After that, decision trees and random forests were chosen as meta-models to perform meta-learning and find the relationship between the meta-features and the performance of each encoder or the best encoder. The output of these steps will be a ruleset that describes the relationship in an interpretable way and can also be generalized to new datasets.)
Meta-learning for Encoder Selection + (In the real world, mixed-type data, which contains both categorical and numerical data, is commonly used. However, most algorithms can only learn from numerical data, which makes the choice of encoder very important. In this presentation, I will present an approach that uses ideas from meta-learning to predict performance from the meta-features and encoders.)
Robust Subspace Search + (This thesis discusses the idea of finding robust subspaces with the help of an iterative process. The process first aims for subspaces in which hiding outliers is feasible. Subsequently, the subspaces used in the first part are adjusted. In doing so, the convergence of this iterative process can reveal valuable insights in systems where the existence of hidden outliers poses a high risk (e.g., power stations). The main part of this thesis will deal with hiding outliers in high-dimensional data spaces and the challenges resulting from such spaces.)
Architectural Uncertainty Analysis for Access Control Scenarios in Industry 4.0 + (In this thesis, we present our approach to handling uncertainty in access control at design time. We propose the concept of trust as a composition of environmental factors that impact the validity of, and consequently the trust in, access control properties. We use fuzzy inference systems to define how environmental factors are combined. These trust values are then used by an analysis process to identify issues that can result from a lack of trust. We extend an existing data flow diagram approach with our concept of trust. Our approach of adding knowledge to a software architecture model and providing a way to analyze model instances for access control violations shall enable software architects to increase the quality of models and further verify access control requirements under uncertainty. We evaluate the applicability based on availability, accuracy and scalability with regard to execution time.)
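To illustrate how a fuzzy inference system can combine environmental factors into a single trust value, here is a small zero-order Sugeno-style sketch. The two factors (location_accuracy and credential_freshness), the linear membership functions, the four rules, their output levels and the violation threshold are hypothetical choices for this example; the abstract does not specify which inference method or rule base the thesis uses.

```python
def low(x: float) -> float:
    """Membership degree of x in the fuzzy set 'low' (x is clipped to [0, 1])."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x: float) -> float:
    """Membership degree of x in the fuzzy set 'high' (x is clipped to [0, 1])."""
    return max(0.0, min(1.0, x))


def trust(location_accuracy: float, credential_freshness: float) -> float:
    """Combine two hypothetical environmental factors into a trust value in [0, 1]."""
    a, b = location_accuracy, credential_freshness
    # Each rule: (firing strength with min as fuzzy AND, crisp output level).
    rules = [
        (min(high(a), high(b)), 1.0),  # both factors good -> full trust
        (min(high(a), low(b)), 0.5),   # one factor weak -> partial trust
        (min(low(a), high(b)), 0.5),
        (min(low(a), low(b)), 0.0),    # both factors weak -> no trust
    ]
    # One rule always fires with strength >= 0.5, so total is never zero.
    total = sum(strength for strength, _ in rules)
    # Weighted average of rule outputs (zero-order Sugeno defuzzification).
    return sum(strength * out for strength, out in rules) / total


def violates(a: float, b: float, threshold: float = 0.6) -> bool:
    """Flag an access control property whose trust falls below a hypothetical threshold."""
    return trust(a, b) < threshold


print(trust(1.0, 1.0), trust(0.5, 0.5), violates(0.5, 0.5))
```

Min as the fuzzy AND and a weighted average as defuzzification are the simplest standard choices; a Mamdani system with output membership functions would be an equally valid alternative.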
Surrogate models for crystal plasticity - predicting stress, strain and dislocation density over time (Defense) + (In this work, we build surrogate models to approximate the deformation behavior of face-centered cubic crystalline structures under load, based on the continuum dislocation dynamics (CDD) simulation. The CDD simulation is a powerful tool for modeling the stress, strain, and evolution of dislocations in a material, but it is computationally expensive. Surrogate models provide approximations of the results at a much lower computational cost. We propose two approaches to building surrogate models that only require the simulation parameters as inputs and predict the sequences of stress, strain, and dislocation density. The approaches comprise the use of time-independent multi-target regression and recurrent neural networks. We demonstrate the effectiveness by providing an extensive study of different implementations of both approaches. We find that, based on our dataset, a gradient-boosted trees model making time-independent predictions performs best in general and provides insights into feature importance. The approach significantly reduces the computational cost while still producing accurate results.)
Approximating an Ngram Corpus with Probabilistic Methods + (In this work, we consider ngram corpora, i.e., sets of word chains of different lengths together with their usage frequencies in natural language. For example, the 3-gram "bag of words" may be used 200 times. Obviously, there is a dependence between the usage frequencies of (1) the unigrams "bag", "of", and "words", (2) the bigrams "bag of" and "of words", and (3) the trigram "bag of words". This connection is partially exploited in language models to implement grammar correction or speech recognition. From a database point of view, the ngram corpus contains either redundant information or information that can be estimated well. This indicates that we can greatly reduce the corpus size while still providing its information with high accuracy. In this work, we research the connection between n-grams and (n+1)-grams in both directions. Our objective is to store only part of the full ngram corpus and estimate the rest of it.)
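One simple way to estimate a trigram frequency from stored bigram and unigram frequencies is a first-order Markov assumption, P(w3 | w1 w2) ≈ P(w3 | w2) = c(w2 w3) / c(w2), which gives c(w1 w2 w3) ≈ c(w1 w2) · c(w2 w3) / c(w2). The sketch below implements only this estimator; the counts in the example are invented for illustration, and the abstract does not say that this is the estimator used in the thesis.

```python
from typing import Dict, Tuple


def estimate_trigram(w1: str, w2: str, w3: str,
                     unigrams: Dict[str, int],
                     bigrams: Dict[Tuple[str, str], int]) -> float:
    """Estimate c(w1 w2 w3) as c(w1 w2) * c(w2 w3) / c(w2)."""
    middle = unigrams.get(w2, 0)
    if middle == 0:
        return 0.0  # no information about the middle word
    return bigrams.get((w1, w2), 0) * bigrams.get((w2, w3), 0) / middle


# Invented example counts (each bigram count is at most its unigram counts).
unigrams = {"bag": 500, "of": 1000, "words": 600}
bigrams = {("bag", "of"): 400, ("of", "words"): 500}
estimate = estimate_trigram("bag", "of", "words", unigrams, bigrams)
print(estimate)
```

Storing only unigrams and bigrams and estimating longer ngrams this way trades accuracy for space; where such an estimate is poor, the exact count would still have to be stored.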
Architecture-based Uncertainty Impact Analysis for Confidentiality + (In times of highly interconnected systems, confidentiality becomes a crucial security quality attribute. As confidentiality breaches become more costly to fix the later they are found, software architects should address confidentiality early, at design time. During the architectural design process, software architects make Architectural Design Decisions (ADDs) to handle the degrees of freedom, i.e., uncertainty. However, ADDs are often subject to assumptions and unknown or imprecise information. Assumptions may turn out to be wrong, so they have to be revised, which re-introduces uncertainty. Thus, the presence of uncertainty at design time prevents drawing precise conclusions about the confidentiality of the system. It is, therefore, necessary to assess the impact of uncertainties at the architectural level before making a statement about confidentiality. To address this, we make the following contributions: First, we propose a novel uncertainty categorization approach to assess the impact of uncertainties in software architectures. Based on that, we provide an uncertainty template that enables software architects to structurally derive types of uncertainties and their impact on architectural element types for a domain of interest. Second, we provide an Uncertainty Impact Analysis (UIA) that enables software architects to specify which architectural elements are directly affected by uncertainties. Based on structural propagation rules, the tool automatically derives further architectural elements that are potentially affected. Using the large-scale open-source contact tracing application Corona Warn App (CWA) as a case study, we show that the UIA achieves 100% recall while maintaining 44%-91% precision when analyzing the impact of uncertainties on architectural elements.)
Domain-specific Language for Data-driven Design Time Analyses and Result Mappings for Logic Programs + (In today's connected world, exchanging data is essential to many business applications. In order to address security requirements early, design time data flow analyses have been proposed. These approaches transform the modeled architecture into underlying formalisms such as logic programs. Constraints that check requirements often have to be formulated in terms of the underlying formalism. This requires architects to know about the formalism, the transformed architecture and the verification environment. We aim to bridge this gap between the architectural domain and the underlying formalism. We propose a domain-specific language (DSL) which enables architects to define individual constraints in terms of the architecture. Our approach automatically maps the constraints and results between the architectural domain and the formalism. Our evaluation indicates good overall expressiveness, usability and space efficiency for data flow restrictions of different sizes.)
Evaluating Subspace Search Methods with Hidden Outlier + (In today's world, most datasets have more than a small number of attributes. The high number of attributes, referred to as dimensions, hinders the search for objects that do not normally occur, for instance a money transaction that was not carried out legally. Such objects are called outliers. Common methods for detecting outliers in high-dimensional datasets are based on searching subspaces of the dataset that are likely to reveal possible outliers. Algorithms searching for subspaces are most commonly evaluated on benchmark datasets. However, benchmark datasets are often not suitable for evaluating these subspace search algorithms. In this context, we present a method that evaluates subspace search algorithms without relying on benchmark datasets by hiding outliers in the result set of a subspace search algorithm.)
Refinement of Access Control Policies Considering Uncertainty at Design Time + (In our networked and digitalized world, more and more data is being exchanged. To protect users' personal data, legal requirements are adopted in the form of mandatory policies for data exchange. These policies are written in natural language and are often only taken into account in late design phases of software development. Failing to incorporate the policies already during the design phase can lead to overlooked confidentiality gaps, which then often have to be fixed with greater effort in later adaptations. A refinement of the policies that starts at software design time can give a software architect early indications of critical properties or violations in the software and helps to avoid them. The goal of this thesis is to develop a refinement approach that works despite uncertainties caused by missing information. Uncertainties are detected and classified based on a taxonomy of uncertainty. The refinement process analyzes different abstraction levels of a software architecture, starting at the system level, through individual components, down to service calls and their interfaces. Possible violations of the input policies are detected by constructing an access control graph, decomposing the graph and identifying individual service calls. The identified critical elements of the software architecture are reported as output.)
Derivation of Change Sequences from State-Based File Differences for Delta-Based Model Consistency + (In view-based software development, views may share concepts and thus contain redundant or dependent information. Keeping the individual views synchronized is crucial to avoid inconsistencies in the system. In approaches based on a Single Underlying Model (SUM), inconsistencies are avoided by establishing the SUM as a single source of truth from which views are projected. To synchronize updates from views to the SUM, delta-based consistency preservation is commonly applied. This requires the views to provide fine-grained change sequences, which are used to incrementally update the SUM. However, the functionality of providing these change sequences is rarely found in real-world applications. Instead, only state-based differences are persisted. Therefore, it is desirable to also support views that provide state-based differences in delta-based consistency preservation. This can be achieved by estimating the fine-grained change sequences from the state-based differences. This thesis evaluates the quality of estimated change sequences in the context of model consistency preservation. To derive such sequences, matching elements across the compared models need to be identified and their differences need to be computed. We evaluate a sequence derivation strategy that matches elements based on their unique identifier and one that establishes a similarity metric between elements based on the elements' features. As an evaluation baseline, different test suites are created. Each test consists of an initial and a changed version of both a UML class diagram and consistent Java source code. Using the different strategies, we derive and propagate change sequences based on the state-based difference of the UML view and evaluate the outcome in both domains.
The results show that the identity-based matching strategy is able to derive the correct change sequence in almost all (97%) of the considered cases. For the similarity-based matching strategy, we identify two recurring error patterns across different test suites. To address these patterns, we provide an extended similarity-based matching strategy that is able to reduce the occurrence frequency of the error patterns while introducing almost no performance overhead.)
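The identity-based matching strategy from this abstract can be illustrated with a small sketch that matches elements of two model states by their unique identifier and emits delete, create and attribute-update changes. This is a simplified, assumption-based example rather than the thesis implementation: each state is modeled as a plain dictionary from element id to feature values, and the change tuples are a hypothetical format.

```python
from typing import Dict, List, Tuple

# Hypothetical state format: unique element id -> {feature name: value}.
State = Dict[str, Dict[str, object]]


def derive_changes(old: State, new: State) -> List[Tuple]:
    """Derive a change sequence by matching elements on their unique id."""
    changes: List[Tuple] = []
    # Ids only in the old state: the element was deleted.
    for eid in sorted(old.keys() - new.keys()):
        changes.append(("delete", eid))
    # Ids only in the new state: create the element, then set its features.
    for eid in sorted(new.keys() - old.keys()):
        changes.append(("create", eid))
        for feat, val in sorted(new[eid].items()):
            changes.append(("set", eid, feat, val))
    # Ids in both states: emit updates for changed features (removed -> None).
    for eid in sorted(old.keys() & new.keys()):
        for feat in sorted(old[eid].keys() | new[eid].keys()):
            before, after = old[eid].get(feat), new[eid].get(feat)
            if before != after:
                changes.append(("set", eid, feat, after))
    return changes


old = {"c1": {"name": "Order"}, "c2": {"name": "Tmp"}}
new = {"c1": {"name": "PurchaseOrder"}, "c3": {"name": "Invoice"}}
changes = derive_changes(old, new)
print(changes)
```

A similarity-based strategy would replace the id lookup with a best-match search over feature similarity, which is where mismatches such as the error patterns mentioned above can arise.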
Comparison of Different Language Models for Use in Automated Traceability Analysis + (Information about logical connections between requirements and their implementation in source code is useful for many software development tasks. For example, it can make software maintenance easier when requirements change. These traceability links can be determined in the course of a traceability analysis. Approaches such as FTLR perform automated traceability analysis. FTLR detects traceability links by comparing representations of requirements and source code. So far, FTLR uses the language model fastText to represent requirements and source code. However, fastText has weaknesses: the language model cannot represent different meanings of a word, and it was not pre-trained on source code. This thesis investigated whether alternative language models without these weaknesses are better suited for use in FTLR than fastText. In an experiment on five benchmark datasets for traceability analysis, the results of the two alternative language models UniXcoder and Wikipedia2Vec were compared with those of fastText. UniXcoder is better suited than fastText on the iTrust and LibEST benchmark datasets. Wikipedia2Vec is not better suited than fastText on any of the benchmark datasets used. Averaged over all test datasets used, fastText is better suited for use in FTLR than UniXcoder and Wikipedia2Vec.)
Injection Molding Simulation based on Graph Neural Networks + (Injection molding simulations are important tools for the development of new injection molds. Existing simulations are mostly numerical solvers based on the finite element method. These solvers are reliable and precise, but computationally very expensive even for simple part geometries. In this thesis, we aim to develop a faster injection molding simulation based on Graph Neural Networks (GNNs). Our approach learns a simulation as a composition of three functions: an encoder, a processor and a decoder. The encoder takes a graph representation of the 3D geometry of a mold part and returns a numeric embedding of each node and edge in the graph. The processor updates the embedding of each node multiple times based on its neighbors. The decoder then decodes the final embedding of each node into physically meaningful variables, such as the fill time of the node. The envisioned GNN architecture has two interesting properties: (i) it is applicable to any kind of material, geometry and injection process parameters, and (ii) it works without a “time integrator”, i.e., it predicts the final result without intermediate steps. We plan to evaluate our architecture by its accuracy and runtime when predicting node properties. We further plan to interpret the learned GNNs from a physical perspective.)
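The encoder-processor-decoder structure in this abstract can be illustrated with a deliberately tiny, untrained sketch in which every node embedding is a single number and all weights are fixed by hand. The linear encoder, the mean-aggregation update with mixing factor alpha, the linear decoder and all parameter values are hypothetical; real GNN simulators learn these functions with neural networks, and this toy also omits the edge embeddings the abstract mentions.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbor nodes


def encode(features: Dict[int, List[float]], w: List[float]) -> Dict[int, float]:
    """Hypothetical linear encoder: weighted sum of a node's input features."""
    return {n: sum(wi * xi for wi, xi in zip(w, x)) for n, x in features.items()}


def process(graph: Graph, h: Dict[int, float], steps: int,
            alpha: float = 0.5) -> Dict[int, float]:
    """Message passing: mix each embedding with the mean of its neighbors' embeddings."""
    for _ in range(steps):
        new_h = {}
        for n, nbrs in graph.items():
            msg = sum(h[m] for m in nbrs) / len(nbrs) if nbrs else 0.0
            new_h[n] = (1 - alpha) * h[n] + alpha * msg
        h = new_h  # synchronous update: all nodes use the previous step's values
    return h


def decode(h: Dict[int, float], scale: float, offset: float) -> Dict[int, float]:
    """Hypothetical linear decoder from embeddings to a per-node output quantity."""
    return {n: scale * v + offset for n, v in h.items()}


# Path graph 0 - 1 - 2 with a single input feature per node.
graph = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0], 1: [0.0], 2: [0.0]}
out = decode(process(graph, encode(features, [1.0]), steps=1), scale=2.0, offset=0.0)
print(out)
```

The number of processor steps bounds how far information travels through the graph, which is why real models repeat the update several times.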
Linking Text Elements to Software Architecture Models Using Synsets + (Inconsistencies in the naming of text elements of a software architecture documentation (SAD) and model elements of a software architecture model (SAM) lead to traceability problems. Therefore, instead of a direct comparison between the identifiers of the text elements and the names of the model elements, a semantic comparison is performed based on synsets, which are determined by resolving linguistic ambiguities (WSD, Word Sense Disambiguation). A WSD algorithm determines the meanings of the text elements in the context of the SAD in the form of synsets. Via these synsets, synonyms of the text elements are used to establish links to the model elements. This makes it possible to map text elements to model elements that semantically represent the same element but are named differently.)
Modeling and analyzing zero-trust architectures taking into account various quality objectives + (Integrating a Zero Trust Architecture (ZTA) into a system is a step towards establishing a good defence against external and internal threats. However, there are different approaches to integrating a ZTA, which vary in the components used, their assembly and their allocation. The earlier in the development process these approaches are evaluated and the right one is selected, the more cost and effort can be saved. In this thesis, we analyse the most prominent standards and specifications for integrating a ZTA and derive a general model by extracting core ZTA tasks and logical components. We model these using the Palladio Component Model to enable assessing ZTAs at design time. We combine performance and security annotations to create a single model which supports both performance and security analysis. By doing this, we also assess the possibility of combining performance and security analyses.)
Streaming MMD Change Detection + (Kernel methods are among the most well-known approaches in data science. Their ability to represent probability distributions as elements of a reproducing kernel Hilbert space gives rise to the maximum mean discrepancy (MMD). MMD quantifies the dissimilarity of two distributions and allows powerful two-sample tests in many domains. One important application of general two-sample tests is change detection in data streams: here, one tests the null hypothesis that the distribution of data within the stream does not change against the alternative hypothesis that it does change; a change in distribution then indicates a change point. The broad applicability of kernel-based two-sample tests makes their use for change detection in data streams highly desirable. However, their quadratic runtime complexity prohibits such an application. While approximations that reduce the runtime of kernel methods exist in the static setting, applying them to data streams is challenging. In this thesis, we propose a novel change detector, RADMAN, which leverages the random Fourier feature-based kernel approximation to efficiently detect changes in data streams with a polylogarithmic runtime complexity of O(log^2 n) per insert operation, where n is the total number of observations. The proposed approach runs significantly faster than existing methods while obtaining similar result quality. Our experiments on synthetic and real-world data sets show that it performs better than current state-of-the-art approaches.)
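The random Fourier feature approximation mentioned in this abstract replaces the kernel by an explicit feature map, so the squared MMD becomes the squared distance between the mean feature vectors of the two samples. The sketch below assumes one-dimensional data and a Gaussian kernel with bandwidth sigma, whose random frequencies are drawn from N(0, 1/sigma²); it shows only this biased MMD² estimate, not RADMAN's streaming data structure or its O(log^2 n) update scheme. The function names are hypothetical.

```python
import math
import random
from typing import List


def mean_features(xs: List[float], omegas: List[float], phases: List[float]) -> List[float]:
    """Average the random Fourier features z(x)_j = sqrt(2/D) * cos(omega_j * x + b_j)."""
    d = len(omegas)
    scale = math.sqrt(2.0 / d)
    sums = [0.0] * d
    for x in xs:
        for j in range(d):
            sums[j] += scale * math.cos(omegas[j] * x + phases[j])
    return [s / len(xs) for s in sums]


def rff_mmd2(xs: List[float], ys: List[float], num_features: int = 200,
             sigma: float = 1.0, seed: int = 0) -> float:
    """Approximate squared MMD under the Gaussian kernel exp(-(x - y)^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Spectral density of this kernel: frequencies ~ N(0, 1/sigma^2); phases ~ U[0, 2*pi).
    omegas = [rng.gauss(0.0, 1.0 / sigma) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    mx = mean_features(xs, omegas, phases)
    my = mean_features(ys, omegas, phases)
    return sum((a - b) ** 2 for a, b in zip(mx, my))


# A shift in the mean yields a much larger estimate than a same-distribution sample.
data_rng = random.Random(1)
reference = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
same = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
shifted = [data_rng.gauss(2.0, 1.0) for _ in range(500)]
print(rff_mmd2(reference, same), rff_mmd2(reference, shifted))
```

Because the estimate only needs the mean feature vectors, the mean over a window can be updated in O(D) time per new observation instead of recomputing pairwise kernel values, which is one reason this approximation is attractive for streams.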
Ein Datensatz handgezeichneter UML-Klassendiagramme für maschinelle Lernverfahren + (Class diagrams enable the graphical modeling of a software system. Especially at the beginning of software projects, they are created as hand-drawn sketches on non-digital media such as paper or whiteboards. Capturing sketches of this kind is therefore limited to photographing them. A class diagram sketch saved as an image cannot be processed digitally without manually reconstructing it as a machine-processable diagram. Machine learning methods can provide an automated transformation into a digital model through sketch recognition. These methods require annotated training data, but no such data has been published for UML class diagrams so far. This thesis deals with the creation of a dataset of annotated UML class diagram sketches for machine learning. To this end, it presents a data collection, a tool for annotating UML class diagrams, and a conversion of the data into an input format for machine learning. The annotated dataset is then assessed with respect to its diversity, level of detail, and size. For further evaluation, the dataset is validated with a machine learning method. After training on the data, the method recognizes nodes with an F1 score of over 99%, text positions with an F1 score of over 87%, and edges with an F1 score of over 71%. The evaluation thus shows that the dataset is suitable for use with machine learning methods.)

Analyzing Efficiency of High-Performance Applications + (Abstract)
Analyzing Scientific Workflow Management Systems + (Abstract)
Commit-basierte kontinuierliche Integration von Leistungsmodellen + (Abstract)
Concept and Implementation of a Delta Chain + (Abstract)
Definition einer Referenzarchitektur für organisationsübergreifende Zusammenarbeit in modellbasierten Entwicklungsprozessen zur Wahrung des geistigen Eigentums + (Abstract)
Efficient Reduction of Energy Time Series + (Abstract)
Entwurf eines Migrationsverfahren für Microsoft Access Anwendungen + (Abstract)
Erzeugung von Verschlüsselungsregeln auf Modelländerungen aus Zugriffskontrollregeln auf Modellelementen + (Abstract)
Evaluation und Optimierung der Wartbarkeit von Software-Architekturen + (Abstract)
Extraktion von Label-Propagationsfunktionen für Informationsflussanalysen aus architekturellen Verhaltensbeschreibungen + (Abstract)
Iterative Quelltextanalyse für Informationsflusssicherheit zur Überprüfung von Vertraulichkeit auf Architekturebene + (Abstract)
Optimierung des Migrationsverfahrens in modellbasierten E/E-Entwicklungswerkzeugen durch bedarfsorientierte Prozessierung der Historie von Bestandsmodellen + (Abstract)
Retrieval-Augmented Large Language Models for Traceability Link Recovery + (Abstract)
Source-Target-Mapping von komplexen Relationen in Modell-zu-Modell-Transformationen + (Abstract)

Exploring The Robustness Of The Natural Language Inference Capabilities Of T5 + (Large language models like T5 perform excellently on various NLI benchmarks. However, it has been shown that even small changes in the structure of these tasks can significantly reduce accuracy. I build upon this insight and explore how robust the NLI skills of T5 are in three scenarios. First, I show that T5 is robust to some variations in the MNLI pattern, while others degrade performance significantly. Second, I observe that some other patterns that T5 was trained on can be substituted for the MNLI pattern and still achieve good results. Third, I demonstrate that the MNLI pattern transfers well to other NLI datasets, even improving accuracy by 13% in the case of RTE. All things considered, I conclude that the robustness of the NLI skills of T5 depends on which alterations are applied.)
Theory-Guided Data Science for Lithium-Ion Battery Modeling + (Lithium-ion batteries are driving innovation in the evolution of electromobility and renewable energy. These complex, dynamic systems require reliable and accurate monitoring through Battery Management Systems to ensure the safety and longevity of battery cells. Therefore, an accurate prediction of the battery voltage is essential; this is currently realized by so-called Equivalent Circuit (EC) models. Although state-of-the-art approaches deliver good results, they are hard to train due to the high number of variables, lack the ability to generalize, and require many simplifying assumptions. In contrast to theory-based models, purely data-driven approaches require large datasets and are often unable to produce physically consistent results. Theory-Guided Data Science (TGDS) aims at using scientific knowledge to improve the effectiveness of Data Science models in scientific discovery. This concept has been very successful in several domains, including climate science and material research. Our work is the first to apply TGDS to battery systems, working closely with domain experts. We compare the performance of different TGDS approaches against each other as well as against two baselines: purely theory-based EC models and black-box Machine Learning models.)
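To make the theory-based baseline concrete, here is a minimal sketch of a first-order equivalent circuit model: a series resistance R0 plus one RC pair. The abstract does not state which EC topology the thesis uses. The single RC pair, the constant open-circuit voltage and all parameter values are illustrative assumptions.

```python
import math

def simulate_1rc(currents, dt, ocv, r0, r1, c1):
    """Terminal voltage of a hypothetical first-order (one RC pair) equivalent circuit model.

    currents: current per time step in A (positive = discharge); ocv is held constant here,
    whereas a real model would look it up from the state of charge.
    """
    tau = r1 * c1
    decay = math.exp(-dt / tau)
    v_rc = 0.0  # voltage across the RC pair
    voltages = []
    for i in currents:
        voltages.append(ocv - i * r0 - v_rc)
        # Exact discretisation of dV_rc/dt = -V_rc / tau + i / c1 for piecewise-constant current
        v_rc = v_rc * decay + r1 * (1.0 - decay) * i
    return voltages

# Current pulse: rest, 2 A discharge for 20 s, rest again (illustrative parameters)
currents = [0.0] * 5 + [2.0] * 20 + [0.0] * 20
v = simulate_1rc(currents, dt=1.0, ocv=3.7, r0=0.01, r1=0.015, c1=2000.0)
print(v[4], v[5], v[24], v[25], v[-1])
```

The instantaneous drop by i * R0 and the slow relaxation through the RC pair are the physically motivated structure that TGDS approaches can build on.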
Attention Based Selection of Log Templates for Automatic Log Analysis + (Log parsing serves as a crucial preprocessing step in text log data analysis, including anomaly detection in cloud system monitoring. However, selecting an optimal log parsing algorithm tailored to a specific task remains problematic. With many algorithms to choose from, each requiring proper parameterization, making an informed decision becomes difficult. Moreover, the selected algorithm is typically applied uniformly across the entire dataset, regardless of the specific data analysis task, which often leads to suboptimal results. In this thesis, we evaluate a novel attention-based method for automating the selection of log parsing algorithms, aiming to improve data analysis outcomes. We build on the success of a recent master's thesis, which introduced this attention-based method and demonstrated promising results for a specific log parsing algorithm and dataset. The primary objective of our work is to evaluate the effectiveness of this approach across different algorithms and datasets.)
Metamodel Evolution in the Context of a MOF-Based Metamodeling Infrastructure + (Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.)
Evaluation of Automated Feature Generation Methods + (Manual feature engineering is a time-consuming and costly activity when developing new Machine Learning applications, as it requires the manual labor of a domain expert. Therefore, efforts have been made to automate the feature generation process. However, there is no large benchmark of these Automated Feature Generation methods. It is therefore not obvious which methods perform well in combination with specific Machine Learning models, or what their strengths and weaknesses are. In this thesis, we present an evaluation framework for Automated Feature Generation methods, which is integrated into the scikit-learn framework for Python. We integrate nine Automated Feature Generation methods into this framework. We further evaluate the methods on 91 datasets for classification problems. The datasets in our evaluation have up to 58 features and 12,958 observations. As Machine Learning models, we investigate five models, including state-of-the-art models like XGBoost.)
Surrogate Model Based Process Parameters Optimization of Textile Forming + (Manufacturing optimization is crucial for organizations to remain competitive in the market. However, complex processes such as textile forming can be challenging to optimize and require significant resources. Surrogate-based optimization is an efficient method that uses simplified models to guide the search for optimal parameter combinations of manufacturing processes. Moreover, incorporating uncertainty estimates into the model can further speed up the optimization process, which can be achieved by using Bayesian deep neural networks. Additionally, convolutional neural networks can take advantage of spatial information in the images that are part of the textile forming parameters. In this work, a Bayesian deep convolutional surrogate model is proposed that uses all available process parameters to predict the shear angle of a textile element. Incorporating this background information into the surrogate model is expected to enable detailed predictions of process results, leading to greater efficiency and higher product quality.)
Streaming Model Analysis - Synergies from Stream Processing and Incremental Model Analysis + (Many modern applications take a potentially infinite stream of events as input to interpret and process the data. The established approach to handling such tasks is called Event Stream Processing. The underlying technologies are designed to process this stream efficiently, but applications based on this approach can become hard to maintain as they grow. A model-driven approach can help to manage increasing complexity and changing requirements. This thesis examines how a combination of Event Stream Processing and Model-Driven Engineering can be used to handle an incoming stream of events. An architecture that combines these two technologies is proposed, and two case studies have been performed. The DEBS Grand Challenges from 2015 and 2016 have been used to evaluate applications based on the proposed architecture with respect to their performance, scalability, and maintainability. The results showed that these applications can be adapted to a variety of change scenarios at acceptable cost, but that their processing speed is not competitive.)
Empirical Identification of Performance Influences of Configuration Options in High-Performance Applications + (Many modern high-performance applications are highly configurable software systems that provide hundreds or even thousands of configuration options. System administrators or application users need to understand all these options and their impact on software performance to choose suitable configuration values. To understand the influence of configuration options on the run-time characteristics of a software system, users can use performance prediction models, but building such models for highly configurable high-performance applications is expensive. However, not all configuration options that a software system offers are performance-relevant. Removing the performance-irrelevant options from the modeling process can reduce the construction cost. In this thesis, we explore and analyze two different approaches to empirically identify configuration options that are not performance-relevant and can be removed from the performance prediction model. The first approach reuses existing performance modeling methods to create much cheaper prediction models from fewer samples and then analyzes these models to identify performance-irrelevant configuration options. The second approach uses white-box knowledge acquired through dynamic taint analysis to systematically construct the minimal number of experiments required to detect performance-irrelevant configuration options. In an evaluation with a case study, we show that the first approach identifies performance-irrelevant configuration options but also produces misclassifications. The second approach did not meet our expectations and requires further improvement.)
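As an illustration of the general idea behind the first approach (a cheap model from few samples, then inspecting it), here is a minimal sketch. It is not the modeling method used in the thesis. It estimates each binary option's main effect from a 2^(3-1) fractional factorial design (4 runs instead of 8) and flags options whose effect falls below a threshold. The option names, the benchmark function and the threshold are hypothetical.

```python
import random
import statistics

def main_effects(samples):
    """Estimate each binary option's main effect as mean(runtime | on) - mean(runtime | off)."""
    options = samples[0][0].keys()
    effects = {}
    for opt in options:
        on = [t for cfg, t in samples if cfg[opt]]
        off = [t for cfg, t in samples if not cfg[opt]]
        effects[opt] = statistics.mean(on) - statistics.mean(off)
    return effects

def irrelevant_options(samples, threshold):
    """Options whose estimated main effect is below the threshold (candidates for removal)."""
    return sorted(o for o, e in main_effects(samples).items() if abs(e) < threshold)

# Hypothetical benchmark: 'debug_logging' has no real influence on the runtime
rng = random.Random(7)
def run_benchmark(cfg):
    return 10 + 5 * cfg["compression"] + 2 * cfg["cache"] + rng.gauss(0, 0.1)

# Fractional factorial design with debug_logging = compression XOR cache: main effects are
# aliased with two-way interactions, which is one way such cheap models misclassify options
design = [(False, False, False), (True, False, True), (False, True, True), (True, True, False)]
samples = []
for comp, cache, debug in design:
    cfg = {"compression": comp, "cache": cache, "debug_logging": debug}
    samples.append((cfg, run_benchmark(cfg)))

print(irrelevant_options(samples, threshold=0.5))  # ['debug_logging']
```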
Enabling the Information Transfer between Architecture and Source Code for Security Analysis + (Many software systems have to be designed and developed such that specific security requirements are guaranteed. Security can be specified on different views of the software system, which contain different kinds of information about the system. Therefore, a security analysis on one view must assume security properties of other views. A security analysis on another view can be used to verify these assumptions. We provide an approach for enabling the information transfer between a static architecture analysis and a static, lattice-based source code analysis. This approach can be used to reduce the assumptions in a component-based architecture model. In this approach, we define requirements under which information can be transferred between the two security analyses. We treat the architecture and source code security analyses as black boxes. Therefore, the information transfer between the security analyses is based on a megamodel consisting of the architecture model, the source code model, and the source code analysis results. The feasibility of this approach is evaluated in a case study using Java Object-sensitive ANAlysis and Confidentiality4CBSE. The evaluation shows that information can be transferred between an architecture and a source code analysis. The information transfer reveals new security violations that are not found using only one security analysis.)
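For readers unfamiliar with lattice-based confidentiality checks, here is a minimal sketch using a two-level lattice LOW ⊑ HIGH. The label of a computed value is the join (least upper bound) of its operands' labels, and a flow into a sink is a violation if that label exceeds the sink's clearance. This only illustrates the lattice idea. Real analyses such as JOANA work on program dependence graphs, and the variable and sink names are hypothetical.

```python
from enum import IntEnum

class Level(IntEnum):
    LOW = 0
    HIGH = 1

def join(a, b):
    """Least upper bound; in this totally ordered two-level lattice it is simply the maximum."""
    return max(a, b)

def label_of(var_labels, operands):
    """Label of a value computed from the given variables."""
    level = Level.LOW
    for name in operands:
        level = join(level, var_labels[name])
    return level

def check_flows(var_labels, sink_levels, flows):
    """flows: (sink, operand names) pairs; report flows whose label exceeds the sink's clearance."""
    violations = []
    for sink, operands in flows:
        level = label_of(var_labels, operands)
        if level > sink_levels[sink]:
            violations.append((sink, level))
    return violations

labels = {"password": Level.HIGH, "username": Level.LOW}
sinks = {"log": Level.LOW, "db": Level.HIGH}
flows = [("log", ["username"]), ("log", ["username", "password"]), ("db", ["password"])]
print(check_flows(labels, sinks, flows))  # [('log', <Level.HIGH: 1>)]
```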
Auswirkungen von Metamodellen auf Modellanalysen + (Metamodels are the central artifact in model-driven software development. Although many quality attributes and evaluation mechanisms for metamodels are known, the effects of metamodels on other artifacts have not yet been investigated empirically. This thesis deals with the impact of metamodels on other software development artifacts. More precisely, it investigates to what extent the quality attributes of metamodels influence model analyses and model transformations. To this end, various artifacts are analyzed: the results of metamodel metrics, code metrics of model analyses and ATL transformations, and manual assessments of metamodels. The data are analyzed, correlations are determined, and dependencies are uncovered.)
Enabling Architectural Performability Analyses for Microservices via Design Pattern Completions + (Microservices architectures have gained popularity in recent years, especially since global players in the internet economy switched to this architectural style. Many architectural patterns for recurring problems have been identified, such as Service Discovery for service registration or Client-side Load Balancing for load distribution. Architectural analyses with the Palladio framework make it possible to investigate at design time whether quality requirements such as performability are met. The Architectural Templates method combines architecture models with architectural patterns and styles and allows for design-time analyses. In this thesis, we create a Microservices Architectural Templates catalog containing microservices Architectural Templates. A selection of widely used patterns is analyzed and conceptually mapped to the Architectural Templates method. A case study, conducted with a sample customer relationship management application, shows that software architects can profit from the provided templates through automatic model completions and accurate analysis results.)
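The two patterns named in this abstract can be pictured with a minimal sketch: an in-memory service registry (Service Discovery) and a client that picks instances round-robin itself (Client-side Load Balancing). This shows only the runtime behaviour the patterns describe, not the Palladio Architectural Templates. The class names, service name and addresses are hypothetical.

```python
class ServiceRegistry:
    """Service Discovery sketch: instances register their address under a service name."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def lookup(self, service):
        return list(self._instances.get(service, []))


class RoundRobinClient:
    """Client-side Load Balancing sketch: the caller chooses the next instance itself."""

    def __init__(self, registry, service):
        self.registry = registry
        self.service = service
        self._counter = 0

    def next_instance(self):
        instances = self.registry.lookup(self.service)  # fresh lookup picks up new registrations
        if not instances:
            raise LookupError(f"no instances registered for {self.service!r}")
        address = instances[self._counter % len(instances)]
        self._counter += 1
        return address


registry = ServiceRegistry()
registry.register("customers", "10.0.0.1:8080")
registry.register("customers", "10.0.0.2:8080")
client = RoundRobinClient(registry, "customers")
picks = [client.next_instance() for _ in range(4)]
print(picks)  # ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080', '10.0.0.2:8080']
```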
Differentially Private Event Sequences over Infinite Streams + (Data streams collected by smart meters pose a threat to privacy, so privacy-preserving techniques are needed. The current state of the art for data streams is w-event differential privacy. So far, it has mainly been used for publishing histogram queries. The goal of this thesis is an in-depth experimental analysis of these mechanisms, focusing on how well they are suited for publishing sum queries, as needed in the smart meter scenario. The thesis consists of three parts: (1) reproducing the good results reported in the literature for the most important w-event DP mechanisms for histogram queries, (2) evaluating their quality when applied to smart meter data (sum queries), and (3) evaluating the quality of two mechanisms with respect to guaranteeing pan-privacy, an extended guarantee. While we were largely unable to reproduce the results in (1), we achieved good results in (2). Regarding (3), we were able to confirm the theoretical quality analysis from the literature.)
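To show what a w-event DP release of sum queries looks like, here is a minimal sketch of the simplest budget strategy: split ε uniformly, so each timestamp spends ε/w. By sequential composition, any window of w consecutive releases then spends at most ε. It is a baseline illustration, not one of the specific mechanisms evaluated in the thesis, and it does not provide pan-privacy. The per-household clipping bound and the readings are hypothetical.

```python
import random

def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale)
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def uniform_w_event_sums(stream, epsilon, w, clip, seed=0):
    """Release one noisy sum per timestamp, spending epsilon / w at each timestamp.

    stream: list of timestamps, each a list of per-household readings; readings are clipped
    to [0, clip], so one household changes the sum by at most `clip` (the sensitivity).
    """
    rng = random.Random(seed)
    scale = clip / (epsilon / w)
    released = []
    for readings in stream:
        true_sum = sum(min(max(r, 0.0), clip) for r in readings)
        released.append(true_sum + laplace(rng, scale))
    return released

# Hypothetical readings (kWh) of three households at three timestamps
stream = [[0.4, 1.2, 0.8], [0.5, 1.1, 0.9], [2.5, 0.3, 0.7]]
print(uniform_w_event_sums(stream, epsilon=1.0, w=3, clip=2.0))
```

The uniform split shows the core tradeoff: a longer window w means less budget per release and therefore noisier sums. More elaborate w-event mechanisms try to spend the budget adaptively instead.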
Modellierung und Simulation von dynamischen Container-basierten Software-Architekturen in Palladio + (The Palladio Component Model (PCM) can be used to model and simulate software systems. However, modern distributed software systems are no longer simply deployed statically; instead, a desired state is defined, which a control loop then maintains, for example by starting or stopping containers and pods. In this thesis, an extension of the PCM with the concepts of container orchestration tools such as Kubernetes was developed and implemented. In addition, a concept for simulating dynamic container-based systems was developed, with particular attention to the allocation and reallocation of pods at simulation time. Finally, the model extension was evaluated.)
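The "desired state plus control loop" principle described above can be sketched as a single reconciliation step. The step compares desired and running replica counts per deployment and emits start/stop actions until both agree. This is a toy model for illustration only. It ignores pod placement on nodes and is neither the Kubernetes API nor the PCM extension. The deployment names are hypothetical.

```python
def reconcile(desired, running):
    """One control-loop iteration: actions that move running pod counts towards the desired state."""
    actions = []
    for deployment in sorted(set(desired) | set(running)):
        want = desired.get(deployment, 0)
        have = running.get(deployment, 0)
        if have < want:
            actions.append(("start", deployment, want - have))
        elif have > want:
            actions.append(("stop", deployment, have - want))
    return actions

def apply_actions(running, actions):
    """Simulate executing the actions; deployments with zero pods disappear."""
    new = dict(running)
    for kind, deployment, count in actions:
        delta = count if kind == "start" else -count
        new[deployment] = new.get(deployment, 0) + delta
    return {d: n for d, n in new.items() if n > 0}

desired = {"web": 3, "db": 1}
running = {"web": 1, "cache": 2}
actions = reconcile(desired, running)
print(actions)  # [('stop', 'cache', 2), ('start', 'db', 1), ('start', 'web', 2)]
running = apply_actions(running, actions)
print(reconcile(desired, running))  # [] -> converged
```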
Tradeoff zwischen Privacy und Utility für Short Term Load Forecasting + (The introduction of smart meters brings various advantages and disadvantages. On the one hand, smart meters offer new opportunities to predict energy consumption more accurately (forecasting) and thus enable better planning of the smart grid. On the other hand, a lot of private information can be extracted from energy consumption data, which opens up new potential attack vectors on the privacy of end consumers. In the literature, privacy protection is implemented through various perturbation methods. However, since perturbation alters the data, it leads to less accurate forecasts, so a tradeoff has to be found. In this thesis, various existing perturbation techniques are compared experimentally with respect to their privacy (protection of privacy) and utility (accuracy of the forecasts). For this purpose, various datasets, forecasting algorithms, and metrics for assessing privacy and utility are used. The thesis concludes that the so-called Denoise and WeakPeak techniques are particularly well suited for setting a tradeoff between privacy and utility.)
Einbindung eines EDA-Programms zur Erstellung elektronischer Leiterplatten in das Vitruvius-Framework + (In model-driven software development, a software system, or its parts and abstractions, can be described by models during the development process. These models can depend on each other and contain redundant information. To avoid inconsistencies, tools for automated consistency preservation are used. In this thesis, the EDA program Eagle, which is used to create electronic schematics and printed circuit boards, is integrated into the Vitruvius framework. This involves deriving an Ecore metamodel that describes Eagle's schematic file, establishing transformations between Ecore models and schematic files, and extracting the changes between two chronologically consecutive schematic files. The extracted changes are fed into the Vitruvius framework, which propagates them to the Ecore models that are in a consistency relationship. In addition, a procedure is used to assign changes in the schematic file to a unique electronic component. This is necessary to track components in the context of other programs, since the properties of a component can differ between programs.)
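The change extraction step can be pictured with a minimal sketch. It assumes each schematic snapshot has already been reduced to a mapping from a stable part ID to that part's attributes, which is what the component identification procedure above makes possible. It then derives add, remove and update changes between two consecutive snapshots. The data layout and part IDs are hypothetical. This is neither Eagle's file format nor the Vitruvius change API.

```python
def diff_parts(old, new):
    """Changes between two schematic snapshots, each mapping a stable part ID to its attributes."""
    changes = []
    for pid in sorted(new.keys() - old.keys()):
        changes.append(("add", pid, new[pid]))
    for pid in sorted(old.keys() - new.keys()):
        changes.append(("remove", pid, old[pid]))
    for pid in sorted(old.keys() & new.keys()):
        for attr in sorted(old[pid].keys() | new[pid].keys()):
            before, after = old[pid].get(attr), new[pid].get(attr)
            if before != after:
                changes.append(("update", pid, {attr: (before, after)}))
    return changes

old = {"R1": {"value": "10k", "package": "0805"}, "C1": {"value": "100n"}}
new = {"R1": {"value": "22k", "package": "0805"}, "LED1": {"color": "red"}}
for change in diff_parts(old, new):
    print(change)
```

Without stable IDs, a renamed or moved part would show up as a removal plus an addition. That is why the abstract stresses assigning changes to a unique component.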
Automated Extraction of Stateful Power Models for Cyber Foraging Systems + (Mobile devices are strongly resource-constrained in terms of computing and battery capacity. Cyber-foraging systems circumvent these constraints by offloading a task to a more powerful system in close proximity. Offloading itself induces additional workload and thus additional power consumption on the mobile device. Therefore, offloading systems must decide whether to offload or to execute locally. Power models, which estimate the power consumption for a given workload, can help to make an informed decision. Recent research has shown that various hardware components such as wireless network interface cards (WNIC), cellular network interface cards, or GPS modules have power states; that is, the power consumption behavior of a hardware component depends on its current state. Power models that consider power states (stateful power models) can be modeled as Power State Machines (PSM). For systems with multiple power states, stateful models have proven to be more accurate than models that do not consider power states (stateless models). Manually generating PSMs is time-consuming and limits their practicability. Therefore, in this thesis, we explore the possibility of automatically generating PSMs. The contribution of this thesis is twofold: (1) we introduce an automated measurement-based profiling approach, and (2) we introduce a step-based approach that, given profiling data, automatically extracts PSMs along with tail states and state transitions. We evaluate the automated PSM extraction in a case study on an offloading speech recognition system. We compare the power consumption prediction accuracy of the generated PSM with that of a stateless regression-based model. Because we measure the power consumption of the whole system, we combine every WiFi power model with the same CPU power model to predict the power consumption of the whole system. We find that a slightly adapted version of the generated PSM predicts the power consumption with a mean error of approx. 3%, and an error of approx. 2% in the best case. In contrast, the regression model produces a mean error of approx. 19%, and an error of approx. 18% in the best case.)
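To illustrate why tail states matter, here is a minimal sketch of a hand-written PSM for a network interface with IDLE, ACTIVE and TAIL states. After each activity burst, the component stays in TAIL for a fixed time before falling back to IDLE. The sketch only evaluates such a model over a trace; it does not extract one from profiling data as the thesis does. The power values and tail duration are hypothetical, not measurements.

```python
from dataclasses import dataclass

@dataclass
class PowerStateMachine:
    idle_w: float    # power in IDLE (W)
    active_w: float  # power while transferring (W)
    tail_w: float    # power in the TAIL state after a burst (W)
    tail_s: float    # how long the component lingers in TAIL (s)

    def energy(self, bursts, horizon):
        """Energy in joules over [0, horizon] for sorted, non-overlapping (start, end) bursts."""
        joules, t, after_burst = 0.0, 0.0, False
        for start, end in bursts:
            joules += self._gap_energy(start - t, after_burst)
            joules += (end - start) * self.active_w
            t, after_burst = end, True
        return joules + self._gap_energy(horizon - t, after_burst)

    def _gap_energy(self, gap, after_burst):
        # After a burst: TAIL first (cut short if the next burst starts earlier), then IDLE
        tail = min(gap, self.tail_s) if after_burst else 0.0
        return tail * self.tail_w + (gap - tail) * self.idle_w

bursts = [(1.0, 1.5), (1.6, 2.0)]
stateful = PowerStateMachine(idle_w=0.1, active_w=1.0, tail_w=0.5, tail_s=0.2)
stateless = PowerStateMachine(idle_w=0.1, active_w=1.0, tail_w=0.5, tail_s=0.0)  # no tail state
print(round(stateful.energy(bursts, 3.0), 3), round(stateless.energy(bursts, 3.0), 3))  # 1.23 1.11
```

With the tail duration set to zero the model collapses to a stateless one, and the energy consumed in the tail goes unaccounted for.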
Inkrementelle Modellreduktion zur Verkürzung der Testzyklen in der Transformationsentwicklung + (Model-driven software development (MDD) is a software development paradigm in which the model plays a central role. In MDD, the problem domain is described abstractly and representatively by the model. Over the course of development, the model is gradually refined through model transformations and finally converted into program code. The larger and more complex the problem domain, the larger the number of model elements and the more complex the relationships between them. For this reason, transforming such a large model is time-consuming and error-prone. Tests are run repeatedly during development to ensure the correctness of the model and the transformation. The large number of elements in the model slows down testing and makes it harder to find the cause of errors in the model and in the transformation. Therefore, this bachelor's thesis investigated whether there exists an excerpt of the model with the following properties: the excerpt contains only parts of the original model, yet it can still represent all errors of the complete model. The cause and correction of the faulty model and the faulty transformation are not investigated in this thesis. The thesis focuses on creating and examining this model excerpt.)
Anytime Tradeoff Strategies with Multiple Targets + (Modern applications typically need to find solutions to complex problems under limited time and resources. In settings in which the exact computation of indicators is either infeasible or economically undesirable, "anytime" algorithms, which can return approximate results when interrupted, are particularly beneficial, since they offer a natural way to trade computational power for result accuracy. However, modern systems typically need to solve multiple problems simultaneously. For example, to find highly correlated variables in a dataset, one needs to examine each pair of variables. This is challenging, in particular if the number of variables is large and the data evolves dynamically. This thesis focuses on the following question: how should one distribute resources at any time in order to maximize the overall quality of multiple targets? First, we define the problem, considering various notions of quality and user requirements. Second, we propose a set of strategies to tackle this problem. Finally, we evaluate our strategies in extensive experiments.)
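One possible strategy for the question posed above is sketched below; it is a hypothetical illustration, not one of the strategies proposed in the thesis. Each target is a mean estimated by sampling, and every unit of budget goes to the target whose running estimate currently has the largest standard error. The loop can be interrupted after any step and still returns an estimate for every target. The samplers, budget and warm-up size are assumptions.

```python
import math
import random
import statistics

def greedy_anytime(samplers, budget, warmup=3):
    """Spend each unit of budget on the target with the largest current standard error.

    Stopping after any iteration still yields a (coarser) estimate for every target.
    """
    obs = [[draw() for _ in range(warmup)] for draw in samplers]
    for _ in range(budget):
        std_errors = [statistics.stdev(o) / math.sqrt(len(o)) for o in obs]
        target = max(range(len(obs)), key=lambda i: std_errors[i])
        obs[target].append(samplers[target]())
    return [statistics.mean(o) for o in obs], [len(o) for o in obs]

rng = random.Random(3)
samplers = [lambda: rng.gauss(10.0, 5.0),  # noisy target: needs many samples
            lambda: rng.gauss(3.0, 0.1)]   # almost deterministic target
estimates, counts = greedy_anytime(samplers, budget=100)
print(estimates, counts)
```

Compared with round-robin allocation, this greedy rule concentrates effort where the uncertainty is highest. Which notion of overall quality such a rule actually maximizes is exactly what the thesis defines more carefully.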
Outlier Analysis in Live Systems from Application Logs + (Modern computer applications tend to generate massive amounts of logs and have become so complex that it is often difficult to explain why they fail. Locating outliers in application logs can help explain application failures. Outlier detection in application logs is challenging because (1) logs are unstructured, streaming text data, and (2) labeling application logs is labor-intensive and inefficient. Logs are similar to natural language, and the Transformer, a recent deep learning architecture, has shown outstanding performance in Natural Language Processing (NLP) tasks. Building on this, we adapt the Transformer neural network to detect outliers in application logs in an unsupervised way. We compared our algorithm against state-of-the-art log outlier detection algorithms on three widely used benchmark datasets, and it outperformed them.)
Subspace Search in Data Streams + (Modern data mining often takes place on high-dimensional data streams, which evolve at a very fast pace. On the one hand, the "curse of dimensionality" leads to a sparsely populated feature space, for which classical statistical methods perform poorly. Patterns such as clusters or outliers often hide in a few low-dimensional subspaces. On the other hand, data streams are non-stationary and virtually unbounded. Hence, algorithms operating on data streams must work incrementally and take concept drift into account. While "high dimensionality" and the "streaming setting" each pose a unique set of challenges, we observe that existing mining algorithms only address them separately. We therefore propose a novel algorithm that keeps track of the subspaces of interest in high-dimensional data streams over time. We quantify the relevance of subspaces via a so-called "contrast" measure, which we can maintain incrementally in an efficient way. Furthermore, we propose a set of heuristics to adapt the search for relevant subspaces as the data and the underlying distribution evolve. We show that our approach is beneficial as a feature selection method and, as such, can be applied to extend a range of knowledge discovery tasks, e.g., "outlier detection", in high-dimensional data streams.)
Bewertung verschiedener Parallelisierungsstrategien im Hinblick auf Leistungsfähigkeit von paralleler Programmausführung + (Modern processors achieve performance gains by adding more cores. As a result, software development must take care to parallelize program execution. Factors that can affect the performance of parallel program execution have already been categorized, but the influence of the chosen parallelization strategy is unknown. This bachelor's thesis investigated the influence of the chosen parallelization strategy on software performance. To this end, different hardware requirements were used to generate individual work packages, which were then executed using different parallelization strategies: Java Threads, Java ParallelStreams, OpenMP, and Akka Actor. For each execution, the runtime and the cache behavior were measured. The experiments were run on various dedicated servers and on the BwUniCluster. The results were evaluated using speedup curves and the cache miss rate. They show that the parallelization strategies differ only slightly for the work packages used.)
Integrating Architecture-based Confidentiality Analysis with Code-based Information Flow Analysis + (Modern software systems must meet a multitude of security requirements, and these requirements seem to become ever stricter over time. Today, a software system that does not meet its confidentiality requirements often leads to the unintended disclosure of sensitive data. This often entails financial costs, since the GDPR has introduced and increased fines, but it can also damage a company's reputation and lead to a loss of customers. Many security vulnerabilities can arise from discrepancies between the architectural design and the code implementation. For this reason, this thesis investigates the integration of a static, architecture-based confidentiality analysis with a static, code-based information flow analysis. By combining these two analyses, we aim to show that discrepancies between design and implementation can be identified. The approach chosen in this thesis treats the architectural design as the intended behavior of the system. It generates the artifacts required to run a code-based analysis and to check whether the properties defined on the architecture hold for the implementation. We evaluated the feasibility of the approach in a small study. In summary, this thesis aims to bridge the gap between the architectural view and the code view by linking confidentiality properties in both.)
Rekonstruktion von Komponentenmodellen für Qualitätsvorhersagen auf der Grundlage heterogener Artefakte in der Softwareentwicklung + (Modern software systems are often no longer built as monolithic applications, and distributed architectures are the trend. Technologies such as Docker and Spring introduce configuration files in addition to the source code, which makes it harder to reconstruct the software architecture from the source code alone. At the start of this thesis, several scientific works on software architecture reconstruction were examined. However, none was found that both supports heterogeneous software artifacts and generates a model suitable for quality prediction. This thesis therefore presents a new approach that incorporates multiple heterogeneous software artifacts into the reconstruction of an architecture model. Specifically, the approach is implemented as a prototype for Java source code, Dockerfiles, Docker Compose files and Spring configuration files. The target model is the Palladio Component Model, which is suited to analyses and simulations of performance and reliability. The thesis examines to what extent the information from the artifacts can be merged. The approach first transforms the artifacts into models, using two different techniques. Java source code is transferred into an existing metamodel using JDT. For the remaining artifacts, an Xtext grammar is proposed that can generate a suitable metamodel.
The architecture of the approach was also designed so that the set of supported artifacts can easily be adapted or extended. Finally, the prototypical implementation is described and evaluated. Two case studies were selected, and the prototype was used to extract the architecture models of these projects. The results were then analyzed using previously defined metrics. This showed that the approach works and that the heterogeneous artifacts add value to the reconstruction of the architecture model.)
Monitoring Complex Systems with Domain Knowledge: Adapting Contextual Bandits to Tracing Data + (Monitoring in complex computing systems is crucial to detect malicious states or errors in program execution. Due to the computational complexity, it is not feasible in practice to monitor all data streams. We are interested in monitoring pairs of highly correlated data streams. However, we cannot compute the correlation measure for every pair of data streams at each time step. Picking highly correlated pairs while exploring potentially more highly correlated ones is an instance of the exploration/exploitation problem. Bandit algorithms are a family of online learning algorithms that aim to optimize sequential decision making and balance exploration and exploitation. A contextual bandit additionally uses contextual information to make better decisions. In our work, we want to use a contextual bandit algorithm to keep track of highly correlated pairs of data streams. The context in our work contains information about the state of the system, given as execution traces. A key part of our work is to explore and evaluate different representations of the knowledge encapsulated in traces. We also adapt state-of-the-art contextual bandit algorithms to the use case of correlation monitoring.)
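The sketch below illustrates the exploration/exploitation trade-off with a simple epsilon-greedy contextual bandit over a discrete context. It stands in for the state-of-the-art algorithms the thesis adapts and is not one of them. The context labels, stream pairs and correlation values are hypothetical. The reward is assumed to be the observed absolute correlation of the chosen pair.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Keeps one running mean reward per (context, arm) pair."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)   # (context, arm) -> number of pulls
        self.means = defaultdict(float)  # (context, arm) -> running mean reward

    def _estimate(self, context, arm):
        # Untried arms get an optimistic estimate so each is pulled once per context.
        key = (context, arm)
        return self.means[key] if self.counts[key] else float("inf")

    def best_arm(self, context):
        return max(self.arms, key=lambda arm: self._estimate(context, arm))

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore: pick a random pair
        return self.best_arm(context)  # exploit: pick the best pair so far

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]


# Hypothetical absolute correlations of stream pairs under two system states
true_corr = {
    "batch_job": {("cpu", "latency"): 0.2, ("cpu", "io"): 0.9, ("mem", "gc"): 0.5},
    "web_traffic": {("cpu", "latency"): 0.8, ("cpu", "io"): 0.3, ("mem", "gc"): 0.6},
}
bandit = EpsilonGreedyContextualBandit(arms=true_corr["batch_job"].keys())
for step in range(1000):
    context = "batch_job" if step % 2 == 0 else "web_traffic"
    pair = bandit.select(context)
    bandit.update(context, pair, true_corr[context][pair])  # reward = observed correlation
print(bandit.best_arm("batch_job"), bandit.best_arm("web_traffic"))  # ('cpu', 'io') ('cpu', 'latency')
```

The context lets the bandit learn a different best pair per system state. In the thesis, that state is derived from execution traces rather than from a fixed label.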
Integrating Structured Background Information into Time-Series Data Monitoring of Complex Systems + (Monitoring of time series data is increasingly important due to the massive amounts of data generated by complex systems, such as industrial production lines, meteorological sensor networks, or cloud computing centers. Typical time series monitoring tasks include forecasting future values, detecting outliers, and computing dependencies. However, existing methods for time series monitoring tend to ignore background information, such as relationships between components or process structure, that is available for almost any complex system. Such background information gives context to the time series data and can potentially improve the performance of time series monitoring tasks. In this bachelor thesis, we show how to incorporate structured background information to improve three different time series monitoring tasks. We perform the experiments on data from a cloud computing center, where we extract background information from system traces. Additionally, we investigate different representations and quality of background information and conclude that its usefulness is independent of the concrete time series monitoring task.)
Pattern Matching for Microservices in a Container-Based Architecture + (Multiple containers, each packaging software code, can interact with each other over a network and together form a container-based architecture. Large architectures are hard to understand without knowledge of the application or the underlying technologies. This master thesis therefore applies design pattern detection to reduce the complexity of a single architecture representation to multiple smaller pattern instances. A user who knows the general patterns in advance can then understand the depicted pattern instances in a short time.)
Studienplanung mit Hilfe von Workflow-Verifikation: Fokus Dozentensicht + (An information system that supports students in planning their studies was developed as a student team project at the chair "Systeme der Informationsverwaltung". This system is now to be extended so that it also supports lecturers in scheduling their courses within the course offering of the respective module handbook. In this thesis, a requirements analysis was carried out and an extension of the existing system was designed. The chair already has extensive experience in data-aware verification of process flows using Petri nets. A study plan, as a sequence of its courses, can be modeled as a process with associated data. This thesis therefore examined and combined verification methods to enable data-value-based verification of Petri net models. Based on the results, tests were carried out to investigate to what extent such verification methods can check study plans for correctness. The tests and investigations showed that verification methods for Petri nets can be used to support such a system, subject to certain restrictions.)
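Such verification builds on the basic firing rule of a Petri net. It can be sketched as a token game on a plain place/transition net: the plan is replayed transition by transition and rejected as soon as a course is not enabled. This is a minimal sketch assuming unit arc weights. It does not model the data values used by the data-aware verification described above, and the course and place names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Marking = Dict[str, int]


@dataclass
class PetriNet:
    # transition name -> (input places, output places); every arc has weight 1
    transitions: Dict[str, Tuple[List[str], List[str]]]

    def enabled(self, marking: Marking, transition: str) -> bool:
        inputs, _ = self.transitions[transition]
        return all(marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, marking: Marking, transition: str) -> Marking:
        if not self.enabled(marking, transition):
            raise ValueError(f"transition {transition!r} is not enabled")
        inputs, outputs = self.transitions[transition]
        new_marking = dict(marking)  # never mutate the caller's marking
        for place in inputs:
            new_marking[place] -= 1  # consume one token per input arc
        for place in outputs:
            new_marking[place] = new_marking.get(place, 0) + 1  # produce one token per output arc
        return new_marking

    def replay(self, marking: Marking, plan: List[str]) -> Optional[Marking]:
        """Fire the plan step by step; return None if some course is not enabled."""
        for transition in plan:
            if not self.enabled(marking, transition):
                return None
            marking = self.fire(marking, transition)
        return marking


# Hypothetical study plan: courses are transitions, passed prerequisites are places
net = PetriNet({
    "Programming I": (["may_take_prog1"], ["passed_prog1"]),
    "Programming II": (["passed_prog1"], ["passed_prog2"]),
    "Mathematics": (["may_take_math"], ["passed_math"]),
    "Thesis": (["passed_prog2", "passed_math"], ["graduated"]),
})
initial = {"may_take_prog1": 1, "may_take_math": 1}
valid = net.replay(initial, ["Programming I", "Mathematics", "Programming II", "Thesis"])
invalid = net.replay(initial, ["Programming II", "Programming I"])
print(valid["graduated"], invalid)  # 1 None
```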
Modellierung und Simulation von verteilter und wiederverwendbarer nachrichtenbasierter Middleware + (Message-oriented middleware (MOM) is used in many domains. There are many different MOMs, each with different goals or priorities. Some emphasize performance or availability, while others aim to be general-purpose. MOMs are also highly configurable. The goal of this master's thesis is to support software architects in choosing and configuring a MOM as early as the design phase. Existing modeling and prediction techniques neglect the influence of queues. As a result, certain MOM effects cannot be represented, for example the increase in a message's latency when the queue fills up. The contributions of the thesis are: selecting and measuring a MOM to study its effects and resource demands; performance modeling of a MOM including queues, with subsequent calibration; and a model transformation to reuse existing model elements. The approach was evaluated using the SPECjms2007 benchmark.)
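Latency grows as a queue fills up. This can be illustrated with the textbook M/M/1 queue, where the mean response time is W = 1/(μ − λ) and the mean number of messages in the system is L = ρ/(1 − ρ) with ρ = λ/μ. This analytical model is used here only for illustration; it is not the calibrated MOM model from the thesis, and the rates are hypothetical.

```python
def mm1_mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time a message spends in an M/M/1 system (waiting + service)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def mm1_mean_messages(arrival_rate: float, service_rate: float) -> float:
    """Mean number of messages in the system, L = rho / (1 - rho)."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    return rho / (1 - rho)


# Hypothetical broker that serves 1000 messages/s
for rate in (500, 900, 990):
    print(rate, mm1_mean_response_time(rate, 1000), mm1_mean_messages(rate, 1000))
```

In this model, the response time grows from 2 ms at 50 % load to 100 ms at 99 % load. A queue-free model cannot capture this effect.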
Automatisierte Gewinnung von Nachverfolgbarkeitsverbindungen zwischen Softwarearchitektur und Quelltext + (Trace links between architecture and source code can extend knowledge about a system. Because they are costly to create, software projects often have no or only incomplete traceability information. This thesis investigates a two-step approach for automatically generating trace links between architecture model elements and source code. To unify trace link creation across programming languages and architecture metamodels, the first step creates models from the available artifacts. The source code is transformed into a model that is independent of the concrete programming language, using a metamodel based on the KDM specified by the OMG. In the second step, heuristics and aggregations that operate on the created models are defined and used to generate the trace links. The heuristics use, for example, package, path, name and method information. The evaluation uses a gold standard created for this purpose, comprising five case studies. Trace links are generated for PCM, UML, Java and Shell. The micro-averaged F1 score reaches 99.11 %. If every component and interface contributes equally to the score, the F1 score is 93.71 %. Overall, the approach achieves very good results. For the TEAMMATES case study, several source code versions are used to investigate the influence of consistency on the results. The micro-averaged F1 score is 6.05 percentage points higher for the more consistent version, so consistency can affect the quality of the results.)
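The two reported F1 values differ in how the counts are aggregated. The micro average pools true positives, false positives and false negatives over all links. The second value, with every component and interface weighted equally, corresponds to a macro average over per-element F1 scores. The sketch below assumes F1 = 2TP / (2TP + FP + FN). The element names and counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    tp: int  # correct trace links found
    fp: int  # wrong trace links found
    fn: int  # correct trace links missed


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(per_element: Dict[str, Counts]) -> float:
    # Pool the counts first, so elements with many links dominate the score.
    return f1(sum(c.tp for c in per_element.values()),
              sum(c.fp for c in per_element.values()),
              sum(c.fn for c in per_element.values()))


def macro_f1(per_element: Dict[str, Counts]) -> float:
    # Average per-element scores, so every component/interface counts equally.
    scores = [f1(c.tp, c.fp, c.fn) for c in per_element.values()]
    return sum(scores) / len(scores)


# Hypothetical counts for two architecture elements
counts = {"UserService": Counts(tp=90, fp=0, fn=0), "ILogger": Counts(tp=1, fp=1, fn=0)}
print(round(micro_f1(counts), 4), round(macro_f1(counts), 4))  # 0.9945 0.8333
```

A single small element with one wrong link barely moves the micro average but pulls the macro average down considerably. This is consistent with the macro value being the lower of the two reported scores.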
Entity Recognition in Software Documentation Using Trace Links to Informal Diagrams + (Natural Language Software Architecture Documentation (NLSAD) and Software Architecture Models (SAM) provide information about a software system's design and qualities. Inconsistencies between these artifacts can negatively impact the comprehension and evolution of the system. ArDoCo is an approach proposed in prior work by Keim et al. to find such inconsistencies. It relies on Traceability Link Recovery (TLR) between entities in the NLSAD and the SAM. Using the linkage information, ArDoCo searches for Unmentioned Model Elements (UMEs) in the model and Missing Model Elements (MMEs) in the text. ArDoCo's approach shows promising results but has room for improvement regarding precision due to falsely identified textual entities. This work proposes using informal diagrams from the Software Architecture Documentation (SAD) to address this. The approach performs an additional TLR between the textual entities and the diagram entities. Based on heuristics, the links between textual entities and diagram entities are used to increase or decrease the confidence in textual entities. The Diagram Text TLR and its impact on ArDoCo's performance are evaluated separately using the same data set as previous work by Keim et al., extended to include informal diagrams. With Optical Character Recognition (OCR), the Diagram Text TLR achieves a good F1 score of 0.54. The approach improves MME detection (accuracy 0.77→0.94) by reducing the number of falsely identified textual entities (precision 0.39→0.69), with a negligible impact on recall. UME detection and ArDoCo's NLSAD-to-SAM TLR are slightly positively impacted and continue to perform excellently. The results show that using informal diagrams to improve entity recognition in the text is promising.
Room for improvement exists in dealing with issues related to OCR and diagram element processing.)
Bestimmung von Aktionsidentität in gesprochener Sprache + (Natural language contains actions that can be executed. Within a discourse, people often describe an action more than once. This does not necessarily mean that the action should be executed more than once. This bachelor's thesis investigates how to recognize whether a mention of an action refers to an action mentioned earlier. It develops a procedure that determines whether several action mentions in spoken language refer to the same action identity. The procedure compares actions pairwise. It is implemented and evaluated as an agent for the PARSE framework architecture. The tool achieves an F1 score of 0.8 when the actions are correctly recognized and coreference information between entities is available.)
Performanzmodellierung von Apache Cassandra im Palladio-Komponentenmodell + (NoSQL database management systems are used as back ends for software in the big data domain. Compared with relational database management systems, they scale better, require no fixed database schema and are easy to deploy in virtualized systems. Apache Cassandra was selected as an example of a NoSQL database management system because of its widespread use and its open-source license. Existing models of Apache Cassandra consider only the maximum possible number of requests to Cassandra and the resulting throughput and latency. Reducing this number increases the latency of individual requests. Among other things, the model created in this bachelor's thesis is intended to capture this effect. The contributions of the thesis are the creation and parameterization of a model of Cassandra in the Palladio Component Model and the evaluation of the model against benchmark results. For this purpose, the thesis also develops a procedure that structures the collection of the required data as well as its analysis and evaluation, and automates and simplifies these steps as far as possible. The model is evaluated through automated simulations whose results are compared with the benchmarks. This demonstrated the applicability of the model for one thread and an arbitrary number of requests, using one or several different operations simultaneously, with the exception of the scan operation.)
Analysis of Classifier Performance on Aggregated Energy Status Data + (Non-intrusive load monitoring (NILM) algorithms aim at disaggregating the consumption curves of households to the level of single appliances. However, there is no established way of quantifying and representing the tradeoff between the quality of analyses, such as the accuracy of the disaggregated consumption curves, and the load on the available computing resources. This makes it hard to plan the underlying infrastructure and resources for the analysis system and to find its optimal configuration. This thesis introduces a system that assesses the quality of different analyses and their runtime behavior, based on varying configuration parameters and changed characteristics of the input dataset. The varied characteristics are the granularity and the noisiness of the data. We demonstrate that the collected runtime behavior data can be used to choose reasonable characteristics of the input dataset.)
Performancevorhersage für Container-Anwendungen (PdF) + (Nowadays, distributed applications are often not statically deployed on virtual machines. Instead, a desired state is defined declaratively, and a control loop then tries to establish that state in the cluster. Predicting how these deployment techniques affect the performance of a system is difficult. This paper introduces a method to predict the performance impact of using containers and container orchestration in the deployment of a system. Our proposed approach enables system simulation and experimentation with various container orchestration mechanisms, including autoscaling and container scheduling. We validated this approach using a microservice reference application across different scenarios. Our findings suggest that the simulation can effectively mimic most features of container orchestration tools, and that the performance prediction of containerized applications in dynamic scenarios can be improved significantly.)
Enabling Consistency between Software Artefacts for Software Adaption and Evolution + (Nowadays, software systems are evolving at a pace never seen before. As a result, inconsistencies between different software artifacts are almost inevitable. Approaches for automated consistency maintenance between source code and architecture models already exist, but they have various limitations. Therefore, in this thesis, we present a comprehensive approach for supporting consistency preservation between software artifacts, with a special focus on software evolution and adaptation. At design time, source code analysis and consistency rules are used, while at run time, monitoring data is used as input for a transformation pipeline. In contrast to existing approaches, the automated derivation of the system composition is supported. Finally, self-validation was included as a central component of the approach. In a case-study-based evaluation, the accuracy of the models and the performance of the approach were measured. In addition, the scalability of the transformations within the pipeline was investigated.)
Injection Molding Simulation based on Graph Neural Networks (GNNs) + (Numerical filling simulations are an important tool for the development of injection molding parts. Existing simulations rely on numerical solvers based on the finite element method. These solvers are reliable and precise, but very computationally expensive even for simple part geometries. In this thesis, we aim to develop a faster injection molding simulation based on Graph Neural Networks (GNNs) as a surrogate model. Our approach learns a simulation as a composition of three functions: an encoder, a processor and a decoder. The encoder takes a graph representation of the 3D geometry of an injection molding part and returns a numeric embedding of each node in the graph. The processor updates the embedding of each node multiple times based on its neighbors. The decoder then decodes the final embedding of each node into physically meaningful variables, such as the fill state of the node. Our model can predict the progression of the flow front during a time step of fixed size. To simulate a full mold filling process, our model is applied sequentially until the entire mold is filled. Our architecture is applicable to any kind of material, geometry and injection process parameters. We evaluate our architecture by its accuracy and runtime when predicting node properties. We also evaluate our model's transfer learning ability on a real-world injection molding part.)
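The encoder-processor-decoder structure can be sketched as a few rounds of message passing on a small graph. This is a minimal, untrained illustration in plain Python. The weights are fixed hypothetical values rather than learned parameters, and node features are assumed to be [is_inlet, is_filled]. The processor uses mean aggregation with a residual update, which is only one of many possible GNN layer designs.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def encode(features: Dict[int, Vector], w_enc: Matrix) -> Dict[int, Vector]:
    """Encoder: map raw node features to embeddings."""
    return {n: [math.tanh(v) for v in matvec(w_enc, f)] for n, f in features.items()}


def process(h: Dict[int, Vector], edges: List[Tuple[int, int]],
            w_msg: Matrix, steps: int) -> Dict[int, Vector]:
    """Processor: update every embedding from the mean of its neighbours' messages."""
    neighbours = {n: [] for n in h}
    for a, b in edges:  # undirected mesh edges
        neighbours[a].append(b)
        neighbours[b].append(a)
    for _ in range(steps):
        updated = {}
        for n, hn in h.items():
            messages = [matvec(w_msg, h[m]) for m in neighbours[n]]
            if messages:
                mean_msg = [sum(vals) / len(messages) for vals in zip(*messages)]
            else:
                mean_msg = [0.0] * len(hn)
            updated[n] = [math.tanh(old + msg) for old, msg in zip(hn, mean_msg)]  # residual update
        h = updated  # all nodes are updated synchronously from the previous round
    return h


def decode(h: Dict[int, Vector], w_dec: Vector) -> Dict[int, float]:
    """Decoder: map each final embedding to a fill state in (0, 1) via a sigmoid."""
    return {n: 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(w_dec, hn))))
            for n, hn in h.items()}


# Hypothetical 4-node chain 0-1-2-3; node 0 is the filled inlet. Features: [is_inlet, is_filled]
features = {0: [1.0, 1.0], 1: [0.0, 0.0], 2: [0.0, 0.0], 3: [0.0, 0.0]}
edges = [(0, 1), (1, 2), (2, 3)]
w_enc = [[1.0, 1.0], [0.0, 1.0]]
w_msg = [[1.0, 0.0], [0.0, 1.0]]
w_dec = [2.0, 2.0]
fill = decode(process(encode(features, w_enc), edges, w_msg, steps=2), w_dec)
print({n: round(v, 2) for n, v in fill.items()})
```

With two processor steps, information from the inlet reaches nodes at most two hops away. Node 3 therefore stays at the neutral value 0.5. This locality is one reason the model is applied step by step for a fixed time increment.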
Optimizing Parametric Dependencies for Incremental Performance Model Extraction + (During software development, engineers often face different implementation alternatives. To test several options without investing the resources to implement each of them, a so-called performance model can be used. With a performance model, developers can simulate the system in diverse scenarios and conditions. To minimize the differences between the real system and its model, i.e., to improve the accuracy of the model, parametric dependencies are introduced. They express a relation between the input arguments and the performance model parameters of the system, such as loop iteration counts, branch transition probabilities, resource demands or external service call arguments. Existing work in this field has two major shortcomings. It either does not perform incremental calibration of the performance model (updating only the parts of the source code changed since the last commit), or it does not consider dependencies more complex than linear ones. This work is part of an approach for the continuous integration of performance models. Our aim is to identify parametric dependencies for external service calls, as well as to optimize the existing dependencies for the other types of performance model parameters. We propose using two machine learning algorithms to detect initial dependencies and then refining the mathematical expressions with a genetic programming algorithm. Our contribution also includes feature selection of the candidates for a dependency. It also considers not only input service arguments but also the data flow, i.e., the return values of previous external calls.)
Automatically detecting Performance Regressions + (One of the most important aspects of software engineering is system performance. Common approaches to verify acceptable performance include running load tests on deployed software. However, complicated workflows and requirements, such as the need for deployments and extensive manual analysis of load test results, cause tests to be performed very late in the development process. As a result, feedback on potential performance regressions becomes available long after they were introduced. With this thesis, we propose PeReDeS, an approach that integrates into the development cycle of modern software projects. It explicitly models an automated performance regression detection system that provides feedback quickly and reduces manual effort for setup and load test analysis. PeReDeS is embedded into continuous integration pipelines, manages load test execution and lifecycle, processes load test results, and makes feedback available to the authoring developer via reports on the coding platform. We further propose a method for detecting performance deviations in load test results, based on Welch's t-test. The method is adapted to the context of performance regression detection and integrated into the PeReDeS detection pipeline. We implemented our approach and evaluated it with a user study and a data-driven study to assess the usability and accuracy of our method.)
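Welch's t-test compares the mean response times of a baseline run and a candidate run without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library. To stay dependency-free, it flags a regression when t exceeds a fixed, hypothetical critical value instead of deriving a p-value. This is a simplification and not necessarily how PeReDeS adapts the test. The sketch also assumes that the samples are not all identical, since zero variance in both runs would divide by zero.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(baseline: Sequence[float], candidate: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(baseline), len(candidate)
    v1 = statistics.variance(baseline) / n1   # squared standard error of the baseline mean
    v2 = statistics.variance(candidate) / n2  # squared standard error of the candidate mean
    t = (statistics.mean(candidate) - statistics.mean(baseline)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def is_regression(baseline, candidate, critical_t: float = 2.0) -> bool:
    """Flag a regression if the candidate is slower by more than the (hypothetical) critical t."""
    t, _ = welch_t(baseline, candidate)
    return t > critical_t


# Hypothetical response times (ms) from two load test runs
baseline = [100, 102, 98, 101, 99]
candidate = [110, 112, 108, 111, 109]
print(welch_t(baseline, candidate), is_regression(baseline, candidate))  # (10.0, 8.0) True
```

The test is one-sided here: only a slower candidate (positive t) counts as a regression. A speedup produces a negative t and is not flagged.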
Evaluating architecture-based performance prediction for MPI-based systems + (One research field of High Performance Computing (HPC) is computing clusters. Computing clusters are distributed memory systems in which different machines are connected through a network. To communicate with each other, the machines need to pass messages through the network. The Message Passing Interface (MPI) is the standard for implementing parallel systems on distributed memory systems. Several approaches have been proposed to help software architects predict the performance of MPI-based systems. However, these approaches either depend on an existing implementation of a program or are tailored to specific programming languages or use cases. In our approach, we use the Palladio Component Model (PCM), which allows us to model component-based architectures and to predict the performance of the modeled system. We modeled different MPI functions in the PCM as reusable patterns, together with a communicator that the MPI functions require. The expected benefit is a set of patterns for different MPI functions that allow precise modeling of MPI-based systems in the PCM, and thus a precise performance prediction for a PCM instance.)
Batch query strategies for one-class active learning + (One-class classifiers learn to distinguish normal objects from outliers. These classifiers are therefore suitable for strongly imbalanced class distributions with only a small fraction of outliers. Extensions of one-class classifiers make use of labeled samples to improve classification quality. As labeling is often time-consuming, active learning methods can be used to identify samples for which obtaining a label from the user is worthwhile. The goal is to reduce the labeling effort to a fraction of the original data set. In one-class active learning, this labeling process consists of sequential queries in which the user labels one sample at a time. Batch queries, in which the user labels multiple samples at a time, have potential advantages, for example parallelizing the labeling process. However, their application has so far been limited to binary and multi-class classification. In this thesis, we explore whether batch queries can be used for one-class classification. We work towards a novel batch query strategy for one-class classification by applying concepts from multi-class classification to the requirements of one-class active learning.)
Performance Modeling of Distributed Computing + (Optimizing resource allocation in distributed computing systems is crucial for enhancing system efficiency and reliability. Predicting job execution metadata based on resource demands and platform characteristics plays a key role in this optimization process. Distributed computing simulators are used for this purpose to model and predict system behavior. Among the various simulators developed in recent decades, this thesis focuses on the state-of-the-art simulator DCSim. DCSim simulates the nodes and links of the configured platform, generates workloads according to configured parameter distributions, and performs the simulations. The simulated job execution metadata is accurate, yet the computational resources and time the simulations require increase superlinearly with the number of simulated nodes. In this thesis, we explore the application of Recurrent Neural Networks and Transformer models for predicting job execution metadata in distributed computing environments. We focus on data preparation, model training, and evaluation for handling numerical sequences of varying lengths. This approach enhances the scalability of predictive systems by using deep neural networks to interpret and forecast job execution metadata based on simulated or historical data. We assess the models across four scenarios of increasing complexity, evaluating their ability to generalize to unseen jobs and platforms. We examine the training duration and the amount of data necessary to achieve accurate predictions, and discuss the applicability of such models to overcome the scalability challenges of DCSim. The key findings of this work show that the models generalize across sequences of the lengths encountered during training but fall short in generalizing across different platforms.)
Density-Based Outlier Detection Benchmark on Synthetic Data + (Outlier detection algorithms are widely used in application fields such as image processing and fraud detection. As a result, many different outlier detection algorithms have been developed in recent years. While a lot of work has been put into comparing the efficiency of these algorithms, comparing them in terms of effectiveness is rather difficult. One reason is the lack of commonly agreed-upon benchmark data. In this thesis, the effectiveness of density-based outlier detection algorithms (such as KNN, LOF and related methods) is compared on entirely synthetic data, using the data's underlying density as ground truth.)
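One of the simplest density-based scores in such a comparison is the kNN outlier score: the distance of a point to its k-th nearest neighbor. Points in sparse regions get large scores. The following is a minimal brute-force sketch that assumes Euclidean distance and uses a hypothetical toy data set. With synthetic data, the resulting ranking could then be compared against a ranking by the known generating density.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


# Hypothetical toy data: a dense unit square plus one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))  # 4
```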
High-Dimensional Neural-Based Outlier Detection + (Outlier detection in high-dimensional spaces is a challenging task because of the curse of dimensionality. Neural networks have recently gained popularity for a wide range of applications due to the availability of computational power and large training data sets. Several studies examine the application of different neural network models, such as autoencoders, self-organising maps and restricted Boltzmann machines, to outlier detection in mainly low-dimensional data sets. In this diploma thesis, we investigate whether these neural network models can scale to high-dimensional spaces. We also adapt the useful neural network-based algorithms to the task of high-dimensional outlier detection, examine data-driven parameter selection strategies for these algorithms, develop suitable outlier score metrics for these models, and investigate the possibility of identifying the outlying dimensions of detected outliers.)
Bachelorarbeit: Local Outlier Factor for Feature‐evolving Data Streams + (Outlier detection is a core task of data stream analysis. Many algorithms target this problem, but they tend to treat the data as a so-called row stream, i.e., observations arrive one at a time with a fixed number of features. However, real-world data often takes the form of a feature-evolving stream. Consider analyzing network data in a data center: nodes may be added and removed at any time, changing the features of the observed stream. While this setting is highly relevant, most existing outlier detection algorithms are not applicable to it. Further, increasing the number of features results in high-dimensional data, which poses a different set of problems, usually summarized as "the curse of dimensionality". In this thesis, we propose FeLOF, which addresses the challenging setting of outlier detection in feature-evolving and high-dimensional data. Our algorithm extends the well-known Local Outlier Factor algorithm to the feature-evolving stream setting. We employ a variation of StreamHash random hashing projections to create a lower-dimensional feature space embedding, thereby mitigating the effects of the curse of dimensionality. To address non-stationary data distributions, we employ a sliding window approach. FeLOF uses efficient data structures to speed up search queries and data updates. Extensive experiments show that our algorithm achieves state-of-the-art outlier detection performance in the static, row stream and feature-evolving stream settings on real-world benchmark data. Additionally, we provide an evaluation of our StreamHash adaptation, demonstrating its ability to cope with sparsely populated high-dimensional data.)
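The core Local Outlier Factor computation and the sliding window can be sketched as follows. This brute-force illustration assumes a fixed feature set, Euclidean distance, exactly k neighbors (ties broken by index) and no duplicate points. It omits the StreamHash projection and the efficient data structures that FeLOF uses. The `SlidingWindowLOF` class is hypothetical.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _knn(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    ranked = sorted((math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i)
    return [j for _, j in ranked[:k]]


def lof_scores(points: Sequence[Point], k: int) -> List[float]:
    """Local Outlier Factor of every point; values well above 1 indicate outliers."""
    n = len(points)
    neighbours = [_knn(points, i, k) for i in range(n)]
    k_distance = [math.dist(points[i], points[neighbours[i][-1]]) for i in range(n)]

    def local_reachability_density(i: int) -> float:
        reach = [max(k_distance[o], math.dist(points[i], points[o])) for o in neighbours[i]]
        mean_reach = sum(reach) / k
        return math.inf if mean_reach == 0 else 1.0 / mean_reach  # duplicates give infinite density

    lrd = [local_reachability_density(i) for i in range(n)]
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]


class SlidingWindowLOF:
    """Hypothetical helper: scores each new point against the last `window_size` points."""

    def __init__(self, k: int, window_size: int):
        self.k = k
        self.window = deque(maxlen=window_size)  # old points are evicted automatically

    def update(self, point: Sequence[float]) -> Optional[float]:
        self.window.append(tuple(point))
        if len(self.window) <= self.k:  # not enough neighbours yet
            return None
        return lof_scores(list(self.window), self.k)[-1]


stream = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
detector = SlidingWindowLOF(k=2, window_size=10)
results = [detector.update(p) for p in stream]
print(results)
```

Recomputing all scores for every new point costs quadratic time in the window size. Index structures such as those mentioned in the abstract avoid this cost.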
Density-Based Outlier Detection Benchmark on Synthetic Data (Thesis) + (Outlier detection is a popular research topic, and a number of different approaches have been developed. Evaluating the effectiveness of these approaches, however, is a rarely addressed field. The lack of commonly accepted benchmark data is most likely one of the obstacles to a fair comparison of unsupervised outlier detection algorithms. This thesis compares the effectiveness of twelve density-based outlier detection algorithms in nearly 800,000 experiments over a broad range of algorithm parameters, using the probability density as ground truth.)
Subspace Generative Adversarial Learning for Unsupervised Outlier Detection + (Outlier detection is an important yet challenging task, especially for unlabeled, high-dimensional datasets. Due to their self-supervised generative nature, Generative Adversarial Networks (GANs) have proven to be one of the most powerful deep learning methods for outlier detection. However, most state-of-the-art GANs for outlier detection share common limitations. Often, they achieve great results only if the model's hyperparameters are properly tuned or the underlying network structure is adjusted. This optimization is not possible in practice when the data is unlabeled. If not tuned properly, it is not unusual for a state-of-the-art GAN method to be outperformed by simpler shallow methods. We propose using a GAN architecture with feature ensemble learning to address hyperparameter sensitivity and architectural dependency. This follows the success of feature ensembling in mitigating these problems in other areas of deep learning. This thesis will study the optimization problem, training, and tuning of feature ensemble GANs in an unsupervised scenario, comparing them to other deep generative methods in a similar setting.)
Neural-Based Outlier Detection in Data Streams + (Outlier detection often needs to be performed unsupervised on high-dimensional data in data streams. “Deep structured energy-based models” (DSEBM) and “Variational Denoising Autoencoder” (VDA) are two promising approaches for outlier detection. They will be implemented and adapted for use in data streams. Finally, their performance will be demonstrated in experiments, including a comparison with state-of-the-art approaches.)
Adaptive Variational Autoencoders for Outlier Detection in Data Streams + (Outlier detection targets the discovery of abnormal data patterns. Typical scenarios, such as fraud detection and predictive maintenance, are particularly challenging, since the data is available as an infinite and ever-evolving stream. In this thesis, we propose Adaptive Variational Autoencoders (AVA), a novel approach for unsupervised outlier detection in data streams. Our contribution is two-fold: (1) we introduce a general streaming framework for training arbitrary generative models on data streams; here, generative models are useful for capturing the history of the stream. (2) We instantiate this framework with a Variational Autoencoder, which adapts its network architecture to the dimensionality of incoming data. Our experiments on several benchmark outlier data sets show that AVA outperforms the state of the art and successfully adapts to streams with concept drift.)
Scenario Discovery with Active Learning + (PRIM (Patient Rule Induction Method) is an algorithm used for discovering scenarios by creating hyperboxes in the input space. However, PRIM alone usually requires large datasets, and computational simulations can be expensive. Consequently, one wants to obtain scenarios while reducing the number of simulations. It has been shown that combining PRIM with machine learning models can reduce the number of necessary simulation runs by around 75%. In this thesis, I analyze nine different active learning sampling strategies together with several machine learning models, in order to find out whether active learning can systematically improve PRIM even further, and whether, among those strategies and models, a most beneficial combination of sampling method and intermediate machine learning model exists for this purpose.)
Patient Rule Induction Method with Active Learning + (PRIM (Patient Rule Induction Method) is an algorithm for discovering scenarios from simulations by creating human-comprehensible hyperboxes. However, PRIM alone requires relatively large datasets, and computational simulations are usually quite expensive. Consequently, one wants to obtain a plausible scenario with a minimal number of simulations. It has been shown that combining PRIM with ML models, which generalize faster, can reduce the number of necessary simulation runs by around 75%. We will try to reduce the number of simulation runs even further, using an active learning (AL) approach to train an intermediate ML model. Additionally, we extend the previously proposed methodology to cover not only classification but also regression problems. A preliminary experiment indicated that the combination of these methods does indeed help reduce the necessary runs even further. In this thesis, I will analyze different AL sampling strategies together with several intermediate ML models to find out whether AL can systematically improve existing scenario discovery methods and whether a most beneficial combination of sampling method and intermediate ML model exists for this purpose.)
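Both PRIM abstracts above build on PRIM's peeling phase. The following Python sketch shows a basic version of that phase only: starting from the bounding box of all samples, it repeatedly removes a slice of at least a fraction `alpha` of the remaining points from one lower or upper box face, choosing the peel that maximizes the share of cases of interest inside the box, until the box support falls to `min_support`. It omits pasting, the choice of a box along the peeling trajectory, and the intermediate ML model and active learning that the theses add; the function name and default values are illustrative assumptions.

```python
def prim_peel(X, y, alpha=0.05, min_support=0.1):
    """Basic PRIM peeling.

    X: list of samples (lists of floats); y: 0/1 labels, 1 = case of interest.
    Returns the peeling trajectory as (box, mean of y in box, support) tuples,
    where box is a list of [lower, upper] bounds per input dimension.
    """
    n, dim = len(X), len(X[0])
    inside = list(range(n))

    def summary(idx):
        box = [[min(X[i][d] for i in idx), max(X[i][d] for i in idx)] for d in range(dim)]
        return box, sum(y[i] for i in idx) / len(idx), len(idx) / n

    trajectory = [summary(inside)]
    while len(inside) / n > min_support:
        best_idx, best_mean = None, -1.0
        m = max(1, int(alpha * len(inside)))
        for d in range(dim):
            vals = sorted(X[i][d] for i in inside)
            # Peel the lower face (drop values <= m-th smallest) or the upper
            # face (drop values >= m-th largest); tied values are dropped together.
            candidates = (
                [i for i in inside if X[i][d] > vals[m - 1]],
                [i for i in inside if X[i][d] < vals[-m]],
            )
            for kept in candidates:
                if not kept:
                    continue
                mean = sum(y[i] for i in kept) / len(kept)
                if mean > best_mean:
                    best_idx, best_mean = kept, mean
        if best_idx is None:
            break  # no admissible peel left
        inside = best_idx
        trajectory.append(summary(inside))
    return trajectory
```

In the approach summarized above, `X` and `y` would not come only from expensive simulation runs; an intermediate ML model trained on fewer runs labels additional samples before PRIM is applied.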
A Parallelizing Compiler for Adaptive Auto-Tuning + (Parallelizing compilers and auto-tuners are two of the many technologies that can make it easier for developers to write high-performance applications for modern heterogeneous systems. In this thesis, we present a parallelizing compiler that can detect parallelism in programs and generate parallel code for heterogeneous systems. In addition, the presented compiler uses auto-tuning to find, at runtime, an optimal partitioning of the parallelized code sections across multiple platforms that minimizes execution time. However, instead of optimizing the parallelization once for each parallel section and keeping the configurations found for as long as the program runs, programs generated by our compiler are able to distinguish between different application contexts, so that context changes are detected and the current configuration can be adapted individually for each occurring context. To describe contexts, we use so-called indicators, which express certain runtime properties of the code and are inserted into the program code so that they can be evaluated during execution and used by the auto-tuner. Furthermore, we store the configurations found and their associated contexts in a database, so that we can reuse configurations from previous runs when the application is executed again. We evaluate our approach with the Polybench benchmark suite. The results show that we are able to detect context changes at runtime and adapt the configuration to the new context accordingly, which generally leads to lower execution times.)
Calibrating Performance Models for Particle Physics Workloads + (Particle colliders are a primary means of conducting experiments in particle physics, as they make it possible both to create short-lived, high-energy particles and to observe their properties. The world's largest particle collider, the Large Hadron Collider (LHC), is operated by the European Organization for Nuclear Research (CERN) near Geneva. Operating this kind of accelerator requires storing large amounts of data and analyzing them in a computationally intensive way. The Worldwide LHC Computing Grid (WLCG), a global computing grid, is run alongside the LHC to serve this purpose. This Bachelor's thesis aims to support the creation of an architecture model and simulation for parts of the WLCG infrastructure, with the goal of accurately simulating and predicting changes in the infrastructure, such as replacing the load-balancing strategies used to distribute the workload among available nodes.)
Adaptive Monitoring for Continuous Performance Model Integration + (Performance Models (PMs) can be used to predict software performance and evaluate alternatives at the design stage. Building such models manually is time-consuming and not suitable for agile development processes, where releases have to be produced in short cycles. To benefit from model-based performance prediction during agile software development, developers tend to extract PMs automatically. Existing approaches that extract PMs based on reverse engineering and/or measurement techniques require monitoring and analyzing the whole system after each iteration, which causes a high monitoring overhead. The Continuous Integration of Performance Models (CIPM) approach addresses this problem by updating the PMs and calibrating them incrementally, based on adaptive monitoring of the changed parts of the code. In this work, we introduce an adaptive monitoring approach for performance model integration, which automatically instruments only the changed parts of the source code using specific predefined probe types. It then monitors the system adaptively. The resulting measurements are used by CIPM to estimate PM parameters incrementally. The evaluation confirmed that our approach can reduce the monitoring overhead to 50%.)
(Voluntary attendance) Final presentation, Praxis der Forschung SS23 I + (Performance Prediction for Container Applications. Abstract: Nowadays, distributed applications are often not deployed statically on virtual machines. Instead, a desired state is defined declaratively, and a control loop then tries to establish this desired state in the cluster. Predicting the performance impact of these deployment techniques on a system is difficult. This paper introduces a method to predict the performance impact of using containers and container orchestration in the deployment of a system. Our proposed approach enables system simulation and experimentation with various container orchestration mechanisms, including autoscaling and container scheduling. We validated this approach using a microservice reference application across different scenarios. Our findings suggest that the simulation could effectively mimic most features of container orchestration tools, and that the performance prediction of containerized applications in dynamic scenarios could be improved significantly.)
Tuning of Explainable Artificial Intelligence (XAI) tools in the field of text analysis + (Philipp Weinmann will present the plan for his Bachelor's thesis titled "Tuning of Explainable Artificial Intelligence (XAI) tools in the field of text analysis". He will give a general introduction to explainers for Artificial Intelligence in the context of NLP. We will then explore one of these tools in detail, SHAP, a perturbation-based local explainer, and discuss how to evaluate SHAP explanations.)
Explainable Artificial Intelligence for Decision Support + (Policy makers face the difficult task of making far-reaching decisions that impact the lives of the entire population, based on uncertain parameters over which they have little to no control, such as environmental impacts. Often, they use scenarios in their decision-making process. Scenarios provide a common and intuitive way to communicate and characterize different uncertain outcomes in many decision support applications, especially in broad public debates. However, they often fall short of their potential, particularly when applied to groups with diverse interests and worldviews, due to the difficulty of choosing a small number of scenarios to summarize the entire range of uncertain future outcomes. Scenario discovery addresses these problems by using statistical or data-mining algorithms to find easy-to-interpret, policy-relevant regions in the space of uncertain input parameters of computer simulation models. One of many approaches to scenario discovery is subgroup discovery, an approach from the domain of explainable Artificial Intelligence. In this thesis, we test and evaluate multiple subgroup discovery methods for their applicability to scenario discovery applications.)
Symbolic Performance Modeling + (Predicting software performance under different configurations is a challenging task due to the large number of possible configurations. Performance-influence models help stakeholders understand how configuration options and their interactions influence the performance of a program. A crucial part of the performance modeling process is the design of an experiment set that delivers performance measurements, which are used as input for a machine learning algorithm that learns the performance model. An optimal experiment set should contain the minimal number of experiments that produces a sufficiently accurate performance model. The topic of this thesis is Symbolic Performance Modeling, a new white-box approach to analyzing the influence of configuration options on the software's performance. The approach uses taint analysis to determine where in the source code configuration options influence the software's performance, and symbolic execution to determine whether this influence is significant. We assume that only loop constructs with non-constant iteration counts change the asymptotic behavior of the program. The Feature Taint Analysis provided by VaRA is used to determine which configuration options influence loops, while the Path Tracing provided by PhASAR is used to construct all control-flow paths leading to the loops and their respective path conditions. The SMT solver Z3 is then used to derive, from the path conditions, value ranges for the configuration options that influence the loop constructs. We determine the significance of a configuration option's influence based on the size of its value range. We implement the proof-of-concept tool Symbolic Performance Modeling Value Generator to evaluate the approach with regard to its ability to analyze real-world applications and its performance. From the insights gained during the evaluation, we define limitations of the current implementation and propose improvements for future work.)
Enhancing Non-Invasive Human Activity Recognition by Fusioning Electrical Load and Vibrational Measurements + (The need for professional installation of stationary sensors hinders the adoption of Activity Recognition Systems in households. This can be circumvented by using sensors that are cheap, easy to set up and adaptable to a variety of homes. Since 72% of European consumers will have Smart Meters by 2020, Smart Meters provide an omnipresent basis for Activity Recognition. This thesis investigates how a Smart Meter's limited recognition of activities involving appliances can be extended by Vibration Sensors. We provide an experimental setup to collect a dedicated dataset with a sampling frequency of 25,600 Hz. We evaluate the impact of combining a Smart Meter and Vibration Sensors on a system's accuracy by means of four Activity Recognition Systems we developed, which quantifies this impact. We found that when these sensors are combined, the accuracy of an Activity Recognition System tends towards the highest accuracy of a single underlying sensor rather than jointly surpassing it.)
Evidence-based Token Abstraction for Software Plagiarism Detection + (Programming assignments for students are a target of plagiarism. Especially for graded assignments, instructors want to detect plagiarism among students. For larger courses, however, manually inspecting all submissions is a resource-intensive task. For this purpose, there are numerous tools that can help detect plagiarism in submissions. Many well-known plagiarism detection tools are token-based detectors. In an abstraction step, they map source code to a list of tokens, and these lists are then compared with each other. While there is much research on comparison algorithms, the mapping is often considered only superficially. In this work, we conduct two experiments that address the issue of token abstraction. For that, we design different token abstractions and explain their differences. We then evaluate these abstractions using multiple datasets. We show that different abstractions have pros and cons, and that a higher abstraction level does not necessarily perform better. These findings are useful when adding support for new programming languages and for improving existing plagiarism detection tools. Furthermore, the results can help in choosing abstractions tailored to specific requirements.)
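To make the idea of token abstraction concrete, the following Python sketch uses the standard `tokenize` module to map Python source code to token lists at two hypothetical abstraction levels (level 0 keeps identifiers and literals verbatim, level 1 replaces them with `ID` and `LIT`) and compares two submissions with a Jaccard similarity over token trigrams. These levels are an illustrative assumption, not the abstractions designed in the thesis, and real token-based detectors use dedicated comparison algorithms rather than plain n-gram overlap.

```python
import io
import keyword
import tokenize

# Layout and comment tokens carry no structural information for this sketch.
SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
           tokenize.COMMENT, tokenize.ENDMARKER}


def abstract_tokens(source, level):
    """Map Python source code to a token list; a higher level is more abstract."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID" if level >= 1 else tok.string)
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT" if level >= 1 else tok.string)
        else:
            tokens.append(tok.string)  # keywords and operators are kept as-is
    return tokens


def similarity(a, b, n=3):
    """Jaccard similarity of the token n-gram sets of two token lists."""
    grams_a = {tuple(a[i:i + n]) for i in range(len(a) - n + 1)}
    grams_b = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0
```

For two functions that differ only in renamed identifiers and changed constants, level 1 yields identical token lists (similarity 1.0), while level 0 reports low similarity. The opposite risk, a higher abstraction level merging genuinely different code, is the kind of trade-off the abstract reports.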
Theory-Guided Data Science for Battery Voltage Prediction: A Systematic Guideline + (Purely data-driven Data Science approaches tend to underperform when applied to scientific problems, especially when little data is available. Theory-guided Data Science (TGDS) incorporates existing problem-specific domain knowledge in order to increase the performance of Data Science models. It has already proven successful in scientific disciplines such as climate science and materials research. Although many TGDS methods exist, they are often not comparable with each other, because they were originally applied to different types of problems. Also, it is not clear how much domain knowledge they require. There are currently no clear guidelines on how to choose the most suitable TGDS method when confronted with a concrete problem. Our work is the first to compare multiple TGDS methods on a time series prediction task. We establish a clear guideline by evaluating the performance and required domain knowledge of each method in the context of lithium-ion battery voltage prediction. As a result, our work could serve as a starting point for selecting the right TGDS method when confronted with a concrete problem.)
Using Architectural Design Space Exploration to Quantify Cost-to-Quality Relationship + (QUPER is a method that simplifies decision-making in release planning when a particular quality requirement is central. The method is especially helpful when the software project has several competing products on the market and a particular quality requirement strongly influences the value of the software for the customer. However, QUPER requires estimates from the development team and therefore depends heavily on the team's experience. The Palladio Component Model in combination with PerOpteryx can help replace these rough estimates with more precise information for an upcoming release: given a Palladio model and a potential improvement to the software, PerOpteryx can tell us the exact improvement in the quality requirement. In this thesis, QUPER is applied to two example software projects, first on its own and then with the help of PerOpteryx, and the results are compared.)
Modularization approaches in the context of monolithic simulations + (Quality characteristics of a software system such as performance or reliability can determine its success or failure. In traditional software engineering, these characteristics can only be determined when parts of the system are already implemented and past the design process. Computer simulations make it possible to estimate quality characteristics of software systems already during the design process. Simulations are built to analyse certain aspects of systems. The representation of the system is specialised for the specific analysis. This specialisation often results in a monolithic design of the simulation. Monolithic structures, however, can reduce the maintainability of the simulation and decrease the understandability and reusability of the system representations. The drawbacks of monolithic structures can be countered by the concept of modularisation, where one problem is divided into several smaller sub-problems. This approach makes the sub-problems easier to understand and handle. In this thesis, an approach is provided to describe the coupling of newly developed and already existing simulations into a modular simulation. This approach consists of a Domain-Specific Language (DSL) developed with model-driven technologies. The DSL is applied in a case study to describe the coupling of two simulations. The coupling of these simulations with an existing coupling approach is implemented according to the created description. The DSL is evaluated regarding its completeness in describing the coupling of several simulations into a modular simulation. Additionally, the modular simulation is examined regarding how accurately it preserves the behaviour of the monolithic simulation; for this purpose, the results of the modular simulation and the monolithic version are compared. The created modular simulation is additionally evaluated with regard to its scalability by analysing the execution times when multiple simulations are coupled. Furthermore, the effect of the modularisation on the simulation execution times is evaluated. The obtained evaluation results show that the DSL can describe the coupling of the two simulations used in the case study. Furthermore, the results of the accuracy evaluation suggest that problems exist in the interaction of the simulations with the coupling approach. However, the results also show that the overall behaviour of the monolithic simulation is preserved in its modular version. The analysis of the execution times suggests that the modular simulation takes longer to execute than the monolithic version. Also, the scalability results show that the execution time of the modular simulation does not increase exponentially with the number of coupled simulations.)
Parameterizing the Specification of Quality Annotations in Software Architecture Models + (Quality properties of component-based software systems depend both on the components used and on the context in which they are deployed. While context-dependent parameterization has already been analyzed on a sound scientific basis for individual quality analysis models, such as performance, this remains open for other quality attributes, in particular for qualitatively descriptive models. The presented work introduces the quality effect specification, which allows context-dependent analysis and transformation of arbitrary quality attributes. The approach includes a purpose-built domain-specific language for modeling context-dependent effects, together with a corresponding transformation of the quality annotations.)
Generalized Monte Carlo Dependency Estimation with improved Convergence + (Quantifying dependencies among variables is a fundamental task in data analysis. It helps to understand data and to identify the variables required to answer specific questions. Recent studies have positioned Monte Carlo Dependency Estimation (MCDE) as a state-of-the-art tool in this field. MCDE quantifies dependencies as the average discrepancy between marginal and conditional distributions. In practice, this value is approximated with a dependency estimator. However, the original implementation of this estimator converges rather slowly, which leads to suboptimal results in terms of statistical power. Moreover, MCDE can only quantify dependencies among univariate random variables, not multivariate ones. In this thesis, we make two major improvements to MCDE. First, we propose four new dependency estimators with faster convergence. We show that MCDE equipped with these new estimators achieves higher statistical power. Second, we generalize MCDE to GMCDE (Generalized Monte Carlo Dependency Estimation) to quantify dependencies among multivariate random variables. We show that GMCDE inherits all the desirable properties of MCDE and demonstrate its superiority over state-of-the-art dependency measures in experiments.)
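MCDE's core idea, an average discrepancy between a marginal distribution and conditional distributions obtained from random slices, can be illustrated for two variables with the following Python sketch. It is deliberately simplified and is not MCDE's actual estimator nor one of the thesis's new estimators: the discrepancy is a plain two-sample Kolmogorov-Smirnov distance, slices are contiguous ranges of the other variable's ranks, and the function names and the `slice_fraction` parameter are assumptions for illustration.

```python
import random


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance: max |F_a(t) - F_b(t)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    best = 0.0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])
        while i < len(a) and a[i] <= t:
            i += 1
        while j < len(b) and b[j] <= t:
            j += 1
        best = max(best, abs(i / len(a) - j / len(b)))
    return best


def mc_dependency(x, y, iterations=200, slice_fraction=0.5, seed=0):
    """Monte Carlo estimate of the dependency between two samples, in [0, 1]."""
    rng = random.Random(seed)
    cols = (x, y)
    n = len(x)
    size = max(2, int(slice_fraction * n))
    # Sample indices ordered by each variable, used to cut random slices.
    orders = [sorted(range(n), key=c.__getitem__) for c in cols]
    total = 0.0
    for _ in range(iterations):
        ref = rng.randrange(2)  # reference variable for this iteration
        start = rng.randrange(n - size + 1)
        chosen = orders[1 - ref][start:start + size]  # slice on the other variable
        conditional = [cols[ref][i] for i in chosen]
        # Discrepancy between the marginal and the conditional distribution.
        total += ks_distance(cols[ref], conditional)
    return total / iterations
```

For independent variables, the conditional sample is roughly a random subsample of the marginal one, so the score stays small; for dependent variables, slicing one variable shifts the distribution of the other, and the score grows. The number of iterations trades estimation variance against run time, which is the convergence aspect the thesis improves.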
Adaptive Online Tuning for Continuous State Spaces + (Ray tracing is a computationally intensive technique for generating photorealistic images. Automatically optimizing parameters that affect computation time can accelerate image generation. In this thesis, the auto-tuner libtuning was extended with a generalized reinforcement learning method that can take certain characteristics of the frames to be rendered into account when selecting suitable parameter configurations. The method uses an ε-greedy strategy that relies on libtuning's Nelder-Mead function minimization for exploration. Compared to using libtuning alone, this implementation was shown to reduce the total computation time of a ray tracing application scenario by up to 7.7%.)
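The ε-greedy idea described above can be sketched in Python under stated assumptions. Continuous frame characteristics are discretized into context buckets by rounding (the thesis's generalized method may handle the continuous state space differently), and exploration is a caller-supplied function. In the thesis this role is played by libtuning's Nelder-Mead search, which is not reproduced here, so the usage below substitutes a simple random perturbation. `discretize` and `EpsilonGreedyTuner` are hypothetical names, not libtuning's API.

```python
import random


def discretize(features, step):
    """Map continuous frame characteristics to a discrete context key."""
    return tuple(round(f / step) for f in features)


class EpsilonGreedyTuner:
    """Per-context epsilon-greedy selection of parameter configurations."""

    def __init__(self, epsilon, explore, seed=0):
        self.epsilon = epsilon
        self.explore = explore  # callable: current best config -> new candidate
        self.best = {}          # context -> (best config, its measured cost)
        self.rng = random.Random(seed)

    def choose(self, context, default):
        known = self.best.get(context)
        if known is None or self.rng.random() < self.epsilon:
            # Explore, starting from the best known (or default) configuration.
            return self.explore(known[0] if known else default)
        return known[0]  # exploit the best configuration found for this context

    def report(self, context, config, cost):
        """Record a measured cost (e.g. frame render time) for a configuration."""
        known = self.best.get(context)
        if known is None or cost < known[1]:
            self.best[context] = (config, cost)
```

Per frame, a renderer would call `tuner.choose(discretize(frame_stats, 0.5), default=...)` before rendering and `tuner.report(...)` with the measured render time afterwards (`frame_stats` is hypothetical). Keeping one best configuration per context is what lets the tuner react when the scene changes, instead of converging to a single global configuration.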
Integration of Reactions and Mappings in Vitruvius + (Complex software projects are often realized using multiple programming or modelling languages. Separate parts of the software are relevant to certain development tasks or roles and differ in their representation. These separate representations are related and contain redundant information. Such redundancies exist, for example, between a component description and its implementation class, which has to implement methods with the signatures specified by the component. Whenever redundant information is affected by a development update, other representations that contain this information have to be updated as well. This additional development effort is required to keep the redundant information consistent and can be costly. Consistency preservation languages can be used to describe how the consistency of representations can be preserved, so that, in combination with further development tools, the process of updating redundant information is automated. However, such languages vary in their abstraction level and expressiveness. Consistency preservation languages with higher abstraction specify declaratively which elements of representations are considered consistent. A language with less abstraction specifies how consistency is preserved after an update, using imperative instructions. A common trade-off when selecting a fitting language is between expressiveness and abstraction: higher abstraction implies less specification effort on the one hand, but on the other hand it is restricted in expressiveness compared to a more specific language. In this thesis, we present a concept for combining two consistency specification languages of different abstraction levels. Imperative constructs of a less abstract language are derived from declarative consistency expressions of a language of higher abstraction and combined with additional imperative constructs integrated into the combined language. The combined language grants the benefits of the more abstract language and enables realizing parts of the specification without being restricted in expressiveness. As a consequence, a developer profits from the advantages of both languages, whereas previously a specification that cannot be completely expressed with the more abstract language had to be realized entirely with the less abstract language. We realize the concepts by combining the Reactions and Mappings languages of the VITRUVIUS project. The imperative Reactions language enables developers to specify triggers for certain model changes and repair logic. As a more abstract language, Mappings specify consistency through a declarative description relating elements of two representations, together with the conditions that have to apply to these elements. We investigate the limits of expressiveness of the declarative description and show how scenarios that require complex consistency specifications are supported. An evaluation with a case study shows the applicability of the approach, because an existing project that previously used the Reactions language can be realized with the combination concept. Furthermore, the compactness of the consistency preservation specification is increased.)
On the semantics of similarity in deep trajectory representations + (Recently, a deep learning model (t2vec) for trajectory similarity computation has been proposed. Instead of using the trajectories directly, it uses their deep representations to compute the similarity between them. Currently, we do not have a clear idea how to interpret the t2vec similarity values, nor what exactly they are based on. This thesis addresses these two issues by analyzing t2vec on its own and then systematically comparing it to the more familiar traditional models. Firstly, we examine how the model's parameters influence the probability density function (PDF) of the t2vec similarity values. For this purpose, we conduct experiments with various parameter settings and inspect the abstract shape and statistical properties of the resulting PDFs. Secondly, since we already have an intuitive understanding of classical models such as Dynamic Time Warping (DTW) and Longest Common Subsequence (LCSS), we use this intuition to analyze t2vec by systematically comparing it to DTW and LCSS with the help of heat maps.)
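For readers less familiar with the two classical baselines, the following Python sketch gives textbook dynamic-programming formulations of DTW (a distance: lower means more similar) and LCSS (a similarity in [0, 1], normalized by the shorter trajectory's length) for 2-D trajectories. The matching threshold `eps`, the optional index window `delta`, and the normalization are common choices assumed here, not necessarily the configuration used in the thesis.

```python
import math


def dtw(a, b):
    """Dynamic Time Warping distance between two point sequences."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            # Extend the cheapest of: match, skip in a, skip in b.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


def lcss(a, b, eps, delta=None):
    """LCSS similarity in [0, 1]: two points match if every coordinate differs
    by at most eps (and, optionally, their indices by at most delta)."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = all(abs(p - q) <= eps for p, q in zip(a[i - 1], b[j - 1]))
            if close and (delta is None or abs(i - j) <= delta):
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m] / min(n, m)
```

DTW sums every point distance along the best alignment, so a constant offset accumulates with trajectory length, whereas LCSS only counts points within `eps` and ignores the rest. Such differences in intuition are what a heat-map comparison against t2vec can expose.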
Implementation and Evaluation of CHQL Operators in Relational Database Systems to Query Large Temporal Text Corpora + (Relational database management systems have an important place in the information revolution. Their arrival on the market made storing and analyzing data easier. In recent years, with the release of large temporal text corpora, it has been shown that domain experts in conceptual history can also benefit from the performance of relational databases. Since the relational algebra behind them lacks special functionality for this use case, the Conceptual History Query Language (CHQL) was developed. The first result of this thesis is an original implementation of the CHQL operators in a relational database, written in both SQL and its procedural extension. Secondly, we substantially improved the performance with trigram indexes. Lastly, the query plan analysis reveals the reason behind the query optimizer's choice of inefficient plans, namely its inability to correctly predict the results of a stored function.)
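Trigram indexes speed up text search by decomposing strings into sets of three-character substrings. Assuming a PostgreSQL-like setting with the pg_trgm extension (the abstract does not name the DBMS), the following Python sketch approximates pg_trgm's documented trigram extraction (lower-casing, splitting into alphanumeric words, padding each word with two leading spaces and one trailing space) and its similarity measure (shared trigrams divided by all distinct trigrams of both strings). It is for illustration only: the function names are hypothetical, and edge cases such as non-ASCII characters may be handled differently by the extension.

```python
import re


def trigrams(text):
    """Set of padded trigrams of all alphanumeric words in `text`."""
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a, b):
    """Shared trigrams divided by the number of distinct trigrams in a or b."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0
```

Because an index can map each trigram to the rows containing it, a search only needs to inspect rows that share trigrams with the query instead of scanning every text in the corpus.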
Analysis and Visualization of Semantics from Massive Document Directories + (Research papers are commonly classified into categories, so the body of existing contributions can be seen as a massive document directory with sub-folders. However, research typically evolves at an extremely fast pace; consider, for instance, the field of computer science. It can be difficult to categorize individual research papers or to understand how research communities relate to each other. In this thesis, we will analyze and visualize semantics from massive document directories. The results will be displayed using the arXiv corpus, which contains domain-specific (computer science) papers from the past thirty years. The analysis will illustrate past trends in document directories and give insight into how their relationships evolve over time.)
Requirements-to-Code Traceability Using Word and Code Embeddings + (Traceability information helps developers understand software systems and serves as a foundation for further techniques such as coverage analysis. This thesis investigates how embeddings can be used for automatic traceability between requirements and source code. To this end, it considers various ways of representing requirements and source code with embeddings and then mapping them onto each other to generate trace links between them. For a class, for example, there are many options regarding which information or which class elements are taken into account when computing a code embedding. For the mapping, a metric computes similarity values between the embeddings. These values are used to decide whether a trace link exists between the artifacts the embeddings represent. In the evaluation, the various embedding and mapping options were compared with each other and with other work. In terms of F1 score, code embeddings built from class names, method signatures, and method comments, combined with mapping approaches that use the Word Mover's Distance as the similarity metric, produce the best cross-project results. The best approach achieves an F1 score of 60.1% on the LibEST project, which consists of 14 source code artifacts and 52 requirements artifacts. The best cross-project configuration achieves an average F1 score of 39%.)
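The mapping step can be sketched in Python: compute a similarity between embeddings, then decide whether a trace link exists. This is an illustration under stated assumptions. It uses cosine similarity and a fixed threshold as a simpler stand-in for the Word Mover's Distance, which performed best in the thesis. The artifact names, vectors, and threshold are hypothetical toy values, not data from LibEST.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]
Link = Tuple[str, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def trace_links(reqs: Dict[str, Vector], code: Dict[str, Vector],
                threshold: float) -> Set[Link]:
    """Propose a link for every (requirement, code artifact) pair whose
    similarity reaches the threshold."""
    return {(r, c) for r, rv in reqs.items() for c, cv in code.items()
            if cosine(rv, cv) >= threshold}


def f1(predicted: Set[Link], gold: Set[Link]) -> float:
    """Harmonic mean of precision and recall of the proposed links."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


reqs = {"R1": [1.0, 0.0], "R2": [0.0, 1.0]}         # hypothetical requirement embeddings
code = {"Login": [0.9, 0.1], "Report": [0.1, 0.9]}  # hypothetical class embeddings
links = trace_links(reqs, code, threshold=0.5)
print(sorted(links))                                   # [('R1', 'Login'), ('R2', 'Report')]
print(f1(links, {("R1", "Login"), ("R2", "Report")}))  # 1.0
```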
Determining the Semantic Function of Source Code Sections + (Traceability information between source code and requirements enables tools to better support programmers in navigating and editing source code. To establish such links automatically, the semantics of both the requirements and the source code must be understood. This thesis develops a method for describing the shared semantics of groups of program elements. The method is based on the statistical topic model LDA and produces a set of keywords as a description of these semantics. Natural-language content in the source code of the groups is analyzed and used to train the model. To compensate for uncertainty in choosing LDA's parameters and to improve the robustness of the keyword set, several LDA models are combined. The method was evaluated in a user study. Overall, it achieved an average recall of 0.73 and an average F1 score of 0.56.)
Improving Document Information Extraction with efficient Pre-Training + (SAP Document Information Extraction (DOX) is a service, based on the well-known Transformer architecture, that extracts logical entities from scanned documents. The entities comprise header information such as the document date or sender name, and line items from tables in the document with fields such as line item quantity. The model currently needs to be trained on a huge number of labeled documents, which is impractical. This also hinders large-scale deployment of the model, as it cannot easily adapt to new languages or document types. Recently, pretraining large language models with self-supervised learning techniques as a preliminary step has shown good results and allows reducing the number of labels required in subsequent steps. However, to generalize self-supervised learning to document understanding, we need to take different modalities into account: the text, layout, and image information of documents. How to do this efficiently and effectively is not yet clear. The goal of this thesis is to develop a technique for self-supervised pretraining within SAP DOX. We will evaluate our method and design decisions on SAP data as well as on public data sets. Besides the accuracy of the extracted entities, we will measure to what extent our method lets us lower label requirements.)
Importance of Features for the Classification of SAT Instances (Proposal) + (SAT is one of the most important NP-hard problems in theoretical computer science, which is why research is particularly interested in finding especially efficient solvers for it. For this reason, instances are classified by grouping similar problem instances into instance families, a classification that is to be automated using machine learning methods. Among other things, this bachelor's thesis addresses the following questions: Which (most important) properties allow an instance to be assigned to a particular family? How does one build a good classifier for this problem? What do frequently misclassified instances have in common? What does a sensible division into families look like?)
Verification of Access Control Policies in Software Architectures + (Security in software systems becomes more important as systems become more complex and connected. Therefore, it is desirable to conduct security analyses on the architectural level. A possible approach in this direction is data-based privacy analysis. Such approaches are evaluated on case studies. Most exemplary systems for case studies are developed specifically for the approach under investigation, so it is not easy to find a fitting case study. The thesis introduces a method for creating usable case studies for data-based privacy analyses. The method is applied to the Common Component Modeling Example (CoCoME). The evaluation is based on a GQM plan and shows that the method is applicable. It is also shown that the created case study can be used to check whether illegal information flow is present in CoCoME. Additionally, it is shown that the provided metamodel extension is able to express the case study.)
Beyond Similarity - Dimensions of Semantics and How to Detect them + (Semantic similarity estimation is a widely used and well-researched area. Current state-of-the-art approaches estimate text similarity with large language models. However, semantic similarity estimation often ignores fine-grained differences between semantically similar sentences. This thesis proposes the concept of semantic dimensions to represent fine-grained differences between two sentences. A workshop with domain experts identified ten semantic dimensions. From the workshop insights, a model for semantic dimensions was created. Afterward, 60 participants decided via a survey which semantic dimensions are useful to users. Detectors for the five most useful semantic dimensions were implemented in an extendable framework. To evaluate the semantic dimension detectors, a dataset of 200 sentence pairs was created. The detectors reached an average F1 score of 0.815.)
Faster Feedback Cycles via Integration Testing Strategies for Serverless Edge Computing + (Serverless computing allows software engineers to develop applications in the cloud without having to manage the infrastructure, which is managed by the cloud provider. Software engineers therefore treat the underlying infrastructure as a black box and focus on the application's business logic. This lack of insight makes testing harder, as applications tend to depend on the infrastructure and on other applications running in the cloud environment. Isolated unit and functional testing is possible, but integration testing is a challenge. Reliable results are often only achieved after deploying to the target environment, because infrastructure specifics and other cloud services are only available in the actual cloud environment. This makes the development process laborious. This thesis therefore develops testing strategies for serverless edge computing that shorten feedback cycles and speed up development. For evaluation, the developed testing strategies are applied to Lambda@Edge in AWS.)
Influence of Load Profile Perturbation and Temporal Aggregation on Disaggregation Quality + (Smart meters are becoming increasingly popular, and with them new privacy issues arise. A prominent privacy issue is disaggregation, i.e., determining appliance usage from aggregated smart meter data. The goal of this thesis is to evaluate load profile perturbation and temporal aggregation techniques regarding their ability to prevent disaggregation. To this end, we used a privacy operator framework for temporal aggregation and perturbation, and the NILM TK framework for disaggregation. We evaluated the influence of the framework's operators on disaggregation quality, both individually and in combination. One main observation is that the framework's de-noising operator prevents disaggregation best.)
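The two kinds of privacy operators evaluated here can be illustrated with a minimal Python sketch. Temporal aggregation averages consecutive readings into coarser intervals, and perturbation adds random noise. The function names, the block-averaging scheme, the Gaussian noise, and the clamping at zero are illustrative assumptions. They are not the API or the exact operators of the privacy operator framework used in the thesis.

```python
import random
from typing import List


def temporal_aggregate(readings: List[float], window: int) -> List[float]:
    """Replace each block of `window` consecutive readings by its mean
    (the last block may be shorter)."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for i in range(0, len(readings), window):
        block = readings[i:i + window]
        out.append(sum(block) / len(block))
    return out


def perturb(readings: List[float], sigma: float, seed: int = 0) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation `sigma`;
    negative results are clamped to 0 W (assumes pure consumption, no feed-in)."""
    rng = random.Random(seed)
    return [max(0.0, r + rng.gauss(0.0, sigma)) for r in readings]


# hypothetical load profile in watts: a 2 kW appliance switches on for two samples
load = [100.0, 100.0, 2100.0, 2100.0, 100.0, 100.0]
print(temporal_aggregate(load, 3))  # two equal means: the on/off edges disappear
print(perturb(load, sigma=50.0))
```

Coarser windows hide the switching events that disaggregation algorithms typically exploit, which is the intuition behind temporal aggregation as a privacy measure.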
Modelling and Enforcing Access Control Requirements for Smart Contracts + (Smart contracts are software systems that employ the underlying blockchain technology to handle transactions in a decentralized and immutable manner. Due to the immutability of the blockchain, smart contracts cannot be upgraded after their initial deployment. Therefore, reasoning about a contract's security aspects needs to happen before deployment. One common vulnerability of smart contracts is improper access control, which enables entities to modify data or use functionality they are prohibited from accessing. Due to the nature of the blockchain, data, represented through state variables, can only be accessed by calling the contract's functions. To correctly restrict access on the source code level, we improve the approach by Reiche et al., who enforce access control policies based on a model on the architectural level. This work aims at correctly enforcing role-based access control (RBAC) policies for Solidity smart contract systems on the architectural and source code level. We extend the standard RBAC model by Sandhu, Ferraiolo, and Kuhn to also incorporate insecure information flows and authorization constraints for roles. We create a metamodel to capture the concepts necessary to describe and enforce RBAC policies on the architectural level. The policies are enforced in the source code by translating the model elements into formal specifications; for this purpose, an automatic code generator is implemented. To reason about the implemented smart contracts on the source code level, tools such as solc-verify and Slither are employed and extended. Furthermore, we outline the development process resulting from the presented approach. To evaluate our approach and uncover problems and limitations, we conduct a case study using the three smart contract software systems Augur, Fizzy, and Palinodia. Additionally, we apply a metamodel coverage analysis to reason about the completeness of the metamodel and the generator. Furthermore, we provide an argumentation concerning the approach's correct enforcement. This evaluation shows how correct enforcement can be achieved under certain assumptions and when information flows are not considered. The presented approach can detect 100% of the violations of the underlying RBAC policies that were manually introduced during the case study. Additionally, the metamodel is expressive enough to describe RBAC policies and contains no unnecessary elements, since approximately 90% of the created metamodel is covered by the implemented generator. We identify and describe limitations such as oracles or public variables.)
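The core RBAC notions referenced above can be sketched in Python at the model level: users assigned to roles, roles granted permissions, and authorization constraints on roles. This is a hypothetical illustration, not the thesis's metamodel or its generated Solidity code. The role names and function names are assumptions, as is the choice of a static separation-of-duty constraint as the example authorization constraint.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class RBACPolicy:
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)  # user -> roles
    role_perms: Dict[str, Set[str]] = field(default_factory=dict)  # role -> callable functions
    # authorization constraint: no user may hold both roles of a pair
    exclusive: Set[FrozenSet[str]] = field(default_factory=set)

    def assign(self, user: str, role: str) -> None:
        """Grant a role unless it conflicts with a role the user already holds."""
        held = self.user_roles.setdefault(user, set())
        for other in held:
            if frozenset({role, other}) in self.exclusive:
                raise PermissionError(f"{user} may not hold both {role} and {other}")
        held.add(role)

    def may_call(self, user: str, function: str) -> bool:
        """Access is granted if any of the user's roles permits the function."""
        return any(function in self.role_perms.get(r, set())
                   for r in self.user_roles.get(user, set()))


policy = RBACPolicy(
    role_perms={"Owner": {"withdraw", "setFee"}, "Trader": {"trade"}},  # hypothetical
    exclusive={frozenset({"Owner", "Trader"})},
)
policy.assign("alice", "Owner")
print(policy.may_call("alice", "withdraw"), policy.may_call("alice", "trade"))  # True False
```

In Solidity, a check like `may_call` would typically surface as a modifier guarding each function; this sketch only shows the policy logic itself.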
Methodology for Evaluating a Domain-Specific Model Transformation Language + (As soon as a system is described by multiple models, these different descriptions can contradict each other. Model transformations are a suitable means of avoiding this, even when the models are edited in parallel by multiple parties. There is now a substantial body of research on transforming changes between two models. However, the challenge of developing model transformations between more than two models has not yet been adequately solved. The Commonalities language (German: Gemeinsamkeiten-Sprache) is a declarative, domain-specific programming language for writing multidirectional model transformations by combining bidirectional mapping specifications. Since it has not yet been empirically validated, two questions remain open: whether the language is suitable for developing realistic model transformations, and what advantages it offers over an alternative programming language for model transformations. In this thesis, I design a case study with which the Commonalities language is evaluated. I discuss the methodology and validity of this case study. Furthermore, I present congruence, a new property for bidirectional model transformations. It ensures that the two directions of a transformation are compatible with each other. Using practical examples, I derive why we can expect transformations to usually be congruent. I then discuss the design decisions behind a test strategy for testing two model transformation implementations that both realize the same consistency specification. The test strategy also includes a practical application of congruence. Finally, I present improvements to the Commonalities language. Together, the contributions of this thesis make it possible to conduct a case study on programming languages for model transformations, which can yield a better understanding of the advantages of these languages. Congruence can improve the usability of arbitrary model transformations and could prove useful for constructing model transformation networks. The test strategy can be applied to arbitrary acceptance tests for model transformations.)
Modeling of Security Patterns in Palladio + (Software itself and the contexts it is used in typically evolve over time. Analyzing and ensuring the security of evolving software systems in contexts that are themselves evolving poses many difficulties. In my thesis, I state a number of goals and propose two kinds of processes. The first elicits attacks, their prerequisites, and mitigating security patterns for a given architecture model. The second annotates the model with security-relevant information. I show how this information can be used to analyze the system's security with regard to the modeled attacks, using an attack validity algorithm that I specify. The processes and the algorithm are applied in a case study on CoCoME to demonstrate their applicability and to analyze whether the previously stated goals are fulfilled. Security catalog metamodels and catalog instances containing a number of elements are provided.)
Multi-model Consistency through Transitive Combination of Binary Transformations + (Software systems are usually described through multiple models that address different development concerns. These models can contain shared information, which leads to redundant representations of the same information and to dependencies between the models. These representations of shared information have to be kept consistent for the system description to be correct. The evolution of one model can cause inconsistencies with regard to other models of the same system. Therefore, some mechanism of consistency restoration has to be applied after changes have occurred. Manual consistency restoration is error-prone and time-consuming, which is why automated consistency restoration is necessary. Many existing approaches use binary transformations to restore consistency for a pair of models, but systems are generally described through more than two models. To achieve multi-model consistency preservation with binary transformations, the transformations have to be combined through transitive execution. In this thesis, we explore the transitive combination of binary transformations and study the resulting problems. We develop a catalog of six failure potentials that can manifest in failures with regard to consistency between the models. Knowing these failure potentials can inform a transformation developer about possible problems arising from the combination of transformations. One failure potential is a consequence of the transformation network topology and the domain models used; it can only be avoided through topology adaptations. Another failure potential emerges when two transformations try to enforce conflicting consistency constraints; it can only be repaired by adapting the original consistency constraints. Both failure potentials are case-specific and cannot be solved without knowing which transformations will be combined. Furthermore, we develop two transformation implementation patterns to mitigate two other failure potentials. The transformation developer can apply these patterns to an individual transformation definition, independent of the combination scenario. For the remaining two failure potentials, no general solution has been found yet, and further research is necessary. We evaluate the findings with a case study that involves two independently developed transformations between a component-based software architecture model, a UML class diagram, and its Java implementation. All failures revealed by the evaluation could be classified with the identified failure potentials, which gives an initial indication of the completeness of our failure potential catalog. The proposed patterns prevented all failures of their targeted failure potentials, which made up 70% of all observed failures. This shows that the developed implementation patterns are applicable and help mitigate issues arising from transitively combining binary transformations.)
Abstract and Consistent Confidentiality Specification from Architecture to Code + (Software systems can process sensitive information. To ensure its confidentiality, both the architecture model and its implementation can be analyzed with respect to information flow. For this purpose, a confidentiality specification is defined. Both model levels have a representation of the same specification. As the system evolves, the specification can change on both levels and may consequently contain contradictory statements. To verify the confidentiality of the information, the specification elements in the source code must be translated into another language in an additional step. This bachelor's thesis deals with transforming between the different representations of a software system's confidentiality specification. This includes a mapping concept for keeping the confidentiality specification consistent and the translation into a language that can be used for verification.)
Automated GUI-Based Testing of a Password Manager Application with Neuroevolution + (Software testing is essential for ensuring the quality and functionality of software products. Both manual and automated methods exist. However, automated approaches as well as human and script-based tests have limitations regarding cost efficiency and time effort. Monkey testing, characterized by random clicks on the user interface, often does not sufficiently take the application's logic into account. This bachelor's thesis focuses on the automated neuroevolutionary testing method, which uses neural networks as test agents and refines them over several generations by means of evolutionary algorithms. To evaluate these agents and compare them with monkey testing, a simulated version of a password manager application was used, in which a reward structure was implemented. The results show that the neuroevolutionary testing method performs significantly better than monkey testing in terms of the rewards achieved. As a result, the application logic is better taken into account in the testing process.)
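The neuroevolutionary loop can be sketched in Python on a toy problem: neural-network agents are refined by an evolutionary algorithm, scored by a reward structure, and compared against random monkey testing. Everything here is a hypothetical stand-in. The "application" is five screens in a row, and the reward counts the distinct screens reached. The network is a single layer over a one-hot screen encoding, and evolution uses truncation selection with Gaussian mutation. The thesis's simulated password manager, network, and algorithm are not reproduced.

```python
import random
from typing import Callable, List, Tuple

N_SCREENS = 5                      # hypothetical screens of the simulated app
ACTIONS = ("next", "back", "noop")
Weights = List[List[float]]        # N_SCREENS x len(ACTIONS): one-layer network


def act(w: Weights, screen: int) -> int:
    """With a one-hot screen input, the output logits are row `screen` of w;
    the agent picks the action with the highest logit."""
    return max(range(len(ACTIONS)), key=lambda a: w[screen][a])


def reward(choose: Callable[[int], int], steps: int = 10) -> int:
    """Toy reward: number of distinct screens visited in one episode."""
    screen, seen = 0, {0}
    for _ in range(steps):
        action = ACTIONS[choose(screen)]
        if action == "next":
            screen = min(screen + 1, N_SCREENS - 1)
        elif action == "back":
            screen = max(screen - 1, 0)
        seen.add(screen)
    return len(seen)


def fitness(w: Weights) -> int:
    return reward(lambda s: act(w, s))


def mutate(w: Weights, rng: random.Random, sigma: float = 0.5) -> Weights:
    """Return a copy of w with Gaussian noise added to every weight."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in w]


def evolve(generations: int = 30, pop_size: int = 20, elite: int = 5,
           seed: int = 0) -> Tuple[Weights, List[int]]:
    """Truncation selection with elitism; returns best agent and best-fitness history."""
    if generations < 1 or not 0 < elite <= pop_size:
        raise ValueError("invalid parameters")
    rng = random.Random(seed)
    pop = [[[rng.gauss(0.0, 1.0) for _ in ACTIONS] for _ in range(N_SCREENS)]
           for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:elite]
        history.append(fitness(parents[0]))
        # elites survive unchanged; the rest are mutated copies of elites
        pop = parents + [mutate(rng.choice(parents), rng) for _ in range(pop_size - elite)]
    return parents[0], history


def monkey_reward(seed: int = 0) -> int:
    """Baseline: choose actions uniformly at random."""
    rng = random.Random(seed)
    return reward(lambda s: rng.randrange(len(ACTIONS)))


best, history = evolve()
print(history[-1], monkey_reward())
```

Because the elite agents are carried over unchanged, the best reward per generation never decreases. Whether the evolved agent beats the random baseline on this toy problem depends on the seed and the budget, and it says nothing about the thesis's results.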
GUI-Based Testing of a Learning Platform Application Using Neuroevolution + (Software testing is necessary to ensure the quality and functionality of software artifacts. There are both automated and manual testing methods. However, automated methods, as well as human and script-based testing, scale less well in terms of time and cost. Monkey testing, characterized by random clicks on the user interface, often does not sufficiently take the application logic into account. This bachelor's thesis focuses on the automated neuroevolutionary testing method, which uses neural networks as test agents and improves them over several generations by means of evolutionary algorithms. To enable training the agents and comparing them with monkey testing, a simulated version of the learning platform Anki was implemented. To assess the test agents, a reward structure was developed in the simulated application. The results show that the neuroevolutionary testing method performs significantly better than monkey testing in terms of the rewards achieved. As a result, the application logic is better taken into account in the testing process.)
Entity Linking for Software Architecture Documentation + (Software architecture documentation contains technical terms from the software development domain. If these terms are found and linked to the corresponding terms in a database, humans and text processing systems can use this information to better understand the documentation. The technical terms in documentation correspond to entity mentions in the text. In this work, we present our domain-specific entity linking system. The system links entity mentions in software architecture documentation to the corresponding entities in a knowledge base. The system comprises a domain-specific knowledge base, a preprocessing module, and an entity linking component.)
Development of a Design-Time DSL for Formalizing Runtime Adaptation Strategies for SAS for Strategy Optimization + (Today's software systems are becoming increasingly complex and are subject to ever more varying conditions. As a result, self-adaptive systems are gaining importance, since they can dynamically adapt to new conditions by modifying themselves. Domain-specific modeling languages (DSLs) for formalizing adaptation strategies are an important means of modeling and optimizing the design of the feedback loops of self-adaptive software systems. This proposal outlines a bachelor's thesis that addresses the question of how an optimization of adaptation strategies can be described in a DSL at design time.)
Preventing Code Insertion Attacks on Token-Based Software Plagiarism Detectors + (Some students tasked with mandatory programming assignments lack the time or dedication to solve the assignment themselves. Instead, they plagiarize a peer's solution by slightly modifying the code. However, numerous tools exist that help instructors detect and identify such plagiarized programs. The most widely used type of plagiarism detection tool is the token-based plagiarism detector. Such detectors are resilient against many types of obfuscation attacks, such as renaming variables or modifying whitespace. However, they are susceptible to the insertion of lines of code that do not affect the program flow or result. The prevailing assumption was that successfully obfuscating plagiarism takes more effort and skill than solving the assignment itself. Automated plagiarism generators, which exploit this weakness, have invalidated this assumption. This work aims to develop mechanisms against code insertion that can be integrated directly into existing token-based plagiarism detectors. To this end, we first develop mechanisms that negate the effect of many types of code insertion. We then implement these mechanisms prototypically in a state-of-the-art plagiarism detector. We evaluate our implementation on a dataset consisting of real student submissions and automatically generated plagiarism. We show that with our mechanisms, the similarity rating of automatically generated plagiarism increases drastically. Consequently, the plagiarism generator we use fails to create usable plagiarism.)
Software Plagiarism Detection on Intermediate Representation + (Source code plagiarism is a widespread problem in computer science education. To counteract it, software plagiarism detectors can help identify plagiarized code. Most state-of-the-art plagiarism detectors are token-based, and supporting a new programming language commonly requires designing and implementing a new dedicated language module. This process can be time-consuming, and it is unclear whether it is even necessary. In this thesis, we evaluate the necessity of dedicated language modules for Java and C/C++ and derive conclusions for designing new ones. To this end, we create a language module for the LLVM intermediate representation. For the evaluation, we compare it with two existing dedicated language modules in JPlag. Our results show that dedicated language modules perform better for plagiarism detection, whereas language modules for intermediate representations show better resilience to obfuscation attacks.)
Portable Auto-Tuning of Parallel Applications + (Both offline and online tuning are common solutions for automatically optimizing parallel applications. Each has its own advantages and disadvantages. Offline tuning has minimal negative impact on the application's runtime, but the tuned parameter values are only usable on hardware known in advance. Online tuning, in contrast, provides dynamic parameter values that are determined at application runtime and thus on the target hardware, but this can negatively affect the application's runtime. We attempt to combine the advantages of both approaches by taking parameter configurations optimized in advance and using them on the target hardware and, in some cases, with a different application. We evaluate both the hardware and the application portability of the configurations using five example applications.)
DomainML: A modular framework for domain knowledge-guided machine learning + (Standard, data-driven machine learning approaches learn relevant patterns solely from data. In some fields, however, learning only from data is not sufficient. A prominent example is healthcare, where the problem of insufficient data for rare diseases is tackled by integrating high-quality domain knowledge into the machine learning process. Despite the existing work in the healthcare context, making general observations about the impact of domain knowledge is difficult, as different publications use different knowledge types, prediction tasks, and model architectures. It also remains unclear whether the findings in healthcare transfer to other use cases, and how much intellectual effort such a transfer requires. With this thesis, we introduce DomainML, a modular framework for evaluating the impact of domain knowledge on different data science tasks. We demonstrate the transferability and flexibility of DomainML by applying the concepts from healthcare to cloud system monitoring. We then observe how domain knowledge affects the model's prediction performance across both domains. Finally, we suggest how DomainML could be used to refine both the given domain knowledge and the quality of the underlying dataset.)
State of the Art: Multi Actor Behaviour and Dataflow Modelling for Dynamic Privacy + (State-of-the-art talk given as part of Praxis der Forschung.)
Data-Preparation for Machine-Learning Based Static Code Analysis + (Static Code Analysis (SCA) has become an integral part of modern software development, especially since the rise of automation in the form of CI/CD. It remains an open question how machine learning can best help advance SCA and thus facilitate maintainable, correct, and secure software. However, machine learning needs a solid foundation to learn from. This thesis proposes an approach to building that foundation by mining data on software issues from real-world code. We show how we used this concept to analyze over 4000 software packages and generate over two million issue samples. Additionally, we propose a method for refining this data and apply it to an existing machine-learning-based SCA approach.)
Creating Study Plans by Generating Workflow Models from Constraints in Temporal Logic + (Students are confronted with a huge number of regulations when planning their studies at a university. It is challenging for them to create a personalized study plan while still complying with all official rules. The STUDYplan software aims to overcome these difficulties by enabling intuitive and individual modeling of study plans. To make use of existing work in the business process domain, a study plan can be interpreted as a sequence of business process tasks that represent courses. This thesis focuses on the idea of synthesizing business process models from declarative specifications that express official and user-defined regulations for a study plan. We provide an elaborated approach for modeling study plan constraints and a generation concept specialized to study plans. This work motivates, discusses, partially implements, and evaluates the proposed approach.)
A comparative study of subgroup discovery methods + (Subgroup discovery (SD) is a data mining technique used to extract interesting relationships in a dataset with respect to a target variable. These relationships are described in the form of rules. Multiple SD techniques have been developed over the years. This thesis presents a comparative study of a number of these techniques in order to identify the state-of-the-art methods. It also analyses the effects of discretization as a preprocessing step on these methods. Furthermore, it investigates the effect of hyperparameter optimization on them. Our analysis showed that PRIM, DSSD, Best Interval and FSSD outperformed the other subgroup discovery methods evaluated in this study and are to be considered state-of-the-art. It also shows that discretization improves the efficiency of methods that do not employ internal discretization, but has a negative impact on the quality of subgroups generated by methods that perform it internally. Finally, the results demonstrate that Apriori-SD and SD-Algorithm benefited most from hyperparameter optimization.)
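SD methods rank candidate rules with a quality measure. The abstract does not say which measures were used; as an illustrative assumption, the sketch below computes Weighted Relative Accuracy (WRAcc), a common choice for a binary target: WRAcc = (n_cover / N) * (p_cover - p_all). The function name `wracc` is hypothetical.

```python
def wracc(covered, target):
    """Weighted Relative Accuracy of one rule for a binary target.

    covered: list of bools, True where the rule's condition holds for a row.
    target:  list of bools, True where the target variable is positive.
    """
    if len(covered) != len(target) or not covered:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(covered)
    n_cov = sum(covered)
    if n_cov == 0:
        return 0.0  # an empty subgroup carries no information
    p_all = sum(target) / n  # positive rate in the whole dataset
    p_cov = sum(1 for c, t in zip(covered, target) if c and t) / n_cov  # rate inside the subgroup
    return (n_cov / n) * (p_cov - p_all)


# Rule "age >= 40" on four rows with ages [25, 35, 45, 55]
print(wracc([False, False, True, True], [False, False, True, True]))
```

The coverage factor n_cover / N penalises tiny subgroups, which is why WRAcc is often preferred over plain accuracy when searching for interesting rules.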
Software Testing + (TBA)
Exploring Modern IDE Functionalities for Consistency Preservation + (TBA)
Exploring the Traceability of Requirements and Source Code via LLMs + (TBA)
Preventing Refactoring Attacks on Software Plagiarism Detection through Graph-Based Structural Normalization + (TBD)
Generation of Checkpoints for Hardware Architecture Simulators + (TBD)
Konzept und Integration eines Deltachain Prototyps + (TBD)
Data-Driven Approaches to Predict Material Failure and Analyze Material Models + (The prediction of material failure is useful in many industrial contexts such as predictive maintenance, where it helps reduce costs by preventing outages. However, failure prediction is a complex task. Typically, material scientists need to create a physical material model to run computer simulations. In real-world scenarios, the creation of such models is often not feasible, as measuring exact material parameters is too expensive. Material scientists can use material models to generate simulation data. These data sets are multivariate time series of sensor values. In this thesis, we develop data-driven models to predict upcoming failure of an observed material. We identify and implement recurrent neural network architectures, as recent research indicates that these are well suited for predictions on time series. We compare their prediction performance with traditional models that do not predict directly on time series but involve an additional feature-calculation step. Finally, we analyze the predictions to find abstractions in the underlying material model that lead to unrealistic simulation data and thus impede accurate failure prediction. Knowing such abstractions enables material scientists to refine the simulation models. The updated models would then contain more relevant information and make failure prediction more precise.)
Improving SAP Document Information Extraction via Pretraining and Fine-Tuning + (Techniques for extracting relevant information from documents have made significant progress in recent years, and information extraction has become a key task in the digital transformation. With deep neural networks, it became possible to process documents without specifying hard-coded extraction rules or templates for each layout. However, such models typically have a very large number of parameters. As a result, they require many annotated samples and long training times. One solution is to create a basic pretrained model using self-supervised objectives and then fine-tune it on a smaller, document-specific annotated dataset. However, implementing and controlling the pretraining and fine-tuning procedures in a multi-modal setting is challenging. In this thesis, we propose a systematic method that consists of pretraining the model on large unlabeled data and then fine-tuning it with a virtual adversarial training procedure. For the pretraining stage, we implement an unsupervised informative masking method, which improves upon standard Masked Language Modelling (MLM). In contrast to randomly masking tokens as in MLM, our method exploits Point-Wise Mutual Information (PMI) to calculate individual masking rates based on statistical properties of the data corpus, e.g., how often certain tokens appear together on a document page. We test our algorithm in a typical business context at SAP and report an overall improvement of 1.4% in F1-score for extracted document entities. Additionally, we show that the implemented methods improve the training speed, robustness, and data efficiency of the algorithm.)
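The PMI statistic mentioned above can be estimated from page-level co-occurrence as PMI(x, y) = log(p(x, y) / (p(x) p(y))). The sketch below is a minimal illustration under the assumption that probabilities are fractions of pages containing a token (pair); how the thesis turns these scores into per-token masking rates is not described here, and `page_pmi` is a hypothetical name.

```python
import math
from collections import Counter
from itertools import combinations


def page_pmi(pages):
    """PMI for every token pair that co-occurs on at least one page.

    pages: list of token lists, one list per document page.
    p(x) is the fraction of pages containing x; p(x, y) the fraction containing both.
    """
    n_pages = len(pages)
    single = Counter()
    pair = Counter()
    for page in pages:
        tokens = sorted(set(page))  # count each token once per page, sorted for canonical pair keys
        single.update(tokens)
        pair.update(combinations(tokens, 2))
    pmi = {}
    for (x, y), n_xy in pair.items():
        p_xy = n_xy / n_pages
        p_x = single[x] / n_pages
        p_y = single[y] / n_pages
        pmi[(x, y)] = math.log(p_xy / (p_x * p_y))
    return pmi


pages = [["invoice", "total", "date"], ["invoice", "total"], ["date", "name"], ["name", "address"]]
scores = page_pmi(pages)
print(scores[("invoice", "total")])  # positive: the two tokens co-occur more often than chance
```

A positive score means two tokens appear together more often than independence would predict, which is the kind of corpus statistic an informative masking scheme can exploit.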
Analyse von Zeitreihen-Kompressionsmethoden am Beispiel von Google N-Grams + (Temporal text corpora like the Google Ngram dataset usually incorporate a vast number of words and expressions, called ngrams, together with their usage frequencies over the years. The large number of entries complicates working with the dataset, as transformations and queries are resource- and time-intensive. However, many use cases do not require the whole corpus to obtain a sufficient dataset and achieve acceptable results. We propose various compression methods to reduce the absolute number of ngrams in the corpus. Additionally, we utilize time-series compression methods for quick estimations of the properties of ngram usage frequencies. CHQL (Conceptual History Query Language) queries on the Google Ngram dataset serve as the basis for our compression method design and experimental validation. The goal is to find compression methods that reduce the complexity of queries on the corpus while still maintaining good results.)
Analyse von Zeitreihen-Kompressionsmethoden am Beispiel von Google N-Gram + (Temporal text corpora like the Google Ngram Data Set usually incorporate a vast number of words and expressions, called ngrams, together with their usage frequencies over the years. The large number of entries complicates working with the data set, as transformations and queries are resource- and time-intensive. However, many use cases do not require the whole corpus to obtain a sufficient data set and achieve acceptable query results. We propose various compression methods to reduce the total number of ngrams in the corpus. Specifically, we propose compression methods that, given an input dictionary of target words, find a compression tailored to queries on a specific topic. Additionally, we utilize time-series compression methods for quick estimations of the properties of ngram usage frequencies. CHQL (Conceptual History Query Language) queries on the Google Ngram Data Set serve as the basis for our compression method design and experimental validation.)
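The two abstracts above do not name the time-series compression methods they use. As one illustrative assumption, Piecewise Aggregate Approximation (PAA) replaces an ngram's yearly frequency series by the means of equal-length segments, so rough properties such as a trend can be estimated from far fewer values. The function `paa` is a hypothetical helper, not code from the thesis.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each of n_segments equal parts.

    Assumes len(series) is a multiple of n_segments to keep the sketch simple.
    """
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("len(series) must be a positive multiple of n_segments")
    size = len(series) // n_segments
    return [sum(series[i:i + size]) / size for i in range(0, len(series), size)]


# Six yearly frequencies of one ngram compressed to three values
print(paa([1, 3, 2, 2, 5, 7], 3))  # [2.0, 2.0, 6.0]
```

The compressed series still shows that usage rises in the last segment, while storing only half the values.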
Implementation and Evaluation of CHQL Operators in Relational Database Systems + (The IPD defined CHQL, a query algebra that enables the formalization of queries about conceptual history. CHQL is currently implemented in MapReduce, which offers less flexibility for query optimization than relational database systems do. The scope of this thesis is to implement the given operators in SQL and to analyze performance differences by identifying limiting factors and optimizing queries on the logical and physical level. In the end, we will provide efficient query plans and fast operator implementations to execute CHQL queries in relational database systems.)
The Kconfig Variability Framework as a Feature Model + (The Kconfig variability framework is used to develop highly variable software such as the Linux kernel, ZephyrOS and NuttX. Kconfig allows developers to break down their software into modules and define the dependencies between these modules, so that when a concrete configuration is created, the semantic dependencies between the selected modules are fulfilled, ensuring that the resulting software product can function. Kconfig has often been described as a tool to define software product lines (SPLs), which often occur in the context of feature-oriented programming (FOP). In this paper, we introduce methods to transform Kconfig files into feature models such that the semantics of the model defined in a Kconfig file are preserved. The resulting feature models can be viewed with FeatureIDE, which enables further analysis of the Kconfig file, such as the detection of redundant and cyclic dependencies.)
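In feature-model terms, a Kconfig "depends on" edge behaves like a cross-tree constraint `option -> dependency`. The sketch below models only that edge type (ignoring `select`, `choice`, tristate values and expressions, which the transformation in the paper must also handle) and shows two checks: whether a configuration satisfies all dependencies, and whether the dependency graph contains a cycle. The option names and helper functions are hypothetical.

```python
from typing import Dict, Set

# Simplified model: option name -> set of options it "depends on".
DependsOn = Dict[str, Set[str]]


def is_valid(config: Set[str], depends_on: DependsOn) -> bool:
    """True if every selected option has all of its dependencies selected,
    i.e. the configuration satisfies each constraint `option -> dependency`."""
    return all(deps <= config for opt, deps in depends_on.items() if opt in config)


def has_cycle(depends_on: DependsOn) -> bool:
    """Detect a cyclic dependency with a depth-first search (white/gray/black colouring)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GRAY  # node is on the current DFS path
        for dep in depends_on.get(node, set()):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge to the current path: cycle found
                return True
            if state == WHITE and visit(dep):
                return True
        color[node] = BLACK  # node and everything reachable from it fully explored
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(depends_on))


deps = {"ETHERNET": {"NET"}, "INET": {"NET"}}
print(is_valid({"NET", "ETHERNET"}, deps))  # True
print(is_valid({"ETHERNET"}, deps))         # False: NET is missing
```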
Review of data efficient dependency estimation + (The amount and complexity of data collected in industry are increasing, and data analysis is rising in importance. Dependency estimation is a significant part of knowledge discovery and enables strategic decisions based on this information. Multiple examples highlight the importance of dependency estimation: knowing that there is a correlation between the regular dose of a drug and the health of a patient helps to understand the impact of a newly manufactured drug. Knowing how the case material, brand, and condition of a watch influence its price on an online marketplace can help to buy watches at a good price. Material sciences can also use dependency estimation to predict many properties of a material before it is synthesized in the lab, so that fewer experiments are necessary. Many dependency estimation algorithms require a large amount of data for a good estimation. But data can be expensive; for example, experiments in material sciences consume material and take time and energy. Because data collection is expensive, algorithms need to be data-efficient. But there is a trade-off between the amount of data and the quality of the estimation: a lack of data leads to uncertainty in the estimate. However, the algorithms do not always quantify this uncertainty. As a result, we do not know whether we can rely on the estimation or need more data for an accurate one. In this bachelor's thesis, we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing these and further challenges. We developed some of the criteria ourselves and took others from relevant publications.
The existing publications formulated many of the criteria only qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible, and to come up with a systematic approach of comparison for the rest. Of the 14 selected criteria, we focus on those concerning data efficiency and uncertainty estimation, because they are essential for lowering the cost of dependency estimation, but we also check other criteria relevant for the application of the algorithms. As a result, we will rank the algorithms in the different aspects given by the criteria and thereby identify potential for improving the current algorithms. We do this in two steps. First, we check general criteria in a qualitative analysis: whether the algorithm is capable of guided sampling, whether it is an anytime algorithm, and whether it uses incremental computation to enable early stopping, all of which lead to more data efficiency. Second, we conduct a quantitative analysis on well-established and representative datasets for those dependency estimation algorithms that performed well in the qualitative analysis. In these experiments, we evaluate further criteria: robustness, which is necessary for error-prone data; efficiency, which saves computation time; convergence, which guarantees an accurate estimation given enough data; and consistency, which ensures we can rely on an estimation.)
Identifying Security Requirements in Natural Language Documents + (The automatic identification of requirements, and their classification according to their security objectives, can be helpful to derive insights into the security of a given system. However, this task requires significant security expertise to perform. In this thesis, the capability of modern Large Language Models (such as GPT) to replicate this expertise is investigated. This requires the transfer of the model's understanding of language to the given specific task. In particular, different prompt engineering approaches are combined and compared, in order to gain insights into their effects on performance. GPT ultimately performs poorly for the main tasks of identification of requirements and of their classification according to security objectives. Conversely, the model performs well for the sub-task of classifying the security-relevance of requirements. Interestingly, prompt components influencing the format of the model's output seem to have a higher performance impact than components containing contextual information.)
Predicting System Dependencies from Tracing Data Instead of Computing Them + (The concept of Artificial Intelligence for IT Operations combines big data and machine learning methods to replace a broad range of IT operations, including availability and performance monitoring of services. In large-scale distributed cloud infrastructures, a service is deployed on different separate nodes. As the size of the infrastructure in production increases, the analysis of metric parameters becomes computationally expensive. We address this problem by proposing a method to predict dependencies between metric parameters of system components instead of computing them. To predict the dependencies, we use time windowing with different aggregation methods and distributed tracing data that contain detailed information about the system execution workflow. In this bachelor thesis, we inspect different representations of distributed traces, from simple counting of events to more complex graph representations. We compare them with each other and evaluate the performance of these methods.)
Change Detection in High Dimensional Data Streams + (The data collected in many real-world scenarios, such as environmental analysis, manufacturing, and e-commerce, are high-dimensional and come as a stream, i.e., data properties evolve over time, a phenomenon known as "concept drift". This brings numerous challenges: data-driven models become outdated, and one is typically interested in detecting specific events, e.g., the critical wear and tear of industrial machines. Hence, it is crucial to detect change, i.e., concept drift, to design a reliable and adaptive predictive system for streaming data. However, existing techniques can only detect "when" a drift occurs and neglect the fact that various drifts may occur in different dimensions, i.e., they do not detect "where" a drift occurs. This is particularly problematic when data streams are high-dimensional. The goal of this Master's thesis is to develop and evaluate a framework to efficiently and effectively detect "when" and "where" concept drift occurs in high-dimensional data streams. We introduce stream autoencoder windowing (SAW), an approach based on the online training of an autoencoder while monitoring its reconstruction error via a sliding window of adaptive size. We will evaluate the performance of our method on synthetic data, in which the characteristics of drifts are known. We then show how our method improves the accuracy of existing classifiers for predictive systems compared to benchmarks on real data streams.)
Automated Test Selection for CI Feedback on Model Transformation Evolution + (The development of the transformation model also comes with the appropriate system-level testing to verify its changes. Due to the complex nature of the transformation model, the number of tests increases as the structure and feature description become more detailed. However, executing all test cases for every change is costly and time-consuming. Thus, it is necessary to conduct a selection for the transformation tests. In this presentation, you will be introduced to a change-based test prioritization and transformation test selection approach for early fault detection.)
Statistical Generation of High Dimensional Data Streams with Complex Dependencies + (The evaluation of data stream mining algorithms is an important task in current research. The lack of a ground-truth data corpus that covers a large number of desirable features (especially concept drift and outlier placement) is the reason why researchers resort to producing their own synthetic data. This thesis proposes a novel framework ("streamgenerator") that allows the creation of data streams with finely controlled characteristics. The focus of this work is the conceptualization of the framework; however, a prototypical implementation is provided as well. We evaluate the framework by testing our data streams against state-of-the-art dependency measures and outlier detection algorithms.)
Statistical Generation of High-Dimensional Data Streams with Complex Dependencies + (The extraction of knowledge from data streams is one of the most crucial tasks of modern-day data science. By their nature, data streams are ever-evolving, and knowledge derived at one point in time may be obsolete in the next period. The need for specialized algorithms that can deal with high-dimensional data streams and concept drift is prevalent. A lot of research has gone into creating these kinds of algorithms. The problem here is the lack of data sets with which to evaluate them: a ground truth for a common evaluation approach is missing. A solution could be the synthetic generation of data streams with controllable statistical properties, such as the placement of outliers and the subspaces in which special kinds of dependencies occur. The goal of this Bachelor thesis is the conceptualization and implementation of a framework that can create high-dimensional data streams with complex dependencies.)
Theory-guided Load Disaggregation in an Industrial Environment + (The goal of Load Disaggregation (or Non-intrusive Load Monitoring) is to infer the energy consumption of individual appliances from their aggregated consumption. This facilitates energy savings and efficient energy management, especially in the industrial sector. However, previous research showed that Load Disaggregation underperforms in the industrial setting compared to the household setting. Also, the domain knowledge available about industrial processes remains unused. The objective of this thesis was to improve load disaggregation algorithms by incorporating domain knowledge in an industrial setting. First, we identified and formalized several types of domain knowledge that exist in industry. Then, we proposed various ways to incorporate them into Load Disaggregation algorithms, including Theory-Guided Ensembling, Theory-Guided Postprocessing, and Theory-Guided Architecture. Finally, we implemented and evaluated the proposed methods.)
Tuning of Explainable Artificial Intelligence (XAI) tools in the field of text analysis + (The goal of this bachelor thesis was to analyse classification results using SHAP, a method published in 2017. Explaining how an artificial neural network makes a decision is an interdisciplinary research subject combining computer science, mathematics, psychology and philosophy. We analysed these explanations from a psychological standpoint and, after presenting our findings, propose a method to improve the interpretability of text explanations using text hierarchies while losing little or no accuracy. Secondly, the goal was to test a framework developed to analyse a multitude of explanation methods. This framework is presented alongside our findings, together with instructions on how to use it to create your own analysis. This bachelor thesis is addressed to readers familiar with artificial neural networks and other machine learning methods.)
Specifying and Maintaining the Correspondence between Architecture Models and Runtime Observations + (The goal of this thesis is to provide a generic concept of a correspondence model (CM) that maps high-level model elements to corresponding low-level model elements, and to generate this mapping during the implementation of the high-level model using a correspondence model generator (CMG). To evaluate our approach, we implement and integrate the CM for the iObserve project. Further, we implement the proposed CMG and integrate it into ProtoCom, the source code generator used by the iObserve project. We first evaluate the feasibility of this approach by checking whether such a correspondence model can be specified as desired and generated by the CMG. Second, we evaluate the accuracy of the approach by checking the generated correspondences against a reference model.)
Intelligent Match Merging to Prevent Obfuscation Attacks on Software Plagiarism Detectors + (The increasing number of computer science students has prompted educators to rely on state-of-the-art source code plagiarism detection tools to deter the submission of plagiarized coding assignments. While these token-based plagiarism detectors are inherently resilient against simple obfuscation attempts, recent research has shown that obfuscation tools enable students to easily modify their submissions and thus evade detection. These tools automatically use dead code insertion and statement reordering to avoid discovery. The emergence of ChatGPT has further raised concerns about its obfuscation capabilities and the need for effective mitigation strategies. Existing defence mechanisms against obfuscation attempts are often limited by their specificity to certain attacks or their dependence on programming languages, requiring tedious and error-prone reimplementation. In response to this challenge, this thesis introduces a novel defence mechanism against automatic obfuscation attacks called match merging. It leverages the fact that obfuscation attacks change the token sequence to split up matches between two submissions, so that the plagiarism detector discards the broken matches. Match merging reverts the effects of these attacks by intelligently merging neighboring matches based on a heuristic designed to minimize false positives. Our method's resilience against classic obfuscation attacks is demonstrated through evaluations on diverse real-world datasets, including undergraduate assignments and competitive coding challenges, across six different attack scenarios. Moreover, it significantly improves detection performance against AI-based obfuscation. What sets our method apart is its language and attack independence, while its minimal runtime overhead makes it seamlessly compatible with other defence mechanisms.)
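The idea of match merging can be illustrated with a deliberately simple stand-in heuristic. This sketch assumes that a match is a pair of token ranges in submissions A and B, that neighbouring matches are merged only when the gap between them is at most `max_gap` tokens in both submissions, and that the detector afterwards discards matches with fewer than `min_tokens` matched tokens. These parameters, the `Match` type and `merge_matches` are hypothetical; the thesis's actual heuristic is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    a_start: int  # first matched token in submission A
    a_end: int    # one past the last matched token in A
    b_start: int  # first matched token in submission B
    b_end: int    # one past the last matched token in B
    tokens: int   # number of tokens that actually match


def merge_matches(matches: List[Match], max_gap: int, min_tokens: int) -> List[Match]:
    """Merge neighbouring matches separated by small gaps in BOTH submissions,
    then drop matches that are still shorter than min_tokens."""
    merged: List[Match] = []
    for m in sorted(matches, key=lambda m: m.a_start):
        if merged:
            last = merged[-1]
            gap_a = m.a_start - last.a_end
            gap_b = m.b_start - last.b_end
            # Merge only if m follows last in the same order in both submissions
            if 0 <= gap_a <= max_gap and 0 <= gap_b <= max_gap:
                merged[-1] = Match(last.a_start, m.a_end, last.b_start, m.b_end,
                                   last.tokens + m.tokens)
                continue
        merged.append(Match(m.a_start, m.a_end, m.b_start, m.b_end, m.tokens))
    return [m for m in merged if m.tokens >= min_tokens]


# Two dead-code tokens inserted at positions 5-6 of A split one 10-token match in two.
split = [Match(0, 5, 0, 5, 5), Match(7, 12, 5, 10, 5)]
print(merge_matches(split, max_gap=2, min_tokens=8))  # one merged 10-token match
print(merge_matches(split, max_gap=0, min_tokens=8))  # []: both halves are too short
```

Requiring small gaps on both sides is what keeps unrelated matches from being glued together; a real heuristic would need further safeguards against false positives, for example for reordered statements.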
Efficient k-NN Search of Time Series in Arbitrary Time Intervals + (The k nearest neighbors (k-NN) of a time series are the k closest sequences within a dataset with respect to a distance measure. Often, not the entire time series but only specific time intervals are of interest, e.g., to examine phenomena around special events. While numerous indexing techniques support the k-NN search of time series, none of them is designed for an efficient interval-based search. This work presents the novel index structure Time Series Envelopes Index Tree (TSEIT), which significantly speeds up the k-NN search of time series in arbitrary user-defined time intervals.)
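To make the interval-based query concrete, the sketch below shows the naive linear scan that an index such as TSEIT is meant to speed up; it is not TSEIT itself. It assumes all series share the same time axis (index i is the same timestamp in every series) and uses Euclidean distance restricted to the interval [start, end). The function name `knn_in_interval` is hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def knn_in_interval(dataset: List[Sequence[float]], query: Sequence[float],
                    start: int, end: int, k: int) -> List[Tuple[float, int]]:
    """Return (distance, index) for the k series closest to `query` on [start, end).

    Linear scan with Euclidean distance over the interval only.
    """
    if not 0 <= start < end <= len(query):
        raise ValueError("invalid interval")
    q = query[start:end]
    scored = []
    for idx, series in enumerate(dataset):
        window = series[start:end]  # compare only the user-defined interval
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(window, q)))
        scored.append((dist, idx))
    return heapq.nsmallest(k, scored)


data = [[0, 0, 0, 0], [0, 1, 1, 0], [5, 1, 1, 5], [0, 3, 3, 0]]
print(knn_in_interval(data, [9, 1, 1, 9], start=1, end=3, k=2))  # [(0.0, 1), (0.0, 2)]
```

Every query touches every series, so the cost grows linearly with the dataset; an interval-aware index avoids this by pruning series whose bounds on the requested interval are already too far from the query.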