Search by property

This page provides a simple search interface for finding objects that contain a property with a particular data value. Other available search interfaces include the property search and the query generator.

List of results

Collective Entity Matching for Linking Structures in Attributed Material Graphs + (In data analysis, entity matching (EM) or entity resolution is the task of finding the same entity within different data sources. It is a required step when joining different data sets in which the same entities do not always share a common identifier. When applied to graph data like knowledge graphs, ontologies, or abstractions of physical systems, the additional challenge of entity relationships comes into play. Now, not just the entities themselves but also their relationships and, therefore, their neighborhoods need to match. These relationships can also be used to our advantage, which builds the foundation for collective entity matching (CEM). In this bachelor thesis, we focus on a graph data set based on a material simulation with the intent to match entities between neighboring system states. The goal is to identify structures that evolve over time and link their states with a common identifier. Current CEM algorithms assume perfect matches to be possible, i.e., every entity can be matched. We want to overcome this challenge and address the high imbalance of potential candidates and impossible matches. A third major challenge is the large volume of data, which requires our algorithm to be efficient.)
Online Nyström MMD Approximation + (In data analysis, the ability to detect and understand critical shifts in information patterns holds immense significance. Whether it is monitoring real-time network traffic, identifying anomalies in financial markets, or tracking fluctuations in climate data, the ability to swiftly identify change points is crucial for effective decision-making. However, since the default implementation of MMD is quadratic, the algorithms that enable this tend to exceed runtime limits in certain contexts, such as those where the speed and volume of incoming data are relatively high. In continuation of recent developments in change point detection optimization through estimators, notably RADMAN, we propose to integrate the “Nyström” estimator into a similar context of exponential bucketing to improve on this matter. This thesis will focus on the concept, the implementation, and the testing of this construct and its comparison to other recent approaches.)
Verfahren zur Reduktion von neuronalen Netzen - Analyse und Automatisierung + (In recent years, applications of neural networks (NNs) have increasingly emerged. A current problem is the considerable demand for resources such as memory, computing capacity, or energy that not only the training phases but also the inference phases of neural networks require. For this reason, the successful spread of neural networks to resource-constrained, low-power platforms still faces numerous challenges. This thesis investigates this problem and presents techniques for reducing the number of neurons and connections of fully trained neural networks while preserving accuracy as far as possible. Experiments in TensorFlow and Keras show which of these techniques are suitable in the context of various practical examples. Furthermore, the thesis describes a new approach, SNARE (Score-based Neural Architecture REduction), which aims to automate reduction not just on individual layers but on entire networks. To this end, the tool implementation of SNARE first analyzes the structure of trained Keras NNs with a TensorFlow backend. Taking into account various criteria such as the FLOP contribution, it then iteratively selects layers, applies reduction operations, and compensates for the resulting errors through retraining. Results show that SNARE achieves a parameter reduction by a factor of 35 on a LeNet5 architecture with an accuracy loss of 0.39%. In addition, SNARE achieved a reduction rate of 245 at the same accuracy on an NN for recognizing human movements from mobile sensor data.)
Ausgestaltung von Data-Science Methoden zur Bearbeitung ungelöster Mathematik-Probleme + (In mathematics there are countless unsolved problems that occupy researchers. They pose an important task and challenge, and there are continual attempts to move step by step closer to their solution. Among these still unsolved problems is the so-called “Frankl conjecture” (also known as the “union-closed sets conjecture”). This conjecture states that for every family of sets that is closed under union, there exists an element that is contained in at least half of the sets of the family. This thesis likewise aims to move step by step closer to a solution of this problem, or at least to provide helpful new tools for a later solution. To this end, the problem was approached with data science methods: first, as many examples for the conjecture as possible were generated at random; these generated examples could then be examined and analyzed further.)
Optimierung von Inkrementellen Modellanalysen + (In model-driven software development, analyses of the resulting models are necessary so that validation can be carried out at the model level, preventing more expensive errors and saving costs. However, the models are subject to continual change, which can also affect the analysis results that one would like to keep up to date at all times. Since the models can become very large while only small parts of them change at a time, it makes sense to design these analyses incrementally. One approach to such incremental model analyses is NMF Expressions, which builds a dependency graph of the analysis in the background and updates it on every atomic change to the model. The efficiency of the analyses, however, often depends on the exact formulation of the queries, so a clumsy formulation can lead to an inefficient analysis. In the database world, by contrast, the exact formulation of queries matters less, because automatic query optimization is common. This master's thesis investigates to what extent query optimization concepts from the database world can be transferred to incremental model analyses. Using NMF Expressions as an example, it shows how such optimizations can be implemented for incremental model analyses. The implemented optimizations are tested and evaluated on defined model analyses.)
Linking Software Architecture Documentation and Models + (In software development, consistency between artifacts is an important topic. This thesis proposes a structure for detecting corresponding and missing elements between documentation and a formal model. First, the approach identifies and extracts the model instances and relations described in the text. Then it links these text elements to their counterparts in the model. These links are comparable to trace links; however, the approach allows them to be graded. In addition, recommendations are generated for elements that are not contained in the model. The approach identifies model names and types with an F1 score of over 54%. 60% of the recommended instances match the instances found in the user study. In identifying relations and creating links, the approach achieved promising results. The results can be improved by future work, which is feasible because the design allows the approach to be extended easily.)
Untersuchung des Einflusses von Kommunikationsmodellen auf die Zusammensetzbarkeit von Informationsflusseigenschaften + (In software development, a common principle is to compose a large system from smaller subsystems. This requires communication between the subsystems in order to exchange information. However, the information flow through the overall system can thereby become insecure, violating confidentiality, one of the most important security properties of a system. To achieve secure information flow, so-called information flow properties must be satisfied. It is known from the literature that information flow properties can be violated when secure systems are composed: when two secure systems are composed, the overall system may become insecure. The kind of communication between the subsystems plays a decisive role here. The literature provides results showing that synchronous communication violates compositionality, whereas asynchronous communication guarantees it. However, the literature contains no concrete results on how gradations between synchronous and asynchronous communication affect compositionality. This thesis investigates how different forms of communication between synchronous and asynchronous communication affect the compositionality of information flow properties. To this end, generic concepts for modeling asynchronous forms of communication are developed. The investigation is carried out using timed automata. An example is modeled in which two secure systems, modeled as timed automata, are composed and form an insecure overall system under synchronous communication. Subsequently, the synchronous communication is replaced by asynchronous forms of communication using the developed modeling concepts, and the security of the composed system is checked for each form. The tool UPPAAL is used to model the overall system and to check it with respect to the preservation of information flow properties. Besides the modeling concepts, this thesis contributes concrete results on the effects of the communication forms on compositionality. Based on these results, the properties of a communication form that are required for compositionality are derived, as well as properties that have a negative effect. With regard to the derived properties, the thesis discusses how procedural communication affects compositionality, placing it between synchronous and asynchronous communication.)
Quantitativer Vergleich von Metriken für mehrdimensionale Abhängigkeiten + (In data-driven research, analyzing high-dimensional data is of central importance. It is not always sufficient to detect only dependencies between pairs of attributes. Often there are dependencies among several attributes that cannot be detected in the two-dimensional pairs. For detecting monotonic relationships among arbitrarily many dimensions, a multidimensional extension of Spearman's rank correlation coefficient already exists; for arbitrary dependencies, however, no such proven measure exists. This is where this thesis comes in: it compares the two multivariate information-theoretic metrics "general redundancy" and "interaction information". Spearman's rank correlation and the contrast measure of HiCS serve as baselines for this comparison.)
Modellierung geschachtelter Freiheitsgrade zur automatischen Evaluation von Software-Architekturen + (In modern software development, a multitude of third-party subsystems are reused, whose realizations and variants each have a dedicated influence on the quality properties of the overall system. However, not only the realization and variant of a subsystem but also its placement in the target architecture influences the resulting quality. In this thesis, the existing approach to modeling and simulating reusable subsystems in Palladio and PerOpteryx is extended by a new inclusion mechanism that enables flexible, fine-grained modeling and subsequent automated quality optimization of the placement of reusable subsystems. To this end, a domain-specific language is defined that allows a declarative description of the weaving points in an architecture model through aspect-oriented semantics. Using a model weaver, the reusable subsystems are woven into an annotated target architecture. Finally, the approach is integrated into the automated quality optimization of PerOpteryx, so that architects are supported in their design decisions regarding these degrees of freedom. The approach was evaluated in a simulation-based case study using real application models. The evaluation showed that the approach is suitable for automatically generating and evaluating a large number of architecture candidates and thus for supporting architects in their design decisions.)
Identification and refactoring of bad smells in model-based analyses + (In modern software development, model-based analyses are widespread. Software metrics such as cache usage prediction have a broad range of applications today. Like traditional object-oriented programs, these analyses need maintenance. Bad smells and their effects in object-oriented source code have been researched thoroughly; this is missing for model-based analysis. We studied object-oriented bad smells and looked for similar problems in model-based analysis. Bad smells in an analysis are a factor in the quality of the analysis software, and lower quality makes the development process of the analysis harder. We discovered ten new bad smells and developed algorithms for identifying and refactoring them. We provide implementations of the identification algorithms and evaluate them on real software. We attempted to detect bad smells in existing analysis software such as Camunda and found these bad smells in the existing analyses.)
Automated GUI Testing of Web Applications with Large Language Models + (This thesis investigates the potential of large language models (LLMs) for automating GUI tests of web applications, a method that offers several advantages over the traditional approach of monkey testing. Four capable LLMs, namely WizardLM, Vicuna (both based on LLAMA), GPT-3.5-Turbo, and GPT-4-Turbo, are evaluated with respect to their ability to execute extensive and relevant parts of the code by interacting with the user interface. The evaluation comprises tests on a simple proof-of-concept application developed for this study and on PHPLiteAdmin, a more complex open-source database management tool. The results show that the GPT-based models in particular are more efficient than the traditional monkey tester in certain scenarios, above all in generating meaningful text input. This underlines the innovation potential of LLMs in software testing, but also reveals the challenges and limits to be expected when applying them to more complex systems. The thesis thus makes an important contribution to the discussion on advancing and optimizing automated testing methods in software development.)
Eine Sprache für die Spezifikation disziplinübergreifender Änderungsausbreitungsregeln + (Change propagation analysis investigates how changes propagate through systems. Among other things, algorithms are developed that identify which elements of a system are affected by a change. No dedicated language exists for adapting existing algorithms, so domain experts have to use general-purpose programming languages such as Java to formulate change propagations. Because of Java's imperative nature, domain experts need more code and more knowledge of implementation details than they would with a language tailored to change propagation analysis. A language should always be adapted to the algorithm of the respective change propagation analysis. For the approach to change propagation analysis considered in this thesis, Karlsruhe Architectural Maintainability Prediction (KAMP), no dedicated language exists yet. KAMP is an approach for assessing architecture-based change requests that is implemented in a software tool of the same name. With the Change Propagation Rule Language (CPRL), this thesis presents a dedicated language for the change propagation analysis algorithm used in KAMP. Finally, the advantage of the developed language over three competing languages is determined. The thesis concludes that CPRL is more compact than competing languages while still allowing the majority of conceivable change propagations to be described.)
Untersuchung der Auswirkungen von Messdatenverschleierung auf Disaggregations-Qualität + (This talk is about protecting privacy in the context of smart meter data. As part of a bachelor's thesis, approaches for obfuscating smart meter data are evaluated using well-known disaggregation algorithms. Disaggregation here denotes the extraction of appliance usage from aggregated smart meter data.)
Statische Extraktion von Laufzeit-Indikatoren + (This thesis is about analyzing LLVM source code with the aim of finding an indicator for the number of CPU instructions. An indicator is a closed-form term that, for a given input, yields the number of CPU instructions of a piece of code. This definition correlates with the input size of a program. We analyze the control flow graph and loop conditions to find variables in the code that stand in for the input size. This indicator determination is a foundation for better online autotuners in the future that can adapt automatically to inputs of varying size.)
Platzierung von Versteckten Ausreißern in Nutzerdaten + (In this thesis, methods are developed for placing hidden outliers in data sets. Hidden outliers are deviating data points that can be detected as deviating in the full space but appear as normal data points in certain subspaces. In addition, user-defined constraints are developed that allow a user to restrict the region in which hidden outliers are to be placed. The methods are evaluated in different scenarios with real and synthetic data.)
Modellgetriebene Konsistenzerhaltung von Automationssystemen + (In this thesis, methods are developed to support data exchange in factory plants by applying model- and change-driven consistency preservation as developed for software engineering. We focus in particular on the entry of a faulty (unresolvable) reference. For this, we categorize the properties of the references and the type of the respective error and develop a rule set based on them. In addition, CAEX uses prototypes to instantiate objects. Whether the prototypes and clones should subsequently be kept consistent depends on the individual properties. For this, we again develop categories for the respective properties and, building on them, a rule set. For example, for a robot prototype, a change to its hardware should not be propagated to clones that are already deployed in factories. We implemented this approach using the VITRUVIUS framework, a framework for model- and change-driven consistency preservation, and used it to demonstrate the functionality of our implementation. Using an example model, we showed that our categorizations of references, error types, properties, and clones are applicable in factory plant planning.)
Analyse von KI-Ansätzen für das Trainieren virtueller Roboter mit Gedächtnis + (This thesis compares several recurrent neural networks: LSTMs, GRUs, CTRNNs, and Elman networks. The networks are tested on remembering a point and subsequently reaching for that point with a virtual robot arm. For LSTM, GRU, and Elman networks, the thesis also examines how the networks solve the task when each neuron can access only its own memory. It turned out that LSTMs and GRUs score considerably better in the experiments than CTRNNs and Elman networks. In addition, the computation time and the relationship between the number of trainable parameters and the experimental results are compared.)
Pattern-Based Heterogeneous Parallelization + (This thesis presents two new kinds of code generation for accelerated execution by the automatically parallelizing compiler Aphes. They are based on two additionally detected patterns of implicit parallelism in input programs, namely reductions in loops and recursive functions that implement the divide-and-conquer pattern. Aphes differs from conventional parallelizing compilers in two respects that go beyond pure parallelization: first, Aphes specializes in heterogeneous systems; second, it employs online autotuning. Both aspects were taken into account in this work. For this reason, the code generators we implemented realize both local acceleration via OpenMP and C++11 threads and remote acceleration via Nvidia's CUDA. Furthermore, the generated code continues to rely on the infrastructure already present in Aphes for autotuning the generated machine code at run time. In our tests, programs with reductions in loops compiled with Aphes showed speedups of up to a factor of 50 over programs compiled with Clang. Code with recursive functions transformed by Aphes achieved speedups of 3.15 over executables of the same program generated conventionally with GCC and Clang. In all cases, the autotuner was able to converge within the first 50 execution iterations of the kernel being optimized. However, the converged execution times differed considerably between some test runs.)
Dynamisches Autotuning mehrerer nominaler Parameter + (In this thesis, this problem is simplified with the help of knowledge about causal dependencies between different tuning tasks. Since the question of some parameter values often arises only when other parameters take on certain values, it makes no sense to include the former in the optimization process in every case. In particular, the developed method permits lossless, simultaneous autotuning of interdependent nominal and ratio parameters without forgoing potentially valuable information about their mutual influence.)
Schematisierung von Entwurfsentscheidungen in natürlichsprachiger Softwarearchitekturdokumentation + (In this thesis, a schema is developed for classifying architecture decisions from software architecture documentation. This is intended to make it easier to classify and reuse decisions in software architecture documentation. In my approach, a classification schema is developed that draws on current literature and distinguishes three basic kinds of decisions: existence decisions, property decisions, and environment decisions. For the evaluation, open-source software projects with natural-language software architecture documentation were examined, and it was checked iteratively where the current schema can be improved. Finally, the thesis presents which of the decision classes can be represented in the Palladio Component Model.)
Entwicklungsmethoden für Produktfamilien + (In this master's thesis, methodologies are developed to support the development of product lines in model-based systems engineering (MBSE). Among other things, activity diagrams are used to describe system behavior; they offer no explicit constructs for modeling variability. Therefore, this thesis presents an approach to modeling variability in activity diagrams that is metamodel-independent and can thus be used for more than activity diagrams. This approach is compared with common approaches to variability modeling, examining among other things to what extent it reduces element redundancy compared with the other approaches. Subsequently, the thesis works out how activity diagrams and colored Petri nets can be kept consistent with each other. To this end, their commonalities and differences are identified in order to define consistency preservation rules and to find the limits of consistency preservation. Finally, the thesis outlines what is necessary to combine the two approaches in order to obtain a behavior description of a product line consisting of activity diagrams and colored Petri nets in which the activity diagrams and Petri nets of the individual product configurations are always consistent with each other.)
Modeling Dynamic Systems using Slope Constraints: An Application Analysis of Gas Turbines + (In energy studies, researchers build models for dynamic systems to predict the produced electrical output precisely. Since experiments are expensive, the researchers rely on simulations of surrogate models. These models use differential equations that can provide decent results but are computationally expensive. Further, transition phases, which occur when an input change results in a delayed change in output, are modeled individually and therefore lack generalizability. Current research includes Data Science approaches that need large amounts of data, which are costly to obtain when performing scientific experiments. Theory-Guided Data Science aims to combine Data Science approaches with domain knowledge to reduce the amount of data needed while predicting the output precisely. However, even state-of-the-art Theory-Guided Data Science approaches lack the possibility to model the slopes occurring in the transition phases. In this thesis, we aim to close this gap by proposing a new loss constraint that represents both transition and stationary phases. Our method is compared with theoretical and Data Science approaches on synthetic and real-world data.)
Local Outlier Factor for Feature‐evolving Data Streams + (In high-volume data streams it is often impractical to monitor all observations; often we are only interested in deviations from normal operation. Detecting outlying observations in data streams is an active area of research. However, most approaches assume that the data's dimensionality, i.e., the number of attributes, stays constant over time. This assumption is unjustified in many real-world use cases, such as sensor networks or computer cluster monitoring. Feature-evolving data streams do not impose this restriction and thereby pose additional challenges. In this thesis, we extend the well-known Local Outlier Factor (LOF) algorithm for outlier detection from the static case to the feature-evolving setting. Our algorithm combines subspace projection techniques with an appropriate index structure using only bounded computational resources. By discarding old observations, our approach also deals with concept drift. We evaluate our approach against the respective state-of-the-art methods in the static case, the streaming case, and the feature-evolving case.)
Architectural Generation of Context-based Attack Paths + (In industrial processes (Industry 4.0) and other fields of our lives such as the energy or health sector, the confidentiality of data is becoming increasingly important. To protect confidential information on critical systems, it is crucial to be able to find relevant attack paths to a critical element in different access-control contexts. To minimize costs, it is important to consider this issue as early as the design phase of the software architecture. There are already approaches to attack path generation. However, they either do not consider software architecture modeling or do not consider both vulnerabilities and access control mechanisms. Hence, this thesis presents an approach for finding all potential attack paths in a software architecture model, considering access control and vulnerabilities. However, the set of all attack paths is often too large, so the approach presented here introduces and utilizes meaningful filter criteria based on widespread vulnerability classification standards.)
Fallstudie zur Privatsphäre in Connected-Car Systemen + (In any software system that handles user data, the processing of that data must be subject to strict rules. The strictest and most widespread of these laws so far is the European General Data Protection Regulation. To process data legally under this regulation, it is very advantageous for software developers to take it into account as early as possible in the development process. One way to detect data protection violations at design time is data flow analysis. Here, properties are added to the conventional software model as well as to the modeled data. From the call graph, a data flow diagram can then be created that shows which data flows from which components to where. This thesis describes a case study in which data flow analysis is examined in a concrete system. First, requirements are established that a case study in the areas of mobility and data protection must fulfill. The scientific contribution of this thesis lies in these requirements and in the trial execution of the case study. For this, a fictitious ride-pooling company is modeled. The model is examined using data flow analysis, and conclusions about the analysis are drawn from the results.)
Predictability of Classification Performance Measures with Meta-Learning + (In machine learning, classification is the problem of identifying to which of a set of categories a new instance belongs. Usually, we cannot tell how a model performs until it is trained. Meta-learning, which learns about the learning algorithms themselves, can predict the performance of a model without training it, based on meta-features of datasets and performance measures of previous runs. Though there is a rich variety of meta-features and performance measures in meta-learning, existing works usually focus on which meta-features are likely to correlate with model performance using one particular measure. The effect of different types of performance measures remains unclear, as it is hard to compare the results of existing works, which are based on different meta-data sets as well as meta-models. The goal of this thesis is to study whether certain types of performance measures can be predicted better than others and how much the choice of the meta-model matters, by constructing different meta-regression models on the same meta-features and different performance measures. We will use an experimental approach to evaluate our study.)
Benchmarking Tabular Data Synthesis Pipelines for Mixed Data + (In machine learning, simpler, interpretable models require significantly more training data than complex, opaque models to achieve reliable results. This is a problem when gathering data is a challenging, expensive, or time-consuming task. Data synthesis is a useful approach for mitigating these problems. An essential aspect of tabular data is its heterogeneous structure, as it often comes as "mixed data", i.e., it contains both categorical and numerical attributes. Most machine learning methods require the data to be purely numerical. The usual way to deal with this is a categorical encoding. In this thesis, we evaluate a proposed tabular data synthesis pipeline consisting of a categorical encoding, followed by data synthesis and an optional relabeling of the synthetic data by a complex model. This synthetic data is then used to train a simple model. The performance of the simple model is used to quantify the quality of the generated data. We surveyed the current state of research in categorical encoding and tabular data synthesis and performed an extensive benchmark on a motivated selection of encoders and generators.)
Bad Smells and Antipatterns in Metamodeling + (In modern software development, metamodels play an important role as they build the basis for domain-specific modeling languages, which are used for system design, simulation, and code generation. Like any artifact in a software development process, these languages and their respective models need to evolve over time. However, if the metamodels that define those languages are badly designed, the evolution process is complicated and additional effort has to be spent on maintenance. Such design problems are considered bad smells. Existing approaches to detecting smells in metamodels deal mainly with simple defects or focus only on a small number of smells. Therefore, we present a comprehensive investigation of bad smells and antipatterns by reviewing design smells of object-oriented programming and, where possible, transferring them to metamodeling. Some of these smells are automatically detectable, so we provide tool support with suitable detection methods as an extension for EMF Refactor. We evaluate this approach by testing every automatically detectable smell with appropriate models and by applying the tool support to an existing large metamodel to evaluate the suggested refactorings.)
Semi-automatic Consistency Preservation of Models + (In order to manage the high complexity of developing software systems, several models are often employed that describe different aspects of the system under development. Models often contain redundant or dependent information, meaning that changes to one model without adjustments to others representing the same concepts lead to inconsistencies, which need to be repaired automatically. Otherwise, developers would have to know all dependencies to preserve consistency by hand. For automated consistency preservation, model transformations can be used to specify how elements from one model correspond to those of another and to define consistency preservation operations that fix inconsistencies. In this specification, it is not always possible to determine one generally correct way of preserving consistency without insight into the intentions of the developer responsible for making the changes. To be able to factor in underlying intentions, user interactions are needed that clarify the course of consistency preservation in ambiguous cases. Existing approaches either do not consider user interactions during consistency preservation or provide an unstructured set of interaction options. In this thesis, we therefore identify a structured classification of user interaction types to employ during consistency preservation. By applying those types in preexisting case studies for consistency preservation between models in different application domains, we were able to show the applicability of these types in terms of completeness and appropriateness. Furthermore, software projects are rarely developed by a single person, meaning that multiple developers may work on the same models in different development branches and combine their work at some point using a merge operation.
One reasonable option for merging different development branches of models is to track model changes and merge the change sequences by applying one after another. Since the model state has changed due to the changes made in one branch, the changes in the other branch can potentially require different user decisions for consistency preservation. Nevertheless, most necessary decisions will be the same, which is why it would be useful to reuse the previously applied choices where possible. To achieve this, we provide a concept for storing and reapplying decisions during consistency preservation. To that end, we establish which information is necessary and reasonable to represent a user interaction and allow for its correct reuse. By applying the reuse mechanism to a change scenario with several user interactions in one of the case studies mentioned above, we were able to show the feasibility of our overall concept for correctly reusing changes.)
Review of dependency estimation with focus on data efficiency + (In our data-driven world, large amounts of data are collected in all kinds of environments, which is why data analysis is growing in importance. Understanding how different variables influence each other is a significant part of knowledge discovery and enables strategic decisions based on this knowledge. Therefore, high-quality dependency estimation should be accessible to a variety of people. Many dependency estimation algorithms are difficult to use in a real-world setting. In addition, most of these algorithms need large data sets to return a good estimate. In practice, gathering this amount of data may be costly, especially when the data is collected in experiments with high costs for materials or infrastructure. I will compare different state-of-the-art dependency estimation algorithms. A list of 14 criteria that I put together will be used to determine how promising each algorithm is. This study focuses especially on the data efficiency and uncertainty of the dependency estimation algorithms. An algorithm with high data efficiency can give a good estimate from a small amount of data. A degree of uncertainty helps to interpret the result of the estimator, which allows better decision-making in practice. The comparison includes a theoretical analysis and different experiments with the dependency estimation algorithms that performed well in the theoretical analysis.)
Relevance-Driven Feature Engineering + (In predictive maintenance scenarios, failure classification is challenging because large volumes of high-dimensional data are generated continuously in modern factories. Currently, complex error analysis in our industry use case is performed manually based on recorded data. The resulting misclassification leads to longer rework times. Our goal is to perform automated failure detection. In particular, this thesis builds a classification model to detect faulty engines in the vehicle manufacturing process. The first part of the work focuses on the binary anomaly detection classification problem and aims to predict an engine's deficiency status. Here, we manage to recognize more than 90% of the faulty engines. In the second part, we extend our analysis to the multi-class classification problem with highly imbalanced classes. Here, our objective is to forecast the exact type of failure. To some extent, this situation is similar to microarray analysis: we observe high-dimensional data with few instances available. This thesis develops a relevance-driven feature engineering meta-algorithm framework. We study the integration of feature relevance evaluation into the construction process of new features. We also use ensemble feature selection algorithms and define our own criteria to determine the relevance of feature subsets. These criteria are integrated into the feature engineering process in order to accelerate it by ignoring parts of the search space without significantly degrading the data quality.)
Instrumentation with Runtime Monitors for Extraction of Performance Models during Software Evolution + (In recent times, companies are increasingly looking to migrate their legacy software systems to a microservice architecture. This large-scale refactoring is often motivated by concerns over high levels of interdependency, developer productivity problems, and unknown boundaries for functionality. However, modernizing legacy software systems has proven to be a difficult and complex process to execute properly. This thesis intends to provide a means of decision support for this migration process in the form of accurate and meaningful performance monitoring instrumentation and a performance model of the system. It specifically presents an instrumentation concept that incurs minimal performance overhead and is generally compatible with legacy systems implemented using object-oriented programming paradigms. In addition, the concept illustrates the extraction of performance model specifics from the monitoring data. The concept was developed for, and then implemented on, an enterprise legacy system provided by Capgemini. A subsequent case study was conducted to evaluate the quality of the concept.)
Traceability Link Recovery for Relations in Natural Language Software Architecture Documentation and Software Architecture Models + (In software development, software architecture plays a vital role in developing and maintaining software systems. It is communicated through artifacts such as software architecture documentation (SAD) and software architecture models (SAM). However, maintaining consistency and traceability between these artifacts can be challenging. Inconsistencies or missing links can lead to errors, misunderstandings, and increased maintenance costs. This thesis proposes an approach for recovering traceability links of software architecture relations between natural language SAD and SAM. The approach involves the use of Pre-trained Language Models (PLMs) such as BERT and ChatGPT and supports different extraction modes and prompt engineering techniques for ChatGPT, as well as different model variants and training strategies for BERT. The proposed approach is integrated with ArDoCo, a tool that detects inconsistencies and recovers trace links between software artifacts. ArDoCo is used for pre-processing the SAD text and parsing the SAM, thus facilitating the traceability link recovery process. In order to assess the performance of the framework, a gold standard of SAD and SAM created from open-source projects is utilized. The evaluation shows that the ChatGPT approach has promising results in relation extraction with a recall of 0.81 and in traceability link recovery with an F1-score of 0.83, while BERT-based models struggle due to the lack of domain-specific training data.)
Coreference Resolution for Software Architecture Documentation + (In software engineering, software architecture documentation plays an important role. It contains much essential information regarding reasoning and design decisions. Therefore, many activities have been proposed that deal with documentation for various reasons, e.g., extracting information or keeping different forms of documentation consistent. These activities often involve automatic processing of documentation, for example traceability link recovery (TLR). However, automatic processing can run into problems when coreferences are present in the documentation. A coreference occurs when two or more mentions refer to the same entity. These mentions can differ and create ambiguities, for example when pronouns are used. To overcome this problem, this thesis proposes two contributions to resolve coreferences in software architecture documentation. The first contribution is to explore the performance of existing coreference resolution models for software architecture documentation. The second is to divide coreference resolution into several more specific types of resolution, such as pronoun resolution, abbreviation resolution, etc.)
Automatic Context-Based Policy Generation from Usage- and Misusage-Diagrams + (In systems with very dynamic processes, such as Industry 4.0, the contexts of all participating entities often change, and a lot of data is exchanged with external organizations such as suppliers or producers, which raises concerns about unauthorized data access. This creates the need for access control systems that can handle the combination of a highly dynamic system and the resulting concerns about data security. In many situations, the access control decision depends on the context information of the requester. Another problem of dynamic systems is that the manual development of access policies can be time-consuming and expensive. Approaches using automated policy generation have been shown to reduce this effort. In this master thesis, we introduce a concept that combines context-based model-driven security with automated policy generation, and we evaluate whether it is a suitable option for creating access control systems and whether it can reduce the effort of policy generation. The approach makes use of usage and misusage diagrams, which are on a high architectural abstraction level, to derive and combine access policies for data elements, which are located on a lower abstraction level.)
Encryption-aware SQL query log rewriting for LIKE predicates + (In the area of workflow analysis, the workflow of, e.g., a working process can be analyzed by looking into the data that was used for the working process or created during it. The main contribution of this work is to extend CoVER in such a way that it supports LIKE predicates with order-preserving encryption.)
Design Space Evaluation for Confidentiality under Architectural Uncertainty + (In the early stages of developing a software architecture, many properties of the final system are still unknown or difficult to determine. There may be multiple viable architectures, but uncertainty about which architecture performs best. Software architects can use Design Space Exploration (DSE) to evaluate quality properties of architecture candidates to find the optimal solution. Design Space Exploration can be a resource-intensive process. An architecture candidate may have certain properties that disqualify it from consideration as an optimal candidate, regardless of its quality metrics. An example would be confidentiality violations in data flows introduced by certain components or combinations of components in the architecture. If these properties can be identified early, quality evaluation can be skipped and the candidate discarded, saving resources. Currently, analyses for identifying such properties are performed separately from the design space exploration process. Optimal candidates are determined first, and analyses are then applied to individual architecture candidates. Our approach augments the PerOpteryx design space exploration pipeline with an additional architecture candidate filter stage, which allows existing generic candidate analyses to be integrated into the DSE process. This enables automatic execution of analyses on architecture candidates during DSE, and early discarding of unwanted candidates before quality evaluation takes place. We use our filter stage to perform data flow confidentiality analyses on architecture candidates, and further provide a set of example analyses that can be used with the filter. We evaluate our approach by running PerOpteryx on case studies with our filter enabled.
Our results indicate that the filter stage works as expected: it is able to analyze architecture candidates and skip quality evaluation for unwanted candidates.)
Token-Based Plagiarism Detection for Statecharts + (In the field of software engineering, existing plagiarism detection systems have primarily focused on detecting cases of plagiarism in code. However, other artefacts such as models also play a crucial role in the development process. Statecharts, in particular, are used to model the behavior of a system. This thesis investigates the applicability and challenges of applying token-based plagiarism detection systems to statecharts. We extend the plagiarism detector JPlag to support detecting cases of plagiarism in statecharts. Our approach is evaluated using a dataset of student assignments from a modeling course, where we generate plagiarized statecharts by adopting common obfuscation attacks. We study the effects of the token-extraction strategy, sorting techniques and the minimum token match parameter. The results suggest that an approach tailored to the specific kind of model, such as statecharts, works better than a generic solution for models.)
Developing a Framework for Mining Temporal Data from Twitter as Basis for Time-Series Correlation Analysis + (In the last decade, ample research has been produced regarding the value of user-generated data from microblogs as a basis for time series analysis in various fields. In this context, the objective of this thesis is to develop a domain-agnostic framework for mining microblog data (i.e., Twitter). Taking the subject-related postings of a time series (e.g., inflation) as its input, the framework will generate temporal data sets that can serve as a basis for time series analysis of the given target time series (e.g., inflation rate). To accomplish this, we will analyze and summarize the prevalent research related to microblog data-based forecasting and analysis, with a focus on the data processing and mining approach. Based on the findings, one or several candidate frameworks are developed and evaluated by testing the correlation of their generated data sets against the target time series they are generated for. While summative research on microblog data-based correlation analysis exists, it is mainly focused on summarizing the state of the field. This thesis adds to the body of research by applying summarized findings and generating experimental evidence regarding the generalizability of microblog data mining approaches and their effectiveness.)
Evaluation architekturbasierter Performance-Vorhersage im Kontext automatisierter Fahrzeuge + (In the past decades, there has been increased interest in the development of automated vehicles. Automated vehicles are vehicles that are able to drive without the need for constant interaction by a human driver. Instead, they use multiple sensors to observe their environment and act in response to observed stimuli. In order to avoid accidents, the reaction to these stimuli needs to happen in a sufficiently short amount of time. To keep implementation overhead and cost low, it is highly beneficial to know the reaction time of a system as early as possible. Being able to assess performance at design time allows system architects to make informed decisions when comparing software components for use in automated vehicles. In this thesis, I analysed the applicability of architecture-based performance prediction in the context of automated vehicles using the Palladio Approach. In particular, I focused on the design-time prediction of the worst-case reaction time, which reflects the reaction ability of automated vehicles and is a crucial metric when assessing their performance.)
Meta-Learning for Encoder Selection + (In machine learning, the data to be analyzed is often not only numerical but also categorical. Therefore, encoders have been developed to convert categorical data into numerical data. However, different encoders may have different impacts on the performance of the machine learning process. To this end, this thesis is dedicated to understanding the best encoder selection using meta-learning approaches. Meta-learning, also known as learning how to learn, serves as the primary tool for this study. First, using the concept of meta-learning, we find meta-features that represent the characteristics of the data sets. After that, an iterative machine learning process is performed to find the relationship between these meta-features and the best encoder selection. In the experiment, we analyzed 50 datasets collected from OpenML. We collected their meta-features and their performance with different encoders. After that, decision tree and random forest models are chosen as the meta-models to perform meta-learning and find the relationship between meta-features and the performance of the encoder or the best encoder. The output of these steps will be a ruleset that describes the relationship in an interpretable way and can also be generalized to new datasets.)
Meta-learning for Encoder Selection + (In the real world, mixed-type data, which contains both categorical and numerical data, is commonly used. However, most algorithms can only learn from numerical data. This makes the selection of an encoder very important. In this presentation, I will present an approach that uses ideas from meta-learning to predict the performance from the meta-features and encoders.)
Robust Subspace Search + (This thesis discusses the idea of finding robust subspaces with the help of an iterative process. The process first aims for subspaces where hiding outliers is feasible. Subsequently, the subspaces used in the first part are adjusted. In doing so, the convergence of this iterative process can reveal valuable insights into systems where the existence of hidden outliers poses a high risk (e.g., a power station). The main part of this thesis deals with hiding outliers in high-dimensional data spaces and the challenges resulting from such spaces.)
Architectural Uncertainty Analysis for Access Control Scenarios in Industry 4.0 + (In this thesis, we present our approach to handling uncertainty in access control at design time. We propose the concept of trust as a composition of environmental factors that impact the validity of, and consequently the trust in, access control properties. We use fuzzy inference systems as a way of defining how environmental factors are combined. These trust values are then used by an analysis process to identify issues that can result from a lack of trust. We extend an existing data flow diagram approach with our concept of trust. Our approach of adding knowledge to a software architecture model and providing a way to analyze model instances for access control violations is intended to enable software architects to increase the quality of models and further verify access control requirements under uncertainty. We evaluate the applicability based on the availability, the accuracy, and the scalability regarding the execution time.)
Surrogate models for crystal plasticity - predicting stress, strain and dislocation density over time (Defense) + (In this work, we build surrogate models to approximate the deformation behavior of face-centered cubic crystalline structures under load, based on the continuum dislocation dynamics (CDD) simulation. The CDD simulation is a powerful tool for modeling the stress, strain, and evolution of dislocations in a material, but it is computationally expensive. Surrogate models provide approximations of the results at a much lower computational cost. We propose two approaches to building surrogate models that only require the simulation parameters as inputs and predict the sequences of stress, strain, and dislocation density. The approaches comprise the use of time-independent multi-target regression and recurrent neural networks. We demonstrate the effectiveness by providing an extensive study of different implementations of both approaches. We find that, based on our dataset, a gradient-boosted trees model making time-independent predictions performs best in general and provides insights into feature importance. The approach significantly reduces the computational cost while still producing accurate results.)
Approximating an Ngram Corpus with Probabilistic Methods + (In this work, we consider ngram corpora, i.e., sets of word chains of different lengths together with their usage frequencies in natural language. For example, the 3-gram "bag of words" may be used 200 times. Obviously, there is a dependence between the usage frequencies of (1) the unigrams "bag", "of", and "words", (2) the bigrams "bag of" and "of words", and (3) the trigram "bag of words". This connection is partially used in language models to implement grammar correction or speech recognition. From a database point of view, the ngram corpus contains either redundant information or information that can be estimated well. This indicates that we can achieve a large reduction of the corpus size while still providing its information with high accuracy. In this work, we research the connection between n- and (n+1)-grams and vice versa. Our objective is to store only a part of the full ngram corpus and estimate the rest of the corpus.)
Architecture-based Uncertainty Impact Analysis for Confidentiality + (In times of highly interconnected systems, confidentiality becomes a crucial security quality attribute. As fixing confidentiality breaches becomes more costly the later they are found, software architects should address confidentiality early, at design time. During the architectural design process, software architects make Architectural Design Decisions (ADDs) to handle the degrees of freedom, i.e., uncertainty. However, ADDs are often subject to assumptions and unknown or imprecise information. Assumptions may turn out to be wrong, so they have to be revised, which reintroduces uncertainty. Thus, the presence of uncertainty at design time prevents drawing precise conclusions about the confidentiality of the system. It is, therefore, necessary to assess the impact of uncertainties at the architectural level before making a statement about confidentiality. To address this, we make the following contributions: First, we propose a novel uncertainty categorization approach to assess the impact of uncertainties in software architectures. Based on that, we provide an uncertainty template that enables software architects to structurally derive types of uncertainties and their impact on architectural element types for a domain of interest. Second, we provide an Uncertainty Impact Analysis (UIA) that enables software architects to specify which architectural elements are directly affected by uncertainties. Based on structural propagation rules, the tool automatically derives further architectural elements that are potentially affected. Using the large-scale open-source contact tracing application Corona Warn App (CWA) as a case study, we show that the UIA achieves 100% recall while maintaining 44%-91% precision when analyzing the impact of uncertainties on architectural elements.)
Domain-specific Language for Data-driven Design Time Analyses and Result Mappings for Logic Programs + (In today's connected world, exchanging data is essential to many business applications. In order to cope with security requirements early, design time data flow analyses have been proposed. These approaches transform the modeled architecture into underlying formalisms such as logic programs. Constraints that check requirements often have to be formulated in terms of the underlying formalism. This requires architects to know about the formalism, the transformed architecture, and the verification environment. We aim to bridge this gap between the architectural domain and the underlying formalism. We propose a domain-specific language (DSL) that enables architects to define individual constraints in terms of the architecture. Our approach automatically maps the constraints and results between the architectural domain and the formalism. Our evaluation indicates good overall expressiveness, usability, and space efficiency for differently sized data flow restrictions.)
Evaluating Subspace Search Methods with Hidden Outlier + (In today's world, most datasets have more than just a small number of attributes. The high number of attributes, which are referred to as dimensions, hinders the search for objects that do not normally occur. For instance, consider a money transaction that was not carried out legally. Such objects are called outliers. A common method to detect outliers in high-dimensional datasets is based on searching subspaces of the dataset. These subspaces have characteristics that reveal possible outliers. The most common evaluation of algorithms searching for subspaces is based on benchmark datasets. However, the benchmark datasets are often not suitable for the evaluation of these subspace search algorithms. In this context, we present a method that evaluates subspace search algorithms without relying on benchmark datasets by hiding outliers in the result set of a subspace search algorithm.)
Verfeinerung von Zugriffskontrollrichtlinien unter Berücksichtigung von Ungewissheit in der Entwurfszeit + (In our connected and digitalized world, data is exchanged to an ever greater extent. To protect users' personal data, legal requirements are enacted in the form of mandatory policies for data exchange. These are written in natural language and are often considered only in late design phases of software development. Failing to take policies into account during the design phase can lead to overlooked confidentiality gaps. These then often have to be fixed at higher cost in later adaptations. A refinement of policies that starts at software design time can give a software architect early indications of critical properties or violations in the software and helps to avoid them. The goal of this thesis is to develop a refinement approach that works despite uncertainties caused by missing information. Uncertainties are detected and classified based on a taxonomy of uncertainty. The refinement process analyzes different abstraction levels of a software architecture, starting at the system level, through individual components, down to service calls and their interfaces. Possible violations of the given policies are detected by creating an access control graph, decomposing the graph, and identifying individual service calls. The identified critical elements of the software architecture are output.)
Derivation of Change Sequences from State-Based File Differences for Delta-Based Model Consistency + (In view-based software development, views may share concepts and thus contain redundant or dependent information. Keeping the individual views synchronized is crucial to avoid inconsistencies in the system. In approaches based on a Single Underlying Model (SUM), inconsistencies are avoided by establishing the SUM as a single source of truth from which views are projected. To synchronize updates from views to the SUM, delta-based consistency preservation is commonly applied. This requires the views to provide fine-grained change sequences, which are used to incrementally update the SUM. However, the functionality of providing these change sequences is rarely found in real-world applications. Instead, only state-based differences are persisted. Therefore, it is desirable to also support views that provide state-based differences in delta-based consistency preservation. This can be achieved by estimating the fine-grained change sequences from the state-based differences. This thesis evaluates the quality of estimated change sequences in the context of model consistency preservation. To derive such sequences, matching elements across the compared models need to be identified and their differences need to be computed. We evaluate a sequence derivation strategy that matches elements based on their unique identifier and one that establishes a similarity metric between elements based on the elements' features. As an evaluation baseline, different test suites are created. Each test consists of an initial and a changed version of both a UML class diagram and consistent Java source code. Using the different strategies, we derive and propagate change sequences based on the state-based difference of the UML view and evaluate the outcome in both domains.
The results show that the identity-based matching strategy is able to derive the correct change sequence in almost all (97%) of the considered cases. For the similarity-based matching strategy, we identify two recurring error patterns across different test suites. To address these patterns, we provide an extended similarity-based matching strategy that reduces the frequency of the error patterns while introducing almost no performance overhead.)
Vergleich verschiedener Sprachmodelle für den Einsatz in automatisierter Rückverfolgbarkeitsanalyse + (Information about logical links between requirements and their implementation in source code is useful for many software development tasks. For example, it can ease software maintenance when requirements change. These traceability links can be determined by a traceability analysis. Approaches such as FTLR perform automated traceability analysis. FTLR detects traceability links by comparing representations of requirements and source code. So far, FTLR uses the language model fastText to represent requirements and source code. However, the fastText approach has weaknesses. The language model cannot represent different meanings of a word. In addition, it was not pre-trained on source code. This thesis investigated whether alternative language models without these weaknesses are better suited for use in FTLR than fastText. In an experiment on five benchmark datasets for traceability analysis, the results of the two alternative language models UniXcoder and Wikipedia2Vec were compared with fastText. UniXcoder performs better than fastText on the benchmark datasets iTrust and LibEST. Wikipedia2Vec does not perform better than fastText on any of the benchmark datasets used. On average across all test datasets used, fastText is better suited for use in FTLR than UniXcoder and Wikipedia2Vec.)
Injection Molding Simulation based on Graph Neural Networks + (Injection molding simulations are important tools for the development of new injection molds. Existing simulations are mostly numerical solvers based on the finite element method. These solvers are reliable and precise, but computationally very expensive even on simple part geometries. In this thesis, we aim to develop a faster injection molding simulation based on Graph Neural Networks (GNNs). Our approach learns a simulation as a composition of three functions: an encoder, a processor and a decoder. The encoder takes in a graph representation of a 3D geometry of a mold part and returns a numeric embedding of each node and edge in the graph. The processor updates the embedding of each node multiple times based on its neighbors. The decoder then decodes the final embedding of each node into physically meaningful variables, for example, the fill time of the node. The envisioned GNN architecture has two interesting properties: (i) it is applicable to any kind of material, geometry and injection process parameters, and (ii) it works without a “time integrator”, i.e., it predicts the final result without intermediate steps. We plan to evaluate our architecture by its accuracy and runtime when predicting node properties. We further plan to interpret the learned GNNs from a physical perspective.)
Verknüpfung von Textelementen zu Softwarearchitektur-Modellen mit Hilfe von Synsets + (Inconsistent naming of text elements in a software architecture documentation (SAD) and model elements in a software architecture model (SAM) causes problems for traceability. Instead of directly comparing the identifiers of the text elements with the names of the model elements, a semantic comparison based on synsets is therefore performed; the synsets are determined by word sense disambiguation (WSD). A WSD algorithm determines the meanings of the text elements in the context of the SAD in the form of synsets. Via these synsets, synonyms of the text elements are used to link them to the model elements. This makes it possible to map text elements to model elements that semantically represent the same element but are named differently.)
Modeling and analyzing zero-trust architectures taking into account various quality objectives + (Integrating a Zero Trust Architecture (ZTA) into a system is a step towards establishing a good defence against external and internal threats. However, there are different approaches to integrating a ZTA, which vary in the components used, their assembly and their allocation. The earlier in the development process those approaches are evaluated and the right one is selected, the more cost and effort can be saved. In this thesis, we analyse the most prominent standards and specifications for integrating a ZTA and derive a general model by extracting core ZTA tasks and logical components. We model these using the Palladio Component Model to enable assessing ZTAs at design time. We combine performance and security annotations to create a single model that supports both performance and security analysis. By doing this, we also assess the possibility of combining performance and security analyses.)
Streaming MMD Change Detection + (Kernel methods are among the most well-known approaches in data science. Their ability to represent probability distributions as elements in a reproducing kernel Hilbert space gives rise to maximum mean discrepancy (MMD). MMD quantifies the dissimilarity of two distributions and allows powerful two-sample tests on many domains. One important application of general two-sample tests is change detection in data streams: here, one tests the null hypothesis that the distributions of data within the stream do not change versus the alternative hypothesis that the distributions do change; a change in distribution then indicates a change point. The broad applicability of kernel-based two-sample tests renders their use for change detection in data streams highly desirable. However, their quadratic runtime complexity prohibits their application. While approximations for kernel methods that reduce their runtime in the static setting exist, their application to data streams is challenging. In this thesis, we propose a novel change detector, RADMAN, which leverages the random Fourier feature-based kernel approximation to efficiently detect changes in data streams with a polylogarithmic runtime complexity of O(log^2 n) per insert operation, with n the total number of observations. The proposed approach runs significantly faster than existing methods but obtains similar result quality. Our experiments on synthetic and real-world data sets show that it performs better than current state-of-the-art approaches.)
Ein Datensatz handgezeichneter UML-Klassendiagramme für maschinelle Lernverfahren + (Class diagrams enable the graphical modeling of a software system. Especially at the beginning of software projects, they are created as hand-drawn sketches on non-digital input media such as paper or whiteboards. Capturing sketches of this kind is therefore limited to photographing them. Further digital processing of a class diagram sketch captured in an image is not possible without manually reconstructing it as a machine-processable diagram. Machine learning methods can provide an automated transformation into a digital model through sketch recognition. These methods require annotated training data. For UML class diagrams, no such data has been published so far. This thesis deals with the creation of a dataset of annotated UML class diagram sketches for machine learning methods. To this end, it presents a data collection, a tool for annotating UML class diagrams, and a conversion of the data into an input format for machine learning. The annotated dataset is then assessed in terms of its diversity, level of detail, and size. For further evaluation, the use of the dataset is validated with a machine learning method. After training on the data, the method is able to detect nodes with an F1 score above 99%, text positions with an F1 score above 87%, and edges with an F1 score above 71%. The evaluation thus shows that the dataset is suitable for use with machine learning methods.)

Analyzing Efficiency of High-Performance Applications + (Abstract)
Analyzing Scientific Workflow Management Systems + (Abstract)
Commit-basierte kontinuierliche Integration von Leistungsmodellen + (Abstract)
Concept and Implementation of a Delta Chain + (Abstract)
Definition einer Referenzarchitektur für organisationsübergreifende Zusammenarbeit in modellbasierten Entwicklungsprozessen zur Wahrung des geistigen Eigentums + (Abstract)
Efficient Reduction of Energy Time Series + (Abstract)
Entwurf eines Migrationsverfahren für Microsoft Access Anwendungen + (Abstract)
Erzeugung von Verschlüsselungsregeln auf Modelländerungen aus Zugriffskontrollregeln auf Modellelementen + (Abstract)
Evaluation und Optimierung der Wartbarkeit von Software-Architekturen + (Abstract)
Extraktion von Label-Propagationsfunktionen für Informationsflussanalysen aus architekturellen Verhaltensbeschreibungen + (Abstract)
Iterative Quelltextanalyse für Informationsflusssicherheit zur Überprüfung von Vertraulichkeit auf Architekturebene + (Abstract)
Optimierung des Migrationsverfahrens in modellbasierten E/E-Entwicklungswerkzeugen durch bedarfsorientierte Prozessierung der Historie von Bestandsmodellen + (Abstract)
Retrieval-Augmented Large Language Models for Traceability Link Recovery + (Abstract)
Source-Target-Mapping von komplexen Relationen in Modell-zu-Modell-Transformationen + (Abstract)

Exploring The Robustness Of The Natural Language Inference Capabilties Of T5 + (Large language models like T5 perform excellently on various NLI benchmarks. However, it has been shown that even small changes in the structure of these tasks can significantly reduce accuracy. I build upon this insight and explore how robust the NLI skills of T5 are in three scenarios. First, I show that T5 is robust to some variations in the MNLI pattern, while others degrade performance significantly. Second, I observe that some other patterns that T5 was trained on can be substituted for the MNLI pattern and still achieve good results. Third, I demonstrate that the MNLI pattern translates well to other NLI datasets, even improving accuracy by 13% in the case of RTE. All things considered, I conclude that the robustness of the NLI skills of T5 depends on which alterations are applied.)
Theory-Guided Data Science for Lithium-Ion Battery Modeling + (Lithium-ion batteries are driving innovation in the evolution of electromobility and renewable energy. These complex, dynamic systems require reliable and accurate monitoring through Battery Management Systems to ensure the safety and longevity of battery cells. Therefore, an accurate prediction of the battery voltage is essential; it is currently realized by so-called Equivalent Circuit (EC) models. Although state-of-the-art approaches deliver good results, they are hard to train due to the high number of variables, lack the ability to generalize, and need to make many simplifying assumptions. In contrast to theory-based models, purely data-driven approaches require large datasets and are often unable to produce physically consistent results. Theory-Guided Data Science (TGDS) aims at using scientific knowledge to improve the effectiveness of data science models in scientific discovery. This concept has been very successful in several domains, including climate science and material research. Our work is the first to apply TGDS to battery systems, in close collaboration with domain experts. We compare the performance of different TGDS approaches against each other as well as against two baselines: purely theory-based EC models and black-box machine learning models.)
Attention Based Selection of Log Templates for Automatic Log Analysis + (Log analysis serves as a crucial preprocessing step in text log data analysis, including anomaly detection in cloud system monitoring. However, selecting an optimal log parsing algorithm tailored to a specific task remains problematic. With many algorithms to choose from, each requiring proper parameterization, making an informed decision becomes difficult. Moreover, the selected algorithm is typically applied uniformly across the entire dataset, regardless of the specific data analysis task, often leading to suboptimal results. In this thesis, we evaluate a novel attention-based method for automating the selection of log parsing algorithms, aiming to improve data analysis outcomes. We build on the success of a recent master's thesis, which introduced this attention-based method and demonstrated its promising results for a specific log parsing algorithm and dataset. The primary objective of our work is to evaluate the effectiveness of this approach across different algorithms and datasets.)
Metamodel Evolution in the Context of a MOF-Based Metamodeling Infrastructure + (Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.)
Evaluation of Automated Feature Generation Methods + (Manual feature engineering is a time-consuming and costly activity when developing new Machine Learning applications, as it involves manual labor by a domain expert. Therefore, efforts have been made to automate the feature generation process. However, there exists no large benchmark of these Automated Feature Generation methods. It is therefore not obvious which method performs well in combination with specific Machine Learning models and what the strengths and weaknesses of these methods are. In this thesis, we present an evaluation framework for Automated Feature Generation methods that is integrated into the scikit-learn framework for Python. We integrate nine Automated Feature Generation methods into this framework. We further evaluate the methods on 91 datasets for classification problems. The datasets in our evaluation have up to 58 features and 12,958 observations. As Machine Learning models, we investigate five models, including state-of-the-art models like XGBoost.)
Surrogate Model Based Process Parameters Optimization of Textile Forming + (Manufacturing optimization is crucial for organizations to remain competitive in the market. However, complex processes, such as textile forming, can be challenging to optimize, requiring significant resources. Surrogate-based optimization is an efficient method that uses simplified models to guide the search for optimal parameter combinations of manufacturing processes. Moreover, incorporating uncertainty estimates into the model can further speed up the optimization process, which can be achieved by using Bayesian deep neural networks. Additionally, convolutional neural networks can take advantage of spatial information in the images that are part of the textile forming parameters. In this work, a Bayesian deep convolutional surrogate model is proposed that uses all available process parameters to predict the shear angle of a textile element. By incorporating background information into the surrogate model, it is expected to predict detailed process results, leading to greater efficiency and increased product quality.)
Streaming Model Analysis - Synergies from Stream Processing and Incremental Model Analysis + (Many modern applications take a potentially infinite stream of events as input to interpret and process the data. The established approach to handle such tasks is called Event Stream Processing. The underlying technologies are designed to process this stream efficiently, but applications based on this approach can become hard to maintain as the application grows. A model-driven approach can help to manage increasing complexity and changing requirements. This thesis examines how a combination of Event Stream Processing and Model-Driven Engineering can be used to handle an incoming stream of events. An architecture that combines these two technologies is proposed, and two case studies have been performed. The DEBS grand challenges from 2015 and 2016 have been used to evaluate applications based on the proposed architecture with respect to their performance, scalability and maintainability. The results show that they can be adapted to a variety of change scenarios at an acceptable cost, but that their processing speed is not competitive.)
Empirical Identification of Performance Influences of Configuration Options in High-Performance Applications + (Many modern high-performance applications are highly configurable software systems that provide hundreds or even thousands of configuration options. System administrators or application users need to understand all these options and their impact on software performance to choose suitable configuration values. To understand the influence of configuration options on the run-time characteristics of a software system, users can use performance prediction models, but building performance prediction models for highly configurable high-performance applications is expensive. However, not all configuration options that a software system offers are performance-relevant. Removing these performance-irrelevant configuration options from the modeling process can reduce the construction cost. In this thesis, we explore and analyze two different approaches to empirically identify configuration options that are not performance-relevant and can be removed from the performance prediction model. The first approach reuses existing performance modeling methods to create much cheaper prediction models by using fewer samples and then analyzing the models to identify performance-irrelevant configuration options. The second approach uses white-box knowledge acquired through dynamic taint analysis to systematically construct the minimal number of required experiments to detect performance-irrelevant configuration options. In the evaluation with a case study, we show that the first approach identifies performance-irrelevant configuration options but also produces misclassifications. The second approach did not perform to our expectations. Further improvement is necessary.)
Enabling the Information Transfer between Architecture and Source Code for Security Analysis + (Many software systems have to be designed and developed in a way that specific security requirements are guaranteed. Security can be specified on different views of the software system that contain different kinds of information about the software system. Therefore, a security analysis on one view must assume security properties of other views. A security analysis on another view can be used to verify these assumptions. We provide an approach for enabling the information transfer between a static architecture analysis and a static, lattice-based source code analysis. This approach can be used to reduce the assumptions in a component-based architecture model. In this approach, requirements under which information can be transferred between the two security analyses are provided. We consider the architecture and source code security analysis as black boxes. Therefore, the information transfer between the security analyses is based on a megamodel consisting of the architecture model, the source code model, and the source code analysis results. The feasibility of this approach is evaluated in a case study using Java Object-sensitive ANAlysis and Confidentiality4CBSE. The evaluation shows that information can be transferred between an architecture and a source code analysis. The information transfer reveals new security violations which are not found using only one security analysis.)
Auswirkungen von Metamodellen auf Modellanalysen + (Metamodels are the central artifact in model-driven software development. Although many quality attributes and evaluation mechanisms for metamodels are known, the effects of metamodels on other artifacts have not yet been studied empirically. This thesis deals with the impact of metamodels on other software development artifacts. More precisely, it examines to what extent the quality attributes of metamodels influence model analyses and model transformations. For this purpose, different artifacts are analyzed: the results of metamodel metrics, code metrics of model analyses and ATL transformations, and manual assessments of metamodels. The data is analyzed, correlations are determined, and dependencies are uncovered.)
Enabling Architectural Performability Analyses for Microservices via Design Pattern Completions + (Microservices architectures have gained popularity over recent years, especially since global players in the internet economy changed to this architectural style. Many architectural patterns for recurring problems were identified, such as Service Discovery for service registration or Client-side Load Balancing for load distribution. Architectural analyses with the Palladio framework allow for the investigation of the attainment of these requirements during design time. The Architectural Templates method combines architecture models with architectural patterns and styles and allows for design-time analyses. In this thesis, we create a Microservices Architectural Templates catalog containing microservices Architectural Templates. A selection of widely used patterns is analyzed and conceptually mapped to the Architectural Templates method. A case study, conducted with a sample application representing a customer relationship management application, shows that software architects can profit from the provided templates through automatic model completions and accurate analysis results.)
Differentially Private Event Sequences over Infinite Streams + (Data streams recorded by smart meters pose a threat to privacy, so there is a need for privacy mechanisms. The current state of the art for data streams is w-event differential privacy. So far, it has mainly been used for publishing histogram queries. The goal of this thesis is an in-depth experimental analysis of the mechanisms, with a focus on assessing how well they are suited for publishing sum queries as needed in the smart meter scenario. The thesis consists of three parts: (1) reproducing the good results reported in the literature for the most important w-event DP mechanisms on histogram queries, (2) evaluating their quality when applied to smart meter data (sum queries), and (3) evaluating the quality of two mechanisms with respect to guaranteeing pan-privacy, an extended guarantee. While we were largely unable to reproduce the results in (1), we achieved good results in (2). Regarding (3), we were able to confirm the theoretical quality analysis from the literature.)
Modellierung und Simulation von dynamischen Container-basierten Software-Architekturen in Palladio + (The Palladio Component Model (PCM) can be used to model and simulate software systems. Modern distributed software systems, however, are no longer simply deployed statically; instead, a desired state is defined that is then to be maintained by a control loop. This happens, for example, by starting or stopping containers and pods. In this thesis, an extension of the PCM with the concepts of container orchestration tools such as Kubernetes was developed and implemented. In addition, a concept for simulating dynamic container-based systems was developed. In particular, the allocation and reallocation of pods at simulation time was considered. Finally, the model extension was evaluated.)
Tradeoff zwischen Privacy und Utility für Short Term Load Forecasting + (The establishment of smart meters comes with various advantages and disadvantages. On the one hand, smart meters offer new ways to forecast energy consumption more accurately and thus make the smart grid easier to plan. On the other hand, a lot of private information can be extracted from energy consumption data, which implies new potential attack vectors on the privacy of end consumers. In the literature, privacy is protected by various perturbation methods. However, since perturbation alters the data, it leads to less accurate forecasts. A tradeoff therefore has to be found. In this thesis, several given perturbation techniques are compared experimentally with respect to their privacy (protection of privacy) and utility (accuracy of the forecasts). For this purpose, different datasets, forecasting algorithms, and metrics for assessing privacy and utility are used. The thesis concludes that the so-called Denoise and WeakPeak technique is particularly well suited for tuning a tradeoff between privacy and utility.)
Einbindung eines EDA-Programms zur Erstellung elektronischer Leiterplatten in das Vitruvius-Framework + (With model-driven software development, a software system, or its parts and abstractions, can be described by models during the development process. These models may depend on one another and may contain redundant information. To avoid inconsistencies, tools for automated consistency preservation are used. In this thesis, the EDA program Eagle, which is used to create electronic schematics and printed circuit boards, is integrated into the Vitruvius framework. This comprises deriving an Ecore metamodel that describes Eagle's schematic file, establishing transformations between Ecore models and schematic files, and extracting changes between two chronologically consecutive schematic files. The extracted changes are fed into the Vitruvius framework, which propagates them to Ecore models that stand in a consistency relation. In addition, a procedure is used to assign changes in the schematic file to a unique electronic component. This is necessary for tracking components across other programs, since a component's properties can vary between programs.)
Automated Extraction of Stateful Power Models for Cyber Foraging Systems + (Mobile devices are strongly resource-constrained in terms of computing and battery capacity. Cyber-foraging systems circumvent these constraints by offloading a task to a more powerful system in close proximity. Offloading itself induces additional workload and thus additional power consumption on the mobile device. Therefore, offloading systems must decide whether to offload or to execute locally. Power models, which estimate the power consumption for a given workload, can help to make an informed decision. Recent research has shown that various hardware components, such as wireless network interface cards (WNIC), cellular network interface cards, or GPS modules, have power states; that is, the power consumption behavior of a hardware component depends on its current state. Power models that consider power states (stateful power models) can be modeled as Power State Machines (PSMs). For systems with multiple power states, stateful models proved to be more accurate than models that do not consider power states (stateless models). Manually generating PSMs is time-consuming and limits the practicability of PSMs. Therefore, in this thesis, we explore the possibility of automatically generating PSMs. The contribution of this thesis is twofold: (1) we introduce an automated measurement-based profiling approach, and (2) we introduce a step-based approach which, provided with profiling data, automatically extracts PSMs along with tail states and state transitions. We evaluate the automated PSM extraction in a case study on an offloading speech recognition system. We compare the power consumption prediction accuracy of the generated PSM with that of a stateless regression-based model. Because we measure the power consumption of the whole system, we use the same CPU power model alongside all WiFi power models in order to predict the power consumption of the whole system. We find that a slightly adapted version of the generated PSM predicts the power consumption with a mean error of approx. 3% and an error of approx. 2% in the best case. In contrast, the regression model produces a mean error of approx. 19% and an error of approx. 18% in the best case.)
Inkrementelle Modellreduktion zur Verkürzung der Testzyklen in der Transformationsentwicklung + (Model-driven software development (MDD) is a software development paradigm in which the model plays a central role. In MDD, the problem domain is described abstractly and representatively by the model. Over the course of development, the model is refined step by step through model transformations and finally converted into program code. The larger and more complex the problem domain, the greater the number of model elements and the more complex the relationships among them. For this reason, transforming such a large model is time-consuming and error-prone. Tests are run repeatedly during development to ensure the correctness of the model and the transformation. The large number of elements in the model slows down testing and makes it harder to find the cause of errors in the model and in the transformation. This bachelor's thesis therefore investigated whether there is an excerpt of the model with the following properties: the excerpt contains only parts of the original model, and all errors of the complete model can still be represented with it. The cause and correction of the faulty model and the faulty transformation are not examined in this thesis. The thesis concentrates on creating and examining this excerpt of the model.)
Anytime Tradeoff Strategies with Multiple Targets + (Modern applications typically need to find solutions to complex problems under limited time and resources. In settings in which the exact computation of indicators is either infeasible or economically undesirable, the use of "anytime" algorithms, which can return approximate results when interrupted, is particularly beneficial, since they offer a natural way to trade computational power for result accuracy. However, modern systems typically need to solve multiple problems simultaneously. For example, in order to find high correlations in a dataset, one needs to examine each pair of variables. This is challenging, in particular if the number of variables is large and the data evolves dynamically. This thesis focuses on the following question: how should one distribute resources at any time in order to maximize the overall quality of multiple targets? First, we define the problem, considering various notions of quality and user requirements. Second, we propose a set of strategies to tackle this problem. Finally, we evaluate our strategies via extensive experiments.)
Outlier Analysis in Live Systems from Application Logs + (Modern computer applications tend to generate massive amounts of logs and have become so complex that it is often difficult to explain why applications failed. Locating outliers in application logs can help explain application failures. Outlier detection in application logs is challenging because (1) logs are unstructured, streaming text data, and (2) labeling application logs is labor-intensive and inefficient. Logs are similar to natural languages. The Transformer neural network, a recent deep learning architecture, has shown outstanding performance in Natural Language Processing (NLP) tasks. Based on these observations, we adapt the Transformer neural network to detect outliers in application logs in an unsupervised way. We compared our algorithm against state-of-the-art log outlier detection algorithms on three widely used benchmark datasets. Our algorithm outperformed the state-of-the-art log outlier detection algorithms.)
Subspace Search in Data Streams + (Modern data mining often takes place on high-dimensional data streams, which evolve at a very fast pace. On the one hand, the "curse of dimensionality" leads to a sparsely populated feature space, for which classical statistical methods perform poorly. Patterns, such as clusters or outliers, often hide in a few low-dimensional subspaces. On the other hand, data streams are non-stationary and virtually unbounded. Hence, algorithms operating on data streams must work incrementally and take concept drift into account. While "high-dimensionality" and the "streaming setting" provide two unique sets of challenges, we observe that the existing mining algorithms only address them separately. Thus, our plan is to propose a novel algorithm that keeps track of the subspaces of interest in high-dimensional data streams over time. We quantify the relevance of subspaces via a so-called "contrast" measure, which we are able to maintain incrementally in an efficient way. Furthermore, we propose a set of heuristics to adapt the search for the relevant subspaces as the data and the underlying distribution evolve. We show that our approach is beneficial as a feature selection method and, as such, can be applied to extend a range of knowledge discovery tasks, e.g., "outlier detection", in high-dimensional data streams.)
Bewertung verschiedener Parallelisierungsstrategien im Hinblick auf Leistungsfähigkeit von paralleler Programmausführung + (Modern processors increase performance by adding more cores. Software development therefore has to take care to parallelize program flows. Factors that can influence the performance of parallel program execution have already been categorized, but the influence of the chosen parallelization strategy is unknown. This bachelor's thesis investigated the influence of the chosen parallelization strategy on software performance. To this end, different hardware demands were used to generate individual work packages, which were then executed with different parallelization strategies. The strategies used are Java Threads, Java ParallelStreams, OpenMP, and Akka Actor. For each execution, the runtime and the cache behavior were measured. The experiments were run on several dedicated servers and on the BwUniCluster. The results were analyzed using speedup curves and the cache miss rate. They show that, for the work packages used, the parallelization strategies differ only slightly.)
Integrating Architecture-based Confidentiality Analysis with Code-based Information Flow Analysis + (Modern software systems must meet a wide range of security requirements, and these requirements appear to be getting stricter over time. Today, a software system that fails to meet confidentiality requirements often leads to the unintended disclosure of sensitive data. This is often associated with financial costs, since the GDPR has introduced and increased fines, but it can also damage a company's reputation and lead to the loss of customers. Many security vulnerabilities can arise from discrepancies between the architectural design and the implementation in code. For this reason, this thesis investigates the integration of a static, architecture-based confidentiality analysis with a static, code-based information flow analysis. By combining these two analyses, we aim to show that a discrepancy between design and implementation can be identified. The approach chosen in this thesis treats the architectural design as the intended behavior of the system. The artifacts required to run a code-based analysis are generated, and the analysis checks whether the properties defined on the architecture hold for the implementation. We evaluated the feasibility of the approach in a small study. In summary, this thesis aims to bridge the gap between the architectural view and the code view by linking confidentiality properties in both.)
Rekonstruktion von Komponentenmodellen für Qualitätsvorhersagen auf der Grundlage heterogener Artefakte in der Softwareentwicklung + (Modern software systems are often no longer built as monolithic applications; distributed architectures are on trend. The use of technologies such as Docker and Spring brings additional configuration files alongside the source code, which makes it harder to reconstruct the software architecture from source code alone. At the start of this thesis, several scientific works on the reconstruction of software architectures were examined. However, no work was found that both supports heterogeneous software artifacts and generates a model suitable for quality prediction. This thesis therefore presents a new approach that takes multiple heterogeneous software artifacts into account when reconstructing an architecture model. More specifically, the approach is implemented as a prototype for Java source code, Dockerfiles, Docker Compose files, and Spring configuration files. The target model is the Palladio Component Model, which is suited to analyses and simulations of performance and reliability. The thesis examines in more detail to what extent the information from the artifacts can be merged. The approach first transforms the artifacts into models. Two different procedures are considered for these transformations: Java source code is transferred into an existing metamodel using JDT, and for the remaining artifacts an Xtext grammar is proposed that can generate a suitable metamodel. The architecture of the approach was also designed so that the set of supported artifacts can easily be adapted or extended. Finally, the prototype implementation is described and evaluated. For this purpose, two case studies were selected and the architecture model of each project was extracted with the prototype. The results were then examined using previously defined metrics. This showed that the approach works and that the heterogeneous artifacts add value to the reconstruction of the architecture model.)
Monitoring Complex Systems with Domain Knowledge: Adapting Contextual Bandits to Tracing Data + (Monitoring in complex computing systems is crucial to detect malicious states or errors in program execution. Due to the computational complexity, it is not feasible to monitor all data streams in practice. We are interested in monitoring pairs of highly correlated data streams. However, we cannot compute the measure of correlation for every pair of data streams at each time step. Picking highly correlated pairs while exploring potentially more highly correlated ones is an instance of the exploration/exploitation problem. Bandit algorithms are a family of online learning algorithms that aim to optimize sequential decision making and balance exploration and exploitation. A contextual bandit additionally uses contextual information to make better decisions. In our work, we want to use a contextual bandit algorithm to keep an overview of highly correlated pairs of data streams. The context in our work contains information about the state of the system, given as execution traces. A key part of our work is to explore and evaluate different representations of the knowledge encapsulated in traces. We also adapt state-of-the-art contextual bandit algorithms to the use case of correlation monitoring.)
Integrating Structured Background Information into Time-Series Data Monitoring of Complex Systems + (Monitoring of time series data is increasingly important due to the massive amounts of data generated by complex systems, such as industrial production lines, meteorological sensor networks, or cloud computing centers. Typical time series monitoring tasks include forecasting future values, detecting outliers, and computing dependencies. However, the existing methods for time series monitoring tend to ignore background information, such as relationships between components or process structure, that is available for almost any complex system. Such background information gives context to the time series data and can potentially improve the performance of time series monitoring tasks. In this bachelor thesis, we show how to incorporate structured background information to improve three different time series monitoring tasks. We perform the experiments on data from a cloud computing center, where we extract background information from system traces. Additionally, we investigate different representations and quality of background information and conclude that its usefulness is independent of the concrete time series monitoring task.)
Pattern Matching for Microservices in a Container-Based Architecture + (Multiple containers, as packages of software code, can interact with each other in a network and together form a container-based architecture. Large architectures are hard to understand without any knowledge about the application or the underlying technologies. Therefore, this master thesis uses design pattern detection to break the complexity of one architecture representation down into multiple smaller pattern instances. A user who knows the general patterns in advance can thus understand the depicted pattern instances in a short period of time.)
Studienplanung mit Hilfe von Workflow-Verifikation: Fokus Dozentensicht + (An information system that supports students in planning their studies was developed as part of a student team project at the chair "Systeme der Informationsverwaltung". This system is now to be extended so that it can also support lecturers in scheduling their courses within the course offerings of the respective module handbook. In this thesis, a requirements analysis was carried out and a concept was developed for how the existing system can be extended. The chair already has extensive experience in data-supported verification of process flows using Petri nets. However, since a study plan, as a sequence of its courses, can be modeled as a process with involved data, this thesis examined and combined verification methods to enable data-value-based verification of Petri net models. Based on the results, tests were carried out to investigate to what extent such verification methods can check study plans for correctness. The tests and investigations showed that using verification methods for Petri nets to support such a system is possible under certain restrictions.)
Modellierung und Simulation von verteilter und wiederverwendbarer nachrichtenbasierter Middleware + (Message-oriented middleware (MOM) is used in various domains. There is a wide variety of MOMs, each with different goals or priorities. While some place particular emphasis on performance or availability, others aim to be applicable across the board. MOMs also offer a high degree of configurability. The goal of this master's thesis is to support software architects in choosing and configuring a MOM as early as the design phase. Existing modeling and prediction techniques neglect the influence of queues. As a result, certain effects of a MOM cannot be represented, for example the increase in a message's latency when the queue is full. The contributions of the master's thesis are: selecting and measuring a MOM to investigate effects and resource demands; performance modeling of a MOM with queues, followed by calibration; and a model transformation for reusing already existing model elements. The approach was evaluated using the SPECjms2007 benchmark.)
Automatisierte Gewinnung von Nachverfolgbarkeitsverbindungen zwischen Softwarearchitektur und Quelltext + (Traceability links between architecture and source code can extend knowledge about a system. Because of the effort required to create them, software projects often have no or only incomplete traceability information. This thesis investigates a two-step approach for automatically generating traceability links between architecture model elements and source code. To unify the creation of traceability links across different programming languages and architecture metamodels, the first step creates models from the available artifacts. The source code is converted into a model that is independent of the concrete programming language, using a metamodel based on the KDM specified by the OMG. For the second step, heuristics and aggregations that operate on the created models are defined and used to generate the traceability links. The heuristics use, for example, package, path, name, and method information. The evaluation of the approach uses a gold standard created for this purpose, comprising five case studies. Traceability links are generated for PCM, UML, Java, and Shell. The micro-average of the F1 score reaches 99.11%. If every component and interface contributes equally to the value, the F1 score is 93.71%. Overall, the approach of this thesis can therefore achieve very good results. For the TEAMMATES case study, the influence of consistency on the results is examined using several source code versions. The micro-average of the F1 score is 6.05 percentage points higher for the more consistent version. Consistency can therefore influence the quality of the results.)
Entity Recognition in Software Documentation Using Trace Links to Informal Diagrams + (Natural Language Software Architecture Documentation (NLSAD) and Software Architecture Models (SAMs) provide information about a software system's design and qualities. Inconsistencies between these artifacts can negatively impact the comprehension and evolution of the system. ArDoCo is an approach that was proposed in prior work by Keim et al. to find such inconsistencies; it relies on Traceability Link Recovery (TLR) between entities in the NLSAD and the SAM. ArDoCo searches for Unmentioned Model Elements (UMEs) in the model and Missing Model Elements (MMEs) in the text using the linkage information. ArDoCo's approach shows promising results but has room for improvement regarding precision due to falsely identified textual entities. This work proposes using informal diagrams from the Software Architecture Documentation (SAD) to improve this. The approach performs an additional TLR between the textual entities and the diagram entities. According to heuristics, the linkage of textual entities and diagram entities is utilized to increase or decrease the confidence in textual entities. The Diagram Text TLR and its impact on ArDoCo's performance are evaluated separately using the same data set as previous work by Keim et al. The data set was extended to include informal diagrams. The Diagram Text TLR achieves a good F1-score with Optical Character Recognition (OCR) of 0.54. The approach improves the MME detection (0.77→0.94 accuracy) by lowering the amount of falsely identified textual entities (0.39→0.69 precision) with a negligible impact on recall. The UME detection and ArDoCo's NLSAD to SAM TLR are slightly positively impacted and continue to perform excellently. The results show that using informal diagrams to improve entity recognition in the text is promising. Room for improvement exists in dealing with issues related to OCR and diagram element processing.)
Bestimmung von Aktionsidentität in gesprochener Sprache + (Natural language contains actions that can be executed. Within a discourse, people often describe an action several times. This does not necessarily mean that the action should be executed several times. This bachelor's thesis investigates how to recognize whether a mention of an action refers to an action that has already been mentioned. A procedure is developed that determines whether several action mentions in spoken language refer to the same action identity. In this procedure, actions are compared pairwise. The procedure is implemented and evaluated as an agent for the PARSE framework. The tool achieves an F1 score of 0.8 when the actions are recognized correctly and coreference information between entities is available.)
Performanzmodellierung von Apache Cassandra im Palladio-Komponentenmodell + (NoSQL database management systems are used as back ends for software in the big data domain because, compared with relational database management systems, they scale better, do not require a fixed database schema, and can be deployed easily in virtual systems. Apache Cassandra was chosen as an example of NoSQL database management systems because of its widespread use and its licensing as an open-source project. Existing models of Apache Cassandra consider only the maximum possible number of requests to Cassandra and their throughput and latency. Reducing this number increases the latency of the individual requests. The model created in this bachelor's thesis is intended to represent this effect, among others. The contributions of the thesis are creating and parameterizing a model of Cassandra in the Palladio Component Model and evaluating the model against benchmark results. To this end, a procedure is also developed that structures the collection of the necessary data as well as its analysis and evaluation, and automates and simplifies it as far as possible. The model is evaluated through automated simulations whose results are compared with the benchmarks. This demonstrated the applicability of the model for one thread and an arbitrary number of requests, with one or several different operations in use at the same time, with the exception of the scan operation.)
Analysis of Classifier Performance on Aggregated Energy Status Data + (Non-intrusive load monitoring (NILM) algorithms aim at disaggregating consumption curves of households to the level of single appliances. However, there is no conventional way of quantifying and representing the tradeoff between the quality of analyses, such as the accuracy of the disaggregated consumption curves, and the load on the available computing resources. Thus, it is hard to plan the underlying infrastructure and resources for the analysis system and to find the optimal configuration of the system. This thesis introduces a system that assesses the quality of different analyses and their runtime behavior. This assessment is based on varying configuration parameters and changed characteristics of the input dataset. The varied characteristics are the granularity and the noisiness of the data. We demonstrate that the collected runtime behavior data can be used to choose reasonable characteristics of the input dataset.)
Performancevorhersage für Container-Anwendungen (PdF) + (Nowadays, distributed applications are often not statically deployed on virtual machines. Instead, a desired state is defined declaratively. A control loop then tries to create the desired state in the cluster. Predicting the impact on the performance of a system using these deployment techniques is difficult. This paper introduces a method to predict the performance impact of using containers and container orchestration in the deployment of a system. Our proposed approach enables system simulation and experimentation with various mechanisms of container orchestration, including autoscaling and container scheduling. We validated this approach using a microservice reference application across different scenarios. Our findings suggest that the simulation could effectively mimic most features of container orchestration tools, and that the performance prediction of containerized applications in dynamic scenarios could be improved significantly.)
Enabling Consistency between Software Artefacts for Software Adaption and Evolution + (Nowadays, software systems are evolving at a pace never seen before. As a result, emerging inconsistencies between different software artifacts are almost inevitable. There are already approaches for automated consistency maintenance between source code and architecture models. However, these approaches have various limitations. Therefore, in this thesis, we present a comprehensive approach for supporting consistency preservation between software artifacts, with a special focus on software evolution and adaptation. At design time, source code analysis and consistency rules are used, while at run time, monitoring data is used as input for a transformation pipeline. In contrast to existing approaches, the automated derivation of the system composition is supported. Finally, self-validations were included as a central component of the approach. In a case-study-based evaluation, the accuracy of the models and the performance of the approach were measured. In addition, the scalability of the transformations within the pipeline was investigated.)
Injection Molding Simulation based on Graph Neural Networks (GNNs) + (Numerical filling simulations are an important tool for the development of injection molding parts. Existing simulations rely on numerical solvers based on the finite element method. These solvers are reliable and precise, but computationally very expensive, even on simple part geometries. In this thesis, we aim to develop a faster injection molding simulation based on Graph Neural Networks (GNNs) as a surrogate model. Our approach learns a simulation as a composition of three functions: an encoder, a processor, and a decoder. The encoder takes in a graph representation of the 3D geometry of an injection molding part and returns a numeric embedding of each node in the graph. The processor updates the embedding of each node multiple times based on its neighbors. The decoder then decodes the final embedding of each node into physically meaningful variables, such as the fill state of the node. Our model can predict the progression of the flow front during a time step of fixed size. To simulate a full mold filling process, our model is applied sequentially until the entire mold is filled. Our architecture is applicable to any kind of material, geometry, and injection process parameters. We evaluate our architecture by its accuracy and runtime when predicting node properties. We also evaluate our model's transfer learning ability on a real-world injection molding part.)
Optimizing Parametric Dependencies for Incremental Performance Model Extraction + (During the development phase of a software system, engineers often face different implementation alternatives. To test several options without investing the resources needed to implement each of them, a so-called performance model is used. With a performance model, developers can simulate the system in diverse scenarios and conditions. To minimize the differences between the real system and its model, i.e., to improve the accuracy of the model, parametric dependencies are introduced. They express a relation between the input arguments and the performance model parameters of the system. The latter can be loop iteration counts, branch transition probabilities, resource demands or external service call arguments. Existing works in this field have two major shortcomings: they either do not perform incremental calibration of the performance model (updating only the parts of the source code changed since the last commit), or do not consider dependencies more complex than linear ones. This work is part of the approach for the continuous integration of performance models. Our aim is to identify parametric dependencies for external service calls, as well as to optimize the existing dependencies for the other types of performance model parameters. We propose using two machine learning algorithms for detecting initial dependencies and then refining the mathematical expressions with a genetic programming algorithm. Our contribution also includes feature selection of the candidates for a dependency and the consideration not only of input service arguments but also of the data flow, i.e., the return values of previous external calls.)
Automatically detecting Performance Regressions + (One of the most important aspects of software engineering is system performance. Common approaches to verify acceptable performance include running load tests on deployed software. However, complicated workflows and requirements, such as the necessity of deployments and extensive manual analysis of load test results, cause tests to be performed very late in the development process, so that feedback on potential performance regressions becomes available long after they were introduced. With this thesis, we propose PeReDeS, an approach that integrates into the development cycle of modern software projects and explicitly models an automated performance regression detection system that provides feedback quickly and reduces manual effort for setup and load test analysis. PeReDeS is embedded into pipelines for continuous integration, manages the load test execution and lifecycle, processes load test results and makes feedback available to the authoring developer via reports on the coding platform. We further propose a method for detecting deviations in performance on load test results, based on Welch's t-test. The method is adapted to suit the context of performance regression detection and is integrated into the PeReDeS detection pipeline. We implemented our approach and assessed the usability and accuracy of our method with a user study and a data-driven study.)
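As a rough illustration of the statistic named in the abstract above, the following minimal sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two samples of response times. It is not the PeReDeS implementation; the function name `welch_t` and the sample values are assumptions made for this example, and the adaptations mentioned in the abstract are not modeled.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(baseline, candidate):
    """Return (t, df) of Welch's unequal-variance t-test.

    Hypothetical helper for illustration: `baseline` and `candidate`
    are lists of response times from two load test runs.
    """
    n1, n2 = len(baseline), len(candidate)
    # Sample variance of each run divided by its sample size.
    s1 = variance(baseline) / n1
    s2 = variance(candidate) / n2
    # Difference of means scaled by the combined standard error.
    t = (mean(candidate) - mean(baseline)) / sqrt(s1 + s2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return t, df


# Made-up response times in milliseconds.
t, df = welch_t([100, 102, 98, 101, 99], [110, 112, 108, 111, 109])
```

A large positive t would suggest that the candidate run is slower than the baseline; turning t and df into a p-value requires the t distribution, for example from a statistics library.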
Evaluating architecture-based performance prediction for MPI-based systems + (One research field of High Performance Computing (HPC) is computing clusters. Computing clusters are distributed memory systems in which different machines are connected through a network. To communicate with each other, the machines need the ability to pass messages through the network. The Message Passing Interface (MPI) is the standard for implementing parallel programs on distributed memory systems. Several approaches have been proposed to enable software architects to predict the performance of MPI-based systems. However, those approaches either depend on an existing implementation of a program or are tailored to specific programming languages or use cases. In our approach, we use the Palladio Component Model (PCM), which allows us to model component-based architectures and to predict the performance of the modeled system. We modeled different MPI functions in the PCM that serve as reusable patterns, as well as a communicator that is required for the MPI functions. The expected benefit is to provide patterns for different MPI functions that allow precise modeling of MPI-based systems in the PCM and, in turn, a precise performance prediction of a PCM instance.)
Batch query strategies for one-class active learning + (One-class classifiers learn to distinguish normal objects from outliers. These classifiers are therefore suitable for strongly imbalanced class distributions with only a small fraction of outliers. Extensions of one-class classifiers make use of labeled samples to improve classification quality. As this labeling process is often time-consuming, one may use active learning methods to detect samples where obtaining a label from the user is worthwhile, with the goal of reducing the labeling effort to a fraction of the original data set. In one-class active learning, this labeling process consists of sequential queries, where the user labels one sample at a time. While batch queries, where the user labels multiple samples at a time, have potential advantages such as parallelizing the labeling process, their application has so far been limited to binary and multi-class classification. In this thesis, we explore whether batch queries can be used for one-class classification. We strive towards a novel batch query strategy for one-class classification by applying concepts from multi-class classification to the requirements of one-class active learning.)
Performance Modeling of Distributed Computing + (Optimizing resource allocation in distributed computing systems is crucial for enhancing system efficiency and reliability. Predicting job execution metadata based on resource demands and platform characteristics plays a key role in this optimization process. Distributed computing simulators are used for this purpose to model and predict system behavior. Among the various simulators developed in recent decades, this thesis focuses on the state-of-the-art simulator DCSim. DCSim simulates the nodes and links of the configured platform, generates the workloads according to configured parameter distributions, and performs the simulations. The simulated job execution metadata is accurate, yet the simulations demand computational resources and time that increase superlinearly with the number of simulated nodes. In this thesis, we explore the application of Recurrent Neural Networks and Transformer models for predicting job execution metadata within distributed computing environments. We focus on data preparation, model training, and evaluation for handling numerical sequences of varying lengths. This approach enhances the scalability of predictive systems by leveraging deep neural networks to interpret and forecast job execution metadata based on simulated or historical data. We assess the models across four scenarios of increasing complexity, evaluating their ability to generalize to unseen jobs and platforms. We examine the training duration and the amount of data necessary to achieve accurate predictions and discuss the applicability of such models to overcome the scalability challenges of DCSim. The key findings of this work demonstrate that the models are capable of generalizing across sequences of lengths encountered during training but fall short in generalizing across different platforms.)
Density-Based Outlier Detection Benchmark on Synthetic Data + (Outlier detection algorithms are widely used in application fields such as image processing and fraud detection. Consequently, many different outlier detection algorithms have been developed in recent years. While a lot of work has been put into comparing the efficiency of these algorithms, comparing methods in terms of effectiveness is rather difficult. One reason for that is the lack of commonly agreed-upon benchmark data. In this thesis, the effectiveness of density-based outlier detection algorithms (such as KNN, LOF and related methods) is compared on entirely synthetically generated data, using the data's underlying density as ground truth.)
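To make the kind of method compared in the abstract above more concrete, here is a minimal sketch of one common k-nearest-neighbor outlier score: the distance of each point to its k-th nearest neighbor. It is only an assumed illustration and not the benchmark code of the thesis; the function name `knn_scores`, the choice of k and the sample points are hypothetical.

```python
from math import dist


def knn_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Hypothetical helper for illustration: a larger score means the
    point lies in a sparser region and is more likely an outlier.
    """
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point.
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


# Four points on a unit square plus one far-away point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_scores(points, k=2)
most_outlying = scores.index(max(scores))
```

This brute-force version compares all pairs of points, which is adequate for small illustrative data sets; larger benchmarks would typically rely on a spatial index.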
High-Dimensional Neural-Based Outlier Detection + (Outlier detection in high-dimensional spaces is a challenging task because of the curse of dimensionality. Neural networks have recently gained popularity for a wide range of applications due to the availability of computational power and large training data sets. Several studies examine the application of different neural network models, such as autoencoders, self-organising maps and restricted Boltzmann machines, for outlier detection in mainly low-dimensional data sets. In this diploma thesis, we investigate whether these neural network models can scale to high-dimensional spaces, adapt the useful neural network-based algorithms to the task of high-dimensional outlier detection, examine data-driven parameter selection strategies for these algorithms, develop suitable outlier score metrics for these models and investigate the possibility of identifying the outlying dimensions for detected outliers.)