
From SDQ-Institutsseminar

This is a property of data type Text.

Below, 20 pages are displayed on which a data value has been stored for this property.
Students are confronted with a huge number of regulations when planning their studies at a university. It is challenging for them to create a personalized study plan while still complying with all official rules. The STUDYplan software aims to overcome these difficulties by enabling intuitive and individual modeling of study plans. A study plan can be interpreted as a sequence of business process tasks that represent courses, which makes it possible to reuse existing work from the business process domain. This thesis focuses on the idea of synthesizing business process models from declarative specifications that express official and user-defined regulations for a study plan. We provide an elaborate approach for modeling study plan constraints and a generation concept specialized to study plans. This work motivates, discusses, partially implements and evaluates the proposed approach.  +
The prediction of material failure is useful in many industrial contexts such as predictive maintenance, where it helps reduce costs by preventing outages. However, failure prediction is a complex task. Typically, material scientists need to create a physical material model to run computer simulations. In real-world scenarios, the creation of such models is often not feasible, as the measurement of exact material parameters is too expensive. Material scientists can use material models to generate simulation data. These data sets are multivariate sensor value time series. In this thesis we develop data-driven models to predict upcoming failure of an observed material. We identify and implement recurrent neural network architectures, as recent research indicated that these are well suited for predictions on time series. We compare the prediction performance with traditional models that do not directly predict on time series but involve an additional step of feature calculation. Finally, we analyze the predictions to find abstractions in the underlying material model that lead to unrealistic simulation data and thus impede accurate failure prediction. Knowing such abstractions empowers material scientists to refine the simulation models. The updated models would then contain more relevant information and make failure prediction more precise.  +
Data flow has become increasingly important for business processes over the last few years. Nevertheless, data in workflows is often treated as a second-class object and is not sufficiently supported. In many domains, such as the energy market, the importance of compliance requirements stemming from legal regulations or specific standards has increased dramatically over the past few years. To be broadly applicable, compliance verification has to support data-aware compliance rules and consider data conditions within a process model. In this thesis we model the data flow of data objects for a scenario in the energy market domain. For this purpose we use a scientific workflow management system, namely Apache Taverna. The theoretical starting point for this thesis is a verification approach developed by the supervisors of this thesis. It formalizes BPMN process models by mapping them to Petri Nets and unfolding the execution semantics regarding data. We develop an algorithm for transforming Taverna workflows to BPMN 2.0. We then ensure the correctness of the data flow of the process model. For this purpose we analyse which compliance rules are relevant for the data objects and how to specify them using anti-patterns.  +
Static Code Analysis (SCA) has become an integral part of modern software development, especially since the rise of automation in the form of CI/CD. How machine learning can best help improve the state of SCA, and thus facilitate maintainable, correct, and secure software, remains an open question. However, machine learning needs a solid foundation to learn from. This thesis proposes an approach to build that foundation by mining data on software issues from real-world code. We show how we used that concept to analyze over 4000 software packages and generate over two million issue samples. Additionally, we propose a method for refining this data and apply it to an existing machine learning SCA approach.  +
A group of people with different personal preferences wants to find a solution to a problem with high variability. Making decisions in a group is problematic because a lack of communication leads to worse decision outcomes, and group dynamics and biases can lead to suboptimal decisions. Group decisions are generally complex, and the process that yields the decision is often unstructured, so its success is not reproducible. Groups have different power structures, and individuals usually have different interests. Moreover, finding solutions is a rather complex task, and group decisions can suffer from a lack of transparency. Product configuration can be used to support groups in their decision making. It allows constraints and dependencies in complex problems, as well as the solution space, to be mapped accurately. A group recommender supports the group in its configuration decisions. The goal is to show that these approaches can help a group with a configuration task in a configurator and can process individual preferences better than a human can. The benefit of this approach is that it reduces the need for the group to communicate directly: each user states their own preferences, and the group receives a recommendation based on them. This reduces problems that arise in group decisions, such as lack of communication and bias. Additionally, this shows the viability of combining group recommendation and configuration approaches.  +
Consistency preservation between two metamodels can be achieved by defining a model transformation that repairs inconsistencies. In that case, there exists a consistency relation between metamodels. When there are multiple interrelated metamodels, consistency relations form a network. In multi-model consistency preservation, we are interested in methods to preserve consistency in a network of consistency relations. However, combinations of binary transformations can lead to specific interoperability issues. The purpose of this thesis is the decomposition of relations, an optimization technique for consistency relation networks. In this thesis, we design a decomposition procedure to detect independent and redundant subsets of consistency relations. The procedure aims to help developers find incompatibilities in consistency relation networks.  +
Development processes for complex, software-intensive systems are nowadays characterized by cross-organizational collaboration. Organizations share various artifacts with one another and, building on them, make their contribution to system development. Changes to such shared artifacts are synchronized mainly through regular but infrequent meetings. In addition, the artifacts generally contain intellectual property that must be protected, even from participating organizations. We design a reference architecture that enables a constant flow of data in the cross-organizational exchange of artifacts while protecting intellectual property.  +
Outlier detection algorithms are widely used in application fields such as image processing and fraud detection. Consequently, many different outlier detection algorithms have been developed in recent years. While a lot of work has been put into comparing the efficiency of these algorithms, comparing methods in terms of effectiveness is rather difficult. One reason for that is the lack of commonly agreed-upon benchmark data. In this thesis, the effectiveness of density-based outlier detection algorithms (such as KNN, LOF and related methods) is compared on entirely synthetically generated data, using the underlying density of the data as ground truth.  +
Outlier detection is a popular topic in research, and a number of different approaches have been developed. Evaluating the effectiveness of these approaches, however, is rarely addressed. The lack of commonly accepted benchmark data is most likely one of the obstacles to a fair comparison of unsupervised outlier detection algorithms. This thesis compares the effectiveness of twelve density-based outlier detection algorithms in nearly 800,000 experiments over a broad range of algorithm parameters, using the probability density as ground truth.  +
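The idea of using the generating density as ground truth can be illustrated with a minimal sketch. It is not taken from either thesis: the 2-D standard normal data, the k-th-nearest-neighbor distance as the outlier score, and Spearman rank correlation as the effectiveness measure are all assumptions chosen for illustration.

```python
import math
import random


def knn_scores(points, k):
    """Outlier score of each point: distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def gaussian_density(p):
    """Density of the 2-D standard normal that generated the data (ground truth)."""
    return math.exp(-0.5 * (p[0] ** 2 + p[1] ** 2)) / (2 * math.pi)


def ranks(values):
    """Rank of each value (0 = smallest); ties are not treated specially."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation, assuming no ties (reasonable for float data)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Synthetic data with a known density (assumption: 2-D standard normal).
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]

scores = knn_scores(points, k=5)
density = [gaussian_density(p) for p in points]

# An effective detector assigns high scores where the true density is low,
# so the rank correlation between score and density should be strongly negative.
rho = spearman(scores, density)
print(f"Spearman correlation between kNN score and density: {rho:.2f}")
```

Because the density is known exactly, no labeled outliers are needed; any detector that outputs a score per point could be plugged in place of `knn_scores` and compared on the same footing.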
In view-based software development, views may share concepts and thus contain redundant or dependent information. Keeping the individual views synchronized is a crucial property to avoid inconsistencies in the system. In approaches based on a Single Underlying Model (SUM), inconsistencies are avoided by establishing the SUM as a single source of truth from which views are projected. To synchronize updates from views to the SUM, delta-based consistency preservation is commonly applied. This requires the views to provide fine-grained change sequences which are used to incrementally update the SUM. However, the functionality of providing these change sequences is rarely found in real-world applications. Instead, only state-based differences are persisted. Therefore, it is desirable to also support views which provide state-based differences in delta-based consistency preservation. This can be achieved by estimating the fine-grained change sequences from the state-based differences. This thesis evaluates the quality of estimated change sequences in the context of model consistency preservation. To derive such sequences, matching elements across the compared models need to be identified and their differences need to be computed. We evaluate a sequence derivation strategy that matches elements based on their unique identifier and one that establishes a similarity metric between elements based on the elements’ features. As an evaluation baseline, different test suites are created. Each test consists of an initial and changed version of both a UML class diagram and consistent Java source code. Using the different strategies, we derive and propagate change sequences based on the state-based difference of the UML view and evaluate the outcome in both domains. The results show that the identity-based matching strategy is able to derive the correct change sequence in almost all (97 %) of the considered cases. 
For the similarity-based matching strategy we identify two recurring error patterns across different test suites. To address these patterns, we provide an extended similarity-based matching strategy that reduces the frequency of the error patterns while introducing almost no performance overhead.  
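The identity-based strategy described above can be sketched as follows. This is an illustrative simplification, not the thesis's implementation: models are assumed to be plain dictionaries mapping a unique element ID to a flat dictionary of features, and the change types (`create`, `delete`, `set`, `unset`) are hypothetical names.

```python
def derive_changes(old, new):
    """Derive a change sequence from two model states, matching elements by ID."""
    changes = []
    # Elements whose ID disappeared were deleted.
    for eid in old:
        if eid not in new:
            changes.append(("delete", eid))
    for eid, feats in new.items():
        if eid not in old:
            # An unknown ID means a newly created element.
            changes.append(("create", eid, dict(feats)))
            continue
        # Same ID: the element was kept, so compare feature by feature.
        for name, value in feats.items():
            if old[eid].get(name) != value:
                changes.append(("set", eid, name, value))
        for name in old[eid]:
            if name not in feats:
                changes.append(("unset", eid, name))
    return changes


def apply_changes(state, changes):
    """Replay a change sequence on a copy of the given state."""
    result = {eid: dict(feats) for eid, feats in state.items()}
    for change in changes:
        op, eid = change[0], change[1]
        if op == "delete":
            del result[eid]
        elif op == "create":
            result[eid] = dict(change[2])
        elif op == "set":
            result[eid][change[2]] = change[3]
        elif op == "unset":
            del result[eid][change[2]]
    return result


old = {
    "c1": {"kind": "class", "name": "Client"},
    "c2": {"kind": "class", "name": "Order"},
}
new = {
    "c1": {"kind": "class", "name": "Customer"},  # renamed, same ID
    "c3": {"kind": "class", "name": "Invoice"},   # newly added
}

changes = derive_changes(old, new)
for change in changes:
    print(change)
```

With stable IDs, the rename of `c1` is recovered as a single `set` change instead of a delete followed by a create, which is the property that lets a consistent counterpart (for example, the Java class) be renamed rather than recreated. A similarity-based strategy would have to infer that correspondence from the features alone.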
Twitter has been identified as a relevant data source for modelling purposes in the last decade. In this work, our goal was to model the conversational dynamics of inflation development in Germany through Twitter data mining. To accomplish this, we summarized and compared Twitter data mining techniques for time series data from pertinent research. Then, we constructed five models for generating time series from topic-related tweets and user profiles of the last 15 years. Evaluating the models, we observed that several approaches, such as modelling user impact or adjusting for automated Twitter accounts, show promise. Yet, in the scenario of modelling inflation expectation dynamics, these more complex models did not yield a higher correlation between the German CPI and the resulting time series than a baseline approach.  +
The specification of a software-intensive system comprises a multitude of artifacts. These artifacts are not independent of one another but represent the same elements of the system in different contexts and representations. In this thesis, a new approach for keeping these overlaps between artifacts consistent was investigated in a case study. The idea is to model the commonalities of the artifacts explicitly and to propagate changes between artifacts via an intermediate model of these commonalities. The approach promises better understandability of the dependencies between artifacts and solves some problems of previous approaches to preserving their consistency. To implement the case study, a language was developed further in which the commonalities and their manifestations in the various artifacts can be expressed. We were able to add some fundamental functionality to the language and thereby implement 64% of the consistency relations in our case study. For the remaining consistency relations, further adaptations of the language are required. Evaluating the general applicability of the approach requires additional case studies.  +
In the early stages of developing a software architecture, many properties of the final system are still unknown or difficult to determine. There may be multiple viable architectures, but uncertainty about which architecture performs best. Software architects can use Design Space Exploration (DSE) to evaluate quality properties of architecture candidates to find the optimal solution. Design Space Exploration can be a resource-intensive process. An architecture candidate may feature certain properties which disqualify it from consideration as an optimal candidate, regardless of its quality metrics. An example of this would be confidentiality violations in data flows introduced by certain components or combinations of components in the architecture. If these properties can be identified early, quality evaluation can be skipped and the candidate discarded, saving resources. Currently, analyses for identifying such properties are performed separately from the design space exploration process. Optimal candidates are determined first, and analyses are then applied to individual architecture candidates. Our approach augments the PerOpteryx design space exploration pipeline with an additional architecture candidate filter stage, which allows existing generic candidate analyses to be integrated into the DSE process. This enables automatic execution of analyses on architecture candidates during DSE, and early discarding of unwanted candidates before quality evaluation takes place. We use our filter stage to perform data flow confidentiality analyses on architecture candidates, and further provide a set of example analyses that can be used with the filter. We evaluate our approach by running PerOpteryx on case studies with our filter enabled. Our results indicate that the filter stage works as expected: it analyzes architecture candidates and skips quality evaluation for unwanted candidates.  +
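The control flow of such a filter stage can be sketched in a few lines. This does not use or reflect the PerOpteryx API: the candidate representation, the `violates_confidentiality` analysis, and the `explore` loop are hypothetical stand-ins that only show where the filter sits relative to the costly quality evaluation.

```python
def violates_confidentiality(candidate):
    """Hypothetical analysis: a flow is a violation if data with some label
    reaches a component that is not cleared for that label."""
    clearance = candidate["clearance"]
    return any(
        label not in clearance.get(component, set())
        for label, component in candidate["flows"]
    )


def explore(candidates, analyses, evaluate_quality):
    """Run all filter analyses first; evaluate quality only for candidates
    that no analysis rejects."""
    evaluated, discarded = {}, []
    for candidate in candidates:
        if any(analysis(candidate) for analysis in analyses):
            discarded.append(candidate["name"])  # skip the costly evaluation
            continue
        evaluated[candidate["name"]] = evaluate_quality(candidate)
    return evaluated, discarded


calls = []  # records which candidates reach the quality evaluation


def costly_quality_evaluation(candidate):
    """Stand-in for an expensive simulation; returns a dummy metric."""
    calls.append(candidate["name"])
    return len(candidate["flows"])


candidates = [
    {
        "name": "A",
        "flows": [("public", "web"), ("secret", "db")],
        "clearance": {"web": {"public"}, "db": {"public", "secret"}},
    },
    {
        "name": "B",
        "flows": [("secret", "web")],  # secret data reaches the web component
        "clearance": {"web": {"public"}},
    },
]

evaluated, discarded = explore(
    candidates, [violates_confidentiality], costly_quality_evaluation
)
print("evaluated:", evaluated)
print("discarded:", discarded)
```

Taking the analyses as a list of predicates mirrors the idea of integrating existing generic analyses: further checks can be added without touching the exploration loop.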
This thesis develops an approach that unites automatic adaptation focused on performance optimization with an approach for operator integration. The approach uses automated design space exploration to optimize runtime architecture models of the application and combines it with a model-based approach to adaptation planning and execution that permits operator intervention during adaptation execution.  +
Business Process Model and Notation (BPMN) is a standard language for specifying business process models. It helps organizations around the world to analyze, improve and automate their processes. It is very important to make sure that those models are correct, as faulty models can do more harm than good. Many verification methods for BPMN concentrate only on control flow, so the importance of correct data flow is often neglected. Additionally, the few approaches that tackle this problem do so only superficially, ignoring certain important aspects such as data states. Because data objects with states can cause different types of errors than data objects without them, ignoring data states can lead to overlooking certain mistakes. This thesis addresses the problem of detecting data flow errors on the level of data states, while also taking optional data and alternative data into account. We propose a new transformation of BPMN models to Petri Nets and specify suitable anti-patterns. Using a model checker, we are then capable of automatically detecting data flow errors regarding data states. In combination with existing approaches, which detect control flow errors or data flow errors on the level of data values, business process designers will be able to show with higher certainty that their models are free of errors.  +
Using outlier detection algorithms, e.g., Support Vector Data Description (SVDD), for detecting outlying time-series usually requires extracting domain-specific attributes. However, this indirect way needs expert knowledge, making SVDD impractical for many real-world use cases. Incorporating "Global Alignment Kernels" directly into SVDD to compute the distance between time-series data bypasses the attribute-extraction step and makes the application of SVDD independent of the underlying domain. In this work, we propose a new time-series outlier detection algorithm, combining "Global Alignment Kernels" and SVDD. Its outlier detection capabilities will be evaluated on synthetic data as well as on real-world data sets. Additionally, our approach's performance will be compared to state-of-the-art methods for outlier detection, especially with regard to the types of detected outliers.  +
Detecting outlying time-series poses two challenges: First, labeled training data is rare, as it is costly and error-prone to obtain. Second, algorithms usually rely on distance metrics, which are not readily applicable to time-series data. To address the first challenge, one usually employs unsupervised algorithms. To address the second challenge, existing algorithms employ a feature-extraction step and apply the distance metrics to the extracted features instead. However, feature extraction requires expert knowledge, rendering this approach also costly and time-consuming. In this thesis, we propose GAK-SVDD. We combine the well-known SVDD algorithm to detect outliers in an unsupervised fashion with Global Alignment Kernels (GAK), bypassing the feature-extraction step. We evaluate GAK-SVDD's performance on 28 standard benchmark data sets and show that it is on par with its closest competitors. Comparing GAK with a DTW-based kernel, GAK improves the median Balanced Accuracy by 4%. Additionally, we extend our method to the active learning setting and examine the combination of GAK and domain-independent attributes.  +
This thesis focuses on the development of a database application that enables a comparative analysis between the Google Books Ngram Corpus (GBNC) and German news corpora. The GBNC provides a vast collection of books spanning various time periods, while the German news corpora encompass up-to-date linguistic data from news sources. Such a comparison aims to uncover insights into language usage patterns, linguistic evolution, and cultural shifts within the German language. Extracting meaningful insights from the compared corpora requires various linguistic metrics, statistical analyses and visualization techniques. By identifying patterns, trends and linguistic changes, we can uncover valuable information on how language usage evolves over time. This thesis provides a comprehensive framework for comparing the GBNC to other corpora, showcasing the development of a database application that not only enables valuable linguistic analyses but also sheds light on the composition of the GBNC by highlighting linguistic similarities and differences.  +
In the last decade, ample research has been produced regarding the value of user-generated data from microblogs as a basis for time series analysis in various fields. In this context, the objective of this thesis is to develop a domain-agnostic framework for mining microblog data (i.e., Twitter). Taking the subject-related postings of a time series (e.g., inflation) as its input, the framework will generate temporal data sets that can serve as a basis for time series analysis of the given target time series (e.g., inflation rate). To accomplish this, we will analyze and summarize the prevalent research related to microblog data-based forecasting and analysis, with a focus on the data processing and mining approach. Based on the findings, one or several candidate frameworks are developed and evaluated by testing the correlation of their generated data sets against the target time series they are generated for. While summative research on microblog data-based correlation analysis exists, it is mainly focused on summarizing the state of the field. This thesis adds to the body of research by applying summarized findings and generating experimental evidence regarding the generalizability of microblog data mining approaches and their effectiveness.  +
There are many data structures and indices that speed up kNN queries on time series. The existing indices are designed to work on the full time series only. In this thesis we develop a data structure that allows speeding up kNN queries in an arbitrary time range, i.e. for an arbitrary subsequence.  +
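The query that such a data structure is meant to accelerate can be stated as a brute-force baseline. The sketch below is not the thesis's data structure; it assumes equal-length series stored as lists, Euclidean distance, and a half-open index range `[start, end)`, and the function name `knn_in_range` is hypothetical.

```python
import heapq
import math


def knn_in_range(dataset, query, start, end, k):
    """Return the indices of the k series closest to the query when only the
    time range [start, end) is compared (linear scan, no index)."""
    q = query[start:end]

    def distance(i):
        # Euclidean distance between the two subsequences of the chosen range.
        return math.dist(dataset[i][start:end], q)

    return heapq.nsmallest(k, range(len(dataset)), key=distance)


dataset = [
    [0, 0, 0, 0],
    [1, 1, 5, 5],
    [9, 9, 0, 0],
]
query = [0, 0, 5, 5]

# The nearest neighbor depends on the range that is compared.
print(knn_in_range(dataset, query, 0, 2, k=2))  # first half of the series
print(knn_in_range(dataset, query, 2, 4, k=1))  # second half of the series
```

This scan touches every series for every query, which is the cost an index over arbitrary subsequences would aim to avoid; an index built for the full series cannot simply be reused because the neighbor ranking changes with the chosen range.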