Data-Preparation for Machine-Learning Based Static Code Analysis

From SDQ-Institutsseminar
Speaker Felix Griesau
Talk type Master's thesis
Advisor Robert Heinrich
Date Fri, April 1, 2022
Presentation mode online
Abstract Static Code Analysis (SCA) has become an integral part of modern software development, especially since the rise of automation in the form of CI/CD. How machine learning can best improve the state of SCA, and thus facilitate maintainable, correct, and secure software, remains an open question. However, machine learning needs a solid foundation to learn from. This thesis proposes an approach to build that foundation by mining data on software issues from real-world code. We show how we used this approach to analyze over 4000 software packages and generate over two million issue samples. Additionally, we propose a method for refining this data and apply it to an existing machine learning SCA approach.