||Datasets, like gene profiles from cancer patients, can have a large number of features. In order to apply prediction techniques, a lot of computing time and memory is needed. A solution to this problem is to reduce the number of features, whereby the main challenge is to still receive a satisfactory prediction performance afterwards. There are many state-of-the-art feature selection techniques, but they all have their limitations. We use Bayesian optimization, a technique to optimize expensive black-box-functions, and apply it to the problem of feature selection. Thereby, we face the challenge to adjust the standard optimization procedure to work with a discrete-valued search space, but also to find a way to optimize the acquisition function efficiently.
Overall, we propose 10 different Bayesian optimization feature selection approaches and evaluate their performance experimentally on 28 OpenML classification datasets. We do not only compare the approaches among themselves, but also to 9 state-of-the-art feature selection approaches. Our results state that especially four of our approaches perform well and can compete to most state-of-the-art approaches in terms of prediction performance. In terms of runtime, all our approaches do not perform outstandingly good, but similar to some filter and wrapper approaches.