The role of Mathematical Optimization to enhance Interpretability in Data Science

Abstract

Data Science aims to develop models that extract knowledge from complex data and represent it to aid Data-Driven Decision Making. Mathematical Optimization has played a crucial role across the three main pillars of Data Science, namely Supervised Learning, Unsupervised Learning and Information Visualization. For instance, Quadratic Programming is used in Support Vector Machines, a Supervised Learning tool; Mixed-Integer Programming is used in Clustering, an Unsupervised Learning task; and Global Optimization is used in Multidimensional Scaling, an Information Visualization tool.

Data Science models should strike a balance between accuracy and interpretability. Interpretability is desirable in applications such as medical diagnosis; it is required by regulators for models that aid decisions such as credit scoring; and since 2018 the EU has extended this requirement by imposing the so-called right to explanation. In this presentation, we discuss recent Mathematical Optimization models that enhance the interpretability of state-of-the-art Supervised Learning tools, such as nearest neighbors, classification trees and Support Vector Machines, while preserving their good learning performance.