Machine learning has become an integral part of our daily lives – whether as an essential component of social media services or as a simple helper for personal optimization. For some time now, most areas of science have also been influenced by machine learning in one way or another, as it opens up possibilities to derive findings and discoveries primarily from data.
The most common objective has probably always been the predictive accuracy of the models. With the rise of complex models such as deep neural networks, however, another goal has come to the fore in scientific applications: explainability. This means that machine learning models should be designed in such a way that they not only provide accurate estimates but also make it possible to understand why specific decisions are made and why the model operates the way it does. With this demand to move away from non-transparent black-box models, new fields of research have emerged, such as explainable artificial intelligence (XAI) and theory-guided/informed machine learning.
From transparency to explainability
Explainability is not a discrete state that either exists or does not; rather, it is a property that helps make results more trustworthy, allows models to be improved in a more targeted way, and enables scientific insights that did not exist before. Key elements of explainability are transparency and interpretability. Transparency is comparatively easy to achieve: the creator of a model can describe and motivate every step of the machine learning process. Even deep neural networks, often referred to as complete black boxes, are at least transparent in the sense that the relation between input and output can be written down in mathematical terms. The problem is therefore usually not that the model is inaccessible, but that it is too complex to fully understand how it works and how decisions are made. That is exactly where interpretability comes into play: it is achieved by transferring abstract and complex processes into a domain that a human can understand.
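As a purely illustrative example, a feed-forward network with L layers is transparent in exactly this sense: its input–output relation can be written out in closed form.

```latex
% A feed-forward network with L layers, written out explicitly:
% weight matrices W_l, bias vectors b_l, and a nonlinearity \sigma.
f(x) = W_L \,\sigma\!\big( W_{L-1}\,\sigma(\cdots\,\sigma(W_1 x + b_1)\,\cdots) + b_{L-1} \big) + b_L
```

Every weight and bias is known, yet once the network has many wide layers, the formula alone offers little insight into why a particular input leads to a particular output.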
A visualization tool often used in the sciences is the heatmap. Heatmaps highlight parts of the input data that are salient, important, or sensitive to occlusion, depending on the method used. They are displayed in the same space as the input, so when analyzing images, the heatmaps are themselves images of the same size. They can also be applied to other data as long as the data live in a human-understandable domain. One of the most prominent methods for neural networks is layer-wise relevance propagation, which is applied after the model has been trained: it uses the learned weights and the activations produced for a given input to propagate the output back into the input space. A different principle is pursued by model-agnostic approaches such as LIME (local interpretable model-agnostic explanations), which can be used with all kinds of methods – even non-transparent ones. The idea behind these approaches is to perturb the inputs and analyze how the output changes in response.
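This perturbation principle can be sketched in a few lines of code. The following example is only an illustration of the idea, not an implementation of LIME; the prediction function `predict`, the patch size, and the grey fill value are assumptions.

```python
import numpy as np

def occlusion_heatmap(predict, image, target_class, patch=8):
    """Minimal perturbation-based heatmap: occlude one patch at a time and
    record how much the score for `target_class` drops.

    `predict` is assumed to map an image of shape (H, W, C) to a vector of
    class probabilities; `image` is assumed to be a float array in [0, 1].
    """
    h, w, _ = image.shape
    baseline = predict(image)[target_class]              # score on the intact image
    heatmap = np.zeros((h // patch, w // patch))

    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch, :] = 0.5   # grey out one patch
            drop = baseline - predict(occluded)[target_class]
            heatmap[i // patch, j // patch] = max(drop, 0.0)

    return heatmap  # large values mark regions the prediction depends on
```

LIME itself goes a step further: it draws many random perturbations and fits a simple, interpretable surrogate model to the resulting outputs, but the underlying principle is the same.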
Nevertheless, domain knowledge is essential to achieve explainability for an intended application. Even if the processes within a model can be explained from a purely mathematical point of view, integrating knowledge from the respective application remains indispensable, not least to assess whether the results are meaningful.
Explainable machine learning in the natural sciences
The possibilities this opens up in the natural sciences are wide-ranging. In the biosciences, for example, the identification of individual whales from photographs plays an important role in analyzing their migration over time and space. Identification by an expert is accurate and based on specific features such as scars and shape. Machine learning methods can automate this process and are therefore in great demand, which is why the task has also been posed as a Kaggle challenge. Before such a tool is actually used in practice, the quality of the model can be assessed by analyzing the derived heatmaps (Fig. 1).
In this way it can be checked whether the model actually looks at relevant features in the image rather than insignificant ones such as the surrounding water. This allows the so-called clever Hans effect – making the right decisions for the wrong reasons – to be ruled out. Such an effect could occur, for example, if by chance a whale had always been photographed with a mountain in the background and the identification algorithm falsely took the mountain to be a feature of the whale. Human-understandable interpretations and their explanation by an expert are therefore essential for scientific applications, as they allow us to draw conclusions about whether the models operate as expected.
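As a hypothetical illustration of such a check, one could measure how much of a heatmap's relevance falls on an expert-annotated whale region; the mask and the scoring below are assumptions for the sake of example, not part of an established workflow.

```python
import numpy as np

def relevance_inside_mask(heatmap, mask):
    """Fraction of the total positive relevance that falls on the annotated
    object, e.g. an expert-drawn whale outline. Values close to 0 hint at a
    clever Hans effect: good predictions for the wrong reasons.

    `heatmap` and `mask` are assumed to be arrays of the same shape, with
    `mask` equal to 1 on the whale and 0 on water and background.
    """
    relevance = np.clip(heatmap, 0.0, None)   # ignore negative evidence
    total = relevance.sum()
    if total == 0:
        return 0.0
    return float((relevance * mask).sum() / total)
```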
Much further-reaching, however, is the application of explainable machine learning when the methods are set up not merely to confirm what we already expect, but to give us new scientific insights. A prominent approach is presented, for example, by Iten et al., in which physical principles are to be derived automatically from observational data without any prior knowledge. The idea is that the learned representation of the neural network is much simpler than the input data, so that explanatory factors of the system, such as physical parameters, are captured in a few interpretable elements such as individual neurons.
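The following sketch is meant only to convey this bottleneck idea; the use of PyTorch, the layer sizes, the number of latent neurons, and the question–answer interface are assumptions and not the architecture actually used by Iten et al.

```python
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    """Toy encoder-decoder with a tiny latent bottleneck: observations are
    compressed into a few latent neurons, and answers to questions about the
    system must be produced from that bottleneck alone."""

    def __init__(self, n_obs, n_question, n_latent=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_obs, 64), nn.ELU(),
            nn.Linear(64, n_latent),             # the few interpretable neurons
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_latent + n_question, 64), nn.ELU(),
            nn.Linear(64, 1),                    # predicted answer
        )

    def forward(self, observation, question):
        latent = self.encoder(observation)
        answer = self.decoder(torch.cat([latent, question], dim=-1))
        return answer, latent
```

After training on simulated data, one would inspect how each latent neuron varies with the known physical parameters of the system to see whether the network has rediscovered them.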
In combination with expert knowledge, techniques such as neural networks can thus recognize patterns that help us discover things that were previously unknown to us.