JAXenter: You talk about neural networks in your keynote. Can you give us a very concrete example of a neural network first?
Xander Steenbrugge: A neural network is a chain of trainable numerical transformations applied to some input data, yielding some output data. With this very general paradigm, we can now build anything from image classifiers and speech-to-text engines to programs that beat the best humans at chess or Go.
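That "chain of trainable numerical transformations" can be sketched in a few lines of plain Python. This is a minimal illustration, not a real trained model: the layer sizes, weights, and input below are toy assumptions.

```python
import math

def linear(x, weights, bias):
    """One trainable transformation: weighted sums plus biases."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    """Fixed nonlinearity applied between trainable layers."""
    return [max(0.0, v) for v in x]

def sigmoid(x):
    """Squashes scores into (0, 1), e.g. a class probability."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

# Toy parameters -- in practice these millions of numbers are learned from data.
W1, b1 = [[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]
W2, b2 = [[1.0, -1.0]], [0.0]

def network(x):
    # The chain: input -> linear -> relu -> linear -> sigmoid -> output
    return sigmoid(linear(relu(linear(x, W1, b1)), W2, b2))

output = network([1.0, 2.0])  # some input data in, some output data out
```

Training would then adjust `W1`, `b1`, `W2`, and `b2` so that the outputs match labeled examples, which is what makes the transformations "trainable".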
JAXenter: What’s behind the black box problem?
Xander Steenbrugge: One of the major problems with current Deep Learning techniques is that trained models are largely uninterpretable: they consist of millions of parameters that interact in very complicated ways to achieve the task they were trained for. You can’t just look at them and say, “Aha, so this is what it’s doing…” This makes it tricky to apply them in domains where safety and operational predictability are crucial. Across many application areas, we are left with a choice between a 90% accurate model we understand and a 99% accurate model we don’t. But if that model is in charge of diagnosing you and suggesting a medical treatment, which would you choose?
JAXenter: There are various ways to fool neural networks into making obvious mistakes, known as ‘adversarial attacks‘. What is the significance of adversarial attacks for ML applications, and how will this area develop?
Xander Steenbrugge: Adversarial attacks are significant because they pose a severe security risk for existing ML applications. The biggest problem is that most adversarial attacks are undetectable by humans, making them a nasty “under the radar” problem. Imagine a self-driving car that fails to recognize a stop sign because someone stuck an adversarial sticker on it; there is no need to explain why this is a very serious issue. Adversarial examples have exposed a weakness in the current generation of neural network models that is not present in our biological brains. Many research groups are now working to fix it, very likely paving the road for exciting new discoveries and potential breakthroughs in the field of AI.
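One well-known recipe for crafting such examples is the fast gradient sign method (FGSM): nudge every input feature slightly in the direction that most increases the model's loss. The sketch below applies it to a toy logistic model; the weights, input, and step size are illustrative assumptions, not a real classifier.

```python
import math

w = [0.8, -0.5, 0.3]  # toy "trained" weights

def predict(x):
    """Probability of class 1 under a simple logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, true_label, eps):
    """Fast gradient sign method on the logistic loss.

    For logistic loss, d(loss)/d(x_i) = (p - y) * w_i, so each feature
    is nudged by eps in the sign of that gradient.
    """
    p = predict(x)
    return [xi + eps * math.copysign(1.0, (p - true_label) * wi)
            for xi, wi in zip(x, w)]

x_clean = [1.0, 1.0, 1.0]              # clean input, true label 1
x_adv = fgsm(x_clean, 1, eps=0.5)      # small per-feature perturbation
# predict(x_adv) is lower than predict(x_clean): the same small nudges
# that a human would barely notice push the model toward the wrong answer.
```

Real attacks on image classifiers work the same way, but over thousands of pixels with perturbations small enough to be invisible, which is exactly why they fly under the radar.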
JAXenter: Why can’t black boxes be interpreted and what approaches is research taking in this context?
Xander Steenbrugge: Neural nets are uninterpretable because there are too many parameters, too many moving parts, for a human to follow. The research community is now actively working on tools to bridge this gap. The first successful techniques generate pictures of what individual neurons in the network are looking at, giving an idea of the active components in the network. More recent work also tries to create trainable interfaces that map a network’s decision process onto a representation humans can interpret, using techniques like attention and even natural language.
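A simple relative of those visualization techniques is a gradient-based saliency score: measure how sensitive the model's output is to each input feature, and rank the features by that sensitivity. The sketch below uses finite differences on a stand-in scoring function; the function and input are illustrative assumptions.

```python
def model(x):
    # Stand-in for an opaque trained network: just some scoring function.
    return 2.0 * x[0] - 0.1 * x[1] + 0.5 * x[0] * x[2]

def saliency(f, x, h=1e-5):
    """Finite-difference sensitivity of f to each input feature."""
    scores = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += h
        scores.append(abs(f(bumped) - f(x)) / h)
    return scores

x = [1.0, 1.0, 1.0]
s = saliency(model, x)
# s[0] is by far the largest: the "decision" hinges mostly on feature 0,
# which is the kind of human-readable summary interpretability tools aim for.
```

On an image classifier, the same idea (computed with backpropagation rather than finite differences) produces a heatmap over pixels showing which parts of the image drove the prediction.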