Generative Adversarial Networks (GANs) have recently attracted growing interest because they can generate images of faces that look convincingly real. What else are they capable of, what risks could they pose in the long run, and what do they have in common with the emerging internet of the 1990s? We interviewed ML Conference speaker Xander Steenbrugge.

Editorial Team: In the abstract for your ML Conference session “Generative Media – an Emerging Industry”, you wrote that one of the most beautiful ideas in the Deep Learning Revolution of the past decade was the invention of Generative Adversarial Networks (GANs). So, would you first explain what GANs are essentially?

Xander Steenbrugge: GANs, or Generative Adversarial Networks, are a totally new approach to generative models, invented by Ian Goodfellow in 2014. In contrast to, say, classification models which classify images into categories, generative models can generate completely novel images – or any kind of data for that matter – by first learning what that data usually looks like from a dataset. This entire process usually happens completely unsupervised, i.e. without needing any labels.

Editorial Team: Could you explain how a GAN works in general?

Xander Steenbrugge: The central idea is that a GAN has two neural networks that are adversaries of each other. On the one hand, the Generator tries to create an image that looks as real as possible. The second model, the Discriminator, gets to see an image that could come from two sources: it’s either a real image from an actual dataset, or a fake image coming from the Generator. It then has to learn to tell the two apart, and this learning signal is used to improve both the Generator and the Discriminator. If one can keep these two models balanced, the end result is a Generator that can generate images which look very similar to the actual dataset that was used during training.
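The adversarial loop described above can be sketched in a few lines of NumPy. This is a toy illustration under strong assumptions, not how image GANs are built in practice: the "images" are single numbers drawn from a normal distribution, the Generator is a linear map g(z) = a*z + b, the Discriminator is a logistic unit D(x) = sigmoid(w*x + c), and the Generator uses the common non-saturating loss. All function names, parameter names, and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def d_loss(d, real, fake):
    """Discriminator loss: low when real samples score near 1 and fakes near 0."""
    w, c = d
    return (-np.mean(np.log(sigmoid(w * real + c)))
            - np.mean(np.log(1.0 - sigmoid(w * fake + c))))

def g_loss(g, d, z):
    """Generator loss: low when the Discriminator scores the fakes as real."""
    (a, b), (w, c) = g, d
    return -np.mean(np.log(sigmoid(w * (a * z + b) + c)))

def d_step(d, real, fake, lr=0.05):
    """One gradient-descent step on the Discriminator parameters (w, c)."""
    w, c = d
    p_real = sigmoid(w * real + c)
    p_fake = sigmoid(w * fake + c)
    grad_w = np.mean((p_real - 1.0) * real) + np.mean(p_fake * fake)
    grad_c = np.mean(p_real - 1.0) + np.mean(p_fake)
    return (w - lr * grad_w, c - lr * grad_c)

def g_step(g, d, z, lr=0.05):
    """One gradient-descent step on the Generator parameters (a, b).

    The gradient flows through the Discriminator, which is held fixed here.
    """
    (a, b), (w, c) = g, d
    p_fake = sigmoid(w * (a * z + b) + c)
    grad_a = np.mean((p_fake - 1.0) * w * z)
    grad_b = np.mean((p_fake - 1.0) * w)
    return (a - lr * grad_a, b - lr * grad_b)

def train(steps=200, batch=64, seed=0):
    """Alternate Discriminator and Generator updates on fresh batches."""
    rng = np.random.default_rng(seed)
    g, d = (1.0, 0.0), (0.1, 0.0)  # arbitrary starting parameters
    for _ in range(steps):
        real = rng.normal(4.0, 1.0, batch)  # toy "dataset": samples from N(4, 1)
        z = rng.normal(0.0, 1.0, batch)     # random noise fed to the Generator
        fake = g[0] * z + g[1]
        d = d_step(d, real, fake)
        g = g_step(g, d, z)
    return g, d
```

Calling `train()` returns the final Generator and Discriminator parameters. Such a small model is not guaranteed to settle at a good solution, which reflects the point above that the two models have to be kept balanced.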

Generated images – and how we can spot them

Editorial Team: Which different types of media can GANs deal with, and are they better suited for a certain type?

Xander Steenbrugge: The most impressive results from using GANs have been demonstrated in the image domain. The reason is that convolutional networks are simply very, very good at processing images. However, the core idea behind GANs can, in principle, be applied to many different data types such as audio or text.

Editorial Team: Some telltale signs show that an image of a person’s face is artificially generated, e.g. artifacts or mismatched earrings. As GANs continue to improve, do you expect these signs that are visible to the human eye will vanish? And if so, will there be other methods to ascertain whether an image has been created by a GAN?

Xander Steenbrugge: Looking at the quality improvements in generated images over the past five years, I am very certain that very soon we’ll be able to generate images indistinguishable from real ones in specific narrow domains such as faces or cars. Approximating the entire natural manifold of possible images, however, might prove to be a much more challenging task, as GAN results on a full ImageNet dataset look a lot worse than when trained on just faces.

At the same time, I’m very confident that detecting generated images will be doable with very similar types of models. You could, for example, take the discriminator and use it as a “fake detection” filter. The bigger challenge, though, will be to educate the general public that seeing a video of something no longer means it actually happened.
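The “fake detection” filter idea mentioned above can be sketched as a simple threshold on the Discriminator's output. This is only an illustration: `discriminator_score` is a hypothetical stand-in (a logistic unit on single numbers with hand-picked parameters `w` and `c`), where a real system would use a trained network, and the 0.5 threshold is an assumption.

```python
import numpy as np

def discriminator_score(x, w, c):
    """Hypothetical stand-in Discriminator: estimated probability that x is real."""
    return 1.0 / (1.0 + np.exp(-(w * x + c)))

def flag_as_fake(samples, w, c, threshold=0.5):
    """Return a boolean mask that is True where the score falls below the threshold."""
    scores = discriminator_score(np.asarray(samples, dtype=float), w, c)
    return scores < threshold
```

With `w=2.0` and `c=-4.0`, for example, the stand-in scores values below 2 as more likely fake than real, so `flag_as_fake([0.0, 4.0], 2.0, -4.0)` flags the first sample and passes the second.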

Future implications of GANs

Editorial Team: In what respect could GANs have a negative impact in the long run?

Xander Steenbrugge: A new industry of generative media will emerge over the next decade. I can foresee applications in the movie industry (licensing an actor’s face), Virtual Reality (avatars that look like their users), design, art, etc.

I believe that, in the long run, everybody will learn to understand that any type of media can now be “faked”. When you get a letter today saying that America is going to nuke China tomorrow, signed by Barack Obama, you won’t believe that’s true because you know anybody could have written that letter. A couple of years from now, everybody will have that same intuition for an HD video of Obama saying the same thing. The problem is that many people currently don’t have that intuition yet, and therein lies the biggest risk.

Editorial Team: What do you believe the positive effects of GANs will be?

Xander Steenbrugge: This is very hard to predict because it’s such a broad concept. It’s like asking “what will the positive effects of the internet be?” in the late 1990s. I believe that generative models have a very big future. In essence, these models can learn what the world is like by looking at data and then create new “realities” that never existed. Currently, most GANs only allow for creating random data samples. The true leap will come when we can query a generative model for a very specific output like “Generate an image of what my living room would look like if I bought this IKEA sofa and painted the east wall in this shade of orange.” These are called conditional samples and we are making fast progress towards this as well. In the end, I believe that generative models will become embedded in all our wearables, TV screens, smartphones, and more, and will give us a personalized lens through which to look at the world around us. Is that good or bad? I don’t think that’s the right question, as in my view technology itself is neutral. What you do with it is everyone’s personal choice.
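The conditional samples mentioned above can be illustrated with one common way of feeding a condition into a Generator: appending a one-hot class label to the random noise vector. The sketch below is an assumption-laden toy, not the interviewee's method. The single-layer `generate` function, its untrained `weights` and `bias`, and all names are hypothetical, so the outputs carry no meaning. It only shows where the condition enters.

```python
import numpy as np

def conditional_generator_input(z, label, num_classes):
    """Concatenate the noise vector z with a one-hot encoding of the requested class."""
    one_hot = np.zeros(num_classes)
    one_hot[label] = 1.0
    return np.concatenate([z, one_hot])

def generate(z, label, weights, bias, num_classes):
    """Hypothetical one-layer Generator: the same noise with a different label gives a different output."""
    x = conditional_generator_input(z, label, num_classes)
    return np.tanh(weights @ x + bias)
```

For a noise vector of length 8 and 3 classes, `weights` needs shape `(output_size, 11)`. Keeping `z` fixed and changing only `label` changes the output, which is what makes the sample "conditional".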

Editorial Team: Thank you for the interview!

Questions by Maika Möbus