ML CONFERENCE Blog

AI as a Superpower: LAION and the Role of Open Source in Artificial Intelligence

An Interview with LAION co-founder Christoph Schuhmann

Jun 21, 2023

In early March of this year, we had the pleasure of talking with Christoph Schuhmann, co-founder of the open-source AI organization LAION. We spoke with him about the organization's founding, the datasets and models it has produced, and the future of open-source AI development.

**devmio: Hello, Christoph! Could you tell us what LAION is and what role you play there?**


**Christoph Schuhmann:** LAION stands for Large-Scale Artificial Intelligence Open Network. First and foremost, it’s simply a huge community of people who share the dream of open-source AI models, research, and datasets. That’s what connects us all. We have a [Discord server](https://discord.com/invite/xBPBXfcFHd) where anyone can come in and share a bit about the latest research in the field. You can also propose a new project and find people to work on it with you. And if you ask the mods, me, or other people, you might even get a channel for your project. That’s basically the core.


When we had such surprising success with our first dataset, [LAION-400M](https://laion.ai/blog/laion-400-open-dataset/), we set up a small non-profit association that doesn’t actually do anything. We have a bank account with a bit of money coming into it from a few companies that support us, primarily Hugging Face but also Stability AI, although we’re mostly supported not with money but with cloud compute.


Stability AI, for example, has a huge cluster with 4,000, or now 5,600, GPUs, and we, or members approved by our core team, can use preemptible GPUs there, that is, GPUs that are idle at the moment.


**devmio: So we can just come to you and contribute? Propose our ideas and ask for help with our projects or help with ongoing projects?**


**Christoph Schuhmann:** Exactly! You can now come to our Discord server and say that you want to contribute to a project or help us with PR or whatever. You are most welcome!


**devmio: Is LAION based in Germany? And you are the chairman and co-founder?**


**Christoph Schuhmann:** Exactly. I am a physics and computer science teacher, I have been regularly involved with machine learning, and I also have a background in reform-oriented education. Seven or eight years ago, I made a Kickstarter-funded documentary about schools where you can learn without grades or a set curriculum. After that took off, I made tutorials on how to start such an independent school. So I knew how to set up a grassroots non-profit organization. I am not paid for my work at LAION.


## The Beginnings of LAION


**devmio: How did LAION come to life? How did you get to know the other members?**


**Christoph Schuhmann:** I actually started LAION after reading a lot about deep learning and machine learning and doing online courses in my spare time over the last five to six years. When the first version of DALL-E was published at the beginning of 2021, I was totally shocked by how good it was. At that time, however, many non-computer scientists didn’t find it that impressive.


I then asked on a few machine learning Discord servers what we would need to replicate something similar and make it open source. There was a well-known open-source programmer at the time called Philip Wang (his alias on GitHub is lucidrains) who is a legend in the community because whenever a new paper comes out, he has the associated codebase implemented within a few days. He also built an implementation of the first version of DALL-E in PyTorch called [DALLE-pytorch](https://github.com/lucidrains/DALLE-pytorch). A few people on Discord then trained this model on small datasets, and that was the proof of concept.


But the data was missing, so I suggested going to [Common Crawl](https://commoncrawl.org/), a non-profit from Seattle that scrapes HTML code from the internet every two to three months and makes it available: a snapshot, so to speak, of the HTML code of all kinds of websites, amounting to a 250-terabyte zip file. I then suggested downloading a gigabyte as a test and wrote a script that extracts image tags together with their alt text and then uses the CLIP model to check how well image and text fit together.
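
The filtering step described here can be sketched in a few lines of Python. This is a minimal illustration, not LAION’s actual script: it uses the standard library’s `html.parser` to pull `(src, alt)` pairs out of `<img>` tags, and it assumes the image and text embeddings have already been computed by a CLIP model and are passed in as plain lists. The function names and the 0.3 similarity threshold are illustrative assumptions.

```python
from html.parser import HTMLParser
import math


class ImgAltExtractor(HTMLParser):
    """Collect (src, alt) pairs from <img> tags that carry non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)  # attribute values may be None
        src = attr.get("src")
        alt = (attr.get("alt") or "").strip()
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return all (image URL, alt text) candidates found in an HTML string."""
    parser = ImgAltExtractor()
    parser.feed(html)
    return parser.pairs


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keep_pair(image_emb, text_emb, threshold=0.3):
    # In a real pipeline both embeddings would come from CLIP's image and text
    # encoders; here they are plain lists, and the threshold is an assumption.
    return cosine(image_emb, text_emb) >= threshold
```

In a real pipeline, each candidate URL would still have to be downloaded and encoded before the similarity check; the sketch leaves that part out.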


Then two “machine learning nerds”, who were much better at it than I was at the time, implemented it efficiently but didn’t finish it. That was a shame, but they were developing [GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj), an open-source GPT variant, and therefore didn’t have the time.


Then, in the spring of 2021, I sat down, wrote a huge pile of spaghetti code in a Google Colab, and asked around on Discord who wanted to help me with it. Someone got in touch who, it later turned out, was only 15 at the time. He wrote a tracker, basically a server that manages lots of Colabs, each of which gets a small job, extracts a gigabyte, and then uploads the results. At that time, the first version was still using Google Drive.
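
A tracker of this kind boils down to a shared job queue. The sketch below is a hypothetical, in-memory version (the class and method names are invented for illustration): each worker, such as one Colab notebook, asks for a shard, reports back once it has uploaded its results, and shards whose worker stays silent for too long are handed out again.

```python
import time
from collections import deque


class Tracker:
    """Toy job tracker: hands out one shard per request and re-queues stale jobs."""

    def __init__(self, shard_ids, timeout_s=3600.0):
        self.pending = deque(shard_ids)
        self.in_progress = {}  # shard_id -> (worker_id, start_time)
        self.done = set()
        self.timeout_s = timeout_s

    def request_job(self, worker_id, now=None):
        """Give the calling worker the next shard, or None if nothing is left."""
        now = time.time() if now is None else now
        self._requeue_stale(now)
        if not self.pending:
            return None
        shard = self.pending.popleft()
        self.in_progress[shard] = (worker_id, now)
        return shard

    def report_done(self, shard):
        """Mark a shard as finished once its results have been uploaded."""
        if self.in_progress.pop(shard, None) is not None:
            self.done.add(shard)

    def _requeue_stale(self, now):
        # Workers such as free Colab sessions can vanish without notice,
        # so jobs that exceed the timeout go back to the end of the queue.
        stale = [s for s, (_, t0) in self.in_progress.items() if now - t0 > self.timeout_s]
        for shard in stale:
            del self.in_progress[shard]
            self.pending.append(shard)
```

Re-queuing on timeout is the key design choice here: volunteer and notebook workers are unreliable, so the tracker assumes any job may be lost. A production tracker would also persist its state instead of keeping it in memory.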


## The Road to the LAION-400M Dataset


It was a complete disaster because Google Drive wasn’t suitable for it, but it was the easiest thing we could do quickly. Then I looked for some people on a Discord server, made some more accounts, and then we ended up with 50 Google Colabs working all the time.


But it worked, and then, within a few weeks, we had filtered 3 million image-text pairs, which at the time was more than Google’s [Conceptual Captions](https://ai.google.com/research/ConceptualCaptions/), a very well-known dataset of 2019. That little success got us so much attention on the Discord server that people just started supporting us and writing things like, “I have 50 little virtual machines here from my work, you could use them, I don’t need them right now,” or “I have another 3090 lying around here with me, I can share it with you.”


After three months, we had 413 million filtered image-text pairs. That was our LAION-400M dataset. At the time, it was by far the largest freely available image-text dataset, over 30 times larger than [Google’s Conceptual 12M](https://github.com/google-research-datasets/conceptual-12m), with about 12 million pairs.


We then wrote a [blog post about our dataset](https://laion.ai/blog/laion-400-open-dataset/), and less than an hour later, I already had an email from the Hugging Face people wanting to support us. I then posted on the Discord server that if we had $5,000, we could probably create a billion image-text pairs. Shortly afterwards, someone agreed to pay that: “If it’s so little, I’ll pay it.” At some point, it turned out that the person had his own startup in text-to-image generation, and later he became the chief engineer of Midjourney.


As you can see, it was simply a huge community, just 100 people who only knew each other from chat groups with aliases. At some point, I suggested creating an association, with a bank account, etc. That’s how LAION was founded.


## Even Bigger: LAION-5B and LAION-Aesthetics


We then also got some financial support from Hugging Face and started working on LAION-5B, a dataset containing five billion image-text pairs. By the end of 2021, we had finished just under 70 percent of it, and then we were approached by someone who wanted to create a startup that was like OpenAI but really open source. He offered to support us with GPUs from AWS. He introduced himself as a former investment banker or hedge fund manager, which I didn’t quite believe at first. As far as I knew, he was just some guy from Discord. But then the access data for the first pods came, and it turned out that the guy was Emad Mostaque, the founder of Stability AI.


**devmio: What is the relationship between LAION and Stability AI?**


**Christoph Schuhmann:** Contrary to what some AI-art critics claim, we are not a satellite organisation of Stability AI. On the contrary, Stability AI came to us after the LAION-5B dataset was almost finished and wanted to support us unconditionally. They then did the same with LAION-Aesthetics.


**devmio: Could you explain what LAION-Aesthetics is?**


**Christoph Schuhmann:** I trained a model that uses the CLIP embeddings of the LAION images to estimate how pretty the images are on a scale of one to ten. It’s a very small model, a multilayer perceptron that runs on a CPU. At some point, I ran the model over a couple hundred thousand images, sorted them by score, and saw that the ones with high scores looked really good. The next step was to run it on 2.3 billion CLIP embeddings.
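
The scoring-and-sorting step can be sketched as follows. This is a toy under stated assumptions, not the actual LAION-Aesthetics predictor: a one-hidden-layer MLP maps a CLIP embedding to a single score, and images are then ranked by that score. In practice the weights would come from training on images with human aesthetic ratings; here they are passed in by hand, and the parameter layout is invented for illustration.

```python
def relu(x):
    return x if x > 0 else 0.0


def mlp_score(embedding, w1, b1, w2, b2):
    """One-hidden-layer MLP: embedding (d) -> hidden (h) -> scalar score."""
    hidden = [
        relu(sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def top_k_by_score(embeddings, params, k):
    """Return the indices of the k highest-scoring embeddings, best first."""
    scored = [(mlp_score(e, *params), i) for i, e in enumerate(embeddings)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Hand-picked weights for a 2-dimensional "embedding", purely illustrative.
params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 1.0], 1.0)
ranking = top_k_by_score([[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]], params, k=2)
```

Because the model only sees precomputed embeddings, each score costs a few small matrix-vector products, which is why running it on a CPU over billions of embeddings is feasible.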


## From LAION-Aesthetics to Stable Diffusion


**devmio: How did LAION-Aesthetics help with the development of Stable Diffusion?**


**Christoph Schuhmann:** I had already heard about Robin Rombach, who was still a student in Heidelberg at the time and had helped develop latent diffusion models at the CompVis Group. In May 2022, Emad Mostaque, the founder of Stability AI, told me that he would like to support Robin Rombach with compute time, and that’s how I got in touch with Robin.


I then sent him the LAION-Aesthetics dataset. The dataset can be thought of as a huge Excel spreadsheet containing links to images and the associated alt text. In addition, each image is given scores, for example for how likely it is to contain a watermark or explicit content. Robin and his team later trained the first prototype of Stable Diffusion on this. However, the model only got the name Stable Diffusion through Stability AI, to which the project then moved.
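
Working with such a table mostly means filtering rows by their scores. The sketch below assumes a CSV file with hypothetical columns `url`, `text`, `aesthetic`, `pwatermark` (watermark probability) and `punsafe` (probability of explicit content); the real LAION releases use their own column names and file formats, and the thresholds here are arbitrary.

```python
import csv
import io


def filter_rows(csv_text, min_aesthetic=6.0, max_watermark=0.5, max_unsafe=0.5):
    """Keep (url, text) for rows that score as pretty and are unlikely to be
    watermarked or explicit. Column names are assumptions, not LAION's schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        if (float(row["aesthetic"]) >= min_aesthetic
                and float(row["pwatermark"]) <= max_watermark
                and float(row["punsafe"]) <= max_unsafe):
            kept.append((row["url"], row["text"]))
    return kept


sample = (
    "url,text,aesthetic,pwatermark,punsafe\n"
    "https://example.com/a.jpg,a sunset over the sea,6.8,0.1,0.01\n"
    "https://example.com/b.jpg,stock photo of a desk,7.1,0.9,0.02\n"
    "https://example.com/c.jpg,blurry snapshot,4.0,0.1,0.01\n"
)
kept = filter_rows(sample)
```

Only the first row survives: the second is likely watermarked, and the third scores too low aesthetically. Note that the table holds links, not images, so a training run would still have to download the surviving URLs.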


LAION also got access to the Stability AI cluster. But we were also lucky enough to be able to use JUWELS, one of the largest European supercomputers, because one of our founding members, Jenia Jitsev, leads a deep learning lab at the Jülich Supercomputing Centre. We then applied for compute time to train our own OpenCLIP models. And now we have the largest open-source CLIP models available.


## LAION’s OpenCLIP


**devmio: What exactly do CLIP models do? And what makes LAION’s OpenCLIP so special?**


**Christoph Schuhmann:** On the Stability AI cluster, a Ph.D. student from the University of Washington trained a model called CLIP-ViT-G. This model can tell you how well an image matches a text, and it has cracked the 80 percent mark for zero-shot accuracy on ImageNet. This means that we have now built a general-purpose AI model that is better than the best state-of-the-art models from five years ago, which were built and trained specifically for this purpose.


These CLIP models are in turn used as text encoders, as “text building blocks” by Stable Diffusion and by many other models. CLIP models have an incredible number of applications. For example, they can be used for zero-shot image segmentation, zero-shot object detection with bounding boxes, zero-shot classification, or even for text-to-image generation.
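
Zero-shot classification with CLIP can be sketched like this: embed one text prompt per class (for example “a photo of a cat”), compare each prompt embedding with the image embedding by cosine similarity, and turn the scaled similarities into probabilities with a softmax. The embeddings below are toy vectors standing in for real CLIP encoder outputs, and the `logit_scale` of 100 is an assumed value roughly in line with the temperature CLIP models learn.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def zero_shot_classify(image_emb, prompt_embs, logit_scale=100.0):
    """Return (best_label, probabilities). prompt_embs maps each label to the
    embedding of a prompt such as 'a photo of a {label}'."""
    img = normalize(image_emb)
    logits = {
        label: logit_scale * sum(a * b for a, b in zip(img, normalize(emb)))
        for label, emb in prompt_embs.items()
    }
    # Numerically stable softmax over the labels.
    m = max(logits.values())
    exp = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exp.values())
    probs = {k: v / total for k, v in exp.items()}
    return max(probs, key=probs.get), probs


# Toy 3-dimensional "embeddings" for two prompts and one image.
prompts = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
best, probs = zero_shot_classify([0.9, 0.1, 0.0], prompts)
```

No classifier is trained here: the class set is defined purely by the text prompts, which is what makes the approach “zero-shot” and lets the same model be reused for new label sets.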


We have trained and further developed these models. We now have a variant that not only learns CLIP-style image and text embeddings but also generates captions through a text decoder. This model is called [CoCa](https://laion.ai/blog/coca/) and is quite close to the state of the art.
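
For readers who want the idea behind CoCa in one formula: as described in the original CoCa paper, training combines a CLIP-style contrastive loss with a captioning loss. The notation below is a sketch that follows that description loosely; OpenCLIP’s implementation may weight or batch the terms differently. Here $x_i$ and $y_i$ are the normalized image and text embeddings of the $i$-th pair in a batch of $N$, $\sigma$ is a temperature, $I$ is the input image, $w_t$ is the $t$-th of $T$ caption tokens, and $\lambda_{\text{Con}}, \lambda_{\text{Cap}}$ are weighting factors.

$$
\mathcal{L}_{\text{CoCa}} = \lambda_{\text{Con}}\,\mathcal{L}_{\text{Con}} + \lambda_{\text{Cap}}\,\mathcal{L}_{\text{Cap}}
$$

$$
\mathcal{L}_{\text{Con}} = -\frac{1}{N}\left(\sum_{i=1}^{N}\log\frac{\exp(x_i^\top y_i/\sigma)}{\sum_{j=1}^{N}\exp(x_i^\top y_j/\sigma)} + \sum_{i=1}^{N}\log\frac{\exp(y_i^\top x_i/\sigma)}{\sum_{j=1}^{N}\exp(y_i^\top x_j/\sigma)}\right)
$$

$$
\mathcal{L}_{\text{Cap}} = -\sum_{t=1}^{T}\log P_\theta\left(w_t \mid w_{<t}, I\right)
$$

The first term is the same image-to-text and text-to-image matching objective that CLIP uses; the second trains the text decoder to predict each caption token from the image and the preceding tokens.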


We have many such projects running at the same time, sometimes so many that I almost lose track of them. Currently, we are cooperating with Mila, a leading AI research institute in Montreal, and together we have access to the second-largest supercomputer in the US, Summit. We have been given 6 million GPU hours there and are training all kinds of models.


**devmio: You have already talked a lot about Stable Diffusion, and Robin Rombach, the inventor, is a member of your team. Is Stable Diffusion managed by you, is that “your” model?**


**Christoph Schuhmann:** No, we don’t have anything to do with that for now. But we have made the development and training of Stable Diffusion easier with LAION-Aesthetics and LAION-5B.


## Open Source as a Superpower


**devmio: LAION is committed to making the latest developments in AI freely available. Why is open source so important in AI?**


**Christoph Schuhmann:** Let’s take the sentence: “AI should be open source so that it is available to the general public.” Now let’s take that sentence and replace “AI” with “superpowers”: “Superpowers should be open source and available to the public.” In this case, it becomes much more obvious what I’m actually getting at.


Imagine if superpowers existed, and only OpenAI, Microsoft, Google, maybe the Chinese and American governments, and five other companies had control over them and could decide what to do with them. Now, you could say that governments only ever want what’s best for their citizens. That’s debatable, of course, but let’s assume that’s the case. But does that also apply to Microsoft? Do they also have our best interests at heart, or does Microsoft simply want to sell its products?


If you have a very dark view of the world, you might say that there are a lot of bad people out there, and if everyone had superpowers now, there would certainly be 10, 20, or 30 percent of all people who would do really bad things. That’s why we have to control such things, for example through the state. But if you have a rather positive and optimistic view of the world, like me, for example, then you could say that most people are relatively nice. No angels, no do-gooders, but most people don’t want to actively do something bad, or destroy something, but simply live their lives. There are some people who are do-gooders and also people who have something bad in mind. But the latter are probably clearly in the minority.


If we assume that everyone has superpowers, then everyone would also have the opportunity to take action against destructive behaviour and limit its effects. In such a world, there would be a lot of positive things. Things like superpower art, superpower music, superpower computer games, and superpower productivity of companies that simply produce goods for the public. If you now ask yourself what kind of world you would like to live in and assume that you have a rather positive worldview, then you will probably decide that it would be good to make superpowers available to the general public as open source. And once you understand that, it’s very easy to understand that AI should also be open source.


AI is not the same as superpowers, of course, but in a world in which the internet plays an ever greater role, in which every child grows up with YouTube, in which AI is getting better and better, and in which more and more autonomous systems are finding their way into our everyday lives, AI is incredibly important. Software and computerised things are sort of superpowers. And that’s going to become much more obvious, especially with ChatGPT. In three to four years, ChatGPT will be much better than it is today.


Now imagine if the whole world used technologies like ChatGPT and only OpenAI and Microsoft, Google and maybe two or three other big companies controlled those technologies. They can cut you off at any time, or tell you “Sorry, but I can’t do this task, it’s unethical in my opinion”, “I have to block you for an hour now”, or “Sorry, your request might be in competition with a Microsoft product, now I have to block you forever. Bye.”


**devmio: We have also spoken to other experts, for example, Pieter Buteneers and Christoph Henkelmann, who had similar concerns. But the question remains whether everyone should really have unrestricted access to such technologies, right?**


**Christoph Schuhmann:** A lot of criticism, not directed at LAION but at Stable Diffusion, goes in this direction. There is criticism that there are open-source models like Stable Diffusion that can be used to create negative content, circumvent copyright and create fakes, etc. Of course, it’s wrong to violate copyright, and it’s also wrong to create negative content and fakes. But imagine if these technologies were only in the hands of Microsoft, Google, and a few more large research labs. They would develop really well in the background, and at some point, you would be able to generate everything perfectly with them. And then they leak out or there is a replica, and society is not prepared at all. Small and medium-sized university labs wouldn’t be prepared at all to look at the source code and discover the problems.


We have something similar with LAION-5B. There are also some questionable images in the dataset that we were unable to filter out. That is why there is a disclaimer that it is a research dataset that should be thoroughly filtered and examined before being used in production. You have to handle this dataset carefully and responsibly. But this also means that you can find things in the dataset that you would like to remove from the internet.


For example, there is an organisation of artists, [Have I Been Trained](https://haveibeentrained.com/), that provides a tool that artists can use to determine if their artwork is included in LAION-5B. This organisation has simply taken our open-source code and used it for their own purposes to organise the disappointed artists.


And that’s a great thing because now all those artists who have images on the internet that they don’t want there can find them and have them removed. And not only artists! For example, if I have a picture of myself on the internet that I don’t want there, I can find out through LAION-5B where it is being used. We don’t store the images in LAION-5B; we just have a table with the links, so it’s just an index. But through that, you can find out which URL the image is linked to, then contact the owners of the site and have the image removed. In this way, LAION creates transparency and gives security researchers an early opportunity to work with these technologies and figure out how to make them more secure. And that’s important because this technology is coming one way or another.
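
The lookup described here amounts to a nearest-neighbour search over the index. The sketch below is a brute-force illustration under assumed inputs: `index` is a hypothetical list of `(url, embedding)` rows, and the query embedding would come from running your own photo through the same CLIP image encoder that produced the index. A real service over billions of rows would presumably use an approximate nearest-neighbour index rather than a linear scan.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_matching_urls(query_emb, index, k=3, min_sim=0.9):
    """Return up to k (url, similarity) pairs whose stored embedding is close
    to the query. Brute force over a hypothetical list of (url, embedding)."""
    scored = [(cosine(query_emb, emb), url) for url, emb in index]
    scored.sort(reverse=True)
    return [(url, sim) for sim, url in scored[:k] if sim >= min_sim]


index = [
    ("https://a.example/me.jpg", [1.0, 0.0]),
    ("https://b.example/tree.jpg", [0.0, 1.0]),
    ("https://c.example/me-copy.jpg", [0.99, 0.05]),
]
matches = find_matching_urls([1.0, 0.0], index)
```

Near-duplicates such as re-encoded or slightly cropped copies tend to land close together in embedding space, which is why a similarity threshold rather than an exact match is used; the 0.9 cutoff here is an arbitrary assumption.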


In probably a lot less than five years, you’re going to be able to generate pretty much anything in terms of images that you can describe in words, photo-realistically, so that a human being with the naked eye can’t tell whether it’s a photo or not.


## AI in Law, Politics, and Society


**devmio: Because you also mentioned copyright: The legal situation in Germany regarding AI, copyright, and other issues is probably not entirely clear. Are there sufficient mechanisms? Do you think that the new EU regulations that are coming will be sufficient while not hindering creativity and research?**


**Christoph Schuhmann:** I am not a lawyer, but we have good lawyers advising us. There is a data mining law, an EU-wide exception to copyright. It allows non-profit institutions, such as universities, but also associations like ours, whose focus is on research and who make their results publicly available, to download and analyse things that are openly available on the internet.


We are allowed to temporarily store the links, texts, whatever, and when we no longer need them for research, we have to delete them. This law explicitly allows data mining for research, and that is very good. I don’t think all the details of what’s going to happen in the future, especially with ChatGPT and other generative AIs for text and images, were anticipated in these laws. The people who made the law probably had more statistical analysis of the internet in mind and less training data for AIs.


I would like to see more clarity from legislators in the future. But I think that the current legal situation in Germany is very good, at least for non-profit organisations like LAION. I’m a bit worried that when the [EU AI Act](https://digital-strategy.ec.europa.eu/de/policies/european-approach-artificial-intelligence), which is currently being drafted, comes into force, general-purpose AI like ChatGPT will be classified as high risk. If that were the case, it would mean that if you as an organisation operate or train a ChatGPT-like service, you would constantly have to document everything meticulously and tick off a great many compliance rules, catalogues, and checklists.


Even if this is certainly well-intentioned, it would also severely restrict research and development, especially by open-source projects, associations, and grassroots movements, so that only Big Tech corporations would be able to comply with all the rules. Whether this will happen is still unclear. I don’t want high-risk applications like facial recognition to go unregulated either. And I don’t want to be monitored all day.


But if any lawmakers are reading this: Politicians should keep in mind that it is very important to continue to enable open-source AI. It would be very good if we could continue to work as we have been doing, not only for LAION but for Europe. I am sure that quite a lot of companies and private individuals, maybe even state institutions, can benefit from models such as CLIP or from the datasets that we are making.


And I believe that this can generate a lot of value for citizens and companies in the EU. So I would even go so far as to call for politicians and donors to maybe think about building something similar to a CERN for AI. With a billion euros, you could probably build a great open-source supercomputer that all companies and universities, in fact, anyone, could use to do AI research under two conditions: First, the whole thing has to be reviewed by some smart people, maybe experts and people from the open-source community. Second, all results, research papers, checkpoints of models, and datasets must be released under a fully open-source licence.


Because then a lot of companies that can’t afford a supercomputer at the moment could open source their research there and only keep the fine-tuning or anything that is really sensitive to the business model on the companies’ own computers. But all the other stuff happens openly. That would be great for a lot of companies, that would be great for a lot of medium and small universities, and that would also be great for groups like LAION.


_**Editor’s note**: After the interview, LAION started a petition for a CERN-like project. Read more on [LAION’s blog](https://laion.ai/blog/petition/)._


## AI for a Better World


**Christoph Schuhmann:** Another application for AI would be a project close to my heart: Imagine there is an open-source ChatGPT. You would then take, say, 100 teachers and have them answer students’ questions about all sorts of subjects. For these questions, they could write really nice step-by-step explanations that actually make sense. You would collect this data from the 100 teachers for school material up to the tenth grade, which is fairly similar throughout the Western world, except, of course, for history, politics, etc. Now suppose you simply compiled the subject matter from 100 teachers across the largest Western countries and used it to fine-tune a ChatGPT model.


You need a model that has maybe 20 to 30 billion parameters, and you could use it to give access to first-class education to billions of children in the Third World who don’t have schools but have an old mobile phone and internet access. You don’t need high-tech future technology, you can do that with today’s technology. And these are big problems of the world that could be addressed with it.


Or another application: My mum is 83 years old, she can’t handle a computer, and she is often lonely. Imagine if she had a Siri that she could have a sensible conversation with. Not as a substitute for human relationships, but as a supplement. How many lonely old people do you think would be happier if they could just ask what’s going on in the world? Or “Remember when I told you that story, Siri? Back in my second marriage 30 years ago?” That would make a lot of people happier. And I think things like that can have a big effect with relatively little financial outlay.


**devmio: And what do you see next in AI development?**


**Christoph Schuhmann:** What I just talked about could happen in the next five years. Everything that happens after that, I can’t really predict. It’s going to be insane.


**devmio: Thank you very much for taking the time to talk to us!**
