Building a Proof of Concept Chatbot with OpenAI’s API, PHP and Pinecone

ML Conference blog, Thu, 04 Jan 2024

We leveraged OpenAI’s API and PHP to develop a proof-of-concept chatbot that integrates with Pinecone, a vector database, to enhance our homepage’s search functionality and help our customers find answers more effectively. In this article, we’ll explain our steps so far to accomplish this.


The team at Three.ie recognized that customers were having difficulty finding answers to basic questions on our website. To improve the user experience, we decided to use AI to create a more efficient and user-friendly experience with a chatbot. Building the chatbot posed several challenges, such as effectively managing the expanding context of each chat session and maintaining high-quality data. This article details our journey from concept to implementation and how we overcame these challenges. Anyone interested in AI, data management, and customer experience improvements should find valuable insights in this article.

While the chatbot project is still in progress, this article outlines the steps taken and key takeaways from the journey thus far. Stay tuned for subsequent installments and the project’s resolution.


Identifying the Problem

Hi there, I’m a Senior PHP Developer at Three.ie, a company in the telecom industry. Today, I’d like to talk about a problem our customers face: locating answers to basic questions on our website. Information such as bill details, how to top up, and other relevant topics is available, but it isn’t easy to find because it’s tucked away within our forums.

![community-page.png](community-page.png) {.figure}

Community Page {.caption}

The AI Solution

The rise of AI chatbots and the impressive capabilities of GPT-3 presented us with an opportunity to tackle this issue head-on. The idea was simple, why not leverage AI to create a more user-friendly way for customers to find the information they need? Our tool of choice for this task was OpenAI’s API, which we planned to integrate into a chat interface.

To make this chatbot truly useful, it needed access to the right data and that’s where Pinecone came in. Using this vector database, we were able to generate embeddings from the OpenAI API, creating an efficient search system for our chatbot.

This laid groundwork for our proof of concept: a simple yet effective solution for a problem faced by many businesses. Let’s dive deeper into how we brought this concept to life.

![chat-poc.png](chat-poc.png) {.figure}

First POC {.caption}

Challenges and AI’s Role

With our proof of concept in place, the next step was to ensure the chatbot was interacting with the right data and providing the most accurate search results possible. While Pinecone served as an excellent solution for storing data and enabling efficient search during the early stages, we realized it might not be the most cost-effective choice for a full-fledged product in the long term.

Pinecone is an excellent solution, easy to integrate and straightforward to use, but the free tier only allows a single pod with a single project, and the paid plan starts at around $70/month per pod. We needed several small indexes, separated by product, so keeping the project within budget was a priority, and we knew that continuing with Pinecone would soon become difficult once we split our data.

The initial data used in the chatbot was extracted directly from our website and stored in separate files. This setup allowed us to create embeddings and feed them to our chatbot. To streamline this process, we developed a “data import” script: it takes a file, adds its content to the database, creates an embedding from the content, and stores the embedding in Pinecone, using the database ID as a reference.
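The import pipeline can be sketched as follows (a minimal Python sketch of the logic; the production script is written in PHP, and `embed_content` and `VectorIndex` are stand-ins for the OpenAI embeddings call and the Pinecone client):

```python
import hashlib

def embed_content(text: str) -> list[float]:
    """Stand-in for the OpenAI embeddings call (8 dims here, not 1536)."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

class VectorIndex:
    """Stand-in for the Pinecone index: maps database IDs to vectors."""
    def __init__(self):
        self.vectors = {}

    def upsert(self, record_id: int, vector: list[float]) -> None:
        self.vectors[record_id] = vector

def import_file(content: str, db: list, index: VectorIndex) -> int:
    # 1. Add the raw content to the database and get its ID.
    db.append(content)
    record_id = len(db) - 1
    # 2. Create an embedding from the content.
    vector = embed_content(content)
    # 3. Store the embedding in the vector index, keyed by the database ID.
    index.upsert(record_id, vector)
    return record_id
```

Keying the vector by the database ID is what lets a later similarity match be traced back to the original article text.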

Unfortunately, we faced a hurdle with the structure and quality of our data. Some of the extracted data was not well-structured, which led to issues with the chatbot’s responses. To address this challenge, we once again turned to AI, this time to enhance our data quality. Employing the GPT-3.5 model, we optimized the content of each file before generating the vector. By doing so, we were able to harness the power of AI not only for answering customer queries but also for improving the quality of our data.

As the process grew more complex, the need for more efficient automation became evident. To reduce the time taken by the data import script, we incorporated queues and utilized parallel processing. This allowed us to manage the increasingly complex data import process more effectively and keep the system efficient.

![data-ingress-flow.png](data-ingress-flow.png) {.figure}

Data Ingress Flow {.caption}

Data Integration

With our data stored and the API ready to handle chats, the next step was to bring everything together. The initial plan was to use Pinecone to retrieve the top three results matching the customer’s query. For instance, if a user inquired, “How can I top up by text message?”, we would generate an embedding for this question and then use Pinecone to fetch the three most relevant records. These matches were determined based on cosine similarity, ensuring the retrieved information was highly pertinent to the user’s query.

Cosine similarity is a key part of our search algorithm. Think of it like this: imagine each question and answer is a point in space. Cosine similarity measures how close these points are to each other. For example, if a user asks, “How do I top up my account?”, and we have a database entry that says, “Top up your account by going to Settings”, these two are closely related and would have a high cosine similarity score, close to 1. On the other hand, if the database entry says something about “changing profile picture”, the score would be low, closer to 0, indicating they’re not related.

This way, we can quickly find the best matches to a customer’s query, making the chatbot’s answers more relevant and useful.

For those who understand a bit of math, this is how cosine similarity works. You represent each sentence as a vector in multi-dimensional space. The cosine similarity is calculated as the dot product of two vectors divided by the product of their magnitudes. Mathematically, it looks like this:

![cosine-formula.png](cosine-formula.png) {.figure}

Cosine Similarity  {.caption}

This formula gives us a value between -1 and 1. A value close to 1 means the sentences are very similar, and a value close to -1 means they are dissimilar. Zero means they are not related.
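In code, the comparison is only a few lines (a plain-Python sketch; in our setup Pinecone performs this calculation internally):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the two vectors divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)
```

Identical directions give 1, orthogonal vectors give 0, and opposite directions give -1, matching the formula above.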

![simplified-workflow.png](simplified-workflow.png) {.figure}

Simplified Workflow {.caption}

Next, we used these top three records as context in the OpenAI chat API, merging everything together: the chat history, Three’s base prompt instructions, the current question, and the top three contexts.
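Putting the pieces together, the request body looked roughly like this (a Python sketch with illustrative placeholder values; our real base prompt and contexts are of course different):

```python
def build_messages(base_prompt, history, contexts, question):
    """Merge the base prompt, retrieved contexts, chat history and the
    current question into one message list for the chat completions API."""
    context_block = "\n\n".join(contexts)
    messages = [
        {"role": "system", "content": f"{base_prompt}\n\nContext:\n{context_block}"}
    ]
    messages.extend(history)  # prior user/assistant turns
    messages.append({"role": "user", "content": question})
    return messages
```

Because the whole list is sent on every request, its growth is exactly the context-size problem described below.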

![vector-comparison-logic.png](vector-comparison-logic.png) {.figure}

Vector Comparison Logic {.caption}

Initially, this approach was fantastic and provided accurate and informative answers. However, there was a looming issue: we were using OpenAI’s first 4k-token-context model, and the entire context was sent with every request. Furthermore, the context was treated as “history” for the following message, meaning that each new message added the boilerplate text plus three more contexts. As you can imagine, this led to rapid growth of the context.

To manage this complexity, we decided to keep track of the context. We started storing each message from the user (along with the chatbot’s responses) and the selected contexts. As a result, each chat session now had two separate artifacts: messages and contexts. This ensured that if a user’s next message related to the same context, it wouldn’t be duplicated and we could keep track of what had been used before.
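The bookkeeping can be sketched like this (a Python sketch; the class and method names are illustrative, not our production code):

```python
class ChatSession:
    """Keeps messages and retrieved contexts as separate artifacts, so a
    context that was already used in this chat is not attached twice."""
    def __init__(self):
        self.messages = []
        self.context_ids = []  # IDs of contexts already attached to this chat

    def add_contexts(self, retrieved):
        """`retrieved` is a list of (id, text) pairs from the vector search.
        Returns only the contexts not already present in this session."""
        fresh = [(cid, text) for cid, text in retrieved
                 if cid not in self.context_ids]
        self.context_ids.extend(cid for cid, _ in fresh)
        return fresh
```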

Progress so Far

To put it simply, our system starts with manual input of questions and answers (Q&A), which is then enhanced by our AI. To ensure efficient data handling, we use queues to store data quickly. In the chat, when a user asks a question, we add a “context group” that includes all the data we got from Pinecone. To keep the system organized and efficient, older messages are removed from longer chats.

![chat-workflow.png](chat-workflow.png) {.figure}

Chat Workflow {.caption}

Automating Data Collection

Acknowledging the manual input as a bottleneck, we set out to streamline the process through automation. I started by trying out scrapers written in different languages, such as PHP and Python. However, to be honest, none of them were good enough, and we faced issues with both speed and accuracy. While this component of the system is still in its formative stages, we’re committed to overcoming this challenge. We are currently evaluating the possibility of using an external service to manage this task, aiming to streamline and simplify the overall process.

While working towards data automation, I dedicated my efforts to improving our existing system. I developed a backend admin page, replacing the manual data input process with a streamlined interface. This admin panel provides additional control over the chatbot, enabling adjustments to parameters like the ‘temperature’ setting and the initial prompt, further optimizing the customer experience. So, although we have challenges ahead, we’re making improvements every step of the way.

 


A Week of Intense Progress

The week was a whirlwind of AI-fueled excitement, and we eagerly jumped in. After sending an email to my department, the feedback came flooding in. Our team was truly collaborative: a skilled designer supplied Figma templates and a copywriter crafted the app’s text. We even had volunteers who stress-tested our tool with unconventional prompts. It felt like everything was coming together quickly.

However, this initial enthusiasm came to a screeching halt when security concerns became the new focus. A recent data breach at OpenAI, unrelated to our project, shifted our priorities. Though frustrating, it necessitated a comprehensive security review of all projects, causing a temporary halt to our progress.

The breach occurred during a specific nine-hour window on March 20, between 1 a.m. and 10 a.m. Pacific Time. OpenAI confirmed that around 1.2% of active ChatGPT Plus subscribers had their data compromised during this period. The cause involved the Redis client library (redis-py), which OpenAI used to maintain a pool of connections between their Python server and Redis, so they didn’t need to query the main database for every request; that connection pool became the point of vulnerability.

In the end, it’s good to put security at the forefront and not treat it as an afterthought, especially in the wake of a data breach. While the delay is frustrating, we all agree that making sure our project is secure is worth the wait. Now, our primary focus is to meet all security guidelines before progressing further.

The Move to Microsoft Azure

In just one week, the board made a big decision: move from OpenAI and Pinecone to Microsoft Azure. At first glance it looks like a smart choice, as Azure is known for solid security, but it is far less plug-and-play.

What stood out in Azure was having our own dedicated GPT-3.5 Turbo deployment. Unlike OpenAI, where the general GPT-3.5 model is shared, Azure gives you a deployment exclusive to your company. You can train it and fine-tune it, all in a very secure environment, a big plus for us.

The hard part? Setting up the data storage. Everything in Azure is different from what we were used to, so we are now investing time in understanding these new services, a learning curve we’re currently climbing.

Azure Cognitive Search

In our move to Microsoft Azure, security was a key focus. We looked into using Azure Cognitive Search for our data management. Azure offers advanced security features like end-to-end encryption and multi-factor authentication. This aligns well with our company’s heightened focus on safeguarding customer data.

The idea was simple: you upload your data into Azure, create an index, and then you can search it just like a database. You define “fields” for indexing and Azure Cognitive Search organizes them for quick searching. But the truth is, setting it up wasn’t easy: creating the indexes was more complex than we thought, so we didn’t end up using it in our project. It’s a powerful tool, but difficult to implement. This was the idea:

![azure-structure.png](azure-structure.png) {.figure}

Azure Structure {.caption}
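A minimal index definition for this idea might look like the following (a sketch of an Azure Cognitive Search index schema; the field names are illustrative, not our production schema):

```json
{
  "name": "support-articles",
  "fields": [
    { "name": "id",      "type": "Edm.String", "key": true },
    { "name": "title",   "type": "Edm.String", "searchable": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "url",     "type": "Edm.String", "filterable": false }
  ]
}
```

Each searchable field is analyzed and indexed by the service; the complexity we ran into came from deciding how to split and type our forum content into fields like these.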

The Long Road of Discovery

So, what did we really learn from this whole experience? First, improving the customer journey isn’t a walk in the park; it’s a full-on challenge. AI brings a lot of potential to the table, but it’s not a magic fix. We’re still deep in the process of getting this application ready for the public, and it’s still a work in progress.

One of the most crucial points for me has been the importance of clear objectives. Knowing exactly what you aim to achieve can steer the project in the right direction from the start. Don’t wait around: get a proof of concept (POC) out as fast as you can, and test the raw idea before diving into complexities.

Also, don’t try to solve issues that haven’t cropped up yet; this is something we learned the hard way. Transitioning to Azure seemed like a move towards a more robust infrastructure, but it ended up complicating things and setting us back significantly. The added layers of complexity postponed our timeline for future releases. Sometimes, ‘better’ solutions can end up being obstacles if they divert you from your main goal.

 


In summary, this project has been a rollercoaster of both challenges and valuable lessons learned. We’re optimistic about the future, but caution has become our new mantra. We’ve come to understand that a straightforward approach is often the most effective, and introducing unnecessary complexities can lead to unforeseen problems. With these lessons in hand, we are in the process of recalibrating our strategies and setting our sights on the next development phase.

Although we have encountered setbacks, particularly in the area of security, these experiences have better equipped us for the journey ahead. The ultimate goal remains unchanged: to provide an exceptional experience for our customers. We are fully committed to achieving this goal, one carefully considered step at a time.

Stay tuned for further updates as we continue to make progress. This project is far from complete, and we are excited to share the next chapters of our story with you.

Talk of the AI Town: The Uprising of Collaborative Agents

ML Conference blog, Mon, 04 Dec 2023

This article aims to delve into the capabilities and limitations of OpenAI’s models, examine the functionalities of agents like BabyAGI, and discuss potential future advancements in this rapidly evolving field.

Introduction:

OpenAI’s release of ChatGPT and GPT-4 has sparked a Cambrian explosion of new products and projects, significantly shifting the landscape of artificial intelligence. These models have advanced both quantitatively and qualitatively beyond their language-modeling predecessors, much as the deep learning model AlexNet dramatically improved on the ImageNet benchmark for computer vision back in 2012. More importantly, these models exhibit few-shot learning: the ability to perform many different tasks, such as machine translation, when given only a few examples of the task. Unlike humans, most earlier language models required large supervised datasets before they could be expected to perform a specific task. This plasticity of “intelligence” that GPT-3 demonstrated opened up new possibilities in the field of AI: a system capable of problem-solving, enabling the implementation of many long-imagined AI applications.

Even GPT-4, the successor to GPT-3, is still just a language model at the end of the day and still quite far from Artificial General Intelligence. In general, the “prompt to single response” formulation of language models is much too limited to perform complex multi-step tasks. For an AI to be generally intelligent, it must seek out information, remember, learn, and interact with the world in steps. Recently, many projects on GitHub have essentially created self-talking loops and prompting structures on top of OpenAI’s APIs for GPT-3.5 and GPT-4: systems that can plan, generate code, debug, and execute programs. In theory, these systems have the potential to be much more general and to approach what many people think of when they hear “AI”.


The concept of systems that intelligently interact with their environment is not completely new; it has been heavily researched in the field of AI called reinforcement learning. The influential textbook “Artificial Intelligence: A Modern Approach” by Russell and Norvig covers many different structures for building intelligent “agents”: entities capable of perceiving their environment and acting to achieve specific objectives. While I don’t believe Russell and Norvig imagined that these agent structures would be mostly language-model-based, they did describe how the agents would perform their various steps with plain English sentences and questions, mostly for illustrative purposes. Since we now have language models capable of functionally understanding such steps and questions, it is much easier to implement many of these structures as real programs today.

While I haven’t seen any projects using prompts inspired by the AIMA textbook for their agents, the open-source community has been leveraging GPT-3.5 and GPT-4 to develop agent or agent-like programs using similar ideas; examples include BabyAGI, AutoGPT, and MetaGPT. These agents are not designed to interact with a game or simulated environment like traditional RL agents, but they do typically generate code, detect errors, and alter their behavior accordingly. So, in a sense, they are interacting with and perceiving the “environment” of programming, and are significantly more capable than anything before.

This article aims to delve into the capabilities and limitations of OpenAI’s models, examine the functionalities of agents like Baby AGI, and discuss potential future advancements in this rapidly evolving field.

Understanding the Capabilities of GPT-3.5 and GPT-4:

GPT-3.5 and GPT-4 are important milestones not only in natural language processing but also in the field of AI. Their ability to generate contextually appropriate, coherent responses to a myriad of prompts has reshaped our expectations of what a language model can achieve. However, to fully appreciate their potential and constraints, it’s necessary to delve deeper into their implementation.

One significant challenge these models face is hallucination: instances where a language model generates outputs that seem plausible but are entirely fabricated or not grounded in the input data. Hallucination is a challenge for ChatGPT because these models fundamentally output a probability distribution over the next word, and that distribution is sampled in a weighted random fashion. This leads to responses that are statistically likely but not necessarily accurate or truthful. The limitation of likelihood-based sampling is that it prioritizes coherence over veracity, producing creative but potentially misleading outputs, and it limits the model’s ability to reason and make logical deductions when the correct output pattern is statistically unlikely. While these models can exhibit some degree of reasoning and common sense, they don’t yet match human-level reasoning capabilities, because they are limited to the statistical patterns present in their training data rather than a thorough understanding of the underlying concepts.
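The weighted random sampling behind this can be illustrated with a toy example (a Python sketch over a five-word vocabulary; real models sample from tens of thousands of tokens, and the scores here are made up):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw model scores into a probability distribution.
    Lower temperatures sharpen the distribution; higher ones flatten it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["Paris", "London", "Rome", "banana", "quickly"]
logits = [4.0, 2.5, 2.0, 0.1, -1.0]  # scores for "The capital of France is ..."

probs = softmax(logits)
# Weighted random choice: usually "Paris", but occasionally a merely
# plausible-sounding alternative, which is where hallucination creeps in.
next_word = random.choices(vocab, weights=probs)[0]
```

This is also why the ‘temperature’ setting mentioned in the first article matters: it directly controls how far sampling strays from the most likely word.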

To quantitatively assess these models’ reasoning capabilities, researchers use a range of tasks including logical puzzles, mathematical operations, and exercises that require understanding causal relationships [https://arxiv.org/abs/2304.03439]. While OpenAI does boast about GPT-4’s ability to pass many aptitude tests, including the bar exam, the model struggles to show the same capabilities on out-of-distribution logical puzzles, which is to be expected when you consider the statistical nature of the models.

To be fair to these models, the role of language in human reasoning is underappreciated by the general public. Humans also use language generation as a form of reasoning, making connections and drawing inferences through linguistic patterns. Research has shown that when the brain area responsible for language is damaged, reasoning is impaired [https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01523/full]. Therefore, just because language models are mostly statistical next-word generators, we shouldn’t disregard their reasoning capabilities entirely. Reasoning via language has limitations, but it is something that systems can take advantage of: a genuine potential exists for language models to replicate certain reasoning processes, and this link between reasoning and language explains their capabilities.

While GPT-3.5 and GPT-4 have made significant strides in natural language processing, there is still work to do. Ongoing research is focused on enhancing these abilities and tackling these challenges. It is important for systems today to work around these limitations and take advantage of language models’ strengths as we explore their potential applications and continue to push AI’s boundaries.


Exploring Collaborative Agent Systems: BabyAGI, HuggingFace, and MetaGPT:

BabyAGI, created by Yohei Nakajima, serves as an interesting proof of concept in the domain of agents. The main idea is to create three “sub-agents”: the Task Creator, Task Prioritizer, and Task Executor. By giving the sub-agents specific roles and having them collaborate through a task management system, BabyAGI can reason better and achieve many more tasks than a single prompt alone, hence the “collaborative agent system” concept. While the collaborative agent strategy BabyAGI implements is not a completely novel concept, it is one of the early successful experiments built on top of GPT-4 with code we can easily understand. In BabyAGI, the Task Creator initiates the process by setting the goal and formulating the task list. The Task Prioritizer then rearranges the tasks based on their significance for achieving the goal, and finally the Task Executor carries out the tasks one by one. The output of each task is stored in a vector database, which can look up data by similarity, serving as a type of memory for the Task Executor.
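Stripped of the actual LLM calls, the control loop can be sketched as follows (a Python sketch, not BabyAGI’s real code; `llm(role, payload)` stands in for a chat-completion call made with each sub-agent’s prompt):

```python
def run_baby_agi(objective, llm, max_iterations=5):
    """Minimal BabyAGI-style loop. `llm(role, payload)` stands in for a
    chat-completion call with that sub-agent's prompt:
      - "creator":     objective or a task result -> list of new tasks
      - "prioritizer": current task list          -> reordered task list
      - "executor":    one task                   -> result string
    """
    tasks = list(llm("creator", objective))
    results = []
    for _ in range(max_iterations):
        if not tasks:
            break  # nothing left to do
        tasks = list(llm("prioritizer", tasks))   # reorder by importance
        task = tasks.pop(0)                       # take the top task
        result = llm("executor", task)            # carry it out
        results.append((task, result))
        tasks.extend(llm("creator", result))      # create follow-up tasks
    return results
```

In the real system each result is also written to the vector database before the loop continues; that memory step is covered in the next section.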

Fig 1. A high-level description of the BabyAGI framework

HuggingFace’s Transformers Agents is another substantial agent framework. It has gained popularity for its ability to leverage the library of pre-trained models on HuggingFace. Using the StarCoder model, the Transformers Agent can string together many different models available on HuggingFace to accomplish a range of visual, audio, and natural language processing tasks. However, HuggingFace agents lack error recovery mechanisms, often requiring external intervention to troubleshoot issues and continue with the task.

Fig 2. Example of HuggingFace’s Transformers Agent

MetaGPT adopts a unique approach by emulating a virtual company where different agents play specific roles. Each virtual agent within MetaGPT has its own thoughts, allowing them to contribute their perspectives and expertise to the collaborative process. This approach recognizes the collective intelligence of human communities and seeks to replicate it in AI systems.

 

Fig. 3. The Software Company structure of MetaGPT

BabyAGI, Transformers Agents, and MetaGPT, each with their own strengths and limitations, collectively exemplify the evolution of collaborative agent systems. Although many feel their capabilities are underwhelming, by integrating the principles of intelligent agent frameworks with advanced language models, their authors have made significant progress toward AI systems that can collaborate, reason, and solve complex tasks.

 

A Deeper Dive into the Original BabyAGI:

BabyAGI presents an intuitive collaborative agent system operating within a loop, comprising three key agents: the Task Creator, Task Prioritizer, and Task Executor, each playing a unique role in the collaborative process. Let’s examine the prompts of each sub-agent.

Fig.4 Original task creator agent prompt

The process initiates with the Task Creator, responsible for defining the goal and initiating the task list. This agent in essence sets the direction for the collaborative system. It generates a list of tasks, providing a roadmap outlining the essential steps for goal attainment.

Fig 5. Original task prioritizer agent prompt

Once the tasks are established, they are passed on to the Task Prioritizer. This agent reorders tasks based on their importance for goal attainment, optimizing the system’s approach by focusing on the most critical steps and keeping the system efficient by directing its attention to the most consequential tasks.

Fig 6. Original task executor agent prompt

 

The Task Executor then takes over following task prioritization, executing tasks one by one in the prioritized order. As you may notice in the prompt, it is essentially hallucinating the performance of the tasks rather than acting on the outside world. The output of this prompt, the result of completing the task, is appended to the task object being completed and stored in a vector database.

An intriguing aspect of BabyAGI is the incorporation of a vector database, where the task object, including the Task Executor’s output, is stored. This matters because language models are static: they can’t learn from anything other than the prompt. Using a vector database to look up similar tasks allows the system to maintain a kind of memory of its experiences, both problems and solutions, which improves the agent’s performance when it confronts similar tasks in the future.

Vector databases work by efficiently indexing embedding vectors. For OpenAI’s text-embedding-ada-002 model, each embedding is a vector of length 1536, and the model is trained to produce similar vectors for semantically similar inputs, even if they use completely different words. In the BabyAGI system, looking up similar tasks and appending them to the prompt’s context gives the model memories of its previous experiences performing similar tasks.
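This memory lookup reduces to a nearest-neighbour search over stored vectors (a toy Python sketch using 3-dimensional vectors in place of the 1536-dimensional embeddings; a real vector database uses approximate indexes rather than a full scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)

class TaskMemory:
    """Stores (embedding, task, result) triples and returns the most
    similar past tasks for inclusion in the executor's prompt."""
    def __init__(self):
        self.records = []

    def add(self, embedding, task, result):
        self.records.append((embedding, task, result))

    def similar(self, embedding, top_k=2):
        ranked = sorted(self.records,
                        key=lambda r: cosine(embedding, r[0]),
                        reverse=True)
        return [(task, result) for _, task, result in ranked[:top_k]]
```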

As mentioned above, the vanilla version of BabyAGI operates predominantly in a hallucinating mode as it lacks external interaction. Additional tools, such as functions for saving text, interacting with databases, executing Python scripts, or even searching the web, were later integrated into the system, extending BabyAGI’s capabilities.

While BabyAGI is capable of breaking down large goals into small tasks and essentially working on them forever, it still has many limitations. Unless the Task Creator explicitly checks whether a task is done, the system tends to generate an endless stream of tasks, even after achieving the initial goal. Moreover, BabyAGI executes tasks sequentially, which slows it down significantly. Later iterations, such as BabyDeerAGI, have implemented features to address these limitations, exploring parallel execution for independent tasks and more tools.

In essence, BabyAGI serves as a great introduction and starting point in the realm of collaborative agent systems. Its architecture enables planning, prioritization, and execution. It lays the groundwork for many other developers to create new systems to address the limitations and expand what’s possible.


The Rise of Role-Playing Collaborative Agent Systems:

 

While not every project claims BabyAGI as its inspiration, many similar multi-role agent systems exist, such as MetaGPT and AutoGen, and these projects are bringing a new wave of innovation into this space. Much like BabyAGI used multiple “agents” to manage tasks, these frameworks go a step further by creating many different agents with distinct roles that work together to accomplish the goal. In MetaGPT the agents work together inside a virtual company, complete with a CEO, CTO, designers, testers, and programmers. People experimenting with this framework today can get the virtual company to successfully create various types of simple utility software and simple games, though I would say they are rarely visually pleasing.

AutoGen goes about things slightly differently, but in a similar vein to the framework I’ve been working on at my company, Xpress AI.

AutoGen has a user proxy agent that interacts with the user and can create tasks for one or more assistant agents. The tool is more of a library than a standalone project, so you have to create a configuration of user proxies and assistants to accomplish your tasks. I think this is the future of how we will interact with agents: we will need many conversation threads interacting with each other to expand the capabilities of the base model.
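The pattern itself can be sketched without the library (a Python sketch of a user-proxy/assistant message loop; AutoGen’s actual API and termination rules differ in their details):

```python
class Agent:
    """One conversation participant. `reply_fn(message)` stands in for the
    LLM call (assistant) or for forwarding user input (user proxy)."""
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn

def converse(user_proxy, assistant, opening, max_turns=4):
    """Alternate messages between the proxy and the assistant until one
    side answers with TERMINATE, collecting the transcript."""
    transcript = [(user_proxy.name, opening)]
    message, speaker = opening, assistant
    for _ in range(max_turns):
        message = speaker.reply_fn(message)
        transcript.append((speaker.name, message))
        if "TERMINATE" in message:
            break
        speaker = user_proxy if speaker is assistant else assistant
    return transcript
```

With several assistants behind one proxy, the same loop becomes the multi-threaded conversation structure described above.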

Why Collaborative Agent Systems are More Effective

A language model is only as intelligent as it needs to be: to predict the next word accurately, it has had to learn to be rudimentarily intelligent. There is only a fixed amount of computation that can happen inside the transformer layers of a particular model. By giving the model a different starting point, it can put more computation, and therefore more thinking, into its response. Giving distinct roles to these agents also helps them get out of the rut of wanting to be self-consistent. You can imagine scaling this idea even further to create AI systems closer to AGI.

Even in human society, it can be argued that we already have various superhuman intelligences in place. The stock market, for example, can allocate resources better than any one person could ever hope to. Or take the scientific community: the paper review and publishing process also helps humanity reach new levels of intelligence.

Even these systems need time to think and process information, yet LLMs have only a fixed amount of processing power per response. Future AI systems will have to include ways for an agent to think for itself, similar to how agents can leverage external functions today but internally, giving them the ability to apply an arbitrary amount of computation to a task. Roles are one way to approach this, but it would be more effective if each agent in these simulated virtual organizations could individually apply arbitrary amounts of computation to its responses. A system where each agent can learn from its mistakes, as humans do, is also required to truly escape the cognitive limitations of the underlying language model. Without these capabilities, long recognized by the AI community as fundamental, we can’t reasonably expect these systems to be the foundation of an AGI.

Addressing Limitations and Envisioning Future Prospects:

Collaborative agent systems exhibit promising potential. However, they are still far from being truly general intelligence. Learning about these limitations can give clues to possible solutions that can pave the way for more sophisticated and capable systems. 

One limitation of BabyAGI in particular lies in the lack of active perception. The Executor Agent in BabyAGI nearly always assumes that the system is in the perfect state to accomplish the task, or that the previous task was completed successfully. Since the world is not perfect, it often fails to achieve the task. BabyAGI is not alone in this problem: the lack of perception greatly affects the practicality and efficacy of these systems for real-world tasks.

Error recovery mechanisms in these systems also need improvement. While a tool-enabled version of BabyAGI does often generate error-fixing tasks, the Task Prioritizer's prioritization may not always be optimal, causing the executor to miss the chance to easily fix the issue. Advanced prioritization algorithms that take into account error severity and its impact on goal attainment are being worked on. The latest versions of BabyAGI have task dependency tracking, which does help, but I don't believe we have fully fixed this issue yet.

Task completion is another challenge in collaborative agent systems like BabyAGI. A robust review mechanism that assesses the state of task completion and adjusts the task list accordingly could address the issue of endless task generation, enhancing the overall efficiency of the system. Since MetaGPT has managers that check the results of the individual contributors, it is more likely to detect that a task has been completed, although this way of working is quite inefficient.

Parallel execution of independent tasks offers another potential area of improvement. Leveraging multi-threading or distributed computing techniques could lead to significant speedups and more efficient resource utilization. BabyDeerAGI specifically uses dependency tracking to create independent threads of executors, while MetaGPT uses the company structure to perform work in parallel. Both are interesting approaches to the problem and, perhaps, the two approaches could be combined. 
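The dependency-tracking idea can be sketched in a few lines: submit to a thread pool every task whose dependencies are already finished, so independent tasks run in parallel while dependent ones wait. The task names and the `execute` stub below are placeholders, not code from BabyDeerAGI or MetaGPT:

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Each task names the tasks it depends on; tasks with no unmet deps run in parallel.
tasks = {"research": [], "outline": [], "draft": ["research", "outline"], "review": ["draft"]}

def execute(name):
    return f"{name} done"  # placeholder for a real LLM executor call

def run_parallel(tasks):
    done, results, pending = set(), {}, {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(tasks):
            # Submit every task whose dependencies are all satisfied.
            for name, deps in tasks.items():
                if name not in done and name not in pending and all(d in done for d in deps):
                    pending[name] = pool.submit(execute, name)
            finished, _ = wait(pending.values(), return_when=FIRST_COMPLETED)
            for name in [n for n, f in pending.items() if f in finished]:
                results[name] = pending.pop(name).result()
                done.add(name)
    return results

print(run_parallel(tasks))  # all four tasks complete while respecting the dependency order
```

Here "research" and "outline" run concurrently, while "draft" waits for both and "review" waits for "draft".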

The lack of the ability to learn from experience is another fundamental limitation. As far as I know, none of the current systems utilize fine-tuning of LLMs to form long-term memories. In theory, it isn’t a complicated process but in practice gathering the data necessary, in a way that doesn’t fundamentally make the model worse, is an open problem. Training models on model-generated outputs or training on already encountered data seems to cause the models to overfit quickly; often requiring careful hand-tuning of the training hyper-parameters. To make agents that can learn from experience, a sophisticated algorithm is required, not just to perform the training, but also to gather the correct data. This process is probably similar to the limbic system in our brains, for example.

While the current crop of agent systems has various limitations, there are still many open opportunities to address them with software and structure to create even more advanced applications. Enhancing active task execution, improving error recovery mechanisms, implementing efficient review mechanisms, and exploring parallel execution capabilities can boost the overall performance of these systems. 

Conclusion:

The emergence of open-source collaborative agent systems is creating a transformative era in AI. We are very close to a world where humans and AI can collaborate to solve the world's problems. Just as companies, or markets formed by many independent rational actors, can act as a superhuman intelligence, collaborative agent systems with many independent sub-agents that communicate, collaborate, and reason together enhance the capabilities of the language model alone, paving the way for more versatile applications.

Looking ahead, I think AI powered by collaborative agent systems has the potential to revolutionize industries such as healthcare, finance, education, and more. However, we must not forget the important sentence from an IBM manual: “A computer can never be held accountable”. In a future where we have human-level AIs with which we can work hand-in-hand to tackle complex problems, it becomes increasingly important to ensure accountability measures are in place. The responsibility and accountability for their actions still ultimately lie with the humans who design, deploy, and use them.

This journey towards AGI is thrilling, and collaborative agent systems play an integral role in this transformative era of artificial intelligence.

 

The post Talk of the AI Town: The Uprising of Collaborative Agents appeared first on ML Conference.

]]>
AI is a Human Endeavor https://mlconference.ai/blog/ai-human-endeavor/ Tue, 29 Aug 2023 09:00:46 +0000 https://mlconference.ai/?p=86755 As AI advances, calls for regulation are increasing. But viable regulatory policies will require a broad public debate. We spoke with Mhairi Aitken, Ethics Fellow at the British Alan Turing Institute, about the current discussions on risks, AI regulation, and visions of shiny robots with glowing brains.

The post AI is a Human Endeavor appeared first on ML Conference.

]]>
devmio: Could you please introduce yourself to our readers and a bit about why you are concerned with machine learning and artificial intelligence?

Mhairi Aitken: My name is Mhairi Aitken, I’m an ethics fellow at the Alan Turing Institute. The Alan Turing Institute is the UK’s National Institute for AI and data science and as an ethics fellow, I look at the ethical and social considerations around AI and data science. I work in the public policy program where our work is mostly focused on uses of AI within public policy and government, but also in relation to policy and government responses to AI as in regulation of AI and data science. 

devmio: For our readers who may be unfamiliar with the Alan Turing Institute, can you tell us a little bit about it? 

Mhairi Aitken: The national institute is publicly funded, but our research is independent. We have three main aims of our work. First, advancing world-class research and applying that to national and global challenges. 

Second, building skills for the future. That applies both to technical skills, training the next generation of AI and data scientists, and to developing skills around ethical and social considerations and regulation.

Third, part of our mission is to drive an informed public conversation. We have a role in engaging with the public, as well as policymakers and a wide range of stakeholders to ensure that there’s an informed public conversation around AI and the complex issues surrounding it and clear up some misunderstandings often present in public conversations around AI.


devmio: In your talk at Devoxx UK, you said that it’s important to demystify AI. What exactly is the myth surrounding AI?

Mhairi Aitken: There are quite a few different misconceptions. Maybe one of the biggest is that AI is something technically super complex that everyday people can't engage with. That's a really important myth to debunk, because it often creates a sense that AI isn't something people can easily engage with or discuss.

As AI is already embedded in all our individual lives and is having impacts across society, it’s really important that people feel able to engage in those discussions and that they have a say and influence the way AI shapes their lives. 

On the other hand, there are unfounded and unrealistic fears about what risks it might bring into our lives. There’s lots of imagery around AI that gets repeated, of shiny robots with glowing brains and this idea of superintelligence. These widespread narratives around AI come back again and again, and are very present within the public discourse. 

That’s a distraction and it creates challenges for public engagement and having an informed public discussion to feed into policy and regulation. We need to focus on the realities of what AI is and in most cases, it’s a lot less exciting than superintelligence and shiny robots.

devmio: You said that AI is not just a complex technical topic, but something we are all concerned with. However, many of these misconceptions stem from the problem that the core technology is often not well understood by laymen. Isn’t that a problem?

Mhairi Aitken: Most of the players in Big Tech are pushing this idea of AI as superintelligence, something far-fetched, and that's closing down the discussions. It creates the sense that AI is more difficult to explain, or more difficult to grasp, than it actually is. To have an informed conversation, we need to do a lot more work in that space and give people the confidence to engage in meaningful discussions around AI.

And yes, it’s important to enable enough of a technical understanding of what these systems are, how they’re built and how they operate. But it’s also important to note that people don’t need to have a technical understanding to engage in discussions around how systems are designed, how they’re developed, in what contexts they’re deployed, or what purposes they are used for. 

Those are political, economic, and cultural decisions made by people and organizations. Those are all things that should be open for public debate. That’s why, when we talk about AI, it’s really important to talk about it as a human endeavor. It’s something which is created by people and is shaped by decisions of organizations and people. 

That’s important because it means that everyone’s voices need to be heard within those discussions, particularly communities who are potentially impacted by these technologies. But if we present it as something very complex which requires a deep technical understanding to engage with, then we are shutting down those discussions. That’s a real worry for me.


devmio: If the topic of superintelligence as an existential threat to humanity is a distraction from the real problems of AI that is being pushed by Big Tech, then what are those problems?

Mhairi Aitken: A lot of the AI systems that we interact with on a daily basis are opaque systems that make decisions about people’s lives, in everything from policing to immigration, social care and housing, or algorithms that make decisions about what information we see on social media. 

Those systems rely on or are trained on data sets, which contain biases. This often leads to biased or discriminatory outcomes and impacts. Because the systems are often not transparent in the ways that they’re used or have been developed, it makes it very difficult for people to contest decisions that are having meaningful impacts on their lives. 

In particular, marginalized communities, who are typically underrepresented within development processes, are most likely to be impacted by the ways these systems are deployed. This is a really, really big concern. We need to find ways of increasing diversity and inclusiveness within design and development processes to ensure that a diverse set of voices and experiences are reflected, so that we’re not just identifying harms when they occur in the real world, but anticipating them earlier in the process and finding ways to mitigate and address them.

At the moment, there are also particular concerns and risks that we really need to focus on concerning generative AI. For example, misinformation, disinformation, and the ways generative AI can lead to increasingly realistic images, as well as deep fake videos and synthetic voices or clone voices. These technologies are leading to the creation of very convincing fake content, raising real concerns for potential spread of misinformation that might impact political processes. 

It’s not just becoming increasingly hard to spot that something is fake. It’s also a widespread concern that it is increasingly difficult to know what is real. But we need to have access to trustworthy and accurate information about the world for a functioning democracy. When we start to question everything as potentially fake, it’s a very dangerous place in terms of interference in political and democratic processes.

I could go on, but there are very real, concrete examples of how AI is already causing harm today, and these harms disproportionately impact marginalized groups. A lot of the narratives of existential risk we currently see come from Big Tech and are mostly pushed by privileged or affluent people. When we think about AI, or about how we address the risks around AI, we shouldn't center the voices of Big Tech, but the voices of impacted communities.

devmio: A lot of misinformation is already on the internet and social media without the addition of AI and generative AI. So potential misuse on a large scale is of a big concern for democracies. How can western societies regulate AI, either on an EU-level or a global scale? How do we regulate a new technology while also allowing for innovation?

Mhairi Aitken: There definitely needs to be clear and effective regulation around AI. But I think that the dichotomy between regulation and innovation is false. For a start, we don’t just want any innovation. We want responsible and safe innovation that leads to societal benefits. Regulation is needed to make sure that happens and that we’re not allowing or enabling dangerous and harmful innovation practices.

Also, regulation provides the conditions for certainty and confidence for innovation. The industry needs to have confidence in the regulatory environment and needs to know what the limitations and boundaries are. I don’t think that regulation should be seen as a barrier to innovation. It provides the guardrails, clarity, and certainty that is needed. 

Regulation is really important and there are some big conversations around that at the moment. The EU AI Act is likely to set an international standard of what regulation will look like in this regard. It’s going to have a big impact in the same way that GDPR had with data protection. Soon, any organization that’s operating in the EU, or that may export an AI product to the EU, is going to have to comply with the EU AI Act. 

We need international collaboration on this.

devmio: The EU AI Act was drafted before ChatGPT and other LLMs became publicly available. Is the regulation still up to date? How is an institution like the EU supposed to catch up to the incredible advancements in AI?

Mhairi Aitken: It’s interesting that over the last few months, developments with large language models have forced us to reconsider some elements of what was being proposed and developed, particularly around general purpose AI. Foundation models like large language models that aren’t designed for a particular purpose can be deployed in a wide range of contexts. Different AI models or systems are built on top of them as a foundation.

That’s posed some specific challenges around regulation. Some of this is still being worked out. There are big challenges for the EU, not just in relation to foundation models. AI encompasses so many things and is used across all industries, across all sectors in all contexts, which poses a big challenge. 

The UK approach to regulation of AI has been quite different from that proposed in the EU: the UK set out a pro-innovation approach to regulation, a set of principles intended to equip existing UK regulatory bodies to grapple with the challenges of AI. It recognized that AI is already being used across all industries and sectors, which means that all regulators have to deal with how to regulate AI in their sectors.

In recent weeks and months in the UK we have seen an increasing emphasis on regulation of AI, and increased attention to the importance of developing effective regulation. But I have some concerns that this change of emphasis has, at least in part, come from Big Tech. We've seen this in the likes of Sam Altman on his tour of Europe, speaking to European regulators and governments. Many voices talking about the existential risk AI poses come from Silicon Valley, and this is now beginning to influence policy and regulatory discussions, which is worrying. It's a positive thing that we're having these discussions about regulation and AI, but we need those discussions to focus on real risks and impacts.

devmio: The idea of existential threat posed by AI often comes from a vision of self-conscious AI, something often called strong AI or artificial general intelligence (AGI). Do you believe AGI will ever be possible?

Mhairi Aitken: No, I don’t believe AGI will ever be possible. And I don’t believe the claims being made about an existential threat. These claims are a deliberate distraction from the discussions of regulation of current AI practices. The claim is that the technology and AI itself poses a risk to humanity and therefore, needs regulation. At the same time, companies and organizations are making decisions about that technology. That’s why I think this narrative is being pushed, but it’s never going to be real. AGI belongs in the realm of sci-fi. 

There are huge advancements in AI technologies and what they’re going to be capable of doing in the near future is going to be increasingly significant. But they are still always technologies that do what they are programmed to do. We can program them to do an increasing number of things and they do it with an increasing degree of sophistication and complexity. But they’re still only doing what they’re programmed for, and I don’t think that will ever change. 

I don’t think it will ever happen that AI will develop its own intentions, have consciousness, or a sense of itself. That is not going to emerge or be developed in what is essentially a computer program. We’re not going to get to consciousness through statistics. There’s a leap there and I have never seen any compelling evidence to suggest that could ever happen.

We’re creating systems that act as though they have consciousness or intelligence, but this is an illusion. It fuels a narrative that’s convenient for Big Tech because it deflects away from their responsibility and suggests that this isn’t about a company’s decisions.

devmio: Sometimes it feels like the discussions around AI are a big playing field for societal discourse in general. It is a playing field for a modern society to discuss its general state, its relation to technology, its conception of what it means to be human, and even metaphysical questions about God-like AI. Is there some truth to this?

Mhairi Aitken: There’s lots of discussions about potential future scenarios and visions of the future. I think it’s incredibly healthy to have discussions about what kind of future we want and about the future of humanity. To a certain extent this is positive.

But the focus has to be on the decisions we make as societies, and not hypothetical far-fetched scenarios of super intelligent computers. These conversations that focus on future risks have a large platform. But we are only giving a voice to Big Tech players and very privileged voices with significant influence in these discussions. Whereas, these discussions should happen at a much wider societal level. 

The conversations we should be having are about how we harness the value of AI as a set of tools and technologies. How do we benefit from them to maximize value across society and minimize the risks of technologies? We should be having conversations with civil society groups and charities, members of the public, and particularly with impacted communities and marginalized communities.

We should be asking what their issues are, how AI can find creative solutions, and where we could use these technologies to bring benefit and advocate for the needs of community groups, rather than being driven by commercial for-profit business models. These models are creating new dependencies on exploitative data practices without really considering if this is the future we want.

devmio: In the Alan Turing Institute’s strategy document, it says that the institute will make great leaps in AI development in order to change the world for the better. How can AI improve the world?

Mhairi Aitken: There are lots of brilliant things that AI can do in the area of medicine and healthcare that would have positive impacts. For example, there are real opportunities for AI to be used in developing diagnostic tools. If the tools are designed responsibly and for inclusive practices, they can have a lot of benefits. There’s also opportunities for AI in relation to the environment and sustainability in terms of modeling or monitoring environments and finding creative solutions to problems.

One area that really excites me is where AI can be used by communities, civil society groups, and charities. At the moment, there’s an emphasis on large language models. But actually, when we think about smaller AI, there’s real opportunities if we see them as tools and technologies that we can harness to process complex information or automate mundane tasks. In the hands of community groups or charities, this can provide valuable tools to process information about communities, advocate for their needs, or find creative solutions.

devmio: Do you have examples of AI used in the community setting?

Mhairi Aitken: For example, community environment initiatives or sustainability initiatives can use AI to monitor local environments, or identify plant and animal species in their areas through image recognition technologies. It can also be used for processing complex information, finding patterns, classifying information, and making predictions or recommendations from information. It can be useful for community groups to process information about aspects of community life and develop evidence needed to advocate for their needs, better services, or for political responses.

A lot of big innovation is in commercially-driven development. This leads to commercial products instead of being about how these tools can be used for societal benefit on a smaller scale. This changes our framing and helps us think about who we’re developing these technologies for and how this relates to different kinds of visions of the future that benefit from this technology.

devmio: What do you think is needed to reach this point?

Mhairi Aitken: We need much more open public conversations and demands about transparency and accountability relating to AI. That’s why it’s important to counter the sensational unrealistic narrative and make sure that we focus on regulation, policy and public conversation. All of us must focus on the here and now and the decisions of companies leading the way in order to hold them accountable. We must ensure meaningful and honest dialogue as well as transparency about what’s actually happening.

devmio: Thank you for taking the time to talk with us and we hope you succeed with your mission to inform the public.

The post AI is a Human Endeavor appeared first on ML Conference.

]]>
Using OpenAI’S CLIP Model on the iPhone: Semantic Search For Your Own Pictures https://mlconference.ai/blog/openai-clip-model-iphone/ Wed, 02 Aug 2023 08:35:24 +0000 https://mlconference.ai/?p=86676 The iPhone Photos app supports text-based searches, but is quite limited. When I wanted to search for a photo of “my girlfriend taking a selfie at the beach,” it didn’t return any results, even though I was certain there was such a photo in my album. This prompted me to take action. Eventually, I integrated OpenAI’s CLIP model into the iPhone.

The post Using OpenAI’S CLIP Model on the iPhone: Semantic Search For Your Own Pictures appeared first on ML Conference.

]]>
OpenAI’s CLIP Model

I first encountered the CLIP model in early 2022 while experimenting with AI drawing models. CLIP (Contrastive Language-Image Pre-Training) is a model proposed by OpenAI in 2021. It can encode images and text into representations that can be compared in the same space. Many text-to-image models (e.g., Stable Diffusion) build on CLIP to calculate the distance between the generated image and the prompt during training.

 


Fig. 1: OpenAI’s CLIP model, source: https://openai.com/blog/clip/

 

As shown above, the CLIP model consists of two components: a Text Encoder and an Image Encoder. Let's take the ViT-B-32 version as an example (different models have different output vector sizes):

 

  • Text Encoder can encode any text (length <77 tokens) into a 1×512 dimensional vector.
  • Image Encoder can encode any image into a 1×512 dimensional vector.

 

By calculating the distance or cosine similarity between the two vectors, we can compare the similarity between a piece of text and an image.
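As a concrete (if toy) illustration, here is that comparison in NumPy; the random 512-dimensional vectors stand in for real CLIP embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
text_vec = rng.standard_normal(512)  # stand-in for a 1x512 CLIP text embedding
img_vec = rng.standard_normal(512)   # stand-in for a 1x512 CLIP image embedding

print(cosine_similarity(text_vec, text_vec))  # identical vectors: similarity ~ 1.0
print(cosine_similarity(text_vec, img_vec))   # unrelated vectors: similarity near 0
```

With real embeddings, a higher score means the text and the image are semantically closer.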


Image Search on a Server

I found this quite fascinating, as it was the first time images and text could be compared in this way. Based on this principle, I quickly set up an image search tool on a server. First, process all images through CLIP to obtain their image vectors: a list of 1×512 vectors.

 

import glob
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# get the list of all images
img_lst = glob.glob('imgs/*.jpg')
img_features = []
# calculate a vector for every image
for img_path in img_lst:
    image = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
    image_features = model.encode_image(image)
    img_features.append(image_features)

 

Then, given a search text query, calculate its text vector (also 1×512) and compare its similarity with each image vector in a for-loop.

 

text_query = 'lonely'
# tokenize the query, then feed it into the CLIP model
text = clip.tokenize([text_query]).to(device)
text_feature = model.encode_text(text)
# compare the text vector's similarity with each image vector
sims_lst = []
for img_feature in img_features:
    sim = torch.nn.functional.cosine_similarity(text_feature, img_feature)
    sims_lst.append(sim.item())

 

Finally, display the top-K results in order. Here I return the top-3 ranked image files and display the most relevant result (in a Jupyter notebook, via IPython's display utilities).

 

import numpy as np
from IPython.display import Image as IPImage, display

K = 3
# sort by score with np.argsort
sims_lst_np = np.array(sims_lst)
idxs = np.argsort(sims_lst_np)[-K:]
# display the most relevant result
display(IPImage(filename=img_lst[idxs[-1]]))

 

I discovered that its image search results were far superior to Google's. Here are the top 3 results when I search for the keyword “lonely”:

 

Integrating CLIP into iOS with Swift

After marveling at the results, I wondered: Is there a way to bring CLIP to mobile devices? After all, the place where I store the most photos is neither my MacBook Air nor my server, but rather my iPhone.

To port a large GPU-based model to the iPhone, operator support and execution efficiency are the two most critical factors.

1. Operator Support

Fortunately, in December 2022, Apple demonstrated the feasibility of porting Stable Diffusion to iOS, proving that the deep learning operators needed for CLIP are supported in iOS 16.0.

 

Fig. 2: Pictures generated by Stable Diffusion

2. Execution Efficiency

Even with operator support, if execution is extremely slow (for example, calculating vectors for 10,000 images takes half an hour, or a single search takes 1 minute), porting CLIP to mobile devices would lose its meaning. These factors can only be determined through hands-on experimentation.

I exported the Text Encoder and Image Encoder to CoreML models using the coremltools library. The final models have a total file size of 300 MB. Then, I started writing Swift code.

I use Swift to load the Text/Image Encoder models and calculate all the image vectors. When users input a search keyword, the model first calculates the text vector and then computes its cosine similarity with each of the image vectors individually.

The core code is as follows:

 

// Load the Text/Image Encoder models.
let text_encoder = try MLModel(contentsOf: TextEncoderURL, configuration: config)
let image_encoder = try MLModel(contentsOf: ImageEncoderURL, configuration: config)
// Given a prompt/photo, calculate the CLIP vector for it.
let text_feature = text_encoder.encode("a dog")
let image_feature = image_encoder.encode(photo) // photo: an image prepared elsewhere
// Compute the cosine similarity.
let sim = cosine_similarity(A: image_feature, B: text_feature)

 

As a SwiftUI beginner, I found that Swift doesn't have a built-in implementation of cosine similarity, so I wrote one myself using Accelerate. The code below is a Swift translation of the cosine similarity formula from Wikipedia.

 

import Accelerate

func cosine_similarity(A: MLShapedArray<Float32>, B: MLShapedArray<Float32>) -> Float {
    let magnitude = vDSP.rootMeanSquare(A.scalars) * vDSP.rootMeanSquare(B.scalars)
    let dotarray = vDSP.dot(A.scalars, B.scalars)
    return dotarray / magnitude
}

 

The reason I split the Text Encoder and Image Encoder into two models is that, when actually using this photo search app, your input text always changes while the content of the Photos library is fixed. So all image vectors can be computed once and saved in advance; only the text vector is computed for each search.
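That caching step can be sketched in a few lines of NumPy (the file name and shapes are illustrative): stack the per-image vectors into one matrix, save it once, and reload it on every launch instead of re-encoding the whole library.

```python
import numpy as np

def save_image_vectors(img_features, path="image_vectors.npy"):
    # Stack the list of 1x512 vectors into one (N, 512) matrix and persist it.
    matrix = np.vstack(img_features)
    np.save(path, matrix)
    return matrix

def load_image_vectors(path="image_vectors.npy"):
    # Reload the cached matrix at launch; no re-encoding needed.
    return np.load(path)

vectors = [np.random.rand(1, 512) for _ in range(100)]  # stand-ins for CLIP image vectors
save_image_vectors(vectors)
print(load_image_vectors().shape)  # (100, 512)
```

In the actual app the same idea applies, just with Core ML outputs serialized to disk instead of a .npy file.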

Furthermore, I implemented multi-core parallelism when calculating similarity, significantly increasing search speed: a single search over fewer than 10,000 images takes less than 1 second. Real-time text search across a library of tens of thousands of photos thus becomes possible.
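Besides multi-threading, another common way to speed this up is to replace the per-image loop with a single matrix-vector product over row-normalized vectors. The NumPy sketch below illustrates the idea with random stand-in vectors (on iOS the same trick maps onto Accelerate's BLAS routines):

```python
import numpy as np

def top_k(image_matrix: np.ndarray, text_vec: np.ndarray, k: int = 3) -> np.ndarray:
    # Normalize rows and the query so a plain dot product equals cosine similarity.
    imgs = image_matrix / np.linalg.norm(image_matrix, axis=1, keepdims=True)
    q = text_vec / np.linalg.norm(text_vec)
    sims = imgs @ q                     # one matrix-vector product: all N scores at once
    return np.argsort(sims)[-k:][::-1]  # indices of the k best matches, best first

rng = np.random.default_rng(0)
matrix = rng.random((10_000, 512))      # stand-ins for precomputed image vectors
query = rng.random(512)                 # stand-in for the text vector
print(top_k(matrix, query))             # indices of the 3 most similar "photos"
```

Because the image matrix is precomputed, each search costs only one normalization of the query plus one matmul.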

Below is a flowchart of how Queryable works:

 

Fig. 3: How the app works

Performance

But compared to the built-in search in iPhone Photos, how much better is CLIP-based album search? The answer: overwhelmingly better. With CLIP, you can search for a scene in your mind, a tone, an object, or even an emotion conveyed by the image.

 

Fig. 4: Search for a scene, an object, a tone or the meaning related to the photo with Queryable.

 

To use Queryable, you first need to build the index, which traverses your album, calculates all the image vectors, and stores them. This happens only once; the total time required depends on the number of photos, with indexing running at roughly 2,000 photos per minute on an iPhone 12 mini. When you have new photos, you can manually update the index, which is very fast.

In the latest version, you have the option to grant the app network access in order to download photos stored in iCloud. This will only occur when a photo is included in your search results, its original version is stored in iCloud, and you have navigated to the details page and tapped the download icon. Once you grant the permission, you can close the app, reopen it, and the photos will be downloaded from iCloud automatically.

3. Any requirements for the device?

  • iOS 16.0 or above
  • iPhone 11 (A13 chip) or later models

The time cost for a search also depends on your number of photos: for <10,000 photos it takes less than 1s. For me, an iPhone 12 mini user with 35,000 photos, each search takes about 2.8s.

Q&A on Queryable

1. On privacy and security issues.

Queryable is designed as an OFFLINE app that does not require a network connection and will never request network access, thereby avoiding privacy issues.

2. What if my pictures are stored on iCloud?

Because it cannot connect to a network, Queryable can only use the cached low-definition versions of your local Photos album. However, the CLIP model itself resizes the input image to a very small size (e.g., ViT-B-32 uses 224×224), so photos stored in iCloud do not affect search accuracy, except that you cannot view the original image in the search results.

The post Using OpenAI’S CLIP Model on the iPhone: Semantic Search For Your Own Pictures appeared first on ML Conference.

AI as a Superpower: LAION and the Role of Open Source in Artificial Intelligence https://mlconference.ai/blog/ai-as-a-superpower-laion-and-the-role-of-open-source-in-artificial-intelligence/ Wed, 21 Jun 2023 10:20:21 +0000 https://mlconference.ai/?p=86355 In early March of this year, we had the pleasure of talking with Christoph Schuhmann, co-founder of the open-source AI organization LAION. We spoke with him about the organization's founding, the datasets and models it has produced, and the future of open-source AI development.

**devmio: Hello, Christoph! Could you tell us what LAION is and what role you play there?**

 

**Christoph Schuhmann:** LAION stands for Large-Scale Artificial Intelligence Open Network. First and foremost, it’s simply a huge community of people who share the dream of open-source AI models, research, and datasets. That’s what connects us all. We have a [Discord server](https://discord.com/invite/xBPBXfcFHd) where anyone can come in and share a bit about the latest research in the field. You can also propose a new project and find people to work on it with you. And if you ask the mods, me, or other people, you might even get a channel for your project. That’s basically the core.

 

When we had such surprising success with our first dataset called [LAION-400M](https://laion.ai/blog/laion-400-open-dataset/), we set up a small non-profit association that doesn’t actually do anything. We have a bank account with a bit of money coming into it from a few companies that support us. That’s primarily Hugging Face, but also StabilityAI, although we’re mostly supported not by money but by cloud compute.

 

StabilityAI, for example, has a huge cluster with 4,000, now 5,600, GPUs, where we, or members approved by the core team, can use preemptible GPUs, that is, whatever is idle and not being used at the moment.

 

**devmio: So we can just come to you and contribute? Propose our ideas and ask for help with our projects or help with ongoing projects?**

 

**Christoph Schuhmann:** Exactly! You can now come to our Discord server and say that you want to contribute to a project or help us with PR or whatever. You are most welcome!

 

**devmio: Is LAION based in Germany? And you are the chairman and co-founder?**

 

**Christoph Schuhmann:** Exactly. I am a physics and computer science teacher, have been regularly involved with machine learning, and also have a background in reform-oriented education. I made a Kickstarter documentary seven or eight years ago about schools where you can learn without grades and curricula. After that took off, I did tutorials on how to start such an independent school. So I knew how to set up a grassroots non-profit organization. I am not paid for my work at LAION.

 

## The Beginnings of LAION

 

**devmio: How did LAION come to life? How did you get to know the other members?**

 

**Christoph Schuhmann:** I actually started LAION after reading a lot about deep learning and machine learning and doing online courses in my spare time over the last five to six years. When the first version of DALL-E was published at the beginning of 2021, I was totally shocked by how good it was. At that time, however, many non-computer scientists didn’t find it that impressive.

 

I then asked on a few Discord servers about machine learning and what we would need to replicate something similar and make it open-source. There was a well-known open-source programmer at the time called Philip Wang (his alias on GitHub is lucidrains) who is a legend in the community because whenever a new paper comes out he has the associated codebase implemented within a few days. He also built an implementation of the first version of DALL-E in Pytorch called [DALLE-pytorch](https://github.com/lucidrains/DALLE-pytorch). This model was then trained by a few people using small data sets on Discord, and that was proof of concept.

 

But the data was missing, and I suggested going to [Common Crawl](https://commoncrawl.org/), a non-profit from Seattle that scrapes HTML code from the internet every two to three months and makes it available: a snapshot, so to speak, of the HTML code of all kinds of websites, amounting to roughly 250 terabytes of compressed data. I then suggested downloading a gigabyte as a test and wrote a script that extracts image tags together with their alt texts and then uses the CLIP model to see how well image and text fit together.
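The first half of such a script, pulling image URLs and alt texts out of raw HTML, can be sketched with Python's standard-library parser. The HTML snippet below is invented for illustration, and the CLIP scoring step is only noted in a comment rather than implemented:

```python
from html.parser import HTMLParser

class ImgAltExtractor(HTMLParser):
    """Collect (src, alt) pairs from <img> tags, the first step
    toward building an image-text dataset from raw HTML."""
    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            # Keep only images that actually carry an alt text.
            if a.get("src") and a.get("alt"):
                self.pairs.append((a["src"], a["alt"]))

html = """
<html><body>
  <img src="https://example.com/cat.jpg" alt="a cat sitting on a sofa">
  <img src="https://example.com/logo.png">
  <img src="https://example.com/dog.jpg" alt="a dog playing in the park">
</body></html>
"""

parser = ImgAltExtractor()
parser.feed(html)
print(parser.pairs)
# In the real pipeline, each (src, alt) pair would then be scored
# with CLIP and kept only if image and text match well enough.
```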

 

Then two “machine learning nerds”, who were much better at it than I was at the time, implemented it efficiently but didn’t finish it. That was a shame, but they were developing the GPT open-source variant [GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj) and therefore didn’t have the time.

 

Then in the spring of 2021, I sat down and just wrote a huge pile of spaghetti code in a Google Colab, and asked around on Discord who wanted to help me with it. Someone got in touch who, it later turned out, was only 15 at the time. He wrote a tracker, basically a server that manages lots of Colabs, each of which gets a small job, extracts a gigabyte, and then uploads the results. The first version still used Google Drive.

 

## The Road to the LAION-400M Dataset

 

It was a complete disaster because Google Drive wasn't suitable for it, but it was the easiest thing we could do quickly. Then I looked for some people on a Discord server, created some more accounts, and we ended up with 50 Google Colabs running around the clock.

 

But it worked, and then, within a few weeks, we had filtered 3 million image-text pairs, which at the time was more than Google’s [Conceptual Captions](https://ai.google.com/research/ConceptualCaptions/), a very well-known dataset of 2019. That little success got us so much attention on the Discord server that people just started supporting us and writing things like, “I have 50 little virtual machines here from my work, you could use them, I don’t need them right now,” or “I have another 3090 lying around here with me, I can share it with you.”

 

After three months, we had 413 million filtered image-text pairs. That was our LAION-400M dataset. At the time, it was by far the largest image-text dataset freely available, over 30 times larger than [Google’s Conceptual Caption 12M](https://github.com/google-research-datasets/conceptual-12m), with about 12 million pairs.

 

We then did a [blog post about our dataset](https://laion.ai/blog/laion-400-open-dataset/), and after less than an hour, I already had an email from the Hugging Face people wanting to support us. I had then posted on the Discord server that if we had $5,000, we could probably create a billion image-text pairs. Shortly after, someone already agreed to pay that: “If it’s so little, I’ll pay it.” At some point, it turned out that the person had his own startup in text-to-image generation, and later he became the chief engineer of Midjourney.

 

As you can see, it was simply a huge community, just 100 people who only knew each other from chat groups with aliases. At some point, I made the suggestion to create an association, with a banking account, etc. That’s how LAION was founded.

 

## Even Bigger: LAION-5B and LAION-Aesthetics

 

We then also got some financial support from Hugging Face and started working on LAION-5B, which is a dataset containing five billion image-text pairs. By the end of 2021, we were done with just under 70 percent of it, and then we were approached by someone who wanted to create a start-up that was like OpenAI but really open-source. He offered to support us with GPUs from AWS. This was someone who introduced himself as a former investment banker or hedge fund manager, which I didn’t quite believe at first. In the end, it was just some guy from Discord. But then the access data for the first pods came, and it turned out that the guy was Emad Mostaque, the founder of StabilityAI.

 

**devmio: What is the relationship between LAION and Stability AI?**

 

**Christoph Schuhmann:** Contrary to what some AI-art critics claim, we are not a satellite organisation of Stability AI. On the contrary, Stability AI came to us after the LAION-5B dataset was almost finished and wanted to support us unconditionally. They then did the same with LAION-Aesthetics.

 

**devmio: Could you explain what LAION-Aesthetics is?**

 

**Christoph Schuhmann:** I trained a model that uses the CLIP embeddings of the LAION images to estimate how pretty the images are on a scale of one to ten. It’s a very small model, a multilayer perceptron running on a CPU. At some point, I ran the model over a couple of 100,000 images, sorted them, and thought that the ones with the high scores looked really good. The next step was to run it on 2.3 billion CLIP embeddings.
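Conceptually, such a scorer is just a small feed-forward network on top of a 512-dimensional CLIP embedding. The sketch below uses random, untrained weights and an assumed two-layer architecture with a squash to the 1-10 range; it illustrates the shape of the computation, not the actual LAION aesthetics predictor:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative two-layer perceptron; the real predictor's exact
# architecture and trained weights are not reproduced here.
W1 = rng.normal(scale=0.05, size=(512, 64))  # CLIP ViT-B-32 embeddings are 512-dim
b1 = np.zeros(64)
W2 = rng.normal(scale=0.05, size=(64, 1))
b2 = np.zeros(1)

def aesthetic_score(clip_embedding):
    """Map a CLIP image embedding to a score on a 1-10 scale."""
    h = np.maximum(clip_embedding @ W1 + b1, 0.0)  # ReLU hidden layer
    raw = (h @ W2 + b2).item()
    # Squash into (1, 10); a trained model would learn this mapping.
    return 1.0 + 9.0 / (1.0 + np.exp(-raw))

embedding = rng.normal(size=512)  # placeholder CLIP embedding
print(round(aesthetic_score(embedding), 2))
```

Because the model is this small and runs on a CPU, scoring billions of precomputed CLIP embeddings is feasible, which is what made sorting the whole dataset by aesthetics practical.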

 

## From LAION-Aesthetics to Stable Diffusion

 

**devmio: How did LAION-Aesthetics help with the development of Stable Diffusion?**

 

**Christoph Schuhmann:** I had already heard about Robin Rombach, who was still a student in Heidelberg at the time and had helped develop latent diffusion models at the CompVis Group. Emad Mostaque, the founder of StabilityAI, told me in May 2022 that he would like to support Robin Rombach with compute time, and that’s how I got in touch with Robin.

 

I then sent him the LAION-Aesthetics dataset. The dataset can be thought of as a huge Excel spreadsheet containing links to images and the associated alt text. In addition, each image is given a score, such as whether something contains a watermark or smut. Robin and his team later trained the first prototype of Stable Diffusion on this. However, the model only got the name Stable Diffusion through Stability AI, to whom the model then migrated.

 

LAION also got access to the Stability AI cluster. But we were also lucky enough to be able to use JUWELS, one of the largest European supercomputers, because one of our founding members, Jenia Jitsev, is the lab director at the Jülich Supercomputer Center for Deep Learning. We then applied for compute time to train our own OpenCLIP models. And now we have the largest CLIP models available in open source.

 

## LAION’s OpenCLIP

 

**devmio: What exactly do CLIP models do? And what makes LAION’s OpenCLIP so special?**

 

**Christoph Schuhmann:** On the Stability AI cluster, a Ph.D. student from the University of Washington has trained a model called CLIP-ViT-G. This model can tell you how well an image matches a text, and it has managed to crack the 80 percent zero-shot mark. This means we have now built a general-purpose AI model that is better than the best state-of-the-art models from five years ago that were built and trained specifically for this task.

 

These CLIP models are in turn used as text encoders, as “text building blocks” by Stable Diffusion and by many other models. CLIP models have an incredible number of applications. For example, they can be used for zero-shot image segmentation, zero-shot object detection with bounding boxes, zero-shot classification, or even for text-to-image generation.
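Zero-shot classification with a CLIP-style model reduces to comparing one image embedding against one text embedding per candidate label and picking the best match. The embeddings below are placeholders for real CLIP encoder outputs (in practice the text side would encode prompts like "a photo of a dog"):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def zero_shot_classify(image_emb, label_embs, labels):
    # Cosine similarity between the image and each text prompt,
    # turned into probabilities with a softmax (temperature omitted).
    sims = normalize(label_embs) @ normalize(image_emb)
    probs = np.exp(sims) / np.exp(sims).sum()
    return labels[int(np.argmax(probs))], probs

# Placeholder embeddings; in practice these come from CLIP's image
# and text encoders. The image vector is constructed to lie close
# to the "cat" prompt vector so the example has a clear answer.
rng = np.random.default_rng(1)
labels = ["dog", "cat", "car"]
label_embs = rng.normal(size=(3, 512))
image_emb = label_embs[1] + 0.05 * rng.normal(size=512)

best, probs = zero_shot_classify(image_emb, label_embs, labels)
print(best)  # → cat
```

No class-specific training is involved: swapping in a different label list immediately yields a different classifier, which is what makes these models so broadly reusable.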

 

We have trained and further developed these models. We now have a variant that not only trains these CLIP models but also generates captions through a text decoder. This model is called [CoCa](https://laion.ai/blog/coca/) and is quite close to the state of the art.

 

We have many such projects running at the same time, sometimes so many that I almost lose track of them. Currently, we cooperate with Mila, an institute of excellence from Montreal, and together we have access to the second largest supercomputer in the US, Summit. We have been given 6 million GPU hours there and are training all kinds of models.

 

**devmio: You have already talked a lot about Stable Diffusion, and Robin Rombach, the inventor, is a member of your team. Is Stable Diffusion managed by you, is that “your” model?**

 

**Christoph Schuhmann:** No, we don’t have anything to do with that for now. But we have made the development and training of Stable Diffusion easier with LAION-Aesthetics and LAION-5B.

 

## Open Source as a Superpower

 

**devmio: LAION is committed to making the latest developments in AI freely available. Why is open source so important in AI?**

 

**Christoph Schuhmann:** Let’s take the sentence: “AI should be open source so that it is available to the general public.” Now let’s take that sentence and replace “AI” with “superpowers”: “Superpowers should be open source and available to the public.” In this case, it becomes much more obvious what I’m actually getting at.

 

Imagine if there was such a thing as superpowers, and only OpenAI, Microsoft, Google, maybe the Chinese and American governments, and five other companies, have control over it and can decide what to do with it. Now, you could say that governments only ever want what’s best for their citizens. That’s debatable, of course, but let’s assume that’s the case. But does that also apply to Microsoft? Do they also have our best interests at heart, or does Microsoft simply want to sell its products?

 

If you have a very dark view of the world, you might say that there are a lot of bad people out there, and if everyone had superpowers now, there would certainly be 10, 20, or 30 percent of all people who would do really bad things. That’s why we have to control such things, for example through the state. But if you have a rather positive and optimistic view of the world, like me, for example, then you could say that most people are relatively nice. No angels, no do-gooders, but most people don’t want to actively do something bad, or destroy something, but simply live their lives. There are some people who are do-gooders and also people who have something bad in mind. But the latter are probably clearly in the minority.

 

If we assume that everyone has superpowers, then everyone would also have the opportunity to take action against destructive behaviour and limit its effects. In such a world, there would be a lot of positive things. Things like superpower art, superpower music, superpower computer games, and superpower productivity of companies that simply produce goods for the public. If you now ask yourself what kind of world you would like to live in and assume that you have a rather positive worldview, then you will probably decide that it would be good to make superpowers available to the general public as open source. And once you understand that, it’s very easy to understand that AI should also be open source.

 

AI is not the same as superpowers, of course, but in a world in which the internet plays an ever greater role, in which every child grows up with YouTube, in which AI is getting better and better, in which more and more autonomous systems are finding their way into our everyday lives, AI is incredibly important. Software and computerised things are sort of superpowers. And that’s going to get much more blatant, especially with ChatGPT. In three to four years, ChatGPT will be much better than it is today.

 

Now imagine if the whole world used technologies like ChatGPT and only OpenAI and Microsoft, Google and maybe two or three other big companies controlled those technologies. They can cut you off at any time, or tell you “Sorry, but I can’t do this task, it’s unethical in my opinion”, “I have to block you for an hour now”, or “Sorry, your request might be in competition with a Microsoft product, now I have to block you forever. Bye.”

 

**devmio: We had also spoken to other experts, for example, Pieter Buteneers and Christoph Henkelkmann, who had similar concerns. But the question remains whether everyone should really have unrestricted access to such technologies, right?**

 

**Christoph Schuhmann:** A lot of criticism, not directed at LAION but at Stable Diffusion, goes in this direction. There is criticism that there are open-source models like Stable Diffusion that can be used to create negative content, circumvent copyright and create fakes, etc. Of course, it’s wrong to violate copyright, and it’s also wrong to create negative content and fakes. But imagine if these technologies were only in the hands of Microsoft, Google, and a few more large research labs. They would develop really well in the background, and at some point, you would be able to generate everything perfectly with them. And then they leak out or there is a replica, and society is not prepared at all. Small and medium-sized university labs wouldn’t be prepared at all to look at the source code and discover the problems.

 

We have something similar with LAION-5B. There are also some questionable images in the dataset that we were unable to filter. As a result, there is also a disclaimer that it is a research dataset that should be thoroughly filtered and examined before being used in production. You have to handle this set carefully and responsibly. But this also means that you can find things in the set that you would like to remove from the internet.

 

For example, there is an organisation of artists, [Have I Been Trained](https://haveibeentrained.com/), that provides a tool that artists can use to determine if their artwork is included in LAION-5B. This organisation has simply taken our open-source code and used it for their own purposes to organise the disappointed artists.

 

And that’s a great thing because now all those artists who have images on the internet that they don’t want there can find them and have them removed. And not only artists! For example, if I have a picture of myself on the internet that I don’t want there, I can find out through LAION-5B where it is being used. We don’t have the images stored in LAION-5B, we just have a table with the links, it’s just an index. But through that, you can find out which URL is linked to the image and then contact the owners of the site and have the image removed. By doing this, LAION generates transparency and gives security researchers an early opportunity to work with these technologies and figure out how to make them more secure. And that’s important because this technology is coming one way or another.

 

In probably a lot less than five years, you’re going to be able to generate pretty much anything in terms of images that you can describe in words, photo-realistically, so that a human being with the naked eye can’t tell whether it’s a photo or not.

 

## AI in Law, Politics, and Society

 

**devmio: Because you also mentioned copyright: The legal situation in Germany regarding AI, copyright, and other issues is probably not entirely clear. Are there sufficient mechanisms? Do you think that the new EU regulations that are coming will be sufficient while not hindering creativity and research?**

 

**Christoph Schuhmann:** I am not a lawyer, but we have good lawyers advising us. There is a Data Mining Law, an EU-wide exception to copyright. It allows non-profit institutions, such as universities, but also associations like ours, whose focus is on research and who make their results publicly available, to download and analyse things that are openly available on the internet.

 

We are allowed to temporarily store the links, texts, whatever, and when we no longer need them for research, we have to delete them. This law explicitly allows data mining for research, and that is very good. I don’t think all the details of what’s going to happen in the future, especially with ChatGPT and other generative AIs for text and images, were anticipated in these laws. The people who made the law probably had more statistical analysis of the internet in mind and less training data for AIs.

 

I would like to see more clarity from legislators in the future. But I think that the current legal situation in Germany is very good, at least for non-profit organisations like LAION. I’m a bit worried that when the [EU AI Act](https://digital-strategy.ec.europa.eu/de/policies/european-approach-artificial-intelligence), which is being drafted, comes, something like general purpose AI, like ChatGPT, would be classified as high risk. If that were to be the case, it would mean that if you as an organisation operate or train a ChatGPT-like service, you would have to constantly account for everything meticulously and tick off a great many compliance rules, catalogues, and checklists.

 

Even if this is certainly well-intentioned, it would also severely restrict research and development, especially by open-source projects, associations, and grassroots movements, so that only big tech corporations would be able to comply with all the rules. Whether this will happen is still unclear. I don't want high-risk applications like facial recognition to go unregulated either. And I don't want to be monitored all day.

 

But if any lawmakers are reading this: Politicians should keep in mind that it is very important to continue to enable open-source AI. It would be very good if we could continue to practice as we have been doing. Not only for LAION but for Europe. I am sure that quite a lot of companies and private people, maybe even state institutions can benefit from such models as CLIP or from the datasets that we are making.

 

And I believe that this can generate a lot of value for citizens and companies in the EU. So I would even go so far as to call for politicians and donors to maybe think about building something similar to a CERN for AI. With a billion euros, you could probably build a great open-source supercomputer that all companies and universities, in fact, anyone, could use to do AI research under two conditions: First, the whole thing has to be reviewed by some smart people, maybe experts and people from the open-source community. Second, all results, research papers, checkpoints of models, and datasets must be released under a fully open-source licence.

 

Because then a lot of companies that can’t afford a supercomputer at the moment could open source their research there and only keep the fine-tuning or anything that is really sensitive to the business model on the companies’ own computers. But all the other stuff happens openly. That would be great for a lot of companies, that would be great for a lot of medium and small universities, and that would also be great for groups like LAION.

 

_**Editor’s note**: After the interview, LAION started a petition for a CERN-like project. Read more on [LAION’s blog](https://laion.ai/blog/petition/)._

 

## AI for a Better World

 

**Christoph Schuhmann:** Another application for AI would be a project close to my heart: Imagine there is an open-source ChatGPT. You would then take, say, 100 teachers and have them answer questions from students about all sorts of subjects. For these questions, you could make really nice step-by-step explanations that really make sense. And then, you would collect data from the 100 teachers for the school material up to the tenth grade. That’s at least similar everywhere in the Western world, except, of course, history, politics, etc. But suppose you were to simply break down the subject matter from 100 countries, from 100 teachers, from the largest Western countries, and use that to fine-tune a ChatGPT model.

 

You need a model that has maybe 20 to 30 billion parameters, and you could use it to give access to first-class education to billions of children in the Third World who don’t have schools but have an old mobile phone and internet access. You don’t need high-tech future technology, you can do that with today’s technology. And these are big problems of the world that could be addressed with it.

 

Or another application: My mum is 83 years old, she can’t handle a computer and is often lonely. Imagine if she had a Siri that she could have a sensible conversation with. Not as a substitute for human relationships, but as a supplement. How many lonely old people do you think would be happier if they could just ask what’s going on in the world. Or “Remember when I told you that story, Siri? Back in my second marriage 30 years ago?” That would make a lot of people happier. And I think things like that can have a lot of effect with relatively little financial outlay.

 

**devmio: And what do you see next in AI development?**

 

**Christoph Schuhmann:** What I just talked about could happen in the next five years. Everything that happens after that, I can't really predict. It's going to be insane.

 

**devmio: Thank you very much for taking the time to talk to us!**

ChatGPT and Artificial General Intelligence: The Illusion of Understanding https://mlconference.ai/blog/chatgpt-artificial-general-intelligence-illusion-of-understanding/ Mon, 05 Jun 2023 13:21:35 +0000 https://mlconference.ai/?p=86309 The introduction of ChatGPT in late 2022 touched off a debate over the merits of artificial intelligence which continues to rage today.

Upon its release, ChatGPT immediately drew praise from tech experts and the media as “mind blowing” and the “next big disruptor,” while a recent Microsoft report praised GPT-4, the latest iteration of OpenAI’s tool, for its ability to solve novel and difficult tasks with “human-level performance” in advanced careers such as coding, medicine, and law. Google responded to the competition by launching its own AI-based chatbot and service, Bard.

On the flip side, ChatGPT has been roundly criticized for its inability to answer simple logic questions or work backwards from a desired solution to the steps needed to achieve it. Teachers and school administrators voiced fears that students would use the tool to cheat, while political conservatives complained that Chat generates answers with a liberal bias. Elon Musk, Apple co-founder Steve Wozniak, and others signed an open letter recommending a six-month pause in AI development, noting “Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable.”

The one factor missing from virtually all these comments – regardless of whether they regard ChatGPT as a huge step forward or a threat to humanity – is a recognition that no matter how impressive, ChatGPT merely gives the illusion of understanding. It is simply manipulating symbols and code samples which it has pulled from the Internet without any understanding of what they mean. And because it has no true understanding, it is neither good nor bad. It is simply a tool which can be manipulated by humans to achieve certain outcomes, depending on the intentions of the users.

It is that difference that distinguishes ChatGPT, and all other AI for that matter, from AGI – artificial general intelligence, defined as the ability of an intelligent agent to understand or learn any intellectual task that a human can. While ChatGPT undoubtedly represents a major advance in self-learning AI, it is important to recognize that it only seems to understand. Like all other AI to date, it is completely reliant on datasets and machine learning. ChatGPT simply appears more intelligent because it depends on bigger and more sophisticated datasets.

 


While some experts continue to argue that at some point in the future, AI will morph into AGI, that outcome seems highly unlikely. Because today’s AI is entirely dependent on massive data sets, there is no way to create a dataset big enough for the resulting system to cope with completely unanticipated situations. In short, AI has no common sense and we simply can’t store enough examples to handle every possible situation. Further, AI, unlike humans, is unable to merge information from multiple senses. So while it might be possible to stitch language and image processing applications together, researchers have not found a way to integrate them in the same seamless way that a child integrates vision, language, and hearing.

For today’s AI to advance to something approaching real human-like intelligence, it must have three essential components of consciousness: an internal mental model of surroundings with the entity at the center; a perception of time which allows for a prediction of future outcome(s) based on current actions; and an imagination so that multiple potential actions can be considered and their outcomes evaluated and chosen. Just like the average three-year-old child, it must be able to explore, experiment, and learn about real objects, interpreting everything it knows in the context of everything else it knows.

To get there, researchers must shift their reliance on ever-expanding datasets to a more biologically plausible system modelled on the human brain, with algorithms that enable it to build abstract “things” with limitless connections and context.

While we know a fair amount about the brain's structure, we still don't know what fraction of our DNA defines the brain, or even how much DNA defines the structure of its neocortex, the part of the brain we use to think. If we presume that generalized intelligence is a direct outgrowth of the structure defined by our DNA, and that this structure could be defined by as little as one percent of that DNA, then it is clear that the emergence of AGI depends not on more computing power or larger datasets, but on discovering the fundamental AGI algorithms.

With that in mind, it seems highly likely that a broader context that is actually capable of understanding and learning gradually could emerge if all of today’s AI systems could be built on a common underlying data structure that allowed their algorithms to begin interacting with each other. As these systems become more advanced, they would slowly begin to work together to create a more general intelligence that approaches the threshold for human-level intelligence, enabling AGI to emerge. To make that happen, though, our approach must change. Bigger and better data sets don’t always win the day.

AI Alignment https://mlconference.ai/blog/ai-alignment/ Thu, 30 Mar 2023 15:21:35 +0000 https://mlconference.ai/?p=86116 At least since the arrival of ChatGPT, many people have become fearful that we are losing control over technology and that we can no longer anticipate the consequences they may have. AI Alignment deals with this problem and the technical approaches to solve it.

Two positions can be identified in the AI discourse. First, “We’ll worry about that later, when the time comes” and second, “This is a problem for nerds who have no ethical values anyway”. Both positions are misguided, as the problem has existed for a long time and, moreover, there are certainly ways of setting boundaries for AI. Rather, there is a lack of consensus on what those boundaries should be.

AI Alignment [1] is concerned with aligning AI to desired goals. The first challenge here is to agree on these goals in the first place. The next difficulty is that it is not (yet?) possible to give these goals directly and explicitly to an AI system.

For example, Amazon developed a system several years ago that helps select suitable applicants for open positions ([2], [3]). For this, resumes of accepted and rejected applicants were used to train an AI system. Although they contained no explicit information about gender, male applicants were systematically preferred. We will discuss how this came about in more detail later. But first, this raises several questions: Is this desirable, or at least acceptable? And if not, how do you align the AI system so that it behaves as you want it to? In other words, how do you successfully engage in AI alignment?

 

Stay up to date

Learn more about MLCON

 

 

For some people, AI Alignment is an issue that will only become important in the future, when machines are so intelligent and powerful that they might decide the world would be better off without humans [4]. Nuclear war provoked by supervillains is mentioned as another way AI could prove fatal. Whether these fears could ever become realistic remains speculation.

The claims discussed as part of the EU’s emerging AI regulation are more realistic. Here, different rules may apply depending on the risk an AI system realistically poses. This is shown in Figure 1, which is based on a presentation for the EU [5]. Four risk levels, from “no risk” to “unacceptable risk”, are distinguished. A system posing no significant risk merely receives the recommendation of a “Code of Conduct”, while a social credit system, as applied in China [6], is simply not allowed. However, this scheme only comes into effect if no more specific law applies.

 

Fig. 1: Regulation based on outgoing risk, adapted from [5]

 

Alignment in Machine Learning Systems

A machine learning system is trained using sample data. It learns to mimic this sample data. In the best and most desirable case, the system can generalize beyond this sample data and recognizes an abstract pattern behind it. If this succeeds, the system can also react meaningfully to data that it has never seen before. Only then can we speak of learning or even a kind of understanding that goes beyond memorization.

This also happened in the example of Amazon’s applicant selection, as shown in a simplified form in Figure 2.

 

Fig. 2: How to learn from examples, also known as supervised learning

 

Here is another example. We use images of dogs and cats as sample data for a system, training it to distinguish between them. In the best case, after training, the system also recognizes cats that are not contained in the training data set. It has learned an abstract pattern of cats, which is still based on the given training data, however.
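This supervised learning loop can be sketched with scikit-learn; the features (weight and ear length) and all the numbers below are invented purely for illustration:

```python
from sklearn.linear_model import LogisticRegression

# Invented toy features: [weight_kg, ear_length_cm].
X_train = [[4.0, 7.5], [3.5, 8.0], [4.5, 7.0],        # cats
           [20.0, 12.0], [30.0, 14.0], [25.0, 11.0]]  # dogs
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

model = LogisticRegression()
model.fit(X_train, y_train)

# An animal not in the training data: if an abstract pattern was
# learned, the model still classifies it sensibly.
print(model.predict([[3.8, 7.8]])[0])  # → cat
```

The model never saw the point `[3.8, 7.8]`, yet classifies it correctly because it generalized from the sample data rather than memorizing it.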

Therefore, this system can only reproduce what already exists. It is descriptive or representative, but hardly normative. In the Amazon example, it replicates past decisions. These decisions seemed to be that men simply had a better chance of being accepted. So, at least the abstract model would be accurate. Alternatively, perhaps there were just more examples of male applicants, or some other unfortunate circumstance caused the abstract model not to be a good generalization of the example data.

At its best, however, such an approach is analytical in nature. It reveals the patterns in our sample data and their background, in this case that men fared better in job applications. If that matches our desired alignment, there is no further problem. But what if it doesn’t? That is what we assume here, and Amazon evidently came to the same conclusion, since it scrapped the system.

 

MYRIAD OF TOOLS & FRAMEWORKS

Tools, APIs & Frameworks

 

Pre-assumptions, aka: Priors

It has long been well understood how to give a machine learning system additional information about our desired alignment, beyond the sample data alone. This is used to provide world or domain knowledge to the system in order to guide, and potentially simplify or accelerate, training. You support the learning process by specifying in which domain to look for abstract patterns in the data. Therefore, a good abstract pattern can be learned even if the sample data describes it inadequately. In machine learning, data that inadequately describes the desired abstract model is the rule rather than the exception. Yann LeCun, a celebrity on the scene, vividly elaborates on this in a Twitter thread [7].

This kind of prior assumption is also called a prior. An illustrative example of a prior is linearity. As an explanation, let’s take another application example. For car insurance, estimating accident risk is crucial. For this estimation, characteristics of the drivers and vehicles to be insured are collected. These characteristics are correlated with existing data on accident frequency in a machine-learning model. The method used for this is supervised learning, the same as described above.

For this purpose, let us assume that the accident frequency increases linearly with the distance driven: the more one drives, the more accidents occur. This domain knowledge can be incorporated into the training process. This way, you can hope for a simpler model and potentially even less complex training. In the simplest case, linear regression [8] can be used here, which produces a reasonable model even with little training data or effort. Essentially, training consists of choosing the parameters of a straight line, slope and intercept, to best fit the training data. Because of its simplicity, the advantage of this model is its good explainability and low resource requirement: a linear, one-to-one relationship is intellectually easy to grasp, and a straight-line equation can be evaluated on a modern computer with extremely little effort.

However, it is also possible to describe the pattern contained in the training data and correct it normatively. For this, let us assume that the relationship between age and driving ability is clearly superlinear: driving ability does not decline in proportion to age, but at a much faster rate. Or, to put it another way, the risk of accidents increases disproportionately with age. That’s how it is in the world, and that’s what the data reflects. Let’s assume that we don’t want to discard this important influence completely, but we equally want to avoid excessive age discrimination. Therefore, we decide to allow at most a linear dependence. In this way, we can support the model and align it with our needs. This relationship is illustrated in Figure 3. The simplest way to implement this is the aforementioned linear regression.

 

Fig. 3: Normative alignment of training outcomes
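A minimal sketch of this normative cap, with invented numbers: the “true” age-risk relationship below is quadratic, but fitting a linear model forces the learned age effect to be at most linear.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

ages = np.arange(20, 90).reshape(-1, 1)
# Invented ground truth: accident risk grows quadratically with age.
true_risk = 0.001 * (ages.ravel() - 20) ** 2 + 1.0

# The linear prior: a linear model cannot represent the superlinear
# increase, so the modeled age effect is capped at a straight line.
model = LinearRegression().fit(ages, true_risk)
steps = np.diff(model.predict(ages))

# Between equally spaced ages, the predicted risk now changes by a
# constant amount, no matter how old the driver is.
print(steps.min(), steps.max())  # essentially identical
```

The age effect survives in the model, but its disproportionate growth does not, which is exactly the alignment we asked for.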

 

Now, you could also argue that models usually have not just one input but many, which act on the prediction in combination. Moreover, the linear relationship between distance driven and accident frequency in our example is not necessarily plausible: don’t drivers with little driving experience have a higher risk? In that case, you could imagine a piecewise-linear relationship: at first, the risk decreases with the distance driven, but after a certain point it increases again and remains linear. There are tools for these kinds of complex correlations as well. In the deep learning field, TensorFlow Lattice [9] offers the possibility of specifying a separate alignment for each individual influencing factor, also in a nonlinear or only piecewise-linear way.

In addition to these relatively simple methods, there are other ways to exert influence. These include the choice of learning algorithm, the selection of sample data and, especially in deep learning, the neural network’s architecture and learning approach. These interventions in the training process are technically challenging and must be applied sparingly and under supervision; otherwise, depending on the training data, it may become impossible to train a good model with the desired priors.

 

Is all this not enough? Causal Inference

The field of classical machine learning is often accused of falling short. People say that these techniques are suitable for fitting straight lines and curves to sample data, but not for producing intelligent systems that behave as we want them to. In a Twitter thread by Pedro Domingos [10], proponents of a more radical course, such as Gary Marcus and Judea Pearl, also weigh in. They agree that without modeling causality (Causal Inference), there will be no really intelligent systems and no AI Alignment.

In general, this movement can be accused of criticizing existing approaches but not having any executable systems to show for themselves. Nevertheless, Causal Inference has been a hyped topic for a while now and you should at least be aware of this critical position.

 

THE PECULIARITIES OF ML SYSTEMS

Machine Learning Advanced Developments

 

ChatGPT, or why 2023 is a special year for AI and AI Alignment.

Regardless of whether someone welcomes current developments in AI or is more fearful or dismissive of them, one thing seems certain: 2023 will be a special year in the history of AI. For the first time, an AI-based system, ChatGPT [11], has managed to create a veritable boom of enthusiasm among a broad mass of the population. ChatGPT is a kind of chatbot with which you can converse about any topic, and not just in English. For a general introduction to ChatGPT, see our other articles.

ChatGPT is simply the most prominent example of a variety of systems already in use in many places. They all share the same challenge: how do we ensure that the system does not issue inappropriate responses? One obvious approach is to check each response from the system for appropriateness. To do this, we can train a system using sample data. This data consists of pairs of texts and a categorization of whether they match our alignment or not. Operating this kind of system is shown in Figure 4. OpenAI, the producer of ChatGPT, offers this functionality already trained and directly usable as an API [12].

This approach can be applied to any AI setting. The system’s output is not directly returned, but first checked for your desired alignment. When in doubt, a new output can be generated by the same system, another system can be consulted, or the output can be denied completely. ChatGPT is a system that works with probabilities and is able to give any number of different answers to the same input. Most AI systems cannot do this and must choose one of the other options.
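The gating pattern described above can be sketched as follows; `generate` and `is_appropriate` are stand-in stubs invented for illustration (in practice, the check might call a trained moderation classifier such as [12]):

```python
import itertools

def moderated_reply(prompt, generate, is_appropriate, max_attempts=3,
                    refusal="Sorry, I can't answer that."):
    """Return the first generated answer that passes the moderation
    check; fall back to a refusal if no attempt passes."""
    for _ in range(max_attempts):
        answer = generate(prompt)
        if is_appropriate(answer):
            return answer
    return refusal

# Stand-in stubs: a "model" that sometimes emits an undesired word,
# and a checker that flags it.
outputs = itertools.cycle(["BAD answer", "harmless answer"])
generate = lambda prompt: next(outputs)
is_appropriate = lambda text: "BAD" not in text

print(moderated_reply("hello", generate, is_appropriate))  # harmless answer
```

For a deterministic system that cannot regenerate a different answer, `max_attempts=1` reduces this to the remaining options: pass the single answer through or deny it.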

As mentioned at the beginning, we as a society still need to clarify which systems we consider risky. Where do we want to demand transparency or even regulation? Technically, this is already possible for a system like ChatGPT by inserting a kind of watermark [13] into generated text. This works by selecting words from a restricted list, so that the probability of a human producing this specific combination is extremely low. This can be used to establish the machine as the author. Additionally, the risk of undetected plagiarism is greatly reduced, because the machine, imperceptibly to us, does not write exactly like a human. In fact, OpenAI is considering using these watermarks in ChatGPT [14]. There are also methods that work without watermarks to find out whether a text comes from a particular language model [15]. These only require access to the model under suspicion; the obvious weakness is that you have to know or guess which model that is.
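A toy caricature of the green-list watermark from [13]: the vocabulary and hashing scheme below are invented for illustration, and the real scheme operates on language-model logits rather than a word list, but the detection principle is the same.

```python
import hashlib

VOCAB = ["alpha", "bravo", "charlie", "delta", "echo",
         "foxtrot", "golf", "hotel", "india", "juliet"]

def is_green(prev_word, word):
    """Toy green list: whether `word` counts as 'green' depends on the
    previous word, via a deterministic hash."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0

def generate_watermarked(start, length):
    """A 'model' that always picks a green word given the previous one."""
    words = [start]
    for _ in range(length):
        greens = [w for w in VOCAB if is_green(words[-1], w)]
        words.append(greens[0] if greens else VOCAB[0])
    return words

def green_fraction(words):
    """Detector: share of transitions landing on a green word -- about
    0.5 for ordinary text, close to 1.0 for watermarked text."""
    hits = sum(is_green(p, w) for p, w in zip(words, words[1:]))
    return hits / (len(words) - 1)

marked = generate_watermarked("alpha", 30)
print(green_fraction(marked))  # close to 1.0
```

Because a human writer picks words without knowing the green lists, a long text with a green fraction near 1.0 is overwhelmingly unlikely to be human-written, which is what establishes the machine as the author.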

 

Fig. 4: A moderation system filters out undesirable categories

 

Conclusion

As AI systems become more intelligent, the areas where they can be used become more important and therefore, riskier. On the one hand, this is an issue that affects us directly today. On the other hand, an AI that wipes out humanity is just material for a science fiction movie.

However, steering these systems toward specific goals can only be achieved indirectly. This is done by selecting the sample data and priors that are introduced into these systems. Therefore, it may also be useful to subject the systems’ outputs to further scrutiny. These issues are already being discussed at both the policy and the technical level. Neither of the two camps from the beginning is right: this is neither a problem to worry about later, nor one that concerns only nerds.

 

Links & References

[1] https://en.wikipedia.org/wiki/AI_alignment 

[2] https://www.bbc.com/news/technology-45809919

[3] https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G

[4] https://www.derstandard.de/story/2000142763807/chatgpt-so-koennte-kuenstliche-intelligenz-die-menschheit-ausloeschen

[5] https://www.ceps.eu/wp-content/uploads/2021/04/AI-Presentation-CEPS-Webinar-L.-Sioli-23.4.21.pdf

[6] https://en.wikipedia.org/wiki/Social_Credit_System

[7] https://twitter.com/ylecun/status/1591463668612730880?t=eyUG-2osacHHE3fDMDgO3g

[8] https://en.wikipedia.org/wiki/Linear_regression

[9] https://www.tensorflow.org/lattice/overview

[10] https://twitter.com/pmddomingos/status/1576665689326116864

[11] https://openai.com/blog/chatgpt/

[12] https://openai.com/blog/new-and-improved-content-moderation-tooling/

[13] https://arxiv.org/abs/2301.10226 and https://twitter.com/tomgoldsteincs/status/1618287665006403585

[14] https://www.businessinsider.com/openai-chatgpt-ceo-sam-altman-responds-school-plagiarism-concerns-bans-2023-1

[15] https://arxiv.org/abs/2301.11305

The post AI Alignment appeared first on ML Conference.

]]>
Google Bard: The Answer to ChatGPT? https://mlconference.ai/blog/google-bard-answer-chatgpt/ Tue, 14 Feb 2023 08:37:50 +0000 https://mlconference.ai/?p=85924 With the release of the AI ChatGPT at the end of November 2022, OpenAI made big waves that don’t seem to be dying down. For a long time, not just in the tech bubble, people waited for the giant Google to answer. Now here it is: Google introduced its conversational AI, Bard. We take a look at the announcement, the technology, and speculate a bit about Google’s apparent hesitation.

The post Google Bard: The Answer to ChatGPT? appeared first on ML Conference.

]]>
What is Google Bard?

Google Bard is an experimental, conversational artificial intelligence based on the Large Language Model LaMDA. This is how Google describes the technology on the Google blog. It is intended to make it possible to gain access to complex information and facts in dialog form. An example from the developer’s blog Preparing discoveries from the James Webb Space Telescope for a 9-year-old child.

What distinguishes Google’s Bard from OpenAI’s ChatGPT is that Bard can fall back on current information from the Internet right from the start. Since ChatGPT is based on the GPT-3.5 model, it only knows about texts written up to mid-2021 and can only fall back on these. Our expert Pieter Buteneers told us in an interview about ChatGPT that it is “basically a summary of the Internet until the end of 2021”. Adding up-to-date information from the Internet is the next big step, he said, “If ChatGPT can search the Internet, Google and Stack Overflow will be history. It makes sense to assume that will happen.” That’s the path Google is now apparently taking with Bard.

Microsoft, which is said to have recently invested 10 billion dollars in OpenAI, will likely announce soon that ChatGPT will be integrated into the search engine Bing. Currently, Google is and remains largely unrivaled in the search engine business. In December 2022, Google’s search engine had a market share of just under 85%, followed in second place by Microsoft’s Bing, with just under 9%. With the new developments, however, some movement could occur, depending on who succeeds in implementing the respective AI better.

Google’s conversational AI Bard is currently not yet available to the public, but only to a select group of “trustful testers”. In the coming weeks, the AI will be made available to a wider audience. To what extent the AI will be usable for free remains to be seen. ChatGPT has only recently introduced a payment model that promises better access and shorter loading times for $20 per month.


The models: GPT and LaMDA

An AI is only as good as the model it is based on. OpenAI’s GPT model (“Generative Pretrained Transformer”), on which ChatGPT is based, has been in development for some time. GPT-3 was introduced in mid-2020 and is a Large Language Model (LLM) that was initially designed to complete texts. In an interview, Christoph Henkelmann told us more about GPT:

“GPT stands for "Generalised Pretraining for Transformers." GPT is a family of architectures and various resulting models. Many neural networks follow this architectural pattern. GPT models are trained to simply complete text. They perform the same functions as old Nokia phones with T9. GPT attempts to predict what the next text block will be as you type. To train GPT, you collect vast amounts of text, for example, with web scraping. Then you let a neural network make predictions, sometimes for months.”

This LLM was fine-tuned to dialog progressions, eventually modeling it into ChatGPT.
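The T9 analogy can be illustrated with a toy bigram model (the corpus below is invented): count which word most often follows each word, then “complete” text by always appending the most frequent successor.

```python
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat and the cat sat on the rug "
          "and the cat ate the fish").split()

# Count how often each word follows each other word.
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def complete(word, steps=4):
    """Greedy completion: always append the most frequent successor."""
    out = [word]
    for _ in range(steps):
        candidates = successors[out[-1]].most_common(1)
        if not candidates:
            break
        out.append(candidates[0][0])
    return " ".join(out)

print(complete("the"))  # → the cat sat on the
```

GPT does conceptually the same thing, only with billions of parameters and a context far longer than one word; the fluency comes from scale, not from a different objective.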

Google’s LaMDA (“Language Model for Dialogue Applications”) was released in May 2021. It is an LLM that was trained on dialog progressions from the beginning. Earlier, in 2018, Google had presented another model called BERT (“Bidirectional Encoder Representations from Transformers”). Similar to the GPT model, BERT is based on transformers, a technology that uses a neural network architecture to fill in cloze texts: “(…) with BERT, the big breakthrough came because developers trained the model to fill in missing text parts. If you want to fill in a cloze, first you have to understand what the text is about.” That’s what Pieter Buteneers told us in an interview about Dall-E and image-generating AI.

Buteneers continues, “The big breakthrough that BERT brought was to take the old models that work well on a CPU and put them into a new architecture that computes fast on a GPU. That’s how we were able to make these huge leaps forward. Since 2018, these models have gotten bigger and smarter, so you can do more and more exciting things with them.”
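The cloze idea behind BERT can be caricatured the same way with invented data: pick the word that best fits both its left and right neighbors. Real BERT uses a deep bidirectional transformer rather than bigram counts, of course.

```python
from collections import Counter

corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat slept on the mat").split()

# Bigram counts: how often word b directly follows word a.
bigrams = Counter(zip(corpus, corpus[1:]))

def fill_blank(left, right):
    """Toy masked prediction: score each candidate word by how often it
    follows `left` times how often it precedes `right`."""
    vocab = set(corpus)
    scores = {w: bigrams[(left, w)] * bigrams[(w, right)] for w in vocab}
    return max(scores, key=scores.get)

print(fill_blank("cat", "on"))  # "the cat ___ on the mat" → sat
```

Unlike the left-to-right completion of a GPT-style model, the blank is filled using context from both sides, which is the bidirectionality in BERT’s name.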


What is Google doing in the AI field?

With all the excitement about ChatGPT, there hasn’t been much talk about the progress Google has already made in the AI field, especially with regard to LLMs. As previously mentioned, Google is at the forefront of natural language processing with BERT and LaMDA.

Google is also active in the area of text-to-image AIs with Imagen and even claims to produce better results than OpenAI’s Dall-E when measured against human evaluation. Work is also underway to extend Imagen to video generation. Music is also no longer out of reach for Google’s AI. With MusicLM, Google Research introduces artificial intelligence that can generate music based on text input. There is no accounting for taste, but the technical implementation is highly exciting.

Considering how incredibly versatile Google is in the artificial intelligence field, it’s fair to ask how the tech giant seems to have been overtaken by OpenAI. Not that OpenAI is a small software company or an underdog. Its market value is in the tens of billions.

However, due to Google’s dominance for years, you can still wonder how it came to this. Our expert Pieter Buteneers also told us in an interview: “I know from my contacts at Google Deep Mind that there was a bit of a crisis in the team there when GPT-3 was released. They were desperately trying to find a solution as quickly as possible. And that was just GPT-3. But with ChatGPT, I think the whole company is shaking, not just Google Deep Mind, but all of Google, or at least those who understand what ChatGPT can do.” CNBC also reported panic in Google’s executive suite.

The CNBC article reported Google’s concern of losing their reputation, which is understandable. As linguistically convincing as OpenAI’s ChatGPT may be, the AI can reproduce misinformation. This is due to the model itself, says Christoph Henkelmann: “The answer is not the result of a conclusion, but the result of training. ChatGPT has seen texts, (…) and learned from them. The closer my wording was to the ‘read’, the better the answer was.” Henkelmann went on to tell us, “It doesn’t even understand that it’s answering a question. It simply wants to complete or continue the text. And that’s the problem: GPT will continue the text even if it doesn’t have an answer. The model is not able to reflect that. There will always be a text that is statistically a likely answer for GPT, and so it can’t state, ‘I don’t know something.’”

For Google, which primarily earned its reputation as an accurate and precise search engine, this is a major problem. Additionally, certain limits have to be imposed on an AI like ChatGPT, which, for example, prohibit it from playing out or generating certain information. Therefore, ChatGPT does not answer some queries at all if they violate the guidelines. If you ask the AI how to build a bomb, you will (fortunately) not get the answer you were hoping for.

And that doesn’t even touch on the issue of sexist, racist, or other discriminatory and misanthropic outputs of an AI, which in part only reproduces what it already has. Microsoft had a similar experience in 2016 with its chatbot Tay.

We saw similar concerns during the release of OpenAI’s image AI Dall-E, when accusations were made that the release was irresponsible because restrictions were missing. Image generation by artificial intelligence in particular presents the problem of ethical and moral boundaries, in the truest sense of the word, in a very graphic way. The concern is justified: an AI trained and operated by Google that produced reprehensible results of whatever kind would damage its reputation.

However, it is much more likely that the new chatbots are a problem for Google’s business model, which is largely financed by advertising revenue. It’s not impossible that Bard or even ChatGPT as a Bing implementation could phase out advertising. Classic search engine advertising is based on clicks, and these could fail to materialize if the AI response is enough for users.

If the answer to a question is resolved on Google itself, there is much less motivation for end users to visit or look at the actual source. This is not only a problem for Google, which promises to drive traffic to the advertisers’ site through ad placement, but also for the operators of websites themselves, whose content may now only be rendered by an LLM. The fear is that the source will not be visited.

Of course, Google already displays answers to certain questions directly on their site. But it is clear that the search engine advertising model, at least in its current form, is no longer sacrosanct and will undergo a transformation if the search engine operators want to implement AI.


What’s next?

The release of ChatGPT, which hit many experts unexpectedly, has marked a new battleground for the tech giants. This is not a reason to panic yet, nor is it a reason to worry about an AI revolution. However, the developments are likely to be especially exciting in terms of how we search for information on the Internet in the future and how we are presented with the results. It remains to be seen what this means for the major search engine operators’ business models, content optimization, and SEO.

At least for us as viewers, turning the search engine business upside down on the way to artificial general intelligence (AGI) is pretty interesting. We will see what long-term challenges and improvements for users will result from this little AI war.

As long as Bard is not yet publicly available, it’s hard to evaluate what the AI is really capable of. We don’t know yet if it is as user-friendly as ChatGPT. If the rumors are true, we should learn more about possible implementation of ChatGPT in Bing very soon. Until then, we will have to be patient, but we can be excited about the very near future.

Be part of this exchange with leading experts at the heart of current AI developments, at MLCon Munich 2023.

The post Google Bard: The Answer to ChatGPT? appeared first on ML Conference.

]]>
ChatGPT: The Big Disruptor? https://mlconference.ai/blog/chatgpt-big-disruptor/ Wed, 01 Feb 2023 10:36:45 +0000 https://mlconference.ai/?p=85868 Disruptive technologies or innovations like ChatGPT set in motion a process that can change the way we do business or even, how we live. The real disruptor is not ChatGPT, but the rapid development of new technologies in the field of AI and ML that are emerging on the back of the hype around ChatGPT and OpenAI.

The post ChatGPT: The Big Disruptor? appeared first on ML Conference.

]]>
History shows us that new technological milestones with the potential to bring about real change are usually accompanied by a public debate that is sometimes alarmist. This was already the case with the spread of newspapers and magazines in the 18th century and with the introduction of the railroad for passenger transport in the 19th century; now it is the case with ChatGPT and advancements in ML and AI.

Time and again, the term disruptive technology is used to refer to such a new milestone in technological progress. Often, however, the term is overused, mainly by business-minded and marketing-savvy entrepreneurs, and the products or services mentioned in the same breath are rarely real disruptions. ChatGPT may be different.

Technologies that provoke a collective change of mind, a great self-questioning, and imitation are not the work of self-promoters. They are real, they have a very direct influence on our actions and our thinking, and they can upend an entire industry. They reach the very core of our collective experience and existence. By definition, though, these disruptors are not single technologies, but rather processes.


We are currently experiencing the beginning of such a process: the beginning of a radical change caused by the advancement of key technologies in the fields of artificial intelligence and machine learning. The leading example of this new wave, of course, is ChatGPT by OpenAI, an extremely user-oriented exemplification of artificial intelligence that, for many, marks a turning point in the advancement of AI, or even a milestone on our way to artificial general intelligence (AGI). ChatGPT has surprised even veteran experts and has shaken their confidence in predicting future developments.

But today, such technologies are not only visible to experienced AI and ML specialists, but to complete laymen, and especially to large tech companies, which sense their opportunity to be at the core of the next big thing and at the heart of the next megatrend. While this is of course driven by monetary incentives and business decisions, it is at its core about staying ahead of the curve and becoming part of and shaping a technological revolution that will change the lives of generations to come.

And these developments are not happening in the tech bubble exclusively. Mainstream news outlets have been reporting on exciting but also concerning developments in artificial intelligence for some time now. And even though you can accuse some of them of jumping on a hype train, the wheels are not only in motion, but moving billions of dollars, resources, and people in a very short time:

Microsoft is investing $10 billion in OpenAI, Google’s parent company Alphabet is bringing back its two founders, and Meta is shifting more and more budget toward their AI department. Artificial intelligence has long ceased to be a technical gimmick of hermitical developers or even the pipedream of a generation of science fiction-influenced entrepreneurs. Artificial intelligence is currently electrifying an entire industry and, beyond that, modern societies (not to mention the impact on education and other public sectors). 

This dynamic inspires us to further develop our ML Conference as a space where like-minded people can exchange ideas. Not only to advance their careers in the field of AI and ML but to push themselves, their projects, or their company to the edge of innovation, to develop great and practical innovations that can change the course of a business or redefine strategic goals. 

What we see exemplified in ChatGPT is a collection of different technologies, strategies, and models that can be applied in almost every department, no matter how mundane. Of course, ChatGPT is more than the sum of simple and available technologies. But at the same time, it is far from being a magical black box accessible only to a select few. As with many key technologies, success is not only reserved for the brilliant few minds of our time, but also for the people who work on implementing them in everyday areas, who embrace the challenges ahead of us and tap into the seemingly inexhaustible possibilities to change our lives for the better. 

Successful application of ML and AI can range from model fine-tuning by ML developers to prompt engineering, the art of designing on-target text inputs that drive a machine learning model to optimal performance. With prompt engineering, the goal is to develop the perfect inputs to deliver the results your application or business needs. The people implementing those techniques, fine-tuning models, and developing new and creative ways to work with new technologies are the real disruptors.


With MLCon, we not only want to do our part in this democratization of technologies. We want to introduce you to the most essential technologies, methods, and tools that can move you, your team, and your company forward. Through workshops, sessions, and talks from our experts, we will help pave the way for responsible AI use and spread all currently available information on the topic.

The question of the future will not be how much of a part an AI plays in a particular product or service. The question of the future will not be how certain aspects of our daily work can be abbreviated or simplified through the clever use of various technologies. The question of the future will be which tasks will be taken over by AI completely, how that shapes our society, and what that means for us as humans.

Be part of this exchange with leading experts at the heart of current AI developments, at MLCon Munich 2023.

The post ChatGPT: The Big Disruptor? appeared first on ML Conference.

]]>