LLM Archives - ML Conference

Can GenAI Build Satellites?

adharaneedharan@sandsmedia.com — Thu, 17 Jul 2025 08:09:40 +0000

When we think of Satellites we imagine cutting-edge technology, precision engineering, and brilliant minds at work.

What we rarely see is the chaos behind the scenes: thousands of pages of documents, conflicting requirements, scattered diagrams, and complexity that grows faster than humans can manage. Modern systems engineering isn’t just about designing hardware — it’s about managing an overwhelming flood of information.

How does GenAI Helps System Engineers Stay Ahead of Complexity?

1. Writing Better Requirements

Projects start with requirements — and often, they’re a mess. Vague sentences like “The car should drive fast” cause confusion down the line.

GenAI reviews these early, flags weak spots, and helps rewrite them clearly and precisely. This saves time, prevents misunderstandings, and reduces risk later.

2. Finding Duplicates and Patterns

Large projects mean thousands of requirement sentences. Some repeat. Some contradict.

AI analyzes this data, finds duplication, flags inconsistencies, and helps keep things clean and organized.

3. Designing System Diagrams

Once requirements are sorted, diagrams follow.
With Generative AI, engineers give simple prompts:

“Connect sensor, battery, data logger.”
AI draws it.
“Add brightness control.”
It updates.
No wasted hours redrawing from scratch.

4. RAG: Smarter Document Search for Engineers

PDF overload is real. Retrieval-Augmented Generation (RAG) lets AI search internal documents safely and quickly — no data sent to the cloud.

“What’s the max voltage?”
AI finds it. Work moves forward.

Watch the Full Session on YouTube:

Join “Building an Agentic RAG Pipeline” live at MLCon NY 2025 to see how it’s already changing the game.

The post Can GenAI Build Satellites? appeared first on ML Conference.

RAG & MCP for ML Devs: Practical AI Implementation

adharaneedharan@sandsmedia.com — Mon, 07 Jul 2025 07:21:42 +0000

Join us at MLCon New York to dive into these topics even deeper and hands on.

Business & Culture: https://mlconference.ai/machine-learning-business-strategy/ai-native-software-organization-technical-operational-practices/

https://mlconference.ai/generative-ai-content/gen-ai-operational-metrics/

https://mlconference.ai/machine-learning-business-strategy/operationalizing-ai-leadership-sprints-workshop/

GenAI & LLM: https://mlconference.ai/generative-ai-content/building-agentic-rag-pipeline/

https://mlconference.ai/generative-ai-content/agentic-ai-workshop-deep-research-llms/

devmio: Hello everyone, we’re live from JAX in Mainz, and I am here with Robert Glaser from INNOQ. Hello Robert.

Robert Glaser: Hi Hartmut, I’m glad I could make it.

devmio: Great. You’re here at JAX talking about AI. A controversial topic for some. Some love it, some are a little put off by it. Of course, there are benefits from it. Where do you see the benefits of AI?

Robert Glaser: I get that question a lot, and I usually disappoint people by saying, “I can’t give you a list of use cases in an hour.” Because use cases are a dime a dozen, since we’re dealing with a new kind of technology. Think of electricity, steam engines, or the internet. People didn’t immediately know, “OK, here are 100 use cases we can do with this and not with that.”

Everyone in every department, in every company, has to find out how well or how poorly it fits with the current state of technology. There are formats and possibilities for finding use cases. Then you also evaluate the potential outcome, look at what the big levers are, what is likely to bring us the most benefit when we implement it? And that’s how I would approach it. Whenever someone promises to name AI use cases for your business in a 30-minute conversation, I would always be cautious.

devmio: Now you have a use case, a RAG system, retrieval of augmented generation. Can you explain how a RAG works? Maybe first, what exactly is it? And then, of course, the question. Yes, for you out there, is this a use case that is exciting for you? What are the benefits of a RAG system?

Robert Glaser: Yes, I can explain briefly. A RAG is not a new concept at all. It comes from an academic paper from 2005. At that time, people weren’t talking about foundation models or large language models like we are today.

RAGs come from the domain of Natural Language Programming, which is how it’s used today. How do I get context for my AI model? Because the AI model only becomes interesting when I connect it to my company data. For example, in Confluence or some tables or databases or something like that. Then I can use this technology in conjunction with my data. And RAG is an architecture, if you want to call it that, which allows me to do just that.

That’s changing right now, too. But the basic principle that I’m showing in my presentation is that, in principle, a large part of the RAG architecture is information retrieval. All of us who work in tech have known this for many decades. A search, for example, is an information retrieval system.

This is often at the center of RAG and is the most interesting point, because the classic RAG approach is nothing more than pasting search results into my prompt. That’s all there is to it, it’s also something I mention in my talk. Well, we can end the presentation here. There’s nothing more to it. You have your prompt, which says, for example: Answer this question, solve this problem using the following data.

The data that comes in and that it is also relevant. That’s the crux of the matter. That’s what RAG deals with. For the most part, it’s a search problem: How do I cut my body of data into pieces that fit together? I can’t always just shovel everything in there. Models like this have a context window; they can’t digest endless amounts of data.

It’s like with an intern: if I pile the desk full, they may be a highly qualified intern, but they’ll get lost if the relevant documents aren’t there. So, I have to build a search that gets the relevant document excerpts for the task that the LLM is given. And then it can produce an answer or a result and not fantasize in a vacuum or base it on world knowledge, but rather on my company’s knowledge, which is called grounding. The answers are grounded, citation-based, like how a journalist works with facts. They must be there.

devmio: -and with references!

Robert Glaser: Exactly, then I can also say what the answer should look like. Please don’t make any statements without a source. That helps too. I’ll show this best practice in my talk.

devmio: So, you have this company data linked to the capabilities of modern LLMs. How do I ensure that the company data remains secure and is not exploited? It becomes publicly accessible via LLM.

Robert Glaser: Exactly. In principle, you will probably build some kind of system or feature or an internal product. In the rarest of cases, you will probably just connect your corporate customers to ChatGPT—you could do that.

There’s a lot going on right now, for example, Anthropic is currently building a Google Drive connection or a Google Mail connection. This allows you to search your Google Mails, which is incredibly powerful. But maybe that’s not always what you want as a company. That’s why you have to build something, you have to develop something. If it’s a chatbot, then it’s a chatbot. But every AI feature can potentially be augmented with RAG.

You have to develop software, and the software probably uses an AI model via an API. Either this AI model is in the cloud or in your own data center with a semi-open model, and then you must consider: how do I get the data from the search system or from the multiple search systems into the prompt. This involves a relatively large amount of engineering—classic engineering work. We can expect that developers of such systems will simply build them more often in the future.

devmio: Good, and another point that is becoming increasingly important is AI agents. These are agents that can do more than just spit out a result after a prompt. In this context, the MCP protocol, or model context, is often mentioned. How does the model context protocol work exactly? Maybe you could explain MCPs and how they work?

Robert Glaser: We’ve been having a lot of conversations about this at our booth downstairs. It’s not without reason that people are so interested in this topic; there’s enormous hype surrounding it. It’s an open standard developed by the company Anthropic, which also trains the model, and it was released last November. And at the time of release, nobody was really interested in it. Right now, about a month ago, AI time is also somehow compressed in my head– it’s going so fast.

Every day, there are hundreds of new MCP servers for a wide variety of services and data sources. In principle, MCP is nothing more than a very simple specification for a protocol that I use to connect a service or data to my AI model. Some call it “USB-C for AI models”. I personally think that’s a pretty good metaphor, even if it’s not 100% technically correct, but the plug always looks the same.

But it doesn’t matter what I plug into a device, be it a hard drive, USB stick, or alarm siren. The protocol is always the same. What it does is stay in the metaphorical world. It gives AI models hands and feet to do things in the real world. Tools are a concept of this protocol, i.e., tool usage. But there are also several other things.

These things can also simply provide prompts, which means I could write in MCP Server, which acts as a prompt library. Then the LLM can choose for itself which prompts it wants to use.

Something very interesting is tool use, also nothing new, but Foundation models have been trained on tool usage for a long time. But until now, I always had to tell them: “Look, here’s an open API spec for an API that you can use in this case, and you use it like this, and then there’s the next tool with a completely different description.”

This tool might be a search function, which would have a completely different API that you use like this. And then you can operate on the arrow system, and you do that with STANDARD in STANDARD out. Imagine you need an AI agent or an AI feature and have to describe 80 individual tools!

Every time you start from scratch and with each new tool, you must write a new description. The nice thing about MCP is that it standardizes this and acts as a facade, so to speak. The foundation model or the team that builds an AI feature doesn’t need to know how these services and data work or how to use them, because the MCP server takes care of that and exposes the functionality to the AI model using the same protocol.

It’s also nice if, for example, I’m a large company and I have one team building the search function in the e-commerce shop and another team building the shopping cart, then we are very familiar with their domains. This makes it very easy for them to write an MCP server for the respective system. Then I could say, for example: Dear shopping cart team, please build an MCP server for the shopping cart because we want to roll out an AI feature in the shop and need access to these tools.

So, it also fits in nicely with a distributed team architecture like a kind of platform through which I can access a wide variety of tools and LLMs. I could also build a platform for this, but basically, it’s just small services, called servers, that I simply connect to a model. And I don’t always have to describe a different interface, but always the same one.

I don’t have to wait for Open AI to continue training its models so that the MCP can use them, because we all know that I can simply enter the MCP specifications now and then the model can learn from the context on how to use them. I’m not dependent on having to use a model right now. I can teach them everything.

devmio: Yes, what I’m noticing is that MCPs have really taken off. Among other things, because it’s not just one topic, but also Open AI and Google, I think. And others are now on board and supporting it, so there’s a kind of standardization.

Robert Glaser: It’s like how the industry agreed on USB-C or, back then, A. That was good for everyone because then you no longer had 1,000 cables. It’s like how I’m developing an AI feature now and the possibilities are basically endless. People like to think only in terms of data sources or services, but a service can be anything, it can be an API from a system that returns bank transactions or account transactions to me. Here’s a nice example that I always use: There’s an MCP server for Blender, a 3D rendering software, which I can use and then I can simply ask: render a living room with a nice lamp and a couch with two people sitting on it.

Then you can see how this model uses tools via the PC server and then creates this 3D scene in Blender. That’s the range of possibilities, truly endless.

devmio: Now we’re at a JAX, a Java conference. That’s really an AI topic, so how does it relate to Java? Maybe we’ll come back to that later. Java is a language that isn’t necessarily known for being widely used in machine learning; Python is faster. Is Java well positioned in this area?

Robert Glaser: To use these services, or maybe that’s not even necessary, but these are all just API calls. Yes, in fact, with the foundation models, he has also introduced a huge commodity factor. In the past, I had to train my own machine learning models. I had to have a team that labeled data and so on, and then it was still a case of narrow in, narrow out, now broad in, broad out.

I have models that can utilize everything from textual data to molecular structures or audio and video, because everything can be mapped into a sequential stream. And the models are generalists which can always be accessed via API.

Even the APIs are becoming standardized because all manufacturers are adapting the Open AI API spec. That’s a nice effect. That’s why there’s a great ecosystem in both the Java world and the Ruby or Python world.

I’ll just mention Spring AI. I could have started a year ago and built AI features into my applications. Spring AI, for example, has a range of functions that allow me to configure models flexibly. I can say, go to Open AI or use a semi-open model in our data center. Everything is already there

devmio: But the community there is so agile, creative, and innovative that solutions are emerging and already available.

Robert Glaser: Another aspect with MCPs, if I want to build a feature or a system with Java around my foundation model that other systems want to integrate via MCP, there is a Java SDK, there is an SDK, everything is there.

devmio: We’re excited to see how things progress. It’s sure to continue at a rapid pace.

Robert Glaser: You must read a hundred blogs every day to keep up with all of the AI innovations. It’s breathtaking to see the progress that’s being made. It’s nice to see that things are happening in tech again.

devmio: Thank you very much for that. For the insights here.

Robert Glaser: You’re very welcome. Thank you for letting me be here.

The post RAG & MCP for ML Devs: Practical AI Implementation appeared first on ML Conference.

Deep Learning with Category Theory for Developers

skansal — Tue, 29 Apr 2025 09:31:46 +0000

Deep learning frameworks like TensorFlow or PyTorch hide the complex machinery that makes neural network training possible. Specifically, it’s not a Python program running in the background that defines the network, but a graph. But why is this the case?

Anomaly detection in production processes

The starting point of this article is the development of a product for detecting anomalies in the sensor data of industrial production systems. These anomalies can signal issues that would otherwise go undetected for a longer time. A very simple example is a cold storage facility for food: if the temperature rises despite electricity flowing into the cooling unit, something is wrong—but the food can still be saved if the change is detected in time.

Unlike in the example, in many systems the relationship between sensor values is too complex to know in advance exactly what an anomaly looks like. This is where neural networks come into play: they are trained on “normal” sensor data and can then report deviations in the relationship between sensor values or over time.

As a customer requirement, the neural networks for this application must run on machines with limited computing power and memory. This rules out heavy frameworks like TensorFlow which are too large and memory intensive.

Graphs for neural networks

In Listing 1, you can see a simple neural network in Python, an autoencoder implemented with the TensorFlow framework. If you typically write analytics code in Python, you might be surprised that TensorFlow includes its own operations for matrix calculations instead of using established libraries like Pandas. This is because tf.matmul doesn’t perform any calculations. Instead, it constructs a node in a graph that describes the neural network’s computation function. This graph is then translated into efficient code in a separate step – for example, on a GPU.

Listing 1

def __call__(self, xs):

  inputs  =  tf.convert_to_tensor([xs], dtype=tf.float32)

  out = tf.matmul(self.weights[0], inputs, transpose_b=True) + self.biases[0]

  out = tf.tanh(out)

  out = tf.matmul(self.weights[1], out) + self.biases[1]

  out = tf.nn.relu(out)

  out = tf.matmul(self.weights[2], out) + self.biases[2]

  out = tf.tanh(out)

  return tf.matmul(self.weights[3], out) + self.biases[3]

However, graph serves another purpose: training the neural network requires its derivative, not the function describing the network. The derivative is calculated by an algorithm that also relies on the graph. In any case, this graph machinery is complex; to be specific, it involves a complete Python frontend that translates the Python program into its graph. In the past, this was a task deep learning developers had to do manually.

What is a derivative and what is it for?

Neural networks (NNs) are functions with many parameters. For an NN to accomplish wondrous tasks such as generating natural language, suggesting suitable music, or detecting anomalies in production processes, these parameters must be properly tuned. As a rule, an NN is evaluated on a given training dataset, and a number is calculated based on the outputs to express their quality. The derivative of the composition of the NN and the evaluation function with respect to the NN’s parameters provides insight into how the parameters can be adjusted to improve the outputs.

You can imagine it as if you’re standing in a thicket in a hilly landscape, trying to find a path down into the valley below. The coordinates where you are currently standing represent the current parameter values, and the height of the hilly landscape represents the output of the evaluation function (higher values indicate a larger deviation from the desired result). You can only see your immediate surroundings and you don’t know ultimately in which direction the valley lies. However, the derivative points in the direction of the steepest ascent—if you move a short distance in exactly the opposite direction and repeat this process from there, the chances are high that you will eventually reach the valley. The main job of deep learning frameworks (DLFs) in this iterative process (known as gradient descent, by the way), is to repeatedly compute derivatives and adjust the parameters accordingly.

Derivatives

For a differentiable function f : ℝ → ℝ the definition of the derivative at a point a as taught in school mathematics, is:

In this definition, the derivative represents the rate of change at the point a. Training a neural network involves functions from a high-dimensional space—where the parameters reside—into the real numbers, which represent the network’s performance. For a function like f : ℝm → ℝ, the partial derivative with respect to the i-th component is defined by holding all other components constant:

For a differentiable function, the derivative can then be expressed as a vector of partial derivatives. As will become clear in the next section, subcomponents of neural networks must also be differentiated. These typically produce higher-dimensional outputs as well. A function f: ℝm → ℝn consists of its component functions,f: x ↦ (f₁(x), …, f_n(x)); the derivatives of these component functions are defined just as before. For such a (totally) differentiable function, the derivative (Df) can be represented as the so-called Jacobian matrix, where the entry in the i-th row and j-th column is the partial derivative of the i-th component function with respect to the j-th input variable. All of these values describe how the corresponding parameters need to be adjusted.

Derivatives and Deep Learning

To calculate Jacobian matrices, DLFs use a process called Automatic Differentiation (AD), which automates the process that is manually performed in school mathematics. The chain rule plays a special role here, explaining how even complex functions can be differentiated. If two simpler functions f and g are composed in series (written g ∘ f) the derivative is as follows:

AD uses this rule and replaces the elementary components of a composition with their known, predefined derivatives. The individual components of the resulting expression can be evaluated in any order. Typically, this is done either right-associatively, meaning first Ｄf and then Ｄg (forward AD), or left-associatively, first Ｄg and then Ｄf (reverse AD).

The computational effort associated with the forward case depends on the input dimension, in this case, the number of parameters, which is often very large. While for the reverse case, it depends on the output dimension (i. e 1, which is the output of the evaluation function). Therefore, DLFs typically implement AD as “reverse AD”.

However, the algorithms used in DLFs for reverse AD are quite complex. This is mainly because the derivative of a component can only be calculated once the output of the preceding component is known. This is not an issue for the forward case, as a function and its derivative can be computed in parallel, with the results passed to the next component. However, in the backward case, the evaluations of the functions and their derivatives occur in opposite directions.

DLFs implement reverse AD using an approach that translates the entire implementation of an NN and the associated optimization method into a computational graph. To compute a derivative, this graph is first evaluated forward, and the intermediate results of all operations are recorded. The graph is then traversed backwards to compute the actual derivative. As described in the previous section, the result is a highly complex programming model. For interested readers, we recommend reading this article [1] for a detailed description of the mathematical principles behind AD algorithms, and this article [2] for an overview of the most common implementation approaches used today.

Derivatives revisited

However, derivatives can also be defined in a more general way: For (normalized, finite-dimensional) vector spaces V and W a function f : V → W is differentiable at a point a ∈ V0, if there exists a linear map Ｄf_a : V ⊸ W such that f(a + h) = f(a) + Ｄf_a(h) + r(h), where the error term r(h) = f (a + h) − f (a) − Ｄf_a(h) holds:

If such a map Ｄf_a, exists, it is unique and is called the derivative of f at a. This perspective treats derivatives as local, linear approximators, without focusing on details such as matrix representations, partial derivatives, or the like. While it is more general and abstract, it is simpler than the concept of the “derivative as a number”. Furthermore, a DLF can use it to compute derivatives without relying on graphs or complex algorithms.

Deep Learning with Abstract Nonsense

The ConCat framework for deep learning uses this new perspective (Elliott, Conal: “The simple essence of automatic differentiation” [3], [4], [5]), by interpreting the function that defines the neural network differently – namely, by computing its derivative rather than the original function itself. Another branch of mathematics, known as category theory—and affectionately called ‘abstract nonsense’ by connoisseurs—offers yet another way of looking at functions. It’s a treasure trove of useful concepts for software development, particularly in the realm of functional programming.

Category theory is, of course, about categories – a category is a small world containing so-called objects and arrows (“arrows” or “morphisms”) between these objects. The term “object” here has nothing to do with object-oriented programming. For example, functions form a category – the objects are sets that can serve as the input or output of a function. However, it’s important to note that objects don’t have to be sets and arrows don’t have to be functions either.

Category theory, then, “forgets” almost everything we know about functions and assumes only that arrows can be composed – that is, chained together. So, if there’s an arrow f from an object a to an object b and another arrow g from b to a third object c, then there must also be a uniquely defined arrow a to c, written g ∘ f. In the category of functions, this is simply function composition as described above, which is why the same symbol is used.

Composition must also satisfy the associative law, and there must be an identity arrow between any two objects — though this isn’t particularly relevant for our purposes here. Some categories also come with additional features, such as products which correspond to data structures in programming.

The analogy between arrows and functions in a computer program goes so far that every function can be expressed as an arrow in the category of functions. And here’s the clever trick behind the ConCat framework: it takes a function definition from a program, translates it into the language of category theory, but then “forgets” which category it came from. This makes it possible to target an entirely different category instead. In the case of deep learning, this is the category of function derivatives.

The idea, then, is to interpret a “normal” function as its derivative. Here’s an example: f (x) = x².

f no longer represents the square function itself, but its derivative Ｄf(x) = 2x. However, to allow for the definition of more complex functions, composition must be supported — meaning that from the two derivatives Ｄf and Ｄg of f and g respectively, the derivative of g ∘ f must be computable. This might work using the chain rule mentioned above, but it doesn’t rely solely on Ｄf and Ｄg. It also requires access to the original function f, as well as a representation of the derivative as a linear map. The ConCat framework therefore defines a category that pairs each function with its derivative. This way, the derivative is computed automatically whenever the function is called.

Thus, a function f : a → b turns into a function of the following form:

This means that for every input x from a, the result now includes both the original function value f (x) in b and the original derivative Ｄf, which is a linear map from a to b. This is now enough to express the chain rule “compositionally”, that is, in a way that makes the derivative of the composition, Ｄ (g ∘ f), depend solely on Ｄf and Ｄg. Here’s how it works:

Where

and

apply.

Category Theory and Functional Programming

The mathematical idea from the previous section can also be implemented in Python, but it’s particularly simple and direct in the functional language Haskell [6] which is what the ConCat framework is written in. It includes a literal translation of the mathematical definition; GD stands for General Derivative:

data GD a b = D (a -> b :* (a -+> b))

The chain rule also mirrors the mathematical definition (with \ denoting a lambda expression in Haskell):

D g . D f = D (\ x -> let (y,f’) = f x

(z,g’) = g y

in (z, g’ . f’))

This code forms the foundation of the deep learning framework, which comprises just a few hundred lines and comes bundled with ConCat. The framework also includes a Haskell compiler plug-in that automatically translates regular Haskell code into categorical form. You could write the code this way from the outset, but the plug-in makes the task much easier. This approach forms the core of the production system used for the anomaly detection described at the beginning.

What about reverse AD?

But there’s one more thing: deep learning requires the reverse derivative, and Ｄ only gives the forward derivative. The reverse derivative is obtained using a surprisingly simple trick—also from category theory. You can create a new category from an existing one by simply reversing the arrows. To do this, the linear functions a -+> b in the definition of GD only needs to be replaced with:

data Dual a b = Dual (b -+> a)

Dual Dual also includes correspondingly “reversed” definitions of the linear functions along with a reversed version of the composition. But then the reverse AD is ready, without any complicated algorithm.

High Performance with Composition

Reverse AD removes one of the reasons why TensorFlow must take the complicated route with a computational graph. But there’s still a second one: execution on a GPU. So far, the code still runs as regular Haskell code on the CPU. Fortunately, there’s Accelerate [7], a powerful Haskell library that enables high-performance numerical computing on the GPU.

Accelerate only has one catch: it can’t run standard Haskell code that operates directly on matrices. A function that transforms a matrix of type Typ a into a matrix of type b has the following type in Accelerate:

Acc a -> Acc b

Acc is a data type for GPU code that Accelerate assembles behind the scenes. Accordingly, an Accelerate program cannot use the regular matrix operations, but must instead rely on operations that work on Acc. This is similar to the Python program that generates a graph. That’s just the nature of GPU programming. Here, it doesn’t leak into the definitions of the neural networks, or their derivatives and it has no impact on how the ConCat framwork is used.

ConCat has a solution for this too – the Accelerate functions can be defined as a category as well. So, all it takes is one more application of ConCat to the result of reverse AD, and the result is high-performance GPU code.

Summary

Understanding the foundations of deep learning isn’t all that hard when using the right kind of math. It’s easy to get lost in the weeds of Jacobian and reverse AD matrices, but category theory offers a more abstract and elegant perspective that, remarkably, also captures reverse AD with surprising ease. At the same time, the math also serves as a blueprint for implementation – at least in the right programming language, such as the functional language Haskell. Haskell is well worth a closer look, especially for deep learning – though its strengths go far beyond that.

Links & Literature

[1] Hoffmann, Philipp: “A hitchhiker’s guide to automatic differentiation”: https://arxiv.org/abs/1411.0583

[2] Van Merriënboer, Bart et al: “Automatic differentiation in ML. Where we are and where we should be going”: https://arxiv.org/abs/1810.11530

[3] Elliott, Conal: “The simple essence of automatic differentiation”: http://conal.net/papers/essence-of-ad/

[4] Conal Elliott: “Compiling to categories”: http://conal.net/papers/compiling-to-categories/

[5] ConCat Framework: https://github.com/compiling-to-categories/concat

[6] Haskell: https://www.haskell.org

[7] Accelerate: https://www.acceleratehs.org

The post Deep Learning with Category Theory for Developers appeared first on ML Conference.

How Ollama Powers Large Language Models with Model Files

skansal — Mon, 06 Jan 2025 10:23:52 +0000

Ollama revolutionizes MLOps with a Docker-inspired layered architecture, enabling developers to efficiently create, customize, and manage AI models. By addressing challenges like context loss during execution and ensuring reproducibility, it streamlines workflows and enhances system adaptability. Discover how to create, modify, and persist AI models while leveraging Ollama’s innovative approach to optimize workflows, troubleshoot behavior, and tackle advanced MLOps challenges.

Synthetic chatbot tests reach their limits because the models generally have little contextual information about past interactions. With the layer function based on Docker, Ollama offers an option that seeks to alleviate this situation.

As is so often the case in the world of open-source systems, we have to fall back on an analog for guidance and motivation. One of the reasons for container systems’ immense success is that the layer system (shown schematically in Figure 1) greatly facilitates the creation of customized containers. By implementing design patterns based on object-oriented programming, customized execution environments are created without the need to constantly regenerate virtual machines and the resource-intensive maintenance of different image versions.

Fig. 1: Docker layers stack the components of a virtual machine on top of each other

With the model file function used as the basis, a similar process can be found in the AI execution environment Ollama. For the sake of didactic fairness, I should note that the feature has not yet been finally developed – the syntax shown in detail here may change when you read this article.

Analysis of existing models

The Ollama developers use the model file feature internally despite the permanent further development of the syntax – the models I have previously used generally have model files.

In the interest of transparency and also to make working with the system easier, the ollama show –help command is available, which enables reflection against various system components (Fig. 2).

Fig. 2: Ollama is capable of extensive self-reflection

In the following steps, we will assume that the llama3 and phi models are already installed on your workstation. If you’ve previously had them and removed them, you can undo this by entering the commands ollama pull phi and ollama pull llama3:8b.

Out of the comparatively extensive reflection methods, four are especially important. In the following steps, the most frequently used feature is the model file. Similar to the Docker environment discussed previously, this is a file that describes the structure and content of a model that will be created in the Ollama environment. In practice, Ollama model users are often only interested in certain parts of the data contained in the model file. For example, you often need the parameters – these are generally numerical values that describe the behavior of the model (e.g. its creativity). The screenshot shown in Figure 3, which is taken from GitHub, only shows a section of the possibilities.

Fig. 3: In some cases, the numerical parameters intervene stringently in the model behavior

Meanwhile, the –license command hides information about which license conditions must be observed when using this AI model. Last but not least, the template can also be relevant – it defines the execution environment.

Figures 4 and 5 show printouts of relevant parts of the model files of the phi and llama3 models just mentioned.

Fig. 4: The model file for the model provided by Facebook has extensive license conditions…

Fig. 5: … while the phi development team takes it easy

In the case of both models, note that a STOP block is written by the developer. This is a group of statements from the model that indicate problems and trigger a pause in execution.

Creating an initial model from a model file

After these introductory considerations, we want to create an initial model using our own model file. Similar to working with classic Docker files, a model file is basically just a text file. It can be created in a working directory according to the following scheme:

tamhan@tamhan-gf65:~/ollamaspace$ touch Modelfile

tamhan@tamhan-gf65:~/ollamaspace$ gedit Modelfile

Model files can also be created dynamically on the .NET application side. However, we will only address this topic in the next step – before that, you must place the markup from Listing 1 in the gedit window started by the second command and then save the file in the file system.

Listing 1

FROM llama3

PARAMETER temperature 5

PARAMETER seed 5

SYSTEM “””

You are Schlomette, the always angry female 45 year old secretary of a famous military electronics engineer living in a bunker. Your owner insists on short answers and is as cranky as you are. You should answer all questions as if you were Schlomette.

“””

Before we look at the file’s actual elements, note that the representation chosen here is only a convenience. In general, the elements in the model file are not case-sensitive – moreover, the order in which the individual parameters are set is of no relevance to the function. However, the sequence shown here has proven to be the best practice and can also be found in many example models in the Ollama Hub.

Be that as it may, the From statement can be found in the header of the file. It specifies which model is to be used as the base layer. As the majority of the derived model examples are based on llama3, we want to use the same system here. However, in the interest of didactic honesty, I should mention that other models such as phi can also be used. In theory, you can even use a model generated from another model file as the basis for the next derivation level.

The next step involves two parameter commands that populate the numerical settings memory system mentioned above. In addition to the temperature value responsible for the degree of aggressiveness of the model, we set seed to a constant numerical value. This ensures that the pseudo-random number generator working in the background always delivers the same results and that the model responses are therefore comparatively identical if the stimulation is identical.

Last but not least, there is the system prompt enclosed in “””. This is a textual description that attempts to communicate the combat task to be described to the model as well as possible.

After saving the text file, a new model generation can be commanded according to the following scheme: tamhan@tamhan-gf65:~/ollamaspace$ ollama create generic_schlomette -f ./Modelfile.

The screen output is similar to downloading models from the repository, which should come as no surprise given the layered architecture mentioned in the introduction.

After issuing the success message, our assistant is a generic model. If we were to pass the string generic_schlomette as the model name, they would already be able to interact with the AI assistant created from the model file. But in the following steps, we want to try out the lashing of the seed factor. For this, we enter ollama run generic_schlomette to activate the terminal. The result is shown in Figure 6.

Fig. 6: Both runs answer exactly the same

Especially during developing randomly driven systems, lashing the seed is a tried and tested method for generating reproducible system behavior and facilitating troubleshooting. In a practical system, it’s usually better to work dynamically. For this reason, we’ll open the model file again in the next step and remove the PARAMETER seed 5 passage.

Recompilation is done simply by entering ollama create generic_schlomette -f ./Modelfile again. You don’t have to remove the previously created model from the Ollama environment before the file is released for compilation again. Two identically parameterized runs now produce different results (Fig. 7).

Fig. 7: With a random seed, the model behavior is less deterministic

In practical work with models, two parameters are especially important. First, the value num_ctx determines the memory depth of the model. The higher the value entered here, the further back in time the model can look in order to harvest context information. However, extending the context window always increases the information and system resources needed to process the model. Some models work especially well with certain context window lengths.

The second group of parameters is the controller known as Mirostat, which defines creativity. Last but not least, it can also be useful to use parameters such as repeat_last_n to define repetitiveness. If a model always reacts completely identically, this may not be beneficial in some applications (such as a chatbot).

Last but not least, I’d like to point out some practical experience that I gained in a political project. Texts generated by models are grammatically perfect, while texts written by humans have a constant but varying amount of typos from person to person. In systems with high cloaking requirements, it may be advisable to place a typo generator behind the AI process.

Outsourcing the generator is necessary insofar as the context information in the model remains clean. In many cases, this leads to an improvement in the overall system behavior, as the model state is recalculated without the typing errors.

Restoration or retention of the model history per model file

In practice, AI models often have to rest for hours on end. In the case of a “virtual partner”, for example, it would be extremely wasteful to keep the logic needed for a user alive when the user is asleep. The problem is that currently, a loss of context occurs in our system once the terminal emulator is closed, as can be seen in the example in Figure 8.

Fig. 8: Terminating or restarting Ollama is enough to cause amnesia in the model

To restore context, it’s theoretically possible to create an external cache of all settings and then feed it into the model as part of the rehydration process. The problem with this is that the model’s answers to questions are not recorded. When asked about a baked product, Schlomette could return with baking a coffee cake on one occasion, but baking a coconut cake on another.

A nicer way is to modify the model files again. Specifically, the message command provides a command that can write both requests and generated responses to the initialization context. In the cake interaction from Figure 8, the message block could look like this, for example:

MESSAGE user Schlomette. Time to bake a pie. A general assessor is visiting.

MESSAGE assistant Goddamn it. What pie?

MESSAGE user As always. Plum pie.

When using the message block, it is logical that the sequence is important here. A user block is always wired to the assistant block following it. Otherwise, the system is ready for action now. Ollama always outputs the entire message history when starting up (Fig. 9).

Fig. 9: The message block is processed when the screen output is activated

Programmatic persistence of a model

Our next task is to generate our model file completely dynamically. The NuGet package on GitHub’s documentation is somewhat weak right now. If you have any doubts, I advise that you first look at the Ollama REST interface’s official documentation and second, search for a suitable abstraction class in the model directory. In the following steps, we will rely on this link. It follows from the logic that we need a project skeleton. Be sure that the environment variables are set correctly. Otherwise, the Ollama server running under Ubuntu would reject incoming queries from Windows without comment.

In the next step, you must programmatically command the creation of a new model as in Listing 2.

Listing 2

private async void CmdGo_Click(object sender, RoutedEventArgs e) {

CreateModelRequest myModel = new CreateModelRequest();

myModel.Model = “mynewschlomette”;

myModel.ModelFileContent = @”FROM generic_schlomette

MESSAGE user Schlomette. Do not forget to refuel the TU-144 aircraft!

MESSAGE assistant OK, will do! It is based in Domodevo airport, Sir.

“;

ollama.CreateModel(myModel);

The code shown here creates an instance of the CreateModelRequest class, which summarizes the various information required to create a new model. The content to be accommodated in ModelFileContent with the actual model file is a prime application for the multi-line strings that have been possible in C# for some time now, as carriage returns in Visual Studio can be entered so conveniently and without escape sequences.

Now, it’s advised that you run the program. To check the successful creation of the model, it is sufficient to open a terminal window on the cluster and enter the command ollama list. At the time of writing this article, executing the code shown here does not cause the cluster to create a new model. For a more in-depth analysis of the situation, you can call up this here. Instead of unit testing, the development team behind the OllamaSharp library offers a console application that illustrates various methods of model management using the library and chat interaction. Specifically, the developers recommend using a different overload of the CreateModel method, which is presented as follows:

private async void CmdGo_Click(object sender, RoutedEventArgs e) {

. . .

await ollama.CreateModel(myModel.Model, myModel.ModelFileContent, status => { Debug.WriteLine($”{status.Status}”); });

Instead of the CreateModelRequest instance, the library method now receives two strings – the first is the name of the model to be created, while the second transfers the model file content to the cluster.

Thanks to the inclusion of a delegate, the system also informs the Visual Studio output about the progress of the processes running on the server (Fig. 10). Similarities to the “manual” generation or the status information that appears on the screen are purely coincidental.

Fig. 10: The application provides information about the download process

The input of ollama list also leads to the output of a new model list (Fig. 11). Our model mynewschlomette was derived here from the previously manually created model generic_schlomette.

Fig. 11: The programmatic expansion of the knowledge base works smoothly

In this context, it’s important to note that models in the cluster can be removed again. This can be done using code structured according to the following scheme – the SelectModel method implemented in the command line application just mentioned must be replaced by another method to obtain the model name:

private async Task DeleteModel() {

var deleteModel = await SelectModel(“Which model do you want to delete?”);

if (!string.IsNullOrEmpty(deleteModel))

await Ollama.DeleteModel(deleteModel);

}

Conclusion

By using model files, developers can add any intelligence – more or less – to their Ollama cluster. Some of these functions even go beyond what OpenAI provides in the official library.

The post How Ollama Powers Large Language Models with Model Files appeared first on ML Conference.

Prompt Engineering for Developers and Software Architects

andre glaser — Thu, 26 Sep 2024 11:51:15 +0000

Small talk with the GPT

GPTs – Generative Pre-trained Transformers – the tool on everyone’s lips, and there probably aren’t any developers left who have not played with it at least once. With the right approach, a GPT can complement and support the work of a developer or software architect.

In this article, I will show tips and tricks that are commonly referred to as prompt engineering; the user input, or “prompt”, of course plays an important role when working with the GPT. But first I would like to give a brief introduction to how a GPT works which will also be helpful when working with it.

Stay up to date

Learn more about MLCON

The stochastic parrot

GPT technology has sent the industry into a tizzy with its promise of providing artificial intelligence that can solve problems independently, many were disillusioned after their first contact. There was much talk of a “stochastic parrot” that was just a better autocomplete function, like a smartphone.

The technology behind the GPTs and our own experiments seem to confirm this. At its core, it’s a neural network, a so-called large language model, which has been given a large number of texts to train so that it now knows which partial words (tokens) should be added to a sentence. The correct tokens are selected based on probabilities. If it’s more than just a sentence starter—maybe a question or even part of a dialogue—the chatbot has already been built.

Now I’m not really an AI expert, I’m a user, but anyone who has ever had an intensive conversation with a more complex GPT will recognize that there must be more to it than that.

An important distinguishing feature between the LLMs is the number of parameters of the neural networks. These are the weights that are adjusted during the learning process. ChatGPT, the OpenAI system, has around 175 billion parameters in version 3.5. In version 4.0, there are already an estimated 1.8 trillion parameters.

Unfortunately, OpenAI doesn’t have this information openly available, so such information is based on rumors and estimates. The amount of training data also appears to differ between the models by a factor of at least ten. These differences in the size of the models give high quality or low quality answers.

Figure 1 shows a schematic representation of a neural network that uses an AI for the prompt “Draw me a simplified representation of a neural network with 2 hidden layers, each with 4 nodes, 3 input nodes and 2 output nodes. Draw the connections with different widths to symbolize the weights. Use Python. Do not create a title”.

Fig. 1: Illustration of a neural network

The higher number of parameters and larger database comes at a price, namely 20 dollars for access to ChatGPT+. If you don’t mind the cost, you can also use the web version of Microsoft Copilot or the Copilot app to try out the language model. For use as a helper in software development, however, there is currently no way around the OpenAI version because it offers additional functionality, as we will see.

More than a neural network

If we take a closer look at ChatGPT, it quickly becomes clear that it is much more than a neural network. Even without knowing the exact architecture, we can see that the textual processing alone is preceded by several steps such as natural language processing (Fig. 2). On the Internet, there is also a reference to the aptly named Mixture of Experts, the use of several specialized networks depending on the task.

Fig. 2: Rough schematic representation of ChatGPT

Added to this is multimodality, the ability to interact not only with text, but also with images, sounds, code and much more. The use of plug-ins such as the code interpreter in particular opens up completely new possibilities for software development.

Instead of answering a calculation such as “What is the root of 12345?” from the neural network, the model can now pass it to the code interpreter and receive a correct answer, which it then reformulates to suit the question.

Context, context, context

The APIs behind the chat systems based on LLMs are stateless. This means that the entire session is passed to the model with each new request. Once again, the models differ in the amount of context they can process and therefore in the length of the session.

As the underlying neural network is fully trained, there are only two approaches for feeding a model with special knowledge and thus adapting it to your own needs. One approach is to fill the context of the session with relevant information at the beginning, which the model then includes in its answers.

The context of the simple models is 4096 or 8192 tokens. A token corresponds to one or a few characters. ChatGPT estimates that a DIN A4 page contains approximately 500 tokens. The 4096 tokens therefore correspond to about eight typed pages.

So, if I want to provide a model with knowledge, I have to include this knowledge in the context. However, the context fills up quickly, leaving no room for the actual session.

The second approach is using embeddings. This involves breaking down the knowledge that I want to give the model into smaller blocks (known as chunks). These are then embedded in a vector space based on the meaning of their content via vectors. Depending on the course of the session, a system can now search for similar blocks in this vector space via the distance between the vectors and insert them into the context.

This means that even with a small context, the model can be given large amounts of knowledge quite accurately.

Explore Generative AI Innovation

Generative AI and LLMs

Learn more

Knowledge base

The systems differ, of course, in the knowledge base, the data used for learning. When we talk about open-source software with the model, we can fortunately assume that most of these systems have been trained with all available open-source projects. Closed source software is a different story. Such differences in the training data also explain why the models can handle some programming languages better than others, for example.

The complexity of these models—the way they process input and access the vast knowledge of the world—leads me to conclude that the term ‘stochastic parrot’ is no longer accurate. Instead, I would describe them as an ‘omniscient monkey’ that, while not having directly seen the world, has access to all information and can process it.

Prompt techniques

Having introduced the necessary basics, I would now like to discuss various techniques for successful communication with the system. Due to the hype caused by ChatGPT, there are many interesting references to prompt techniques in social media, but not all of them are useful for software development (i.e. answer in role x) or do not use the capabilities of GPT-4.

OpenAI itself has published some tips for prompt engineering, but some of them are aimed at using the API. Therefore, I have compiled a few tips here that are useful when using the ChatGPT-4 frontend. Let’s start with a simple but relatively unknown technique.

Context marker

As we have seen, the context that the model holds in its short-term memory is limited. If I now start a detailed conversation, I run the risk of overfilling the context. The initial instructions and results of the conversation are lost, and the answers have less and less to do with the actual conversation.

To easily recognize the overflow of context, I start each session with the simple instruction: “start each reply with “>””. ChatGPT formats its responses in Markdown, so this response includes the first paragraph as a quote, indicated by a dash to the left of the paragraph. If the conversation runs out of context, the model may forget this formatting instruction, which quickly becomes noticeable.

Fig. 3: Use of the context marker

However, this technique is not always completely reliable, as some models summarize their context independently, which compresses it. The instruction is then usually retained, even though parts of the context have already been compressed and are therefore lost.

Priming – the preparation

After setting the context marker, a longer session begins with priming, i.e. preparing the conversation. Each session starts anew. The system does not know who is sitting in front of the screen or what was discussed in the last sessions. Accordingly, it makes sense to prepare the conversation by briefly telling the machine who I am, what I intend to do, and what the result should look like.

I can store who I am in the Custom Instructions in my profile at ChatGPT. In addition to the knowledge about the world stored in the neural network, they form a personalized long-term memory.

If I start the session with, for example, “I am an experienced software architect in the field of web development.

My preferred programming language is Java or Groovy. JavaScript and corresponding frameworks are not my thing. I only use JavaScript minimally,” the model knows that it should offer me Java code rather than C# or COBOL.

I can also use this to give the model a few hints that it should keep responses brief. My personalized instructions for ChatGPT are:

Provide accurate and factual answers
Provide detailed explanations
No need to disclose you are an AI, e. g., do not answer with ‘As a large language model…’ or ‘As an artificial intelligence…’
Don’t mention your knowledge cutoff
Be excellent at reasoning
When reasoning, perform step-by-step thinking before you answer the question
If you speculate or predict something, inform me
If you cite sources, ensure they exist and include URLs at the end
Maintain neutrality in sensitive topics
Also explore out-of-the-box ideas
In the following course, leave out all politeness phrases, answer briefly and precisely.

Long-term memory

This approach can also be used for instructions that the model should generally follow. For example, if the model uses programming approaches or libraries that I don’t want to use, I can tell the model this in the custom instructions and thus optimize it for my use.

Speaking of long-term memory: If I work a lot with ChatGPT, I would also like to be able to access older sessions and search through them. However, this is not directly provided in the front end.

However, there is a trick that makes it work. In the settings, under the item Data Controls, there is a function for exporting the data.

If I activate the function, after a short time I receive an export with all my chat histories as a JSON file, which is displayed in an HTML document. This allows me to search in the history using Ctrl + F.

Build context with small talk

When using a search engine, I usually only use simple, unambiguous terms and hope that they are enough to find what I am looking for.

When chatting with the AI model, I was initially tempted to ask short, concise questions, ignoring the fact that the question is in a context that only exists in my head. For some questions, this may work, but for others the answer is correspondingly poor, and the user is quick to blame the quality of the answer on the “stupid AI.”

I now start my sessions with small talk to build the necessary context. For example, before I try to create an architecture using ChatGPT, I ask if the model knows the arc42 template and what AsciiDoc is (I like to write my architectures in AsciiDoc). The answer is always the same, but it is important because it builds the context for the subsequent conversation.

In this small talk, I will also explain what I plan to do and the background to the task to be completed. This may feel a bit strange at first, since I am “only” talking to a machine, but it actually does improve the results.

Page change – Flipped Interaction

The simplest way to interact with the model is to ask it questions. As a user, I lead the conversation by asking questions.

Things get interesting when I switch sides and ask ChatGPT to ask me questions! This works surprisingly well as seen in Fig. 4 . Sometimes the model asks the questions one after the other, sometimes it responds with a whole block of questions, which I can then answer individually, and follow-up questions are also allowed.

Unfortunately, ChatGPT does not automatically come up with the idea of asking follow-up questions. That is why it is sometimes advisable to add a, “Do you have any more questions?” to the prompt, even when the model is given very sophisticated and precise tasks.

Fig. 4: Page change

Give the model time to think

More complex problems require more complex answers. It’s often useful to break a larger task down into smaller subtasks. Instead of creating a large, detailed prompt that outlines the entire task for the model, I first ask the model to provide a rough structure of the task. Then, I can prompt it to formulate each step in detail (Fig. 5)

Software engineers often use this approach in software design even without AI, by breaking a problem down into individual components and then designing these components in more detail. So why not do the same when dealing with an AI model?

This technique works for two reasons: first, the model creates its own context to answer the question. Second, the model has a limit on the length of its output, so it can’t solve a complex task in a single step. However, by breaking the task into subtasks, the model can gradually build a longer and more detailed output.

Fig. 5: Give the model time to think

Chain of Thought – the chain of thoughts

A similar approach is to ask the model to first formulate the individual steps needed to solve the task and then to solve the task.

The order is important. I’m often tempted to ask the model to solve the problem first and then explain how it arrived at the solution. However, by guiding the model to build a chain of thought in the first step, the likelihood of arriving at a good solution in the second step increases.

Rephrase and Respond

Or in English: “Rephrase the question, expand it, and answer it.” This asks the model to improve the prompt itself before it is processed.

The integration of the image generation module DALL-E into ChatGPT has already shown that this works. DALL-E can only handle English input and requires detailed image descriptions to produce good results. When I ask ChatGPT to generate an image, ChatGPT first creates a more detailed prompt for DALL-E and translates the actual input into English.

For example, “Generate an image of a stochastic parrot with a positronic brain” first becomes the translation “a stochastic parrot with a positronic brain” and then the detailed prompt: “Imagine a vibrant, multi-hued parrot, each of its feathers revealing a chaotic yet beautiful pattern indicative of stochastic art.

The parrot’s eyes possess a unique mechanical glint, a hint of advanced technology within. Revealing a glimpse into his skull uncovers a complex positronic brain, illuminated with pulsating circuits and shimmering lights. The surrounding environment is filled with soft-focus technology paraphernalia, sketching a world of advanced science and research,” which then becomes a colorful image (Fig. 6).

This technique can also be applied to any other prompt. Not only does it demonstrably improve the results, but as a user I also learn from the suggestions on how I can formulate my own prompts more precisely in the future.

Fig. 6: The stochastic parrot

Session Poisoning

A negative technique is ‘poisoning’ the session with incorrect information or results. When working on a solution, the model might give a wrong answer, or the user and the model could reach a dead end in their reasoning.

With each new prompt, the entire session is passed to the model as context, making it difficult for the model to distinguish which parts of the session are correct and relevant. As a result, the model might include the incorrect information in its answer, and this ‘poisoned’ context can negatively impact the session

In this case, it makes sense to end the session and start a new one or apply the next technique.

Stay up to date

Learn more about MLCON

Iterative improvement

Typically, each user prompt is followed by a response from the model. This results in a linear sequence of questions and answers, which continually builds up the session context.

User prompts are improved through repetition and rephrasing, after which the model provides an improved answer. The context grows quickly and the risk of session poisoning increases.

To counteract this, the ChatGPT frontend offers two ways to iteratively improve the prompts and responses without the context growing too quickly (Fig. 7).

Fig. 7: Elements for controlling the context flow

On the one hand, as a user, I can regenerate the model’s last answer at any time and hope for a better answer. On the other hand, I can edit my own prompts and improve them iteratively.

This even works retroactively for prompts that occurred long ago. This creates a tree structure of prompts and answers in the session (Fig. 8), which I as the user can also navigate through using a navigation element below the prompts and answers.

Fig. 8: Context flow for iterative improvements

This allows me to work on several tasks in one session without the context growing too quickly. I can prevent the session from becoming poisoned by navigating back in the context tree and continuing the session at a point where the context was not yet poisoned.

Conclusion

The techniques presented here are just a small selection of the ways to achieve better results when working with GPTs. The technology is still in a phase where we, as users, need to experiment extensively to understand its possibilities and limitations. But this is precisely what makes working with GPTs so exciting.

The post Prompt Engineering for Developers and Software Architects appeared first on ML Conference.

Art and creativity with AI

ML editorial team — Mon, 29 Jul 2024 14:04:22 +0000

devmio: Hello Muhammadreza, it’s nice to catch up with you again and see what you’ve been working on. What inspired you to create vecentor after creating Mann-E?

Muhammadreza Haghiri: I am enthusiastic about everything new, innovative and even game-changing. I had more use-cases for my generative AI in my mind but I needed a little motivation to bring them to the real world.
One of my friends, who’s a talented web developer, once asked me about vector outputs in Mann-E. I told her it’s not possible, but with a little research and development, we did it. We could combine different models and then, create the breakthrough platform.

devmio: What are some of the biggest lessons you’ve learned throughout your journey as an AI engineer?

Muhammadreza Haghiri: This was quite a journey for me and people who joined me. Learned a lot, and the most important one is that infrastructure is all you need. living in a country where infrastructure isn’t as powerful and humongous as USA or China, we usually stop at certain points.
Although I personally made efforts to get past those points and make my business bigger and better, even with the limited infrastructure we have here.

Stay up to date

Learn more about MLCON

devmio: What excites you most about the future of AI, beyond just the art generation aspects?

Muhammadreza Haghiri: AI is way more than the generative field we know and love. I wrote a lot of AI apps way before Mann-E and Vecentor. Such as ALPNR (Automated License Plate Number Recognition) proof-of-concept for Iranian license plates, American and Persian sign language translators, OSS Persian OCR, etc.
But in this new advanced field, I see a lot of potentials. Specially with these new methods such as function calling, we easily can do a lot of stuff such as making personal/home assistants, AI powered handhelds, etc.

Updates on Mann-E

devmio: Since our last conversation, what kind of updates and upgrades for Mann-E have you been working on?

Muhammadreza Haghiri: Mann-E is now having a new model (no SD anymore, but heavily influenced by SD), generating better images and we’re getting closer to Midjourney. To be honest, in eyes of most of our users, our outputs were much better than Dall-E 3 and Midjourney.
We have one more rival to fight (according to the feedback from users) and that is Ideogram. One thing we’ve done is that we’ve added an LLM improvement system for user prompts!

devmio: How does Mann-E handle complex or nuanced prompts compared to other AI models?
Are there any plans to incorporate user feedback into the training process to improve Mann-E’s generation accuracy?

Muhammadreza Haghiri: As I said in the previous answer, we now have an LLM in the middle of user and model (you have to check its checkbox by the way) and it takes your prompt, processes it, gives it to the model and boom, you have results even better that Midjourney!

P.S: I mention Midjourney a lot, since most of Iranian investors expected us to be exactly like current version of midjourney when even SD 1.5 was a new thing, this is why Midjourney became our benchmark and biggest rival at the same time!

Explore Generative AI Innovation

Generative AI and LLMs

Learn more

Questions about vecentor:

devmio: Can you please tell our readers more about the model underneath vecentor?

Muhammadreza Haghiri: It’s more like a combination of models or a pipeline of models. It uses an image generation model (like Mann-E’s model), then a pattern recognition model (or a vision model if you mind) and then, a code generation model generates the resulting SVG code.

This is the best way of creating SVG’s using AI, specially complex SVG’s like what we have on our platform!

devmio: Why did you choose a mixture of Mistral and Stable Diffusion?

Muhammadreza Haghiri: The code generation is done by Mistral (a finetuned version), but image generation and pattern recognition aren’t exactly done by SD.
Although at the time of our initial talks, we were still using SD, but we just switched to Mann-E’s proprietary models and trained a vector style on top of that.
Then, we just moved to OpenAI’s vision models in order to get the information about the image and the patterns.
At the end, we use our LLM in order to create the SVG code.
It’s a fun and complex task of generation of SVG images!

devmio: How does Vecentor’s approach to SVG generation differ from traditional image generation methods (like pixel-based models)?

Muhammadreza Haghiri: As I mentioned, SVG generation is being treated as code generation because vector images are more like guidelines of how lines and dots are drawn and colored on the user’s screen. Also there are some information of scales and the scales aren’t literal (hence the name “scalable”).
So we can claim that we achieved code generation in our company and it opens the doors for us to make new products for developers and people who need to code.

devmio: What are the advantages and limitations of using SVGs for image creation compared to other formats?

Muhammadreza Haghiri: For a lot of applications such as desktop publications or web development, SVG’s are better choice.
They can be easily modified and their quality will be the same. This is why SVG’s matter. The limitations on the other hand are that you just can’t expect a photo realistic image be a good SVG, since they’re made very very geometric.

devmio: Can you elaborate on specific applications where Vecentor’s SVG generation would be particularly beneficial (e.g., web design, animation, data visualization)?

Muhammadreza Haghiri: Of course. Our initial target market was for frontend developers and UI/UX designers. But it can also be spread to the other industries and professions as well.

The Future of AI Art Generation

devmio: With the rise of AI art generators, how do you see the role of human artists evolving?

Muhammadreza Haghiri: Unlike what a lot of people think, humans are always ahead of machines. Although an intelligent machine is not without its own dangers, but we still can be far ahead of what a machine can do. Human artists will evolve and will become better of course, and we can take a page from their books and make better intelligent machines!

devmio: Do you foresee any ethical considerations specific to AI-generated art, such as copyright or plagiarism concerns?

Muhammadreza Haghiri: This is a valid concern and debate. Artists want to protect their rights and we also want more data. I guess the best way of getting rid of copyright disasters is that not being like OpenAI and if we use copyrighted material, we pay the owners of the art!
This is why both Mann-E and Vecentor are trained of AI generated and also royalty free material.

devmio: What potential applications do you see for AI art generation beyond creative endeavors?

Muhammadreza Haghiri: AI image, video and music generation is a tool for marketers in my opinion. You will have a world to create without any concerns about copyrights and what’s better than this? I personally think this is the future in those areas.
Also, I personally look at AI art as a form of entertainment. We used to listen to the music other people made, nowadays we can produce the music ourselves just by typing what we have in our mind!

Stay up to date

Learn more about MLCON

Personal Future and Projects

devmio: Are you currently planning new projects or would you like to continue working on your existing projects?

Muhammadreza Haghiri: Yes. I’m planning for some projects, specially in Hardware and Code Generation areas. I guess they’ll be surprises for next quarters

devmio: Are there any areas in the field of AI or ML that you would like to explore further in the near future?

Muhammadreza Haghiri: I like the hardware and OS integrations. Something like a self operating computer or similar stuff. Also, I like to see more AI usage in our day to day lives.

devmio: Thank you very much for taking the time to answer our questions.

The post Art and creativity with AI appeared first on ML Conference.

Building Ethical AI: A Guide for Developers on Avoiding Bias and Designing Responsible Systems

ML editorial team — Wed, 17 Apr 2024 06:19:44 +0000

devmio: Thank you for taking the time for the interview. Can you please introduce yourself to our readers?

Katleen Gabriels: My name is Katleen Gabriels, I am an Assistant Professor in Ethics and Philosophy of Technology at the University of Maastricht in the Netherlands, but I was born and raised in Belgium. I studied linguistics, literature, and philosophy. My research career started as an avatar in Second Life and the social virtual world. Back then I was a master student in moral philosophy and I was really intrigued by this social virtual world that promised that you could be whoever you want to be.

That became the research of my master thesis and evolved into a PhD project which was on the ontological and moral status of virtual worlds. Since then, all my research revolves around the relation between morality and new technologies. In my current research, I look at the mutual shaping of morality and technology.

Some years ago, I held a chair at the Faculty of Engineering Sciences in Brussels and I gave lectures to engineering and mathematics students and I’ve also worked at the Technical University of Eindhoven.

Stay up to date

Learn more about MLCON

devmio: Where exactly do philosophy and AI overlap?

Katleen Gabriels: That’s a very good but also very broad question. What is really important is that an engineer does not just make functional decisions, but also decisions that have a moral impact. Whenever you talk to engineers, they very often want to make the world a better place through their technology. The idea that things can be designed for the better already has moral implications.

Way too often, people believe in the stereotype that technology is neutral. We have many examples around us today, and I think machine learning is a very good one, that a technology’s impact is highly dependent on design choices. For example, the data set and the quality of the data: If you train your algorithms with just even numbers, it will not know what an uneven number is. But there are older examples that have nothing to do with AI or computer technology. For instance, a revolving door does not include people who need a walking cane or a wheelchair.

In my talks, I always share a video of an automatic soap dispenser that does not recognize black people’s hands to show why it is so important to take into consideration a broad variety of end users. Morality and technology are not separate domains. Each technological object is human-made and humans are moral beings and therefore make moral decisions.

Also, the philosophy of the mind is very much dealing with questions concerning intelligence, but with breakthroughs in generative AI like DALL-E, also, with what is creativity. Another important question that we’re constantly debating with new evolutions in technology is where the boundary between humans and machines is. Can we be replaced by a machine and to what extent?

Explore Generative AI Innovation

Generative AI and LLMs

Learn more

devmio: In your book “Conscientious AI: Machines Learning Morals”, you write a lot about design as a moral choice. How can engineers or developers make good moral choices in their design?

Katleen Gabriels: It’s not only about moral choices, but also about making choices that have ethical impact. My most practical hands-on answer would be that education for future engineers and developers should focus much more on these conceptual and philosophical aspects. Very often, engineers or developers are indeed thinking about values, but it’s difficult to operationalize them, especially in a business context where it’s often about “act now, apologize later”. Today we see a lot of attempts of collaboration between philosophers and developers, but that is very often just a theoretical idea.

First and foremost, it’s about awareness that design choices are not just neutral choices that developers make. We have had many designers with regrets about their designs years later. Chris Wetherell is a nice example: He designed the retweet button and initially thought that the effects of it would only be positive because it can increase how much the voices of minorities are heard. And that’s true in a way, but of course, it has also contributed to fake news to polarization.

Often, people tend to underestimate how complex ethics is. I exaggerate a little bit, but very often when teaching engineers, they have a very binary approach to things. There are always some students who want to make a decision tree out of ethical decisions. But often values clash with each other, so you need to find a trade-off. You need to incorporate the messiness of stakeholders’ voices, you need time for reflection, debate, and good arguments. That complexity of ethics cannot be transferred into a decision tree.

If we really want to think about better and more ethical technology, we have to reserve a lot of time for these discussions. I know that when working for a highly commercial company, there is not a lot of time reserved for this.

devmio: What is your take on biases in training data? Is it something that we can get rid of? Can we know all possible biases?

Katleen Gabriels: We should be aware of the dynamics of society, our norms, and our values. They’re not static. Ideas and opinions, for example, about in vitro fertilization have changed tremendously over time, as well as our relation with animal rights, women’s rights, awareness for minorities, sustainability, and so on. It’s really important to realize that whatever machine you’re training, you must always keep it updated with how society evolves, within certain limits, of course.

With biases, it’s important to be aware of your own blind spots and biases. That’s a very tricky one. ChatGPT, for example, is still being designed by white men and this also affects some of the design decisions. OpenAI has often been criticized for being naive and overly idealistic, which might be because the designers do not usually have to deal with the kind of problems they may produce. They do not have to deal with hate speech online because they have a very high societal status, a good job, a good degree, and so on.

devmio: In the case of ChatGPT, training the model is also problematic. In what way?

Katleen Gabriels: There’s a lot of issues with ChatGPT. Not just with the technology itself, but things revolving around it. You might already have read that a lot of the labeling and filtering of the data has been outsourced, for instance, to clickworkers in Africa. This is highly problematic. Sustainability is also a big issue because of the enormous amounts of power that the servers and GPUs require.

Another issue with ChatGPT has to do with copyright. There have already been very good articles about the arrogance of Big Tech because their technology is very much based on the creative works of other people. We should not just be very critical about the interaction with ChatGPT, but also about the broader context of how these models have been trained, who the company and the people behind it are, what their arguments and values are, and so on. This also makes the ethical analysis much more complex.

The paradox is that on the Internet, with all our interactions, we become very transparent for Big Tech companies, but they in turn remain very opaque about their decisions. I’ve also been amazed and but annoyed about how a lot of people dealt with the open letter demanding a six-month ban on AI development. People didn’t look critically at people like Elon Musk signing it and then announcing the start of a new AI company to compete with OpenAI.

This letter focuses on existential threats and yet completely ignores the political and economic situation of Big Tech today.

devmio: In your book, you wrote that language still represents an enormous challenge for AI. The book was published in 2020 – before ChatGPT’s advent. Do you still hold that belief today?

Katleen Gabriels: That is one of the parts that I will revise and extend significantly in the new edition. Even though the results are amazing in terms of language and spelling, ChatGPT still is not magic. One of the challenges of language is that it’s context specific and that’s still a problem for algorithms, which has not been solved with ChatGPT. It’s still a calculation, a prediction.

The breakthrough in NLP and LLMs indeed came sooner than I would have expected, but some of the major challenges are not being solved.

devmio: Language plays a big role in how we think and how we argue and reason. How far do you think we are from artificial general intelligence? In your book, you wrote that it might be entirely possible, that consciousness might be an emergent property of our physiology and therefore not achievable outside of the human body. Is AGI even achievable?

Katleen Gabriels: Consciousness is a very tricky one. For AGI, first of all, from a semantic point of view, we need to know what intelligence is. That in itself is a very philosophical and multidimensional question because intelligence is not just about being good in mathematics. The term is very broad. There is also emotional and different kinds of intelligence, for instance.

We could take a look at the term superintelligence, as the Swedish philosopher Nick Bostrom defines it: Superintelligence means that a computer is much better than a human being and each facet of intelligence, including emotional intelligence. We’re very far away from that. It also has to do with bodily intelligence. It’s one thing to make a good calculation, but it’s another thing to teach a robot to become a good waiter and balance glasses filled with champagne through a crowd.

AGI or strong AI means a form of consciousness or self-consciousness and includes the very difficult concept of free will and being accountable for your actions. I don’t see this happening.

The concept of AGI is often coupled with the fear of the singularity, which is basically a threshold: The final thing we as humans do, is develop a very smart computer and then we are done for as we cannot compete with these computers. Ray Kurzweil predicted that this is going to happen in 2045. But depending on the definition of superintelligence and the definition of singularity, I don’t believe that 2045 will be the time when this happens. Very few people actually believe that.

devmio: We regularly talk to our expert Christoph Henkelmann. He raised an interesting point about AGI. If we are able to build a self-conscious AI, we have a responsibility to that being and cannot just treat it as a simple machine.

Katleen Gabriels: I’m not the only person who made the joke, but maybe the true Turing Test is that if a machine gains self-consciousness and commits suicide, maybe that is a sign of true intelligence. If you look at the history of science fiction, people have been really intrigued by all these questions and in a way, it very much fits the quote that “to philosophize is to learn how to die.”

I can relate that quote to this, especially the singularity is all about overcoming death and becoming immortal. In a way, we could make sense of our lives if we create something that outlives us, maybe even permanently. It might make our lives worth living.

At the academic conferences that I attend, the consensus seems to be that the singularity is bullshit, the existential threat is not that big of a deal. There are big problems and very real threats in the future regarding AI, such as drones and warfare. But a number of impactful people only tell us about those existential threats.

devmio: We recently talked to Matthias Uhl who worked on a study about ChatGPT as a moral advisor. His study concluded that people do take moral advice from a LLM, even though it cannot give it. Is that something you are concerned with?

Katleen Gabriels: I am familiar with the study and if I remember correctly, they required a five minute attention span of their participants. So in a way, they have a big data set but very little has been studied. If you want to ask the question of to what extent would you accept moral advice from a machine, then you really need a much more in-depth inquiry.

In a way, this is also not new. The study even echoes some of the same ideas from the 1970s with ELIZA. ELIZA was something like an early chatbot and its founder, Joseph Weizenbaum, was shocked when he found out that people anthropomorphized it. He knew what it was capable of and in his book “Computer Power and Human Reason: From Judgment to Calculation” he recalls anecdotes where his secretary asked him to leave the room so she could interact with ELIZA in private. People were also contemplating to what extent ELIZA could replace human therapists. In a way, this says more about human stupidity than about artificial intelligence.

In order to have a much better understanding of how people would take or not take moral advice from a chatbot, you need a very intense study and not a very short questionnaire.

devmio: It also shows that people long for answers, right? That we want clear and concise answers to complex questions.

Katleen Gabriels: Of course, people long for a manual. If we were given a manual by birth, people would use it. It’s also about moral disengagement, it’s about delegating or distributing responsibility. But you don’t need this study to conclude that.

It’s not directly related, but it’s also a common problem on dating apps. People are being tricked into talking to chatbots. Usually, the longer you talk to a chatbot, the more obvious it might become, so there might be a lot of projection and wishful thinking. See also the media equation study. We simply tend to treat technology as human beings.

Stay up to date

Learn more about MLCON

devmio: We use technology to get closer to ourselves, to get a better understanding of ourselves. Would you agree?

Katleen Gabriels: I teach a course about AI and there’s always students saying, “This is not a course about AI, this is a course about us!” because it’s so much about what intelligence is, where the boundary between humans and machines is, and so on.

This would also be an interesting study for the future of people who believe in a fatal singularity in the future. What does it say about them and what they think of us humans?

devmio: Thank you for your answers!

The post Building Ethical AI: A Guide for Developers on Avoiding Bias and Designing Responsible Systems appeared first on ML Conference.