Interview Archives - ML Conference
https://mlconference.ai/blog/interview/

How Deep Learning helps protect honeybees
https://mlconference.ai/blog/how-deep-learning-helps-protect-honeybees/ (19 Nov 2019)

Honey bee colony assessment is usually carried out by manually counting and classifying comb cells. Thiago da Silva Alves explains in this interview how deep learning can help to accomplish this time-consuming and error-prone task.

Editorial Team: You have developed a tool called DeepBee. What is it all about?

Using Machine Learning it is possible to deliver quality information about honey bee colonies to beekeepers and researchers.

Thiago da Silva Alves: Many research projects in the apidology area require a process called temporal assessment of honey bee colony strength, which often involves counting the number of comb cells with brood and food reserves multiple times a year. There are thousands of cells in each comb, which makes manual counting a time-consuming, tedious, and thus error-prone task.

Knowing this problem, we decided to automate this process using image processing techniques to automatically detect cells, and deep learning for the cells’ content classification.

Editorial Team: Your presentation at the Machine Learning Conference is called “Honey Bee Conservation using Deep Learning”. How can Machine Learning help with Honey Bee Conservation?

Thiago da Silva Alves: Using Machine Learning it is possible to deliver quality information about the colonies to beekeepers and researchers. With this information, they can have insights on what they can do to improve colony health.

For example, the tool we developed is able to count the amount of brood and food reserves in a comb image. If the beekeeper frequently extracts this information from his colonies, he can detect anomalies such as a low bee birth rate or an unexpected reduction in honey production. With this information at hand, the beekeeper can make better-informed decisions about colony health.

How machine learning can help prevent bee mortality

Editorial Team: Could you give an insight into the technologies used in DeepBee?

Thiago da Silva Alves: We started using Nvidia DIGITS + Caffe in our first classification tests, but quickly faced some limitations. Then we decided to use Keras, with a TensorFlow backend, for the implementation of our models. We did most of the image preprocessing using OpenCV and NumPy.
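The interview does not show the models themselves, but a minimal sketch of a Keras classifier of the kind described might look like this. The architecture, the 64×64 input size, and the number of cell classes are illustrative assumptions, not details of DeepBee:

# Minimal sketch (not the actual DeepBee model): a small Keras CNN that
# classifies comb-cell image crops into content classes.
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # hypothetical count, e.g. egg, larva, pollen, honey, ...

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),           # RGB crop of a single cell
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])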

Editorial Team: Where did you see the biggest challenge in developing DeepBee?

Thiago da Silva Alves: The biggest challenge we encountered was collecting data and creating the datasets. It took us a few months before we had enough cells annotated to start developing the models.

Developing an algorithm to detect different cell types was also a big challenge for us. The aggravating factor, in this case, is the fact that it is not possible to easily see the edge of cells containing honey.

Editorial Team: How can machine learning help prevent bee mortality?

The biggest challenge we encountered was collecting data and creating the datasets.

Thiago da Silva Alves: It can help reduce bee mortality by giving beekeepers more information about the strength of their colonies. This information sharpens the beekeeper's decision-making and thus improves bee health.

Editorial Team: What are the next steps? What are your plans for DeepBee?

Thiago da Silva Alves: We plan to make the tool even more user-friendly. We also believe it is possible to implement some features of DeepBee into a smartphone application.

Editorial Team: Thank you very much!

Questions by Hartmut Schlosser

Generative Adversarial Networks: "GANs can create new 'realities' that never existed"
https://mlconference.ai/blog/generative-adversarial-networks-gans-can-create-new-realities-that-never-existed/ (29 Oct 2019)

Generative Adversarial Networks (GANs) have recently sparked an increasing amount of interest, as they can generate images of faces that look convincingly real. What else are they capable of, what risks could they pose in the long run, and what do they have in common with the emerging internet in the 1990's? We interviewed ML Conference speaker Xander Steenbrugge.

Editorial Team: In the abstract for your ML Conference session “Generative Media – an Emerging Industry”, you wrote that one of the most beautiful ideas in the Deep Learning Revolution of the past decade was the invention of Generative Adversarial Networks (GANs). So, would you first explain what GANs are essentially?

A new industry of generative media will emerge over the next decade.

Xander Steenbrugge: GANs, or Generative Adversarial Networks, are a totally new approach to generative models, invented by Ian Goodfellow in 2014. In contrast to, say, classification models which classify images into categories, generative models can generate completely novel images – or any kind of data for that matter – by first learning what that data usually looks like from a dataset. This entire process usually happens completely unsupervised, i.e. without needing any labels.

Editorial Team: Could you explain how a GAN works in general? 

Xander Steenbrugge: The central idea is that a GAN has two neural networks that are adversaries of each other. On the one hand, the Generator tries to create an image that looks as real as possible. The second model, the Discriminator, gets to see an image that could come from two sources: it’s either a real image from an actual dataset, or a fake image coming from the Generator. It then has to learn to see the difference and this learning signal is used to improve both the Generator and the Discriminator. If one can keep these two models balanced, the end result is a Generator that can generate images which look very similar to the actual dataset that was used during training.
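To make this interplay concrete, here is a minimal PyTorch sketch of the two-network setup described above. The architectures, dimensions, and hyperparameters are illustrative assumptions, not details from the interview:

import torch
import torch.nn as nn

# Generator: noise vector -> flattened 28x28 image
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())
# Discriminator: flattened image -> probability "real"
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):
    # real: batch of flattened real images, shape (batch, 784)
    batch = real.size(0)
    ones = torch.ones(batch, 1)    # label "real"
    zeros = torch.zeros(batch, 1)  # label "fake"

    # 1) Train the discriminator: real -> 1, generated -> 0
    fake = G(torch.randn(batch, 100))
    d_loss = loss_fn(D(real), ones) + loss_fn(D(fake.detach()), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the generator: try to make the discriminator output 1
    g_loss = loss_fn(D(fake), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()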

 

Generated images – and how we can spot them

Editorial Team: Which different types of media can GANs deal with, and are they better suited for a certain type?

Xander Steenbrugge: The most impressive results from using GANs have been demonstrated in the image domain. The reason is that convolutional networks are simply very, very good. However, the core idea behind GANs can, in principle, be applied to many different data types such as audio or text.

Editorial Team: Some telltale signs show that an image of a person’s face is artificially generated, e.g. artifacts or mismatched earrings. As GANs continue to improve, do you expect these signs that are visible to the human eye will vanish? And if so, will there be other methods to ascertain whether an image has been created by a GAN?

In the long run, everybody will learn to understand that any type of media can now be “faked”.

Xander Steenbrugge: Looking at the quality improvements in generated images over the past five years, I am very certain that very soon we’ll be able to generate images indistinguishable from real ones in specific narrow domains such as faces or cars. Approximating the entire natural manifold of possible images, however, might prove to be a much more challenging task, as GAN results on a full ImageNet dataset look a lot worse than when trained on just faces.

At the same time, I’m very confident that detecting generated images will be doable with very similar types of models. You could, for example, take the discriminator and use it as a “fake detection” filter. The bigger challenge, though, will be to educate the general public that seeing a video of something no longer means it actually happened.

 

Future implications of GANs

Editorial Team: In what respect could GANs have a negative impact in the long run?

Xander Steenbrugge: A new industry of generative media will emerge over the next decade. I can foresee applications in the movie industry (licensing an actor’s face), Virtual Reality (avatars that look like their users), design, art, etc.

I believe that, in the long run, everybody will learn to understand that any type of media can now be “faked”. When you get a letter today saying that America is going to nuke China tomorrow, signed by Barack Obama, you won’t believe that’s true because you know anybody could have written that letter. A couple of years from now, everybody will have that same intuition for an HD video of Obama saying the same thing. The problem is that many people currently don’t have that intuition yet, and therein lies the biggest risk.

Editorial Team: What do you believe the positive effects of GANs will be?

The true leap will come when we can query a generative model for a very specific output.

Xander Steenbrugge: This is very hard to predict because it’s such a broad concept. It’s like asking “what will the positive effects of the internet be?” in the late 1990’s. I believe that generative models have a very big future. In essence, these models can learn what the world is like by looking at data and then create new “realities” that never existed. Currently, most GANs only allow for creating random data samples. The true leap will come when we can query a generative model for a very specific output like “Generate an image of what my living room would look like if I bought this IKEA sofa and painted the east wall in this shade of orange.” These are called conditional samples and we are making fast progress towards this as well. In the end, I believe that generative models will become embedded in all our wearables, TV screens, smartphones, and more, and will give us a personalized lens by which to look at the world around us. Is that good or bad? I don’t think that’s the right question, as in my view technology itself is neutral. What you do with it is everyone’s personal choice.

Editorial Team: Thank you for the interview!

Questions by Maika Möbus

 

Neural networks with PyTorch
https://mlconference.ai/blog/neural-networks-with-pytorch/ (6 Aug 2019)

PyTorch is currently one of the most popular frameworks for the development and training of neural networks. It is characterized above all by its high flexibility and the ability to use standard Python debuggers. And you don't have to compromise on training performance.


Development, training and deployment of neural networks

Because of the features mentioned above, PyTorch is popular above all with deep learning researchers and Natural Language Processing (NLP) developers. The most recent version, the first official release 1.0, also introduced significant innovations in the areas of integration and deployment.

Tensors

The elementary data structure for representing and processing data in PyTorch is torch.Tensor. The mathematical term tensor stands for a generalization of vectors and matrices. Tensors are implemented in PyTorch as multidimensional arrays. A vector is nothing more than a one-dimensional tensor (a tensor of rank 1) whose elements can be numbers of a certain data type (such as torch.float64 or torch.int32). A matrix is thus a two-dimensional tensor (rank 2) and a scalar is a zero-dimensional tensor (rank 0). Tensors of higher dimensions do not have any special names (Fig. 1).

 

Figure 1: Tensors

 

The interface for PyTorch tensors strongly relies on the design of multidimensional arrays in NumPy. Like NumPy, PyTorch provides predefined methods which can be used to manipulate tensors and perform linear algebra operations. Some examples are shown in Listing 1.

 

Listing 1
# Generation of a one-dimensional tensor with
# 8 (uninitialized) elements (float32)
x = torch.Tensor(8)
x = x.double()  # Conversion to float64 (returns a new tensor)
x = x.int()     # Conversion to int32 data type

# 2D float tensor preinitialized with zeros
x = torch.zeros([2, 2])

# 2D tensor preinitialized with ones
# and subsequent conversion to int64
y = torch.ones([2, 3]).long()

# Merge two tensors along dimension 1
# (data types must match, hence the cast)
z = torch.cat([x, y.float()], 1)  # shape: [2, 5]

z.sum()   # Sum of all elements
z.mean()  # Average of all elements

# Matrix multiplication (inner dimensions must match)
z.mm(z.t())

# Transpose
z.t()

# Inner product of two 1D tensors
torch.dot(z[0], z[1])

# Calculates eigenvalues and eigenvectors
torch.eig(z.mm(z.t()), eigenvectors=True)

# Returns tensor with the sine of the elements
torch.sin(z)

 

The use of optimized libraries such as BLAS, LAPACK and MKL allows for high-performance execution of tensor operations on the CPU (especially with Intel processors). In addition, PyTorch (unlike NumPy) also supports the execution of operations on NVIDIA graphics cards using the CUDA toolkit and the cuDNN library. Listing 2 shows an example of how to move tensor objects to GPU memory in order to perform optimized tensor operations there.

 


Listing 2
# 1D Tensors
x = torch.ones(1)
y = torch.zeros(1)

# Move tensors to the GPU memory
x = x.cuda()
y = y.cuda()

# or:
device = torch.device("cuda")
x = x.to(device)
y = y.to(device)

# The addition operation is now performed on the GPU
x + y  # like torch.add(x, y)
# Copy back to the CPU
x = x.cpu()
y = y.cpu()

 

Since NumPy arrays are more or less considered to be standard data structures in the Python data science community, frequent conversion from PyTorch to NumPy and back is necessary in practice. These conversions can be done easily and efficiently (Listing 3) because the same memory area is shared and no copying of memory content is required.

 


Listing 3
# Conversion to NumPy
x = x.numpy()

# Conversion back as PyTorch tensor
y = torch.from_numpy(x)
# y now points to the same memory area as x
# a change of y changes x at the same time

 


Network Modules

The torch.nn library contains many tools and predefined modules for generating neural network architectures. In practice, you define your own networks by deriving from the abstract torch.nn.Module class. Listing 4 shows the implementation of a simple feed-forward network with one hidden layer and a tanh activation.

 


Listing 4
import torch
import torch.nn as nn

class Net(nn.Module):

  def __init__(self, input_dim, hidden_dim, output_dim):
    super(Net, self).__init__()
    # Here you create instances of all submodules of the network

    self.fc1 = nn.Linear(input_dim, hidden_dim) 
    self.act1 = nn.Tanh()
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):
    # Here you define the forward sequence
    # torch.autograd dynamically generates a graph on each run

    x = self.fc1(x)
    x = self.act1(x)
    x = self.fc2(x)
    return x

 

In the process, a network class is derived from the abstract nn.Module class. The __init__() and forward() methods must both be defined. In __init__(), we need to instantiate and initialize all the required elements that make up the network. In our case, we generate three elements:

 

1. fc1 – using nn.Linear(input_dim, hidden_dim) we generate a fully connected layer with an input dimension of input_dim and an output dimension of hidden_dim
2. act1 – a Tanh activation function
3. fc2 – another fully connected layer with an input dimension of hidden_dim and an output dimension of output_dim

 

The sequence in __init__() basically does not matter, but for stylistic reasons you should generate the elements in the order in which they are called in the forward() method. The sequence in the forward() method is decisive for processing – this is where you determine the sequence of the forward run. At this point, you can even build in all kinds of conditional queries and branches, since a calculation graph is dynamically generated on each run (Listing 5). This is useful if, for example, you want to work with varying batch sizes or experiment with complex branches. In particular, the processing of sequences of different lengths as input – as is often the case with many NLP problems – is much easier to realize with dynamic graphs than with static ones.

 


Listing 5
class Net(nn.Module): 
  ...
  def forward(self, x, a, b):

    x = self.fc1(x)

    # Conditional application of the activation function
    if a > b:
      x = self.act1(x)

    x = self.fc2(x)
    return x

 

“Autograd” and dynamic graphs

PyTorch uses the torch.autograd package to dynamically generate a directed acyclic graph (DAG) on each forward run. In contrast, in the case of static generation, the graph is completely constructed initially and is then no longer changed. The static graph is filled and executed at each iteration with the new data. Dynamic graphs have some advantages in terms of flexibility, as I had already explained in the previous section. The disadvantages concern optimization capabilities, distributed (parallel) training and deployment of the models.
Through the definition of the forward path, torch.autograd generates a graph; the nodes of the graph represent the tensors and the edges represent the elementary tensor operations. With the help of this information, the gradients of all tensors can be determined automatically at runtime and thus back propagation can be carried out efficiently. An example graph is shown in Figure 2.

 

Figure 2: Example of a DAG generated using “torch.autograd”
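A minimal example makes this mechanism tangible (the values here are purely illustrative):

import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # the forward run dynamically records the graph
y.backward()        # back propagation through the recorded graph
print(x.grad)       # tensor([4., 6.]) -- the gradient dy/dx = 2x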

 

Debugger

The biggest advantage of implementing dynamic rather than static graphs is the possibility of debugging. Within the forward() method, you can add print statements or set breakpoints, which can then be analyzed, for example with the standard pdb debugger. This is not readily possible with static graphs, because you do not have direct access to the objects of the network at runtime.
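As a sketch, a breakpoint placed directly in the forward() method of the network from Listing 4 could look like this:

import pdb
import torch.nn as nn

class Net(nn.Module):

  def __init__(self, input_dim, hidden_dim, output_dim):
    super(Net, self).__init__()
    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.act1 = nn.Tanh()
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):
    x = self.fc1(x)
    pdb.set_trace()  # drops into the debugger on every forward run;
                     # x is an ordinary tensor and can be inspected here
    x = self.act1(x)
    return self.fc2(x)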

 

Training

The torchvision package contains many useful tools, pre-trained models and datasets for image processing. In Listing 6, the FashionMNIST dataset is loaded. It consists of a training and a validation dataset containing 60,000 and 10,000 images from the fashion domain, respectively.

 


Listing 6
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

batch_size = 64  # assumption: value not given in the original listing

transform = transforms.Compose([transforms.ToTensor()])
# more examples of transformations:
# transforms.RandomResizedCrop()
# transforms.RandomHorizontalFlip()
# transforms.Normalize()

# Download and load the training dataset (60,000 images)
trainset = datasets.FashionMNIST('./FashionMNIST/', download=True, train=True, transform=transform)  # torch.utils.data.Dataset object
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=4)

# Download and load the validation dataset (10,000 images)
validset = datasets.FashionMNIST('./FashionMNIST/', download=True, train=False, transform=transform)  # torch.utils.data.Dataset object
validloader = DataLoader(validset, batch_size=batch_size, shuffle=True, num_workers=4)

 

Figure 3: Examples from the “FashionMNIST” dataset

 

The images are 28×28 pixel grayscale images divided into ten classes (0-9): 0. T-Shirt, 1. Trousers, 2. Sweater, 3. Dress, 4. Coat, 5. Sandals, 6. Shirt, 7. Sneakers, 8. Bag, 9. Ankle Boot (Fig. 3). The Dataset class represents a dataset that can be partitioned arbitrarily and to which various transformations can be applied. In this example, the NumPy arrays are converted to Torch tensors. In addition, quite a few other transformations are offered to augment and normalize the data (such as crops, rotations, reflections, etc.). DataLoader is an iterator class that generates individual batches of the dataset and loads them into memory, so you do not have to load large datasets completely. Optionally, you can choose whether multiple worker threads should be started (num_workers) or whether the dataset should be reshuffled before each epoch (shuffle).
In Listing 7, we first generate an instance of our model and transfer the entire graph to the GPU. PyTorch offers various loss functions and optimization algorithms. For a multi-class classification problem, CrossEntropyLoss() can, for example, be chosen as the loss function, and Stochastic Gradient Descent (SGD) as the optimization algorithm. The parameters of the network that should be optimized are passed to the SGD() method. An optional parameter is the learning rate (lr).

 


Listing 7
import torch.optim as optim
# Use the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define model
input_dim = 784
hidden_dim = 100
output_dim = 10
model = Net(input_dim, hidden_dim, output_dim)
model.to(device)  # move all elements of the graph to the current device

# Define optimizer algorithm and associate with model parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Define loss function: CrossEntropy for classification
loss_function = nn.CrossEntropyLoss()

 

The train() function in Listing 8 performs one training iteration. At the beginning, all gradients of the network graph are reset (zero_grad()). A forward run through the graph is performed afterwards. The loss value is determined by comparing the network output and the label tensor. The gradients are calculated by back propagation using backward() and finally, the weights of the network are updated using optimizer.step(). The valid() validation iteration is a variant of the training iteration in which all back-propagation steps are omitted.

 


Listing 8
# Training a batch
def train(model, images, label, train=True):
  if train:
    model.zero_grad() # Reset the gradients

  x_out = model(images)
  loss = loss_function(x_out, label)  # Determine loss value

  if train:
    loss.backward()  # Calculate all gradients
    optimizer.step() # Update the weights
  return loss

# Validation: Only forward run without back propagation
def valid(model, images, label):
  return train(model, images, label, train=False)

 

The full iteration over multiple epochs is shown in Listing 9. For the Net() feed-forward model, the image tensors with the dimensions (batch_size, 1, 28, 28) must be reshaped to (batch_size, 784). The call of train_loop() should thus be executed with the 'flatten' argument:

train_loop(model, trainloader, validloader, 10, 200, 'flatten')

 


Listing 9
import numpy as np

def train_loop(model, trainloader, validloader=None, num_epochs = 20, print_every = 200, input_mode='flatten', save_checkpoints=False):
  for epoch in range(num_epochs):

    # Training loop
    train_losses = []
    for i, (images, labels) in enumerate(trainloader):
      images = images.to(device)
      if input_mode == 'flatten':
        images = images.view(images.size(0), -1)  # flattening of the Image
      elif input_mode == 'sequence':
        images = images.view(images.size(0), 28, 28)  # Sequence of 28 elements with 28 features

      labels = labels.to(device)
      loss = train(model, images, labels)
      train_losses.append(loss.item())
      if (i+1) % print_every == 0:
        print('Training', epoch+1, i+1, loss.item())

    if validloader is None:
      continue

    # Validation loop
    val_losses = []
    for i, (images, labels) in enumerate(validloader):
      images = images.to(device)
      if input_mode == 'flatten':
        images = images.view(images.size(0), -1)  # flattening of the Image
      elif input_mode == 'sequence':
        images = images.view(images.size(0), 28, 28)  # Sequence of 28 elements with 28 features
      labels = labels.to(device)
      loss = valid(model, images, labels)
      val_losses.append(loss.item())
      if (i+1) % print_every == 0:
        print('Validation', epoch+1, i+1, loss.item())

    print('--- Epoch, Train-Loss, Valid-Loss:', epoch, np.mean(train_losses), np.mean(val_losses))
        
    if save_checkpoints:
      model_filename = 'checkpoint_ep'+str(epoch+1)+'.pth'
      torch.save({
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
      },  model_filename)

 

Save and load trained weights

To be able to use the models later for inference in an application, it is possible to save the trained weights in the form of serialized Python dictionary objects. The Python package pickle is used for this. If you want to continue training the model later, you should also save the last state of the optimizer. Listing 9 stores the model weights and the current state of the optimizer after each epoch. Listing 10 shows how one of these pickle files can be loaded.

 


Listing 10
model = Net(input_dim, hidden_dim, output_dim)

checkpoint = torch.load('checkpoint_ep2.pth')
model.load_state_dict(checkpoint['model_state_dict'])

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

 

More Network Modules

PyTorch offers many more predefined modules for building Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or even more complex architectures such as encoder-decoder systems. The Net() model could, for example, be extended with a dropout layer (Listing 11).

 


Listing 11
class Net(nn.Module):

  def __init__(self, input_dim, hidden_dim, output_dim):
    super(Net, self).__init__()

    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.dropout = nn.Dropout(0.5) # Dropout layer with probability 50 percent
    self.act1 = nn.Tanh()
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):        
    x = self.fc1(x)
    x = self.dropout(x) # Dropout after the first FC layer
    x = self.act1(x)
    x = self.fc2(x)

    return x

 

Listing 12 shows an example of a CNN consisting of two convolutional layers with batch normalization, each with ReLU activation and a max pooling layer. The training call could look like this:

model = CNN(10).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
train_loop(model, trainloader, validloader, 10, 200, None)

 


Listing 12
class CNN(nn.Module):

  def __init__(self, num_classes=10):
    super(CNN, self).__init__()

    self.layer1 = nn.Sequential(
      nn.Conv2d(1, 16, kernel_size=5, padding=2),
      nn.BatchNorm2d(16),
      nn.ReLU(),
      nn.MaxPool2d(2)
    )
    self.layer2 = nn.Sequential(
      nn.Conv2d(16, 32, kernel_size=5, padding=2),
      nn.BatchNorm2d(32),
      nn.ReLU(),
      nn.MaxPool2d(2))
    self.fc = nn.Linear(7*7*32, num_classes)

  def forward(self, x):
    out = self.layer1(x)
    out = self.layer2(out)
    out = out.view(out.size(0), -1)  # Flattening for FC input
    out = self.fc(out)
    return out




An example of an LSTM network optimized using the Adam optimizer is shown in Listing 13. The pixels of the images from the FashionMNIST dataset are interpreted as sequences of 28 elements, each with 28 features, and preprocessed accordingly.



Listing 13
# Recurrent Neural Network
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers, num_classes):
    super(RNN, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
    self.fc = nn.Linear(hidden_size, num_classes)

  def forward(self, x):
    # Initialize Hidden and Cell States
    h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)

    out, _ = self.lstm(x, (h0, c0))

    out = self.fc(out[:, -1, :]) # last hidden state
    return out

sequence_length = 28
input_size = 28
hidden_size = 128
num_layers = 1

model = RNN(input_size, hidden_size, num_layers, output_dim).to(device)

loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
train_loop(model, trainloader, validloader, 10, 200, 'sequence')




The torchvision package also allows you to load known architectures or even pre-trained models that you can use as a basis for your own applications or for transfer learning. For example, a pre-trained VGG model with 19 layers can be loaded as follows:




from torchvision import models

vgg = models.vgg19(pretrained=True)



Deployment





The integration of PyTorch models into applications has always been a challenge, as the opportunities to use the trained models in production systems had been relatively limited. One commonly used method is the development of a REST service, using flask (http://flask.pocoo.org), for example. This REST service can run locally or within a Docker image in the cloud. The three major providers of cloud services (AWS, GCE, Azure) now also offer predefined configurations with PyTorch.
An alternative is conversion to the ONNX format (https://onnx.ai). ONNX (Open Neural Network Exchange Format) is an open format for the exchange of neural network models, which is also supported by MxNet (https://mxnet.apache.org) and Caffe (https://caffe.berkeleyvision.org), for example. These are machine learning frameworks that are used productively by Amazon and Facebook. Listing 14 shows an example of how to export a trained model to the ONNX format.



Listing 14
model = CNN(output_dim)

# Any input tensor for tracing
dummy_input = torch.randn(1, 1, 28, 28)

# Conversion to ONNX is done by tracing a dummy input
torch.onnx.export(model, dummy_input, "onnx_model_name.onnx")
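As a complement, the REST-service route mentioned above could look roughly like this. This is a minimal sketch using flask that reuses the Net class and the checkpoint file from the earlier listings; the endpoint name and the JSON input format are assumptions:

# Hypothetical minimal REST service around the trained model
import torch
from flask import Flask, request, jsonify

app = Flask(__name__)

# Net is the feed-forward model class from Listing 4
model = Net(784, 100, 10)
checkpoint = torch.load('checkpoint_ep2.pth')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    # expects JSON of the form {"pixels": [784 floats]}
    pixels = request.get_json()['pixels']
    x = torch.tensor(pixels, dtype=torch.float32).view(1, -1)
    with torch.no_grad():
        scores = model(x)
    return jsonify({'class': int(scores.argmax(dim=1))})

if __name__ == '__main__':
    app.run()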

 

TorchScript and C++

As of version 1.0, PyTorch also offers the possibility to save models in an intermediate representation (IR) that can be executed completely independently of Python. The tool for this is TorchScript, which implements its own JIT compiler and special optimizations (static data types, optimized implementation of tensor operations).

 


Listing 15
# Any input tensor for tracing
dummy_input = torch.randn(1, 1, 28, 28)

traced_model = torch.jit.trace(model, dummy_input)
traced_model.save('jit_traced_model.pth')

 

You can create the TorchScript format in two ways: either by tracing an existing PyTorch model (Listing 15) or through direct implementation as a script module (Listing 16). In script mode, an optimized static graph is generated. This not only offers the advantages for deployment mentioned earlier, but could also be used for distributed training, for example.

 


Listing 16
from torch.jit import trace

class Net_script(torch.jit.ScriptModule):

  def __init__(self, input_dim, hidden_dim, output_dim):
    super(Net_script, self).__init__()

    self.fc1 = trace(nn.Linear(input_dim, hidden_dim), torch.randn(1, 784)) 
    self.fc2 = trace(nn.Linear(hidden_dim, output_dim), torch.randn(1, 100))

  @torch.jit.script_method
  def forward(self, x):        
    x = self.fc1(x)
    x = torch.tanh(x)
    x = self.fc2(x)
    
    return x

model = Net_script(input_dim, hidden_dim, output_dim)

model.save('jit_model.pth')

 

The TorchScript model can now be integrated into any C++ application using the C++ front-end library (LibTorch). This enables high-performance inference independent of Python and in many different production environments, such as on mobile devices.

 

Conclusion

With PyTorch, you can efficiently and elegantly develop and train both simple and very complex neural networks. Thanks to dynamic graphs, you can experiment with very flexible architectures and use standard debugging tools without any problem. The seamless connection to Python allows for speedy development of prototypes. These features currently make PyTorch the most popular framework for researchers and experimentation-minded developers. The latest version also provides the ability to integrate PyTorch models into C++ applications to achieve better integration in production systems. This is significant progress compared to earlier versions. However, other frameworks, especially TensorFlow, still have a clear lead in this category. With TF Extended (TFX), TF Serving, and TF Lite, the Google framework provides much more application-friendly and robust tools for creating production-ready models. It will be interesting to see what new developments in this area we will see from PyTorch.

 

Too many ideas, too little data – Overcome the cold start problem
https://mlconference.ai/blog/many-ideas-little-data-overcome-cold-start-problem/ (12 Nov 2018)

The cold start problem affects both startups and established companies. Nonetheless, it also provides a great opportunity to collect new data with your customer's problem in focus. How do you solve the cold start problem and arrive at a useful data pipeline? We talked to ML Conference speakers Markus Nutz and Thomas Pawlitzki about all this and more.

Data scientists and product owners have a lot of great ideas. But often these ideas are missing data to answer the given questions and build a solution around them. We talked to ML Conference speakers Markus Nutz and Thomas Pawlitzki about how to build a data pipeline starting from “zero data”.

Find out how to solve the cold start problem!

JAXenter: Databases need maintenance, we know that. But over the years impenetrable data thickets have grown in many companies. In your session you talk about unraveling the chaos, but where do you start?  

Markus Nutz: Fortunately, Freeyou hasn’t been around for that long, so we’ve been able to keep track of everything so far. The answer is probably pretty boring: documentation. Documentation includes all involved parties, which means that the requirements of the product owners, data scientists and data architects all have equal status.  We are aware that the data is our basis for differentiating ourselves from other insurers.

Thomas Pawlitzki: I have nothing more to add to this. Our own database is still controllable. The development team talks a lot about features and changes so that the individual team members are aware of database changes. You don’t have to explain anything to data gurus like Markus.

In the last few weeks, I have also looked at various frameworks that we can use in the development of our API. Some of them already offer features for data migration. For example, there you can store schema changes in relational databases as code and apply them, but also perform a rollback. Perhaps we will soon use such solutions to test the whole thing in its early stages.


JAXenter: How can we solve the “cold start problem”?

Markus Nutz: In general, keep your eyes open to see where and what kind of data is available. Statistics about traffic accidents, for example, are often available in small inquiries in the state parliament. This was quite surprising to me. Pictures for a first image classifier are available online. Customer inquiries arise all by themselves!

Thomas Pawlitzki: You should also consider when it makes sense to create your own model or which “ready-made” model to take. For example, we also use an API for image recognition. These APIs are very easy to integrate and do a really good job with general problems. We’d rather put our energy into providing solutions to problems which general APIs can’t solve. We still have very little data here. Fortunately Markus knows enough tricks to polish small data sets and still come up with usable models.

Markus Nutz: Data augmentation – e.g. changing images, inserting spelling mistakes into words, translating mails into English and back again, window slicing on time series data – these are all strategies that let us use the existing "few" data as efficiently as possible! When it comes to models, for images and text we of course rely on transfer learning; we are particularly interested in Tensorflow Hub, a library from Google for reusable machine learning modules.
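One of the strategies mentioned, window slicing for time series data, fits in a few lines. This is a rough sketch; the window length and stride are arbitrary illustration values:

# Window slicing: one long series becomes many shorter training examples
import numpy as np

def window_slices(series, window=50, stride=10):
    return np.array([series[i:i + window]
                     for i in range(0, len(series) - window + 1, stride)])

series = np.sin(np.linspace(0, 20, 500))  # toy time series
windows = window_slices(series)           # shape: (46, 50)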

In general, we also pay attention to using suitable models for our existing data, which don’t require the largest amounts of data to function well. Logistic regression or random forests are simply super!

JAXenter: In connection with the construction of a data pipeline you speak of “zero data”. Please give us a concrete example.

Markus Nutz: Oh, that was misleadingly described then. We chose "zero data" because data – now it's getting trite – exists everywhere around us and is also available to us. We can evaluate initial ideas with datasets from Kaggle or the relatively new Google Dataset Search, and with official statistics or OpenStreetMap data. This incredibly detailed data allows us, for example, to estimate the risk of vandalism for bicycles and cars at a given location, or to find a good route from bicycle dealer to bicycle dealer for our sales team. It's a free lunch, so to speak.

Thomas Pawlitzki: Yes, that was really surprising and enjoyable when we ran into the problem of theft data in a workshop on our bike insurance. We had briefly considered how we could get access to a good database and whether we should approach the various police stations. However, a 5-minute search on the net showed that (at least for the location we examined) a daily newspaper offers up-to-date data. We were surprised and of course very happy about that.

JAXenter: How do you maintain your data pipeline?

Markus Nutz: Phew! I’d like to have a good answer to that, but we don’t have a good recipe yet. I’d say: testing. What helps in any case is that we, as an organization, have a common understanding. Data is what enables us to offer a better product that can distinguish us from the market. That’s why we’re all very motivated to make this happen!

Thomas Pawlitzki:  Yes, sometimes we are a bit “casual” and there’s still room for improvement. Nevertheless, the whole thing works surprisingly well, probably due to the great commitment of all the team members.

Thank you very much!


Markus Nutz and Thomas Pawlitzki will be delivering a talk at ML Conference in Berlin on Wednesday, December 5 about their experience with the cold start problem and building a data pipeline. Starting from "zero data", how do they arrive at a data pipeline with open, found and collected data? Their data pipeline enables building data products that help customers in their daily life.


 

Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology at the Julius-Maximilians-University Würzburg.

 

 

 

 

 

"Designing proper data collection today improves the quality of ML outcomes tomorrow"
https://mlconference.ai/blog/designing-proper-data-collection-today-improves-quality-ml-outcomes-tomorrow/ (9 Nov 2018)

Machine learning may have all sorts of use cases, but forecasting? In honor of the upcoming ML Conference, we talked to Philipp Beer about how data scientists can utilize ML in statistical forecasting. We talk about the advantages and disadvantages of modern vs. classical methods, how one can decide between the two, and where to turn for good predictions for business KPIs.

JAXenter: Classical statistical forecasting is still used in many businesses. Please give us an example where it is still used and where its use still makes sense.

Philipp Beer:  Brad Efron once said, “Those who ignore Statistics are condemned to reinvent it.”

Machine learning algorithms, with proper design, can have great powers of generalization. Together with feature selection, feature engineering and encoding are key steps that can lead to algorithms usable for different datasets.

On the other hand, statistical models require a lot of effort for tuning parameters and finding optimal models. Therefore, they are found where models are well understood and driven by theory. Additionally, statistics also generates reasonable results where the available data volume is not large enough for machine learning. Consequently, statistical methods are also less demanding in terms of computational power. That same thriftiness cannot be attributed to machine learning algorithms.


JAXenter: You’re talking about “data hungry” machine learning algorithms. Forecasting in times of ML needs a lot of data. Where is this data coming from? Who collects it?

Philipp Beer: Data needs to come from necessities. Organizations often ask themselves how they can collect more information to harness the power of machine learning. From my point of view, this approach will not yield good results.

A more insightful approach is to ask good questions. Which kind of data do you need to tackle your most pressing tasks at hand? Knowing the answer to that usually helps to identify where the data should be coming from. It also helps a great deal to understand that the data does not necessarily need to be generated in house. Third-party data that can be licensed (e.g. commodity prices) or open data (e.g. weather, population change) may be the place to go to fulfill your business needs.

Having big data today is not a question anymore. Having the right data, however, is not always given. Designing proper data collection today improves the quality of outcomes tomorrow.

With this perspective an organization is in a good position to tackle all of the five W’s regarding the needed data.

JAXenter: How can one decide which method – classical statistics or machine learning – fits the own case?

Philipp Beer: Whatever yields the best results! An a priori guide can only give very general guidance.

To identify the right method, developers need to conduct a detailed exploration and analysis of the data. That will allow them to rule out certain methods and approaches and leave a smaller subset that may yield good results. This will be true for statistical and ML approaches.

In order to get a definitive answer, the remaining methods have to be compared in the results that they produce. In the case of a time-series, predictions and models need to compete side by side and their predictive power determined. If both of them give good results for prediction, there’s no need to choose; just use both of them.
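Such a side-by-side comparison can be sketched in a few lines. This is an illustrative harness only; the two competing forecasts here are simple baselines standing in for whatever statistical or ML models are actually being compared:

# Compare two forecasting approaches on a holdout period via MAE
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

rng = np.random.default_rng(0)
y = np.sin(np.arange(72) * 2 * np.pi / 12) + 0.1 * rng.standard_normal(72)
train, test = y[:-12], y[-12:]

last_value = np.repeat(train[-1], 12)  # naive baseline: repeat last value
seasonal = train[-12:]                 # seasonal-naive: repeat last season

print('last-value MAE:', mae(test, last_value))
print('seasonal   MAE:', mae(test, seasonal))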

JAXenter: Let’s make a different forecast. How long will it take for machine learning to take over the forecasting sector?

Philipp Beer: I don’t think that machine learning will supplant statistical methods, because both have complementary capabilities.

Machine learning will become an equally important component in time-series forecasting in the next 2 – 3 years. The adoption will be driven by convenience and integration. Machine learning needs to become accessible for all interested stakeholders – not only in time-series forecasting. As machine learning results become more seamlessly integrated into organizations, it will grow into a pillar of future development in all areas of our society.


Philipp Beer will be delivering a talk at ML Conference in Berlin on Wednesday, December 5 that goes over the advantages and disadvantages of modern vs. classical methods: how can one decide between the two, and where should one turn for good predictions for business KPIs?


Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology at the Julius-Maximilians-University Würzburg.

 

 

 

 

 

Man & Machines: The Dreamteam for your intelligent Marketing Strategy
https://mlconference.ai/blog/man-machines-dreamteam-intelligent-marketing-strategy/ (27 Sep 2018)

Machine learning enables customized conversations between man and machine that can result in buying decisions. We asked Tina Nord and Kathleen Jaedtke to explain how this can be achieved through the use of dialogue-oriented technologies. Let's take a look at how communication between man and machines works.

JAXenter: Customer contact via a machine sounds exciting. What exactly do you mean by “artificial intelligence” in connection with marketing strategies? Are you talking about chatbots?

Tina and Kathleen: It's about much more than chatbots. A simple chatbot is not necessarily based on Machine Learning (ML); a simple dialogue between man and machine can be programmed in a relatively uncomplicated way. ML only becomes relevant when a bot or intelligent assistant is supposed to process complex speech or text input from its human counterpart. And even that is only one of many use cases. Those who want to deal with Artificial Intelligence (AI) and marketing should first deal with the processing and generation of natural language (NLP & NLG) as well as with machine vision.

Subforms of machine learning as processing and generation of natural language have the potential to speed up or simplify work processes.

These subforms of machine learning influence, for example, the search behavior and the expectations of users, or enable new, intuitive and faster types of dialogue. They also have the potential to speed up or simplify work processes. This can be, for example, the automated creation and translation of texts or the provision of automatically pre-sorted images that correspond to a certain corporate identity. In this way, AI can contribute to achieving overarching strategic goals – such as cost leadership or differentiation.

JAXenter: What does a source of inspiration look like to you? Please use a short example to explain what this can look like.

Tina and Kathleen: A source of inspiration for us are the users. Their feedback is essential when it comes to the use of new innovative technologies. Intelligent assistants or robots are a great thing, but they only have real added value if they simplify users' lives. A simple example is text search: typing endless strings of words into search engines is usually time-consuming and leads to an endless series of search results, but not quickly and easily to the desired information. Only through the use of machine learning do search results become more personally relevant. Moreover, with voice search we no longer have to type the search term, and spelling becomes irrelevant when pronouncing the search word.

Visual search even reduces finding visual inspiration to a click of the camera shutter. All three examples are based on ML and speed up and simplify finding information. New technology makes it easier for us to search the Internet and replaces the tedious text search. Conclusion: machine learning only has a right to exist if there is true added value for the user.


JAXenter: What do dialogue-oriented technologies look like in practice and what’s under their hood?

Tina and Kathleen: The best-known example of a dialogue-oriented technology is probably Google Duplex, a function of the Google Assistant that arranges appointments with a human voice or books a table in a favorite restaurant. However, such advanced features are usually not available in practice. More likely is the use of so-called Google Actions or Alexa Skills. These often do not go beyond functions such as weather or news queries.

The development of such Conversational Interfaces is (still) complex and what is under the hood can be better explained by a software engineer. However, numerous companies are working to change this. In the future, everyone will be able to create new skills or actions and make them available to users with just a few clicks.

JAXenter: Where is the journey headed? Let’s see this from the customer’s perspective. Will we be communicating with a machine via voice or text input in the future?

Tina and Kathleen: For us and for many other experts it is clear that the near future is called “voice first”. Very soon we will be talking not only with our smartphone, but also with the fridge or the washing machine. Our environment will be our dialogue partner, regardless of whether we are in our own four walls or, for example, at the train station.

We will touch fewer things and navigate with gestures or speech instead. Language barriers will disappear through real-time translations. In addition, it can already be observed that machine vision is increasingly being combined with language functions. The relaunch of Google Glass or the voice-activated selfie filters of Snapchat Lens, for example, speak for themselves. So the future is voice and visual first.

Thank you!

 

Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology at the Julius-Maximilians-University Würzburg.

 

 

 

 

 

 

 

Find the outlier: Detecting sales fraud with machine learning
https://mlconference.ai/blog/find-outlier-detecting-sales-fraud-machine-learning/ (6 Jun 2018)

We spoke to data expert Canburak Tümer about how machine learning is being used to detect fraud in sales transactions. Find out how ML technology is helping to keep this tricky job under control and what it looks for when crunching the data.

JAXenter: Hello Canburak! Your session at the Machine Learning Conference is titled Anomaly detection in sales point transactions. What does this mean? Do you have an example?

Canburak Tümer: Let me first define what I mean by sales point. Sales points are the locations where Turkcell Superonline gathers new subscribers. They can be a shop belonging to Turkcell, a franchise, or sometimes a booth at an event. An anomaly in sales usually occurs in the number of new subscriptions; if a shop usually sells x subscriptions in a day and suddenly sells twice as many in one day, there is an anomaly, and it may point to fraud. We report this anomaly to revenue assurance teams to investigate.

The other type of anomaly is between different shops. We expect similar numbers for shops of the same type in the same town, but there can be outliers. These outliers should be investigated for potential fraud. So an anomaly in sales may indicate a fraudulent action.

JAXenter: What parameters do you look for when looking for an anomaly?

Canburak Tümer: Our main parameter is the number of new subscriptions over different intervals (daily, weekly, monthly, 6 months), supported by town and sales point type information. But in further research, we will also look at the cancellation numbers of these new subscriptions, complaint numbers, and average churn tenure.

JAXenter: How can outlying sales points be identified?

Canburak Tümer: For detecting the outlying shop in a town, we are now using the interquartile range method. This is a basic and trusted method to detect outliers in a set of records. We are also evaluating the hierarchical clustering method, by choosing a good cut-off point; hierarchical clustering can help us to detect non-normal points in the data.
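The interquartile range test mentioned here is easy to sketch. The numbers below are made up for illustration; they are not Turkcell data:

# IQR outlier test over the daily sales counts of shops in one town
import numpy as np

sales = np.array([12, 15, 11, 14, 13, 45, 12, 16])  # one shop sells 45

q1, q3 = np.percentile(sales, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # the usual 1.5*IQR fences

outliers = sales[(sales < low) | (sales > high)]
print(outliers)  # -> [45]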

JAXenter: Why is it more complex to find outlier sales points? What is necessary for this?

Canburak Tümer: For a single sales point, it is easier to detect the trend, then predict the sales for the next time interval and check whether the actual data matches the predicted value. But when it comes to comparing different sales points, new features come into play. First of all, the location and its population affect the sales.

Then there is the type of the sales point: an online or telesales point cannot be compared to a local shop. As the number of features increases, model complexity increases along with it. In order to keep things simple, we group the sales points by location and type, then use the simple methods to detect outliers.

JAXenter: Thank you!

 

 

Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology at the Julius-Maximilians-University Würzburg.

 

 

 

 

 

 

Preparing Text Input for Machine Learning
https://mlconference.ai/blog/peparing-text-input-machine-learning/ (15 May 2018)

ML Conference speaker Christoph Henkelmann says machine learning is basically nothing more than a numbers game. We've taken a closer look at what he means by that and asked him to explain the principles of text processing from a machine's point of view in more detail.

JAXenter: What is the difference between image and text from a machine’s point of view?

Christoph Henkelmann: Almost all ML methods, especially neural networks, want tensors (multidimensional arrays of numbers) as input. In the case of an image, the transformation is obvious: we already have a three-dimensional array of pixels (width x height x color channel), i.e. apart from minor preprocessing, the image is already "bite-sized". For text, there is no obvious representation. Text and words exist at a higher level of meaning; if, for example, you simply feed Unicode-encoded letters as numbers into the network, the jump from encoding to semantics is too great. We also expect systems that work with text to perform semantically more demanding tasks. If a machine recognizes a cat in an image, that's impressive. But it is not impressive if a machine detects the word "cat" in a sentence.

JAXenter: Why do problems arise concerning Unicode normalization?

Christoph Henkelmann: One would actually like to think that Unicode does not have to be normalized at all – after all, it is intended to finally solve all the encoding problems from the early days of word processing. But the devil is in the details. Unicode is enormously complex because language is enormously complex. There are six different types of spaces in Unicode. If you use the standard methods of some programming languages to split text from different sources, you suddenly wonder why words still stick together. Also, the representation of words is not unique; for example, there are two Unicode encodings of the word "Munich". If you then compare character by character, "Munich" suddenly no longer equals "Munich". If you forget something like this in preprocessing, you train the system on unclean data – and of course this does not give a good result.
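The "Munich" example can be reproduced with Python's standard library; the German form "München" makes the effect visible, since the "ü" can be stored precomposed or as "u" plus a combining diaeresis:

# Two Unicode encodings of the same word compare unequal until normalized
import unicodedata

a = 'M\u00fcnchen'   # 'München' with precomposed 'ü'
b = 'Mu\u0308nchen'  # 'München' with 'u' + combining diaeresis

print(a == b)  # False

na = unicodedata.normalize('NFC', a)
nb = unicodedata.normalize('NFC', b)
print(na == nb)  # True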

JAXenter: You speak of different ways of displaying text – are there several and what is it all about?

Christoph Henkelmann: Since we do not have such an "obvious" representation of text, there are many different ways to feed text into an ML system. One can choose different "granularities": starting with low-level methods, where a number really is assigned to each letter – basically the same as with a text file – through methods where individual words are encoded as the smallest unit, to methods where a tensor is generated from an entire document, which is actually more of a "fingerprint" of the document. Then there are a number of technical variants for each of these approaches. The complicated thing is: there is no single best approach; depending on the problem, you have to choose the right one.
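The word-level granularity, for example, can be sketched in a few lines: each distinct word gets an integer index, and a sentence becomes a sequence of indices (toy corpus, purely for illustration):

# Encoding words as integer indices, the smallest unit being one word
corpus = ['the cat sat', 'the cat ran']

vocab = {}
for sentence in corpus:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))

encoded = [[vocab[w] for w in s.split()] for s in corpus]
print(vocab)    # {'the': 0, 'cat': 1, 'sat': 2, 'ran': 3}
print(encoded)  # [[0, 1, 2], [0, 1, 3]]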

JAXenter: Is it also about coding semantics? Word2vec?

Christoph Henkelmann: Exactly – much more than with images or audio, the preprocessing of text affects the semantic level at which the process operates. Sometimes preprocessing itself is already a kind of machine learning, so that we can already answer questions simply because we have encoded the text differently. The best-known and currently much-discussed example is word2vec. Once you have created a word2vec encoding, you can answer semantic questions like "King – Man + Woman = ?". Here you can read the answer "Queen" directly from the word2vec encoding. Word analogies can also be solved, e.g. "Berlin is to Germany as Rome is to ?". Word2vec delivers the answer "Italy". The semantic meaning results only from the mathematical distance between the encodings. The system "does not know" what a country or a capital is, it only knows the (high-dimensional) distance between the words. This is an incredibly useful representation of words for ML systems and therefore also the final part of my presentation at the next ML Conference in Munich.
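With pre-trained word2vec vectors, these analogy queries are one-liners, for example via the gensim library. The vector file name here is a placeholder assumption; any word2vec-format model would do:

# Hypothetical usage of pre-trained word2vec vectors with gensim
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format('word2vec-vectors.bin', binary=True)

# "King - Man + Woman = ?"
print(kv.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))
# -> e.g. [('queen', 0.71)]

# "Berlin is to Germany as Rome is to ?"
print(kv.most_similar(positive=['rome', 'germany'], negative=['berlin'], topn=1))
# -> e.g. [('italy', 0.68)]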

Thank you very much!


Carina Schipper

Carina Schipper has been an editor at Java Magazin, Business Technology and JAXenter since 2017. She studied German Studies and European Ethnology / Folklore at the Julius-Maximilians-Universität Würzburg.


The post Preparing Text Input for Machine Learning appeared first on ML Conference.

]]>
An interdisciplinary approach to artificial intelligence testing https://mlconference.ai/blog/interdisciplinary-approach-artificial-intelligence-testing/ Tue, 08 May 2018 13:14:26 +0000 https://mlconference.ai/?p=9457 Humanity is confronted more than ever with artificial intelligence (AI), yet it is still challenging to find a common ground. We talked with Marisa Tschopp, researcher at scip ag, about the Artificial Intelligence Quotient (A-IQ), how to automate A-IQ testing, and more.

The post An interdisciplinary approach to artificial intelligence testing appeared first on ML Conference.

]]>
JAXenter: The term ‘intelligence’ is not easy to understand. What’s the best way to explain it and how can we apply it to machines? 

Marisa Tschopp: Human intelligence has been a very controversial topic and has undergone dramatic changes since the beginnings of intelligence research in the late 19th century. Intelligence gained importance especially in the educational context, as these "mental abilities" were the best predictors of success in school and were used to place students into the right classes. There are various, very elaborate theories that define human intelligence. Nowadays, human intelligence is viewed from a more systemic perspective and incorporates various dimensions, not only the ability to calculate or solve riddles.

It is not easy to define human intelligence, and the same applies to machine intelligence. We must be aware that we are still in the process of clarifying terms and definitions around AI. For our research, we created the intelligence test from an interdisciplinary perspective: we analyzed the various theories and created our own intelligence framework, based on what is currently appropriate in an AI context. Our framework is understood as a system of abilities:

  • to understand ideas (e.g. questions or commands) in a specific environment
  • to learn from experiences (e.g. referring to prior information or putting it in context)
  • to engage in reasoning to solve problems (e.g. to answer questions or solve tasks).

Areas of human intelligence include verbal skills, such as knowledge, understanding, and numerical reasoning, as well as spatial and visual abilities, such as solving a puzzle or arranging images in a logical manner. Other dimensions are inter- and intrapersonal competencies, and physiological or language skills. From the myriad of existing sub-skills, we have chosen several dimensions for testing:

  • Explicit Knowledge
  • Language Aptitude
  • Working Memory
  • Verbal- and Numerical Reasoning
  • Critical and Creative Thinking


JAXenter: And what is Bloom’s Taxonomy? Could you explain the reasoning behind it?

Marisa Tschopp: The intelligence domains aim to measure specific abilities, which all contribute individually with varying significance to the overall concept of interdisciplinary artificial intelligence. Furthermore, we have included Bloom’s Taxonomy to better understand the underlying hierarchies of thought.

Bloom describes thinking along a dimension from lower-order to higher-order skills. The domain Explicit Knowledge, for example, measures know-what as opposed to know-how: it is comparable to information or data found in books or documents, like lexical knowledge – this domain is rated as a lower-order thinking skill. At the other end are the higher-order thinking skills, represented as Creative and Critical Thinking in our model.

When we want to know if a machine is capable of higher-order thinking, we measure its ability to define and analyze a problem and to formulate adequate counter-questions in order to get to a better solution. As part of the Critical Thinking domain, we investigate how the machine handles over-simplification, ambiguous questions, and answer uncertainty. In the end, we try to merge the best scientific approaches to get the best results; a result is good when it is valid, meaning that it accurately measures the actual capabilities.

JAXenter: What are academic IQ tests and how do they work?

Marisa Tschopp: Academic IQ tests aim to quantify intelligence in an objective manner. Scientific standards play a critical role here, for example retest reliability, which measures the correlation between the results of the same test taken at different times.

In short: the IQ is a standardized, numerical measurement of intelligence, with the Stanford-Binet and Wechsler scales being those most in use. Nowadays, the intelligence quotient is a measure of deviation. This means that if you take a valid, standardized test, your result is compared to those of other test takers. The results follow a normal distribution, which means that the majority of people have an IQ around 100 and only about 5% of test takers score very high or very low – that is, at the extreme ends of the scale.
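
As a small worked example of the deviation idea – a sketch using only the Python standard library, with the Wechsler convention of mean 100 and standard deviation 15:

from statistics import NormalDist

def iq_from_z(z, mean=100.0, sd=15.0):
    """Convert a z-score (deviation from the population mean) to an IQ score."""
    return mean + sd * z

population = NormalDist(mu=100, sigma=15)

print(iq_from_z(0))   # 100.0 – an exactly average result
print(iq_from_z(2))   # 130.0 – two standard deviations above the mean
print(1 - population.cdf(130))  # ~0.023: only about 2% score 130 or higher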

JAXenter: Are there plans to automate A-IQ testing? Can you talk us through the concept?

Marisa Tschopp: In the future, we want to execute A-IQ tests with all kinds of digital assistants, independent of their ecosystem. We are working on a solution to automate the A-IQ testing procedure and make it available to the broad public. This system will take over the role of the personal analyst – the investigator who currently evaluates the test manually, which is quite time-consuming.

A-IQ test questions are administered acoustically from a computer (emulating the analyst) to the digital assistant taking the test. Answers are saved as audio data (e.g. mp3 files), which are transformed into transcripts via speech2text. This allows a continuous comparison with past test results. A distance-based method like Soundex or Levenshtein is then used to determine contextual differences. Deviations are reported to the research department to identify implications and track changes in AI capabilities.
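
For illustration, here is a minimal Levenshtein distance implementation in Python (the standard dynamic-programming formulation; the transcript strings are invented):

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

# Comparing a new transcript against a past answer:
print(levenshtein("the capital of italy is rome",
                  "the capitol of italy is rome"))  # 1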

Thank you!

 

The post An interdisciplinary approach to artificial intelligence testing appeared first on ML Conference.

]]>
Cracking open the black box of Neural Networks https://mlconference.ai/blog/cracking-open-black-box-neural-networks/ Fri, 20 Apr 2018 09:39:30 +0000 https://mlconference.ai/?p=9413 The countdown to the Machine Learning conference in Berlin keeps ticking. We spoke with ML conference speaker and ML6 head of Applied Research Xander Steenbrugge about the “black box problem” in neural networks. Catch more of AI expert Xander Steenbrugge during his keynote talk, session, and workshop.

The post Cracking open the black box of Neural Networks appeared first on ML Conference.

]]>

JAXenter: You talk about neural networks in your keynote. Can you give us a very concrete example of a neural network first?

Xander Steenbrugge: A neural network is a chain of trainable numerical transformations applied to some input data, yielding some output data. With this very general paradigm, we can build anything from image classifiers and speech-to-text engines to programs that beat the best humans at chess or Go.
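
A minimal sketch of such a chain in Keras – the layer sizes and toy data below are invented purely for illustration:

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(4,)),  # transformation 1
    keras.layers.Dense(2, activation="softmax"),                  # transformation 2
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy data: 100 samples with 4 features each, two classes.
x = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 2, size=100)
model.fit(x, y, epochs=3, verbose=0)  # training tunes the chain's parameters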

JAXenter: What’s behind the black box problem? 

Xander Steenbrugge: One of the major problems with current Deep Learning techniques is that trained models are very hard to interpret: they consist of millions of parameters that all interact in very complicated ways to achieve the task they were trained for. You can't just look at them and say, "Aha, so this is what it's doing." This makes it tricky to apply them in domains where safety and operational predictability are crucial. Across many application areas, we are left with a choice between a 90% accurate model we understand and a 99% accurate model we don't. But if that model is in charge of diagnosing you and suggesting a medical treatment, which would you choose?

JAXenter: There are various ways to fool neural networks to make obvious mistakes called ‘adversarial attacks‘. What is the significance of adversarial attacks for ML applications and how will this point develop?

Xander Steenbrugge: Adversarial attacks are significant because they pose a severe security risk for existing ML applications. The biggest problem is that most adversarial attacks are undetectable by humans, making them a nasty "under the radar" problem. Imagine a self-driving car that fails to recognize a stop sign because someone stuck an adversarial sticker on it – no need to explain why that is a very serious issue. Adversarial examples have exposed a weakness in the current generation of neural network models that is not present in our biological brains, and many research groups are now working to fix it, very likely paving the road for exciting new discoveries and potential breakthroughs in the field of AI.
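
One well-known attack is the fast gradient sign method (FGSM). A minimal TensorFlow sketch – it assumes an existing classifier and an image batch scaled to [0, 1]:

import tensorflow as tf

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Nudge each pixel slightly in the direction that increases the loss."""
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)
        loss = tf.keras.losses.sparse_categorical_crossentropy(label, prediction)
    gradient = tape.gradient(loss, image)
    adversarial = image + epsilon * tf.sign(gradient)  # imperceptible per-pixel change
    return tf.clip_by_value(adversarial, 0.0, 1.0)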

JAXenter: Why can’t black boxes be interpreted and what approaches is research taking in this context?

Xander Steenbrugge: Neural nets are uninterpretable because there are too many parameters, too many moving parts, for a human to follow. The research community is now actively working on new tools to bridge this gap. The first successful techniques try to generate pictures of what individual neurons in the network are looking at, giving an idea of the active components in the network. Recent works also try to create trainable interfaces that map a network's decision process onto a representation humans can interpret, using techniques like attention and even natural language.
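
Those "pictures of what individual neurons are looking at" are typically produced by gradient ascent on the input. A sketch of the idea in TensorFlow – the model, layer name, and image size are assumptions for the example:

import tensorflow as tf

def visualize_channel(model, layer_name, channel, steps=100, lr=1.0):
    """Synthesize an input that maximally activates one channel of a layer."""
    layer_output = model.get_layer(layer_name).output
    extractor = tf.keras.Model(inputs=model.inputs, outputs=layer_output)
    image = tf.Variable(tf.random.uniform((1, 224, 224, 3)))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = extractor(image)
            objective = tf.reduce_mean(activation[..., channel])
        grad = tape.gradient(objective, image)
        image.assign_add(lr * grad / (tf.norm(grad) + 1e-8))  # normalized ascent step
    return image.numpy()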

Thank you!


Carina Schipper

Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology / Folklore at the Julius-Maximilians-Universität Würzburg.


The post Cracking open the black box of Neural Networks appeared first on ML Conference.

]]>