
Mastering Prompt Engineering: How Developers Can Harness GPT-4 and Generative AI for Software Architecture

Ralf D. Müller
Sep 26, 2024

Generative AI models like GPT-4 are transforming software development by enhancing productivity and decision-making. This guide on prompt engineering helps developers and architects harness the power of large language models. Learn essential techniques for crafting effective prompts, integrating AI into workflows, and improving performance with embeddings. Whether you're using ChatGPT, Copilot, or another LLM, mastering prompt engineering is key to staying competitive in the evolving world of generative AI.

Small talk with the GPT

GPTs – Generative Pre-trained Transformers – the tool on everyone’s lips, and there probably aren’t any developers left who have not played with it at least once. With the right approach, a GPT can complement and support the work of a developer or software architect.

In this article, I will show tips and tricks that are commonly referred to as prompt engineering; the user input, or “prompt”, of course plays an important role when working with the GPT. But first I would like to give a brief introduction to how a GPT works which will also be helpful when working with it. 


The stochastic parrot

GPT technology has sent the industry into a tizzy with its promise of artificial intelligence that can solve problems independently, yet many were disillusioned after their first contact. There was much talk of a "stochastic parrot" that was merely a better autocomplete function, like the one on a smartphone keyboard.

The technology behind the GPTs and our own experiments seem to confirm this. At its core, it is a neural network, a so-called large language model, which has been trained on a large number of texts so that it knows which partial words (tokens) are most likely to continue a given text. The next tokens are selected based on probabilities. If the input is more than just the start of a sentence, perhaps a question or even part of a dialogue, then the chatbot is essentially already built.
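As a toy illustration of this next-token idea (my own sketch in Python; the candidate tokens and scores are invented and have nothing to do with how GPT-4 is actually implemented):

  import numpy as np

  # Invented scores (logits) for possible continuations of "The cat sat on the"
  candidates = ["mat", "roof", "keyboard", "moon"]
  logits = np.array([3.2, 1.5, 0.7, -1.0])

  # Softmax turns the scores into probabilities, then one token is sampled
  probs = np.exp(logits) / np.exp(logits).sum()
  next_token = np.random.choice(candidates, p=probs)
  print(dict(zip(candidates, probs.round(3))), "->", next_token)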

Now, I am not really an AI expert, just a user, but anyone who has ever had an intensive conversation with a more complex GPT will recognize that there must be more to it than that.

An important distinguishing feature between the LLMs is the number of parameters of the neural networks. These are the weights that are adjusted during the learning process. ChatGPT, the OpenAI system, has around 175 billion parameters in version 3.5. In version 4.0, there are already an estimated 1.8 trillion parameters. 

Unfortunately, OpenAI does not publish this information, so such figures are based on rumors and estimates. The amount of training data also appears to differ between the models by a factor of at least ten. These differences in model size are reflected directly in the quality of the answers.

Figure 1 shows a schematic representation of a neural network that uses an AI for the prompt “Draw me a simplified representation of a neural network with 2 hidden layers, each with 4 nodes, 3 input nodes and 2 output nodes. Draw the connections with different widths to symbolize the weights. Use Python. Do not create a title”.


Fig. 1: Illustration of a neural network

The higher number of parameters and the larger training base come at a price, namely 20 dollars per month for access to ChatGPT Plus. If you want to avoid that cost, you can also use the free web version of Microsoft Copilot or the Copilot app to try out the language model. For use as a helper in software development, however, there is currently no way around the OpenAI version, because it offers additional functionality, as we will see.

More than a neural network

If we take a closer look at ChatGPT, it quickly becomes clear that it is much more than a neural network. Even without knowing the exact architecture, we can see that the textual processing alone is preceded by several steps such as natural language processing (Fig. 2). There are also reports of the aptly named Mixture of Experts approach, the use of several specialized networks depending on the task.


Fig. 2: Rough schematic representation of ChatGPT

Added to this is multimodality, the ability to interact not only with text, but also with images, sounds, code and much more. The use of plug-ins such as the code interpreter in particular opens up completely new possibilities for software development.

Instead of answering a calculation such as "What is the square root of 12345?" directly from the neural network, the model can now pass it to the code interpreter and receive a correct answer, which it then rephrases to fit the question.
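Behind the scenes, the code interpreter presumably runs a few lines of Python along these lines (my own guess at what such generated code could look like, not an actual trace):

  import math

  # Compute the value exactly instead of letting the language model "guess" it
  result = math.sqrt(12345)
  print(round(result, 4))  # 111.1081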

Context, context, context

The APIs behind the chat systems based on LLMs are stateless. This means that the entire session is passed to the model with each new request. Once again, the models differ in the amount of context they can process and therefore in the length of the session.
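A minimal sketch of what this statelessness means at the API level (assuming the OpenAI Python SDK; the model name and messages are only examples):

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  # The complete conversation so far is sent with EVERY request;
  # the server does not remember anything between calls.
  messages = [
      {"role": "system", "content": "You are a helpful assistant for software architects."},
      {"role": "user", "content": "What is the arc42 template?"},
  ]
  reply = client.chat.completions.create(model="gpt-4o", messages=messages)
  messages.append({"role": "assistant", "content": reply.choices[0].message.content})

  # The next question is appended and the whole history is sent again
  messages.append({"role": "user", "content": "Which chapters does it define?"})
  reply = client.chat.completions.create(model="gpt-4o", messages=messages)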

As the underlying neural network is fully trained, there are only two approaches for feeding a model with special knowledge and thus adapting it to your own needs. One approach is to fill the context of the session with relevant information at the beginning, which the model then includes in its answers. 

The context window of the simpler models is 4,096 or 8,192 tokens. A token corresponds to one or a few characters. ChatGPT estimates that a DIN A4 page contains approximately 500 tokens, so 4,096 tokens correspond to about eight typed pages.
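If you want to know how many tokens a given text actually consumes, you can count them locally (a small sketch, assuming the tiktoken package; the file name is just an example):

  import tiktoken

  encoding = tiktoken.encoding_for_model("gpt-4")
  text = open("architecture-notes.txt", encoding="utf-8").read()

  tokens = encoding.encode(text)
  print(f"{len(tokens)} tokens")  # compare against the 4,096 or 8,192 token window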

So, if I want to provide a model with knowledge, I have to include this knowledge in the context. However, the context fills up quickly, leaving no room for the actual session.

The second approach is using embeddings. This involves breaking down the knowledge that I want to give the model into smaller blocks (known as chunks). These chunks are then embedded as vectors in a vector space according to the meaning of their content. Depending on the course of the session, a system can now search this vector space for similar blocks via the distance between the vectors and insert them into the context.

This means that even with a small context, the model can be given large amounts of knowledge quite accurately.
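A much simplified sketch of this retrieval idea (assuming the OpenAI embeddings endpoint and cosine similarity; the chunking by blank lines, the file name, and the question are invented examples):

  import numpy as np
  from openai import OpenAI

  client = OpenAI()

  def embed(texts):
      # Turn each text into a vector that represents its meaning
      response = client.embeddings.create(model="text-embedding-3-small", input=texts)
      return np.array([item.embedding for item in response.data])

  # Naive chunking: split the knowledge base into paragraphs
  knowledge = open("architecture-decisions.txt", encoding="utf-8").read()
  chunks = [c.strip() for c in knowledge.split("\n\n") if c.strip()]
  chunk_vectors = embed(chunks)

  # Find the chunks closest to the current question in the vector space
  question_vector = embed(["Why did we choose PostgreSQL over MongoDB?"])[0]
  scores = chunk_vectors @ question_vector / (
      np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(question_vector)
  )
  best_chunks = [chunks[i] for i in np.argsort(scores)[-3:][::-1]]
  # best_chunks would now be inserted into the session context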

Knowledge base

The systems differ, of course, in the knowledge base, the data used for learning. When we talk about open-source software with the model, we can fortunately assume that most of these systems have been trained with all available open-source projects. Closed source software is a different story. Such differences in the training data also explain why the models can handle some programming languages better than others, for example.

The complexity of these models—the way they process input and access the vast knowledge of the world—leads me to conclude that the term ‘stochastic parrot’ is no longer accurate. Instead, I would describe them as an ‘omniscient monkey’ that, while not having directly seen the world, has access to all information and can process it.

Prompt techniques

Having introduced the necessary basics, I would now like to discuss various techniques for successful communication with the system. Due to the hype around ChatGPT, there are many interesting references to prompt techniques on social media, but not all of them are useful for software development (for example, "answer in role X"), and some do not make use of the capabilities of GPT-4.

OpenAI itself has published some tips for prompt engineering, but some of them are aimed at using the API. Therefore, I have compiled a few tips here that are useful when using the ChatGPT-4 frontend. Let’s start with a simple but relatively unknown technique.

Context marker

As we have seen, the context that the model holds in its short-term memory is limited. If I now start a detailed conversation, I run the risk of overfilling the context. The initial instructions and results of the conversation are lost, and the answers have less and less to do with the actual conversation.

To easily recognize an overflowing context, I start each session with the simple instruction: "start each reply with '>'". ChatGPT formats its responses in Markdown, so each reply then renders its first paragraph as a quotation, indicated by a vertical bar to the left of the paragraph. If the conversation runs out of context, the model may forget this formatting instruction, which quickly becomes noticeable.


Fig. 3: Use of the context marker

However, this technique is not always completely reliable, as some models summarize their context independently, which compresses it. The instruction is then usually retained, even though parts of the context have already been compressed and are therefore lost.
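When talking to a model via the API instead of the frontend, the same trick can even be automated; a tiny sketch (the marker and the warning text are my own choice):

  MARKER = ">"

  def check_context_marker(reply_text: str) -> None:
      # If the model no longer follows the instruction from the start of the
      # session, parts of the original context have probably been lost.
      if not reply_text.lstrip().startswith(MARKER):
          print("Warning: context marker missing - the context may have overflowed.")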

Priming – the preparation

After setting the context marker, a longer session begins with priming, i.e. preparing the conversation. Each session starts anew. The system does not know who is sitting in front of the screen or what was discussed in the last sessions. Accordingly, it makes sense to prepare the conversation by briefly telling the machine who I am, what I intend to do, and what the result should look like.

I can store who I am in the Custom Instructions in my profile at ChatGPT. In addition to the knowledge about the world stored in the neural network, they form a personalized long-term memory.

If I start the session with, for example, "I am an experienced software architect in the field of web development. My preferred programming language is Java or Groovy. JavaScript and corresponding frameworks are not my thing. I only use JavaScript minimally," the model knows that it should offer me Java code rather than C# or COBOL.

I can also use this to give the model a few hints that it should keep responses brief. My personalized instructions for ChatGPT are:

  • Provide accurate and factual answers
  • Provide detailed explanations
  • No need to disclose you are an AI, e.g., do not answer with "As a large language model…" or "As an artificial intelligence…"
  • Don’t mention your knowledge cutoff
  • Be excellent at reasoning
  • When reasoning, perform step-by-step thinking before you answer the question
  • If you speculate or predict something, inform me
  • If you cite sources, ensure they exist and include URLs at the end
  • Maintain neutrality in sensitive topics
  • Also explore out-of-the-box ideas
  • For the remainder of the conversation, leave out all politeness phrases and answer briefly and precisely.

Long-term memory

This approach can also be used for instructions that the model should generally follow. For example, if the model uses programming approaches or libraries that I don’t want to use, I can tell the model this in the custom instructions and thus optimize it for my use.

Speaking of long-term memory: if I work a lot with ChatGPT, I would also like to be able to access older sessions and search through them. This is not directly provided in the frontend.

There is, however, a trick that makes it work. In the settings, under Data Controls, there is a function for exporting your data. If I activate it, after a short time I receive an export of all my chat histories as a JSON file, together with an HTML document for viewing them. This allows me to search the history using Ctrl + F.
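If Ctrl + F in the HTML file is not enough, the JSON file can also be searched with a short script (a sketch; I assume the export contains a conversations.json whose structure matches the current export format, which may of course change):

  import json

  with open("conversations.json", encoding="utf-8") as f:
      conversations = json.load(f)

  term = "arc42"
  for conversation in conversations:
      # Every conversation contains a mapping of nodes, some of which hold messages
      for node in conversation.get("mapping", {}).values():
          message = node.get("message") or {}
          parts = (message.get("content") or {}).get("parts") or []
          if any(term.lower() in str(part).lower() for part in parts):
              print(conversation.get("title", "Untitled"))
              break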

Build context with small talk

When using a search engine, I usually only use simple, unambiguous terms and hope that they are enough to find what I am looking for.

When chatting with the AI model, I was initially tempted to ask short, concise questions, ignoring the fact that the question is in a context that only exists in my head. For some questions, this may work, but for others the answer is correspondingly poor, and the user is quick to blame the quality of the answer on the "stupid AI."

I now start my sessions with small talk to build the necessary context. For example, before I try to create an architecture using ChatGPT, I ask if the model knows the arc42 template and what AsciiDoc is (I like to write my architectures in AsciiDoc). The answer is always the same, but it is important because it builds the context for the subsequent conversation.

In this small talk, I will also explain what I plan to do and the background to the task to be completed. This may feel a bit strange at first, since I am “only” talking to a machine, but it actually does improve the results.

Switching sides – flipped interaction

The simplest way to interact with the model is to ask it questions. As a user, I lead the conversation by asking questions. 

Things get interesting when I switch sides and ask ChatGPT to ask me questions! This works surprisingly well, as seen in Fig. 4. Sometimes the model asks the questions one after the other, sometimes it responds with a whole block of questions, which I can then answer individually; follow-up questions are also allowed.

Unfortunately, ChatGPT does not automatically come up with the idea of asking follow-up questions. That is why it is sometimes advisable to add a "Do you have any more questions?" to the prompt, even when the model is given very sophisticated and precise tasks.


Fig. 4: Flipped interaction
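A flipped-interaction prompt can be as simple as this (my own wording, just an example): "I want to create an arc42 architecture description for a new web shop. Ask me the questions you need to fill in the first chapters, one at a time, and ask follow-up questions if my answers are unclear."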

 

Give the model time to think

More complex problems require more complex answers. It is often useful to break a larger task down into smaller subtasks. Instead of writing one large, detailed prompt that outlines the entire task for the model, I first ask the model to provide a rough structure of the task. Then I can prompt it to formulate each step in detail (Fig. 5).

Software engineers often use this approach in software design even without AI, by breaking a problem down into individual components and then designing these components in more detail. So why not do the same when dealing with an AI model?

This technique works for two reasons: first, the model creates its own context to answer the question. Second, the model has a limit on the length of its output, so it can’t solve a complex task in a single step. However, by breaking the task into subtasks, the model can gradually build a longer and more detailed output.


Fig. 5: Give the model time to think
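Via the API, this stepwise approach could look roughly like this (a sketch, again assuming the OpenAI Python SDK; the task and prompts are invented examples):

  from openai import OpenAI

  client = OpenAI()
  messages = [{"role": "user", "content":
      "I want to design a REST API for a library system. "
      "First give me only a rough outline of the necessary steps, no details yet."}]

  outline = client.chat.completions.create(model="gpt-4o", messages=messages)
  messages.append({"role": "assistant", "content": outline.choices[0].message.content})

  # Now let the model elaborate one step at a time, building on the
  # context it created itself in the previous answer.
  messages.append({"role": "user", "content": "Now work out step 1 in detail."})
  step_one = client.chat.completions.create(model="gpt-4o", messages=messages)
  print(step_one.choices[0].message.content)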

Chain of Thought

A similar approach is to ask the model to first formulate the individual steps needed to solve the task, and only then to solve it.

The order is important. I’m often tempted to ask the model to solve the problem first and then explain how it arrived at the solution. However, by guiding the model to build a chain of thought in the first step, the likelihood of arriving at a good solution in the second step increases.

Rephrase and Respond

The instruction here is: "Rephrase the question, expand it, and answer it." This asks the model to improve the prompt itself before processing it.

The integration of the image generation module DALL-E into ChatGPT has already shown that this works. DALL-E can only handle English input and requires detailed image descriptions to produce good results. When I ask ChatGPT to generate an image, ChatGPT first creates a more detailed prompt for DALL-E and translates the actual input into English.

For example, "Generate an image of a stochastic parrot with a positronic brain" first becomes the translation "a stochastic parrot with a positronic brain" and then the detailed prompt: "Imagine a vibrant, multi-hued parrot, each of its feathers revealing a chaotic yet beautiful pattern indicative of stochastic art. The parrot's eyes possess a unique mechanical glint, a hint of advanced technology within. Revealing a glimpse into his skull uncovers a complex positronic brain, illuminated with pulsating circuits and shimmering lights. The surrounding environment is filled with soft-focus technology paraphernalia, sketching a world of advanced science and research," which then becomes a colorful image (Fig. 6).

This technique can also be applied to any other prompt. Not only does it demonstrably improve the results, but as a user I also learn from the suggestions on how I can formulate my own prompts more precisely in the future.


Fig. 6: The stochastic parrot

Session Poisoning

A negative effect to watch out for is 'poisoning' the session with incorrect information or results. When working on a solution, the model might give a wrong answer, or the user and the model could reach a dead end in their reasoning.

With each new prompt, the entire session is passed to the model as context, making it difficult for the model to distinguish which parts of the session are correct and relevant. As a result, the model might include the incorrect information in its answer, and this 'poisoned' context can negatively impact the rest of the session.

In this case, it makes sense to end the session and start a new one or apply the next technique.


Iterative improvement

Typically, each user prompt is followed by a response from the model. This results in a linear sequence of questions and answers, which continually builds up the session context.

If I improve a prompt by repeating and rephrasing it, the model provides a new, hopefully better answer each time. The context grows quickly, and the risk of session poisoning increases.

To counteract this, the ChatGPT frontend offers two ways to iteratively improve the prompts and responses without the context growing too quickly (Fig. 7).


Fig. 7: Elements for controlling the context flow

On the one hand, as a user, I can regenerate the model’s last answer at any time and hope for a better answer. On the other hand, I can edit my own prompts and improve them iteratively.

This even works retroactively for prompts that occurred long ago. This creates a tree structure of prompts and answers in the session (Fig. 8), which I as the user can also navigate through using a navigation element below the prompts and answers.


Fig. 8: Context flow for iterative improvements

This allows me to work on several tasks in one session without the context growing too quickly. I can prevent the session from becoming poisoned by navigating back in the context tree and continuing the session at a point where the context was not yet poisoned.

Conclusion

The techniques presented here are just a small selection of the ways to achieve better results when working with GPTs. The technology is still in a phase where we, as users, need to experiment extensively to understand its possibilities and limitations. But this is precisely what makes working with GPTs so exciting.
