Ollama brings a Docker-inspired layered architecture to MLOps, letting developers create, customize, and manage AI models efficiently. By addressing challenges such as context loss between runs and the need for reproducible behavior, it streamlines workflows and makes systems easier to adapt. This article shows how to create, modify, and persist AI models with Ollama's model files, how to troubleshoot model behavior, and how to tackle more advanced MLOps tasks.
Synthetic chatbot tests quickly reach their limits because the models usually retain little contextual information about past interactions. With its Docker-inspired layer mechanism, Ollama offers a way to alleviate this problem.
As is so often the case in the open-source world, an analogy provides guidance and motivation. One reason for the immense success of container systems is their layer model (shown schematically in Figure 1), which greatly simplifies the creation of customized containers. Much like inheritance in object-oriented programming, a new layer only describes what changes, so customized execution environments can be built without constantly regenerating virtual machines or maintaining many resource-hungry image versions.
Fig. 1: Docker layers stack the components of a virtual machine on top of each other
Ollama's model file feature brings a similar mechanism to the AI execution environment. In the interest of didactic fairness, I should note that the feature is not yet finalized: the syntax shown in detail here may have changed by the time you read this article.
Analysis of existing models
The Ollama developers use the model file feature internally even though its syntax is still evolving: the models I have worked with so far generally ship with model files.
In the interest of transparency, and to make working with the system easier, the ollama show --help command is available; it enables reflection on the various components of an installed model (Fig. 2).
Fig. 2: Ollama is capable of extensive self-reflection
In the following steps, we assume that the llama3 and phi models are already installed on your workstation. If you previously installed and then removed them, you can restore them with the commands ollama pull phi and ollama pull llama3:8b.
Of the comparatively extensive set of reflection options, four are especially important. The one used most frequently in the following steps is the model file. As in the Docker environment discussed above, this is a file that describes the structure and content of a model to be created in the Ollama environment. In practice, Ollama users are often interested in only certain parts of the information it contains. For example, you frequently need the parameters: these are generally numerical values that describe the behavior of the model (e.g. its creativity). The screenshot in Figure 3, taken from GitHub, shows only a selection of the possibilities.
Fig. 3: Some of the numerical parameters intervene deeply in the model's behavior
The --license option, meanwhile, reveals which license conditions must be observed when using the model. Last but not least, the template can also be relevant: it defines the prompt structure that is passed to the model at run time.
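Depending on the Ollama version installed, the individual components can also be queried directly instead of via --help; a minimal terminal sketch could look like this:

tamhan@tamhan-gf65:~/ollamaspace$ ollama show llama3:8b --modelfile
tamhan@tamhan-gf65:~/ollamaspace$ ollama show llama3:8b --parameters
tamhan@tamhan-gf65:~/ollamaspace$ ollama show llama3:8b --license
tamhan@tamhan-gf65:~/ollamaspace$ ollama show llama3:8b --template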
Figures 4 and 5 show printouts of the relevant parts of the model files of the llama3 and phi models just mentioned.
Fig. 4: The model file for the model provided by Meta comes with extensive license conditions…
Fig. 5: … while the phi development team takes it easy
In both models, note that the developers define STOP parameters. These are character sequences (typically the special tokens of the prompt template) that, when emitted by the model, cause text generation to stop at that point.
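In the llama3 model file, for example, the stop entries follow roughly this pattern (a shortened excerpt, reproduced from memory rather than verbatim):

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"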
Creating an initial model from a model file
After these introductory considerations, we want to create an initial model using our own model file. Similar to working with classic Docker files, a model file is basically just a text file. It can be created in a working directory according to the following scheme:
tamhan@tamhan-gf65:~/ollamaspace$ touch Modelfile
tamhan@tamhan-gf65:~/ollamaspace$ gedit Modelfile
Model files can also be created dynamically on the .NET application side. However, we will address this topic in a later step. Before that, place the markup from Listing 1 in the gedit window started by the second command and save the file.
Listing 1
FROM llama3
PARAMETER temperature 5
PARAMETER seed 5
SYSTEM """
You are Schlomette, the always angry female 45 year old secretary of a famous military electronics engineer living in a bunker. Your owner insists on short answers and is as cranky as you are. You should answer all questions as if you were Schlomette.
"""
Before we look at the file's individual elements, note that the capitalization chosen here is merely a convention. The instructions in a model file are not case-sensitive, and the order in which the individual parameters appear has no effect on the result. The sequence shown here has nevertheless proven to be best practice and can be found in many example models on the Ollama Hub.
The FROM instruction sits at the top of the file and specifies which model is used as the base layer. Since the majority of derived example models are based on llama3, we use the same base here. In the interest of didactic honesty, I should mention that other models such as phi work just as well; in theory, you can even use a model generated from another model file as the basis for the next derivation level.
The next step involves two PARAMETER instructions that populate the numerical settings store mentioned above. In addition to the temperature value, which controls how creative (and unpredictable) the model's output is, we set seed to a constant numerical value. This ensures that the pseudo-random number generator working in the background always delivers the same results, so the model's responses are largely identical when the input is identical.
Last but not least, there is the system prompt enclosed in """. This is a textual description that communicates the model's intended role and task as clearly as possible.
After saving the text file, the creation of a new model can be triggered according to the following scheme: tamhan@tamhan-gf65:~/ollamaspace$ ollama create generic_schlomette -f ./Modelfile.
The screen output is similar to downloading models from the repository, which should come as no surprise given the layered architecture mentioned in the introduction.
Once the success message appears, our assistant exists as a ready-to-use model. Anyone who passes the string generic_schlomette as the model name can already interact with the AI assistant created from the model file. In the following steps, however, we first want to verify the effect of pinning the seed. To do so, we enter ollama run generic_schlomette to start an interactive terminal session. The result is shown in Figure 6.
Fig. 6: Both runs answer exactly the same
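To reproduce the check yourself, ask the same question in two separate sessions; the prompt below is only a placeholder, and with the seed pinned both sessions return the same answer:

tamhan@tamhan-gf65:~/ollamaspace$ ollama run generic_schlomette
>>> Schlomette, what is for lunch today?
[identical answer]
>>> /bye
tamhan@tamhan-gf65:~/ollamaspace$ ollama run generic_schlomette
>>> Schlomette, what is for lunch today?
[identical answer]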
Especially when developing randomness-driven systems, pinning the seed is a tried and tested way to obtain reproducible behavior and simplify troubleshooting. In a production system, however, it is usually better to let the seed vary. For this reason, we open the model file again in the next step and remove the PARAMETER seed 5 line.
Recompilation is done simply by entering ollama create generic_schlomette -f ./Modelfile again; you do not have to remove the previously created model from the Ollama environment first, as the existing model is simply replaced. Two identically parameterized runs now produce different results (Fig. 7).
Fig. 7: With a random seed, the model behavior is less deterministic
In practical work with models, two groups of parameters are especially important. First, the value num_ctx determines the size of the context window and thus the memory depth of the model: the higher the value, the further back the model can look to harvest context information. However, a larger context window also increases the amount of data to be processed and the system resources required, and some models work best with specific context window lengths.
The second group revolves around the sampling controller known as Mirostat, which regulates how creative the output is. Last but not least, parameters such as repeat_last_n can be useful for controlling repetitiveness: it determines how far back the model looks when penalizing repeated tokens. A model that keeps repeating itself verbatim is not beneficial in many applications (a chatbot, for example).
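A hedged Modelfile excerpt shows how these parameters could be combined; the concrete values are only examples and should be tuned to the model and use case at hand:

FROM llama3
PARAMETER num_ctx 4096
PARAMETER mirostat 1
PARAMETER mirostat_tau 5.0
PARAMETER repeat_last_n 64
PARAMETER repeat_penalty 1.1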
Last but not least, I would like to share some practical experience gained in a political project. Texts generated by models are grammatically flawless, whereas texts written by humans contain a small but person-specific rate of typos. In systems with high cloaking requirements, it may therefore be advisable to place a typo generator downstream of the AI process.
Keeping the generator outside the model is important so that the context information held in the model remains clean. In many cases this improves overall system behavior, because the model state is recalculated without the injected typing errors.
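A minimal sketch of such a downstream typo injector in C# might look like this; the method name and swap probability are assumptions of this example, and only the text displayed to the user should pass through it, never the text written back into the model context:

// Hypothetical post-processor: occasionally swaps adjacent letters in the
// text shown to the user, while the clean original stays in the model context.
static string InjectTypos(string cleanText, double swapProbability = 0.02) {
  var rng = new Random();
  var chars = cleanText.ToCharArray();
  for (int i = 0; i < chars.Length - 1; i++) {
    // Only touch letter pairs, and only with a small probability
    if (char.IsLetter(chars[i]) && char.IsLetter(chars[i + 1]) && rng.NextDouble() < swapProbability) {
      (chars[i], chars[i + 1]) = (chars[i + 1], chars[i]);
      i++; // skip the pair we just swapped
    }
  }
  return new string(chars);
}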
Restoration or retention of the model history per model file
In practice, AI models often have to rest for hours on end. In the case of a “virtual partner”, for example, it would be extremely wasteful to keep the logic needed for a user alive when the user is asleep. The problem is that currently, a loss of context occurs in our system once the terminal emulator is closed, as can be seen in the example in Figure 8.
Fig. 8: Terminating or restarting Ollama is enough to cause amnesia in the model
To restore context, it is theoretically possible to create an external cache of all settings and feed it back into the model as part of the rehydration process. The problem is that the model's own answers are not part of those settings. Asked about a baked product, Schlomette might talk about baking a coffee cake on one occasion and a coconut cake on another.
A more elegant way is to modify the model file again. The MESSAGE instruction makes it possible to write both user requests and generated responses into the initialization context. For the cake interaction from Figure 8, the message block could look like this, for example:
MESSAGE user Schlomette. Time to bake a pie. A general assessor is visiting.
MESSAGE assistant Goddamn it. What pie?
MESSAGE user As always. Plum pie.
When using a MESSAGE block, the sequence naturally matters: each user message is paired with the assistant message that follows it. With that, the system is ready for action. Ollama outputs the entire message history when the interactive session starts (Fig. 9).
Fig. 9: The message block is processed when the screen output is activated
Programmatic persistence of a model
Our next task is to generate the model file entirely from code. The documentation of the NuGet package on GitHub is somewhat thin at the moment; if in doubt, I recommend first consulting the official documentation of the Ollama REST interface and then searching the library's model directory for a suitable abstraction class. In the following steps, we will rely on this link. As before, we need a project skeleton. Make sure the environment variables are set correctly; otherwise, the Ollama server running under Ubuntu will silently reject incoming queries from the Windows workstation.
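As a hedged sketch of the setup (the IP address is only an example): on the Ubuntu side, Ollama is made reachable from other machines via the OLLAMA_HOST environment variable, and the .NET side then connects through the OllamaSharp client:

// Server side (Ubuntu): let Ollama listen on all interfaces, e.g. by placing
// Environment="OLLAMA_HOST=0.0.0.0" in the systemd service definition.
// Client side (.NET / OllamaSharp); replace the address with your server's IP:
using OllamaSharp;

var ollama = new OllamaApiClient(new Uri("http://192.168.1.68:11434"));
ollama.SelectedModel = "generic_schlomette";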
In the next step, you must programmatically command the creation of a new model as in Listing 2.
Listing 2
private async void CmdGo_Click(object sender, RoutedEventArgs e) {
  // Describe the model that the Ollama server should create
  CreateModelRequest myModel = new CreateModelRequest();
  myModel.Model = "mynewschlomette";
  myModel.ModelFileContent = @"FROM generic_schlomette
MESSAGE user Schlomette. Do not forget to refuel the TU-144 aircraft!
MESSAGE assistant OK, will do! It is based in Domodevo airport, Sir.
";
  ollama.CreateModel(myModel);
}
The code shown here creates an instance of the CreateModelRequest class, which bundles the various pieces of information required to create a new model. The actual model file placed in ModelFileContent is a prime application for the verbatim multi-line strings that C# has supported for some time, as line breaks can be entered conveniently in Visual Studio without escape sequences.
Now it is time to run the program. To check whether the model was created successfully, it is sufficient to open a terminal window on the cluster and enter the command ollama list. At the time of writing, however, executing the code shown here does not cause the cluster to create a new model. For a more in-depth analysis of the situation, see the link here. Instead of unit tests, the development team behind the OllamaSharp library offers a console application that illustrates various model management and chat interactions built on the library. Specifically, the developers recommend a different overload of the CreateModel method, which is used as follows:
private async void CmdGo_Click(object sender, RoutedEventArgs e) {
  . . .
  // Stream status updates from the server while the model is being created
  await ollama.CreateModel(myModel.Model, myModel.ModelFileContent, status => { Debug.WriteLine($"{status.Status}"); });
}
Instead of a CreateModelRequest instance, this overload of the library method receives two strings: the first is the name of the model to be created, while the second transfers the model file content to the cluster.
Thanks to the status delegate, the system also keeps the Visual Studio output window informed about the progress of the processes running on the server (Fig. 10). Any resemblance to the status messages of the "manual" creation on the command line is, of course, purely coincidental.
Fig. 10: The application provides information about the download process
Entering ollama list now shows an updated model list (Fig. 11). Our model mynewschlomette was derived from the previously hand-crafted model generic_schlomette.
Fig. 11: The programmatic expansion of the knowledge base works smoothly
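The same check can also be performed from code; a hedged sketch using OllamaSharp's ListLocalModels method (output formatting is up to you):

var models = await ollama.ListLocalModels();
foreach (var model in models) {
  // Print name and size of every model known to the cluster
  Debug.WriteLine($"{model.Name} ({model.Size} bytes)");
}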
In this context, it is important to note that models can also be removed from the cluster again. This is done with code structured along the following lines; the SelectModel method implemented in the command-line sample just mentioned must be replaced by your own way of obtaining the model name:
private async Task DeleteModel() {
  // Ask which model to remove, then delete it from the cluster
  var deleteModel = await SelectModel("Which model do you want to delete?");
  if (!string.IsNullOrEmpty(deleteModel))
    await Ollama.DeleteModel(deleteModel);
}
Conclusion
By using model files, developers can equip their Ollama cluster with more or less any intelligence they like. Some of these functions even go beyond what OpenAI offers in its official library.