Microsoft's ML.NET framework has ten algorithms for classification with several possible values (multiclass). FastTree, LightGBM or Fast Forest: which algorithm is best suited to the available data? Which one copes with more data, and which one manages with less? Which one best handles my complex or simple data structures? To answer these questions, you can either study the theory behind each algorithm or, more obviously, simply try them out. This is exactly what AutoML does. As part of ML.NET, it takes a dataset, trains several models on it, measures which algorithm works best, and then returns the best model to the developer.
Another new opportunity is opening up for developers in user interface design. With Windows 10, Microsoft introduced the Universal Windows Platform (UWP). After decades of Windows Forms executables, UWP enables software to be delivered to the user within a fixed framework. Your own app or game can then be used on all devices that use Windows 10 as their base operating system: desktop PCs, smartphones, mixed-reality headsets (HoloLens), consoles (Xbox) and Windows IoT devices. Thanks to consolidated APIs and strict rules for the interaction between app and operating system, UWP apps can be conveniently distributed to a large number of devices through the Microsoft Store. To define the user interface, UWP apps use the Extensible Application Markup Language (XAML) already known from the Windows Presentation Foundation (WPF). Compared to Windows Forms programming, this makes it easier to create resolution-independent interfaces. Programming with XAML is an important building block for the platform independence of apps. UWP supports not only classic input via mouse and keyboard, but also more modern forms of input such as voice, camera images, and gestures. Figure 1 shows the Windows calculator in the new UWP design. Here, for example, the background is semi-transparent; in UWP apps this is a simple but effective color setting.
The possibilities at a glance
How do you combine a novel app lifecycle with a modern layout and exactly the right algorithm for a neural network? The answer to this question is the NuGet package Microsoft.ML. The platform-independent package targets .NET Standard and runs on .NET Framework as well as .NET Core, so it can be used on Linux as well as Windows. But before I go into the details of what AutoML can do, let’s first take a look at Microsoft’s AI ecosystem.
The oldest framework used by Microsoft today is the Microsoft Cognitive Toolkit (CNTK), which was released in 2014. It started out as a C++ library; today it also offers wrappers for C#, Java and Python. Developers usually achieve the fastest results with Cognitive Services. These cloud-based services extend your own applications with models that are pre-trained by Microsoft and always kept up to date, for example for recognizing spoken language, detecting objects in images and analyzing text.
With Azure Machine Learning, Microsoft offers a web-based drag-and-drop environment for creating simple to complex neural networks based on your own data. After successful model generation, the model is made available in the form of a web service which can be consumed by the application.
Visitors to various websites, such as those of health insurance companies and product vendors, may be familiar with a “How can I help?” chat box that draws on a wealth of FAQ knowledge to communicate with users in a natural-seeming way. Microsoft’s Bot Framework, among others, offers this kind of human-machine interaction. A speech recognition service runs in the background. Speech is translated into keywords that can be used to provide the user with a qualified answer. In addition to web chat, many other channels are available: common chat apps such as Telegram, your own application via an API, or even e-mail can serve as a communication channel.
A comparatively new service is Windows ML, which is available in newer Windows 10 versions and in Windows Server 2019. As an interface between application and model, Windows ML runs Open Neural Network Exchange (ONNX) models with hardware acceleration. Applications only consume the model; Windows ML takes care of the rest. The ONNX format was created by Facebook and Microsoft with the aim of making neural networks usable across platforms. As a result, models created with the popular Python-based framework TensorFlow can be used in .NET environments, and vice versa.
Last but not least, there is ML.NET. It is a good choice for companies that want to bring their own data into production via a company-specific model that, for data protection or other reasons, should not be generated in the cloud.
This brings us full circle to AutoML. Computing power for model generation keeps increasing on the leading platforms of Google, Amazon and Microsoft, but not every model needs it. Microsoft has done a good job of optimizing ML.NET compared to open source products and offers .NET developers a framework that can also deliver good results on consumer hardware. The NYC Taxi Fare dataset, for example, contains more than one million records of taxi rides in New York City in a file of only 24 MB. If AutoML is trained for only 60 seconds on an Intel i5 from 2012, the result is a regression model that predicts fares with an impressive 97 percent accuracy. Figure 2 shows the command line execution.
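In code, a 60-second regression experiment of this kind might look roughly as follows. This is a minimal sketch that assumes the Microsoft.ML and Microsoft.ML.AutoML NuGet packages. The TaxiTrip class, its column order (six inputs plus the fare as label) and the file names are assumptions for illustration and must be adapted to the actual CSV file.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Assumed CSV layout: six input columns followed by the fare (the label).
public class TaxiTrip
{
    [LoadColumn(0)] public string VendorId;
    [LoadColumn(1)] public string RateCode;
    [LoadColumn(2)] public float PassengerCount;
    [LoadColumn(3)] public float TripTimeInSecs;
    [LoadColumn(4)] public float TripDistance;
    [LoadColumn(5)] public string PaymentType;
    [LoadColumn(6)] public float FareAmount;
}

public static class TaxiFareExperiment
{
    public static void Run(string csvPath, string modelPath)
    {
        var mlContext = new MLContext(seed: 0);

        // Load the comma-separated training file, skipping the header row.
        IDataView trainData = mlContext.Data.LoadFromTextFile<TaxiTrip>(
            csvPath, separatorChar: ',', hasHeader: true);

        // Let AutoML try different trainers and settings for at most 60 seconds.
        ExperimentResult<RegressionMetrics> result = mlContext.Auto()
            .CreateRegressionExperiment(maxExperimentTimeInSeconds: 60)
            .Execute(trainData, labelColumnName: nameof(TaxiTrip.FareAmount));

        RunDetail<RegressionMetrics> best = result.BestRun;
        Console.WriteLine($"Best trainer: {best.TrainerName}");
        Console.WriteLine($"R-squared:    {best.ValidationMetrics.RSquared:F3}");

        // Persist the winning model so that an application can load it later.
        mlContext.Model.Save(best.Model, trainData.Schema, modelPath);
    }
}
```

A call such as TaxiFareExperiment.Run("taxi-fare-train.csv", "TaxiFareModel.zip") would then train and save the model; both file names are placeholders. The sketch prints R-squared as one possible quality measure, and the results will vary with hardware and data.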
Microsoft also meets developers halfway in making AutoML accessible, offering a project generator integrated into Visual Studio, a command-line tool, and optional code-only generation.
In addition to predicting numerical values, as in the preceding example of the taxis in New York City, Microsoft offers AutoML code generators for the following usage scenarios (as of December 2019):
- Classification of data into two groups: for example, “yes” and “no” for purchase decisions or “good” and “bad” for sentiment analysis of tweets or ratings (binary classification).
- Classification of data into three or more groups: an example is determining flower types when data such as measurements and colours are known (multiclass classification).
- Recognition of patterns in images: using ML.NET, images, for example from a camera, can be assigned to categories. Examples would be detecting cars on a webcam to identify free parking spaces, or visually detecting production errors directly on the conveyor belts.
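For the classification scenarios, the same AutoML API can also report how every algorithm it tried performed. The following is a minimal sketch for the flower example, again assuming the Microsoft.ML and Microsoft.ML.AutoML packages. The FlowerMeasurement class and its columns are hypothetical, and only measurements are used as inputs here.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical CSV layout: four measurements followed by the flower type.
public class FlowerMeasurement
{
    [LoadColumn(0)] public float SepalLength;
    [LoadColumn(1)] public float SepalWidth;
    [LoadColumn(2)] public float PetalLength;
    [LoadColumn(3)] public float PetalWidth;
    [LoadColumn(4)] public string Species;
}

public static class FlowerExperiment
{
    public static void Run(string csvPath, uint seconds = 60)
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromTextFile<FlowerMeasurement>(
            csvPath, separatorChar: ',', hasHeader: true);

        // Multiclass experiment: the label can take three or more values.
        ExperimentResult<MulticlassClassificationMetrics> result = mlContext.Auto()
            .CreateMulticlassClassificationExperiment(seconds)
            .Execute(data, labelColumnName: nameof(FlowerMeasurement.Species));

        // List every run so the tried algorithms can be compared.
        foreach (RunDetail<MulticlassClassificationMetrics> run in result.RunDetails)
        {
            // ValidationMetrics is null if a run did not complete successfully.
            string score = run.ValidationMetrics == null
                ? "failed"
                : run.ValidationMetrics.MicroAccuracy.ToString("F3");
            Console.WriteLine($"{run.TrainerName,-40} {score}");
        }

        Console.WriteLine($"Best: {result.BestRun.TrainerName}");
    }
}
```

For the two-group case, CreateBinaryClassificationExperiment takes the place of the multiclass call and returns BinaryClassificationMetrics instead.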
Online vs. on-premise
UWP apps mainly run on on-premise devices. But does that make it a good idea to run machine learning on your own hardware? Even if companies’ confidence in the cloud is growing, there may be reasons not to generate models in the cloud. There are also cases in which cloud infrastructure is the only option for machine learning, for example with very large models. One of the factors is network connectivity. Germany’s Federal Ministry of Transport and Digital Infrastructure maintains the Broadband Atlas. In it, the map of commercial broadband availability for fibre at ≥ 1000 Mbit/s paints a sad picture: to date, only a few areas of the country have fast connections. If companies use the same networks as private individuals, they have an average upload speed of 21 Mbit/s. Each upload of 5 GB of data therefore takes almost 32 minutes, every single time.
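The figure of almost 32 minutes can be checked with a few lines of arithmetic. The following sketch assumes decimal units (1 GB = 8,000 Mbit) and ignores protocol overhead; the class and method names are made up for illustration.

```csharp
using System;

public static class UploadEstimate
{
    // Estimates the pure transfer time for a given payload and upload rate.
    // Assumes decimal units: 1 GB = 1,000 MB = 8,000 Mbit.
    public static TimeSpan For(double gigabytes, double megabitsPerSecond)
    {
        double megabits = gigabytes * 8000;
        return TimeSpan.FromSeconds(megabits / megabitsPerSecond);
    }
}
```

UploadEstimate.For(5, 21).TotalMinutes evaluates to roughly 31.7 minutes, since 40,000 Mbit divided by 21 Mbit/s is about 1,905 seconds.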
Since this is an average, many companies have to make do with even slower connections and correspondingly longer upload times, down to regions that do not even have DSL. Continued use of cloud services also puts an additional load on a potentially weak Internet connection. This makes it logical to consider generating models on the company’s own systems. Microsoft offers a way out for such cases, at least for Cognitive Services. Depending on the use case, Microsoft provides a Docker image. This allows services that otherwise run in the cloud to be used in your own server room without straining the bandwidth.
Other advantages of cloud services such as Azure Machine Learning Studio over locally developed models are simplicity, performance and timeliness. Similar to SharePoint Online, Microsoft updates its ML Studio on the fly. New tools, bug fixes, optimizations and algorithms are available faster than an on-premise developer could adopt or redevelop them. The picture reverses completely when it comes to deep learning. The AutoML possibilities described so far generate comparatively small neural networks. In the New York City taxi example above, each data row contains six data points. Figure 3 shows the model generated by ML.NET. If images are analyzed with deep learning at an image resolution of 420 x 420 pixels, this means 176,400 data points per data row. Cloud services can provide the extra performance required without any investment in additional hardware.
Handling training data in UWP apps
If model generation or model usage is to take place in a UWP app, the developer faces a hurdle that costs some convenience but also enforces clean programming. UWP applications cannot directly access files outside their own working folder. This is reassuring for users: the application cannot read, let alone send, a user’s financial planning.xlsx file without asking, as was possible with classic executables. To access training data nevertheless, the user must authorize the access through an interaction. This can be done, for example, through the familiar open file dialog:
var picker = new Windows.Storage.Pickers.FileOpenPicker();
picker.ViewMode = Windows.Storage.Pickers.PickerViewMode.Thumbnail;
picker.SuggestedStartLocation = Windows.Storage.Pickers.PickerLocationId.DocumentsLibrary;
picker.FileTypeFilter.Add(".csv");
// Returns null if the user cancels the dialog.
StorageFile pickedFile = await picker.PickSingleFileAsync();
The file can then be copied to the working folder:
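// Open the training data folder inside the app's local folder, creating it on first use.
var currentWorkSpaceFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync("TrainingsDaten", CreationCollisionOption.OpenIfExists);
StorageFile copiedFile = await pickedFile.CopyAsync(currentWorkSpaceFolder, "MeineTrainingsdaten.csv", NameCollisionOption.ReplaceExisting);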
Once the file is in the app’s own working folder, it can be used as usual. The standard TextLoader of Microsoft.ML can then be used with the StorageFile:
[…] mlContext.Data.LoadFromTextFile(copiedFile.Path, textloaderOptions)
copiedFile.Path contains the full path on the hard disk as a string.
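What the loader options passed to LoadFromTextFile might contain depends on the CSV file. The following is a minimal sketch, assuming the Microsoft.ML package and a comma-separated file with a header row. The column names, types and positions are placeholders, and the TrainingDataLoader class is hypothetical.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

public static class TrainingDataLoader
{
    // Placeholder schema: replace names, types and indices with your own CSV layout.
    public static TextLoader.Options CreateOptions() => new TextLoader.Options
    {
        HasHeader = true,
        Separators = new[] { ',' },
        Columns = new[]
        {
            new TextLoader.Column("Feature1", DataKind.Single, 0),
            new TextLoader.Column("Feature2", DataKind.Single, 1),
            new TextLoader.Column("Label", DataKind.Single, 2),
        }
    };

    // Loads the file lazily; rows are only read when the data is consumed.
    public static IDataView Load(MLContext mlContext, string path) =>
        mlContext.Data.LoadFromTextFile(path, CreateOptions());
}
```

In the UWP app, a call such as TrainingDataLoader.Load(mlContext, copiedFile.Path) would then hand the copied file to ML.NET.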
In 2020, machine learning will combine a wide range of possibilities with high performance, an ideal environment for realizing projects large and small. Once it is clear how your own trove of data should be put to profitable use, the next step is choosing the right tool. Here, developers and companies have access to the entire range, from ready-made models to in-house developments. UWP apps for creating and using models are an option as well.