
ML Conference 2019
The Conference for Machine Learning Innovation
June 17 - 19 in Munich
1
Apr

Tutorial: An introduction to the Python programming language

When Guido van Rossum developed Python, he wanted to create a "simple" programming language that bypassed the vulnerabilities of other systems. Due to the simple syntax and sophisticated syntactic phrases, the language has become the standard for various scientific applications such as machine learning.

Python – The Algorithmic Expert’s Tool

Anyone who grew up with C, Java or Perl might view Python as a programming language for less gifted developers. That is unfair: Python comes with a huge range of libraries and also offers some very interesting syntactic gimmicks.

This Python tutorial assumes that the reader knows another programming language, be it C++ or Java, and wants to learn more about the specifics of Python. The host used here is an AMD 8-core workstation running Ubuntu 14.04. This does not mean that Python does not exist for macOS or Windows: there is hardly an operating system out there that has to make do without Python.

Python versions: Taming the chaos!

The resignation of Guido van Rossum a few weeks ago confirmed a serious vulnerability of the Python world: we never really had a clear roadmap. Python is the reference implementation for a system of intelligent chaos. Its great popularity has led to a problematic situation: there are many different versions of Python that are by and large incompatible with each other. In addition, many Linux distributions use Python to implement system services. If you accidentally install an incompatible interpreter, you can end up with a workstation that no longer boots.

On Ubuntu 14.04, the situation is hairy to the point that the python and python3 commands call different versions of the interpreter:

tamhan@TAMHAN14:~$ python -V
Python 2.7.6
tamhan@TAMHAN14:~$ python3 -V
Python 3.4.3

If you add libraries and other fun stuff to that, it calls to mind the infamous ballistic gel cube: it is only a matter of time before a change brings the whole enchilada to collapse. To circumvent the problem, you can use virtual environments; how they work is shown schematically in Figure 1.

Fig. 1: Virtual Environments isolate Python execution environments analogous to a container

The installation of the virtual environment engine depends on the operating system and Python version. On the author’s workstation, the installation is run by entering the following command; users of alternative operating systems can find more information online:

tamhan@TAMHAN14:~$ sudo apt-get install python3.4-venv

A new virtual environment is created by a command sequence that also varies from platform to platform. On Ubuntu 14.04, the commands look like this:

tamhan@TAMHAN14:~/suspython$ python3 -m venv workspace
tamhan@TAMHAN14:~/suspython$ cd workspace/
tamhan@TAMHAN14:~/suspython/workspace$ source bin/activate
(workspace) tamhan@TAMHAN14:~/suspython/workspace$

The reward for all your effort is the work environment shown in Figure 2, which contains a configuration file as well as a set of references that allow the shell and operating system to “bend” the access to the Python interpreter. Another nice aspect of the virtual environment is that it allows for the local installation of libraries – if the library is nested in the virtual environment, it is not dependent on the rest of the system.

Fig. 2: Virtual Environments consist of shortcuts and configuration scripts

After executing the source bin/activate command, the terminal is configured for the environment. One way to recognize this is that the name of the work environment appears in parentheses before the actual prompt. If you later want to load a virtual environment that has already been created, source bin/activate has to be entered again:

tamhan@TAMHAN14:~/suspython/workspace$ source bin/activate
(workspace) tamhan@TAMHAN14:~/suspython/workspace$
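
Whether a script is actually running inside an activated environment can also be checked from Python itself. The following is a minimal sketch, assuming Python 3.3 or later (where sys.base_prefix is available); the helper name in_virtualenv is made up for this illustration.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

Run from the (workspace) prompt, the second line should report True; run from a plain shell, it should report False.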

Interactive and input-controlled

Programming languages such as C follow the IPO (input, processing, output) principle: the programmer supplies one or more code files that are compiled and executed. In Python, the situation is quite different. Just enter the python command and press CTRL+C a few times to enjoy the behavior shown in Figure 3.

Fig. 3: The interpreter doesn’t seem to be overly impressed by the keyboard interrupt

This behavior is caused by the fact that the Python interpreter can also be operated interactively. You enter commands line by line and get an output immediately – this is ideal as a makeshift calculator. If you want to leave this operating mode, you need to use the CTRL + D key combination.

In practice, Python is mostly used to execute supplied files. Code is stored in files ending in .py, which can be executed as follows:

(workspace) tamhan@TAMHAN14:~/suspython/workspace$ python test.py

Python 2 versus Python 3

When you’re young and confrontational, there comes a time when you clash with your colleagues about the best way to format code. What sounds harmless to a greying coder can put deep friendships to the test.

Python avoids this problem because the language places radical restrictions on the structure of the code files. The first peculiarity is that the Python interpreter divides code into logical and physical lines. A logical line can consist of several physical lines and, conversely, a physical line can contain several logical lines separated by the ; character.
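
The difference between logical and physical lines is easiest to see in a few lines of code. This is a small sketch with made-up variable names; it shows a logical line that spans several physical lines and a physical line that carries several logical lines.

```python
# One logical line spread over three physical lines:
# the open parenthesis lets the expression continue.
total = (1 +
         2 +
         3)

# Three logical lines on one physical line, separated by semicolons.
a = 1; b = 2; c = a + b

# A backslash at the end of a physical line also continues the logical line.
message = "total=" + str(total) + \
          ", c=" + str(c)

print(message)
```

Packing several statements onto one line is legal, but it is rarely seen in practice because it makes code harder to read.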

Python 3 allows UTF-8 characters in the source code by default, while Python 2 assumes ASCII unless an encoding is declared. Both versions accept tabs as well as spaces for indentation, which is a source of discussion. In the quasi-standard document PEP 8 [1], Guido van Rossum and his co-authors write that spaces are preferred over tabs. Ideally, we use four spaces per indentation level, and a well-configured editor can insert them automatically when the Tab key is pressed.

Python 2 included logic that replaces tabs with spaces when a program is read. Python 3 is stricter: within a file you should indent with only tabs or only spaces, because inconsistent mixing is rejected with an error. The reason for this pedantic-sounding policy is illustrated by the following sample program:

weight = float(input("Tam Air - weight calculator"))
print("I have a box!")
if weight > 70:
    print("Pay the penalty: 25€.")
elif weight > 50:
    print("Pay the penalty: 5€.")
print("Tam Air thanks you")

This primitive weight calculator of a budget airline is interesting because it demonstrates a group of Python-specific elements. First, print is called with parentheses, so it is an ordinary function. In Python 2, print was a statement built into the language and was written without parentheses.

C and Visual Basic programmers will not believe their eyes, because both the curly braces and the End If statements are missing. The code blocks of selections, iterations and similar elements in Python are determined solely by the amount of whitespace that precedes them.

The interpreter is uncompromising in enforcing this policy. For example, let’s slightly adjust the if part and enter five spaces instead of four when inserting the second print command:

if weight > 70:
    print("Pay the penalty: 25€.")
     print ("You packed too much!")

Anyone attempting to execute this program will be penalized with the error message shown in Figure 4. In the interest of completeness, it should be noted that this is a fundamental design concept of Python: those who cannot cope with it need to choose a different programming language.

Fig. 4: Errors in indentation are severely punished by the interpreter
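
The error can also be provoked without saving a file. The following sketch feeds a deliberately mis-indented snippet to the built-in compile function and catches the resulting exception; the snippet and the variable names are chosen for illustration only.

```python
# Source text with four spaces on the first print and five on the second.
source = (
    "weight = 80\n"
    "if weight > 70:\n"
    "    print('Pay the penalty: 25')\n"
    "     print('You packed too much!')\n"
)

message = "no error"
try:
    # compile() parses the text; the code is never executed.
    compile(source, "<example>", "exec")
except IndentationError as err:
    # IndentationError is a subclass of SyntaxError and carries the line number.
    message = "{}: {} (line {})".format(type(err).__name__, err.msg, err.lineno)

print(message)
```

Because the problem is detected while parsing, not a single statement of the file runs, including the correctly indented ones.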

Python’s data types

A bad old joke claims that a programmer who knows one object-oriented language knows them all. This may apply to Python to a certain degree, as it does support classes and such. But interestingly enough, Python has a significantly broader footing in basic data types than the competition. While C and its peers only directly support primitive variables, Python offers a rich diversity in this regard.

Experienced developers may not like preparing lists and the like by hand. Anyone who sees Python as a tool for implementing an algorithm is happy to perform as few chores as possible on the way to the actual business task.

A pleasing extension of this is the direct support of complex numbers, a representation that is widely used in electrical engineering in particular, where it is usually written as absolute value and angle. For example, complex numbers are used to describe voltages and currents in circuits that are phase-shifted under sinusoidal excitation.

Where a developer has to program things by hand in other languages, Python offers direct support with a complex data type:

import cmath
z = complex(4, 4)
print("Realpart : ", end="")
print(z.real)
print("Imaginarypart: ", end="")
print(z.imag)

Yet electrical engineers and defense electronics technicians will be disappointed at this point: in Python, complex numbers are always stored in component form, while the polar form is only available as a calculated representation. The listing also shows the import of the cmath module, which provides the functions required to process complex numbers. The unusual-looking call of print uses a construct that we will take a look at later. As an old electronics buff, the author simply cannot resist the temptation to present rect, which converts a value given in polar coordinates into a Python complex number:

cmath.rect(r, phi)
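
To make the round trip between the two representations concrete, here is a short sketch that converts the value from the listing above to polar form with cmath.polar and back with cmath.rect. Only documented cmath and math functions are used; the variable names are illustrative.

```python
import cmath
import math

z = complex(4, 4)

# polar() returns the absolute value r and the angle phi in radians.
r, phi = cmath.polar(z)
print("r   =", r)
print("phi =", math.degrees(phi), "degrees")

# rect() goes the other way: from (r, phi) back to the component form.
z_back = cmath.rect(r, phi)
print(z_back)
```

Because of floating-point rounding, z_back may differ from z in the last digits, so comparisons are best made with cmath.isclose rather than ==.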

Python elements in rank and file

Another feature of the rich diversity of the basic types in the language is the sequence type. The question of what exactly a sequence type is can keep a room full of Python developers busy for a long time. The classic “Python in a Nutshell” describes sequence types as containers that implement a kind of key-value store made up of an integer key and a value of another type. It should be noted that this definition is not 100 percent correct: the named tuple, for instance, does not insist on an int as the primary key.

The open architecture of Python allows libraries and third-party developers to create their own sequence types. In practice, the following seven elements are particularly common (the list reflects Python 2; in Python 3, Unicode strings have become the regular str type and xrange has been renamed to range):

  • Buffers
  • Byte arrays
  • Lists
  • Strings
  • Tuples
  • Unicode strings
  • Xranges

It is no accident that arrays do not appear on this list. To be sure, the data structures known from C are available in Python in the form of various modules that belong to the base distribution. However, we rarely use them for anything but mathematical processes and apply Python-specific elements instead.

Since a complete discussion of all these types of data in a single article would be unrealistic, let’s focus on tuples and their cohorts. We basically already became acquainted with the elements earlier, when the print function was called with a strange parameter:

print ("Imaginarypart: ", end="")

A primitive tuple is a list that Python declares to be immutable. Behind this complex-sounding term is the idea that the object no longer changes at runtime, whereas mutable elements can still change after they have been created. From a syntactic point of view, the two containers do not differ much; Table 1 demonstrates the differences.

Tuple sample code:
thistuple = ("apple", "banana", "cherry")
print(thistuple[1])

List sample code:
thislist = ["apple", "banana", "cherry"]
thislist[1] = "blackcurrant"
print(thislist)

Table 1: Differences between a list and a tuple

Tuples and lists differ mainly in how they are created. If a developer wants to create a tuple, the elements are enclosed in parentheses. A list is created with square brackets, which other programming languages use as the array symbol.

At runtime, meanwhile, there is no big difference: the elements of a tuple are also accessed with square brackets. The important point is that changes to the structure are only allowed in the case of the list. However, this does not mean that everything stored in a tuple is unchangeable. For example, if you store a list as a member of a tuple, the individual values of that list can still be changed; only the list's position in the tuple remains constant.
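
The distinction between the tuple's structure and the contents of its members can be tried out directly. The sketch below uses a made-up shipment tuple that holds a string and a list.

```python
shipment = ("box", [10, 20])

# The list stored inside the tuple is still mutable.
shipment[1].append(30)

# Rebinding a slot of the tuple itself is rejected with a TypeError.
try:
    shipment[0] = "crate"
    rejected = False
except TypeError:
    rejected = True

print(shipment)
print("assignment rejected:", rejected)
```

The tuple still refers to the same list object afterwards; only that object's contents have changed.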

The issue of when to use lists and when to use tuples can definitely be the topic of a lively discussion [2]. It is logical though that being unchangeable at runtime makes the tuple ideal for any situation where the stored information will not be subject to change. Another good area of application is speed-critical code: since the memory structure no longer needs to be changed, a tuple can be handled more efficiently.

Yet all this still does not explain the call of print used above. A tuple, like a list, assigns numeric indices to the contained elements. There are situations in which you would prefer other key types. This sets the stage for the named tuple, which found its way into the standard library with Python 2.6 and 3.0. Creating a named tuple could not be easier:

from collections import namedtuple
Point = namedtuple('Point', 'x y')
pt1 = Point(1.0, 5.0)
pt2 = Point(2.5, 1.5)
print(pt1.x)
print(pt1[0])

Using the correct import statement is important. We use from ... import to be able to access the namedtuple function in the collections module directly. It then serves as a factory for a new type, with which we bring two points to life.

It is interesting that the elements located in a named tuple can be addressed by their names. In the interest of backward compatibility, the collections module also makes sure that classic numeric access works without issue as well.
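
A few further properties of named tuples are worth seeing in code. This sketch reuses the Point type from the listing above and relies on the documented helper methods _replace and _asdict; the variable names are illustrative.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
pt = Point(1.0, 5.0)

# Unpacking works as with any other tuple.
x, y = pt

# Fields cannot be reassigned; _replace builds a modified copy instead.
moved = pt._replace(x=2.5)

# _asdict exposes the mapping from field names to values.
as_mapping = dict(pt._asdict())

print(pt, moved, as_mapping)
```

The original pt is untouched by _replace, which is consistent with the immutability of ordinary tuples.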

Theoretically, such a named tuple can be passed directly to a function. However, the strange call of print seen above has a different cause: we are dealing with named function parameters. So we would also like to take a closer look at them through a small example. The hurz function computes the weighted sum of four parameters, which for the sake of convenience go by the names a, b, c and d.

def hurz(a, b, c, d):
  return a*1 + b*2 + c*3 + d*4
print (hurz(1,2,3,4))
print (hurz(4,3,2,1))

Due to the multiplication with different weights, the order of the parameters is important. The first execution of the function returns the value 30, while the second execution, with a different order of numbers, only comes up with a sum of 20. Using “named” parameters allows us to bypass the issue:

def hurz(a, b, c, d):
  return a*1 + b*2 + c*3 + d*4
print (hurz(1,2,3,4))
print (hurz(d=4,c=3,b=2,a=1))

Running this program will cause the same number to appear twice on the screen. Namely, the interpreter inserts the supplied values into the desired parameter slots based on the names.

More interesting in this context is the question of what happens if you do not supply all the parameters. As an example, let’s say we want to provide hurz with values for a and d only:

def hurz(a, b, c, d):
  return a*1 + b*2 + c*3 + d*4
print (hurz(d=4,a=1))

The execution of this program will be acknowledged with the error shown in Figure 5. So supplying optional parameters requires some extra work, which must be done in the function header.

Fig. 5: The absence of the “b” and “c” parameters causes the program execution to fail

A naive attempt to circumvent the problem would be to assign default values to b and c only:

# default values of 0 chosen for illustration
def hurz(a, b=0, c=0, d):
  return a*1 + b*2 + c*3 + d*4
print (hurz(d=4,a=1))

Unfortunately, this approach is not promising. Although the Python language standard may be very flexible, it does stipulate that parameters without default values must always be placed before the first parameter that has a default value. So to get a program that compiles, we also need to assign a default value to d. This academic-sounding gimmick is actually very important in practice. Many Python functions and libraries, not just the print used above, have one or two fixed parameters and are otherwise supplied with information through a series of named parameters. We will see that once again further on in the article when we turn to the automated generation of diagrams.
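
A working variant might look like the following sketch. The default value of 0 for b, c and d is an assumption made for this illustration; any other value would satisfy the rule just as well.

```python
# Only a is mandatory; b, c and d fall back to 0 (an assumed default).
def hurz(a, b=0, c=0, d=0):
    return a*1 + b*2 + c*3 + d*4


# b and c are omitted, so only a and d contribute: 1*1 + 4*4
result = hurz(d=4, a=1)
print(result)
```

Positional and named arguments can also be mixed, as in hurz(1, d=4), as long as the positional ones come first.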

The list as a tool

Storing information in lists is seldom an end in itself. In the scientific field in particular, we usually deal with measured or calculated values which should be weighted or otherwise modified, evaluated or processed by program logic. Python has been aware of this task or situation for a long time and takes account of it through extensive methods for list processing. Regardless of the immense scope of the features, we only want to pick out a few examples that are interesting for newcomers and important in machine learning.

To work with lists and such, we first need to have some information. Since interacting with real files at this point would take us too far, we’ll use the range function instead:

mylist = range(2,20)
print(mylist)
for n in mylist:
  print(n)

In Python 3, range is a comparatively flexible function that produces more or less any desired sequence of integers. In principle, only a final value is required; if you also specify a start value and/or a step size, you can control the output even more precisely. Interesting in this context is the behavior of print when it is supplied with the mylist object alone: it shows the range object itself rather than its elements. Figure 6 shows what you can expect.

Fig. 6: The output of individual elements in Python can only be made by using an explicit command
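
The three ways of calling range can be compared side by side. In this sketch, list() is used to materialize the elements so that they become visible; the variable names are illustrative.

```python
# Only a final value: counting starts at 0 and stops before 5.
only_stop = list(range(5))

# Start and final value.
start_stop = list(range(2, 6))

# Start, final value and step size.
with_step = list(range(2, 20, 3))

# A negative step counts downwards.
countdown = list(range(5, 0, -1))

print(only_stop, start_stop, with_step, countdown)

# Printing the range object itself does not list its elements.
print(range(2, 20))
```

In every case the final value itself is excluded from the sequence.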

Now that we have a sequence, we can begin to perform some calculations on it. By far the simplest tool is map. You instruct the interpreter to apply a supplied function to all elements of the list and to return a new list or an iterable that contains the results for the respective elements. An example would be to double the values, which can be accomplished with the following code:

def worker(x):
  return x*2
mylist = range(2,20)
values = map(worker, mylist)
for n in values:
  print(n)

The map function accepts a function as its first parameter; this can be a named function such as worker or an anonymous lambda. Anyone who has previously done any programming in Microsoft programming languages knows instinctively what hides behind this: a kind of function pointer that specifies the shape of the function to be supplied, but otherwise leaves developers free rein in implementing it.

When you execute this program, you will see the doubled values of the numbers generated by range in the output. It follows that map actually applied the worker function to all elements of mylist and returned the results in the values variable. Using a for loop then provides an output according to the scheme we’ve seen earlier.
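
The same doubling can be written with an anonymous function. The sketch below also shows a detail that is easy to miss: in Python 3, map returns an iterator that can be consumed only once. The variable names are illustrative.

```python
mylist = range(2, 6)

# A lambda replaces the separately defined worker function.
doubled = map(lambda x: x * 2, mylist)

# The first pass consumes the iterator...
first_pass = list(doubled)

# ...so a second pass finds nothing left.
second_pass = list(doubled)

print(first_pass)
print(second_pass)
```

If the results are needed more than once, it is therefore a good idea to store them in a list right away.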

If you only want to keep some of the values and discard the rest, you can use the filter function instead. Last but not least, there is also reduce. This function, which is no longer a built-in in Python 3 because Guido van Rossum disliked it so much, combines the values of a sequence into a single result. Because this is an interesting example, here is a short implementation:

import functools
def worker(x, x2):
  return x+x2
mylist = range(2,20)
value = functools.reduce(worker, mylist)
print(value)

The function passed to reduce takes two parameters instead of a single one. During the first run of the function, the first parameter receives the value of the first element, while the second parameter is always filled with the value of the subsequent element. It gets interesting from the second run: the first parameter passed to the second invocation of worker is the return value of the first one. The second parameter comes from the third element, et cetera. In this example, the sum of all numbers in the sequence is conveniently calculated in this manner.

It should be noted that the dedicated import of the module used here is only required in Python 3; in 2.x the function was a built-in. As an alternative, there is the concept of list abstraction, better known as list comprehension. It offers a notation derived from set theory for working quickly and efficiently with lists. As a starting point, the following code generates a sequence of squares the conventional way in interactive mode:
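
The intermediate results of reduce can be made visible with itertools.accumulate, which applies the same two-parameter function but keeps every partial result. This is a sketch on a shortened range so that the steps are easy to follow.

```python
import functools
import itertools


def worker(x, x2):
    return x + x2


mylist = range(2, 6)  # the values 2, 3, 4, 5

# accumulate yields the first element, then each return value of worker.
steps = list(itertools.accumulate(mylist, worker))

# reduce returns only the last of these values.
total = functools.reduce(worker, mylist)

print(steps)
print(total)
```

The last entry of steps is identical to the value returned by reduce.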

>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

By using a comprehension, this task can be significantly shortened:

squares = [x**2 for x in range(10)]

The question of whether you should go for comprehensions rather than calling reduce() can be the topic of a lively discussion.
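
To help with that decision, the following sketch places the functional style and the comprehension style next to each other for mapping, filtering and summing. It uses only built-ins and functools; the variable names are illustrative.

```python
import functools

values = range(2, 20)

# Doubling: map versus comprehension.
doubled_map = list(map(lambda x: x * 2, values))
doubled_comp = [x * 2 for x in values]

# Keeping even numbers: filter versus a comprehension with a condition.
even_filter = list(filter(lambda x: x % 2 == 0, values))
even_comp = [x for x in values if x % 2 == 0]

# Summing: reduce versus the built-in sum.
total_reduce = functools.reduce(lambda x, y: x + y, values)
total_sum = sum(values)

print(doubled_map == doubled_comp, even_filter == even_comp, total_reduce == total_sum)
```

Both styles produce the same results here, so the choice is largely a matter of readability.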

Python libraries save time

Although there is much more to be said about the syntax of Python, we need to address another important aspect of the ecosystem: the multitude of libraries.

The SciPy project plays a special role here; its abundant library resources are briefly summarized in Figure 7. Both NumPy and SciPy count as quasi-standard libraries, which are easy to install on just about any Python installation. They provide developers with powerful mathematical procedures that would have cost thousands of euros to implement and/or license just a few years ago.

Fig. 7: In addition to the namesake library, the SciPy project also offers other libraries

For practical use, it is recommended to download a whole set of utilities. If you have the Virtual Environment activated, just enter the following command sequence:

sudo apt-get install libatlas-base-dev gfortran
sudo apt-get build-dep python-numpy python-scipy python3-numpy python3-scipy
python -m pip install numpy scipy matplotlib

Avoid errors!
For complex libraries, it sometimes makes more sense to go with a global installation. Although the packages provided by the distribution provider are almost always outdated, they are precompiled.

Alternatively, you can also do a global installation. However, this can become hairy, because both NumPy and SciPy are in permanent development. If another user or another program installs a more recent or older version, your own program may stop working. In any case, pip is a kind of package manager for Python libraries [3].
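
Which versions actually ended up in the current environment can be queried from Python. This is a minimal sketch, assuming Python 3.8 or later, where importlib.metadata is part of the standard library; the helper name installed_version is made up for this illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("numpy", "scipy", "matplotlib"):
    print(name, installed_version(name))
```

Running the script once inside and once outside the virtual environment shows quickly whether the two installations have drifted apart.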

The installation of Fortran and libatlas is not a whim of the author: NumPy and SciPy come with a lot of native code that optimizes the performance of the system and needs to be compiled before the environment can be used. To demonstrate the possibilities of the package, we’d like to execute a program that displays a pie chart on the screen. You can check out a naive little implementation in Listing 1.

Listing 1
import matplotlib.pyplot as plt
labels = 'A', 'B', 'C', 'D'
sizes = [15, 30, 45, 10]
explode = (0, 0.1, 0, 0)
fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()

For the sake of completeness, I’d like to note that we are entering hairy terrain at this point. The exact behavior of matplotlib varies from workstation to workstation. It may be that some of the problems described here do not occur on your PC or Python distribution.

From a programming point of view, there is not much that is new here: we create a tuple and a list of information that influences the behavior of the diagram engine. In the next step we enter some values. pie is interesting in that it requires comparatively few fixed parameters; the more precise configuration of its behavior takes place via dozens of optional parameters, which you only need to specify as needed. The call of show, which is blocking according to the documentation, should in theory then ensure that the diagram appears on the screen.

That’s certainly not the case on the author’s workstation. The program runs smoothly, but then it ends and no diagram appears on the screen. The cause of the strange behavior is that matplotlib splits the task into two parts, as shown in Figure 8. Incidentally, this procedure is by no means specific to matplotlib; it can also be found in many other Python libraries.

Fig. 8: Introducing a platform-specific back end object is considered “clean” in the world of Python

If you pull the library from the package sources of your distribution, you will, as mentioned in the box, have no issues. Since we installed via pip, we must first provide a back end ourselves. First, we switch to interactive mode, where we ask the library which back end is currently enabled:

(workspace) tamhan@TAMHAN14:~/suspython/workspace$ python
. . .
>>> import matplotlib
>>> matplotlib.get_backend()
'agg'

To solve the problem we need to download a bunch of additional libraries. Here we’d like to turn to the Tk GUI stack to save time:

sudo apt-get install tcl-dev tk-dev python-tk python3-tk

The actual integration is simple. The only important thing is that use() must be called before the pyplot module is loaded:

import matplotlib
matplotlib.use('tkagg')
import matplotlib.pyplot as plt

labels = 'A', 'B', 'C', 'D'
. . .
plt.show()

Fig. 9: The pie chart will appear on the screen
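
A script that has to run on machines with and without the Tk stack can pick the back end defensively. The following sketch uses only the standard library and assumes that an importable tkinter module is a sufficient sign that the TkAgg back end will work; the variable name backend is illustrative.

```python
# Decide on a back end name before matplotlib.pyplot is imported.
try:
    import tkinter  # only used as an availability check
    backend = "TkAgg"
except ImportError:
    # Without Tk, fall back to the non-interactive Agg back end,
    # which can still write diagrams to image files.
    backend = "Agg"

print("Suggested matplotlib backend:", backend)

# In a real script, the next steps would be:
#   import matplotlib
#   matplotlib.use(backend)
#   import matplotlib.pyplot as plt
```

With the Agg back end, show() will not open a window, so the figure would have to be written to a file with savefig instead.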

Python tutorial – Conclusion

People converting from C to Java, from Java to C or from C to C#, by and large find something they’re familiar with. Python dances out of line to the extent that a few unusual things emerge both in terms of syntax and in terms of language scope: Much of what is available in other programming languages in the form of libraries is a part of the language here.

The comparatively chaotic continued development ensures that even rather exotic programming concepts are quickly absorbed by the community. As a result, the product plays the role of a melting pot, mixing old and new ideas.

The hysteria surrounding the departure of Guido van Rossum is quite unnecessary in my view: Python was already a mess before and will not suddenly collapse into chaos. Today, we can only reasonably discuss the question of whether you should immerse yourself in the madness. There are situations in which libraries save thousands of man-hours. Keep an open mind when you buy the textbook and by all means give the environment a chance. Believe me, you will not regret the effort.

 

Links & literature

[1] https://www.python.org/dev/peps/pep-0008/

[2] https://stackoverflow.com/questions/1708510/python-list-vs-tuple-when-to-use-each

[3] https://pip.pypa.io/en/stable/user_guide/
