Interview Archives - ML Conference
https://mlconference.ai/blog/interview/

How Deep Learning helps protect honeybees
https://mlconference.ai/blog/how-deep-learning-helps-protect-honeybees/ (19 Nov 2019)

Honey bee colony assessment is usually carried out by manually counting and classifying comb cells. Thiago da Silva Alves explains in this interview how deep learning can help to accomplish this time-consuming and error-prone task.

Editorial Team: You have developed a tool called DeepBee. What is it all about?

Using Machine Learning it is possible to deliver quality information about honey bee colonies to beekeepers and researchers.

Thiago da Silva Alves: Many research projects in the apidology area require a process called temporal assessment of honey bee colony strength, which often involves counting the number of comb cells with brood and food reserves multiple times a year. There are thousands of cells in each comb, which makes manual counting a time-consuming, tedious, and thus error-prone task.

Knowing this problem, we decided to automate this process using image processing techniques to automatically detect cells, and deep learning for the cells’ content classification.

Editorial Team: Your presentation at the Machine Learning Conference is called “Honey Bee Conservation using Deep Learning”. How can Machine Learning help with Honey Bee Conservation?

Thiago da Silva Alves: Using Machine Learning it is possible to deliver quality information about the colonies to beekeepers and researchers. With this information, they can have insights on what they can do to improve colony health.

For example, the tool we developed is able to count the amount of brood and food reserves in a comb image. If the beekeeper frequently extracts this information from his colonies, he can detect anomalies such as a low bee birth rate or an unexpected reduction in honey production. With this information at hand, the beekeeper can make better-informed decisions about colony health.

How machine learning can help prevent bee mortality

Editorial Team: Could you give an insight into the technologies used in DeepBee?

Thiago da Silva Alves: We started using Nvidia DIGITS + Caffe in our first classification tests, but quickly faced some limitations. Then we decided to use Keras, with a TensorFlow backend, for the implementation of our models. We did most of the image preprocessing using OpenCV and NumPy.
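The interview does not show the models themselves, but a minimal sketch of a Keras classifier of the kind described might look like this. The architecture, the 64×64 input size, and the number of cell classes are illustrative assumptions, not details of DeepBee:

# Minimal sketch (not the actual DeepBee model): a small Keras CNN that
# classifies comb-cell image crops into content classes.
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # hypothetical count, e.g. egg, larva, pollen, honey, ...

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),           # RGB crop of a single cell
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])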

Editorial Team: Where did you see the biggest challenge in developing DeepBee?

Thiago da Silva Alves: The biggest challenge we encountered was collecting data and creating the datasets. It took us a few months before we had enough cells annotated to start developing the models.

Developing an algorithm to detect different cell types was also a big challenge for us. The aggravating factor, in this case, is the fact that it is not possible to easily see the edge of cells containing honey.

Editorial Team: How can machine learning help prevent bee mortality?

The biggest challenge we encountered was collecting data and creating the datasets.

Thiago da Silva Alves: It can help reduce bee mortality by giving beekeepers more information about the strength of their colonies. This information sharpens the beekeeper's decision-making and thus improves bee health.

Editorial Team: What are the next steps? What are your plans for DeepBee?

Thiago da Silva Alves: We plan to make the tool even more user-friendly. We also believe it is possible to implement some features of DeepBee into a smartphone application.

Editorial Team: Thank you very much!

Questions by Hartmut Schlosser

Generative Adversarial Networks: "GANs can create new 'realities' that never existed"
https://mlconference.ai/blog/generative-adversarial-networks-gans-can-create-new-realities-that-never-existed/ (29 Oct 2019)

Generative Adversarial Networks (GANs) have recently sparked an increasing amount of interest, as they can generate images of faces that look convincingly real. What else are they capable of, what risks could they pose in the long run, and what do they have in common with the emerging internet in the 1990's? We interviewed ML Conference speaker Xander Steenbrugge.

Editorial Team: In the abstract for your ML Conference session “Generative Media – an Emerging Industry”, you wrote that one of the most beautiful ideas in the Deep Learning Revolution of the past decade was the invention of Generative Adversarial Networks (GANs). So, would you first explain what GANs are essentially?

A new industry of generative media will emerge over the next decade.

Xander Steenbrugge: GANs, or Generative Adversarial Networks, are a totally new approach to generative models, invented by Ian Goodfellow in 2014. In contrast to, say, classification models which classify images into categories, generative models can generate completely novel images – or any kind of data for that matter – by first learning what that data usually looks like from a dataset. This entire process usually happens completely unsupervised, i.e. without needing any labels.

Editorial Team: Could you explain how a GAN works in general? 

Xander Steenbrugge: The central idea is that a GAN has two neural networks that are adversaries of each other. On the one hand, the Generator tries to create an image that looks as real as possible. The second model, the Discriminator, gets to see an image that could come from two sources: it’s either a real image from an actual dataset, or a fake image coming from the Generator. It then has to learn to see the difference and this learning signal is used to improve both the Generator and the Discriminator. If one can keep these two models balanced, the end result is a Generator that can generate images which look very similar to the actual dataset that was used during training.
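To make this interplay concrete, here is a minimal PyTorch sketch of the two-network setup described above. The architectures, dimensions, and hyperparameters are illustrative assumptions, not details from the interview:

import torch
import torch.nn as nn

# Generator: noise vector -> flattened 28x28 image
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())
# Discriminator: flattened image -> probability "real"
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):
    # real: batch of flattened real images, shape (batch, 784)
    batch = real.size(0)
    ones = torch.ones(batch, 1)    # label "real"
    zeros = torch.zeros(batch, 1)  # label "fake"

    # 1) Train the discriminator: real -> 1, generated -> 0
    fake = G(torch.randn(batch, 100))
    d_loss = loss_fn(D(real), ones) + loss_fn(D(fake.detach()), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the generator: try to make the discriminator output 1
    g_loss = loss_fn(D(fake), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()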

 

Generated images – and how we can spot them

Editorial Team: Which different types of media can GANs deal with, and are they better suited for a certain type?

Xander Steenbrugge: The most impressive results from using GANs have been demonstrated in the image domain. The reason is that convolutional networks are simply very, very good. However, the core idea behind GANs can, in principle, be applied to many different data types such as audio or text.

Editorial Team: Some telltale signs show that an image of a person’s face is artificially generated, e.g. artifacts or mismatched earrings. As GANs continue to improve, do you expect these signs that are visible to the human eye will vanish? And if so, will there be other methods to ascertain whether an image has been created by a GAN?

In the long run, everybody will learn to understand that any type of media can now be “faked”.

Xander Steenbrugge: Looking at the quality improvements in generated images over the past five years, I am very certain that very soon we’ll be able to generate images indistinguishable from real ones in specific narrow domains such as faces or cars. Approximating the entire natural manifold of possible images, however, might prove to be a much more challenging task, as GAN results on a full ImageNet dataset look a lot worse than when trained on just faces.

At the same time, I’m very confident that detecting generated images will be doable with very similar types of models. You could, for example, take the discriminator and use it as a “fake detection” filter. The bigger challenge, though, will be to educate the general public that seeing a video of something no longer means it actually happened.

 

Future implications of GANs

Editorial Team: In what respect could GANs have a negative impact in the long run?

Xander Steenbrugge: A new industry of generative media will emerge over the next decade. I can foresee applications in the movie industry (licensing an actor’s face), Virtual Reality (avatars that look like their users), design, art, etc.

I believe that, in the long run, everybody will learn to understand that any type of media can now be “faked”. When you get a letter today saying that America is going to nuke China tomorrow, signed by Barack Obama, you won’t believe that’s true because you know anybody could have written that letter. A couple of years from now, everybody will have that same intuition for an HD video of Obama saying the same thing. The problem is that many people currently don’t have that intuition yet, and therein lies the biggest risk.

Editorial Team: What do you believe the positive effects of GANs will be?

The true leap will come when we can query a generative model for a very specific output.

Xander Steenbrugge: This is very hard to predict because it’s such a broad concept. It’s like asking “what will the positive effects of the internet be?” in the late 1990’s. I believe that generative models have a very big future. In essence, these models can learn what the world is like by looking at data and then create new “realities” that never existed. Currently, most GANs only allow for creating random data samples. The true leap will come when we can query a generative model for a very specific output like “Generate an image of what my living room would look like if I bought this IKEA sofa and painted the east wall in this shade of orange.” These are called conditional samples and we are making fast progress towards this as well. In the end, I believe that generative models will become embedded in all our wearables, TV screens, smartphones, and more, and will give us a personalized lens by which to look at the world around us. Is that good or bad? I don’t think that’s the right question, as in my view technology itself is neutral. What you do with it is everyone’s personal choice.

Editorial Team: Thank you for the interview!

Questions by Maika Möbus

 

Neural networks with PyTorch
https://mlconference.ai/blog/neural-networks-with-pytorch/ (6 Aug 2019)

PyTorch is currently one of the most popular frameworks for the development and training of neural networks. It is characterized above all by its high flexibility and the ability to use standard Python debuggers. And you don't have to compromise on training performance.


Development, training and deployment of neural networks

Because of the features mentioned above, PyTorch is popular above all with deep learning researchers and Natural Language Processing (NLP) developers. The most recent version, the first official release 1.0, also introduced significant innovations in the areas of integration and deployment.

Tensors

The elementary data structure for representing and processing data in PyTorch is torch.Tensor. The mathematical term tensor stands for a generalization of vectors and matrices. Tensors are implemented in PyTorch as multidimensional arrays. A vector is nothing more than a one-dimensional tensor (a tensor of rank 1) whose elements can be numbers of a certain data type (such as torch.float64 or torch.int32). A matrix is thus a two-dimensional tensor (rank 2) and a scalar is a zero-dimensional tensor (rank 0). Tensors of higher dimensions do not have any special names (Fig. 1).

 

Figure 1: Tensors

 

The interface for PyTorch tensors strongly relies on the design of multidimensional arrays in NumPy. Like NumPy, PyTorch provides predefined methods which can be used to manipulate tensors and perform linear algebra operations. Some examples are shown in Listing 1.

 

Listing 1
# Generation of a one-dimensional tensor with
# 8 (uninitialized) elements (float32)
x = torch.Tensor(8)
x = x.double()  # Conversion to float64 (returns a new tensor)
x = x.int()     # Conversion to int32 data type

# 2D float tensor preinitialized with zeros
x = torch.zeros([2, 2])

# 2D tensor preinitialized with ones
# and subsequent conversion to int64
y = torch.ones([2, 3]).long()

# Merge two tensors along dimension 1
# (data types must match, hence the cast)
z = torch.cat([x, y.float()], 1)  # shape: [2, 5]

z.sum()   # Sum of all elements
z.mean()  # Average of all elements

# Matrix multiplication (inner dimensions must match)
z.mm(z.t())

# Transpose
z.t()

# Inner product of two 1D tensors
torch.dot(z[0], z[1])

# Calculates eigenvalues and eigenvectors
torch.eig(z.mm(z.t()), eigenvectors=True)

# Returns tensor with the sine of the elements
torch.sin(z)

 

The use of optimized libraries such as BLAS, LAPACK and MKL allows for high-performance execution of tensor operations on the CPU (especially with Intel processors). In addition, PyTorch (unlike NumPy) also supports the execution of operations on NVIDIA graphics cards using the CUDA toolkit and the cuDNN library. Listing 2 shows an example of how to move tensor objects to GPU memory in order to perform optimized tensor operations there.

 


Listing 2
# 1D Tensors
x = torch.ones(1)
y = torch.zeros(1)

# Move tensors to the GPU memory
x = x.cuda()
y = y.cuda()

# or:
device = torch.device("cuda")
x = x.to(device)
y = y.to(device)

# The addition operation is now performed on the GPU
x + y  # like torch.add(x, y)
# Copy back to the CPU
x = x.cpu()
y = y.cpu()

 

Since NumPy arrays are more or less considered to be standard data structures in the Python data science community, frequent conversion from PyTorch to NumPy and back is necessary in practice. These conversions can be done easily and efficiently (Listing 3) because the same memory area is shared and no copying of memory content is required.

 


Listing 3
# Conversion to NumPy
x = x.numpy()

# Conversion back as PyTorch tensor
y = torch.from_numpy(x)
# y now points to the same memory area as x
# a change of y changes x at the same time

 


Network Modules

The torch.nn library contains many tools and predefined modules for generating neural network architectures. In practice, you define your own networks by deriving from the abstract torch.nn.Module class. Listing 4 shows the implementation of a simple feed-forward network with one hidden layer and a tanh activation.

 


Listing 4
import torch
import torch.nn as nn

class Net(nn.Module):

  def __init__(self, input_dim, hidden_dim, output_dim):
    super(Net, self).__init__()
    # Here you create instances of all submodules of the network

    self.fc1 = nn.Linear(input_dim, hidden_dim) 
    self.act1 = nn.Tanh()
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):
    # Here you define the forward sequence
    # torch.autograd dynamically generates a graph on each run

    x = self.fc1(x)
    x = self.act1(x)
    x = self.fc2(x)
    return x

 

In the process, a network class is derived from the abstract nn.Module class. The __init__() and forward() methods must both be defined. In __init__(), we need to instantiate and initialize all the required elements that make up the network. In our case, we generate three elements:

 

1. fc1 – using nn.Linear(input_dim, hidden_dim) we generate a fully connected layer with an input dimension of input_dim and an output dimension of hidden_dim
2. act1 – a Tanh activation function
3. fc2 – another fully connected layer with an input dimension of hidden_dim and an output dimension of output_dim

 

The sequence in __init__() basically does not matter, but for stylistic reasons you should generate the elements in the order in which they are called in the forward() method. The sequence in the forward() method is decisive for processing – this is where you determine the sequence of the forward run. At this point, you can even build in all kinds of conditional queries and branches, since a calculation graph is dynamically generated on each run (Listing 5). This is useful if, for example, you want to work with varying batch sizes or experiment with complex branches. In particular, the processing of sequences of different lengths as input – as is often the case with many NLP problems – is much easier to realize with dynamic graphs than with static ones.

 


Listing 5
class Net(nn.Module): 
  ...
  def forward(self, x, a, b):

    x = self.fc1(x)

    # Conditional application of the activation function
    if a > b:
      x = self.act1(x)

    x = self.fc2(x)
    return x

 

“Autograd” and dynamic graphs

PyTorch uses the torch.autograd package to dynamically generate a directed acyclic graph (DAG) on each forward run. In contrast, in the case of static generation, the graph is completely constructed initially and is then no longer changed. The static graph is filled and executed at each iteration with the new data. Dynamic graphs have some advantages in terms of flexibility, as I had already explained in the previous section. The disadvantages concern optimization capabilities, distributed (parallel) training and deployment of the models.
Through the definition of the forward path, torch.autograd generates a graph; the nodes of the graph represent the tensors and the edges represent the elementary tensor operations. With the help of this information, the gradients of all tensors can be determined automatically at runtime and thus back propagation can be carried out efficiently. An example graph is shown in Figure 2.

 

Figure 2: Example of a DAG generated using “torch.autograd”
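A minimal example makes this mechanism tangible (the values here are purely illustrative):

import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # the forward run dynamically records the graph
y.backward()        # back propagation through the recorded graph
print(x.grad)       # tensor([4., 6.]) -- the gradient dy/dx = 2x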

 

Debugger

The biggest advantage of implementing dynamic rather than static graphs is the possibility of debugging. Within the forward() method, you can add print statements or set breakpoints, which can then be analyzed, for example with the standard pdb debugger. This is not readily possible with static graphs, because you do not have direct access to the objects of the network at runtime.
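As a sketch, a breakpoint placed directly in the forward() method of the network from Listing 4 could look like this:

import pdb
import torch.nn as nn

class Net(nn.Module):

  def __init__(self, input_dim, hidden_dim, output_dim):
    super(Net, self).__init__()
    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.act1 = nn.Tanh()
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):
    x = self.fc1(x)
    pdb.set_trace()  # drops into the debugger on every forward run;
                     # x is an ordinary tensor and can be inspected here
    x = self.act1(x)
    return self.fc2(x)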

 

Training

The torchvision package contains many useful tools, pre-trained models and datasets for image processing. In Listing 6, the FashionMNIST dataset is loaded. It consists of a training and a validation dataset containing 60,000 and 10,000 images from the fashion domain, respectively.

 


Listing 6
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

batch_size = 64  # assumption: value not given in the original listing

transform = transforms.Compose([transforms.ToTensor()])
# more examples of transformations:
# transforms.RandomResizedCrop()
# transforms.RandomHorizontalFlip()
# transforms.Normalize()

# Download and load the training dataset (60,000 images)
trainset = datasets.FashionMNIST('./FashionMNIST/', download=True, train=True, transform=transform)  # torch.utils.data.Dataset object
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=4)

# Download and load the validation dataset (10,000 images)
validset = datasets.FashionMNIST('./FashionMNIST/', download=True, train=False, transform=transform)  # torch.utils.data.Dataset object
validloader = DataLoader(validset, batch_size=batch_size, shuffle=True, num_workers=4)

 

Figure 3: Examples from the “FashionMNIST” dataset

 

The images are 28×28 pixel grayscale images divided into ten classes (0-9): 0. T-Shirt, 1. Trousers, 2. Sweater, 3. Dress, 4. Coat, 5. Sandals, 6. Shirt, 7. Sneakers, 8. Bag, 9. Ankle Boot (Fig. 3). The Dataset class represents a dataset that can be partitioned arbitrarily and to which various transformations can be applied. In this example, the NumPy arrays are converted to Torch tensors. In addition, quite a few other transformations are offered to augment and normalize the data (such as crops, rotations, reflections, etc.). DataLoader is an iterator class that generates individual batches of the dataset and loads them into memory, so you do not have to load large datasets completely. Optionally, you can choose whether multiple worker threads should be started (num_workers) or whether the dataset should be reshuffled before each epoch (shuffle).
In Listing 7, we first generate an instance of our model and transfer the entire graph to the GPU. PyTorch offers various loss functions and optimization algorithms. For a multi-class classification problem, CrossEntropyLoss() can, for example, be chosen as the loss function, and Stochastic Gradient Descent (SGD) as the optimization algorithm. The parameters of the network that should be optimized are passed to the SGD() method. An optional parameter is the learning rate (lr).

 


Listing 7
import torch.optim as optim
# Use the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define model
input_dim = 784
hidden_dim = 100
output_dim = 10
model = Net(input_dim, hidden_dim, output_dim)
model.to(device)  # move all elements of the graph to the current device

# Define optimizer algorithm and associate with model parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Define loss function: CrossEntropy for classification
loss_function = nn.CrossEntropyLoss()

 

The train() function in Listing 8 performs one training iteration. At the beginning, all gradients of the network graph are reset (zero_grad()). A forward run through the graph is performed afterwards. The loss value is determined by comparing the network output and the label tensor. The gradients are calculated by back propagation using backward() and finally, the weights of the network are updated using optimizer.step(). The valid() validation iteration is a variant of the training iteration in which all back-propagation steps are omitted.

 


Listing 8
# Training a batch
def train(model, images, label, train=True):
  if train:
    model.zero_grad() # Reset the gradients

  x_out = model(images)
  loss = loss_function(x_out, label)  # Determine loss value

  if train:
    loss.backward()  # Calculate all gradients
    optimizer.step() # Update the weights
  return loss

# Validation: Only forward run without back propagation
def valid(model, images, label):
  return train(model, images, label, train=False)

 

The full iteration over multiple epochs is shown in Listing 9. For the Net() feed-forward model, the image tensors with the dimensions (batch_size, 1, 28, 28) must be reshaped to (batch_size, 784). The call of train_loop() should thus be executed with the 'flatten' argument:

train_loop(model, trainloader, validloader, 10, 200, 'flatten')

 


Listing 9
import numpy as np

def train_loop(model, trainloader, validloader=None, num_epochs = 20, print_every = 200, input_mode='flatten', save_checkpoints=False):
  for epoch in range(num_epochs):

    # Training loop
    train_losses = []
    for i, (images, labels) in enumerate(trainloader):
      images = images.to(device)
      if input_mode == 'flatten':
        images = images.view(images.size(0), -1)  # flattening of the Image
      elif input_mode == 'sequence':
        images = images.view(images.size(0), 28, 28)  # Sequence of 28 elements with 28 features

      labels = labels.to(device)
      loss = train(model, images, labels)
      train_losses.append(loss.item())
      if (i+1) % print_every == 0:
        print('Training', epoch+1, i+1, loss.item())

    if validloader is None:
      continue

    # Validation loop
    val_losses = []
    for i, (images, labels) in enumerate(validloader):
      images = images.to(device)
      if input_mode == 'flatten':
        images = images.view(images.size(0), -1)  # flattening of the Image
      elif input_mode == 'sequence':
        images = images.view(images.size(0), 28, 28)  # Sequence of 28 elements with 28 features
      labels = labels.to(device)
      loss = valid(model, images, labels)
      val_losses.append(loss.item())
      if (i+1) % print_every == 0:
        print('Validation', epoch+1, i+1, loss.item())

    print('--- Epoch, Train-Loss, Valid-Loss:', epoch, np.mean(train_losses), np.mean(val_losses))
        
    if save_checkpoints:
      model_filename = 'checkpoint_ep'+str(epoch+1)+'.pth'
      torch.save({
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
      },  model_filename)

 

Save and load trained weights

To be able to use the models later for inference in an application, it is possible to save the trained weights in the form of serialized Python dictionary objects. The Python package pickle is used for this. If you want to continue training the model later, you should also save the last state of the optimizer. Listing 9 stores the model weights and the current state of the optimizer after each epoch. Listing 10 shows how one of these pickle files can be loaded.

 


Listing 10
model = Net(input_dim, hidden_dim, output_dim)

checkpoint = torch.load('checkpoint_ep2.pth')
model.load_state_dict(checkpoint['model_state_dict'])

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

 

More Network Modules

PyTorch offers many more predefined modules for building Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or even more complex architectures such as encoder-decoder systems. The Net() model could, for example, be extended with a dropout layer (Listing 11).

 


Listing 11
class Net(nn.Module):

  def __init__(self, input_dim, hidden_dim, output_dim):
    super(Net, self).__init__()

    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.dropout = nn.Dropout(0.5) # Dropout layer with probability 50 percent
    self.act1 = nn.Tanh()
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):        
    x = self.fc1(x)
    x = self.dropout(x) # Dropout after the first FC layer
    x = self.act1(x)
    x = self.fc2(x)

    return x

 

Listing 12 shows an example of a CNN consisting of two convolutional layers with batch normalization, each with ReLU activation and a max pooling layer. The training call could look like this:

model = CNN(10).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
train_loop(model, trainloader, validloader, 10, 200, None)

 


Listing 12
class CNN(nn.Module):

  def __init__(self, num_classes=10):
    super(CNN, self).__init__()

    self.layer1 = nn.Sequential(
      nn.Conv2d(1, 16, kernel_size=5, padding=2),
      nn.BatchNorm2d(16),
      nn.ReLU(),
      nn.MaxPool2d(2)
    )
    self.layer2 = nn.Sequential(
      nn.Conv2d(16, 32, kernel_size=5, padding=2),
      nn.BatchNorm2d(32),
      nn.ReLU(),
      nn.MaxPool2d(2))
    self.fc = nn.Linear(7*7*32, num_classes)

  def forward(self, x):
    out = self.layer1(x)
    out = self.layer2(out)
    out = out.view(out.size(0), -1)  # Flattening for FC input
    out = self.fc(out)
    return out




An example of an LSTM network optimized using the Adam optimizer is shown in Listing 13. The pixels of the images from the FashionMNIST dataset are interpreted as sequences of 28 elements, each with 28 features, and preprocessed accordingly.



Listing 13
# Recurrent Neural Network
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers, num_classes):
    super(RNN, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
    self.fc = nn.Linear(hidden_size, num_classes)

  def forward(self, x):
    # Initialize Hidden and Cell States
    h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)

    out, _ = self.lstm(x, (h0, c0))

    out = self.fc(out[:, -1, :]) # last hidden state
    return out

sequence_length = 28
input_size = 28
hidden_size = 128
num_layers = 1

model = RNN(input_size, hidden_size, num_layers, output_dim).to(device)

loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
train_loop(model, trainloader, validloader, 10, 200, 'sequence')




The torchvision package also allows you to load known architectures or even pre-trained models that you can use as a basis for your own applications or for transfer learning. For example, a pre-trained VGG model with 19 layers can be loaded as follows:




from torchvision import models

vgg = models.vgg19(pretrained=True)



Deployment





The integration of PyTorch models into applications has always been a challenge, as the opportunities to use the trained models in production systems had been relatively limited. One commonly used method is the development of a REST service, using flask (http://flask.pocoo.org), for example. This REST service can run locally or within a Docker image in the cloud. The three major providers of cloud services (AWS, GCE, Azure) now also offer predefined configurations with PyTorch.
An alternative is conversion to the ONNX format (https://onnx.ai). ONNX (Open Neural Network Exchange Format) is an open format for the exchange of neural network models, which is also supported by MxNet (https://mxnet.apache.org) and Caffe (https://caffe.berkeleyvision.org), for example. These are machine learning frameworks that are used productively by Amazon and Facebook. Listing 14 shows an example of how to export a trained model to the ONNX format.



Listing 14
model = CNN(output_dim)

# Any input tensor for tracing
dummy_input = torch.randn(1, 1, 28, 28)

# Conversion to ONNX is done by tracing a dummy input
torch.onnx.export(model, dummy_input, "onnx_model_name.onnx")
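As a complement, the REST-service route mentioned above could look roughly like this. This is a minimal sketch using flask that reuses the Net class and the checkpoint file from the earlier listings; the endpoint name and the JSON input format are assumptions:

# Hypothetical minimal REST service around the trained model
import torch
from flask import Flask, request, jsonify

app = Flask(__name__)

# Net is the feed-forward model class from Listing 4
model = Net(784, 100, 10)
checkpoint = torch.load('checkpoint_ep2.pth')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    # expects JSON of the form {"pixels": [784 floats]}
    pixels = request.get_json()['pixels']
    x = torch.tensor(pixels, dtype=torch.float32).view(1, -1)
    with torch.no_grad():
        scores = model(x)
    return jsonify({'class': int(scores.argmax(dim=1))})

if __name__ == '__main__':
    app.run()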

 

TorchScript and C++

As of version 1.0, PyTorch also offers the possibility to save models in an intermediate representation (IR) that can be executed completely independently of Python. The tool for this is TorchScript, which implements its own JIT compiler and special optimizations (static data types, optimized implementation of tensor operations).

 


Listing 15
# Any input tensor for tracing
dummy_input = torch.randn(1, 1, 28, 28)

traced_model = torch.jit.trace(model, dummy_input)
traced_model.save('jit_traced_model.pth')

 

You can create the TorchScript format in two ways: either by tracing an existing PyTorch model (Listing 15) or through direct implementation as a script module (Listing 16). In script mode, an optimized static graph is generated. This not only offers the advantages for deployment mentioned earlier, but could also be used for distributed training, for example.

 


Listing 16
from torch.jit import trace

class Net_script(torch.jit.ScriptModule):

  def __init__(self, input_dim, hidden_dim, output_dim):
    super(Net_script, self).__init__()

    self.fc1 = trace(nn.Linear(input_dim, hidden_dim), torch.randn(1, 784)) 
    self.fc2 = trace(nn.Linear(hidden_dim, output_dim), torch.randn(1, 100))

  @torch.jit.script_method
  def forward(self, x):        
    x = self.fc1(x)
    x = torch.tanh(x)
    x = self.fc2(x)
    
    return x

model = Net_script(input_dim, hidden_dim, output_dim)

model.save('jit_model.pth')

 

The TorchScript model can now be integrated into any C++ application using the C++ front-end library (LibTorch). This enables high-performance inference independent of Python and in many different production environments, such as on mobile devices.

 

Conclusion

With PyTorch, you can efficiently and elegantly develop and train both simple and very complex neural networks. Thanks to dynamic graphs, you can experiment with very flexible architectures and use standard debugging tools without any problem. The seamless connection to Python allows for speedy development of prototypes. These features currently make PyTorch the most popular framework for researchers and experimentation-minded developers. The latest version also provides the ability to integrate PyTorch models into C++ applications to achieve better integration in production systems. This is significant progress compared to earlier versions. However, other frameworks, especially TensorFlow, still have a clear lead in this category. With TF Extended (TFX), TF Serving, and TF Lite, the Google framework provides much more application-friendly and robust tools for creating production-ready models. It will be interesting to see what new developments in this area we will see from PyTorch.

 

Too many ideas, too little data – Overcome the cold start problem
https://mlconference.ai/blog/many-ideas-little-data-overcome-cold-start-problem/ (12 Nov 2018)

The cold start problem affects both startups and established companies. Nonetheless, it also provides a great opportunity to collect new data with your customer's problem in focus. How do you solve the cold start problem and arrive at a useful data pipeline? We talked to ML Conference speakers Markus Nutz and Thomas Pawlitzki about all this and more.

Data scientists and product owners have a lot of great ideas. But often these ideas are missing data to answer the given questions and build a solution around them. We talked to ML Conference speakers Markus Nutz and Thomas Pawlitzki about how to build a data pipeline starting from “zero data”.

Find out how to solve the cold start problem!

JAXenter: Databases need maintenance, we know that. But over the years impenetrable data thickets have grown in many companies. In your session you talk about unraveling the chaos, but where do you start?  

Markus Nutz: Fortunately, Freeyou hasn’t been around for that long, so we’ve been able to keep track of everything so far. The answer is probably pretty boring: documentation. Documentation includes all involved parties, which means that the requirements of the product owners, data scientists and data architects all have equal status.  We are aware that the data is our basis for differentiating ourselves from other insurers.

Thomas Pawlitzki: I have nothing more to add to this. Our own database is still controllable. The development team talks a lot about features and changes so that the individual team members are aware of database changes. You don’t have to explain anything to data gurus like Markus.

In the last few weeks, I have also looked at various frameworks that we can use in the development of our API. Some of them already offer features for data migration. For example, there you can store schema changes in relational databases as code and apply them, but also perform a rollback. Perhaps we will soon use such solutions to test the whole thing in its early stages.


JAXenter: How can we solve the “cold start problem”?

Markus Nutz: In general, keep your eyes open to see where and what kind of data is available. Statistics about traffic accidents, for example, are often available in small inquiries in the state parliament. This was quite surprising to me. Pictures for a first image classifier are available online. Customer inquiries arise all by themselves!

Thomas Pawlitzki: You should also consider when it makes sense to create your own model or which “ready-made” model to take. For example, we also use an API for image recognition. These APIs are very easy to integrate and do a really good job with general problems. We’d rather put our energy into providing solutions to problems which general APIs can’t solve. We still have very little data here. Fortunately Markus knows enough tricks to polish small data sets and still come up with usable models.

Markus Nutz: Data augmentation – e.g. changing images, inserting spelling mistakes into words, translating mails into English and back again, window slicing on time series data – these are all strategies that let us use the existing "few" data as efficiently as possible! When it comes to models, for images and text we of course rely on transfer learning; we are particularly interested in Tensorflow Hub, a library from Google for reusable machine learning modules.
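One of the strategies mentioned, window slicing for time series data, fits in a few lines. This is a rough sketch; the window length and stride are arbitrary illustration values:

# Window slicing: one long series becomes many shorter training examples
import numpy as np

def window_slices(series, window=50, stride=10):
    return np.array([series[i:i + window]
                     for i in range(0, len(series) - window + 1, stride)])

series = np.sin(np.linspace(0, 20, 500))  # toy time series
windows = window_slices(series)           # shape: (46, 50)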

In general, we also pay attention to using suitable models for our existing data, which don’t require the largest amounts of data to function well. Logistic regression or random forests are simply super!

JAXenter: In connection with the construction of a data pipeline you speak of “zero data”. Please give us a concrete example.

Markus Nutz: Oh, that was misleadingly described then. We chose "zero data" because data – now it's getting trite – exists everywhere around us and is also available to us. We can evaluate initial ideas with datasets from Kaggle or the relatively new Google Dataset Search, and with official statistics or OpenStreetMap data. This incredibly detailed data allows us, for example, to estimate the risk of vandalism for bicycles and cars at a given location, or to find a good route from bicycle dealer to bicycle dealer for our sales team. It's a free lunch, so to speak.

Thomas Pawlitzki: Yes, that was really surprising and enjoyable when we ran into the problem of theft data in a workshop on our bike insurance. We had briefly considered how we could get access to a good database and whether we should approach the various police stations. However, a 5-minute search on the net showed that (at least for the location we examined) a daily newspaper offers up-to-date data. We were surprised and of course very happy about that.

JAXenter: How do you maintain your data pipeline?

Markus Nutz: Phew! I’d like to have a good answer to that, but we don’t have a good recipe yet. I’d say: testing. What helps in any case is that we, as an organization, have a common understanding. Data is what enables us to offer a better product that can distinguish us from the market. That’s why we’re all very motivated to make this happen!

Thomas Pawlitzki:  Yes, sometimes we are a bit “casual” and there’s still room for improvement. Nevertheless, the whole thing works surprisingly well, probably due to the great commitment of all the team members.

Thank you very much!


Markus Nutz and Thomas Pawlitzki will be delivering a talk at ML Conference in Berlin on Wednesday, December 5 about their experience with the cold start problem and building a data pipeline. Starting from "zero data", how do they arrive at a data pipeline with open, found and collected data? Their data pipeline enables building data products that help customers in their daily life.


 

Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology at the Julius-Maximilians-University Würzburg.

 

 

 

 

 

"Designing proper data collection today improves the quality of ML outcomes tomorrow"
https://mlconference.ai/blog/designing-proper-data-collection-today-improves-quality-ml-outcomes-tomorrow/ (9 Nov 2018)

Machine learning may have all sorts of use cases, but forecasting? In honor of the upcoming ML Conference, we talked to Philipp Beer about how data scientists can utilize ML in statistical forecasting. We talk about the advantages and disadvantages of modern vs. classical methods, how one can decide between the two, and where to turn for good predictions for business KPIs.

JAXenter: Classical statistical forecasting is still used in many businesses. Please give us an example where it is still used and where its use still makes sense.

Philipp Beer:  Brad Efron once said, “Those who ignore Statistics are condemned to reinvent it.”

Machine learning algorithms, with proper design, can have great powers of generalization. Together with feature selection, feature engineering and encoding are key steps that can lead to algorithms usable for different datasets.

On the other hand, statistical models require a lot of effort for tuning parameters and finding optimal models. Therefore, they are found where models are well understood and driven by theory. Additionally, statistics also generates reasonable results where the available data volume is not large enough for machine learning. Consequently, statistical methods are also less demanding in terms of computational power. That same thriftiness cannot be attributed to machine learning algorithms.


JAXenter: You’re talking about “data hungry” machine learning algorithms. Forecasting in times of ML needs a lot of data. Where is this data coming from? Who collects it?

Philipp Beer: Data needs to come from necessities. Organizations often ask themselves how they can collect more information to harness the power of machine learning. From my point of view, this approach will not yield good results.

A more insightful approach is to ask good questions. Which kind of data do you need to tackle your most pressing tasks at hand? Knowing the answer to that usually helps to identify where the data should be coming from. It also helps a great deal to understand that the data does not necessarily need to be generated in house. Third-party data that can be licensed (e.g. commodity prices) or open data (e.g. weather, population change) may be the place to go to fulfill your business needs.

Having big data today is not a question anymore. Having the right data, however, is not always given. Designing proper data collection today improves the quality of outcomes tomorrow.

With this perspective an organization is in a good position to tackle all of the five W’s regarding the needed data.

JAXenter: How can one decide which method – classical statistics or machine learning – fits the own case?

Philipp Beer: Whatever yields the best results! An a priori guide can only give very general guidance.

To identify the right method, developers need to conduct a detailed exploration and analysis of the data. That will allow them to rule out certain methods and approaches and leave a smaller subset that may yield good results. This will be true for statistical and ML approaches.

In order to get a definitive answer, the remaining methods have to be compared in the results that they produce. In the case of a time-series, predictions and models need to compete side by side and their predictive power determined. If both of them give good results for prediction, there’s no need to choose; just use both of them.
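Such a side-by-side comparison can be sketched in a few lines. This is an illustrative harness only; the two competing forecasts here are simple baselines standing in for whatever statistical or ML models are actually being compared:

# Compare two forecasting approaches on a holdout period via MAE
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

rng = np.random.default_rng(0)
y = np.sin(np.arange(72) * 2 * np.pi / 12) + 0.1 * rng.standard_normal(72)
train, test = y[:-12], y[-12:]

last_value = np.repeat(train[-1], 12)  # naive baseline: repeat last value
seasonal = train[-12:]                 # seasonal-naive: repeat last season

print('last-value MAE:', mae(test, last_value))
print('seasonal   MAE:', mae(test, seasonal))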

JAXenter: Let’s make a different forecast. How long will it take for machine learning to take over the forecasting sector?

Philipp Beer: I don’t think that machine learning will supplant statistical methods, because both have complementary capabilities.

Machine learning will become an equally important component in time-series forecasting in the next 2 – 3 years. The adoption will be driven by convenience and integration. Machine learning needs to become accessible for all interested stakeholders – not only in time-series forecasting. As machine learning results become more seamlessly integrated into organizations, it will grow into a pillar of future development in all areas of our society.


Philipp Beer will be delivering a talk at ML Conference in Berlin on Wednesday, December 5 that goes over the advantages and disadvantages of modern vs. classical methods: how can one decide between the two, and where should one turn for good predictions for business KPIs?


Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology at the Julius-Maximilians-University Würzburg.

 

 

 

 

 

Man & Machines: The Dreamteam for your intelligent Marketing Strategy
https://mlconference.ai/blog/man-machines-dreamteam-intelligent-marketing-strategy/ (27 Sep 2018)

Machine learning enables customized conversations between man and machine that can result in buying decisions. We asked Tina Nord and Kathleen Jaedtke to explain how this can be achieved through the use of dialogue-oriented technologies. Let's take a look at how communication between man and machines works.

JAXenter: Customer contact via a machine sounds exciting. What exactly do you mean by “artificial intelligence” in connection with marketing strategies? Are you talking about chatbots?

Tina and Kathleen: It's about much more than chatbots. A simple chatbot is not necessarily based on Machine Learning (ML); a simple dialogue between man and machine can be programmed in a relatively uncomplicated way. ML only becomes relevant when a bot or intelligent assistant is supposed to process complex speech or text input from its human counterpart. And even that is only one of many use cases. Those who want to deal with Artificial Intelligence (AI) and marketing should first deal with the processing and generation of natural language (NLP & NLG) as well as with machine vision.

Subforms of machine learning as processing and generation of natural language have the potential to speed up or simplify work processes.

These subforms of machine learning influence, for example, the search behavior and the expectations of users, or enable new, intuitive and faster types of dialogue. They also have the potential to speed up or simplify work processes. This can be, for example, the automated creation and translation of texts or the provision of automatically pre-sorted images that correspond to a certain corporate identity. In this way, AI can contribute to achieving overarching strategic goals – such as cost leadership or differentiation.

JAXenter: What does a source of inspiration look like to you? Please use a short example to explain what this can look like.

Tina and Kathleen: A source of inspiration for us are the users. Their feedback is essential when it comes to the use of new innovative technologies. Intelligent assistants or robots are a great thing, but they only have real added value if they simplify users' lives. A simple example is text search: typing endless strings of words into search engines is usually time-consuming and leads to an endless series of search results, but not quickly and easily to the desired information. Only through the use of machine learning do search results become more personally relevant. Moreover, with voice search we no longer have to type the search term, and spelling becomes irrelevant when pronouncing the search word.

Visual search even reduces finding visual inspiration to a click of the camera shutter. All three examples are based on ML and speed up and simplify finding information. New technology makes it easier for us to search the Internet and replaces the tedious text search. Conclusion: machine learning only has a right to exist if there is true added value for the user.


JAXenter: What do dialogue-oriented technologies look like in practice and what’s under their hood?

Tina and Kathleen: The best-known example of a dialogue-oriented technology is probably Google Duplex, a function of the Google Assistant that arranges appointments with a human voice or books a table in a favorite restaurant. However, such advanced features are usually not available in practice. More likely is the use of so-called Google Actions or Alexa Skills. These often do not go beyond functions such as weather or news queries.

The development of such Conversational Interfaces is (still) complex and what is under the hood can be better explained by a software engineer. However, numerous companies are working to change this. In the future, everyone will be able to create new skills or actions and make them available to users with just a few clicks.

JAXenter: Where is the journey headed? Let’s see this from the customer’s perspective. Will we be communicating with a machine via voice or text input in the future?

Tina and Kathleen: For us and for many other experts it is clear that the near future is called “voice first”. Very soon we will be talking not only with our smartphone, but also with the fridge or the washing machine. Our environment will be our dialogue partner, regardless of whether we are in our own four walls or, for example, at the train station.

We will touch fewer things and navigate with gestures or speech instead. Language barriers will disappear through real-time translations. In addition, it can already be observed that machine vision is increasingly being combined with language functions. The relaunch of Google Glass or the voice-activated selfie filters of Snapchat Lens, for example, speak for themselves. So the future is voice and visual first.

Thank you!

 

Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology at the Julius-Maximilians-University Würzburg.

 

 

 

 

 

 

 

Find the outlier: Detecting sales fraud with machine learning
https://mlconference.ai/blog/find-outlier-detecting-sales-fraud-machine-learning/ (6 Jun 2018)

We spoke to data expert Canburak Tümer about how machine learning is being used to detect fraud in sales transactions. Find out how ML technology is helping to keep this tricky job under control and what it looks for when crunching the data.

JAXenter: Hello Canburak! Your session at the Machine Learning Conference is titled Anomaly detection in sales point transactions. What does this mean? Do you have an example?

Canburak Tümer: Let me first define what I mean by sales point. Sales points are the locations where Turkcell Superonline gathers new subscribers. They can be a shop belonging to Turkcell, a franchise, or sometimes a booth at an event. An anomaly in sales usually occurs in the number of new subscriptions; if a shop usually sells x subscriptions in a day and suddenly sells twice as many in one day, there is an anomaly, and it may point to fraud. We report this anomaly to revenue assurance teams to investigate.

The other type of anomaly is between different shops. We expect similar numbers for shops of the same type in the same town, but there can be outliers. These outliers should be investigated for potential fraud. So an anomaly in sales may indicate a fraudulent action.

JAXenter: What parameters do you look for when looking for an anomaly?

Canburak Tümer: Our main parameter is the number of new subscriptions over different intervals (daily, weekly, monthly, 6 months), supported by town and sales point type information. But in further research, we will also look at the cancellation numbers of these new subscriptions, complaint numbers, and average churn tenure.

JAXenter: How can outlying sales points be identified?

Canburak Tümer: For detecting the outlying shop in a town, we are now using the interquartile range method. This is a basic and trusted method to detect outliers in a set of records. We are also evaluating the hierarchical clustering method, by choosing a good cut-off point; hierarchical clustering can help us to detect non-normal points in the data.
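The interquartile range test mentioned here is easy to sketch. The numbers below are made up for illustration; they are not Turkcell data:

# IQR outlier test over the daily sales counts of shops in one town
import numpy as np

sales = np.array([12, 15, 11, 14, 13, 45, 12, 16])  # one shop sells 45

q1, q3 = np.percentile(sales, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # the usual 1.5*IQR fences

outliers = sales[(sales < low) | (sales > high)]
print(outliers)  # -> [45]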

JAXenter: Why is it more complex to find outlier sales points? What is necessary for this?

Canburak Tümer: For a single sales point, it is easier to detect the trend, then predict the sales for the next time interval and check whether the actual data matches the predicted value. But when it comes to comparing different sales points, new features come into play. First of all, the location and its population affect the sales.

Then there is the type of the sales point: an online or telesales point cannot be compared to a local shop. As the number of features increases, model complexity increases along with it. In order to keep things simple, we group the sales points by location and type, then use the simple methods to detect outliers.

JAXenter: Thank you!

 

 

Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology at the Julius-Maximilians-University Würzburg.

 

 

 

 

 

 

Preparing Text Input for Machine Learning
https://mlconference.ai/blog/peparing-text-input-machine-learning/ (15 May 2018)

ML Conference speaker Christoph Henkelmann says machine learning is basically nothing more than a numbers game. We've taken a closer look at what he means by that and asked him to explain the principles of text processing from a machine's point of view in more detail.

JAXenter: What is the difference between image and text from a machine’s point of view?

Christoph Henkelmann: Almost all ML methods, especially neural networks, want tensors (multidimensional arrays of numbers) as input. In the case of an image, the transformation is obvious: we already have a three-dimensional array of pixels (width x height x color channel), i.e. apart from minor preprocessing, the image is already "bite-sized". For text, there is no obvious representation. Text and words exist at a higher level of meaning; if, for example, you simply feed Unicode-encoded letters as numbers into the network, the jump from encoding to semantics is too great. We also expect systems that work with text to perform semantically more demanding tasks. If a machine recognizes a cat in an image, that's impressive. But it is not impressive if a machine detects the word "cat" in a sentence.

JAXenter: Why do problems arise concerning Unicode normalization?

Christoph Henkelmann: One would actually like to think that Unicode does not have to be normalized at all – after all, it is intended to finally solve all the encoding problems from the early days of word processing. But the devil is in the details. Unicode is enormously complex because language is enormously complex. There are six different types of spaces in Unicode. If you use the standard methods of some programming languages to split text from different sources, you suddenly wonder why words still stick together. Also, the representation of words is not unique; for example, there are two Unicode encodings of the word "Munich". If you then compare character by character, "Munich" suddenly no longer equals "Munich". If you forget something like this in preprocessing, you train the system on unclean data – and of course this does not give a good result.
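The "Munich" example can be reproduced with Python's standard library; the German form "München" makes the effect visible, since the "ü" can be stored precomposed or as "u" plus a combining diaeresis:

# Two Unicode encodings of the same word compare unequal until normalized
import unicodedata

a = 'M\u00fcnchen'   # 'München' with precomposed 'ü'
b = 'Mu\u0308nchen'  # 'München' with 'u' + combining diaeresis

print(a == b)  # False

na = unicodedata.normalize('NFC', a)
nb = unicodedata.normalize('NFC', b)
print(na == nb)  # True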

JAXenter: You speak of different ways of displaying text – are there several and what is it all about?

Christoph Henkelmann: Since we do not have such an "obvious" representation of text, there are many different ways to feed text into an ML system. One can choose different "granularities": starting with low-level methods, where a number really is assigned to each letter – basically the same as with a text file – through methods where individual words are encoded as the smallest unit, to methods where a tensor is generated from an entire document, which is actually more of a "fingerprint" of the document. Then there are a number of technical variants for each of these approaches. The complicated thing is: there is no single best approach; depending on the problem, you have to choose the right one.
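The word-level granularity, for example, can be sketched in a few lines: each distinct word gets an integer index, and a sentence becomes a sequence of indices (toy corpus, purely for illustration):

# Encoding words as integer indices, the smallest unit being one word
corpus = ['the cat sat', 'the cat ran']

vocab = {}
for sentence in corpus:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))

encoded = [[vocab[w] for w in s.split()] for s in corpus]
print(vocab)    # {'the': 0, 'cat': 1, 'sat': 2, 'ran': 3}
print(encoded)  # [[0, 1, 2], [0, 1, 3]]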

JAXenter: Is it also about coding semantics? Word2vec?

Christoph Henkelmann: Exactly – much more than with images or audio, the preprocessing of text affects the semantic level at which the process operates. Sometimes preprocessing itself is already a kind of machine learning, so that we can already answer questions simply because we have encoded the text differently. The best-known and currently much-discussed example is word2vec. Once you have created a word2vec encoding, you can answer semantic questions like "King – Man + Woman = ?". Here you can read the answer "Queen" directly from the word2vec encoding. Word analogies can also be solved, e.g. "Berlin is to Germany as Rome is to ?". Word2vec delivers the answer "Italy". The semantic meaning results only from the mathematical distance between the encodings. The system "does not know" what a country or a capital is, it only knows the (high-dimensional) distance between the words. This is an incredibly useful representation of words for ML systems and therefore also the final part of my presentation at the next ML Conference in Munich.
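With pre-trained word2vec vectors, these analogy queries are one-liners, for example via the gensim library. The vector file name here is a placeholder assumption; any word2vec-format model would do:

# Hypothetical usage of pre-trained word2vec vectors with gensim
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format('word2vec-vectors.bin', binary=True)

# "King - Man + Woman = ?"
print(kv.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))
# -> e.g. [('queen', 0.71)]

# "Berlin is to Germany as Rome is to ?"
print(kv.most_similar(positive=['rome', 'germany'], negative=['berlin'], topn=1))
# -> e.g. [('italy', 0.68)]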

Thank you very much!


Carina Schipper

Carina Schipper has been an editor at Java Magazin, Business Technology and JAXenter since 2017. She studied German Studies and European Ethnology / Folklore at the Julius-Maximilians-Universität Würzburg.


The post Preparing Text Input for Machine Learning appeared first on ML Conference.

]]>
An interdisciplinary approach to artificial intelligence testing https://mlconference.ai/blog/interdisciplinary-approach-artificial-intelligence-testing/ Tue, 08 May 2018 13:14:26 +0000 https://mlconference.ai/?p=9457 Humanity is confronted more than ever with artificial intelligence (AI), yet it is still challenging to find a common ground. We talked with Marisa Tschopp, researcher at scip ag, about the Artificial Intelligence Quotient (A-IQ), how to automate A-IQ testing, and more.

The post An interdisciplinary approach to artificial intelligence testing appeared first on ML Conference.

]]>
JAXenter: The term ‘intelligence’ is not easy to understand. What’s the best way to explain it and how can we apply it to machines? 

Marisa Tschopp: Human intelligence has been a very controversial topic and has undergone dramatic changes since the beginnings of intelligence research in the late 19th century. Intelligence gained importance especially in the educational context, as these "mental abilities" were the best predictors of success in school and were used to place students into the right classes. There are various, very elaborate theories that define human intelligence. Nowadays, human intelligence is viewed from a more systemic perspective and incorporates various dimensions, not only the ability to calculate or solve riddles.

It is not easy to define human intelligence, and the same applies to machine intelligence. We must be aware that we are still in the process of clarifying terms and definitions around AI. For our research, we created the intelligence test from an interdisciplinary perspective: we analyzed the various theories and created our own intelligence framework, based on what is currently appropriate in an AI context. Our framework is understood as a system of abilities:

  • to understand ideas (e.g. questions or commands) in a specific environment
  • to learn from experiences (e.g. referring to prior information or putting it in context)
  • to engage in reasoning to solve problems (e.g. to answer questions or solve tasks).

Areas of human intelligence include verbal skills, such as knowledge, understanding, and numerical reasoning, as well as spatial and visual abilities, such as solving a puzzle or arranging images in a logical manner. Other dimensions are inter- and intrapersonal competencies, and physiological or language skills. From the myriad of existing sub-skills, we have chosen several dimensions for testing:

  • Explicit Knowledge
  • Language Aptitude
  • Working Memory
  • Verbal- and Numerical Reasoning
  • Critical and Creative Thinking


JAXenter: And what is Bloom’s Taxonomy? Could you explain the reasoning behind it?

Marisa Tschopp: The intelligence domains aim to measure specific abilities, which all contribute individually with varying significance to the overall concept of interdisciplinary artificial intelligence. Furthermore, we have included Bloom’s Taxonomy to better understand the underlying hierarchies of thought.

Bloom describes thinking along a dimension from lower-order to higher-order skills. The domain Explicit Knowledge, for example, measures know-what as opposed to know-how: it is comparable to information or data found in books or documents, like lexical knowledge – this domain is rated as a lower-order thinking skill. At the other end are the higher-order thinking skills, represented as Creative and Critical Thinking in our model.

When we want to know if a machine is capable of higher-order thinking, we measure its ability to define and analyze a problem and to formulate adequate counter-questions in order to get to a better solution. As part of the Critical Thinking domain, we investigate how the machine handles over-simplification, ambiguous questions, and answer uncertainty. In the end, we try to merge the best scientific approaches to get the best results; a result is good when it is valid, meaning that it accurately measures the actual capabilities.

JAXenter: What are academic IQ tests and how do they work?

Marisa Tschopp: Academic IQ tests aim to quantify intelligence in an objective manner. Scientific standards play a critical role here, for example retest reliability, which measures the correlation between the results of the same test taken at different times.

In short: the IQ is a standardized, numerical measurement of intelligence, with the Stanford-Binet and Wechsler scales being those most in use. Nowadays, the intelligence quotient is a measure of deviation. This means that if you take a valid, standardized test, your result is compared to those of other test takers. The results follow a normal distribution, which means that the majority of people have an IQ around 100 and only about 5% of test takers score very high or very low – that is, at the extreme ends of the scale.
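
As a small worked example of the deviation idea – a sketch using only the Python standard library, with the Wechsler convention of mean 100 and standard deviation 15:

from statistics import NormalDist

def iq_from_z(z, mean=100.0, sd=15.0):
    """Convert a z-score (deviation from the population mean) to an IQ score."""
    return mean + sd * z

population = NormalDist(mu=100, sigma=15)

print(iq_from_z(0))   # 100.0 – an exactly average result
print(iq_from_z(2))   # 130.0 – two standard deviations above the mean
print(1 - population.cdf(130))  # ~0.023: only about 2% score 130 or higher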

JAXenter: Are there plans to automate A-IQ testing? Can you talk us through the concept?

Marisa Tschopp: In the future, we want to execute A-IQ tests with all kinds of digital assistants, independent of their ecosystem. We are working on a solution to automate the A-IQ testing procedure and make it available to the broad public. This system will take over the role of the personal analyst – the investigator who currently evaluates the test manually, which is quite time-consuming.

A-IQ test questions are administered acoustically from a computer (emulating the analyst) to the digital assistant taking the test. Answers are saved as audio data (e.g. mp3 files), which are transformed into transcripts via speech2text. This allows a continuous comparison with past test results. A distance-based method like Soundex or Levenshtein is then used to determine contextual differences. Deviations are reported to the research department to identify implications and track changes in AI capabilities.
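
For illustration, here is a minimal Levenshtein distance implementation in Python (the standard dynamic-programming formulation; the transcript strings are invented):

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

# Comparing a new transcript against a past answer:
print(levenshtein("the capital of italy is rome",
                  "the capitol of italy is rome"))  # 1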

Thank you!

 

The post An interdisciplinary approach to artificial intelligence testing appeared first on ML Conference.

]]>
Cracking open the black box of Neural Networks https://mlconference.ai/blog/cracking-open-black-box-neural-networks/ Fri, 20 Apr 2018 09:39:30 +0000 https://mlconference.ai/?p=9413 The countdown to the Machine Learning conference in Berlin keeps ticking. We spoke with ML conference speaker and ML6 head of Applied Research Xander Steenbrugge about the “black box problem” in neural networks. Catch more of AI expert Xander Steenbrugge during his keynote talk, session, and workshop.

The post Cracking open the black box of Neural Networks appeared first on ML Conference.

]]>

JAXenter: You talk about neural networks in your keynote. Can you give us a very concrete example of a neural network first?

Xander Steenbrugge: A neural network is a chain of trainable numerical transformations applied to some input data, yielding some output data. With this very general paradigm, we can build anything from image classifiers and speech-to-text engines to programs that beat the best humans at chess or Go.
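
A minimal sketch of such a chain in Keras – the layer sizes and toy data below are invented purely for illustration:

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(4,)),  # transformation 1
    keras.layers.Dense(2, activation="softmax"),                  # transformation 2
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy data: 100 samples with 4 features each, two classes.
x = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 2, size=100)
model.fit(x, y, epochs=3, verbose=0)  # training tunes the chain's parameters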

JAXenter: What’s behind the black box problem? 

Xander Steenbrugge: One of the major problems with current Deep Learning techniques is that trained models are very hard to interpret: they consist of millions of parameters that all interact in very complicated ways to achieve the task they were trained for. You can't just look at them and say, "Aha, so this is what it's doing." This makes it tricky to apply them in domains where safety and operational predictability are crucial. Across many application areas, we are left with a choice between a 90% accurate model we understand and a 99% accurate model we don't. But if that model is in charge of diagnosing you and suggesting a medical treatment, which would you choose?

JAXenter: There are various ways to fool neural networks to make obvious mistakes called ‘adversarial attacks‘. What is the significance of adversarial attacks for ML applications and how will this point develop?

Xander Steenbrugge: Adversarial attacks are significant because they pose a severe security risk for existing ML applications. The biggest problem is that most adversarial attacks are undetectable by humans, making them a nasty "under the radar" problem. Imagine a self-driving car that fails to recognize a stop sign because someone stuck an adversarial sticker on it – no need to explain why that is a very serious issue. Adversarial examples have exposed a weakness in the current generation of neural network models that is not present in our biological brains, and many research groups are now working to fix it, very likely paving the road for exciting new discoveries and potential breakthroughs in the field of AI.
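
One well-known attack is the fast gradient sign method (FGSM). A minimal TensorFlow sketch – it assumes an existing classifier and an image batch scaled to [0, 1]:

import tensorflow as tf

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Nudge each pixel slightly in the direction that increases the loss."""
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)
        loss = tf.keras.losses.sparse_categorical_crossentropy(label, prediction)
    gradient = tape.gradient(loss, image)
    adversarial = image + epsilon * tf.sign(gradient)  # imperceptible per-pixel change
    return tf.clip_by_value(adversarial, 0.0, 1.0)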

JAXenter: Why can’t black boxes be interpreted and what approaches is research taking in this context?

Xander Steenbrugge: Neural nets are uninterpretable because there are too many parameters, too many moving parts, for a human to follow. The research community is now actively working on new tools to bridge this gap. The first successful techniques try to generate pictures of what individual neurons in the network are looking at, giving an idea of the active components in the network. Recent works also try to create trainable interfaces that map a network's decision process onto a representation humans can interpret, using techniques like attention and even natural language.
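
Those "pictures of what individual neurons are looking at" are typically produced by gradient ascent on the input. A sketch of the idea in TensorFlow – the model, layer name, and image size are assumptions for the example:

import tensorflow as tf

def visualize_channel(model, layer_name, channel, steps=100, lr=1.0):
    """Synthesize an input that maximally activates one channel of a layer."""
    layer_output = model.get_layer(layer_name).output
    extractor = tf.keras.Model(inputs=model.inputs, outputs=layer_output)
    image = tf.Variable(tf.random.uniform((1, 224, 224, 3)))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = extractor(image)
            objective = tf.reduce_mean(activation[..., channel])
        grad = tape.gradient(objective, image)
        image.assign_add(lr * grad / (tf.norm(grad) + 1e-8))  # normalized ascent step
    return image.numpy()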

Thank you!


Carina Schipper

Carina Schipper has been an editor at Java Magazine, Business Technology and JAXenter since 2017. She studied German and European Ethnology / Folklore at the Julius-Maximilians-Universität Würzburg.


The post Cracking open the black box of Neural Networks appeared first on ML Conference.

]]>