The Conference for Machine Learning Innovation

Essential Workshop to Exploratory Data Analysis and Feature Engineering

Monday, December 9, 2019
09:00 - 17:00
Booking note:
Essential Workshop
Most experienced data scientists would agree that data processing takes up most of the time in a machine learning project. The quality of both data preprocessing and feature engineering is crucial for model performance, yet neither is easy to get right. Real data tends to come with noise, missing values, redundant information, and similar problems, and building a good feature vector is just as hard. In this workshop, you will learn some simple but effective ways of handling these problems, using a public Google Play Store dataset as an example.

First, we will explore and preprocess the data: clean it, fix errors, convert columns to appropriate types, and so on. Then we will analyze the data more thoroughly, looking at correlations, relationships, and the distributions of variables. After that, we will drop the least useful features and try to engineer new ones. Finally, you will train several machine learning models and see how data processing and feature engineering affect model accuracy and training time.

The workshop will cover:
  1. Primary data analysis and preprocessing
  2. Exploratory data analysis
  3. Feature engineering
  4. Machine learning model training and evaluation
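
As a rough illustration of what steps 1 and 3 can look like in practice, here is a minimal sketch in plain Python. It is not the workshop's material: the records, field names, and helper functions are all made up for illustration, and it only assumes that app-store data often stores numbers as formatted strings (for example install counts with commas and a trailing plus sign). Exploratory analysis and model training are left out.

```python
import math
from statistics import median

# Hypothetical raw records in the style of an app-store listing. The field
# names and values are invented for illustration, not taken from the dataset.
RAW_ROWS = [
    {"app": "A", "rating": "4.1", "installs": "10,000+", "price": "0"},
    {"app": "B", "rating": "", "installs": "500,000+", "price": "$4.99"},
    {"app": "C", "rating": "3.9", "installs": "1,000+", "price": "0"},
    {"app": "C", "rating": "3.9", "installs": "1,000+", "price": "0"},  # duplicate
]


def parse_installs(text):
    """Convert a string such as '10,000+' to the integer 10000."""
    return int(text.replace(",", "").rstrip("+"))


def parse_price(text):
    """Convert a string such as '$4.99' or '0' to a float."""
    return float(text.lstrip("$"))


def preprocess(rows):
    """Drop exact duplicates, convert types, and impute missing ratings."""
    seen, cleaned = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # skip rows that repeat an earlier record exactly
        seen.add(key)
        cleaned.append({
            "app": row["app"],
            "rating": float(row["rating"]) if row["rating"] else None,
            "installs": parse_installs(row["installs"]),
            "price": parse_price(row["price"]),
        })
    # Fill missing ratings with the median of the observed ones.
    fill = median(r["rating"] for r in cleaned if r["rating"] is not None)
    for r in cleaned:
        if r["rating"] is None:
            r["rating"] = fill
    return cleaned


def add_features(rows):
    """Engineer two simple features: log-scaled installs and a paid flag."""
    for r in rows:
        r["log_installs"] = math.log10(r["installs"])
        r["is_paid"] = int(r["price"] > 0)
    return rows


rows = add_features(preprocess(RAW_ROWS))
```

Median imputation and a log transform are only two of many possible choices here; which ones actually help depends on the data and the model.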

Participants should have a basic knowledge of Python and machine learning. If you do not have any coding experience, you are still welcome to join us to get a sense of how much effort goes into developing a machine learning project. All solutions will be provided during the workshop. Each participant should bring their own laptop and make sure that VPN restrictions do not block the connection to Google Colab.
