Tag

Python

19
Feb

OpenAI Embeddings

Embedding vectors (or embeddings) play a central role in processing and interpreting unstructured data such as text, images, or audio files. Embeddings convert unstructured data, no matter how complex, into a structured numerical form that software can process easily. OpenAI offers such embeddings, and this article explains how they work and how they can be used.
2
Feb

Address Matching with NLP in Python

Discover the power of address matching in real estate data management with this comprehensive guide. Learn how to leverage natural language processing (NLP) techniques using Python, including open-source libraries like spaCy and fuzzywuzzy, to parse, clean, and match addresses. From breaking down data silos to geocoding and point-in-polygon searches, this article provides a step-by-step approach to creating a Source-of-Truth Real Estate Dataset. Whether you're in geospatial analysis, real estate data management, logistics, or compliance, accurate address matching is the key to unlocking valuable insights.
15
Mar

On pythonic tracks

Python has established itself as a quasi-standard in the field of machine learning over the last few years, in part due to the broad availability of libraries. Understandably, Oracle did not enjoy watching this trend: after all, Java has to be widely used if Oracle wants to earn serious money with it. Some time ago, Oracle released its own library, Tribuo, under an open source license.
7
Jan

Real-time anomaly detection with Kafka and Isolation Forests

Anomalies, or outliers, are ubiquitous in data, whether they stem from sensor measurement errors, unexpected events in the environment, or the faulty behaviour of a machine. In many cases, it makes sense to detect such anomalies in real time in order to be able to react immediately. The data streaming platform Apache Kafka and the Python library scikit-learn provide us with the necessary tools for this.
16
Dec

Let’s visualize the coronavirus pandemic

Since February, we have been inundated in the media with diagrams and graphics on the spread of the coronavirus. The data comes from freely accessible sources and can be used by everyone. But how do you turn the source data into a data set that can be used to create something visual like a dashboard? With Python and modules like pandas, this is no magic trick.
11
Feb

Tutorial: Explainable Machine Learning with Python and SHAP

Machine learning algorithms can suffer from the “black box” problem: we don’t always know exactly how they arrive at their predictions. This may lead to unwanted consequences. In the following tutorial, Natalie Beyer will show you how to use the SHAP (SHapley Additive exPlanations) package in Python to get closer to explainable machine learning results.
28
Jan

Deep Learning: Not only in Python

Although there are powerful and comprehensive machine learning solutions for the JVM with frameworks such as DL4J, it may be necessary to use TensorFlow in practice. This can be the case, for example, if a certain algorithm exists only in a TensorFlow implementation and porting it to another framework would take too much effort. Although you interact with TensorFlow via a Python API, the underlying engine is written in C++. Using the TensorFlow Java wrapper library, you can train TensorFlow models and run inference on them from the JVM without having to rely on Python. Existing interfaces, data sources, and infrastructures can be integrated with TensorFlow without leaving the JVM.

Behind the Tracks