Mar 18, 2024
Maximizing Machine Learning with Data Lakehouse and Databricks: A Guide to Enhanced AI Workflows
In today’s rapidly evolving data landscape, a Data Lakehouse architecture is becoming a key strategy for enhancing machine learning workflows. Databricks, a leader in unified data analytics, provides a platform that integrates with the data lakehouse model so that data engineers, data scientists, and machine learning (ML) developers can collaborate more effectively. In this article, we explore how Databricks, by providing a centralized data repository, helps organizations streamline data processing, accelerate model development, and unlock the full potential of artificial intelligence (AI). This approach not only improves scalability and efficiency but also supports end-to-end machine learning pipelines, from data ingestion to model deployment.
Feb 19, 2024
OpenAI Embeddings
Embedding vectors (or embeddings) play a central role in processing and interpreting unstructured data such as text, images, or audio files. Embeddings convert unstructured data, no matter how complex, into a structured form that software can process easily. OpenAI offers such embeddings, and this article goes over how they work and how they can be used.
Jan 4, 2024
Building a Proof of Concept Chatbot with OpenAI's API, PHP and Pinecone
We used OpenAI's API and PHP to develop a proof-of-concept chatbot that integrates with Pinecone, a vector database, to improve our homepage's search functionality and help our customers find answers more effectively. In this article, we explain the steps we have taken so far.
Jul 11, 2022
Take Control of ML Projects
The decision to move Elasticsearch to proprietary licensing awakened a sleeping giant. The open source community rapidly flexed its muscle to ensure a true open source option for fast and scalable search and analytics—which many users depend on for ML projects—would continue to be available. The result is OpenSearch, a community-driven hard fork of Elasticsearch 7.10.2, built with Apache Lucene and available under the fully open source Apache 2.0 license.
Feb 10, 2022
Python Developers live in Visual Studio Code
With over 18 million monthly users, VS Code has become one of the most popular and fastest-growing text editors in the world. To learn why over 3.7 million of them find VS Code to be the perfect habitat for Python development and data science work, keep on reading!
Oct 12, 2021
What is Data Annotation and how is it used in Machine Learning?
What is data annotation? And how is data annotation applied in ML? In this article, we delve deep to answer these key questions. Data annotation is valuable to ML and has contributed immensely to some of the cutting-edge technologies we enjoy today. Data annotators, the invisible workers of the ML workforce, are needed now more than ever.
Sep 14, 2021
Neuroph and DL4J
In this article, we show how neural networks, specifically the multilayer perceptron implementations of two Java frameworks, Neuroph and DL4J, can be used to detect blood cells in images.
Jul 20, 2021
Top 5 reasons to attend ML Conference
So you’ve decided to attend ML Conference but you don’t know how to break it to your boss that it is a win-win situation? Don’t worry, we’ve got you covered. Follow 4 simple steps and use these 5 arguments to show why your organization needs to invest in ML Conference!
Jun 9, 2021
Anomaly Detection as a Service with Metrics Advisor
We humans are usually good at spotting anomalies: often a quick glance at monitoring charts is enough to spot (or, in the best case, predict) a performance problem. A curve rises unnaturally fast, a value falls below a desired minimum, or there are fluctuations that cannot be explained rationally. Some of this could be detected by a simple automated if statement, but it's more fun with Azure Cognitive Services' new Metrics Advisor.
Dec 16, 2020
Let’s visualize the coronavirus pandemic
Since February, the media have inundated us with diagrams and graphics on the spread of the coronavirus. The data comes from freely accessible sources and can be used by everyone. But how do you turn the source data into a data set that can be used to create something visual like a dashboard? With Python and modules like pandas, this is no magic trick.