3 posts tagged with "Postgres"

· 10 min read
Matthias Broecheler

A common problem in search is ordering large result sets. Consider a user searching for “jacket” on an e-commerce platform. How do we order the large number of results to show the most relevant products first? In other words, what kind of jackets is the user looking for? Suit jackets, sport jackets, winter jackets?

Often, we have the context to infer what kind of jacket a user is looking for based on their interactions on the site. For example, if a user has men’s running shoes in their shopping cart, they are likely looking for men’s sports jackets when they search for “jacket”.

At least to a human, that seems pretty obvious. Yet Amazon returns a somewhat random assortment of jackets in this scenario, as shown in the screenshot below.

Amazon search results for `jacket`

To humans, the semantic association between “running shoes” and “sport jackets” is natural, but for machines, making such associations has been a challenge. With recent advances in large language models (LLMs), computers can now compute semantic similarities with high accuracy.

We are going to use LLMs to compute the semantic context of past user interactions via vector embeddings, aggregate them into a semantic profile, and then use the semantic profile to order search results by their semantic similarity to a user’s profile.

In other words, we are going to rank search results by their semantic similarity to the things a user has been browsing. That gives us the context we are missing when the user enters a search query.
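As a rough illustration of that idea, here is a minimal sketch in plain Python. It assumes the semantic profile is the component-wise average of the embeddings of browsed items and that similarity is measured with cosine similarity; both are common choices, not necessarily the ones used later in the article. The helper names and the three-dimensional toy vectors are made up for the example. Real embeddings come from a model and have hundreds of dimensions.

```python
import math


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_profile(profile, results):
    """Sort (name, embedding) pairs by descending similarity to the profile."""
    return sorted(
        results,
        key=lambda item: cosine_similarity(profile, item[1]),
        reverse=True,
    )


# Toy embeddings (hypothetical values, for illustration only).
# Items the user has been browsing, e.g. running shoes:
browsed = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
profile = mean_vector(browsed)

# Unordered results for the query "jacket":
results = [
    ("suit jacket", [0.1, 0.9, 0.1]),
    ("running jacket", [0.9, 0.2, 0.0]),
    ("winter parka", [0.2, 0.3, 0.9]),
]

ranked = [name for name, _ in rank_by_profile(profile, results)]
print(ranked)
```

With these toy vectors, the running jacket ranks first because its embedding points in nearly the same direction as the profile built from the browsed items.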

In this article, you will learn how to build a personalized shopping search with semantic vector embeddings step-by-step. You can apply the techniques in this article to any kind of search where a user can browse and search a collection of items: event search, knowledge bases, content search, etc.

· 11 min read
Matthias Broecheler

Let’s build a personalized recommendation engine using AI as an event-driven microservice with Kafka, Flink, and Postgres. And since Current23 is starting soon, we will use the events of this event-driven conference as our input data (sorry for the pun). You’ll learn how to apply AI techniques to streaming data and what talks you want to attend at the Kafka conference - double win!

We will implement the whole microservice in 50 lines of code thanks to the DataSQRL compiler, which eliminates all the data plumbing so we can focus on building.

Watch the video to see the microservice in action or read below for step-by-step building instructions and details.

What We Will Build

We are going to build a recommendation engine and semantic search that uses AI to provide personalized results for users based on user interactions.

Let’s break that down: Our input data is a stream of conference events, namely the talks with title, abstract, speakers, time, and so forth. We consume this data from an external data source.

In addition, our microservice has endpoints to capture which talks a user has liked and what interests a user has expressed. We use those user interactions to create a semantic user profile for personalized recommendations and personalized search results.

We create the semantic user profile through vector embeddings, an AI technique for mapping text to numbers in a way that preserves the content of the text for comparison. It’s a great tool for representing the meaning of text in a computable way. It's like mapping addresses (i.e. street, city, zip, country) onto geo-coordinates. It’s hard to compare two addresses, but easy to compute the distance between two geo-coordinates. Vector embeddings do the same thing for natural language text.

Those semantic profiles are then used to serve recommendations and personalized search results.
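Because user interactions arrive one at a time, a profile like this can be kept up to date incrementally. The following is a minimal sketch that assumes the profile is a running average of the embeddings of liked talks. The `SemanticProfile` class and its method names are hypothetical and are not part of DataSQRL or any other library mentioned here.

```python
class SemanticProfile:
    """Running average of the embeddings a user has interacted with."""

    def __init__(self, dimensions):
        self.vector = [0.0] * dimensions
        self.count = 0

    def add(self, embedding):
        """Fold one more embedding into the average without storing history."""
        if len(embedding) != len(self.vector):
            raise ValueError("embedding has the wrong number of dimensions")
        self.count += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.vector = [
            mean + (x - mean) / self.count
            for mean, x in zip(self.vector, embedding)
        ]


# Two "like" events with toy 2-dimensional embeddings (made-up values).
profile = SemanticProfile(dimensions=2)
profile.add([1.0, 0.0])
profile.add([0.0, 1.0])
print(profile.vector, profile.count)
```

The incremental form only needs the current average and a count, which suits a stream of events where past interactions are not kept in memory.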

· 14 min read
Matthias Broecheler

When developing streaming applications or event-driven microservices, you face the decision of whether to preprocess data transformations in the stream engine or execute them as queries against the database at request time. The choice impacts your application’s performance, behavior, and cost. An incorrect decision results in unnecessary work and potential application failure.

To preprocess or to query?

In this article, we’ll delve into the tradeoff between preprocessing and querying, guiding you to make the right decision. We’ll also introduce tools to simplify this process. Plus, you’ll learn how building streaming applications is related to fine cuisine. It’ll be a fun journey through the land of stream processing and database querying. Let’s go!

Recap: Anatomy of a Streaming Application

If you're in the process of building an event-driven microservice or streaming application, let's recap what that entails. An event-driven microservice consumes data from one or multiple data streams, processes the data, writes the results to a data store, and exposes the final data through an API for external users to access.

The figure below visualizes the high-level architecture of a streaming application and its components: data streams (e.g. Kafka), stream processor (e.g. Flink), database (e.g. Postgres), and API server (e.g. GraphQL server).

Streaming Application Architecture

An actual event-driven microservice might have a more intricate architecture, but it will always include these four elements: a system for managing data streams, an engine for processing streaming data, a place to store the results, and a server to expose the service endpoint.

This means an event-driven architecture has two stages: the preprocess stage, which processes data as it streams in, and the query stage, which processes user requests against the API. Each stage handles data, but they differ in what triggers the processing: incoming data triggers the preprocess stage, while user requests trigger the query stage. The preprocess stage handles data before the user needs it, and the query stage handles data when the user explicitly requests it.
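To make the two stages concrete, here is a minimal in-memory sketch of the same transformation (counting likes per talk) done in each stage. The class names and the like-counting use case are assumptions chosen for illustration; a real system would use a stream processor and a database rather than Python objects.

```python
from collections import defaultdict


class PreprocessedLikes:
    """Work happens in the preprocess stage: aggregate on every event."""

    def __init__(self):
        self.likes_per_talk = defaultdict(int)

    def on_event(self, talk_id):
        # Triggered by incoming data: update the stored aggregate.
        self.likes_per_talk[talk_id] += 1

    def query(self, talk_id):
        # Triggered by a user request: a cheap lookup.
        return self.likes_per_talk.get(talk_id, 0)


class QueriedLikes:
    """Work happens in the query stage: store raw events, aggregate on request."""

    def __init__(self):
        self.events = []

    def on_event(self, talk_id):
        # Triggered by incoming data: just store the raw event.
        self.events.append(talk_id)

    def query(self, talk_id):
        # Triggered by a user request: scan and aggregate now.
        return sum(1 for event in self.events if event == talk_id)


preprocessed = PreprocessedLikes()
queried = QueriedLikes()
for talk_id in ["talk-1", "talk-2", "talk-1"]:
    preprocessed.on_event(talk_id)
    queried.on_event(talk_id)

print(preprocessed.query("talk-1"), queried.query("talk-1"))
```

Both variants return the same answer. They differ in when the work is done and in what is stored: the first pays on every incoming event and keeps only the aggregate, while the second pays on every request and keeps the raw events.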

Understanding these two stages is vital for the successful implementation of event-driven microservices. Unlike most web services with only a query stage or data pipelines with only a preprocess stage, event-driven microservices require a combination of both stages.

This leads to the question: Where should data transformations be processed? In the preprocess stage or the query stage? And what’s the difference, anyway? That’s what we will be investigating in this article.