What is a Data Product?
A data product is a piece of software that processes data to deliver actionable, valuable insights or results. Data products take raw data as input, apply a series of transformations, algorithmic processes, or analytics, and produce information that is useful to customers, decision makers, or business operations. Data products range in complexity from simple analytics dashboards to personalized recommendation engines that use machine learning models and generative AI.
What Does a Data Product Consist Of?
A data product implementation moves through multiple stages to go from raw data to a valuable result.
Most data products consist of the following stages:
- Data Ingestion: A data product ingests or accesses one or multiple sources of data from other data systems like databases, file systems, object storage, queues, logs, data warehouses, data lakes, etc.
- Data Preparation: A data product prepares the input data for processing. Data preparation may require schema mapping, data normalization and augmentation, data cleansing, etc. Since a data product doesn't control the sources of data, there can be quite a bit of work in making the data readily usable.
- Data Integration: A data product transforms and links the input data into one coherent dataset for analysis. A data product transforms the input data to map it onto the right structure and establishes links or relationships between data records to enrich the combined dataset.
- Data Analysis: A data product analyzes the prepared data to derive valuable information, either as a new dataset or by enriching the prepared dataset. The analysis is defined by the logic of the data product and is often considered the core component of a data product. Data analysis may range in complexity from simple aggregations to the application of machine learning models and generative AI.
- Data Presentation: A data product produces results that are aligned with how the customer of the data product expects to consume the data.
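The five stages above can be sketched as a chain of small functions. This is a minimal illustration only: the in-memory sources `RAW_ORDERS` and `RAW_CUSTOMERS`, the field names, and the per-region total are all hypothetical and stand in for whatever sources and logic a real data product would use.

```python
from collections import defaultdict

# Hypothetical in-memory sources standing in for a database and a log.
RAW_ORDERS = [
    {"order_id": 1, "customer": " Alice ", "amount": "20.50"},
    {"order_id": 2, "customer": "bob", "amount": "5"},
    {"order_id": 3, "customer": "ALICE", "amount": "4.50"},
]
RAW_CUSTOMERS = [
    {"name": "alice", "region": "EU"},
    {"name": "bob", "region": "US"},
]


def ingest():
    """Data ingestion: read records from the (hypothetical) sources."""
    return list(RAW_ORDERS), list(RAW_CUSTOMERS)


def prepare(orders):
    """Data preparation: normalize customer names and parse amounts."""
    return [
        {
            "order_id": o["order_id"],
            "customer": o["customer"].strip().lower(),
            "amount": float(o["amount"]),
        }
        for o in orders
    ]


def integrate(orders, customers):
    """Data integration: link each order to its customer's region."""
    region_by_name = {c["name"]: c["region"] for c in customers}
    return [
        {**o, "region": region_by_name.get(o["customer"], "unknown")}
        for o in orders
    ]


def analyze(records):
    """Data analysis: total order amount per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def present(totals):
    """Data presentation: shape results for the consumer, sorted by region."""
    return [
        {"region": region, "total": round(total, 2)}
        for region, total in sorted(totals.items())
    ]


def run_pipeline():
    orders, customers = ingest()
    return present(analyze(integrate(prepare(orders), customers)))
```

With the sample data, `run_pipeline()` returns `[{'region': 'EU', 'total': 25.0}, {'region': 'US', 'total': 5.0}]`. Keeping each stage as a separate function mirrors the list above and makes it clear where source-specific cleanup ends and the product's own logic begins.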
Types of Data Products
We can classify data products by the results they produce and how they react to changes in data. Let's look at each of those dimensions.
Data Product Results
There are three different types of results that a data product can produce:
- Raw Data: When a data product produces raw data as a result, it is expected that the customer of the data product does further processing of the data. Outputting raw data gives the customer the greatest degree of flexibility in how they use the data but requires that they have data processing expertise.
- Interactive: When a data product serves the results through a database or data warehouse, the customer queries the data interactively through SQL or BI tools. That gives the customer a medium degree of flexibility and requires some data skills.
- API: When a data product serves the results through an API, a wide variety of customers can consume the data through code, low-code, or no-code tools. That is the easiest way to consume the results but provides the least flexibility.
In addition to the results they produce, data products differ in how they react to changes in the input data: they either update dynamically or are static.
- Dynamic: Dynamic data products update their results in real time as new source data arrives or as data changes.
- Static: Static data products compute the results from a snapshot of data at a certain point in time and don't update until the result set is recomputed (which happens periodically or manually).
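The difference between the two update modes can be sketched with a running total. The classes `StaticTotal` and `DynamicTotal` and the function `demo` are hypothetical names used only for illustration; a real dynamic data product would typically sit on a stream processor rather than a Python object.

```python
class StaticTotal:
    """Static: computed from a snapshot; stale until recompute() is called."""

    def __init__(self, source):
        self._source = source
        self.result = sum(source)

    def recompute(self):
        # Triggered periodically or manually in a real system.
        self.result = sum(self._source)


class DynamicTotal:
    """Dynamic: the result is updated as each new record arrives."""

    def __init__(self):
        self.result = 0

    def on_record(self, value):
        self.result += value


def demo():
    source = [1, 2, 3]
    static = StaticTotal(source)
    dynamic = DynamicTotal()
    for value in source:
        dynamic.on_record(value)

    # A new record arrives in the source.
    source.append(4)
    dynamic.on_record(4)

    stale = static.result      # still reflects the old snapshot
    live = dynamic.result      # already includes the new record
    static.recompute()
    return stale, live, static.result
```

Here `demo()` returns `(6, 10, 10)`: the static result stays at the snapshot value until it is recomputed, while the dynamic result reflects the new record immediately.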
Combining those two dimensions, we get six different types of data products.
Stream Processing = Dynamic + Raw
The data product processes raw streams of data to produce another raw stream of data for further processing by the customer.
Streaming Database = Dynamic + Interactive
The data product processes raw streams of data and stores the results in a database or data warehouse for interactive querying by the customer.
Streaming API = Dynamic + API
The data product processes raw streams of data, stores the results in a database, and serves the data through an API for access by the customer.
Batch Processing = Static + Raw
The data product processes raw data to produce files for further processing by the customer.
Snapshot Database = Static + Interactive
The data product processes raw data and stores the results in a database or data warehouse as a static snapshot for interactive querying by the customer.
Static API = Static + API
The data product processes raw data, stores the resulting snapshot in a database, and serves the data through an API for access by the customer.
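The six types above are the cross product of the two dimensions, which can be written down as a small lookup table. The enum and function names below (`Update`, `Result`, `classify`) are hypothetical and chosen only for this sketch; the type names come from the list above.

```python
from enum import Enum


class Update(Enum):
    DYNAMIC = "dynamic"
    STATIC = "static"


class Result(Enum):
    RAW = "raw"
    INTERACTIVE = "interactive"
    API = "api"


# One entry per combination of update mode and result type.
PRODUCT_TYPES = {
    (Update.DYNAMIC, Result.RAW): "Stream Processing",
    (Update.DYNAMIC, Result.INTERACTIVE): "Streaming Database",
    (Update.DYNAMIC, Result.API): "Streaming API",
    (Update.STATIC, Result.RAW): "Batch Processing",
    (Update.STATIC, Result.INTERACTIVE): "Snapshot Database",
    (Update.STATIC, Result.API): "Static API",
}


def classify(update: Update, result: Result) -> str:
    """Return the data product type for a combination of the two dimensions."""
    return PRODUCT_TYPES[(update, result)]
```

For example, `classify(Update.STATIC, Result.API)` returns `"Static API"`.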
Use DataSQRL to Build Data Products
DataSQRL makes it easy to build efficient data products by eliminating the data plumbing that is required to implement the various types of data products. That means you can build data products in days instead of months at a fraction of the cost.