This Is How We Designed Data Warehouses for Bloomreach Engagement Users
About the Author:
Co-founder of Datacop, an agency that fulfils marketing operations roles at large eCommerce companies such as OluKai, Melin, Roark, Visual Comfort and Company, Dedoles and others.
Since our inception, we have dedicated significant time to developing and maintaining data warehouses for companies that use Bloomreach Engagement, as well as for those that do not.
In this article, we aim to share our insights on the importance of having a data warehouse. We will discuss the problems it addresses and explain the design process of the data warehouses we have built, so you can take some inspiration and guidance should you decide to embark on building your own.
This article will explore the data warehouse setup we have developed for a Bloomreach Engagement client. It incorporates an Engagement BigQuery module, the specifics of which, including its functionalities and how we have integrated it into our setup, will be detailed later in this article.
Furthermore, I believe this article could be valuable even if you are not using Bloomreach Engagement. The design principles of a data warehouse, whether for Bloomreach users or other platforms, tend to be quite similar.
So, without further ado, let's dive in.
Why Did We Decide to Build a Data Warehouse Based on the Engagement BigQuery Module?
There are three primary reasons we chose to build our data warehouse around the Engagement BigQuery module:
1. Addressing the Limitations of Data Retention in Bloomreach Engagement
Bloomreach Engagement is undoubtedly an excellent analytics tool, notable for its real-time capabilities and rapid data processing. However, it is not an all-encompassing solution for every analytical need. A key factor here is the platform's data retention policy, which directly affects pricing: the longer you retain your data, the higher the cost. This creates a significant incentive to delete older data, which limits long-term analyses; some become simply impossible. For example, comparing the product categories customers browsed a year ago with those they browse today is rarely feasible in Bloomreach Engagement, because it would require an expiration period for the view_item event longer than 13 months.
Engagement BigQuery addresses this limitation effectively. It serves as a repository for all events and customer profiles ever recorded or currently present in Bloomreach Engagement. Therefore, if we use Bloomreach for five years, Engagement BigQuery enables us to analyze data spanning those five years, providing a much broader scope for insights and decision-making.
2. Bloomreach Engagement is Not a Data Warehouse
Since Bloomreach Engagement is a CDXP platform (a customer data platform at its core, with a marketing activation layer around it), it is designed to work best with customer-related data, such as on-site behavior (captured as events) and customer attributes like first name, phone number or email.
However, integrating and manipulating other types of data, such as Facebook marketing spend, is not straightforward within Bloomreach Engagement. For example, if you want to understand how much you have spent across all channels on promoting a specific product, there is no easy way to import and work with this data directly in Bloomreach Engagement.
This limitation led us to choose BigQuery as our data warehouse solution, which offers much greater flexibility and ease in handling diverse data types.
3. Enhancing the Capabilities of Bloomreach Engagement
There are some use cases that are quite difficult to deploy directly in Bloomreach Engagement but considerably easier to deploy with the help of a BigQuery data warehouse. Here are a couple of examples:
Daily Giveaway Automation: Previously, one of our clients manually selected a winner for their daily giveaway and sent a congratulatory email each day. With BigQuery, we fully automated this process. We now select a winner from eligible new email subscribers directly in BigQuery and send a “winner” event back to Bloomreach Engagement for the chosen customer profile, which triggers the congratulatory email.
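As a rough sketch of the selection-and-event step (the field names, event shape, and campaign name below are illustrative assumptions, not the client's actual schema), the daily job boils down to two small pieces of logic:

```python
import random

def pick_winner(subscribers, seed=None):
    """Pick one winner from the day's eligible subscribers.

    `subscribers` is a list of dicts with at least a customer id,
    e.g. the result rows of a BigQuery query over new email sign-ups.
    """
    if not subscribers:
        return None
    rng = random.Random(seed)  # seed only for reproducible testing
    return rng.choice(subscribers)

def build_winner_event(customer_id, campaign="daily_giveaway"):
    """Build the payload for a 'winner' tracking event.

    The exact payload shape depends on the Engagement tracking API;
    this structure is an assumption for illustration.
    """
    return {
        "customer_ids": {"registered": customer_id},
        "event_type": "winner",
        "properties": {"campaign": campaign},
    }
```

In production, the subscriber list would come from a scheduled BigQuery query over the previous day's sign-up events, and the payload would be POSTed to the Engagement tracking API, where the incoming `winner` event triggers the congratulatory-email scenario.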
Extended Back-in-Stock Email Automation: Our advanced back-in-stock email automation, which generates four times more revenue than traditional back-in-stock notifications, was also enabled through BigQuery.
Slack Notifications for Data Anomalies: We have developed various Slack notifications to alert us of significant data changes, such as anomalies in the number of cart_update events or interruptions in critical email scenarios.
For example, a notification was set up to alert us whenever there was a 30% or more reduction in cart_update events, which could indicate an issue with frontend data tracking or a decrease in conversion rates among website visitors.
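The threshold check behind such a notification is simple. A minimal sketch follows: the 30% figure comes from the example above, while the function names, the baseline choice, and the Slack message wiring are assumptions for illustration:

```python
def cart_update_drop(today_count, baseline_count, threshold=0.30):
    """Return True when today's cart_update volume fell by `threshold`
    (30% by default) or more against a baseline, e.g. a 7-day average."""
    if baseline_count <= 0:
        return False  # no baseline to compare against
    drop = (baseline_count - today_count) / baseline_count
    return drop >= threshold

def build_slack_message(today_count, baseline_count):
    """Build the payload for a Slack incoming webhook."""
    drop_pct = 100 * (baseline_count - today_count) / baseline_count
    return {
        "text": (
            f"cart_update events down {drop_pct:.0f}% vs baseline "
            f"({today_count} vs {baseline_count}) - check frontend tracking."
        )
    }
```

A scheduled query computes the two counts, and when the check fires, the message dict is POSTed to a Slack incoming webhook URL.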
For a more in-depth exploration of Slack notifications, feel free to read the comprehensive article linked here.
Designing a BigQuery Data Warehouse
What is Engagement BigQuery?
Engagement BigQuery, formerly known as Exponea BigQuery or Long Term Data Storage, is a comprehensive repository of all events and customer profiles that have been, or are currently, housed within the Bloomreach Engagement platform.
It is crucial to understand that this project does not grant editing privileges, which means data cannot be deleted or structurally modified.
One notable characteristic of this dataset is the potential presence of duplicates. For example, if an event is uploaded to Bloomreach Engagement five times and subsequently deleted four times, Engagement BigQuery will retain records of all five uploads.
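Because deletions never propagate to Engagement BigQuery, downstream models usually begin with a deduplication step. Here is a minimal sketch of that logic in Python; the `event_id` and `timestamp` field names are assumptions, and in practice this is typically expressed in SQL (e.g. a `ROW_NUMBER()` window filter over the raw events table):

```python
def dedup_events(events):
    """Keep only the most recent record per event_id,
    mirroring a ROW_NUMBER() ... = 1 filter in BigQuery."""
    latest = {}
    for ev in events:
        key = ev["event_id"]
        if key not in latest or ev["timestamp"] > latest[key]["timestamp"]:
            latest[key] = ev
    return list(latest.values())
```

Running this (or its SQL equivalent) as the first transformation layer gives every downstream table a clean, one-row-per-event view of the raw data.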
The core function of this BigQuery project is to act as a robust data source for analyzing website behavior and omni-channel engagement.
Events are appended to a dedicated table every six hours, while the customer table is rewritten on a daily basis.
For more detailed information, please refer to the Bloomreach Engagement documentation at: https://documentation.bloomreach.com/engagement/docs/ebq
From my perspective, Engagement BigQuery serves more as a data source rather than a true data warehouse, primarily due to the lack of editing capabilities that would allow for structural data changes and data aggregation for enhanced performance.
Consequently, we treat Engagement BigQuery as a foundational data source within a custom-built data warehouse for our clients.
BigQuery Data Warehouse Structure
Below is a diagram illustrating the structure of the data warehouse we have developed:
In the top left corner of the diagram, you will notice the Engagement BigQuery icon. In our configuration, this serves as the primary data source for all customer engagement metrics, such as website behavior and interactions with omni-channel campaigns.
Below this, you can view other data sources that we typically incorporate, such as ad spend data from various platforms and data from company Google Sheets, which predominantly includes revenue forecasts.
For some use-cases, such as real-time Slack notifications, we also pull real-time data from Bloomreach Engagement. This is necessary because the Engagement BigQuery module is updated only every six hours, which may not be frequent enough for certain applications.
To facilitate the transfer of this data into our data warehouse, we utilize Cloud Functions, which are also part of the Google Cloud Platform. Cloud Functions offer tremendous flexibility for our use cases and have proven to be the most cost-effective solution we’ve found.
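To illustrate the kind of lightweight glue these functions contain (the platform names and source field names below are invented for the example and do not reflect the real ad platforms' APIs), a typical function normalizes ad-spend rows from different platforms into one schema before loading them into BigQuery:

```python
def normalize_spend_row(platform, row):
    """Map a platform-specific spend record onto a unified warehouse schema.

    The source field names in `field_map` are illustrative assumptions,
    not the actual field names returned by the platforms' APIs.
    """
    field_map = {
        "facebook": {"date": "date_start", "spend": "spend", "campaign": "campaign_name"},
        "google_ads": {"date": "segments_date", "spend": "cost", "campaign": "campaign"},
    }
    m = field_map[platform]
    return {
        "platform": platform,
        "date": row[m["date"]],
        "spend_usd": float(row[m["spend"]]),
        "campaign": row[m["campaign"]],
    }
```

A Cloud Function like this runs on a schedule, fetches the day's rows from each ad platform, normalizes them, and appends the result to a single spend table in BigQuery.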
In the top right corner of the diagram, you'll find various developer tools that we employ to build and maintain our data warehouse setup.
Dataform has been a significant enhancement for us. It has enabled us to implement version control of our data warehouse, automate dependencies, and create a dependency graph (see the screenshot below). This automation eliminates the need to manually specify refresh times for tables through scheduled queries.
Given the extensive benefits we've realized from using Dataform, we plan to dedicate a separate article to this topic in the future.
On the right side of the diagram, you can see the outputs from our data warehouse. We primarily use Tableau for data visualization, but other BI software options are equally viable depending on your needs.
We often integrate data back into Bloomreach Engagement for specific use cases.
Additionally, we push some data back into Google Sheets, which remain widely used across many organizations.
Slack notifications are also triggered based on the data stored in our data warehouse.
Future Enhancements
Looking forward, there are several areas we are excited about but haven't yet fully exploited. One significant area is the broader Google Cloud Platform (GCP), which offers a plethora of tools including pre-trained machine learning models that can be applied directly to our data warehouse. We're also exploring the direct integration of AI assistants with the data warehouse, such as ChatGPT and Google's Gemini, to further enhance our capabilities.
As we implement these tools, expect upcoming Substack articles dedicated to this topic!
What to Expect Next
Building on this BigQuery data warehouse, we have developed complete reporting for eCommerce companies, covering the following areas:
Shop Performance,
Product Performance,
Customer Insights,
Paid Ads Performance,
Owned Channels Performance,
Other Tools.
In upcoming articles, we will showcase some of these tools and analytical frameworks we've developed, enabling you to implement them in your eCommerce operations, should you choose to do so.
If you found this post valuable…
We hope you found this article valuable. If so, please consider subscribing (for free!) to receive updates on our latest publications.
Additionally, if you think someone else might benefit from this information, we would greatly appreciate it if you could share this article using the button below.