First-Party Data Collection & Data Quality (Part 2: Data Quality & Best Practices)
Today, we will focus on Best Practices for First-Party Data Tracking and examine some common issues associated with it.
About the Author:
Co-founder of Datacop, an agency that fulfils marketing operations roles in large eCommerce companies such as OluKai, Melin, Roark, Visual Comfort and Company, Dedoles, and others.
In our last article, we began discussing the topic of First-Party Data Collection and Data Quality.
We identified three key challenges that everyone should address:
Digital Identity
Data Quality & Best Practices
Data Monitoring
Previously, we explored the concept of Digital Identity. Today, we will focus on Best Practices for First-Party Data Tracking and examine some common issues associated with it.
Let’s start by exploring the data we should be capturing about our customers and website visitors. The vast majority of data that eCommerce businesses collect comes in the form of events.
Events in eCommerce represent either a customer action or any other relevant activity associated with a particular customer, such as sending a marketing email. Each event includes specific attributes that provide additional context about the action the customer has taken.
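To make this concrete, an event can be thought of as a small record: who it belongs to, what happened, when it happened, and a set of attributes. The sketch below is only an illustration in Python. The field names (customer_id, event_type, timestamp, properties) and the attribute names in the example are our assumptions, not the schema of any particular platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class Event:
    """A single tracked event (hypothetical structure, for illustration only)."""
    customer_id: str
    event_type: str
    timestamp: datetime
    properties: dict[str, Any] = field(default_factory=dict)


# Example: a product page view whose attributes describe the viewed product.
example = Event(
    customer_id="cookie-123",
    event_type="view_item",
    timestamp=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
    properties={"product_id": "SKU-1", "price": 49.0},
)
```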
Events form the backbone of revenue-generating use cases and data analytics.
It is crucial to thoughtfully determine in advance which events and event attributes should be captured for each customer. This foresight ensures that you won't face limitations when conducting analyses or implementing revenue-generating strategies later on.
While every eCommerce business operates in a unique context, meaning some events may be relevant for one business and not for another, there are certain events that are broadly applicable to most eCommerce companies. We’ve categorized these events into four main groups, which are outlined below. For each category, we’ve also provided a list of recommended events to track, along with the associated event attributes.
Note: Based on our experience, crafting a robust data architecture—defining which events and attributes should be tracked—is one of the most challenging tasks for the Marketing Operations Department. This process has significant implications, as it impacts all subsequent data-driven activities.
Depending on your tech stack, after capturing an event, it may be sent directly to a data warehouse or a Customer Data Platform (CDP).
The specific events that are tracked out of the box (system events) versus those that require manual setup (custom events) also depend heavily on your tech stack. In this article, we will describe both custom events that we typically recommend setting up, as well as system events.
In our case, we will be describing system events of Bloomreach Engagement, our preferred marketing automation tool. It’s important to note that, depending on your tech stack, these events may have different names, structures, or might not be tracked at all. Nevertheless, you’ll likely find these events highly useful, regardless of the technology stack your company employs.
Event Categories
(a) Transactional Events
Transactional events in eCommerce refer to specific actions or interactions related to the completion of a transaction, such as a purchase, refund, or exchange on an eCommerce platform.
These events typically fall into the category of custom events, as out-of-the-box integrations rarely capture all the necessary event attributes required across different transactional events.
Here is the list of all transactional events that we recommend capturing:
purchase: Represents a customer's purchase.
purchase_item: Represents items purchased in a customer's order.
purchase_update: Triggered when an order's status changes to shipped or delivered. Essential for sending order status emails through Bloomreach Engagement and analyzing average delivery time to customers.
refund: Represents a refund initiated by a customer, crucial for triggering email flows post-refund and calculating customer Lifetime Value (LTV).
refund_item: Represents item(s) returned by a customer in exchange for a refund.
exchange: Represents a product exchange conducted by a customer.
exchange_item: Represents product(s) returned in exchange for other product(s).
subscription_status: This is relevant only for eCommerce companies that offer subscription services. It tracks when a customer creates a new subscription or makes any changes to their existing subscription.
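As an example of why refund events matter for LTV, here is a minimal sketch that nets refunds against purchases per customer. It assumes each event is a plain dictionary with the keys "customer_id", "type" and "total_price". These names are hypothetical and will differ depending on your tracking setup.

```python
from collections import defaultdict


def net_ltv(events):
    """Sum purchase totals minus refund totals per customer.

    `events` is an iterable of dicts with the assumed keys
    "customer_id", "type" and "total_price".
    """
    ltv = defaultdict(float)
    for e in events:
        if e["type"] == "purchase":
            ltv[e["customer_id"]] += e["total_price"]
        elif e["type"] == "refund":
            ltv[e["customer_id"]] -= e["total_price"]
    return dict(ltv)


sample_events = [
    {"customer_id": "c1", "type": "purchase", "total_price": 100.0},
    {"customer_id": "c1", "type": "refund", "total_price": 40.0},
    {"customer_id": "c2", "type": "purchase", "total_price": 25.0},
]
ltv_by_customer = net_ltv(sample_events)
```

Without the refund event, customer c1 would appear to be worth the full purchase amount, which overstates their value.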
At Datacop, we’ve developed a comprehensive Tracking Document (see screenshot below) that outlines all the event attributes we recommend tracking for each transactional event. This document serves as a best practice blueprint for managing transactional events. If you need assistance with setting up or auditing your tracking, don’t hesitate to reach out to me.
(b) Website Engagement Events
These events represent various actions that a customer or website visitor performs on the site. We’ll first list four events that Bloomreach Engagement automatically captures once its tracking is implemented on your website, followed by a list of additional events we recommend capturing.
Bloomreach Engagement System Events:
session_start: This event is tracked when the customer visits your website.
session_end: This event is generated 20 minutes after the customer leaves your website or closes the browser tab. Its timestamp is that of the last session ping from the JavaScript SDK + 30 seconds.
first_session: This event is derived from the session_start event. It is tracked when the customer visits your website for the very first time.
page_visit: Happens each time the customer opens one of the pages of your website. Note that these events are disabled by default but you can enable them using the track options of the JavaScript SDK.
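Because the session_end timestamp is derived from the last SDK ping rather than from the moment the event is generated, it helps to be explicit about this when calculating session durations. The following sketch simply encodes the 30-second rule described above. The function names are ours and are not part of any SDK.

```python
from datetime import datetime, timedelta

# Offset added to the last session ping, as described in the text above.
PING_OFFSET = timedelta(seconds=30)


def session_end_timestamp(last_ping: datetime) -> datetime:
    """Timestamp assigned to session_end: last SDK ping plus 30 seconds."""
    return last_ping + PING_OFFSET


def session_duration(session_start: datetime, last_ping: datetime) -> timedelta:
    """Session length as it appears from the two event timestamps."""
    return session_end_timestamp(last_ping) - session_start
```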
Custom Events:
view_item: Tracked upon the loading of a Product Detail Page (PDP).
view_item_variant: Tracked when a customer selects a specific size or color variant on a PDP, if applicable.
view_category: Tracked upon the loading of a Product Listing Page (PLP).
cart_update: Tracked when a product is added to or removed from the cart.
checkout: Fired at each step of the checkout funnel, invaluable for debugging profile merging issues or analyzing the checkout funnel.
search: Triggered when a site visitor uses the search functionality, essential for search abandonment email scenarios.
filter: Triggered when a customer applies product filters on the website, useful for understanding customer preferences.
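To illustrate how the checkout event supports funnel analysis, here is a small sketch that counts how many distinct customers reached each step. It assumes every checkout event carries a "customer_id" and a "step" attribute. Both the attribute names and the step names in the example are hypothetical.

```python
def funnel_counts(events, steps):
    """Count distinct customers who reached each checkout step.

    Assumes each checkout event is a dict with "customer_id" and "step".
    Steps not listed in `steps` are ignored.
    """
    reached = {step: set() for step in steps}
    for e in events:
        if e["step"] in reached:
            reached[e["step"]].add(e["customer_id"])
    return [(step, len(reached[step])) for step in steps]


checkout_events = [
    {"customer_id": "c1", "step": "cart"},
    {"customer_id": "c1", "step": "shipping"},
    {"customer_id": "c2", "step": "cart"},
    {"customer_id": "c1", "step": "payment"},
    {"customer_id": "c2", "step": "cart"},  # repeated event, counted once
]
funnel = funnel_counts(checkout_events, ["cart", "shipping", "payment"])
```

Counting distinct customers rather than raw events keeps repeated page loads from inflating a step.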
Custom Events - Advanced:
item_impression: This event is triggered when a specific product is displayed on a website visitor’s screen. If you choose to capture this event, it will generate a large volume of data. For this reason, we recommend against tracking it directly through Bloomreach Engagement, as it can be quite costly. Instead, tools like MetaRouter can track this event much more cost-effectively and store it directly in a data warehouse, offering a significantly cheaper solution for storing these events.
In the past, we published an article detailing common mistakes related to the view_item event, along with a free Tracking Document for this event. You can access the article here:
(c) Campaign Engagement Events
All campaign engagement events are system-generated, except for the consent and double_opt_in events, which require custom integration.
In the context of Bloomreach Engagement, a campaign event is incredibly broad. It encompasses any action (such as delivered, opened, clicked, etc.) across various channels, including:
Emails
SMS/MMS messages
WhatsApp
Push notifications
Webhooks
For example, when a customer opens an email campaign, Bloomreach Engagement creates the following event:
This differs from when a customer clicks on an SMS campaign, where the event would be:
Here is more detailed documentation on campaign events:
https://documentation.bloomreach.com/engagement/docs/system-events#campaign
The other two events that require custom integration are:
consent: This event is activated when a customer grants us permission to send them email or SMS newsletters. Capturing the source of consent (e.g., checkout, footer, email pop-up banner) is vital for understanding the effectiveness of different consent collection strategies.
double_opt_in: If you use a double opt-in strategy for newsletter subscriptions, this event is tracked during the first step of the process, when the user provides their contact information. Once they confirm their subscription through the double opt-in email, the consent event is then tracked.
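Once the source of consent is captured, comparing collection strategies becomes a simple count. The sketch below assumes consent events are dictionaries with a "type" key and a "source" attribute. These names are illustrative and not taken from any specific platform.

```python
from collections import Counter


def consents_by_source(events):
    """Count consent events per source; events without a source go to "unknown"."""
    return Counter(
        e.get("source", "unknown") for e in events if e["type"] == "consent"
    )


consent_events = [
    {"type": "consent", "source": "checkout"},
    {"type": "consent", "source": "footer"},
    {"type": "consent", "source": "checkout"},
    {"type": "consent"},  # source was not captured
    {"type": "double_opt_in", "source": "footer"},  # not a consent yet
]
by_source = consents_by_source(consent_events)
```

A large "unknown" bucket is a useful signal in itself: it shows how often the source attribute is missing from your tracking.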
(d) Customer Profile Related Events
All of these events are system events within Bloomreach Engagement. They represent actions generated by the system that are connected to a specific customer profile, such as participating in an A/B test, experiencing personalized content (like banners, experiments, or recommendations), or the merging of two profiles. I’ve provided brief explanations of each event below, but for more detailed information, please refer to the Bloomreach documentation.
merge: happens when 2 customer profiles are merged into 1.
anonymization: happens when you remove the customer's identification but still keep their data for analytics purposes.
ab_test: happens when the customer sees one of the variants of a page, banner, or recommendation you had created in an A/B split.
experiment: happens when an experiment is shown to the customer.
banner: happens when the customer is shown a banner you had created. banner is considered to be a semi-custom event because you have to manually set up the tracking of a new custom-made banner while the banner templates provided are tracked automatically.
recommendation: happens when the customer is shown a recommendation box you had created. recommendation is considered to be a semi-custom event because you have to manually set up the tracking of a new custom-made recommendation while the recommendation templates provided are tracked automatically.
For each event, we advise maintaining a comprehensive tracking document that outlines:
All attributes captured by an event.
Every trigger point associated with each event.
Common errors encountered with each event.
Quality assurance (QA) procedures for each event.
A changelog documenting updates and changes to the events.
Below is an example of a view_item and view_item_variant tracking document:
https://docs.google.com/spreadsheets/d/1eKHSskTW0e9GUnZoJI_DjoIz-YnT-t8fT8kMC6WWRCU/edit?gid=0#gid=0
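One way to put a tracking document to work is to turn its attribute list into an automated check. The sketch below validates an event against a small specification of required attributes and their expected types. The specification shown is hypothetical and far simpler than a real tracking document. It is meant only to illustrate the idea.

```python
# Hypothetical tracking spec: required attributes and their expected types.
TRACKING_SPEC = {
    "view_item": {"product_id": str, "price": float, "category": str},
}


def validate_event(event_type, properties, spec=TRACKING_SPEC):
    """Return a list of human-readable problems; an empty list means valid."""
    if event_type not in spec:
        return [f"unknown event type: {event_type}"]
    problems = []
    for name, expected in spec[event_type].items():
        if name not in properties:
            problems.append(f"missing attribute: {name}")
        elif not isinstance(properties[name], expected):
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}"
            )
    return problems


# A price tracked as a string and a missing category are both reported.
issues = validate_event("view_item", {"product_id": "SKU-1", "price": "49"})
```

Running a check like this as part of QA makes the "common errors" section of the tracking document something you can test rather than just read.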
Common First-Party Data Tracking Issues
In this section, we'll explore the most common tracking challenges encountered when working with eCommerce clients and provide strategies to mitigate these issues as much as possible.
Unattributable Online Orders
Ideally, every web order should be preceded by website-related events within the same customer profile. After all, a customer must visit the website and perform certain actions to complete a purchase, right?
However, in practice, we often encounter web purchases that lack preceding website behavior stored in the same profile. This discrepancy can occur when the website activity is stored in a separate profile (as illustrated in the screenshot below) or, in some cases, when it isn't tracked at all.
Why Should You Care About Unattributable Orders?
If you want to understand your conversion rate based on a landing page or a traffic source (such as paid social, email, or organic search), unattributable online orders will pose a significant challenge. Because these orders lack preceding website behavior, you won't be able to link them to specific landing pages or traffic sources, meaning your analysis will not take into account these purchases.
If a significant portion of your orders—say, 30% or more—are unattributable, it could severely undermine the reliability of your analyses. With such a large segment of conversions excluded, your ability to accurately assess the performance of landing pages or traffic sources is compromised.
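To find out how large this share is in your own data, you can check, profile by profile, whether each purchase is preceded by any website event. The following is a rough sketch under simplifying assumptions: each profile is a list of (timestamp, event_type) pairs, and the set of event names that count as website behavior is our own choice for illustration.

```python
# Event types treated as website behavior (an assumption for this sketch).
WEB_EVENTS = {"session_start", "page_visit", "view_item", "cart_update"}


def unattributable_share(profiles):
    """Share of purchases with no earlier website event in the same profile.

    `profiles` maps a profile id to a list of (timestamp, event_type) pairs.
    """
    total = unattributable = 0
    for events in profiles.values():
        seen_web_event = False
        for _, event_type in sorted(events, key=lambda e: e[0]):
            if event_type in WEB_EVENTS:
                seen_web_event = True
            elif event_type == "purchase":
                total += 1
                if not seen_web_event:
                    unattributable += 1
    return unattributable / total if total else 0.0


sample_profiles = {
    "p1": [(1, "session_start"), (2, "purchase")],  # attributable
    "p2": [(5, "purchase")],                        # no website behavior
    "p3": [(1, "session_start")],                   # no purchase
}
share = unattributable_share(sample_profiles)
```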
What Causes Unattributable Orders?
We already discussed one cause of unattributable orders in the previous article: missing "identify" calls at crucial points on your website. For more details, refer to the section on "QA of 'Identify' Calls on Website" in the earlier article.
Another cause could be the extensive use of tracking prevention software by your website visitors, which can block the tracking of website behavior entirely. In such cases, a tool like MetaRouter can be useful, as it can bypass most (if not all) of these blockers.
Additionally, unattributable orders may occur if website behavior tracking isn’t triggered before a visitor accepts cookies. If you choose not to track behavior without cookie consent, that's understandable, but be aware that it can negatively impact your analytical capabilities.
A final major reason for unattributable orders is the use of alternative checkout methods such as Apple Pay, Shop Pay, or PayPal. These options often don’t require customers to provide their email address on your website, resulting in the same issue as missing an "identify" call during checkout.
If you redirect customers to an order confirmation page hosted on your website after they complete a purchase through these checkout options, you can resolve this issue by tracking a purchase_frontend event. This event will store both a browser cookie and the customer’s email, merging website behavior and transaction data into a single profile.
Ghost Sessions
Ghost Sessions are a series of consecutive fictional sessions stored in a Bloomreach Engagement profile.
These often occur when a user leaves your website open in a dormant browser tab and their computer enters sleep mode. Periodically, the device may wake up and "ping" the dormant browser tab, a behaviour especially common on MacBooks. As a result, the JavaScript SDK running on the idle tab tracks a session_start and a session_end, which records a brief, empty session in Bloomreach Engagement.
Here is the Bloomreach documentation that guides you on how to adjust your SDK to prevent tracking these sessions in your project.
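If ghost sessions are already stored in your project, you may also want to exclude them at analysis time. The sketch below uses a simple heuristic of our own: a session is suspicious when it is very short and contains no events other than session_start and session_end. The 60-second threshold and the field names are assumptions, and this is not an official Bloomreach rule.

```python
def flag_ghost_sessions(sessions, max_duration_s=60):
    """Return the sessions that look like ghost sessions.

    Heuristic (an assumption, not an official rule): the session is very
    short and has no events besides session_start and session_end.
    Each session is a dict with "duration_s" and "other_events".
    """
    return [
        s for s in sessions
        if s["duration_s"] <= max_duration_s and s["other_events"] == 0
    ]


sample_sessions = [
    {"id": "s1", "duration_s": 30, "other_events": 0},   # brief and empty
    {"id": "s2", "duration_s": 30, "other_events": 3},   # brief but active
    {"id": "s3", "duration_s": 900, "other_events": 0},  # long idle session
]
ghost_ids = [s["id"] for s in flag_ghost_sessions(sample_sessions)]
```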
Internal Referrals
This is what a session_start categorized as an internal referral looks like:
The "referrer" is the URL of the website from which I visited a specific landing page on our website (stored as “location” attribute). In this case, it appears that I navigated to datacop.shop from datacop.shop, meaning I visited our website from our own website.
While this might seem illogical, there are valid explanations for this occurrence.
Explanation #1: Redirects
Imagine a customer visits the URL "https://datacop.shop/challenge" from "https://google.com" but is immediately redirected to "https://datacop.shop/". In this scenario, the Bloomreach Engagement SDK might not have enough time to fire on the original landing page, "https://datacop.shop/challenge," and only initializes on the subsequent page, "https://datacop.shop/." As a result, we end up with a session_start event like the one shown in the screenshot above, losing the information about the customer's original source, which in this case was Google.
To address this issue, you can either ensure that the Bloomreach SDK fires before the redirect occurs or minimize the use of redirects as much as possible.
Explanation #2: Ghost Sessions
Ghost sessions are often (though not exclusively) identified as internal referrals. Here’s an example:
Explanation #3: Inactive Tabs on Mobile
Do you have dozens or even hundreds of inactive tabs open in your browser?
You're not alone. In fact, customers sometimes revisit your website from these inactive tabs. When this happens, a session start is generated, with the referrer being your own website (in the vast majority of cases), leading to a session start classified as an internal referral.
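Whatever the cause, internal referrals are easy to flag for analysis by comparing the host of the "referrer" with the host of the "location". Here is a minimal sketch using Python's standard urllib.parse module. Treating "www." as equivalent to the bare domain is our own simplification.

```python
from urllib.parse import urlparse


def _host(url):
    """Lower-cased hostname of a URL without a leading "www.", or ""."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_internal_referral(referrer, location):
    """True when the session's referrer is on the same host as its location."""
    ref_host = _host(referrer)
    return bool(ref_host) and ref_host == _host(location)
```

For example, a session_start with the referrer "https://datacop.shop/" and the location "https://datacop.shop/challenge" would be flagged, while one with the referrer "https://google.com" would not. Sessions with an empty referrer are not treated as internal.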
Monitoring Systems
We've frequently encountered situations where eCommerce companies lost revenue due to a mistake or partial outage in data tracking that went unnoticed for months.
To prevent this, we developed a notification system specifically designed to catch these issues early and resolve them as quickly as possible.
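To give a sense of the basic idea, here is a deliberately simple sketch of one such check: it compares the latest day's event count with the average of the preceding days and raises a flag on a sharp drop. This is not our actual notification system. The 7-day window and 50% threshold are arbitrary values chosen for illustration.

```python
from statistics import mean


def volume_alert(daily_counts, window=7, drop_threshold=0.5):
    """True if the latest daily count fell below a share of the trailing average.

    `daily_counts` is a chronological list of event counts per day.
    Returns False when there is not enough history to compare against.
    """
    if len(daily_counts) < window + 1:
        return False
    *history, today = daily_counts[-(window + 1):]
    baseline = mean(history)
    return baseline > 0 and today < drop_threshold * baseline
```

A check like this would be run per event type, since a partial outage often affects a single event (for example, cart_update) while overall volume looks normal.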
Here’s an article we previously wrote that provides an in-depth explanation of our notification system: