Exponea BigQuery

This feature is currently available in Beta version

Exponea BigQuery is a package consisting of repeated data exports from Exponea application to Google BigQuery dataset and granting you a read access.

Exponea BigQuery (EBQ, formerly called Long Term Data Storage) is a petabyte-scale data storage in Google BigQuery. It provides a flexible, secure, and scalable infrastructure to house your data in an Exponea-like structure. Frequent data updates ensure that your data is always available on demand for custom analytics using your own BI tools. The scalable design easily handles the increased needs of your growing business, characterized by elevated data ingestion. EBQ ensures first-class care of all the technical and security aspects connected with your data storage.

Exponea BigQuery is very different from BiqQuery Integration which is used for importing data from any BigQuery that you are using.

Exponea BigQuery can store any events or user properties. Currently, events are loaded from Exponea into BigQuery roughly every 6 hours, and customers are loaded once a day.
The data is accessible through the BigQuery Console but you can also access it by using a GCP service account in BigQuery client libraries in supported programming languages. GCP service accounts are not created automatically; if you need this kind of access, you should request it, and we will create it for you. In both cases, we provide read-only access to the data, you cannot edit it. Deleting Customer profiles from Exponea also deletes them from Exponea BigQuery (this includes customer ids and customer attributes). Events however, are not deleted automatically.

🚧

Google email account needed

You cannot access the Exponea BigQuery console without a Google email account.

📘

Frequency of data-update

The data in Exponea BigQuery updates once a day for customer attributes and IDs, and four times a day for events.

Benefits

  • Highly scalable infrastructure will always meet your demands
  • Exponea-like data structure makes it easy to understand your data in 3rd party tools
  • Advanced security and threat detection
  • Custom data retention - store your data as long as you need to
  • Access management - distribute read access rights across your teams
  • Usage-based pricing - packages based on the expected volume of stored and processed data
  • Easily work with your data using Google BigQuery and 3rd party applications

Getting Started

To enable EBQ in your project, please contact your customer success manager or our support.

Accessing Exponea BigQuery

There are several ways how you can access your EBQ data, which are loaded in BigQuery.

  1. Use the web-based BigQuery console. You can use it to write your custom SQL queries, store and execute them.

  2. Use any of the modern BI tools that provide a connector to BigQuery e.g. Tableau, Power BI, Qlik or Google DataStudio to create reports and analysis using your EBQ data.

  3. Install and setup JDBC/ODBC drivers for Google BigQuery to facilitate accesses by various tools that do not have native BigQuery connector (e.g. ETL tools).

  4. Access Google BigQuery API using Google Cloud Client Libraries.

User Management

 Google Groups

You can manage users and access rights to your Google BigQuery dataset through a Google Group. You will receive an invitation email to your Google Group upon the setup of the EBQ module, which will also grant you admin rights. Only group members can access your BigQuery data.

Adding new members to a Google Group

  1. Open the group you have been invited to
  2. Select "Members" in the top right menu
  3. Select "Manage" in the top right menu
  4. Select "Direct add members" or in the left menu
  5. Add emails and the welcome message

User Accounts

For the purposes of data analysis in the BigQuery console and for BI tools access, you will use your Google user account.

Service Accounts

For the purposes of BigQuery access using an API, a service account will be created for you.

Data Schema

EBQ is a set of tables in the Google BigQuery (GBQ) dataset that is kept up to date using regular loads. A BigQuery project always contains 2 types of tables:

Events tables One table for each event type (e.g. session_start, item_view etc,)
Customers tables customers_properties
customers_id_history - history of all customer merges
customers_external_ids - mapping between internal and external customer ids

Event Tables

Event tables are loaded incrementally, which means that new rows are added to the tables during every load.

Each Event table contains tracked data in the same structure as in the application (see Data Manager > Events to understand the expected data structure). All tracked data are stored as a “properties” record and can be accessed in SQL in the following way:

SELECT properties.utm_campaign FROM 
`gcloudltds.exp_558cba14_8a46_11e6_8da1_141877340e97_views.session_start`

First three columns in each Event table are:

Internal_customer_id An ID of the customer that can be used to join with Customer tables
ingest _timestamp Timestamp of when a given event was processed by Exponea
timestamp Timestamp of when a given event actually happened, business timestamp

To query those fields, no prefix needs to be used:

SELECT internal_customer_id FROM
`gcloudltds.exp_558cba14_8a46_11e6_8da1_141877340e97_views.session_start`

Customer Tables

Customer tables are loaded using a full load. During each daily load, tables are loaded from scratch, hence they always contain the latest information about the customers.

customers_properties

The main customer table is customer_properties. The structure is the same as defined in the application (see Data Manager > Customer Properties to understand the expected data structure). All tracked data are stored as a “properties” record and can be accessed in SQL in the following way:

SELECT properties.last_name FROM
`gcloudltds.exp_558cba14_8a46_11e6_8da1_141877340e97_views.customers_properties`

customers_id_history

This table contains a history of customer merging. past_id is the ID of the customer that is merged with another customer and internal_customer_id is the ID of the customer with which the past_id was merged. Until customers are merged, there is no record in this table.

🚧

customers_id_history is missing

View customers_id_history is created only when there are customer merges in the exported project.

customers_external_ids

This table maps each internal_customer_id to all available external_customer_ids. The value of external_customer_id is stored in id_value and the type of external_customer_id (e.g. cookie, email) is stored in id_name.

Properties schema

Both event and customers_properties tables contain the properties field, which is a BigQuery nested record (struct). Fields in this record have types based on types of properties specified in Exponea application (Data Manager > Events and Data Manager > Customer Properties).

The conversion is done in a following way:

Exponea type

BigQuery type

number

NUMERIC

boolean

BOOLEAN

date*

TIMESTAMP

datetime*

TIMESTAMP

list**

STRING

other

STRING

*date and datetime properties will be converted correctly only if their value is unix timestamp in seconds.
** list type is stored in BigQuery in a JSON serialized form

In addition to properties field, customers and event tables also contain raw_properties field. Properties in raw_properties field are not converted according to Exponea schema, all of them are BigQuery STRING type. This is useful for cases when conversion in properties doesn't return expected results (the returned value is null).

Data Queries

In order to get query results faster and cheaper, it is recommended to use partitioned tables in BigQuery. Timestamp partitioning is available in your project.

Timestamp partitioning is done using business timestamp. A business timestamp is the time when the event was tracked. See examples:

A filter on partitioned column has been used

A filter on partitioned column has NOT been used

🚧

Notice the difference in the volume of data to be processed (indicated in the bottom right part of the screenshots).

Load monitoring

In order to enable monitoring of the load process there is a table _system_load_log in each dataset. For example to find out what was the last time when session_start event type was loaded, the following query needs to be executed:

SELECT  tabs, timestamp 
FROM `gcloudltds.exp_4970734e_9ed3_11e8_b57b_0a580a205e7b_views._system_load_log` 
CROSS JOIN unnest(tables) as tabs
WHERE tabs in('session_start')
ORDER BY 2 DESC
LIMIT 1

Excluding deleted events

Data in the event tables in the BigQuery are loaded incrementally and never deleted, even if you delete some data in Exponea. Information about deleted data is stored in EBQ in the table _system_delete_event_type. You can use this table to filter only data that have not been deleted from Exponea.

The code below demonstrates how to exclude deleted purchase events:

SELECT * 
FROM   `gcloudltds.exp_558cba14_8a46_11e6_8da1_141877340e97_views.purchase` 
WHERE  ingest_timestamp >= (SELECT Max(ingest_timestamp) 
                            FROM 
`gcloudltds.exp_558cba14_8a46_11e6_8da1_141877340e97_views._system_delete_event_type` 
 WHERE  event_type = 'purchase') 
AND timestamp >= (SELECT Max(timestamp) 
                  FROM 
`gcloudltds.exp_558cba14_8a46_11e6_8da1_141877340e97_views._system_delete_event_type` 
 WHERE  event_type = 'purchase')

Merging customers

Every time a customer uses a different browser or device to visit your website, they are considered as separate and non-related entities. However, once the customer identifies (through registering or logging in their account for example), those 2 profiles are merged into one. In this way, customer activity can be tracked across multiple browsers and devices.

Until the customers are merged, there is no record about the first or the second customer in the customers_id_history table.

At the moment of the merge, the following information is stored in the table:

internal_customer_id

customer_on_device_1

past_id

customer_on_device_2

This means that the customer that was tracked on device 2 (customer_on_device_2) is merged into the customer on device 1 (customer_on_device_1) and both customers together are now considered as a single merged customer.

It is important to work with merged customers when you analyze the event data to get all events generated by the customer_on_device_2 and customer_on_device_1 assigned to a single customer. Use the following query to work with merged customers:

SELECT    Ifnull(b.internal_customer_id, a.internal_customer_id) AS merged_id, 
                   a.internal_customer_id  AS premerged_id 
FROM      `gcloudltds.exp_558cba14_8a46_11e6_8da1_141877340e97_views.session_start` a 
LEFT JOIN `gcloudltds.exp_558cba14_8a46_11e6_8da1_141877340e97_views.customers_id_history` b 
ON        a.internal_customer_id = b.past_id)

For each internal_customer_id in the event table (session_start in this case) if there is a merge available in the customers_id_history, internal_customer_id will be mapped to the merged customer. As a result, analysis can be now done on merged_id, all session_starts created historically by either customer_on_device_2 or customer_on_device_1 will have merged_id = customer_on_device_1.

Anonymizing customers

When you anonymize a customer in Exponea, the data is also anonymized in the BigQuery. This is done in the following steps:

  1. During anonymization through the app or API, an anonymization event is tracked for the customer.
  2. Anonymization event is loaded into the BigQuery during a periodic data-update along with all the other events.
  3. All the new anonymization events are queried.
    4, Anonymization is done by an Update query for all the anonymization events queried in step 3 which contained a customer_ID. This means that all the private event properties tracked for the particular customer IDs will be set to NULL.
  4. ltds_anonymization_timestamp is tracked for all the anonymized events with a timestamp of the moment when the anonymization went through in the BigQuery.

Scheduled queries for periodic recalculation of data

With the Exponea Big query, it is possible to connect a 3rd party BI tool, such as Tableau, periodically calculate complex queries, and store the results in an intermediary table within BigQuery. Tableau will then be able to report on the data within this table.

The prerequisites of this are:

  1. You must have your own Google Cloud Platform (GCP) project with BigQuery (BQ) enabled
  2. You must have the BigQuery Data Transfer API enabled
  3. As a person configuring this, you must have a ‘write access’ to the BQ and ‘read access’ to EBQ

To configured scheduled queries, follow this guide:

  1. Open the relevant GCP and ensure that the active project is the desired GCP project and not GCloudLTDS (EBQ)
  2. Create a new dataset in the BQ at the same location as EBQ (EU)
  3. Create a new Scheduled query, as shown on the images below

Exporting data

We recommend using the raw_properties field in customers_properties and event tables.
See the complete tutorial here: https://cloud.google.com/bigquery/docs/exporting-data

Updated 20 days ago


Exponea BigQuery


This feature is currently available in Beta version

Suggested Edits are limited on API Reference Pages

You can only suggest edits to Markdown body content, but not to the API spec.


We rely on cookies

to optimize our communication and to enhance your customer experience. By clicking on the Accept and Close button, you agree to the collection of cookies. You can also adjust your preferences by clicking on Manage Preferences. For more information please see our Privacy policy.

Manage cookies
Accept & close

Cookies preferences

Accept & close
Back