External ID

In Exponea there are two types of identifiers: external IDs and internal IDs. The internal ID is assigned to each customer automatically by Exponea and can be seen in the URL when viewing the details of a customer. The external ID, in contrast, is specified by the user and is used to map customers from your external data source. You can think of it as a database index.

Each customer must have at least one external ID. For the most basic use cases, such as importing customers by email address, you could use the email address as the external ID. However, to support more advanced use cases (e.g. web tracking of anonymous customers who have not been authenticated yet), we support multiple external identifiers, which can be configured on a per-project basis.

Types of External IDs

Each external identifier has its own group name. The simplest configuration would have the following external IDs:

  • registered - hard (used for tracking ID from the database of your system)
  • cookie - soft (cookie used to track anonymous customers)

📘

Unlimited number of identifiers

For more advanced use cases (e.g. you have an ID from your system and an ID from Facebook, and you also want to identify customers by email address), you can define an unlimited number of identifiers.

Use cases

Each project has several customer IDs (soft, hard, Google Analytics, etc.). The following use cases describe the most common tasks that require you to manipulate customer IDs. To change project configurations, you need access to Exponea administration.

Creating a new customer by hard ID

  • IDs: registered (hard), cookie (soft)
  • Identification calls: {'registered': '1'}
  • Expected result: New customer with external IDs: {'registered': '1'}

Creating a new customer by soft ID

  • IDs: registered (hard), cookie (soft)
  • Identification calls: {'cookie': '123e4567-e89b-12d3-a456-426655440000'}
  • Expected result: New customer with external IDs: {'cookie': '123e4567-e89b-12d3-a456-426655440000'}

Looking up an existing customer by ID

  • IDs: registered (hard), cookie (soft)
  • Customers: c1 with {'registered': '1', 'cookie': '123e4567-e89b-12d3-a456-426655440000'}
  • Identification calls: {'registered': '1'}
  • Expected result: c1 with external IDs: {'registered': '1', 'cookie': '123e4567-e89b-12d3-a456-426655440000'}

Identifying anonymous customer

  • IDs: registered (hard), cookie (soft)
  • Customers: c1 with {'cookie': '123e4567-e89b-12d3-a456-426655440000'}
  • Identification calls: {'registered': '1', 'cookie': '123e4567-e89b-12d3-a456-426655440000'}
  • Expected result: c1 with external IDs: {'registered': '1', 'cookie': '123e4567-e89b-12d3-a456-426655440000'}

Multiple cookies

  • IDs: registered (hard), cookie (soft)
  • Customers: c1 with {'registered': '1', 'cookie': '123e4567-e89b-12d3-a456-426655440000'}
  • Identification calls: {'registered': '1', 'cookie': '234e5678-e90b-12d3-a456-426655440000'}
  • Expected result: c1 with external IDs: {'registered': '1', 'cookie': ['123e4567-e89b-12d3-a456-426655440000', '234e5678-e90b-12d3-a456-426655440000']}
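The use cases above can be reproduced with a minimal in-memory sketch. It assumes a hypothetical data model (a `Customer` class whose soft IDs are always stored as lists, even when there is only one value) and a hypothetical `identify` function; it is not Exponea's API, and it deliberately leaves out merging and conflict resolution.

```python
from dataclasses import dataclass, field

# Hypothetical ID configuration for these use cases: name -> "hard" or "soft".
ID_TYPES = {"registered": "hard", "cookie": "soft"}


@dataclass
class Customer:
    internal_id: int
    # Hard IDs hold a single value; soft IDs hold a list of values.
    external_ids: dict = field(default_factory=dict)


def has_id(customer, name, value):
    """True if the customer already carries this ID value."""
    stored = customer.external_ids.get(name)
    if isinstance(stored, list):
        return value in stored
    return stored == value


def identify(customers, call):
    """Find a customer matching any ID in `call` (or create one) and attach the IDs."""
    matches = [c for c in customers
               if any(has_id(c, name, value) for name, value in call.items())]
    if matches:
        # Simplification: use the oldest matching customer and skip merging.
        customer = min(matches, key=lambda c: c.internal_id)
    else:
        customer = Customer(internal_id=len(customers) + 1)
        customers.append(customer)
    for name, value in call.items():
        if ID_TYPES[name] == "hard":
            current = customer.external_ids.get(name)
            if current is not None and current != value:
                raise ValueError(f"hard ID conflict on {name!r}")
            customer.external_ids[name] = value
        else:
            # Soft IDs accumulate, e.g. one cookie per browser.
            values = customer.external_ids.setdefault(name, [])
            if value not in values:
                values.append(value)
    return customer
```

For example, identifying `{'registered': '1', 'cookie': 'b'}` against a customer that already has `'registered': '1'` and cookie `'a'` returns that same customer with both cookies, as in the "Multiple cookies" case.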

Anonymize customer

To anonymize a customer, you need to:

  1. remove private customer properties
  2. remove the external IDs and replace the customer's cookies (each cookie is replaced by a random Universally Unique Identifier, UUID)
  3. remove private properties of events
  4. add an anonymization event
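The four steps can be sketched on a plain dictionary. The customer layout (`properties`, `external_ids`, `events`), the `anonymize` helper and the `"anonymization"` event type are hypothetical names chosen for illustration; the lists of private property names are assumed to be supplied by the caller.

```python
import uuid


def anonymize(customer, private_properties, private_event_properties):
    """Apply the four anonymization steps to a hypothetical dict-based customer."""
    # 1. Remove private customer properties.
    for name in private_properties:
        customer["properties"].pop(name, None)
    # 2. Remove external IDs; each existing cookie is replaced by a random UUID.
    cookies = customer["external_ids"].get("cookie", [])
    customer["external_ids"] = {"cookie": [str(uuid.uuid4()) for _ in cookies]}
    # 3. Remove private properties of events.
    for event in customer["events"]:
        for name in private_event_properties:
            event["properties"].pop(name, None)
    # 4. Add an anonymization event (hypothetical event type name).
    customer["events"].append({"type": "anonymization", "properties": {}})
    return customer
```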

📘

Step by step

To anonymize customers in Exponea, follow the step-by-step anonymization guide in our documentation.

Export ID history

To export all past IDs, you need to provide those IDs in a JSON file in Google Cloud Storage (GCS).

Customer merging

When merging N customers together, the following rules are used:

  • the destination customer, into which all the other customers are merged, is the customer with the smallest internal ID
  • the other customers are called source customers and are merged one by one into the destination customer in sorted order (by internal ID, smallest first)
  • events are moved from the source customers to the destination customer
  • properties from the source customers are added to the destination customer
  • if the destination customer already has a property with a given name, it is replaced
  • the same rule that applies to properties also applies to assigned HTML campaigns
  • last campaign communications are merged so that the latest communication overwrites the older one
  • assigned A/B tests are merged in reversed order (the A/B test from the customer with the newer, larger internal ID wins over the older, smaller one)
  • the add_first_session flag is set to True only if at least one of the customers has it set to True
  • the external IDs of the customers are added together (if the constraints that apply to hard IDs are not met, the merge won't take place; for more information, see Merge conflict resolving)
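A minimal sketch of the core merge rules, assuming a hypothetical `Customer` dataclass in which every external ID is stored as a list of values. It covers ordering by internal ID, moving events, replacing same-named properties, the add_first_session flag and adding external IDs together; hard-ID checks, HTML campaigns, campaign communications and A/B tests are intentionally left out, so treat it as an illustration rather than Exponea's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    internal_id: int
    external_ids: dict = field(default_factory=dict)  # ID name -> list of values
    properties: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    add_first_session: bool = False


def merge(customers):
    """Merge all customers into the one with the smallest internal ID (sketch)."""
    ordered = sorted(customers, key=lambda c: c.internal_id)
    destination, sources = ordered[0], ordered[1:]
    for source in sources:  # smallest internal ID first
        # Events are moved to the destination customer.
        destination.events.extend(source.events)
        source.events = []
        # Same-named properties on the destination are replaced.
        destination.properties.update(source.properties)
        # The flag ends up True if any merged customer has it set.
        destination.add_first_session = (
            destination.add_first_session or source.add_first_session)
        # External IDs are added together (duplicate values skipped).
        for name, values in source.external_ids.items():
            merged = destination.external_ids.setdefault(name, [])
            for value in values:
                if value not in merged:
                    merged.append(value)
    return destination
```

Using the customers from the basic merging use case (c1 with properties `{'a': 1, 'b': 2}`, c2 with `{'a': 2, 'c': 3}`), `merge([c2, c1])` returns c1 with properties `{'a': 2, 'b': 2, 'c': 3}`.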

Basic customer merging (use case)

  • IDs: registered (hard), cookie (soft)
  • Customers:
    c1 with {'cookie': '123e4567-e89b-12d3-a456-426655440000'} and properties {'a': 1, 'b': 2}
    c2 with {'registered': '1'} and properties {'a': 2, 'c': 3}
  • Identification calls: {'registered': '1', 'cookie': '123e4567-e89b-12d3-a456-426655440000'}
  • Expected result:
    c1 with external IDs: {'registered': '1', 'cookie': '123e4567-e89b-12d3-a456-426655440000'} and properties {'a': 2, 'b': 2, 'c': 3}
    c2 will be removed and all its properties and events will be moved to c1 (for more details on how customers are merged, see the Customer merging section above)
    Note: c2 is merged into c1 because c1 is older, even though c2 has a hard ID.

Merge conflict resolving

There are cases when merging existing customers by the rules above would break the constraint that one customer can have only one hard ID of a given type. In those cases, the following rules apply:

  • If soft IDs are involved in the merge request, the conflict is resolved, if possible, by moving some soft IDs between the customers. Since the same merge conflict can often be solved in multiple ways and the algorithm must stay deterministic, each soft ID is assigned a priority, and the algorithm chooses the combination that maximizes the sum of the ranks of the soft IDs moved between customers. This way the least important customer IDs are moved and the most important ones are kept intact (see Soft ID priority ranking below). In some cases moving soft IDs won't resolve the merge conflict; those cases are handled the same way as conflicts caused only by hard IDs.
  • If only hard IDs are involved in the merge request, there is no way to resolve the conflict. In this case, the operation is discarded as a merge failure.

📘

Soft ID priority ranking

Each soft ID is assigned a rank (priority) number that tells us how important the soft ID is. The rank is the soft ID's zero-based index in the array of external IDs after all the hard IDs are discarded, plus 1 (so rank 1 is the most important soft ID).

e.g.:
IDs: registered (hard - no rank),
email (soft - rank 1),
Facebook (hard - no rank),
phone (soft - rank 2),
cookie (soft - rank 3)

(In this example the most important soft ID is email, the second most important soft ID is phone and the least important soft ID is cookie)
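The ranking rule is a one-liner once the ID configuration is written as an ordered list. The `(name, kind)` list format and the `soft_id_ranks` helper are assumptions made for illustration; the configuration reproduces the example above.

```python
def soft_id_ranks(id_config):
    """Rank = position of each soft ID (1-based) after dropping hard IDs."""
    soft_ids = [name for name, kind in id_config if kind == "soft"]
    return {name: index + 1 for index, name in enumerate(soft_ids)}


# The example configuration above, in its configured order.
config = [
    ("registered", "hard"),
    ("email", "soft"),
    ("facebook", "hard"),
    ("phone", "soft"),
    ("cookie", "soft"),
]
print(soft_id_ranks(config))  # {'email': 1, 'phone': 2, 'cookie': 3}
```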

Conflict resolving use cases

Unresolvable hard ID request

  • IDs: registered (hard), Facebook (hard)
  • Customers:
    c1 with {'registered': '1', 'facebook': '1'}
    c2 with {'registered': '2', 'facebook': '2'}
  • Identification calls: {'registered': '1', 'facebook': '2'}
  • Expected result: No customers are modified and a merge error is returned.
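Detecting whether a request would break the one-value-per-hard-ID constraint can be sketched as follows. This is a hypothetical helper, assuming customers are plain dicts (hard IDs as single values, soft IDs as lists) and that every customer matched by any ID in the call would be merged together with the call's own IDs.

```python
def hard_id_conflict(customers, call, hard_ids):
    """True if merging the matched customers plus `call` yields two values of a hard ID."""

    def values_of(ids, name):
        value = ids.get(name)
        if value is None:
            return set()
        return set(value) if isinstance(value, list) else {value}

    def matches(customer):
        return any(call[name] in values_of(customer, name) for name in call)

    matched = [c for c in customers if matches(c)]
    for name in hard_ids:
        values = values_of(call, name)
        for customer in matched:
            values |= values_of(customer, name)
        if len(values) > 1:
            return True
    return False
```

For the unresolvable request above, both c1 and c2 match and `registered` would take the values `'1'` and `'2'`, so the helper reports a conflict; a plain lookup such as `{'registered': '1'}` against c1 alone does not.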

Moving cookie from one customer to another

  • IDs: registered (hard), cookie (soft)
  • Customers:
    c1 with {'registered': '1', 'cookie': ['1', '3']}
    c2 with {'registered': '2', 'cookie': '2'}
  • Identification calls: {'registered': '2', 'cookie': '1'}
  • Expected result:
    c1 with {'registered': '1', 'cookie': '3'}
    c2 with {'registered': '2', 'cookie': ['2', '1']}
    We cannot merge these two customers, since that would result in a customer having two hard IDs (registered → 1, 2). However, if we move the requested cookie ID from c1 to c2, there is no conflict.

Using soft ID priority to solve a merge conflict

  • IDs: registered (hard), email (soft), cookie (soft)
  • Customers:
    c1 with {'registered': '1', 'email': '2', 'cookie': '3'}
    c2 with {'registered': '4', 'email': '5'}
  • Identification calls: {'cookie': '3', 'email': '5'}
  • Expected result:
    c1 with {'registered': '1', 'email': '2'}
    c2 with {'registered': '4', 'email': '5', 'cookie': '3'}
    We cannot merge these two customers, since that would result in a customer having two hard IDs (registered → 1, 4). This problem can be solved either by:
      * moving email c2 → c1
      * moving cookie c1 → c2
    Since email is more important (rank 1), we move the ID with the highest rank number, which is cookie (rank 2).

Moving soft IDs from two different customers at once

  • IDs: registered (hard), email (soft), phone (soft), cookie (soft)
  • Customers:
    c1 with {'registered': '1', 'email': '1'}
    c2 with {'registered': '2', 'phone': '2'}
    c3 with {'registered': '3', 'cookie': '3'}
  • Identification calls: {'registered': '1', 'email': '1', 'phone': '2', 'cookie': '3'}
  • Expected result:
    c1 with {'registered': '1', 'email': '1', 'phone': '2', 'cookie': '3'}
    c2 with {'registered': '2'}
    c3 with {'registered': '3'}

Moving two soft IDs from one customer at once

  • IDs: registered (hard), email (soft), phone (soft), cookie (soft), device (soft)
  • Customers:
    c1 with {'registered': '1', 'email': '1', 'cookie': '1'}
    c2 with {'registered': '2', 'phone': '2', 'device': '2'}
  • Identification calls: {'email': '1', 'cookie': '1', 'phone': '2', 'device': '2'}
  • Expected result:
    c1 with {'registered': '1', 'email': '1', 'phone': '2', 'cookie': '1', 'device': '2'}
    c2 with {'registered': '2'}

Unable to resolve a conflict by moving any of the soft IDs

  • IDs: registered (hard), Facebook (hard), cookie (soft)
  • Customers:
    c1 with {'registered': 'A', 'facebook': 'B'}
    c2 with {'registered': 'B'}
    c3 with {'facebook': 'C', 'cookie': 'X'}
  • Identification calls: {'facebook': 'B', 'registered': 'B', 'cookie': 'X'}
  • Expected result:
    c1 with {'registered': 'A', 'facebook': 'B'}
    c2 with {'registered': 'B', 'cookie': 'X'}
    c3 with {'facebook': 'C'}

📘

The best option is to move the cookie ID from c3 → c2. This won't solve the conflict between c1 and c2, but it is still better than returning c2 without any changes. (c2 is chosen because the registered ID has the lowest rank.)

Hard ID conflict caused by ID not present in the data yet

  • IDs: registered (hard), Facebook (hard)
  • Customers: c1 with {'registered': '2', 'facebook': '1'}
  • Identification calls: {'registered': '1', 'facebook': '1'}
  • Expected result: None of the customers are modified and a merge error is returned.

Hard ID conflict caused by ID not present in the data yet resolved by moving soft ID

  • IDs: registered (hard), cookie (soft)
  • Customers: c1 with {'registered': 'A', 'cookie': 'B'}
  • Identification calls: {'registered': 'B', 'cookie': 'B'}
  • Expected result:
    c1 with {'registered': 'A'}
    c2 with {'registered': 'B', 'cookie': 'B'}

Hard ID conflict caused by ID not present in the data yet resolved by moving soft ID (multiple hard IDs)

  • IDs: registered (hard), Facebook (hard), cookie (soft)
  • Customers:
    c1 with {'facebook': '1', 'cookie': '1'}
    c2 with {'registered': '2'}
  • Identification calls: {'registered': '2', 'facebook': '2', 'cookie': '1'}
  • Expected result:
    c1 with {'facebook': '1'}
    c2 with {'registered': '2', 'facebook': '2', 'cookie': '1'}

Two hard ID conflicts at once caused by soft IDs

  • IDs: registered (hard), facebook (hard), email (soft), phone (soft), cookie (soft), device (soft)
  • Customers:
    c1 with {'registered': '1', 'email': '1', 'device': '3'}
    c2 with {'registered': '2', 'facebook': '2', 'phone': '2'}
    c3 with {'facebook': '3', 'cookie': '3', 'device': '4'}
  • Identification calls: {'email': '1', 'phone': '2', 'cookie': '3', 'device': '5'}
  • Expected result:
    c1 with {'registered': '1', 'facebook': '3', 'email': '1', 'phone': '2', 'cookie': '3', 'device': ['3', '4', '5']}
    c2 with {'registered': '2', 'facebook': '2'}

Non-trivial hard IDs conflict 1

  • IDs: email (hard), strange (hard), registered (soft), cookie (soft)
  • Customers:
    c1 with {'email': '[email protected]', 'strange': '1', 'registered': 'A', 'cookie': '09e7c434'}
    c2 with {'email': '[email protected]', 'strange': '2'}
  • Identification calls: {'email': '[email protected]', 'strange': '3', 'registered': 'A'}
  • Expected result:
    No customers are modified and a merge error is returned.

Non-trivial hard IDs conflict 2

  • IDs: email (hard), strange1 (hard), strange2 (hard), registered (soft), cookie (soft)
  • Customers:
    c1 with {'email': '[email protected]', 'strange1': '1', 'registered': 'A', 'cookie': '09e7c434'}
    c2 with {'strange2': 's1', 'cookie': '0a3c2f45'}
    c3 with {'email': '[email protected]', 'strange1': '2'}
  • Identification calls: {'email': '[email protected]', 'strange1': '3', 'strange2': 's2', 'registered': 'A', 'cookie': '0a3c2f45'}
  • Expected result: No customers are modified and a merge error is returned.

Limitations

There is a limit on how many soft IDs of one type can be assigned to a single customer: 64. When this limit is reached, the oldest soft ID is removed.
For example, a customer has 64 cookies and 64 google_analytics IDs and is then identified with a new cookie and a new google_analytics ID. At this point, the oldest cookie (probably the one from the customer's first session) is removed from the cookie list and the new one is added, so the customer keeps 64 cookies in total. The same happens with the google_analytics ID.
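The per-type cap behaves like a bounded list ordered from oldest to newest. A minimal sketch, assuming soft ID values are kept in insertion order in a plain list; `add_soft_id` and `SOFT_ID_LIMIT` are hypothetical names.

```python
SOFT_ID_LIMIT = 64


def add_soft_id(values, new_value, limit=SOFT_ID_LIMIT):
    """Append a soft ID value (list is oldest first) and drop the oldest beyond `limit`."""
    if new_value not in values:
        values.append(new_value)
    # Remove as many of the oldest values as needed to get back to the limit.
    del values[:max(0, len(values) - limit)]
    return values


cookies = []
for i in range(1, 66):  # 65 identification calls with cookies '1'..'65'
    add_soft_id(cookies, str(i))
print(len(cookies), cookies[0], cookies[-1])  # 64 2 65
```

This matches the "Too many cookies" use case: after the 65th request, cookie `'1'` has been dropped.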

When resolving a merge conflict, there are 2^n possible ways (all subsets of n elements) to choose which soft IDs to move, where n is the number of soft IDs involved. We do not examine all of these possibilities, since that could lead to performance problems and a vulnerability: an attacker could configure a combination of IDs and send requests that force exponential computational complexity. To avoid this, only the first 16 subsets are examined. This can lead to non-optimal behaviour in some very complex cases, which will probably never happen in production unless someone fabricates them artificially.
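The cap can be illustrated with a lazily generated, truncated subset enumeration. The order in which Exponea generates subsets is not documented here; this sketch assumes smaller subsets come first (which favours moving as few IDs as possible), and `candidate_subsets` and `MAX_SUBSETS` are hypothetical names.

```python
from itertools import chain, combinations, islice

MAX_SUBSETS = 16


def candidate_subsets(soft_ids, limit=MAX_SUBSETS):
    """Yield at most `limit` non-empty subsets of soft_ids, smallest first."""
    all_subsets = chain.from_iterable(
        combinations(soft_ids, size) for size in range(1, len(soft_ids) + 1))
    # islice stops the lazy generator, so the 2^n subsets are never all built.
    return islice(all_subsets, limit)


subsets = list(candidate_subsets(["a", "b", "c", "d", "e"]))
print(len(subsets))  # 16 of the 31 non-empty subsets
```

With five soft IDs, all 5 single-ID subsets and all 10 pairs are examined, but only one of the 10 triples, which is where the non-optimal behaviour mentioned above can come from.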

Limitations use cases

Too many cookies

  • IDs: registered (hard), cookie (soft)
  • Identification calls:
    {'registered': '1', 'cookie': '1'}
    {'registered': '1', 'cookie': '2'}
    {'registered': '1', 'cookie': '3'}
    ...
    {'registered': '1', 'cookie': '65'}
  • Expected result:
    After all 65 requests, the only customer should look like this:
    {'registered': '1', 'cookie': ['2', '3', ... '65']}

Too many soft IDs in a more complex case (for brevity, this example assumes a soft ID limit of 4 instead of 64)

  • IDs: r1 (hard), r2 (hard), cookie (soft), phone (soft)
  • Identification calls:
    {'r1': '1', 'cookie': '5', 'phone': '123'}
    {'r1': '1', 'cookie': '2', 'phone': '234'}
    {'r1': '1', 'cookie': '3', 'phone': '345'} ← these 3 belong to the first customer, c1
    {'r2': '2', 'cookie': '4', 'phone': '456'}
    {'r2': '2', 'cookie': '1', 'phone': '567'} ← these 2 belong to the second customer, c2
    {'r1': '1', 'r2': '2', 'cookie': '6'} ← this request would trigger a merge, but there are two problems: cookie is over the limit by 2 and phone is over the limit by 1
  • Expected result:
    After the first 5 requests, there should be two customers:
    c1 with {'r1': '1', 'cookie': ['5', '2', '3'], 'phone': ['123', '234', '345']}
    c2 with {'r2': '2', 'cookie': ['4', '1'], 'phone': ['456', '567']}
    The 6th request should result in the following customer:
    c12 with {'r1': '1', 'r2': '2', 'cookie': ['3', '4', '5', '6'], 'phone': ['234', '345', '456', '567']}

Modifying ID configuration

Currently, you can define your IDs per project; however, only the following operations are allowed:

  • adding a new identifier
  • turning hard ID to soft ID

Other operations (e.g. reordering the IDs, removing an ID, or turning a soft ID into a hard ID) are not supported, since they would require traversing all the data and checking for conflicts (e.g. existing customers with multiple soft IDs that would turn into customers with multiple hard IDs, which is not supported).
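A configuration change can be checked against these rules before it is applied. A minimal sketch, assuming the configuration is an ordered list of `(name, kind)` pairs and, as a hypothetical simplification, that new identifiers are only appended at the end; `is_allowed_change` is not an Exponea function.

```python
def is_allowed_change(old, new):
    """Return True if `new` only appends IDs and/or turns hard IDs into soft IDs.

    old/new: ordered lists of (name, kind), kind being "hard" or "soft".
    """
    if len(new) < len(old):
        return False  # an ID was removed
    for (old_name, old_kind), (new_name, new_kind) in zip(old, new):
        if old_name != new_name:
            return False  # an ID was removed or the IDs were reordered
        if old_kind == "soft" and new_kind == "hard":
            return False  # soft -> hard is not supported
    # Remaining entries in `new` are newly added identifiers.
    return True
```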