It is possible to import data to your project from sources outside of Exponea. These can be CSV or XML files (uploaded through the import wizard or through a URL feed), integrated SQL databases, imports from file storage integrations (SFTP and Amazon S3), or data from your analyses (reports, etc.) in Exponea.

Exponea works with 4 types of data: customers, events, catalogs, and vouchers.

How to import your data

To import new data, go to Data & Assets > Imports > + Create new.

Next, you will be asked to choose which type of data you want to import. This article will now explain the basics that you need to know about imports before guiding you through the import process for each type of data.

Before you start, check the technical reference section of this guide and best practices to understand the size limits, etc.

Reimporting customers

If you reimport a customer but leave some of their properties empty, those properties will be overwritten with blank fields regardless of whether they had values before the import.

Navigating the data mapping editor

The crucial step of every import is data mapping. Here you tell Exponea which attributes the columns in your data map to and which columns are the identifiers. There are a number of things to keep in mind, and it is important to pay attention to them, as a small mistake can corrupt your data in a way that may be difficult to reverse.

Mapping columns to existing attributes

In the dropdown menu for each column name, you can map that column to an already existing attribute. When mapped to an existing attribute, the column name turns blue. Be sure that your imported data have the same structure and formatting as the existing data. You can also create a new attribute altogether: just type its name and then click on the + sign.

Assigning identifiers to columns

During every import, you need to pair at least one of your columns with one of the identifiers in the yellow boxes. For customers, for example, this is typically the "registered" ID, which identifies each row with a customer so Exponea knows where to import those values. However, you usually do not want to import such a column as a new attribute, so you need to untick the checkbox for that column.

Example

Imagine you're importing the "purchase" event. When importing events, you need a column containing the IDs of users (such as "registered") so Exponea can assign each event to the correct customer. If you leave this column checked, however, Exponea will also upload it as an attribute of the "purchase" event, which is not desired.

An example of a correct event import


Setting correct data types

You can define the correct data type for each attribute in your import by clicking on the icon next to its name. The data type affects how Exponea treats those values. When filtering data, for example, the data type will decide what type of operators will be shown by default to match that type. It also affects how the values are shown in the customer profile.

Value "28.9.1996 13:45" imported under different data types and shown in a customer profile.


Each data type is explained in the Data Manager article. You can also change the data type for each attribute later in the Data Manager. However, be cautious with list values as these are processed differently when imported under an incorrect data type.

Adding a column with a static value

You can also add a new column with a static value in the import wizard by clicking on the + sign on the very right of the preview table. We recommend always adding a column named "source" with a value describing the source of the data, such as "Import_11 June 2019". This can help later with troubleshooting or orienting yourself in your data.

Importing customers

In customer imports, columns represent the customer attributes while each row stands for a single customer. The first row in the data is used to identify the attributes. You can change the naming, discard a whole column, or add a new column with a static value later in the data mapping step.

Step 1: Select the data source

Choose your data source. Always preview your data before proceeding to make sure it is in the expected correct format.

Step 2: Map columns and IDs and set correct data types

This is a crucial step, explained in detail in the data mapping section above. You have to assign at least one of the yellow ID tags to your columns to identify the values in each row with the correct customer.

Remember that the column with IDs should not be imported and hence should be left unchecked.

Make sure that the correct data type is set for each column (text, number, date, etc.). Click on a column's name and select one of the existing attributes to match it with.

If you want to send emails through Exponea, your customers must have an attribute called email filled with their email addresses. Similarly, if you want to send text messages, the attribute for this must be called phone.

Step 3: Set repeated import (optional)

If you're uploading from a dynamic source (Database, URL or Exponea), you can schedule a repeated import.

Run the import

You can check the status in the list of imports when you hit "Refresh". Big files will take some time to get imported.

You can use this CSV to try imports yourself before importing a big file. The screenshot below shows what the correct import with this data should look like.

registered;first_name;last_name;email;birthday
[email protected];John;Smith;[email protected];28.9.1996
"registered" is unchecked so it is used only for pairing the data with the correct customer. The "birthday" attribute is set to the date data type. Since it is not shown in blue, this attribute does not exist in our data yet and will be created.
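
If you want to sanity-check a file like this before uploading it, the sample above can be parsed with a short script. This is only an illustrative sketch using Python's standard csv module (not part of Exponea); the field names and the semicolon delimiter come from the sample CSV.

```python
import csv
import io

# The sample customer CSV from above; note the semicolon delimiter,
# which must match the delimiter selected in the import wizard.
SAMPLE = """registered;first_name;last_name;email;birthday
[email protected];John;Smith;[email protected];28.9.1996
"""

def parse_customer_csv(text):
    """Parse a semicolon-delimited customer CSV into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)

rows = parse_customer_csv(SAMPLE)
```

Previewing the parsed rows this way makes it easy to spot a wrong delimiter before the data reaches the mapping step.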

Importing events

Step 1: Select the event you are importing

You can name your events as you like, but we recommend following our naming standards for the basic events.

Step 2: Select the data source

Choose your data source. Always preview your data before proceeding to make sure it is in the expected correct format.

Step 3: Map columns, assign the customer ID tag and timestamp tag, and set correct data types

This is a crucial step, explained in detail in the data mapping section above.

There are 2 columns required for event imports. One must contain customer IDs to pair each event with the correct customer; you need to assign the appropriate yellow tag to this column. The second required column is the timestamp of each event, with the blue tag assigned to it. The timestamp must be in a datetime format. If only a date is provided, Exponea will add midnight as the time of the event's timestamp.

Remember that the columns with IDs should not be imported and hence should be left unchecked.

You can have more columns with different IDs (registered, cookie, etc.). You can map all of them, and Exponea will use whichever is available for a particular customer to pair the event with. The registered hard ID is always prioritized.

Make sure that the correct data type is set for each column (text, number, date, etc.). Click on a column's name and select one of the existing attributes to match it with.

Step 4: Set repeated import (optional)

If you're uploading from a dynamic source (Database, URL or Exponea), you can schedule a repeated import.

Run the import

You can check the status in the list of imports when you hit "Refresh". Big files will take some time to get imported.

You can use this CSV to try imports yourself before importing a big file. The screenshot below shows what the correct import with this data should look like.

registered;timestamp;purchase_id;total_price;total_quantity
[email protected];1474886373;100001;53.90;3
registered;timestamp;purchase_id;price;product_id;title;quantity;total_price
[email protected];1474886373;100001;35.30;29;Colorful bracelet;1;35.30
[email protected];1474886373;100001;9.30;47;Fancy necklace;2;18.60
"registered" is unchecked so it is used only for pairing the data with the correct customer. Notice the data types set for the other columns.
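
Since every purchase_item should carry the purchase_id of its parent purchase event, a quick consistency check on files like the two samples above can catch mismatched totals before importing. This is a hedged sketch only; the column names come from the samples, and the 0.01 tolerance for float comparison is our choice.

```python
import csv
import io

PURCHASE = """registered;timestamp;purchase_id;total_price;total_quantity
[email protected];1474886373;100001;53.90;3
"""

PURCHASE_ITEM = """registered;timestamp;purchase_id;price;product_id;title;quantity;total_price
[email protected];1474886373;100001;35.30;29;Colorful bracelet;1;35.30
[email protected];1474886373;100001;9.30;47;Fancy necklace;2;18.60
"""

def rows(text):
    """Parse a semicolon-delimited CSV into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))

def items_match_purchase(purchase_rows, item_rows):
    """Check that purchase_item totals sum up to each purchase's total_price."""
    totals = {}
    for item in item_rows:
        pid = item["purchase_id"]
        totals[pid] = totals.get(pid, 0.0) + float(item["total_price"])
    return all(
        abs(totals.get(p["purchase_id"], 0.0) - float(p["total_price"])) < 0.01
        for p in purchase_rows
    )
```

In the samples above, 35.30 + 18.60 adds up to the purchase's 53.90, so the check passes.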

Importing catalogs

You can work with 2 types of catalogs in Exponea - general and product catalogs.

A general catalog in Exponea is essentially a lookup table with one primary key (item_id). It can store any table with fixed columns and can be used for various advanced use cases.

A product catalog is designed to store the database of your products. It has some additional settings and IDs, like product_id, title, size, etc. These ensure that the attributes are properly identified by Exponea and can be used for recommendations.

Step 1: Select the catalog you are importing

You can select one of your existing catalogs to import additional or new data, or create a new catalog by typing its name and clicking on the arrow.

Step 2: Select the data source

Choose your data source. Always preview your data before proceeding to make sure it is in the expected correct format.

Step 3: Map columns, assign tags and set correct data types

Firstly, check "Is product catalog" at the top if you want to import a product catalog. This will display additional tags that you need to assign to your columns (if they are present in your data).

Secondly, at least one ID tag has to be assigned. There are two IDs for products: item_id and product_id.

The difference is that one product with a single product_id can have multiple variants with different item_ids. For example, you can have a product "iPhone X Case" with a product ID "123". Next, you will also have iPhone X Case black, white, green, etc. Each of the color variants will have their own item_id but the same product_id.
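
The product/variant relationship can be pictured with a few hypothetical catalog rows. The item_id values below are made up purely for illustration; only the product_id/item_id distinction comes from the text above.

```python
from collections import defaultdict

# Hypothetical catalog rows: one product_id, several item_id variants.
catalog_rows = [
    {"item_id": "123-black", "product_id": "123", "title": "iPhone X Case", "color": "black"},
    {"item_id": "123-white", "product_id": "123", "title": "iPhone X Case", "color": "white"},
    {"item_id": "123-green", "product_id": "123", "title": "iPhone X Case", "color": "green"},
]

def variants_by_product(rows):
    """Group item_ids (variants) under their shared product_id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["product_id"]].append(row["item_id"])
    return dict(grouped)
```

Here all three color variants share product_id "123" but carry their own item_id.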

For general catalogs, item_id is used as the identifier.

See the Catalogs article for more information about each column.

Thirdly, set the correct data type for each column. Read more about navigating the data mapping editor in the section above in this article.

Map predefined IDs to your data structure. If you are importing a general catalog, only "item_id" is available and it is mandatory.

Step 4: Set searchable fields

If you are importing a catalog for the first time, you have to choose which fields are "searchable". Searchable fields are available in Exponea search, and recommendations are based only on these columns. This means that if you want to make product recommendations based on brand, for example, the brand column has to be set as a searchable field.

  • Searchable fields can only be defined during the first import. If you want to add more columns as searchable later, you will have to re-import the data to a new catalog.
  • Maximum of 20 columns can be set as searchable and available for recommendations.

Step 5: Set repeated import (optional)

If you're uploading from a dynamic source (Database, URL or Exponea), you can schedule a repeated import.

Run the import

Importing vouchers

The import process for vouchers is explained in the Vouchers article.

Storage file import

Imports from file storage support all functional requirements to import data for any use case. Importing data from files located in different storages is one of the most secure and convenient ways to automatically transfer data to Exponea. This allows:

  • filtering which files should be imported
  • listening for new file uploads
  • not importing the same file twice

Further functions:

  • remembers the state of the folder (last 1000 files that were already uploaded by this import job)
  • general regex match for any file storage to filter files that should be imported. Example: /filename[0-9]+/
  • general "new file" trigger for any file storage (Exponea checks file storage for new files every 5 minutes and imports those that were not imported yet)

Example: Automatic imports - a client has a DWH that exports data to file storage on a daily basis (e.g. purchases). The exported file's name will be different every day (regex match), and the export can finish at a different time each day due to its complexity (new file trigger).
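
The regex filter plus "do not import twice" behaviour can be sketched like this. It is a conceptual illustration only, assuming exact-match semantics for the pattern; Exponea's actual matching rules may differ.

```python
import re

# The example pattern from above: filenames like "filename1", "filename42", ...
PATTERN = re.compile(r"filename[0-9]+")

def files_to_import(filenames, already_imported):
    """Pick files matching the pattern that were not imported yet."""
    return [
        name for name in filenames
        if PATTERN.fullmatch(name) and name not in already_imported
    ]
```

A daily export named "filename20190611" would be picked up once, then remembered and skipped on the next check.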

Repeated imports

Automatically repeated imports are available if you import from a database, URL, or Exponea analysis.

To set a repeated import, go to the fourth and last step of the import process and click on "Schedule repeated import" in the top right corner. You can set daily imports to run several times per day, weekly imports to run at one time on several days, and monthly imports to run at one time on several days each month. You can also specify the time range for how long you want to keep the repeated imports running.

Data sources

You can import your data from the following sources:

Copy & paste

You can directly paste your CSV data into the application.

Upload a file

The file must be in the CSV format. You can click on "Select CSV" or drag&drop the file onto the canvas.

Database

You can feed the data from one of your integrated databases using SELECT statements. PostgreSQL, MySQL, BigQuery, and Google Cloud Storage are supported.

URL

If your data is saved online, you can create a feed to its URL, e.g. https://www.hostname.com/exports/file.csv. It is also possible to input a username and password in the form if HTTP Basic Authentication is required. The password will be securely stored in Exponea and will not be visible to other users.
When feeding a CSV file through a URL, Exponea uses the following headers:

  • X-Exponea-Import "sample" to request a small sample for encoding detection
  • X-Exponea-Import "preview" to request a preview, which is shown when a user clicks "Preview data" in the Exponea import wizard
  • X-Exponea-Import "download" to request the download of the full file
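
On the serving side, your endpoint could branch on this header. This is a minimal sketch of hypothetical server logic only; the header name and values come from the list above, but the response behaviour is an assumption about your own server, not Exponea's API.

```python
def handle_import_request(headers):
    """Decide what to serve based on the X-Exponea-Import header.

    The fallback to "download" when the header is absent is our
    assumption for this sketch.
    """
    mode = headers.get("X-Exponea-Import", "download")
    if mode == "sample":
        return "serve a small sample for encoding detection"
    if mode == "preview":
        return "serve a preview of the file"
    return "serve the full file"
```

Serving a small sample quickly for the "sample" and "preview" requests keeps the import wizard responsive even when the full export is large.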

File storage

Import CSV files from your own SFTP server or Google Cloud Storage bucket. Once you click on this option, click on Select a file storage integration and choose your SFTP or GCS (Google Cloud Storage for Imports) integration from the list. If you don't have any, click on the + button that will take you to set up a new SFTP or GCS integration.

Exponea

You can also import data from analyses that you created in Exponea in the current or a different project.

After selecting your data source, always click on "Preview data" to check that the format looks as expected. If there is an issue, check that the correct delimiter and encoding are selected.

Working with CSV

To import historical data into Exponea, it is best to use CSV import. Each event type should be imported in an individual CSV file. Each event, i.e. each row in the CSV, needs to contain a customer identifier and a timestamp in the UNIX format.

Customers may be imported to Exponea via CSV

In the first row of the CSV file, we need to specify the identifiers and attributes of customers. These names do not need to match the naming in Exponea (mapping of the columns onto customer identifiers/attributes in Exponea is shown later).

Each cell can only contain a single value. A customer record, for example, often has several cookies, but it is only possible to import 1 cookie in 1 row. To import multiple cookies, we would need some other identifier for the customer repeated on each row. The situation would look as follows:

registered;cookie
1;cookie1
1;cookie2

Now the customer record with registered ID equal to 1 would have cookies [cookie1, cookie2].

In the same way, you can also update individual properties of existing customer profiles, i.e. you can import only the relevant columns with the values you wish to update.

Column names in your CSV import do not need to match the names of customer attributes in Exponea; they can be mapped. The following image illustrates the situation. To get to this window in Exponea, go to Data & Assets > Imports > + Create new > Customers.

We can see that one of the columns in the CSV import was called 'fn'. We can click on the field highlighted in yellow and then select the appropriate attribute. In this case, we are assuming that 'fn' stands for 'first_name'. This column will now be imported as the customer attribute first_name.

Technical limitations

Data should be imported in UTF-8 encoding. The file size must not be greater than 100 MB. The file cannot contain more than 260 columns. It is recommended not to import more than approximately 1 million rows in a single file.

Working with XML

The XML format is supported for all import types: as copy-paste, manual file upload, import from URL, or through a file storage (SFTP, Google Cloud Storage, S3) integration. XML is one of the most used data formats, especially for product catalogs in e-commerce. XML support speeds up the integration phase and makes it cheaper to run and more secure at the same time.

Brief introduction to XML

XML stands for eXtensible Markup Language (similar to HTML). XML serves as a way to store or transport data in separate XML files. Using XML makes it easier for both producers and consumers to produce, receive, and archive any kind of information or data across different hardware or software.

XML documents are formed as element trees. An XML tree starts at a root element and branches from the root to child elements. XML documents must contain one root element that is the parent of all other elements. Moreover, XML elements must have a closing tag ("/").
For more information about XML syntax and its usage, visit this online guide.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

For instance, this XML example would be used to store data for a bookstore company.

<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

Below you can see the way Exponea would read this XML code.

Implementation guidelines

XML imports mirror the existing functionality of CSV imports. XML does not have a flat structure; however, Exponea supports only flat structures in imports (e.g. CSV, database connections), so a structure transformation (flattening) is needed.

XML data

This is done in the Data structure step:

  • The data structure transformation is automatic in most cases, unless an explicit user decision is needed. In such a case, the user needs to define the Path to repeating element and a transformation function (join, split to columns, split into rows, first, last, or do not import) for every conflicting element.
  • All other steps in imports are the same as CSV imports.
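
The flattening described above can be sketched with Python's standard xml.etree module, using the bookstore example from earlier and "book" as the repeating element. This is a simplified illustration, not Exponea's implementation; conflicting nested elements are not handled here.

```python
import xml.etree.ElementTree as ET

# A trimmed version of the bookstore example from above.
XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

def flatten(xml_text, repeating_element):
    """Flatten each occurrence of the repeating element into one row (dict)."""
    root = ET.fromstring(xml_text)
    flattened = []
    for node in root.iter(repeating_element):
        row = dict(node.attrib)          # element attributes become columns
        for child in node:
            row[child.tag] = child.text  # child elements become columns too
        flattened.append(row)
    return flattened

rows = flatten(XML, "book")
```

Each <book> becomes one row with columns category, title, author, year, and price, which is the flat shape the mapping step works with.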

Step-by-Step Guide:

  • 1) Go to Data & Assets and select Imports, then press + Create new. You will end up on this page:
  • 2) Data source - copy-paste, upload, or connect to an XML file from a file storage integration
  • 3) Data structure - choose the repeating element and resolve uncertain aggregations (optional)
  • 4) Data mapping - the last step

Follow the informative note on the screen. Select one ID and the update_timestamp and drag each of them to a column below. Then you are ready to run the import.

Amazon S3 integration

This feature is currently available in Beta version

Amazon S3 is one of the most widely used cloud file storages. You can connect Amazon S3 file storage with Exponea directly, with support for both data imports and exports.

Enablement

Users can specify the exact region and bucket where Exponea will look for files. Both imports and exports include all features: regex file lookup, the new-file-uploaded trigger, and selection rules for imports, as well as dynamic file naming for exports.

There are multiple ways for you to access the Amazon S3 storage.

To set up the integration, go to Data & Assets and select Integrations (if you do not see the Integrations button, contact your project manager to enable this function). You will be directed to a page where you can press Add new integration in the upper right corner. At this point, you should be able to + add this integration from the list of available integrations.

To find the already set up integration when importing/exporting your files:
Simply export your data through the Data Export, and in Step 4 you can access the Amazon S3 integration by clicking on Select a file storage integration. You can choose Amazon S3, and by clicking on the + sign you will be directed to the same setup screen as described above.

Save your Access Key ID and your Secret Access Key. When you import or export your data, you can then use the already set up Amazon S3 integration.

Use cases

  • One-time imports - Importing historical data, one-time campaign segments
  • Regular imports - Synchronization with internal tools (e.g. CRM)
  • Exports - Synchronization with internal or campaign tools (e.g. consents, events, segments)
  • Adform - Exporting customers to custom audiences in Adform (handled automatically by Exponea)

Best practices

When you import events, there are certain things to look out for so that you avoid errors/duplications and also keep your data organized.

Errors' influence on the import

Only the rows with an error will not be imported; the rest of the file is imported.

General

  • Split large files into multiple smaller ones and schedule large imports for off-peak times to achieve the best performance and stability. Start importing the oldest events first; in case of an error in the middle of the process, you can remove the newly added information simply by using "events expiration" in the Data manager.
  • It is a good practice to add a static attribute "source" that will tell you which import the data comes from. This will make troubleshooting and orienting yourself in your data much easier should there be any discrepancies in the future.
  • To be able to distinguish between old and newly imported events add a new attribute (e.g. import_date=20190715). The total number of events with this attribute should match the number of lines in the imported CSV file.
  • Double-check whether all headers have a unique name, especially when importing the same event from multiple .csv files. Import wizard would treat two headers with the same name as one column and subsequently, within Exponea, as one attribute.
  • Make sure that the registered_id is consistent with the IDs of existing customers. Some lists might use automatically-generated IDs while others may use emails.
  • Use test projects/events before doing the actual imports so that you avoid accidental deletion/duplication and the subsequent need for reimporting.
  • Ensure that all purchase_item events have a purchase_id attribute, that is tracked in another purchase event so that more purchase_item events coming from a single purchase can be treated as such.

Timestamp

  • Check the timezone of the timestamp.
  • Transform all time formats into UNIX timestamps to avoid problems with date formats (e.g. 02/04/2017 could be either 2 April or 4 February).
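
The ambiguity can be removed by parsing with an explicit format string before converting to a Unix timestamp. A sketch only; the UTC assumption for naive datetimes is ours, so adjust for your data's actual timezone.

```python
from datetime import datetime, timezone

# "02/04/2017" parsed with an explicit format removes the ambiguity:
day_first = datetime.strptime("02/04/2017", "%d/%m/%Y")    # 2 April 2017
month_first = datetime.strptime("02/04/2017", "%m/%d/%Y")  # 4 February 2017

def to_unix(dt):
    """Interpret a naive datetime as UTC and return its Unix timestamp."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp())
```

The same string yields two different dates depending on the chosen format, which is exactly why exporting UNIX timestamps in the first place is safer.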


Values

  • Check whether commas, dots, and spaces in the values are used correctly. When working with .csv files, one comma can move values to the wrong column. When locating and changing the values, beware that Excel dropdown filter only lists the first 10 000 rows.
  • Be aware of regional differences in various Excel settings. Excel can read the value "6.1" as a price in one language setting while reading it as a date in another. When using the "Text to Columns" function, you can specify the formatting of each column and set its decimal point to avoid this problem.
  • Lowercase and trim the most important values (like IDs) to avoid wrong identification. To easily achieve this result you can use the TRIM(LOWER(xx)) function in Excel.
  • When combining two sets of events, be sure that the price attribute within the purchase event is indexed with the same VAT, shipping, and discount settings in both sets. Also, check whether both sets include refunded purchases.

Re-import

  • After deleting events, do not start reimporting right away. Events and customers might look deleted in the Exponea app, but they might be deleted only from the frontend while the deletion process on the backend is still in progress.
  • The deletion process has low priority on the backend, so it takes longer than an import.


Technical reference

Maximum size and number of rows

An import should not have more than 1 million rows. If your import file is bigger, we recommend splitting it into several smaller files.

The size of the file needs to be under 1GB.

Date format and time format supported

The supported date formats and time formats are explained in the Data Manager article.

Deduplication of events

Every imported or tracked event is automatically checked against all events imported or tracked in the last 3 months. If there is another 100% identical event (meaning that it has the same attributes with the same values defined as the same data types), the new event is discarded. Note that the timestamp of the event must be identical as well, to the last decimal place.
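
The deduplication rule can be pictured like this. It is a conceptual sketch only, not Exponea's actual implementation; in particular, the 3-month window is not modeled here.

```python
import json

def event_key(event_type, timestamp, attributes):
    """Build a key identifying a fully identical event (type, timestamp, attributes)."""
    return json.dumps([event_type, timestamp, attributes], sort_keys=True)

def deduplicate(events):
    """Keep only the first occurrence of each 100% identical event."""
    seen, kept = set(), []
    for e in events:
        key = event_key(e["type"], e["timestamp"], e["attributes"])
        if key not in seen:
            seen.add(key)
            kept.append(e)
    return kept
```

Note that two events differing only in the decimal part of the timestamp produce different keys, matching the "identical to the last decimal place" rule above.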

Automatic trimming and lowercasing of IDs

This feature trims and lowercases your IDs when they are imported or tracked to Exponea. You must first enable it in the settings.

Duplicate identifiers in import source

Import sources are processed from top to bottom, and in the case of customers and catalogs, each row is an update operation for the record found by the given ID.

  • In the case of catalog imports, the order of rows is respected. In other words, if you import a source which contains rows with duplicate item IDs but different columns, the last occurrence of the row will execute the final catalog update, and therefore its values will be final.

  • In the case of customer imports: the order of rows is guaranteed as long as they have exactly the same values in all ID columns. Rows using different IDs are processed in an arbitrary order, even if they belong to the same customer.
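
The last-occurrence-wins behaviour for catalogs can be pictured as a per-row update. This is a sketch of the semantics described above, not the actual implementation; note how a later row only overwrites the columns it actually carries.

```python
def apply_catalog_import(rows):
    """Apply rows top to bottom; later rows with the same item_id overwrite
    only the columns they carry, so the last occurrence wins per column."""
    catalog = {}
    for row in rows:
        item = catalog.setdefault(row["item_id"], {})
        item.update(row)  # each row is an update, not a full replace
    return catalog
```

A row that repeats an item_id but omits a column leaves that column's earlier value in place.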

Delays in the visibility of imported data

Our import system was designed to cope with imports of large data sets in sequential order.
We optimized it for throughput instead of reactivity. For this reason, users can sometimes see delays between seeing a finished import in the UI and seeing the actually imported data.
Here is a very rough description of how our import system works.

Imports from all projects use the same queue. This means that many large imports in one project can delay imports in other projects. When you create an import, it is first scheduled into the queue, and it will start once there is an available worker. In general, this scheduling phase should take only a few seconds, but if there are many parallel imports (e.g. dozens) in other projects, the new import has to wait until the previous ones finish.

Once the import starts, the UI shows an import progress bar, which indicates the loading of your source into our system. The loading phase time depends on the size of the import and, in the case of external sources (e.g. databases), also on connectivity.

After the loading phase finishes, the import still needs to be further processed by our system. The processing phase is not yet visible in the UI. Smaller imports are processed in minutes, but the processing time of your import depends on how many other imports are being processed at the same time. After the processing phase finishes, your data are available in the system.

Timeouts when using HTTP(S) as a source

To prevent the situation where the server doesn't respond but the import is still in progress and therefore blocks a worker's slot, there is a set timeout.
The first part of the timeout is the connect timeout, which is set to 30 seconds. This is the time to open the socket to the server (TCP/SSL). If the server is not available within this time, we drop the import.
The second part of the timeout is the read timeout. Once the socket is opened, this is the time to get the first bytes of the HTTP response from the remote webserver. It is set to 3600 seconds. If that doesn't happen within the set time, the import is canceled.

You can only suggest edits to Markdown body content, but not to the API spec.