It is possible to import data into your project from sources outside of Exponea. These can be CSV or XML files (uploaded through the import wizard or through a URL feed), integrated SQL databases, imports from a file storage integration (Google Cloud Storage, SFTP, and Amazon S3), or data from your analyses (reports, etc.) in Exponea.

Exponea works with 4 types of data: customers, events, catalogs, and vouchers.

Watch this short introductory video about this feature:

Before you start, check the technical reference section of this guide and best practices to understand the size limits, etc.

To import new data, go to Data & Assets > Imports > + Create new.

Next, you will be asked to choose which type of data you want to import. This article will now explain the basics that you need to know about imports while guiding you through the import process for each type of data.

Importing customers

In customer imports, columns represent the customer attributes while each row stands for a single customer. The first row in the data will be taken to identify the attribute. You can change the naming, discard the whole column, or add a new column with a static value later in the data mapping step.

In the row below, you can see the four steps that you will have to go through when making data imports.

Data source

In this step, you will have to choose your data source. You can import your data from the following sources:

Copy & paste

You can directly paste your data into the application.

Upload a file

The file must be in the CSV or XML format. You can select a file or drag & drop it onto the canvas.

Database

You can feed the data from one of your integrated databases using SELECT statements. The most commonly used databases are supported.

URL

If your data is available online, you can create a feed to its URL, e.g. https://www.hostname.com/exports/file.csv. If required, you can also enter a username and password for HTTP Basic Authentication in the form. The password is stored securely in Exponea and is not visible to other users.
When feeding a CSV or XML file through a URL, Exponea sends the following headers:

  • X-Exponea-Import-Phase "sample" for the encoding-detection sample request
  • X-Exponea-Import-Phase "preview" for the preview request, shown when a user clicks "Preview data" in the Exponea import wizard
  • X-Exponea-Import-Phase "download" for the download of the full file
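As a rough sketch, a server behind such a URL could use the phase header to decide how much data to return. Only the header name and the three phase values come from the docs above; the handler function, row counts, and sample data below are our own assumptions:

```python
# Hypothetical server-side sketch: choose the CSV payload based on the
# X-Exponea-Import-Phase header value. Row counts are invented for the example.

CSV_HEADER = "registered;first_name;last_name"
CSV_ROWS = [f"user{i}@example.com;First{i};Last{i}" for i in range(1000)]

def respond_to_import_phase(phase: str) -> str:
    """Return the CSV body appropriate for the given import phase."""
    if phase == "sample":
        rows = CSV_ROWS[:10]     # a small chunk suffices for encoding detection
    elif phase == "preview":
        rows = CSV_ROWS[:100]    # enough rows to render the wizard's preview table
    elif phase == "download":
        rows = CSV_ROWS          # the full file
    else:
        raise ValueError(f"unknown phase: {phase}")
    return "\n".join([CSV_HEADER, *rows])
```

Serving a smaller body for "sample" and "preview" keeps the wizard responsive even when the full export is large.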

File storage

Import files from your own SFTP server or Google Cloud Storage bucket. Once you click on this option, click on Select a file storage integration and choose your SFTP or GCS (Google Cloud Storage for Imports and Exports) integration from the list. If you don't have any, click on the + button that will take you to set up a new SFTP or GCS integration.

Exponea

You can also import data from analyses that you created in Exponea in the current or a different project.

🚧

After selecting your data source, always click "Preview data" to check that the format looks as expected. If there is an issue, check that the correct delimiter and encoding are selected.

Imports from file storage

Imports from file storage support all the functional requirements to import data for any use case. Importing data from files located in external storage is one of the most secure and convenient ways to automatically transfer data to Exponea. It allows you to:

  • filter which files should be imported
  • listen for new file uploads
  • avoid importing the same file twice

Further functions:

  • Exponea remembers the state of the folder: all files already uploaded are remembered per active import. When you delete a whole import, all remembered files are also wiped.
  • A general regex match works for any file storage to filter which files should be imported. Example: /filename[0-9]+/
  • A general "new file" trigger works for any file storage (Exponea checks the file storage for new files every 5 minutes and imports those that were not imported yet)
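To illustrate the regex filter from the example above, here is a minimal sketch with an invented file list. Search-style matching is an assumption here; Exponea's exact matching semantics may differ:

```python
import re

# The pattern from the docs example above: /filename[0-9]+/
pattern = re.compile(r"filename[0-9]+")

# Invented file names, for illustration only.
files = ["filename1.csv", "filename20210101", "notes.txt", "filename_old"]

# Keep only the files that match the regex somewhere in their name.
matched = [f for f in files if pattern.search(f)]
```

With this list, only "filename1.csv" and "filename20210101" would be selected; "filename_old" fails because no digits follow the prefix.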

File storage limitations

  • Scheduling file imports with selection rule All and execution rule New file uploaded is not allowed.
  • Scheduling file imports with selection rule All or Not imported yet has a hard limit of 200 files. Imports attempting to schedule over 200 files will not be allowed to be saved and started.

📘

Example

Automatic imports - you have a DWH that exports data to file storage on a daily basis (e.g. purchases). The exported file name is different every day (regex match) and the export can finish at a different time each day due to its complexity (new file trigger).

Selection rule

Once you fill in the pre-mapping step for file storage integrations (Amazon S3, SFTP & Google Storage), you will be able to choose the Selection rule.
It applies if there are more files on the storage matching file search:

  • All - all matched files will be imported (if already imported files are not removed, this causes repeated imports)
  • Last modified - only the most recently modified file will be imported, even when multiple files match the filter
  • Last alphabetically - only the last file when sorted alphabetically will be imported, even when multiple files match the filter
  • Not imported yet - all newly discovered files will be imported. Exponea remembers all files already uploaded per active import; when you delete a whole import, all remembered files are also wiped.

Data mapping

In the dropdown menu for each column name, you can map that attribute to an already existing attribute. When mapped to an existing attribute, the column name turns blue. Make sure that your imported data have the same structure and formatting as the existing data. You can also create a new attribute altogether: just type the name and click the + sign.

During every import, you need to pair at least one of your columns with one of the identifiers in the yellow boxes. For example, for customers, this is the ID registered to identify each row with a customer, so Exponea knows where to import those values. However, you do not want to import such a column as a new attribute, so you need to untick the checkbox for that column.

Make sure that the correct data type is set for each column (text, number, date, etc.). Click on a column's name and select one of the existing attributes to match it with. The data type affects how Exponea treats those values. When filtering data, for example, the data type will decide what type of operators will be shown by default to match that type. It also affects how the values are shown in the customer profile.

Each data type is explained in the Data Manager article. You can also change the data type for each attribute later in the Data Manager. However, be cautious with list values as these are processed differently when imported under an incorrect data type.

Value "28.9.1996 13:45" imported under different data types and shown in a customer profile.

You can also add a new column with a static value in the import wizard by clicking the + sign at the very right of the preview table. We recommend always adding a column named source with a value describing the source of the data, such as "Import_11 June 2019". This can help later with troubleshooting or orienting yourself in your data.

🚧

If you want to send emails through Exponea, your customers must have an attribute called email filled with their email addresses. Similarly, if you want to send text messages, the attribute for this must be called phone.

Set repeated import (optional)

If you're uploading from a dynamic source (file storage, database, URL, or Exponea analysis), you can schedule a repeated import.

To set a repeated import, go to the last (fourth) step of the import process and click "Schedule repeated import" in the top right corner. You can set daily imports to run several times per day, weekly imports to run at one time on several days, and monthly imports to run at one time on several days each month. You can also specify the time range for how long you want to keep the repeated imports running.

✅ Run the import

You can check the status in the list of imports when you hit "Refresh". Large files will take some time to import.

You can use this CSV to try imports yourself before importing a big file. The screenshot below shows how a correct import with this data should look.

registered;first_name;last_name;email;birthday
[email protected];John;Smith;[email protected];28.9.1996

"registered" is unchecked so that it is used only for pairing the data with the correct customer. The "birthday" attribute is defined in the date format. Since it is not shown in blue, this attribute does not exist in our data yet and will be created.

🚧

Customer ID error

The "No valid customer ID" error occurs either when a customer is missing an ID or when the ID is longer than 256 bytes. To resolve this, check the customer IDs in each row of the document from which you are importing the customers.
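A quick pre-import sanity check mirroring this rule can be sketched as follows. The helper function is illustrative, not part of Exponea; note that the limit is in bytes, so multi-byte UTF-8 text counts more than its character length:

```python
def valid_customer_id(customer_id) -> bool:
    """Rough pre-import check: the ID must be present and at most 256 bytes."""
    if not customer_id:
        return False
    # Measure UTF-8 bytes, not characters: accented or non-Latin IDs can
    # exceed the limit even with fewer than 256 characters.
    return len(str(customer_id).encode("utf-8")) <= 256
```

Running such a check over every row of your source file before importing helps you find offending rows up front instead of debugging a failed import.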

🚧

Reimporting customers

If you reimport a customer but leave some of their properties empty, those properties will be overwritten with blank fields regardless of whether they had values before the import.

Importing events

You can name your events as you like, but we recommend following our naming standards for the basic events.

Data source

Choose your data source. This step works the same way as data source selection in Customer Imports. Always preview your data before proceeding to make sure it is in the expected correct format.

Data mapping

Here, you can map columns, assign the customer ID tag and timestamp tag, and set the correct data types.

This is a crucial step, explained in detail in the customer data mapping section above.

There are 2 columns required for event imports. One must contain customer IDs to pair each event with the correct customer; you need to assign the appropriate yellow tag to this column. The second required column is the timestamp of each event, with the blue tag assigned to it. The timestamp must be in the datetime format (e.g. "2020-08-05T10:22:00") or the numeric format (e.g. the number of seconds since 1970, such as "1596615705"; note that this works only in seconds, not milliseconds). If only a date is provided, Exponea will add midnight as the time of the event's timestamp.
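The two accepted timestamp forms can be illustrated in Python. Treating the datetime as UTC here is an assumption; always confirm which timezone your source system exported:

```python
from datetime import datetime, timezone

# The datetime form "2020-08-05T10:22:00" and its numeric UNIX-seconds form.
dt = datetime(2020, 8, 5, 10, 22, 0, tzinfo=timezone.utc)
unix_seconds = int(dt.timestamp())   # seconds, not milliseconds

# Date-only input: per the text above, Exponea fills in midnight as the time.
date_only = datetime(2020, 8, 5, tzinfo=timezone.utc)
midnight_seconds = int(date_only.timestamp())
```

If your export is in milliseconds, divide by 1000 before importing, since only seconds are accepted.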

Remember that the columns with IDs should not be imported and hence should be left unchecked.

📘

You can have more columns with different IDs (registered, cookie, etc.). You can map all of them, and Exponea will use whichever is available for a particular customer to pair the event. The registered hard ID is always prioritized.

Make sure that the correct data type is set for each column (text, number, date, etc.). Click on a column's name and select one of the existing attributes to match it with.

Set repeated import (optional)

If you're uploading from a dynamic source (Database, URL or Exponea), you can schedule a repeated import.

✅ Run the import

You can check the status in the list of imports when you hit "Refresh". Big files will take some time to get imported.

You can use this CSV to try imports yourself before importing a big file. The screenshot below shows how a correct import with this data should look.

registered;timestamp;purchase_id;total_price;total_quantity
[email protected];1474886373;100001;53.90;3

registered;timestamp;purchase_id;price;product_id;title;quantity;total_price
[email protected];1474886373;100001;35.30;29;Colorful bracelet;1;35.30
[email protected];1474886373;100001;9.30;47;Fancy necklace;2;18.60

"registered" is unchecked to use it only for pairing the data with the correct customer. Notice the data types set for other columns.

🚧

Imported events cannot be selectively altered or deleted.

Importing catalogs

You can work with 2 types of catalogs in Exponea: general and product catalogs.

A general catalog in Exponea is basically a lookup table with one primary key (item_id). It can store any table with fixed columns and can be used for various advanced use cases.

A product catalog is designed to store the database of your products. It has some additional settings and IDs like product_id, title, size, etc. These ensure that the attributes are properly identified by Exponea and can be used for recommendations.

You can select one of your existing catalogs to import additional / new data, or create a new catalog by typing its name and clicking on the arrow.

Data source

Choose your data source. This step works the same way as data source selection in Customer Imports. Always preview your data before proceeding to make sure it is in the expected correct format.

Data mapping

Firstly, check "Is product catalog" at the top if you want to import a product catalog. This will display additional tags that you need to assign to your columns (if they are present in your data).

Secondly, at least one ID tag has to be assigned. There are two IDs for products: item_id and product_id.

The difference is that one product with a single product_id can have multiple variants with different item_ids. For example, you can have a product "iPhone X Case" with a product ID "123". Next, you will also have iPhone X Case black, white, green, etc. Each of the color variants will have their own item_id but the same product_id.

For general catalogs, item_id is used as the identifier.
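The product_id / item_id relationship described above can be pictured by grouping catalog rows by product. The rows below are invented for illustration:

```python
from collections import defaultdict

# Invented catalog rows: one product ("123") with two color variants,
# each variant carrying its own item_id but sharing the product_id.
catalog_rows = [
    {"item_id": "123-black", "product_id": "123", "title": "iPhone X Case", "color": "black"},
    {"item_id": "123-white", "product_id": "123", "title": "iPhone X Case", "color": "white"},
    {"item_id": "456-std",   "product_id": "456", "title": "Screen Protector", "color": "clear"},
]

# Group variant item_ids under their parent product_id.
variants_by_product = defaultdict(list)
for row in catalog_rows:
    variants_by_product[row["product_id"]].append(row["item_id"])
```

For a general catalog you would skip the grouping entirely and key each row by item_id alone.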

See the Catalogs article for more information about each column.

Thirdly, set the correct data type for each column. Read more about navigating the customer data mapping editor in the section above in this article.

Map the predefined IDs to your data structure. If you are importing a general catalog, only "item_id" is available, and it is mandatory.

Set searchable fields

If you are importing a catalog for the first time, you have to choose which fields are "searchable". Searchable fields are available in Exponea search, and recommendations are based only on these columns. This means that if you want to make product recommendations based on, for example, brand, this column has to be set as a "searchable field".

🚧

  • Searchable fields can only be defined during the first import. If you want to add more columns as searchable later, you will have to re-import the data to a new catalog.
  • A maximum of 20 columns can be set as searchable and available for recommendations.

Set repeated import (optional)

If you're uploading from a dynamic source (Database, URL or Exponea), you can schedule a repeated import.

✅ Run the import

🚧

Maximum length of catalog item value

The maximum length of a catalog item value is 28,672 bytes. If you exceed this length, you need to shorten the value; it is possible to split the longer property into two shorter ones.

Importing vouchers

The import process for vouchers is explained in the Vouchers article.

Additional resources

Working with CSV

To import historical data into Exponea, it is best to use CSV import. Each event type should be imported in an individual CSV file. Each event, i.e. each row in the CSV, needs to contain a customer identifier and a timestamp in the UNIX format.

Customers may be imported to Exponea via CSV

In the first row of the CSV file, we need to specify the identifiers and attributes of customers. These names do not need to match naming in Exponea (mapping of the columns onto customer identifiers/attributes in Exponea will be shown later).

Each cell can only contain a single value. For example, a customer record often has several cookies, but it is possible to import only 1 cookie in 1 row. To import multiple cookies, we would need some other identifier for the customer. The situation would look as follows.

Now the customer record with registered ID equal to 1 would have [cookie1, cookie2].

In the same way, you can also update individual properties of existing customer profiles i.e. you can only import the relevant columns with the values you wish to update.

Column names in your CSV import do not need to match the names of customer attributes in Exponea; they can be mapped. The following image illustrates the situation. To open this window in Exponea, go to Data & Assets > Imports > + Create new > Customers.

We can see that one of the columns in the CSV import was called 'fn' (first_name). We can click on the field highlighted in yellow and then select the appropriate attribute. In this case, we are assuming that 'fn' stands for 'first_name'. This column will now be imported as the customer attribute first_name.

📘

Technical limitations

Data should be imported in UTF-8 encoding. File size must not be greater than 1 GB. The file cannot contain more than 260 columns. It is recommended not to import more than approximately 1 million rows in a single file.

CSV data source requirements:

  • Provide an HTTPS endpoint with an SSL certificate signed by one of the trusted CAs (trusted CA list)
  • Authenticate requests with the HTTP Authorization header (HTTP Basic Auth as described in RFC 2617)
  • Output CSV that follows the CSV format specification (as described in RFC 4180)
  • Encoding: allowed encodings are 'utf-8', 'utf-16', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'latin1', 'iso-8859-2'
  • Delimiter: allowed delimiters are ' ', '\t', ',', '.', ';', ':', '-', '|', '~'
  • Escape: double quote "
  • New line: CR LF (\r\n)
  • File size limit: 100 MB
  • Transfer speed allowing Exponea to download the file in less than 1 hour

Working with XML

The XML format is supported for all import types: copy & paste, manual file upload, import from URL, and file storage (SFTP, Google Cloud Storage, S3) integrations. XML is one of the most used data formats, especially for e-commerce product catalogs. XML support speeds up the integration phase and makes it cheaper to run and more secure at the same time.

Brief introduction to XML

XML stands for eXtensible Markup Language (similar to HTML). XML serves as a way to store or transport data in separate XML files. Using XML makes it easier for both producers and consumers to produce, receive, and archive any kind of information or data across different hardware or software.

XML documents are formed as element trees. An XML tree starts at a root element and branches from the root to child elements. An XML document must contain exactly one root element that is the parent of all other elements. Moreover, in XML every element must have a closing tag ("/").
For more information about XML syntax and its usage, visit this online guide.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

For instance, this XML example would be used to store data for a bookstore company.

<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

Below you can see the way Exponea would read this XML code.

XML imports mirror the existing functionality of CSV imports. XML does not have a flat structure, but Exponea supports only flat structures in imports (e.g. CSV, database connections), so a structure transformation (flattening) is needed.
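A minimal sketch of such a flattening for the bookstore example above, using Python's standard library. This only illustrates the idea of turning each element into a flat row; Exponea's actual transformation rules may differ:

```python
import xml.etree.ElementTree as ET

# A shortened version of the bookstore example from the text above.
xml_data = """
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(xml_data)
rows = []
for book in root.findall("book"):
    # One flat row per <book>: attributes plus child-element text.
    row = {"category": book.get("category")}
    for child in book:                 # title, author, year, price
        row[child.tag] = child.text
    rows.append(row)
```

Each resulting dict behaves like one CSV row, which is the flat shape the import wizard expects.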

Amazon S3 integration

Amazon S3 is one of the most used cloud file storage services. You can connect Amazon S3 file storage with Exponea directly, with support for both data imports and exports. Importing from Amazon S3 works through the Amazon S3 REST API, which Exponea calls over the HTTP protocol.

Enablement

Users can specify the exact region and bucket where Exponea will look for files. Both imports and exports include all features: regex file lookup, the new-file-uploaded trigger, and selection rules for imports, plus dynamic file naming for exports.

To set up the integration, go to Data Analysis and select Integrations (if you do not see the Integrations button, contact your project manager to enable this function). You will be directed to a page where you can click Add new integration in the upper right corner. At this point, you should be able to add this integration from the list of available integrations by clicking +.

To complete your integration, you need to generate access tokens through Amazon's AWS. Each user has their own access level; this is a configuration within AWS itself. To set up the integration, you need to select either the "list objects", "read only", or "read all" access level. You have to attach those rights to the access token that you use in the AWS integration to ensure that imports from Amazon S3 work correctly.

To find the already set-up integration when importing/exporting your files, export your data through the Data Export; in Step 4 you can access the Amazon S3 integration by clicking Select a file storage integration. Choose Amazon S3; clicking the + sign will take you to the same screen as the one shown above.

Save your Access Key ID and your Secret Access Key. When you import or export your data, you can then use the already set-up Amazon S3 integration.

Use cases

  • One-time imports - Importing historical data, one-time campaign segments
  • Regular imports - Synchronization with internal tools (e.g. CRM)
  • Exports - Synchronization with internal or campaign tools (e.g. consents, events, segments)
  • Adform - Exporting customers to custom audiences in Adform (handled automatically by Exponea)

Best practices

When you import events, there are certain things to look out for so that you avoid errors/duplications and also keep your data organized.

🚧

Errors' influence on the import

Only the rows with an error will not be imported; the rest of the file is imported.

General

  • Split large files into multiple smaller ones and schedule large imports for off-peak times to achieve the best performance and stability. Start importing the oldest events first and in case of an error in the middle of the process, you can remove the newly added information by simply using “events expiration” in the Data manager.
  • It is a good practice to add a static attribute "source" that will tell you what import the data comes from. This will make troubleshooting and orienting in your data much easier should there be any discrepancies in the future.
  • To be able to distinguish between old and newly imported events add a new attribute (e.g. import_date=20190715). The total number of events with this attribute should match the number of lines in the imported CSV file.
  • Double-check whether all headers have a unique name, especially when importing the same event from multiple .csv files. Import wizard would treat two headers with the same name as one column and subsequently, within Exponea, as one attribute.
  • Make sure that the registered_id is consistent with the IDs of existing customers. Some lists might use automatically-generated IDs while others may use emails.
  • Use test projects/events before doing the actual imports so that you avoid accidental deletion/duplication and the subsequent need for reimporting.
  • Ensure that all purchase_item events have a purchase_id attribute, that is tracked in another purchase event so that more purchase_item events coming from a single purchase can be treated as such.

Timestamp

  • Check the timezone of the timestamp.
  • Transform all time formats into the UNIX timestamp, which is the number of seconds since 1.1.1970 00:00:00 UTC. Using a UNIX timestamp avoids problems with different time formats where, for example, 02/04/2017 could be either April or February.
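The ambiguity above can be made concrete by parsing the same string with two explicit formats. The choice of UTC here is an assumption; use the timezone your source data was recorded in:

```python
from datetime import datetime, timezone

# "02/04/2017" means different dates depending on the assumed format,
# so always parse with an explicit format string before converting.
raw = "02/04/2017"
as_feb = datetime.strptime(raw, "%m/%d/%Y").replace(tzinfo=timezone.utc)  # February 4
as_apr = datetime.strptime(raw, "%d/%m/%Y").replace(tzinfo=timezone.utc)  # April 2

feb_ts = int(as_feb.timestamp())
apr_ts = int(as_apr.timestamp())
```

The two UNIX timestamps differ by nearly two months, which is exactly the kind of silent error an explicit format prevents.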

Values

  • Check whether commas, dots, and spaces in the values are used correctly. When working with .csv files, one comma can move values to the wrong column. When locating and changing the values, beware that the Excel dropdown filter only lists the first 10 000 rows.
  • Be aware of regional differences in Excel settings. Excel can read the value "6.1" as a price in one language setting and as a date in another. When using the "Text to Columns" function, you can specify the formatting of each column and set its decimal point to avoid this problem.
  • Lowercase and trim the most important values (like IDs) to avoid wrong identification. To easily achieve this result you can use the TRIM(LOWER(xx)) function in Excel.
  • When combining two sets of events, be sure that the price attribute within the purchase event is indexed with the same VAT, shipping, and discount settings in both sets. Also, check whether both sets include refunded purchases.
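A Python equivalent of the Excel TRIM(LOWER(xx)) tip above, written as a hypothetical helper for preprocessing your source file:

```python
def normalize_id(raw: str) -> str:
    """Trim surrounding whitespace and lowercase an identifier,
    mirroring TRIM(LOWER(xx)) in Excel."""
    return raw.strip().lower()
```

Applying this to every ID column before import prevents " John@Example.com" and "john@example.com" from being treated as two different customers.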

Re-import

  • After deleting events, do not start reimporting right away. Events and customers might look deleted in the Exponea app, but they might be deleted only from the frontend while the deletion process on the backend is still in progress.
  • The deletion process has low priority on the backend, so it takes longer than an import.

Technical reference

Maximum size and number of rows

Any import should not have more than 1 million rows. If your import file is bigger, we recommend splitting it into several smaller files.

The size of the file needs to be under 1GB.

Date format and time format supported

The supported date formats and time formats are explained in the Data Manager article.

Deduplication of events

Every imported or tracked event is automatically checked with all events imported/tracked in the last 3 months based on the time of that import/tracking. If there is another 100% identical event (meaning that it has the same attributes with the same values defined as the same data types), the new event will be discarded.

In order for deduplication to work, the given events must be completely identical. This means that the event names, attributes, and timestamp (to the last decimal place) of the events must be identical and that there cannot be discrepancies because of varying import sources. Such a discrepancy could arise, for example, if the same event was formatted as a string in one source and as a number in the other.

Automatic trimming and lowercasing of IDs

This feature trims and lowercases your IDs when they are imported or tracked to Exponea. You must first enable it in the settings.

Duplicate identifiers in the import source

Import sources are processed from top to bottom and in case of customers and catalogs, each row is an update operation for a record found by a given ID.

  • In the case of catalog imports, the order of rows is respected. In other words, if you import a source, which contains lines with duplicate item IDs, but different columns, in general, the last occurrence of the row will execute the final catalog update and therefore that will be the final value.

  • In the case of customer imports: the order of rows is guaranteed as long as they have exactly the same values in all ID columns. Rows using different IDs are processed in an arbitrary order, even if they belong to the same customer.

Delays in the visibility of imported data

Our import system was designed to cope with imports of large data sets in sequential order.
We optimized it for throughput rather than reactivity. For this reason, users can sometimes see a delay between a finished import in the UI and the imported data actually appearing.

Here is a simplified description of how our import system works:

Imports from all projects use the same queue, so many large imports in one project can delay imports in other projects.

  • Scheduling: when you create an import, it is first scheduled into the queue and starts once a worker is available. This phase should generally take only a few seconds, but if there are many parallel imports (e.g. dozens) in other projects, a new import has to wait until the previous ones finish.
  • Loading: once the import starts, the UI shows an import progress bar, which indicates the loading of your source into our system. The loading time depends on the size of the import and, for external sources (e.g. databases), also on connectivity.
  • Processing: after the loading phase finishes, the import still needs to be further processed by our system. The processing phase is not yet visible in the UI. Smaller imports are processed in minutes, but the processing time depends on how many other imports are being processed at the same time.

After the processing phase finishes, your data are available in the system.

Timeouts when using HTTP(S) as a source

To prevent a situation where the server does not respond but the import is still in progress and therefore blocks worker slots, there is a set timeout.
The first part of the timeout is the connect timeout, set to 30 seconds. This is the time allowed to open the socket to the server (TCP/SSL); if the server is not available within it, we drop the import.
The second part is the read timeout. Once the socket is open, this is the time allowed to get the first bytes of the HTTP response from the remote webserver. It is set to 3600 seconds; if that does not happen within the set time, the import is canceled.
