EN
Search…
Data preparation: what you need to know to get the best out of crystal
crystal is here to help you process your data and get insights out of it to make successful business decisions.
In order to do so, she needs a data source from which to retrieve the information wanted and, to get the best out of her performance, this data needs to be prepared according to certain guidelines.
The higher the data and structure quality is, the better results and impact crystal can bring.
We recommend you to follow our advice to get the best out of your crystal experience.
But before we dive into specific details, let’s see (or review) some basic concepts as an introduction to the perfect crystal experience.
In this article we will see:
  1. 1.
    crystal foundations: entities, aggregation, aliases, objectives and topics
  2. 2.
    Data preparation outside crystal
  3. 3.
    Data manipulation inside crystal

Entities are the core of crystal’s functioning: they are essentially the content that you allow the members to explore through the advisor conversation.
There are two types of entities: the main entities and the filter entities.
A main entity is the subject the insight focuses on, according to the objective you set for the analysis. For example, if the question for the advisor would be: “Can you show me total of sales?”, the main focus of this request would be “sales”.
A filter entity is the dimension that you can associate with the main entity to drill down, filter, or deepen your analysis. For example, for our question “Show me total of sales”, a filter entity would be the value of the "country" entity, like “Show me total of sales for Germany”, or a year, like “Show me total of sales in 2019”.
A filter entity should not be confused with a filter value: for example, in our case, a filter entity might be “Country”, which would retrieve the sales for a single country, while a filter value is a specific item in the entity’s data column. In our case, “Germany” would be a specific value for the entity “Country”.
Here’s another example:
“Show me the revenues (main entity) for Italy (filter value) in 2019 (filter value)”.

The aggregation in crystal is based on SQL aggregation functions and it represents how the different values of a main entity can be put together in a specific topic, meaning that the same entity can be aggregated in different ways.
For example, if we have the entity “Sales”, we can either sum the different values of the entity, count them or calculate the average.
If we want to show the user how much money we made, then we would use Sum, if we want to count how many sales transactions we had, we would go for Count and if we want to represent the average sale we would use Average.

Aliases are very important for crystal, because they enable the flexible conversational experience. The same entity can be in fact associated with multiple words: “sales” can be also called “revenues”, “turnover”, and so on. Another example: if the entity is “items” words like “units” or “articles” might also be used to refer to it.
These synonyms associated with the same entity are called aliases and they can be used in the conversation to refer to the same entity.

An objective is essentially the main purpose of the topic. Each objective can be represented through one or more visualizations that will be chosen by the admins during the phase of topic creation. Each objective has a number of consistent visualizations, that can be charts or other formats (such as ranking tables and lists).
For example, the comparison objective shows the similarity or dissimilarity between different items or facts, like in the question: “Can you compare items sold versus budget by country?” The over time objective, instead, shows how data changed over a specific time period, as in: “Can you show me the sales over time by month for the last year?"

A topic is an insight on your data that you can get from crystal. It is the combination of one or multiple main entities that can be filtered for one or more filter entities that fulfill an objective and that are surfaced to the user through a visual or textual representation.

crystal’s conversation is driven mainly by the following factors:
  • Main entity
  • Objective
  • Filter entity
A good rule of thumb is the following:
  • Show me the objective for the main entity in filter value.
    • For example: “Show me the trend for sales in 2021”.
  • Show me the main entity objective in filter value.
    • For example: “Show me sales over time in Germany”.
You can find all the options and guidelines to follow for the conversation here.
Consequently, it is very important to prepare the data to create topics that have a natural way of requesting them.
Each objective has one or multiple visualizations that best fits the need. Find more details on the visualization here.

In order to display your data correctly, crystal will follow these data formatting conventions and rules.

The data must be formatted with a certain logic to be properly consumed by crystal. Check out this article to know more.

crystal works with Text fields. It supports most of the commonly used categorical types. Here you can find the complete list of supported data sources. Depending on the connected data source, you could need to transform some of your dataset columns.
For example
Scenario
Query
VARCHAR type
CAST ( column AS varchar [ ( 120 ) ] )
Mind that this is just an example and it needs to be adapted, based on the exact case and technology used.

We often encounter qualitative fields, such as product description, address and others, containing special characters that are not properly encoded as Unicode.
To clean these data types, it is more convenient to use the regular function expression.
How to handle
Scenario
Query
publicitÖÖÖÖ
regexp_replace('publicitÖÖÖÖ','[Ö�]+|√†!','à','g');

Empty values
crystal expects empty values to be represented as NULL.
How to handle
Scenario
Query
““
nullif(column, "")
“None”
nullif(column, "None")

The floating numeric from customer data source should follow the standard format that uses the comma as thousand separators (e.g.1,200 for one thousand two hundred) and the dot as decimal separator (e.g. 1.2 for one and two).
  • 1.234 = one and two hundred thirty-four thousandths
  • 1,234 = one thousand two hundred thirty-four
However it can happen quite often to find numeric data that violates these conventions.
How to handle
Scenario
Query
Result
We just need to read it correctly from text type and save it as numeric type.
to_number('1,234,567.009678' , '999G999G999G999D999999'::text)
1234567.009678
Negative sign is at the end of the number.
to_number('123,456,789.01-', '999999999G999D999S'::text);
-123456789.01
In the following example, you can find some common scenarios and how to fix them.
  • If our data already complies with the common standard (dot for decimal and comma for thousands separators):
Scenario
Query
Result
We just need to read it correctly from text type and save it as numeric type.
to_number('1,234,567.009678' , '999G999G999G999D999999'::text)
1234567.009678
Negative sign is at the end of the number.
to_number('123,456,789.01-', '999999999G999D999S'::text);
-123456789.01
  • If our data does NOT comply with the common standard:
Scenario
Query
Result
It uses dot for thousands and comma for decimal separators (eg. 1.200 and 1,2).
NULLIF(replace(replace('5.295,22' , '.'::text, ''::text), ','::text, '.'::text), ''::text)::numeric
5295.22
It uses comma for decimal separator (1,2) and nothing for thousands separator (1200).
NULLIF(replace('2271,02', ','::text, '.'::text), ''::text)::numeric
2271.02

At the moment crystal is not supporting complex fields like “Json objects“ for Postgres or “LOB“ for Oracle. If in a table such field types are present, crystal will not be able to use them in the topic configuration process, nor will she be able to preview the related data.
Note that, for Oracle, all the data types described here are not supported.

Before we start to dive into the data preparation details, take a look at this article to know more about crystal’s limitations.

The topic configuration is a key process for the success of the crystal project, that’s why it is extremely important to provide a clean data structure by following some guidelines.

A data source is mainly composed of tables that are also made of columns.
Main entities can be configured by using a simple rule of thumb:
One column = one main entity
This is the suggested and recommended approach for most cases. Each column must contain the values or content of a single entity.
For example, let’s consider that we have to create a topic whose main entity is “Sales” and this is our data:
Product
City
Channel
Sales
Chargebacks
Product 1
Milan
Online
$20,56
$0
Product 2
London
Store
$50,36
$0
Product 1
Turin
Online
$97,76
$0
Product 1
New York
Store
$190,12
$0
In this case Sales will be our main entity. Other main entities, such as the chargebacks in our example, can be configured on the related column.
A crucially important factor to keep in mind is that entities’ aliases MUST be unique , therefore we will not be able to name another main entity within crystal “Sales”.
Also note that:
  • If, conceptually, there are multiple columns that refer to generic “sales”, they need to be centralized in the same column and in the same table to enable the entity “sales” to be unique in the conversation and allow the system to identify the source field.
  • if there are multiple sales columns that conceptually have differences (e.g. actual sales & budget sales), they can be different columns and can have different alias names (such as “actual sales“ and “budget sales”).
There is a possible alternative approach that can be used in peculiar cases. Let’s take a table like this as an example:
Product
Indicators
Value
Product 1
Revenues
$0
Product 1
Costs
$0
Product 1
Inventory
$0
In this case, we can use the “value” column as the main entity and the “indicator” column as a filter entity. Nonetheless, this approach is solid when the values within the column “Value” can be logically aggregated with one of the supported functions: otherwise there is the risk of showing inconsistent data to the final user.
Moreover, the conversation structure will be directly affected, as the main entity, and therefore the focus of the topic, will be “value”, rather than the values of the filter dimension (such as, for example, revenues or costs).

If a conceptual dimension (for example, “product”) is present in multiple tables (like, 3), and therefore in multiple columns (also 3), and it has to be used as a filter for topics configured on the three objects, there are two options:
  • We could configure two different entities (for example, “Sales”and “Sales target“) and potentially trigger the disambiguation feature, in the case the request to the advisor is partial or incomplete (for example, by specifying only "sales").
  • Or, we could centralize the filter column in a single object to configure the entity in a single column.
Let’s see an example.
If we assume that “product” is the only filter dimension that we want to share between the two tables, we will have to create a master table and two “data” tables as follows, linked by a key that we will use to join the two tables.
Master table
ID
ProductID
Price
34
2
12
85
6
15
45
3
23
Table 1: Sales targets
ID
ProductID
Target
23
2
45,000
54
6
125,000
57
3
10,000
Table 2: Products
The “Name” column would become the Product filter entity.
ID
Name
2
Apples
6
Pears
3
Bananas

Assuming that Product is the only filter dimension that we want to share between the two tables, we can directly create a single table containing all the filters and main entities.
The driver for this decision would be the granularity of the data and the amount of entities that share the same filters.
For example:
Master table: sales
ID
Product
Price
34
Apples
12
85
Pears
15
45
Bananas
23
Table 1: Sales targets
ID
Product
Target
23
Apples
45,000
54
Pears
125,000
57
Bananas
10,000
Table 2: Products
ID
Product
Price
Target
1
Apples
12
45,000
2
Pears
15
125,000
3
Bananas
23
10,000

Calculated columns are customized columns created dynamically through operations. In the below case, for example, we have the total column which is generated by the system by multiplying price per quantity. At the moment crystal does not support the creation of calculated columns.
ID
ProductID
Price
Quantity
Total
34
34
5
3
15
85
85
10
5
50
45
45
20
6
120

Join can be performed on a single key, meaning that you cannot use multiple fields to join two tables. Moreover, join operations can be performed only if the objects reside in the same data source (such as, database connection).
The joined tables can be further joined to a maximum of 5.
The performance of crystal’s answers depends on the number of the join operations and the performance of the underlying data source (for example, indexing and other database optimization techniques can be performed to speed up the data retrieval and enhance crystal answering performance).
The supported joins are:
  • Inner
  • Left
  • Right

You can sort the final result by any column of the connected data, choosing ascending or descending order.

In crystal, static filters correspond to the SQL where clause. It is possible to apply static filtering to the connected data: static filters are applied by default to a topic and they are not to be confused with dynamic filters, applied in the filter step of the topic configuration flow.
You can apply multiple static filters to a topic with a specific operation. For example, you can select “category” as a static filter and say that you want to filter category values that EQUAL “gadget” (meaning that only the “gadget” objects will be taken into account in the conversation), or the “price” as a static filter with operation “greater than” and value 100, and your topic will only retrieve those values which are above 100.
The available options are:
  • Equal
  • Different
  • Empty
  • Not Empty
  • Greater Than
  • Less Than
  • Greater Than Or Equal To
  • Less Than Or Equal To
  • Between

You can choose among the following aggregation functions:
  • Count
  • Count distinct
  • Sum
  • Average
We hope this article will help you to prepare and format your data at its best, to give you the most complete and successful crystal experience.
Do you want to know more about topic creation? Check out our Tutorials for admins section on Resources!
If you have any questions about crystal’s features, you have encountered a problem or you would like to share your feedback, contact us using this form.
Copy link
On this page
1. crystal foundations: entities, aggregation, aliases, objectives and topics
Entities
Aggregation
Aliases
Objectives
Topics
How to ask questions to crystal
Data formatting
Complex fields types
Product limitations
2. Data preparation outside crystal
The main entity
The filter entity
Centralize all entities
Calculated columns
3. Data manipulation inside crystal
Joins
Sorting
Static filtering
Aggregation functions