Data preparation: what you need to know to get the best out of Crystal
crystal is here to help you process your data and get insights out of it to make successful business decisions.
In order to do so, she needs a data source from which to retrieve the information wanted and, to get the best out of her performance, this data needs to be prepared according to certain guidelines.
The higher the data and structure quality is, the better results and impact crystal can bring.
We recommend you to follow our advice to get the best out of your crystal experience.
But before we dive into specific details, let’s see (or review) some basic concepts as an introduction to the perfect crystal experience.
In this article we will see:
- 1.crystal foundations: entities, aggregation, aliases, objectives and topics
- 2.Data preparation outside crystal
- 3.Data manipulation inside crystal
There are two types of entities: the main entities and the filter entities.
A main entity is the subject the insight focuses on, according to the objective you set for the analysis. For example, if the question for the advisor would be: “Can you show me total of sales?”, the main focus of this request would be “sales”.
A filter entity is the dimension that you can associate with the main entity to drill down, filter, or deepen your analysis. For example, for our question “Show me total of sales”, a filter entity would be the value of the "country" entity, like “Show me total of sales for Germany”, or a year, like “Show me total of sales in 2019”.
A filter entity should not be confused with a filter value: for example, in our case, a filter entity might be “Country”, which would retrieve the sales for a single country, while a filter value is a specific item in the entity’s data column. In our case, “Germany” would be a specific value for the entity “Country”.
Here’s another example:
“Show me the revenues (main entity) for Italy (filter value) in 2019 (filter value)”.
The aggregation in crystal is based on SQL aggregation functions and it represents how the different values of a main entity can be put together in a specific topic, meaning that the same entity can be aggregated in different ways.
For example, if we have the entity “Sales”, we can either sum the different values of the entity, count them or calculate the average.
If we want to show the user how much money we made, then we would use Sum, if we want to count how many sales transactions we had, we would go for Count and if we want to represent the average sale we would use Average.
Aliases are very important for crystal, because they enable the flexible conversational experience. The same entity can be in fact associated with multiple words: “sales” can be also called “revenues”, “turnover”, and so on. Another example: if the entity is “items” words like “units” or “articles” might also be used to refer to it.
These synonyms associated with the same entity are called aliases and they can be used in the conversation to refer to the same entity.
An objective is essentially the main purpose of the topic. Each objective can be represented through one or more visualizations that will be chosen by the admins during the phase of topic creation. Each objective has a number of consistent visualizations, that can be charts or other formats (such as ranking tables and lists).
For example, the comparison objective shows the similarity or dissimilarity between different items or facts, like in the question: “Can you compare items sold versus budget by country?” The over time objective, instead, shows how data changed over a specific time period, as in: “Can you show me the sales over time by month for the last year?"
A topic is an insight on your data that you can get from crystal. It is the combination of one or multiple main entities that can be filtered for one or more filter entities that fulfill an objective and that are surfaced to the user through a visual or textual representation.
crystal’s conversation is driven mainly by the following factors:
- Main entity
- Filter entity
A good rule of thumb is the following:
- Show me the objective for the main entity in filter value.
- For example: “Show me the trend for sales in 2021”.
- Show me the main entity objective in filter value.
- For example: “Show me sales over time in Germany”.
Consequently, it is very important to prepare the data to create topics that have a natural way of requesting them.
In order to display your data correctly, crystal will follow these data formatting conventions and rules.
crystal works with Text fields. It supports most of the commonly used categorical types. Here you can find the complete list of supported data sources. Depending on the connected data source, you could need to transform some of your dataset columns.
Mind that this is just an example and it needs to be adapted, based on the exact case and technology used.
We often encounter qualitative fields, such as product description, address and others, containing special characters that are not properly encoded as Unicode.
To clean these data types, it is more convenient to use the regular function expression.
How to handle
How to handle
The floating numeric from customer data source should follow the standard format that uses the comma as thousand separators (e.g.1,200 for one thousand two hundred) and the dot as decimal separator (e.g. 1.2 for one and two).
- 1.234 = one and two hundred thirty-four thousandths
- 1,234 = one thousand two hundred thirty-four
However it can happen quite often to find numeric data that violates these conventions.
How to handle
In the following example, you can find some common scenarios and how to fix them.
- If our data already complies with the common standard (dot for decimal and comma for thousands separators):
- If our data does NOT comply with the common standard:
At the moment crystal is not supporting complex fields like “Json objects“ for Postgres or “LOB“ for Oracle. If in a table such field types are present, crystal will not be able to use them in the topic configuration process, nor will she be able to preview the related data.
The topic configuration is a key process for the success of the crystal project, that’s why it is extremely important to provide a clean data structure by following some guidelines.
A data source is mainly composed of tables that are also made of columns.
Main entities can be configured by using a simple rule of thumb:
One column = one main entity
This is the suggested and recommended approach for most cases. Each column must contain the values or content of a single entity.
For example, let’s consider that we have to create a topic whose main entity is “Sales” and this is our data:
In this case Sales will be our main entity. Other main entities, such as the chargebacks in our example, can be configured on the related column.
A crucially important factor to keep in mind is that entities’ aliases MUST be unique , therefore we will not be able to name another main entity within crystal “Sales”.
Also note that:
- If, conceptually, there are multiple columns that refer to generic “sales”, they need to be centralized in the same column and in the same table to enable the entity “sales” to be unique in the conversation and allow the system to identify the source field.
- if there are multiple sales columns that conceptually have differences (e.g. actual sales & budget sales), they can be different columns and can have different alias names (such as “actual sales“ and “budget sales”).
There is a possible alternative approach that can be used in peculiar cases. Let’s take a table like this as an example:
In this case, we can use the “value” column as the main entity and the “indicator” column as a filter entity. Nonetheless, this approach is solid when the values within the column “Value” can be logically aggregated with one of the supported functions: otherwise there is the risk of showing inconsistent data to the final user.
Moreover, the conversation structure will be directly affected, as the main entity, and therefore the focus of the topic, will be “value”, rather than the values of the filter dimension (such as, for example, revenues or costs).
If a conceptual dimension (for example, “product”) is present in multiple tables (like, 3), and therefore in multiple columns (also 3), and it has to be used as a filter for topics configured on the three objects, there are two options:
- We could configure two different entities (for example, “Sales”and “Sales target“) and potentially trigger the disambiguation feature, in the case the request to the advisor is partial or incomplete (for example, by specifying only "sales").
- Or, we could centralize the filter column in a single object to configure the entity in a single column.
Let’s see an example.
If we assume that “product” is the only filter dimension that we want to share between the two tables, we will have to create a master table and two “data” tables as follows, linked by a key that we will use to join the two tables.
Table 1: Sales targets
Table 2: Products
The “Name” column would become the Product filter entity.
Assuming that Product is the only filter dimension that we want to share between the two tables, we can directly create a single table containing all the filters and main entities.
The driver for this decision would be the granularity of the data and the amount of entities that share the same filters.
Master table: sales
Table 1: Sales targets
Table 2: Products
Calculated columns are customized columns created dynamically through operations. In the below case, for example, we have the total column which is generated by the system by multiplying price per quantity. At the moment crystal does not support the creation of calculated columns.
Join can be performed on a single key, meaning that you cannot use multiple fields to join two tables. Moreover, join operations can be performed only if the objects reside in the same data source (such as, database connection).
The joined tables can be further joined to a maximum of 5.
The performance of crystal’s answers depends on the number of the join operations and the performance of the underlying data source (for example, indexing and other database optimization techniques can be performed to speed up the data retrieval and enhance crystal answering performance).
The supported joins are:
You can sort the final result by any column of the connected data, choosing ascending or descending order.
You can apply multiple static filters to a topic with a specific operation. For example, you can select “category” as a static filter and say that you want to filter category values that EQUAL “gadget” (meaning that only the “gadget” objects will be taken into account in the conversation), or the “price” as a static filter with operation “greater than” and value 100, and your topic will only retrieve those values which are above 100.
The available options are:
- Not Empty
- Greater Than
- Less Than
- Greater Than Or Equal To
- Less Than Or Equal To
You can choose among the following aggregation functions:
- Count distinct
We hope this article will help you to prepare and format your data at its best, to give you the most complete and successful crystal experience.
Do you want to know more about topic creation? Check out our Tutorials for admins section on Resources!