
Elixir Data Designer provides a generic processor that allows certain specific data processing tasks to be executed. These tasks are designed to be easily customized in different versions of the tools, so you may find additional options available.
The standard processors are:
Remove Duplicates
This task removes duplicate records. A duplicate record is determined by a set of key fields being the same. The set may contain a single field, for example an identity card number, or a collection of fields, for example name and address. The fields are chosen by selecting a checkbox next to the appropriate items in the Test column of the schema table, as shown in Figure 4.39, “Remove Duplicates Processor”
Whenever subsequent records are identified as being duplicates of those already processed, the later records are always discarded. This means only the first record with a given set of key fields will be passed through. If the records are sorted such that duplicate records are adjacent, you should tick the Input Sorted checkbox so that the system can use an alternate algorithm to reduce memory usage. When the Input Sorted checkbox is selected, each record is only compared against the previous record, instead of all previous records, resulting in faster performance and reduced memory requirements.
Invert Data
This task inverts the data records so that the rows become the columns and vice-versa. In order to convert the rows to columns, one original column must provide the names of the columns in the new schema. This column should contain unique values to ensure the resulting schema doesn't have duplicate column names. Select the column that will provide the column names at the top of the wizard, as shown in Figure 4.40, “Invert Data Processor”. Conversely, you might want the old column names to be retained as row values. If so, enter a field name in the next text field, otherwise leave it blank. Specifying a name will add a column in the output schema of type String, whose values will be equal to the Keep columns described below.
The next step is to identify which of the old columns to keep - those that should be mapped into the new inverted structure. The resulting dataset will contain only as many records as are marked Keep. Because each row becomes a column when inverted, the system will need to determine the appropriate column type in the output schema. If the selected columns are all of the same type, then this type will be used. Alternatively, if the selected columns contain mixed types, then the String type will be used in the output schema to ensure all values can be represented.
As an example, assume a simple table like this:
A B C D 1 2 3 4 5 6 7 8
Where A,B,C and D are the column names. If we choose InvertColumn=B, FieldName=Field, Keep=A,D then the output after Invert Data will be:
Field 2 6 A 1 5 D 4 8
As you can see, because two columns were selected for keeping, there are two records in the output. The old column names are assigned to be the values of the new column called Field (the FieldName value), and the unique values of the B column (2 and 6) become the column names in the new schema.
Javascript
This task allows the user to perform any kind of record operation or schema re-arrangment which might be too complicated for regular flows. With this processor, a user can now save a lot of complex joins, filters and concatenations which is required previously.
The scripts are to be entered in the text field in the respective tabs, as shown in Figure 4.41, “Javascript Processor”
SQL
This task will allow the user to send any SQL commands to the database based on the standard flow events after a JDBC connection is set up on the second page of Processor Wizard, as seen in Figure 4.42, “SQL Processor Wizard”.
The output of the SQL Processor is the same as the input and records passing through are not being modified as the user is only interacting with the database. With this, the user can load data in bulk into the database faster.