|
Cubes
|
Dataodyssey Express allows to retrieve and analyze data directly from heterogeneous, compatible and incompatible data sources without intermediate storing and normalization. Naturally in most cases data is initially structured for ease of input, editing and online analytical processing. But it practically always fails to create a data warehousing structure that could be at most handy for storage, effective operation and quick comprehensive analysis from every conceivable aspect. That is why data is turned into structures maximally apt for specific analysis to be efficiently analyzed. In terms of OLAP, the information is periodically retrieved from databases, mapped and stored in data warehouses. According to the same terminology data in the warehouses is stored in the form of cubes. A cube is a logical concept. Actually for Dataodyssey Express it presents a set of tables which (tables) in most cases with redundancy, and have columns with analytical features called dimensions. There were two invoice tables considered in the sample (INVOICE_HEADERS and INVOICE_ITEMS) and you can draw the following conclusions:
storing invoice headers and items in different tables eliminates abundance and maximally fits the invoice structure. In terms of relational algebra the full partitioning (its tables) described in the sample is in one of normal forms (normalization);
in whole sales analysis requires every time to join headers with items, what takes much time when operating data bulks and to sum and group after only;
for most reports the invoice tables usually use mass of information that is suitable for one analytical report and is not for another (e.g. account statement time, internal codes, limax cell numbers, storekeepers' names, car numbers, etc.);
for ease of sales analysis it would come in handy to place year and month of sales in special columns;
it would be extremely good not to search for names and states of clients and for names of goods and warehouses in separate directories.
It is useful to retrieve only the information required for the sales report group, unite it in a table and create indices to speed up search in columns. This table is called a cube (in terms of Dataodyssey Express), analytical columns with indices – dimensions, and a retrieval by one or several dimensions – a cube cutoff. Naturally there is abundance, but it is a sacrifice to ease and speed of access.
Dataodyssey Express allows to create cubes which include a single table or set of tables, describe several updates for each table and update the cubes with layers of information that could be rolled back. The list of cubes is opened by clicking \Space\Cubes. The instrument looks like an explorer that consists of three parts: the left with the tree of cubes and analytical features (destinations, data source), top-right wit the list of cube tables and bottom-right with the list of cube table updates. The lowest list is the update log, where error and success records are fixed.
To create a new
cube click the
button above list of cube tables and fill the cube's properties. Cube is just a
group of tables and you can create these groups (cubes) from inside of
properties of tables.
Reversible and irreversible cubes
It is recommended to always create reversible cubes. There will be additional fields within and in some cases the cubes are slower to update (and faster in other), but you will always have an opportunity to roll update layers back one by one. Moreover, when creating a new layer of a reversible cube, no irreversible changes are made to relevant information of previous layers, and the possibility of data restoration by operative intervention remains.
Cube properties are few and they are always subject to change. Click
.
When creating a cube you should specify a data source that it will be stored in, name, destination and the update mode (reversible or irreversible). Only the name and destination will remain editable then.
You can create several updates for one cube. For example, data is received from
various sections of a distributed network and the peculiarities of every section
should be taken into account (e.g. warehouses are geographically distant). To
create a new update select a created cube definition in the
cube list and click
. Update structure is very much like the
structure
(properties) of a
summary mission. To
open an update for editing click the
button.
There are the following elements in addition (to summary mission structure):
![]() |
![]() |
Depending on update type and methods you should select or clear these checkboxes for certain dimensions. |
![]() |
![]() |
Like for dimensions |
![]() |
The Update filter condition area |
Besides the filter condition for retrieving data you can build a filter condition to update data. If it is created, the nonconforming cube data remains unchanged. |
![]() |
The date interval for a layer |
For updating by layer dates |
If certain tables are located in hard-to-reach places, it makes sense to store their instances in easily accessible data sources and periodically update the contents. There can be a reversed situation, when the updated directory needs to be periodically sent to remote data sources with slow transmission links. Everything depends on the side the changes are made to the table from (though the changes can be made from different places). Replications can be called the instances of an entire table or its fragments that need to be periodically updated with the data from original table. In Dataodyssey Express the replication differs technically from cube only in two specialities:
the analytical feature "Replication";
the name of a replication (cube) is the name of the table (part of the table for cube).
Replication is usually simpler than cube by contents and is not based on data models. For example, if you are going to manage an instance of the CUSTOMERS table from a Microsoft Access database in a Microsoft SQL Server database, you should:
create a cube, select the "Replication" checkbox in its properties and specify the table's name as its name. For example, CUSTOMERS_ACCESS;
create an update, not based on models, adding only one table to it, as in the picture:
If you create
only keys for an update, the program will replace the old key records in the
process of update. New records that have no correspondences in the cube will
just be added to it. If the cube is reversible, it has two fields:
L_A_Y_E_R - has an entry with unique date-time layer values
in the form of YYYYMMDDhhmmss.
D_E_L_ETED - has an entry with unique date-time layer values
in the form of YYYYMMDDhhmmss that mark it as outdated.
!!! When further employing
reversible cubes in
missions you should always add
conditions for the D_E_L_ETED field. It should be empty for actual records !!!
During the process of updating a reversible cube by keys the program won't
replace old key records if they match the new ones. The D_E_L_ETED field of such
records will have the new layer's identifier (if empty).
Updating by keys and layer dates
In the process
of updating by keys the intervals of new and previous layer can slightly
overlap, for example because the previous layer had insufficient or excessive
data at its creation. In this case the overlapped date fragment should be
deleted before creating a new layer. Otherwise at consequent deleting there are
already no records in the retrieved data, while they still exist in the leftover
previous update and will be excessive (wrong) in the cube. For such preliminary
cleanup of old updates by dates it is necessary to specify one field in
dimensions or an
aggregate, selecting the Date-Time checkbox by clicking
.
Before performing an update it is required to specify
layer
dates correctly. After deleting the remains of the previous layer by these dates
the rest of the update will go by keys.
Naturally, no previous layer remains are deleted in a
reversible cube, and the D_E_L_ETED field will get the identifier of a new
layer.
You can have a specific need for comparing modification dates of previous and
new records, when updating by keys. If the modification date of a new record is
later, the record gets into layer. All the rest happens like when updating by
keys. Select one of the
dimensions
or
aggregates by the
button.
Updating without keys and layer dates
If you specify no keys and layer dates at all, no old data checks will be made in a cube. The new layer will just be added to the cube. If the cube is irreversible and its data source is compatible with that of the update, the process will be at maximum fast using the "Insert Into ... Select From" construction.
Keyless updating by layer dates
If you specify no keys, defining a Date-Time layer date only, the program will clean the old layer remains by the date interval and add new data as a new layer without analyzing old records.
To edit update properties select a cube and update in the lists, click the right button above the list of cubes. The following update properties can be changed:
last layer's dates (expiry date is actual, because it is the starting date for a new layer);
A date interval can be specified in the bottom of the update model window or in the window of update properties and:
is used when updating by layer dates;
is stored (the last one) in the update structure;
is offered changed when performing a new update;
can be combined with certain options in the Update filter and Select filter conditions.
To combine it with certain condition options select the "Layer dates 1" checkboxes in these options.
Copying tables by means of replications
You can select the "Replace the entire table" checkbox in the update properties. If the checkbox is selected the current table will be deleted before performing every update. Practically you can customize complex table copying schemes and batch them, execute them from command line or another program.
To change an
update you can click the
button in the
list of cubes, after the Updates row. The edit window is
totally an analog of that of
update creating, though its not dialog and can be
in the background when you operate other objects. After making changes to update
structure in the process of saving, if a cube already exists the program
compares the new update structure with that of the given cube. The cube
structure will be changed, if necessary.
When you create a new update and have created no cube yet, the program creates a cube. Cube is a table in a specified data source. The name of the table is like DTODYE_CUBE_cube_name_cube_system_ID. If the cube already exists only the missing indices are created.
To update an
existing cube select it from the
list of cubes and click the button
. The
entire update structure will be displayed in a dialog window. It will be
necessary to specify
layer dates. Besides, you can temporarily change the
update
structure just for this time. Structure changes are not saved.
Attention! If a cube has been created without client side information retrieval, the length of calculated fields may not coincide with the one forecasted. In this case next time the cube updates the program will suggest changing the field length (once per cube). But! If there are several updates defined for the cube, be sure that the calculated field lengths of different updates match. To do it, open the expression builder for dimensions.
Select a cube from the list of cubes and click "More actions".
If the cube is reversible, the "Undo last layer" item will be available. The program will find the last layer and request for confirmation.
In the process of rollback Dataodyssey Express:
removes all the rows with the L_A_Y_E_R_ field value equaling the last layer's identifier;
clears the D_E_L_ETED field
for the rows in which it equals the last layer's identifier.
When you are selecting a particular update from the list of cubes, this update's log extract is displayed in the bottom of the window. The log traces the following facts:
cube table deletion
Besides, if an update is batched (for a group), the log will trace error messages. For all that it is not always possible to refer an error to a certain update, that is why you should use the table browser (the dtoe_logrec table in the data source with the data map).
When making the
list of dimensions for an update each dimension gets the Index checkbox
selected automatically. When the update is saved, the program creates these
indices. But if they are damaged, the speed of data retrieval from cube can
drop. Use appropriate tools (SQL Enterprise Manager, etc.) to watch over the
indices. In case of damage all the indices can be recreated for an existing
cube. Find the cube on the
list of cubes and click
. The
program will find all the fields with the Index checkbox selected and
recreate them for all
updates.
To delete an update find it on the list of cubes and click "More actions" in the update button group. No changes will be made to the cube, but when you create a new update or change another the program will try to delete all the missing fields in all updates.
Find the cube on the list of cubes and click "More actions" from the cube button group. After deleting all cube properties from system tables the program tries to delete the cube's table. Before doing all that the program will check if any saved mission plan has reference to the table.
You can add a cube as a table into a data model for any mission or an update of another cube. But !!! If the cube is reversible, it is compulsory for this table to specify in the filter condition that the D_E_L_ETED field of the retrieved rows is empty. The program will take care for this when adding the table to the data model, but you had better take over control!
You can perform an
update for several cubes
simultaneously from the cube list. Select the group you
need from the tree of analytical features (the tree structure can always be
changed by the
button) and click
. Then you will be able to change the condition for the group in a dialog. The
condition can involve:
destination of a cube/replication
a group of cubes/replication
cube feature
After batch updating the update log can contain error messages highlighted red.
Batch updating is possible when Dataodyssey Express is launched from the command line. You can order the Windows Scheduler to run the errand. All the errors and update facts will be traced in the log.
The syntax:
DataOdyssey.exe uid pwd UPDATE_CUBES [purpose] [group] [replication]
The first three parameters are compulsory: user name, password, and the UPDATE_CUBES command. The fourth parameter is used to transmit destination code, the fifth - group code, sixth - the replication feature 1 (replication) or 0 (cube). For example:
DataOdyssey.exe sa 123 UPDATE_CUBES 1 1 0