December 8, 2011

Line Item Dimension and High Cordinality



1)  Line item: This means the dimension contains precisely one characteristic. This means that the system does not create a dimension table. Instead, the SID table of the characteristic takes on the role of dimension table. Removing the dimension table has the following advantages: When loading transaction data, no IDs are generated for the entries in the dimension table. This number range operation can compromise performance precisely in the case where a degenerated dimension is involved. A table- having a very large cardinality- is removed from the star schema. as a result, the SQL-based queries are simpler. In many cases, the database optimizer can choose better execution plans. Nevertheless, it also has a disadvantage: A dimension marked as a line item cannot subsequently include additional characteristics. This is only possible with normal dimensions. We recommend that you use Data Store objects, where possible, instead of Info Cubes for line items. See Creating Data Store Objects. 
2) High cardinality: This means that the dimension is to have a large number of instances (that is, a high cardinality). This information is used to carry out optimizations on a physical level in depending on the database platform. Different index types are used than is normally the case. A general rule is that a dimension has a high cardinality when the number of dimension entries is at least 20% of the fact table entries.
The size of the dimension tables should account for less than 10% of the fact table. If the size of the dimension table amounts to more than 10% of the fact table, mark the dimension as a line item dimension.
The line item flag in dimension: 
- You can flag it for a dimension having only one characteristic
- when this indicator is flagged, no dimension table will be created. This means that when you will filter on this char in a report, the system will directly access your Info Cube fact table instead of first going to the DIM and then JOIN the fact table.
When you'll drilldown this char, the same, no need to JOIN the DIM table since your char SID is stored directly in the fact table.
During loading, the same again; no need to check and generate DIMID for this characteristic --> shortens the loading time.


- When to flag it: you need to estimate what will be the ration "Number of different Values of the char posted" / "Total number of records posted".
For example; one record is posted to a cube for each Document Number. In this case the ratio is 100%, you can flag the dim as Line Item.
Imagine now that you are posting the document and its items and you can say that you have in average 10 items per documents. The ratio for your document number characteristic is 1/10 = 10%. In this case you shouldn't flag it because filtering for document numbers will much faster via the DIM table than directly from the FACT (much more records to scan to find it).


No comments:

Post a Comment