Tuesday, January 21, 2020

Fact and Dimension Tables

fact-table-and-dimension-tableThe Fact Table and Dimension Table, are the essential factors to create a schema. A fact table’s  record is a combination of attributes from different dimension tables. Fact table helps the user to analyze the business dimensions which helps him in decision taking to improve his business. On the other hands, dimension tables help fact table to collect dimensions along which the measures has to be taken.
The point that distinguishes Fact table and Dimension table is that the dimension table contains attributes along which measures are taken in fact table. There are some other factors that create differences between Fact Table and Dimension Table to view them, let’s have a glance at the comparison chart show below.

Comparison Chart

Basis for ComparisonFact TableDimension Table
BasicFact table contains the measurement along the attributes of a dimension table. Dimension table contains the attributes along which fact table calculates the metric.
Attribute & Records Fact table contains less attributes and more records.Dimension table contains more attributes and less records.
Table sizeFact table grows Vertically.Dimension table grows horizontally.
Key Fact table contains a primary key which is a concatenation of primary keys of all dimension table. Each dimension table contains its primary key.
CreationFact table can be created only when dimension tables are completed.Dimension tables need to be created first.
Schema A schema contains less number of fact tables.A schema contains more number of dimension tables.
AttributesFact table can have data in numeric as well as textual format.Dimension table always contains attributes in textual format.

Definition of Fact Table

A Fact table is a table that contains measurements along the attributes of dimension tables. It can contain the information at lowest possible level. Some fact table just contains summary data, called as Aggregated Fact Table. The fact table almost contains the date stamped data. Let us discuss the characteristics of a fact table.
Concatenated Key
Fact table contains Concatenated key which is the concatenation of primary keys of all the dimension tables. The concatenated key of fact table must uniquely identify the row in a fact table.
Data Grain
Data grain shows how deep the measurements in fact table have been stored. Data grain must be at the possible highest level.
Additive Measures
Attributes of the fact table can be fully additive or semi-additive. Fully additive measures are those that can be easily summed up for all dimensions in fact table. For example quantity_ordered, is an attribute that can be summed up for all dimensions. Like, we can take out total quantity_order, for a particular customer, region, date, brand, etc.  Semi-additive measures are those which can be summed along some dimensions of fact table but not all dimensions. Like, balance amount can not be summed up over time dimension as it changes over the time.
Sparse Data
Sometimes we may see the records in fact table that has attributes with null measures.  For example, there may not be any order on a holiday. So, attributes for this date will have null measures. We do not have to store measure for such kind of records as it does not provide any information.
Degenerated dimensions
Sometimes you may come across some dimensions in fact table, which are not additive at all. For example order_number, customer_id, you can not add these types of dimensions. However, in case, you need to find order made by a particular customer in this month; then you will need the customer_id to relate your search. These types if attributes or dimensions of fact table are called Degenerated Dimension.

Definition of Dimension Table

Dimension Table is a key component for Start Schema.A dimension table contains the attributes which represent dimensions, along which the measurement is taken in fact table. Further, we will discuss some characteristics of a dimension table.
Attributes and Keys
Every Dimension table must have a primary key that uniquely identifies each record of the table. It is commonly observed that the dimension table contain many attributes. Hence, it appears to be wide i.e. when you create a dimension table you will find it spreading horizontally.
Attribute values
The values of the attributes in dimension table are rarely numeric, most of the times you will find the values in attributes are in textual format. For example product name, brand, category, sub-category, etc.
Relation among Attributes
Frequently you can observe, the attributes you come across in a dimension table are not directly related. Like, Product_brand  has to do nothing with the package_date but still both could be the attributes of Product dimension table.
Normalization
The dimension table is not supposed to be Normalized. This is because normalizing a table would create many intermediate tables. When a  query picks up an attribute from dimension table and recovers measurements along that for the fact table, the query has to go through those intermediate tables which become inefficient. Hence, dimension tables are not Normalized.
Drilling down, rolling up
Attributes of dimension table allow you to get the details either by traversing from higher level of aggregated attributes to lower level attributes. For example, if you want to find the total sale in a region then you may drill down to find sales by state, city, zip. You can even roll up to find total sales first by the zip, then by city and then state.
Multiple Hierarchy
Often dimension table offers multiple hierarchies. For example, we have a product dimension table for a departmental store. Now, we have two departments marketing and accounting department.
The marketing department will drill down among attributes of product dimension table in a certain hierarchy to obtain measurements for the fact table. On other hands, accounting department will drill down among the attributes of product dimension table in the different hierarchy to obtain measurements for the fact table.
So, dimension table must have multiple hierarchies or level of aggregation of attributes to let user drill down along any of the multiple hierarchies.
Records
Though a dimension table has too many attributes, it has fewer records.

fact-table_vs_dimension-table

Key Differences Between Fact Table and Dimension Table

  1. Fact table contains measurement along the dimension/attributes of a dimension table.
  2. Fact table contains more records and less attribute as compared to dimension table whereas, dimension table contain more attributes and fewer records.
  3. The table size of fact table grows vertically whereas, table size of dimension table grows horizontally.
  4. Each dimension table contains a primary key to identify each record in the table whereas, fact table contains concatenated key which is a combination of all primary keys of all dimension table.
  5. Dimension table has to be recorded before the creation of fact table.
  6. A Schema contains fewer fact tables but more dimension tables.
  7. Attributes in fact table are numeric as well as textual, but attributes of dimension table have textual attributes only.

Conclusion

Both are equally important for a creation of schema but dimension table must be recorded before fact table. As it is impossible to create fact table without dimensions.

No comments:

Post a Comment