Star and snowflake schemas are the most popular multidimensional data models used for data warehouses. The crucial difference between them is that a star schema leaves its dimension tables denormalized, whereas a snowflake schema normalizes them to eliminate redundant data. Fact and dimension tables are the building blocks of both schemas. You can also refer to our previously published article on the difference between fact and dimension tables to understand them thoroughly.
The design of relational databases usually involves the entity-relationship data model. In this model, a database schema consists of a set of entities and the relationships between them. This kind of data model is appropriate for online transaction processing. A data warehouse, by contrast, needs a concise, subject-oriented schema that supports online data analysis. A schema describes the entire database logically, and a data warehouse likewise requires a schema for its maintenance.
Comparison Chart
Basis for comparison | Star Schema | Snowflake Schema |
---|---|---|
Structure of schema | Contains fact and dimension tables. | Contains fact, dimension, and sub-dimension tables. |
Use of normalization | Dimension tables are not normalized. | Dimension tables are normalized. |
Ease of use | Simple to understand and easily designed. | Hard to understand and design. |
Data model | Top-down | Bottom-up |
Query complexity | Low | High |
Foreign key joins used | Fewer | More |
Space usage | More | Less |
Time consumed in query execution | Less | Comparatively more, because of the additional joins. |
Definition of Star Schema
Star schema is a simple and common modelling paradigm in which the data warehouse comprises a central fact table and a single table for each dimension. The schema resembles a star, with the dimension tables arranged in a radial pattern around the central fact table. The fact table is connected to each dimension table through a foreign key that references the dimension table's primary key.
Example:
We are creating a schema that covers the sales of an electronic appliance manufacturing company. Sales are considered along the following dimensions: time, item, branch, and location. The schema contains a central fact table for sales that includes keys to each of the four dimensions, along with two measures: dollar-sold and units-sold. The size of the fact table is kept small by using system-generated dimension identifiers such as time_key and item_key.
Each dimension is represented by only a single table, and each table contains a set of attributes. The location dimension table, for example, holds the attribute set {location_key, street, city, state, country}. This restriction may introduce some redundancy. For example, two cities can belong to the same state and country, so the entries for those cities in the location dimension table repeat the same state and country values.
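As a rough sketch of the star schema described above, the following Python snippet builds the tables in an in-memory SQLite database with the standard sqlite3 module. The location attributes and the two measures come from the example. The columns of the time, branch, and item tables, the spelling dollars_sold (a hyphen is not valid in an unquoted SQL identifier), and all sample rows are assumptions made purely for illustration.

```python
import sqlite3

# In-memory database. Column names for time_dim, branch_dim and item_dim,
# and every sample row below, are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                           type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state TEXT, country TEXT);
-- Central fact table: one foreign key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
""")

conn.execute("INSERT INTO time_dim VALUES (1, 15, 3, 2024)")
conn.execute("INSERT INTO item_dim VALUES (1, 'Kettle', 'Acme', 'kitchen', 'wholesale')")
conn.execute("INSERT INTO branch_dim VALUES (1, 'Central')")
conn.executemany(
    "INSERT INTO location_dim VALUES (?, ?, ?, ?, ?)",
    [
        (1, "12 Main St", "Chicago", "Illinois", "USA"),
        (2, "9 Lake Rd", "Springfield", "Illinois", "USA"),
        (3, "4 King St", "Toronto", "Ontario", "Canada"),
    ],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 500.0, 2), (1, 1, 1, 2, 250.0, 1), (1, 1, 1, 3, 300.0, 1)],
)

# Redundancy: (state, country) pairs stored in more than one location row.
repeated = conn.execute("""
    SELECT state, country, COUNT(*) FROM location_dim
    GROUP BY state, country HAVING COUNT(*) > 1
""").fetchall()

# A typical star query needs only one join per dimension it touches.
by_country = conn.execute("""
    SELECT l.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY l.country ORDER BY l.country
""").fetchall()

print(repeated)
print(by_country)
```

With these sample rows, the first query reports that the Illinois/USA pair is stored twice, which is exactly the kind of redundancy described above, while the sales total per country is obtained with a single join.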
Definition of Snowflake Schema
A snowflake schema is a variant of the star schema in which the dimension tables are organized hierarchically. In this schema, a fact table is connected to various dimension tables, which are in turn connected to sub-dimension tables, through primary and foreign keys. It is called a snowflake schema because its structure resembles a snowflake.
It uses normalization, which splits the data into additional tables. The splitting reduces redundancy and avoids wasting storage space. A snowflake schema is easier to maintain but more complex to design and understand. It can also reduce the efficiency of browsing, since more joins are required to execute a query.
Example:
For the snowflake schema, we use the same example as for the star schema. The sales fact table is identical to that of the star schema; the main difference lies in the definition of the dimension tables.
The single dimension table for the item in the star schema is normalized in the snowflake schema, resulting in separate item and supplier tables. For instance, the item dimension table consists of the attributes item_key, brand, item_name, type, and supplier_key, where supplier_key links to the supplier dimension table, which holds supplier_key and supplier_type.
Similarly, the location dimension table holds the attributes location_key, street, and city_key, and city_key links to the city dimension table containing the city, state, and country attributes. Here, the state attribute can also be normalized further.
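The snowflake variant of the example can be sketched in the same way with Python's standard sqlite3 module. The item, supplier, location, and city attributes follow the description above. The time and branch dimension tables are left out for brevity (only their keys appear in the fact table), and the spelling dollars_sold and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension tables produced by normalizing item and location.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE city_dim     (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);

CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    brand        TEXT,
    type         TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city_dim(city_key)
);
-- Same fact table as in the star schema (time and branch tables omitted here).
CREATE TABLE sales_fact (
    time_key     INTEGER,
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER,
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
""")

# Hypothetical sample rows: two streets share one city row.
conn.execute("INSERT INTO supplier_dim VALUES (1, 'wholesale')")
conn.execute("INSERT INTO item_dim VALUES (1, 'Kettle', 'Acme', 'kitchen', 1)")
conn.executemany(
    "INSERT INTO city_dim VALUES (?, ?, ?, ?)",
    [(1, "Chicago", "Illinois", "USA"), (2, "Toronto", "Ontario", "Canada")],
)
conn.executemany(
    "INSERT INTO location_dim VALUES (?, ?, ?)",
    [(1, "12 Main St", 1), (2, "9 Lake Rd", 1), (3, "4 King St", 2)],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 500.0, 2), (1, 1, 1, 2, 250.0, 1), (1, 1, 1, 3, 300.0, 1)],
)

# Reaching the country attribute now takes two joins instead of one.
by_country = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON f.location_key = l.location_key
    JOIN city_dim     AS c ON l.city_key = c.city_key
    GROUP BY c.country ORDER BY c.country
""").fetchall()

city_count = conn.execute("SELECT COUNT(*) FROM city_dim").fetchone()[0]
location_count = conn.execute("SELECT COUNT(*) FROM location_dim").fetchone()[0]

print(by_country)
print(city_count, location_count)
```

Each city, state, and country combination is stored once in city_dim no matter how many locations refer to it, at the cost of an extra join whenever a query needs those attributes.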
Key Differences Between Star and Snowflake Schema
- A star schema contains just one dimension table per dimension, while a snowflake schema may have a dimension table plus sub-dimension tables for a single dimension.
- A snowflake schema normalizes its dimension tables, which eliminates data redundancy. A star schema does not normalize them, which results in data redundancy.
- A star schema is simple, easy to understand, and involves less intricate queries. A snowflake schema, on the contrary, is harder to understand and involves more complex queries.
- The data model approach used in a star schema is top-down, whereas a snowflake schema uses a bottom-up approach.
- A star schema uses fewer joins. A snowflake schema, on the other hand, uses a larger number of joins.
- A star schema consumes more space than a snowflake schema.
- Executing a query against a star schema takes less time. Conversely, a snowflake schema takes more time because of the additional joins.
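The space difference listed above can be illustrated with a small, hedged sketch in plain Python. It normalizes a denormalized location dimension into a location table and a city table and counts the stored values in each design. The helper names snowflake and cell_count and the sample rows are hypothetical, and counting cells is only a crude stand-in for real storage size.

```python
# Denormalized (star-style) rows: (location_key, street, city, state, country).
# All rows are made-up sample data.
star_rows = [
    (1, "12 Main St", "Chicago", "Illinois", "USA"),
    (2, "9 Lake Rd", "Chicago", "Illinois", "USA"),
    (3, "77 Oak Ave", "Chicago", "Illinois", "USA"),
    (4, "4 King St", "Toronto", "Ontario", "Canada"),
    (5, "31 Bay St", "Toronto", "Ontario", "Canada"),
    (6, "8 Queen St", "Toronto", "Ontario", "Canada"),
]


def snowflake(rows):
    """Split location rows into (location_rows, city_rows) linked by city_key."""
    city_keys = {}  # (city, state, country) -> generated city_key
    location_rows = []
    for location_key, street, city, state, country in rows:
        # Reuse the key if this city was seen before, else assign the next one.
        city_key = city_keys.setdefault((city, state, country), len(city_keys) + 1)
        location_rows.append((location_key, street, city_key))
    city_rows = [(key, *place) for place, key in city_keys.items()]
    return location_rows, city_rows


def cell_count(*tables):
    """Total number of stored values across the given tables."""
    return sum(len(row) for table in tables for row in table)


location_rows, city_rows = snowflake(star_rows)
star_cells = cell_count(star_rows)
snow_cells = cell_count(location_rows, city_rows)

print(star_cells, snow_cells)
```

For these six rows the denormalized table stores 30 values and the normalized pair stores 26. The saving grows as more locations share a city, and it can disappear entirely when almost every location is in a different city, so the space advantage depends on how repetitive the dimension data is.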
Conclusion
Star and snowflake schemas are both used for designing data warehouses, and each has its merits and demerits. A snowflake schema is easier to maintain and reduces redundancy, so it consumes less space, but it is complex to design. A star schema is simple to understand and design and needs fewer joins and simpler queries, but it suffers from data redundancy, which makes integrity harder to maintain.
Although the snowflake schema minimizes redundancy, it is not as popular as the star schema, which is used most of the time in data warehouse design.