- Conformed Dim.: are the dimension which is shared between multiple facts
- Parent – Child Dim : One dimension is derived from other
- Role playing Dim. : Dimension are used for multiple application in same database
- SCD : Dimension in which value change slowly in time
- Time Dim : Dim which uses time as an changing factor
- Rapid changing Dim: Which changes rapidly continuously
- Junk Dim. : is group of flags. The attributes in junk dimension do not belong to the fact table.
- Degenerate Dim: is a dimension which is derived from the fact table and doesn't have its own dimension table. (it do not have its own fact)
- Small Dim.: Few no of records
- Big Dim.: Duplicate records
- Empty Dim.: It derived from a fact table and it does not have any dimension.
- Mini Dimension: subset of large dimension
SSIS, SSAS, SSRS, T-SQL, Optimization, DTS, ETL strategy ,ETL packages, Troubleshooting, Data Modeling, Dimensional Modeling, Data Warehouse Design, Cube Design, OLTP, OLAP, XML
Friday, July 12, 2013
Types of Dimensions
Initial and Incremental Load using SQL Server
Initial and Incremental load is one of main part of development in DW. Initial load is also known as first load. In initial load We had only insertion using left join.
In Dimension modeling we had star and snowflake schema. In snowflake schema dimensions are more normalized. In Initial load we first load data into child dimension than into parent dimensions. After inserting record into all dimension final insertion is done on fact tables.
In incremental load, extraction of data plays major role.
To find new and updated data there are many options are available.
1. Checksum
2. Hashbytes
3. Auditing architecture
4. CDC
5. Timestamp
Checksum can be performed at both level table as well as column level.
Hashbytes are performed at column level. Hashbytes are best option for text columns instead of using checksum.
Auditing architecture is handled by triggers on INSERT, UPDATE and DELETE options.
CDC is one of best feature in SQL Server 2008. You can enable first on database level. Than can enable on column level or table level.
Timestamp is used for both INSERT and UPDATE operations.
In transformation phase, transformations are done according to destination and business requirements.
In load phase, data are loaded in DW or Datamart using insertion and updation. Updations are done based on dimension type. If dimension SCD than Type 0,1 and 2.
In the case of flat files BULK INSERT is best option to load data into staging tables.
In Dimension modeling we had star and snowflake schema. In snowflake schema dimensions are more normalized. In Initial load we first load data into child dimension than into parent dimensions. After inserting record into all dimension final insertion is done on fact tables.
In incremental load, extraction of data plays major role.
To find new and updated data there are many options are available.
1. Checksum
2. Hashbytes
3. Auditing architecture
4. CDC
5. Timestamp
Checksum can be performed at both level table as well as column level.
Hashbytes are performed at column level. Hashbytes are best option for text columns instead of using checksum.
Auditing architecture is handled by triggers on INSERT, UPDATE and DELETE options.
CDC is one of best feature in SQL Server 2008. You can enable first on database level. Than can enable on column level or table level.
Timestamp is used for both INSERT and UPDATE operations.
In transformation phase, transformations are done according to destination and business requirements.
In load phase, data are loaded in DW or Datamart using insertion and updation. Updations are done based on dimension type. If dimension SCD than Type 0,1 and 2.
In the case of flat files BULK INSERT is best option to load data into staging tables.
Subscribe to:
Posts (Atom)