Initial and Incremental load is one of main part of development in DW. Initial load is also known as first load. In initial load We had only insertion using left join.
In Dimension modeling we had star and snowflake schema. In snowflake schema dimensions are more normalized. In Initial load we first load data into child dimension than into parent dimensions. After inserting record into all dimension final insertion is done on fact tables.
In incremental load, extraction of data plays major role.
To find new and updated data there are many options are available.
1. Checksum
2. Hashbytes
3. Auditing architecture
4. CDC
5. Timestamp
Checksum can be performed at both level table as well as column level.
Hashbytes are performed at column level. Hashbytes are best option for text columns instead of using checksum.
Auditing architecture is handled by triggers on INSERT, UPDATE and DELETE options.
CDC is one of best feature in SQL Server 2008. You can enable first on database level. Than can enable on column level or table level.
Timestamp is used for both INSERT and UPDATE operations.
In transformation phase, transformations are done according to destination and business requirements.
In load phase, data are loaded in DW or Datamart using insertion and updation. Updations are done based on dimension type. If dimension SCD than Type 0,1 and 2.
In the case of flat files BULK INSERT is best option to load data into staging tables.
In Dimension modeling we had star and snowflake schema. In snowflake schema dimensions are more normalized. In Initial load we first load data into child dimension than into parent dimensions. After inserting record into all dimension final insertion is done on fact tables.
In incremental load, extraction of data plays major role.
To find new and updated data there are many options are available.
1. Checksum
2. Hashbytes
3. Auditing architecture
4. CDC
5. Timestamp
Checksum can be performed at both level table as well as column level.
Hashbytes are performed at column level. Hashbytes are best option for text columns instead of using checksum.
Auditing architecture is handled by triggers on INSERT, UPDATE and DELETE options.
CDC is one of best feature in SQL Server 2008. You can enable first on database level. Than can enable on column level or table level.
Timestamp is used for both INSERT and UPDATE operations.
In transformation phase, transformations are done according to destination and business requirements.
In load phase, data are loaded in DW or Datamart using insertion and updation. Updations are done based on dimension type. If dimension SCD than Type 0,1 and 2.
In the case of flat files BULK INSERT is best option to load data into staging tables.
No comments:
Post a Comment