home -> academy -> Case-study projects -> TatyaBank_cardPayments_csProject2
On this page > End to end DW... <> Practices <> Project tools...

End-to-end DW construction plan by stages

0. Source data resides on the source location (as DB or files).

1. Landing stage
Populating the LND DB schema. The data is got from the source location as it is.
No type-conversion is applied, no transformation logic /rules are applied. Bulk-load utility usage is recommended.

2. Staging stage
Populating the STG DB schema. All type-conversion happens here, plus some transformation logic.
Surrogate Key-cutting (from Business Key) can happen here to some extend.

3. Integration stage
Populating the INTG DB schema (DW itself). It is normalized (3NF).
The rest Surrogate Key-cutting and transformation logic are applied here.

4. Presentation stage
Populating the DataMart(s) DB schema. It is denormalized, star or snowflake schema.
DataMarts are designed and produced for the needs of specific department, for example Sales dept, Risk dept, etc.

5. BI reports ran by the client /business users.

Landing and Staging could merge in one stage.

Practices

What adds complexity and some practices:
  • information security standards - encryption, hashing, masking; on-the-fly load...
  • some source data can be tricky to read (cobol files, for example);
  • do implement parallel execution /processing wherever possible;
  • the data dependencies force the ETL elements execution to happen in strict sequence;
  • Staging and Integration stages should apply SCD2 and SCD1 logic for the dimensions (dim tables)
  • Presentation stage could use Conformed Dimensions
  • ...

The job /project includes:
  • 1. DB /DW modelling.
  • 2. ETL-tool-specific transformations usage, SQL queries optimization, pipes' performance optimization;
  • 3. RDBMS-specific -> DB custom DB-env functions (in packages), i.e. stored procedures, triggers, etc in PL/SQL (oracle), T-SQL (ms SQL srv) or other SQL-procedural-language supported by the DBMS engine used.
  • 2. Batch scheduling (orchestration), could be ETL-tool-specific (point 2).
  • 3.&4. Security-related -> encryption (RSA keys).
  • 4. BI-tool-specific -> reports development.
  • 5. Some OS-environment-specific (applicable, acceptable) programming/source-code could be necessary for QA, orchestration purposes, security, DBA tasks (grants, deployments...) automation (3.&5.), etc.
    I.e. source-code in bash shell (linux), power shell (windows) or even Java, .Net and any other preffered.

Q: What is necessary in order the system to be able to efficiently handle all oncoming high /big data volumes, new CR requirements...("CR" is defined in home>academy>Glossary) all clients' expectations and requirements?

A: Defined and working mechanisms /processes are necessary, for fine-tuning of all server environments (dev qa preProd prod), the whole infrastructure to be secure, easy manageable, optimized for good performance, monitored for software updates and hardware upgrades...

Project tools, for example

  • 1. DB /DW modelling tool -> ER-studio.
  • 2. ETL tool -> Informatica.
  • 3. RDBMS -> Oracle, incl. SQLLDR utility.
  • 4. BI tool -> Microstrategy.
  • 5. OS-environment, server operating system -> Red-hat Enterprise linux on all environments (dev qa preProd prod). Hardware configuration - HDD RAID, disaster recovery plan...
  • Other software - SVN version mngmt, Jira collaboration and bug-tracking, Sharepoint...
  • !!! Key company /corporate /client - partners with proven excellence relevant to SECURITY-AUDIT authorities. Trustwave Inc. , Alert-Logic...hosting from MochaHost, Rackspace...