Describe the processes associated with data extraction, cleansing, and transformation tools.
What will be an ideal response?
The extraction step targets one or more data sources for the enterprise data warehouse (EDW).
These sources typically include OLTP databases but can also include personal databases and
spreadsheets, enterprise resource planning (ERP) files, and web usage log files. The data
sources are normally internal but can also be external, such as the systems used by suppliers
and/or customers.
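As a rough illustration of extraction from more than one kind of source, the sketch below reads
rows from an in-memory SQLite database standing in for an OLTP system and from CSV text standing
in for a spreadsheet export. The table name `orders`, the column names, and the sample values are
assumptions made for the example, not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical spreadsheet export used only for this example.
DEMO_CSV = "order_id,customer,amount\n3,Initech,40.25\n"


def build_demo_oltp():
    """Create an in-memory database that stands in for an OLTP source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "Acme", 120.0), (2, "Globex", 75.5)],
    )
    conn.commit()
    return conn


def extract_from_oltp(conn):
    """Read order rows from the (assumed) `orders` table."""
    cur = conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    )
    return [
        {"source": "oltp", "order_id": r[0], "customer": r[1], "amount": r[2]}
        for r in cur.fetchall()
    ]


def extract_from_csv(text):
    """Read order rows from CSV text, converting fields to typed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "source": "spreadsheet",
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": float(row["amount"]),
        }
        for row in reader
    ]


def extract_all():
    """Combine rows from both sources, tagging each row with its origin."""
    conn = build_demo_oltp()
    try:
        return extract_from_oltp(conn) + extract_from_csv(DEMO_CSV)
    finally:
        conn.close()
```

Tagging each row with its source is one simple way to keep lineage available to the later
transformation and loading steps.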
The transformation step applies a series of rules or functions to the extracted data. These
rules determine how the data will be used for analysis and can involve transformations such as
data summations, data encoding, data merging, data splitting, data calculations, and creation of
surrogate keys. The output from the transformations is data that is clean and consistent with
the data already held in the warehouse and is in a form that is ready for analysis by users of
the warehouse.
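Several of the transformations named above can be sketched in a few lines. The example below is
only illustrative: the field names (`full_name`, `gender`, `region`, `quantity`, `unit_price`),
the code table, and the choice of a simple counter for surrogate keys are all assumptions.

```python
from collections import defaultdict
from itertools import count

# Hypothetical encoding rule: map free-text values to a single-letter code.
GENDER_CODES = {"male": "M", "m": "M", "female": "F", "f": "F"}


def transform(rows):
    """Apply splitting, encoding, calculation, and surrogate-key rules."""
    next_key = count(1)  # surrogate keys: 1, 2, 3, ...
    out = []
    for row in rows:
        # Data splitting: one name field becomes first and last name.
        first, _, last = row["full_name"].strip().partition(" ")
        out.append(
            {
                "row_key": next(next_key),  # surrogate key
                "first_name": first,
                "last_name": last,
                # Data encoding: unknown values fall back to "U".
                "gender": GENDER_CODES.get(row["gender"].strip().lower(), "U"),
                "region": row["region"],
                # Data calculation: derive a line total.
                "total": round(row["quantity"] * row["unit_price"], 2),
            }
        )
    return out


def summarize_by_region(rows):
    """Data summation: total sales per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["total"]
    return dict(totals)
```

A real ETL tool would usually express such rules declaratively; plain functions are used here
only to make each rule visible.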
The loading of the data into the warehouse can occur after all transformations have taken place
or as part of the transformation processing. As the data loads into the warehouse, additional
constraints defined in the database schema, as well as in triggers activated upon data loading,
are applied (such as uniqueness, referential integrity, and mandatory fields). These constraints
also contribute to the overall data quality achieved by the ETL process.
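The way schema constraints act as a final quality check during loading can be sketched with
SQLite. The tables `dim_customer` and `fact_sales` and their columns are hypothetical; the
example shows uniqueness (primary key), referential integrity (foreign key), and mandatory
fields (NOT NULL) rejecting bad rows, and it does not cover triggers.

```python
import sqlite3


def create_warehouse():
    """Create a tiny in-memory warehouse with schema-level constraints."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE dim_customer ("
        " customer_key INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE fact_sales ("
        " sale_key INTEGER PRIMARY KEY,"
        " customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),"
        " amount REAL NOT NULL)"
    )
    conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
    conn.commit()
    return conn


def load_sales(conn, rows):
    """Insert rows one at a time, collecting those the constraints reject."""
    loaded, rejected = 0, []
    for row in rows:
        try:
            # The connection context manager commits on success and
            # rolls back on error, so a bad row leaves no partial state.
            with conn:
                conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError as exc:
            rejected.append((row, str(exc)))
    return loaded, rejected
```

Collecting rejected rows together with the error message, rather than stopping at the first
failure, is one possible design; it lets a load finish while leaving a record of what needs
correcting upstream.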