Data warehouse architecture is about organizing the building blocks or the components in such a way that they extract more benefit for an enterprise. It’s all up to the requirement of the enterprise whether it wants to stress on a specific component or boost any other component with tools and services.
In this context, we are going to discuss the architecture of the data warehouse. We will also study the building blocks or the component required to build a data warehouse for an enterprise.
Content: Data Warehouse Architecture and its Components
- Difference Between Data Warehouse and Data Mart
- Architecture of Data Warehouse
- Components of Data Warehouse
- Key Takeaways
Difference Between Data Warehouse and Data Mart
Data warehouse and data marts go hand in hand. Some might have thought that the two terms are synonymous. But data warehouse and data mart both are distinct terms. Now, you may get confused about what to build first, data warehouse or data mart.
To decide what we need to build a data warehouse or data mart you must find answers to the following questions.
- Whether you want to take a bottom-up or a top-down approach?
- Whether you want to arrange the data enterprise-wide or you want it departmental-wide?
- Whether you want to build a pilot or you want to start building a full-fledge data warehouse?
- Whether you want o build a dependent data mart or independent data mart?
If you want to build a full-fledge enterprise-wide structure then follow the top-down approach and build a data warehouse. If you want to build a departmental-wide structure then follow the bottom-up approach and build a data mart.
Still not clear then let us overview a tabular representation of the differences of the data warehouse and data mart.
Data Warehouse | Data Mart |
---|---|
Data warehouse has data of the entire enterprise. | Data Mart has data of a specific department in the enterprise. |
Data in a data warehouse can be fed from several data marts. | Data in a data warehouse can be fed from several data marts. |
In data warehouse data is retrieved from various staging areas. | In data warehouse data is retrieved from various staging areas. |
It is complicated and difficult to build. | It is less complicated and easy to build. |
Now, I hope the picture is a bit clear and you are able to understand what you have to build according to your requirement. As in this section we have to emphasis on the architecture of data warehouse so let’s discuss the architecture of data warehouse.
Architecture of Data Warehouse
A data warehouse can be built by using both the approaches i.e. top-down as well as bottom-up. You can first build a data warehouse that would feed dependent data marts or you can build a data warehouse by combining all the data marts.
Let us discuss the top-down and bottom-up approach to build a data warehouse. We will discuss both the approaches along with their advantages and disadvantages.
1. Top-Down Approach
The top-down approach is building an enterprise-wide data warehouse. This approach retrieves data from several operational systems. Entire data is integrated and then the inconsistencies in the data are removed. Then the data is transformed into an acceptable format and kept in the data warehouse.
This kind of approach builds a data warehouse that is prone to failure and does not yield any information soon.
Advantages
- It represents an enterprise-wide view.
- It does not have data from disparate data marts.
- The data and the control over the data are centralized.
- It may yield quick results if prepared with iterations.
Disadvantages
- It needs a long time to build a data warehouse with this approach even if implemented with the iterative method.
- Data warehouse with this approach is at high risk of failure.
- It requires experienced professionals to implement this approach.
2. Bottom-Up Approach
With this approach, you have first built data marts for each department of your enterprise. You may put a sequence from which department you have to start implementing the data mart. Data from all these data marts are then combined and fed to the data warehouse. As all these data marts are independent, they hardly have a view of the requirements of the entire enterprise.
Advantages
- It is easy to build a data warehouse with this approach as we have to deal with the manageable fragments of data.
- It has less risk of failure.
- It is an incremental approach as the important data marts can be implemented first and later build other data marts to grow data warehouse.
Disadvantages
- As data marts are independent, they have a narrow view of the specific department.
- Data from data marts may have inconsistencies.
- Data from data marts may have redundancy.
Building Blocks or Components of Data Warehouse
Studying the approaches to build the data warehouse now we will look into the building blocks or the components required to build the data warehouse.
1. Source Data Component
The source data component is classified into four types:
a. Production Data
An enterprise may have several operational systems from which the data has been collected. Based on the informational requirement of the enterprise data segments are collected from different operational system.
As data is from the different operational system it may be of different format. The important characteristic of production data is disparity. So, the challenge is to standardize data and transform it into a suitable format. Then integrate these pieces of data and store them in a data warehouse.
b. Internal Data
Every enterprise maintains the internal data structure such as they have customers profile, documents, spreadsheet etc. this data is important for the data warehouse. Storing internal data to the data warehouse is a little bit complicated. So, a certain tactic must be introduced to collect and store internal data to the data warehouse.
c. Archived Data
As we have studied earlier that the data warehouse has historical data; archived data is the source of historical data. The operational system is meant to run the day-to-day business. So, the old data from the operational system is removed and kept in the archived files.
The old data is removed in stages like, in the first stage the latest data is stored in the ‘archival database’. The second stage might be storing the older data to ‘files’ or ‘disks’. In the next stage, the oldest data can be stored to ‘magnetic tapes’.
d. External Data
External data is the statistical data that is produced by the ‘external agencies’ for your company. These kinds of information are received at a regular interval. But there is one big problem this data may be in the different format as of yours.
So, you must set a strategy and convert this outside data to your internal format.
2. Data Staging Component
After the data is collected from the source you have to extract the useful data, transform it into a suitable format and load it carefully into the data warehouse. Data staging performs all these three functions of extracting, transforming and loading data into the data warehouse.
Data extracting stage has to deal with various sources. In the market, the tools are also available for extracting the data but using the outside tools only increases the cost. The extracted data must be placed in separate physical storage as it eases the transferring of data into a data warehouse.
Once the data is extracted it transformed into the suitable format for that it is cleaned standardized and summarized.
The data is now ready to load. The initial loading of data takes a long time. Once the data warehouse is active you have to feed data on an incremental basis.
3. Data Storage Component
A data warehouse is a data repository where the data must be stored in a format which is easy to analyze. The first loading of data to the data warehouse is time-consuming and circumstances of the enterprise decide the revision of data.
4. Information Delivery Component
The data warehouse has to provide information to a wide variety of users who do not have any prior training of retrieving data from the data warehouse.
Information delivery component implements several techniques to provide information to the users. Like user can request for the data online and would get the information online.
User can also use the intranet for getting the information or you may get the reports through emails or users can also use the inter`net to get the information.
5. Metadata Component
Metadata is also a component of the data warehouse. The metadata definition is similar as we have in the database. It has data about the information stored in the data warehouse.
6.Management and Control Component
This component control development of data from staging phase until the data is stored in the data warehouse. It controls the data transfer from component to component and also the transformation of data into a suitable format. On an all it controls the activities and services of the data warehouse.
Key Takeaways
- The data warehouse has data that represents an entire enterprise.
- Data warehouse architecture has two approaches top-down and bottom-up approach.
- The building blocks of a data warehouse are source data component, data staging component, data storage component, information delivery, metadata and management control component.
So, this is all about the architecture of the data warehouse and the components that are used to build the data warehouse. We have also seen how the data warehouse is different from the data mart.
Leave a Reply