WEEK 4 INDIVIDUAL PAPER
OLAP, DATA MARTS AND WAREHOUSES,
THREE-TIER ARCHITECTURE AND ASP
DBM405
OLAP, Data Marts and Warehouses, Three-Tier Architecture and ASP
The term OLAP stands for On-Line Analytical Processing. OLAP is a technology used to process data at a high performance level for analysis, with the results shared in a multidimensional cube of information. The key thing that all OLAP products have in common is multidimensionality, but that is not the only requirement for an OLAP product.
An OLAP application is targeted to deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds. Impatient users often assume that a process has failed if results are not received within 30 seconds, and they are apt to resort to the three-finger salute, Ctrl+Alt+Delete, unless the system warns them that the report will take longer. Even if they have been warned that it will take significantly longer, users are likely to get distracted and lose their chain of thought, so the quality of analysis suffers. This speed is not easy to achieve with large amounts of data, particularly if on-the-fly and ad hoc calculations are required. A wide variety of techniques are used to achieve this goal, including specialized forms of data storage, extensive pre-calculations and specific hardware requirements, but many products are not yet fully optimized, so we expect this to be an area of developing technology. In particular, a full pre-calculation approach such as that used by SAP Business Warehouse fails as the databases simply get too large. Likewise, doing everything on-the-fly is much too slow with large databases, even if the most expensive server is used. Slow query response is consistently the most often-cited technical problem with OLAP products.
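As a rough illustration of the trade-off described above, the following minimal sketch (using pandas, with purely hypothetical table and column names) shows the idea behind pre-calculation: aggregates are computed once and stored, so that later queries become cheap lookups instead of scans of the detail data.

```python
import pandas as pd

# Hypothetical fact table of sales transactions (illustrative data only).
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "month":   ["2024-01", "2024-01", "2024-01", "2024-02", "2024-02"],
    "amount":  [100.0, 250.0, 75.0, 90.0, 300.0],
})

# Pre-calculation: aggregate once and keep the summary for repeated use.
precalc = sales.groupby(["region", "month"], as_index=False)["amount"].sum()

# A later "query" is then a cheap lookup against the summary table,
# rather than an on-the-fly scan of every detail row.
east_jan = precalc.loc[
    (precalc["region"] == "East") & (precalc["month"] == "2024-01"), "amount"
].iloc[0]
print(east_jan)  # 350.0
```

The obvious cost, as the paragraph notes, is that pre-calculating every possible aggregate does not scale once the database grows large.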
OLAP is used mainly for analysis. This means that the system copes with any business logic and statistical analysis that is relevant for the application and the user, and keeps it easy enough for the target user. This analysis is done in the application's own engine or in a linked external product such as a spreadsheet. All the required analysis functionality can be provided in an intuitive manner for the target users. This could include specific features like time series analysis, cost allocations, currency translation, goal seeking, ad hoc multidimensional structural changes, non-procedural modeling, exception alerting, data mining and other application-dependent features.
The OLAP system implements all the security requirements for confidentiality. Not all applications need users to write data back, but for the growing number that do, an OLAP system handles multiple updates in a secure manner.
Multidimensional data is a key requirement. If one had to pick a one-word definition of OLAP, this is it. The OLAP system provides a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies, which is certainly the most logical way to analyze a business or organization.
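A minimal sketch of what a multidimensional conceptual view means in practice is shown below (again using pandas and hypothetical data): the same measure is viewed across two dimensions, one of them hierarchical (country rolling up cities), which is the cube metaphor referred to above.

```python
import pandas as pd

# Hypothetical order data (illustrative only).
orders = pd.DataFrame({
    "country": ["US", "US", "US", "DE", "DE"],
    "city":    ["NYC", "NYC", "LA", "Berlin", "Munich"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [120.0, 80.0, 60.0, 200.0, 50.0],
})

# Rows form the country -> city hierarchy, columns form the product dimension;
# margins=True adds the roll-up totals a user would drill into.
cube = pd.pivot_table(
    orders,
    values="revenue",
    index=["country", "city"],
    columns="product",
    aggfunc="sum",
    margins=True,
)
print(cube)
```

Each cell of the resulting table is one intersection of the dimensions, and the margin rows and columns are the aggregated roll-ups a user drills up and down through.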
Information is gathered based on business needs, wherever it is and however much is relevant for the application. The sheer capacity of various applications, in terms of how much input data they can hold, differs greatly; the largest OLAP applications can hold at least a thousand times as much data as the smallest. Many considerations are made here, including data duplication, memory requirements, disk space utilization, performance, integration with data warehouses and the like.
DATA WAREHOUSE AND DATA MART
Most data in OLAP applications originates in other systems. However, in some applications (such as planning and budgeting), the data might be captured directly by the OLAP application. When the data comes from other applications, it is usually necessary for the active data to be stored in a separate, duplicated, form for the OLAP application. This may be referred to as a data warehouse or, more commonly today, as a data mart.
The most common uses for a data warehouse include performance, multi-data stores, data cleansing, data adjusting, timing, and historical analysis.
Data warehouses are often large, but are nevertheless used for unpredictable interactive analysis. This requires that the data be accessed very rapidly, which usually dictates that it be kept in a separate, optimized structure which can be accessed without damaging the response from the operational systems.
Most data marts merge data from multiple feeder systems, possibly including external sources and even desktop applications. The process of merging these multiple data feeds can be very complex, because the underlying systems probably use different coding systems and may also have different periodicities. For example, in a multinational company like Wyeth, it is rare for departments in different countries to use the same coding system for suppliers and customers, and they may well also use different ERP systems.
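The following minimal sketch (hypothetical feeds, codes and column names) illustrates the kind of merge described above: two feeder systems identify the same customers with different local coding schemes, so a mapping table is needed to translate both to a single global key before the combined rows are loaded into the data mart.

```python
import pandas as pd

# Two hypothetical feeder systems with different column names and customer codes.
us_feed = pd.DataFrame({"cust_code": ["C-001", "C-002"], "sales": [500.0, 120.0]})
de_feed = pd.DataFrame({"kunde_nr": ["K-17", "K-23"], "umsatz": [300.0, 90.0]})

# Mapping of each local code to a single global customer key.
code_map = pd.DataFrame({
    "local_code": ["C-001", "C-002", "K-17", "K-23"],
    "global_id":  ["CUST-1", "CUST-2", "CUST-1", "CUST-3"],
})

# Normalise column names, translate local codes to the global key, then combine.
us = us_feed.rename(columns={"cust_code": "local_code"})
de = de_feed.rename(columns={"kunde_nr": "local_code", "umsatz": "sales"})
combined = (
    pd.concat([us, de], ignore_index=True)
      .merge(code_map, on="local_code")
      .groupby("global_id", as_index=False)["sales"].sum()
)
print(combined)  # CUST-1 combines the US and German rows for the same customer
```

In a real data mart the mapping tables themselves become a maintenance task, since every new supplier or customer code in any feeder system must be added before its data can be merged.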
It is common for transaction systems to be full of erroneous data, which needs to be cleansed before it is ready to be analyzed. Apart from the small percentage of accidentally mis-coded data, there could be examples of optional fields that have not been completed. For example, many companies would like to analyze their business in terms of their customers' vertical markets. This requires that each customer (or even each sale) be assigned an industry code; however, this takes a certain amount of effort on the part of those entering the data, for which they get little return, so they are likely, at the very least, to cut corners. There may even be deliberate distortion of the data if sales people are rewarded more for some sales than others: they will certainly respond to this direct temptation by adjusting (i.e. distorting) the data to their own advantage if they think they can get away with it.
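A minimal sketch of the cleansing step just described might look like the following (hypothetical fields and data): records whose optional industry code was never filled in are flagged so they can be corrected or excluded before any vertical-market analysis.

```python
import pandas as pd

# Hypothetical customer records; the industry code is an optional field.
customers = pd.DataFrame({
    "customer_id":   [1, 2, 3, 4],
    "industry_code": ["RETAIL", None, "", "FINANCE"],
})

# Treat empty strings the same as missing values.
customers["industry_code"] = customers["industry_code"].replace("", pd.NA)

# Flag the records that cannot yet be analyzed by vertical market.
missing = customers[customers["industry_code"].isna()]
print(f"{len(missing)} of {len(customers)} customers lack an industry code")
print(missing)
```

Automated checks like this catch omissions, but as the paragraph notes, deliberately distorted codes look valid and can only be addressed by changing the incentives of the people entering the data.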
Data marts are often used for adjusting data before it can be used for analysis. In order that this can be done without affecting the transaction systems, the data needs to be kept separate. The data stored in these warehouses more often than not comes from multiple sources, and it is very likely that those sources are updated on different cycles. At any one time, therefore, the data may be at different stages of update. For example, the month-end updates may be complete for some sources but not yet for others.