This article was originally written for Conspectus Magazine in December 2006 and was updated in November 2009 by the original author.
Despite all the hype from vendors, the basics of data warehousing have remained fundamentally unchanged: extract data from multiple source systems, reformat the information into an easy-to-query structure, load it into a dedicated database and add an effective user interface to allow users to query the information. The cost of this environment is substantial and relates directly to the complexity of the Extract, Transform and Load (ETL) process and the volume of data held in the system.
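The extract/transform/load pipeline described above can be sketched in a few lines of Python. This is a minimal illustration only: the source systems, field names and target structure here are hypothetical, and a real ETL tool would add staging, error handling and incremental loads.

```python
def extract(sources):
    # Pull raw rows from each (hypothetical) source system
    for system, rows in sources.items():
        for row in rows:
            yield system, row

def transform(records):
    # Reformat into a single easy-to-query structure:
    # normalise the differing field names and tag each row with its origin
    for system, row in records:
        yield {"source": system,
               "customer": row.get("cust") or row.get("customer_name"),
               "amount": float(row.get("amt") or row.get("amount"))}

def load(rows, warehouse):
    # Append the conformed rows to the dedicated reporting store
    warehouse.extend(rows)

# Two hypothetical source systems with inconsistent schemas
sources = {
    "crm":   [{"cust": "Acme", "amt": "100.0"}],
    "sales": [{"customer_name": "Acme", "amount": "250.5"}],
}
warehouse = []
load(transform(extract(sources)), warehouse)
print(len(warehouse))  # 2 conformed rows
```

Note that each source system contributes its own schema to the transform step; this is exactly the coupling that makes source-system changes expensive over the lifetime of the warehouse.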
The complexity of the ETL process has two cost impacts. The first is the cost of the initial design and development, which is reasonably well understood. The second is the cost of changes over the lifetime of the system: for example, if an organisation has four source systems and each system undergoes a change once a quarter, then the data warehouse support team has to modify and test an interface roughly every three weeks, and all this without any change in the users' requirements. The volume of data also hits the bottom line, not only in the cost of storage but in the size and (more expensive) skills of the team required to support it, especially as data explosion forces the business into the very large database arena, where load time and user query performance are critical.
Against this background it is unsurprising that vendors are looking to compete by reducing storage, improving query times and simplifying administration. Oracle have taken steps to enhance their core database engine with features such as Exadata that improve each of these areas and continue to develop their strategy; however, more and more is built into the core of their flagship general-purpose engine, resulting in software with many features not needed by any specific application. Sybase have taken the more radical step of creating an entirely new database engine, Sybase IQ, that does away with some of the limitations of a general-purpose engine to produce a solution that is both much faster in load and user query performance and far more efficient in its disk usage than other general-purpose databases. The other traditional database vendors have all upgraded their product suites to chase the Business Intelligence market.
Into this market enter the data warehousing appliance vendors, a breed of dedicated hardware and software solutions designed to solve a business's data warehousing woes. Such systems use low-cost commodity components in large volumes with dedicated business intelligence engines to deliver radically faster load times whilst at the same time reducing query times and simplifying systems administration.
The first hurdle for many organisations is that data warehousing appliances are to some extent proprietary and therefore go against a corporate policy of open systems to allow technology re-use; however, a solution built on one of the current market-leading platforms, Teradata, is no less so. In fact Teradata can be considered one of the original data warehouse appliances, and it is the use of low-cost commodity components and the ability to achieve massive parallelism that differentiates the newcomers.
The second hurdle is credibility: the promise of such large benefits (typically query performance ten to fifty times faster whilst using three to six times less storage, on a platform that requires only a small amount of systems administration support) will be doubted, especially by systems and database administrators who have had to work so hard to maintain the performance of the existing solution. Vendors such as Netezza have overcome this challenge with some key accounts by providing a system on the basis that it will be purchased only if it meets agreed performance criteria, thus significantly reducing the risk to the purchasing company.
The final obstacle is migration and its associated cost: an existing solution that is built, for example, on an Oracle database using Oracle Warehouse Builder and Oracle Discoverer is effectively proprietary and therefore more difficult, but not impossible, to migrate. This is also a reason to review the existing data warehousing architecture now, to ensure that when these and other new technologies come along the business will be able to take advantage of them. Companies will gain a clear competitive advantage if, by architecting their Business Intelligence systems into functional components, they can quickly adopt bigger, faster, cheaper technologies.
Those organisations that have overcome these hurdles report immediate, substantial performance gains for their queries without the need to tune the database, whilst lowering the disk footprint and reducing support costs. The systems also continue to deliver benefit, as the fast query times allow more complex data models to be queried, which in turn reduces the need for complex ETL to restructure the data. These changes to the data model and reductions in ETL complexity can be made either as part of the migration project (which delivers the largest benefit quickly but at the greatest risk) or as part of the change management process for the source systems (which delivers benefit over a longer time frame but significantly reduces the risk).
Added to this is the emergence of MapReduce. Originally developed by Google, it is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. This is becoming a must-have feature for appliance vendors handling very large data sets.
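The map and reduce functions described above can be illustrated with the classic word-count example. This is a toy single-process sketch of the programming model only, not of any vendor's implementation; a real MapReduce engine distributes the map, shuffle and reduce phases across many machines.

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document text
    # Emit an intermediate (word, 1) pair for each word
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    # key: a word; values: all intermediate counts emitted for that word
    yield key, sum(values)

def map_reduce(inputs, mapper, reducer):
    # Map phase, then "shuffle": group intermediate values by intermediate key
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)
    # Reduce phase: merge all values associated with the same key
    results = {}
    for ikey, ivalues in groups.items():
        for okey, ovalue in reducer(ikey, ivalues):
            results[okey] = ovalue
    return results

docs = [("doc1", "data warehouse data"), ("doc2", "warehouse appliance")]
print(map_reduce(docs, map_fn, reduce_fn))
# {'data': 2, 'warehouse': 2, 'appliance': 1}
```

The appeal for very large data sets is that both phases parallelise naturally: each document can be mapped on a different node, and each intermediate key can be reduced on a different node.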
There is now a significant number of vendors working to produce some form of data warehousing appliance (Netezza, GreenPlum and AsterData, to name but a few), and it is clear that appliances are going to form a key part of data warehouse architectures going forward, the risks of using a smaller vendor and a proprietary solution being outweighed by the business benefit of much more timely information at a significantly reduced cost. Note also that there will be market consolidation and some vendors will disappear.
For further information, Curt Monash is just one of a number of analysts who follow this subject and provide regular updates on the market.
Read the original version of the article
This article was originally published on BIonRails, another Data Management & Warehousing website