The top three mistakes I have seen in data warehousing projects

A guest blog by Don DeLoach, CEO of Infobright

Having been a supplier of technology for many years, I have talked to a lot of prospects and customers and seen a lot of projects. Most projects are reasonably good, some are great, and some are disasters. The disasters are probably more prominent than they should be, but over time I have noticed a number of common themes among them. In particular, there are three mistakes I see again and again:

Mistake One: the Big Bang Theory

This is the single most common mistake. The project plan envisions the “perfect deliverable” on day one of production. The team is not content with delivering something that is usable but far short of optimum. It needs to be the ultimate data warehouse, capable of mesmerizing its users from the outset. What generally happens is that the spec gets over-engineered, the timetable is extended over and over as target dates are missed, and the budget is reworked and increased accordingly. It takes money to have people work for extra months at a time. But the real kicker is that when the day of reckoning finally arrives, when the non-believers are to be put in their place by what will surely be an out-of-body data warehousing experience, the unveiling falls flat. I mean, really flat.

Why do you think this is? You would think that with so much more time and extra money, the end result should be fantastic, right? It should, but that is seldom the case, for two reasons. First, with a big bang approach, the target has usually moved by the time the result is delivered. In other words, nirvana as defined at the project start eighteen months earlier no longer looks like nirvana now. Second, the knowledge of the cost and time overruns has the net effect of raising expectations. Basically, people say “this thing must be really wonderful if it costs this much and takes this long.” Whoops.

Mistake Two: Great map, but no destination

In general, the plans that go into data warehousing projects are well thought out… to a point. Well thought out like political positions: strong on the ideas we all want to aspire to, yet a bit short on specifics. Such as “it will deliver insightful information about our operations,” or “it will allow us to probe the depths of our data to determine meaningful trends to guide our business.” All good. Yep. Until you find out that the report you tailored the warehouse for runs in 10 seconds, but the query you ran to find the main referring sites for last week’s promotion, the one with a conversion rate over 12%, took 2 1/2 hours. The best projects start small and address very real, very specific requirements. These are not lofty and nebulous designs, but specific definitions of what needs to be done.

Mistake Three: Forgetting that it’s a swimming pool

Have you ever owned a pool, or do you know anyone who has? They are really great fun. More so in the summer, actually, but really fun. But here’s the thing. The cost up front is just the beginning. Opening up at the start of the season…bucks. Weekly chemical treatments…more bucks. Skimming? Cleaning? Mechanical systems? Bucks, bucks, bucks. End of summer, time to close the pool…more bucks still. And so it goes with data warehouses. It’s not just the cost of the hardware and the software. It’s not even just the additional cost of getting it set up in the first place. Depending on what you are doing, what tools you are using, and what problems you are trying to solve, you will likely spend the majority of your money on people to keep it going. That’s right, the pool guy meets the data warehouse guy. Only instead of skimming and adding chemicals, he (or she) is indexing, partitioning, and possibly adding disk subsystems or more cores as well. In my experience this ongoing cost is often given little consideration, and even more often underestimated. And just as the pool suppliers don’t emphasize this ongoing cost when they sell you the pool, neither do data warehousing vendors.

Have you ever heard of salt water pools? While they are probably not for everyone, they are indeed growing in popularity. Do you know why? You guessed it: they can be maintained for a fraction of the effort and cost that traditional pools require. A fraction. As technology evolves, data warehousing has its own equivalents to salt water pools, requiring fewer ongoing resources yet perfectly, if not uniquely, suited to certain situations.

All in all, technology has evolved to an exciting place as we come to grips with the emerging world of Big Data. The norms of yesterday are not the de facto norms of today, and the big mistakes most commonly made in the past do not need to be repeated.

Don DeLoach is President & CEO of Infobright Inc., a database vendor with technology that combines a column-oriented database and a ‘knowledge grid’ architecture. The Infobright database software is integrated with MySQL but has its own proprietary data storage and query optimization layers. Infobright focuses on the machine-generated data market.
