By John Ryan
In order to compete, companies are developing the ability to better analyze their ever-changing customers, competitors and market. The rise of mobile devices, location tracking and social networking gives companies the opportunity to understand where their customers are, what they are doing, and what they are feeling at any given moment. As a result, the amount of data we collect is growing exponentially, which in turn degrades query performance. Dig deeper into this issue and you discover that the need for speed is the biggest challenge Business Intelligence (BI) users face today.
Why Fast Query Performance Matters
- The Data Warehousing Institute (TDWI) Best Practices Reports list Poor Query Response (45%) as the top problem that will eventually drive users to replace their current data warehouse platform.
- According to every edition of the annual The BI Survey, poor query performance is by far the most frequently reported product-related problem.
- According to the 2010 Gartner Magic Quadrant, clients increasingly report performance-constrained data warehouses during inquiries. Based on these inquiries, Gartner estimated that nearly 70% of data warehouses experience performance-constrained issues of various types.
Google has taught us to expect answers instantly, and business decision makers will not wait minutes, let alone hours, for a BI tool to answer their query. Instead, BI users and data analysts increasingly rely on busy IT departments or Data Warehousing teams to find and aggregate data for them. This severely limits the adoption of BI tools within the organization. It restricts what data end users can explore, and it prevents deeper insight because the data is often summarized. It also means that asking a new question might take weeks or even months while new dimensions of data are added to the models. In fact, according to the 2010 TDWI BI Benchmark Report, on average it takes 7.4 weeks just to add a new data source, and 5 weeks to change a hierarchy (a new way of classifying products or organizing sales regions).
The Cost of Poor Query Performance
Ralph Kimball, the father of Data Warehousing, lists Speed as one of the top Design Constraints and Unavoidable Realities of data warehousing. Other unavoidable realities include implementation costs, daily administrative costs, and hardware costs.
- Cost of new BI implementations – According to the 2010 TDWI BI Benchmark Report, it takes on average 6.6 weeks to create a complex report or dashboard. Not only do users find it takes too long for a BI tool to answer their query, but the creation of new BI projects is often lengthy and expensive.
- Cost of daily administration – When query performance is poor, most companies rely on database tuning techniques to aggregate results. The Forrester Research Report 2010 explains that 70% of survey respondents say their requirements change on a monthly, daily or even hourly basis. 51% of respondents said that BI requests tended to accumulate in a backlog, while 66% said that BI requests accumulated precisely because their IT organization already had too much on its plate. The 2010 TDWI BI Benchmark Report, meanwhile, notes that 25% of the average BI/DW team is allocated to maintenance/change management. With the cost of daily administration already high and backlogs long, many organizations are given no choice but to throw even more resources at the problem.
- Cost of new hardware – When BI/DW teams are pushed to their performance limit, their first option is usually to ask for more hardware. For some, that means upgrading to the latest, more powerful model. For others, it might mean adding hardware via clustering or Massively Parallel Processing (MPP) solutions. Either way, throwing more hardware at the problem becomes a seemingly never-ending cycle, and one that, according to IDC, will only get worse. IDC predicts that data will grow 44x by 2020, a rate significantly greater than Moore’s Law.
Delivering a fast, easy-to-use BI environment appears constrained by escalating costs on all sides. But there is hope. The VectorWise team has been working on a revolutionary technology that eliminates the main cause of slow performance.
Understanding Why Query Performance Is So Slow
Most would suspect that query performance is slow because there is too much data being processed. However, the inefficient way we process data is also a major cause. The X100 project highlighted that traditional databases were about 100 times slower than a hand-coded version of the same query. Only by understanding all the data bottlenecks, and removing them, can you achieve the best performance on any hardware.
Column databases remove one bottleneck by reading only the columns a query actually needs. For reporting and analytics, most database vendors now have their own flavor of column storage. However, the VectorWise team discovered a number of additional bottlenecks slowing performance that had yet to be addressed.
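To make the column-store idea concrete, here is a minimal Python sketch. It is an illustration only, not how VectorWise or any particular product stores data: the table, its field names and the "values read" counter are all assumptions made up for this example. It sums one field from a row layout and from a column layout and counts how many values each approach has to touch.

```python
# Hypothetical sales table; field names and values are invented for illustration.
ROWS = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 80.0},
    {"order_id": 3, "region": "EMEA", "amount": 200.0},
]


def to_columns(rows):
    """Pivot row records into one list per column (a toy column store)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(rows, field):
    """Row layout: every field of every record is read to sum one field."""
    total = 0.0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole record comes along for the ride
        total += row[field]
    return total, values_read


def total_from_columns(columns, field):
    """Column layout: only the requested column is read."""
    column = columns[field]
    return sum(column), len(column)


COLUMNS = to_columns(ROWS)
row_total, row_reads = total_from_rows(ROWS, "amount")
col_total, col_reads = total_from_columns(COLUMNS, "amount")

print("row layout:   ", row_total, "from", row_reads, "values read")
print("column layout:", col_total, "from", col_reads, "values read")
```

Both layouts return the same total, but the column layout touches a third of the values here because the table has three fields. The wider the table and the fewer columns a query needs, the larger that gap becomes.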
CPUs are constantly evolving and expanding, with larger caches, more threads and longer pipelines. Computer games have leveraged these new features for years, yet databases have not. Today, many consider in-memory databases fast because RAM is much faster than disk. Yet moving data from RAM to the CPU wastes 300 clock cycles. Data is then decompressed, sent back to RAM for storage, and only then processed in the CPU. VectorWise uses today’s significantly larger CPU caches to optimize data throughput and eliminate wasted data trips and CPU clock cycles. For big data, VectorWise’s In-Chip computing makes In-Memory seem slow.
Another key innovation that eliminates data bottlenecks is Vector Processing, or Single Instruction, Multiple Data (SIMD). Rather than repeating an instruction for each individual piece of data, Vector Processing enables a single instruction to be applied to an entire set of data. Each innovation adds an order of magnitude to the performance of the database. By exploiting the full potential of the hardware, VectorWise enables queries to run faster so you can do more with less.
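The difference between handling one value at a time and handling a whole set at once can be sketched in a few lines of Python. This is a conceptual model only: Python does not issue SIMD instructions itself, and the function names, the vector size of 4 and the price list are assumptions chosen for the example, not VectorWise settings. The sketch counts how many times an operator is invoked under each approach.

```python
VECTOR_SIZE = 4  # illustrative batch size (assumption, not a product setting)


def scale_one(value, factor):
    """Operator applied to a single value."""
    return value * factor


def scale_vector(values, factor):
    """Operator applied to a whole vector of values in one call."""
    return [v * factor for v in values]


def run_value_at_a_time(values, factor):
    """One operator call per value."""
    result = []
    calls = 0
    for v in values:
        result.append(scale_one(v, factor))
        calls += 1
    return result, calls


def run_vector_at_a_time(values, factor, vector_size=VECTOR_SIZE):
    """One operator call per vector of up to vector_size values."""
    result = []
    calls = 0
    for start in range(0, len(values), vector_size):
        result.extend(scale_vector(values[start:start + vector_size], factor))
        calls += 1
    return result, calls


PRICES = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
scalar_result, scalar_calls = run_value_at_a_time(PRICES, 2)
vector_result, vector_calls = run_vector_at_a_time(PRICES, 2)

print("value at a time: ", scalar_calls, "operator calls")
print("vector at a time:", vector_calls, "operator calls")
```

The results are identical, but the vector version makes far fewer operator calls. In a compiled engine, the tight loop inside a vector operator is also the kind of code that can be mapped onto SIMD instructions, so the per-value overhead shrinks further.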
Breaking Benchmarks in Query Performance
In 2011, VectorWise smashed every benchmark record it entered with the independent benchmarking body, the Transaction Processing Performance Council (TPC), at the 100GB, 300GB and 1TB (non-clustered) scales. These records include fastest performance, price/performance and energy consumption. Each record broke the previous benchmark by the largest margins ever recorded.
What makes this achievement even more extraordinary is that VectorWise chose to submit the benchmarks using hardware no better than the previous benchmark leaders, to emphasize this was not hardware driven innovation. Robin Bloor from Bloor Research states, “This puts VectorWise 4 years ahead of the competition in terms of performance – and it will remain 4 years ahead until some competitor finds a way to catch up at a software level. This is unprecedented.”
Note: Both Oracle and Microsoft have posted results for the 1TB benchmark since the VectorWise result. Microsoft used more than twice the cores and twice the RAM on newer hardware, yet achieved only half the performance of VectorWise. Oracle’s solution used hardware that cost 10-20x more than the VectorWise hardware. Source: Transaction Processing Performance Council (TPC).
The Future of BI – A New Paradigm
The aim of VectorWise is to make BI faster and analytics affordable. There no longer needs to be a trade-off between the two. We live in the Information Age, and customers expect greater access to their own data. New business models are emerging to aggregate and share data. And new data sources, such as GPS and digital monitoring, enable companies to learn more about their customers than ever before. VectorWise enables businesses to make more data more accessible to end users. Performance is the key because end users, particularly those on mobile devices, will not adopt BI if it takes minutes for their reports to load. If you remove the database bottleneck, BI becomes more accessible. Business Intelligence is about finding insight to act more efficiently. So why not start by making BI faster and more efficient?