What is the TPC-H benchmark?

The Transaction Processing Performance Council Benchmark H (TPC-H) is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance.

Large volumes of data are examined to give answers to business-critical questions using complex queries and high levels of concurrency.

Why is the TPC-H benchmark important?

TPC-H is often referred to as the ad-hoc Decision Support (DS) benchmark and is an OLAP workload that measures query analytics in a ‘data warehouse’ context.

Decision Support (DS) queries tend to be far more complex and deal with larger volumes of data, so they are far more demanding than transactional workloads such as Online Transaction Processing (OLTP).

Because Decision Support (DS) queries are so complex, it can be extremely challenging for a database designer to plan for them and optimise performance, so queries of this kind can end up running for hours or even days. Brytlyt’s GPU database reduces this to seconds.

How does the TPC-H benchmark work?

The benchmark consists of twenty-two queries, which are run both by a single user and by multiple concurrent users, and are based on a typical retail use-case. All but two of the queries contain joins, and they also include aggregations, complex expressions, nested queries and correlated queries.

Aggregations:

  • Occur in all TPC-H queries and group-by performance is important.

Complex expressions:

  • Raw expressions in aggregations, complex expressions in joins, and string matching.

Nested queries and sub-queries:

  • Used to handle intermediate results in the real world.

JOINs:

  • All but two of the queries contain joins.

Correlated queries:

  • Special case of nested query where the subquery uses values from the outer query.
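To make the last item concrete, here is a minimal sketch of a correlated query, run against an in-memory SQLite database from Python. The table and column names borrow from the TPC-H lineitem table, but the six rows, the 1.5x threshold and the function name are invented purely for illustration and are not part of the benchmark.

```python
import sqlite3


def find_large_lines():
    """Return line items whose quantity exceeds 1.5x the average for their part."""
    conn = sqlite3.connect(":memory:")
    # Toy data (hypothetical): two parts with three line items each.
    conn.executescript("""
        CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL);
        INSERT INTO lineitem VALUES
            (1, 10), (1, 20), (1, 60),
            (2, 5),  (2, 5),  (2, 50);
    """)
    # The inner query is correlated: it refers to outer_li.l_partkey,
    # so it is logically re-evaluated for every row of the outer query.
    rows = conn.execute("""
        SELECT outer_li.l_partkey, outer_li.l_quantity
        FROM lineitem AS outer_li
        WHERE outer_li.l_quantity > (
            SELECT 1.5 * avg(inner_li.l_quantity)
            FROM lineitem AS inner_li
            WHERE inner_li.l_partkey = outer_li.l_partkey
        )
        ORDER BY outer_li.l_partkey
    """).fetchall()
    conn.close()
    return rows


large_lines = find_large_lines()
print(large_lines)
```

Only one line item per part clears its own part's threshold here, which is exactly the behaviour a non-correlated subquery with a single global average could not express.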

The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries.

These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.
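The sketch below shows how these pieces fit together. It follows the formulas as we understand them from the TPC-H specification (power as 3600 x scale factor over the geometric mean of 22 query timings plus 2 refresh-function timings, throughput from the elapsed time of the concurrent streams, and the composite as the geometric mean of the two). The function names are our own, and the specification remains the authoritative source for the exact rules.

```python
import math

QUERIES = 22            # number of TPC-H queries
REFRESH_FUNCTIONS = 2   # refresh functions timed in the power test (per the spec)


def power_at_size(query_secs, refresh_secs, scale_factor):
    """Single-stream power: 3600 * SF / geometric mean of the 24 timings."""
    timings = list(query_secs) + list(refresh_secs)
    if len(timings) != QUERIES + REFRESH_FUNCTIONS:
        raise ValueError("expected 22 query timings and 2 refresh timings")
    # Geometric mean computed in log space to avoid overflow/underflow.
    log_mean = sum(math.log(t) for t in timings) / len(timings)
    return 3600.0 * scale_factor / math.exp(log_mean)


def throughput_at_size(streams, elapsed_secs, scale_factor):
    """Multi-stream throughput: queries completed per hour, scaled by SF."""
    return streams * QUERIES * 3600.0 / elapsed_secs * scale_factor


def qphh_at_size(power, throughput):
    """Composite metric: geometric mean of power and throughput."""
    return math.sqrt(power * throughput)


def price_performance(total_cost, qphh):
    """Price/performance expressed as cost per QphH@Size."""
    return total_cost / qphh
```

As a usage example with made-up numbers (not measured results): if every one of the 24 timings were exactly 1 second at scale factor 1,000, `power_at_size([1.0] * 22, [1.0] * 2, 1000)` gives 3,600,000. If 7 concurrent streams then finished in 154 seconds, `throughput_at_size(7, 154.0, 1000)` also gives 3,600,000, so the composite QphH@Size would be 3,600,000 as well.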

How has Brytlyt improved on a record that has stood for 5 years?

More and more businesses are looking at GPU database companies to speed up their data processing times. Brytlyt stand out from the pack because the founder came from a retail analytics background and developed his GPU database to handle complex retail analytics on Big Data. Brytlyt is the first GPU database company to have the capability to run high-speed joins and other complex queries without sacrificing query speed.

In addition, Brytlyt worked closely with Nvidia, using just one of the latest DGX-2 machines to run the queries. The results were demonstrated last month at Nvidia’s annual GTC technology conference in Silicon Valley.

Brytlyt chose the DGX-2 for its TPC-H queries because this hardware gives a step change in the GPU footprint of a single server. The DGX-2 is made up of 16 Nvidia V100 GPUs, each with 32 GB of VRAM. This gives a total of 512 GB of VRAM and 2 petaflops. Nvidia’s NVSwitch provides 2.4 TB/s of data transfer between GPUs.

Scale factor 1,000 GB (6 billion rows in the lineitem table)
Brytlyt: 2019, DGX-2, Version 3.1 Alpha
Exasol: 2014, twenty machines, TCO $719k
Microsoft: 2017, one machine, TCO $472k

Demonstrating their ability to set new query times for the TPC-H benchmark is a measure of Brytlyt’s maturity in the GPU database space.

Some of the largest retailers in the UK and the US regularly run analytics on a 2-year Electronic Point of Sale (EPOS) dataset using a 10% sample. This sample can be 400 GB, with around 4 billion rows. What makes Nvidia’s latest hardware so relevant is that this sample dataset can fit on a single machine, and a query that might have taken up to an hour in the retail environment can now be executed in seconds.
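The sizing claim can be checked with simple arithmetic using the figures quoted in this article. The sketch below is a back-of-the-envelope calculation only: it assumes the sample is held uncompressed in GPU memory and ignores the working space needed for intermediate results, both of which are our assumptions rather than statements about how Brytlyt stores data.

```python
# Figures quoted in the article
GPUS_PER_DGX2 = 16
VRAM_PER_GPU_GB = 32
SAMPLE_SIZE_GB = 400
SAMPLE_ROWS = 4_000_000_000


def total_vram_gb(gpus=GPUS_PER_DGX2, per_gpu_gb=VRAM_PER_GPU_GB):
    """Aggregate GPU memory across all GPUs in one server."""
    return gpus * per_gpu_gb


def fits_in_vram(dataset_gb, vram_gb):
    """True if the raw dataset is no larger than the aggregate VRAM."""
    return dataset_gb <= vram_gb


vram_gb = total_vram_gb()                          # 16 * 32
sample_fits = fits_in_vram(SAMPLE_SIZE_GB, vram_gb)
headroom_gb = vram_gb - SAMPLE_SIZE_GB             # memory left for everything else
avg_row_bytes = SAMPLE_SIZE_GB * 1e9 / SAMPLE_ROWS # rough average row width

print(vram_gb, sample_fits, headroom_gb, avg_row_bytes)
```

On these numbers the 400 GB sample leaves roughly 112 GB of the 512 GB free, and the average row works out to about 100 bytes.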

Richard Heyns is the CEO and founder of Brytlyt. He had the following comments: “We are very excited with how well Brytlyt’s GPU database performed on Nvidia’s hardware. For quite a while now, I’ve had my eye on the TPC-H benchmark, as success there proves the immense value of our product to many industry verticals including telecommunications, retail and finance. Having software that we have built, breaking records using Nvidia’s machines, is very satisfying. We are incredibly lucky to have some of the world’s finest engineers on the Brytlyt team. We have proved ourselves to be true innovators in the GPU database space.”

Brytlyt has spent the last 5 years developing and improving its software. With Brytlyt’s own web visualisation tool, SpotLyt, data scientists and analysts can get the most out of the Brytlyt GPU database with real-time analysis and interactive exploration. SpotLyt allows users to visually discover correlations and anomalies in billion-row datasets in real time.

What kind of results is Brytlyt getting?

Exasol officially holds the top spot, but not for long. Initial results from Brytlyt show significantly faster run-times for all but 2 queries, and on those 2 queries the results are currently roughly equal.

Query 1, for example, is currently running over 4 times faster than the record holder. Brytlyt still has more optimisation to do before formally publishing results – something that competitors like OmniSci and Kinetica have not attempted to do.

Q5 is a good example of Brytlyt’s JOIN capability. Brytlyt has run query 5 over twice as fast as the current record-holder. The query uses 6 tables and looks like this:

SELECT n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
FROM customer
    JOIN orders ON c_custkey = o_custkey
    JOIN lineitem ON l_orderkey = o_orderkey
    JOIN supplier ON l_suppkey = s_suppkey
    JOIN nation ON s_nationkey = n_nationkey
    JOIN region ON n_regionkey = r_regionkey
WHERE c_nationkey = s_nationkey
    and r_name = '[REGION]'
    and o_orderdate >= date '1995-01-01'
    and o_orderdate < date '1995-01-01' + interval '1' year
GROUP BY n_name
ORDER BY revenue desc;

Note that all the TPC-H queries Brytlyt ran returned sub-second results!
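For readers who want to see what Query 5 computes, the sketch below runs the same six-table join on a tiny hand-made dataset using SQLite from Python. Everything about the data is hypothetical (three nations, four orders, five line items), only the columns the query touches are created, and the date arithmetic is done in Python because SQLite does not support the `interval` syntax. It illustrates the query's logic only and says nothing about Brytlyt's performance.

```python
import sqlite3
from datetime import date

SCHEMA_AND_DATA = """
CREATE TABLE region   (r_regionkey INTEGER, r_name TEXT);
CREATE TABLE nation   (n_nationkey INTEGER, n_name TEXT, n_regionkey INTEGER);
CREATE TABLE customer (c_custkey INTEGER, c_nationkey INTEGER);
CREATE TABLE supplier (s_suppkey INTEGER, s_nationkey INTEGER);
CREATE TABLE orders   (o_orderkey INTEGER, o_custkey INTEGER, o_orderdate TEXT);
CREATE TABLE lineitem (l_orderkey INTEGER, l_suppkey INTEGER,
                       l_extendedprice REAL, l_discount REAL);

INSERT INTO region   VALUES (0, 'EUROPE'), (1, 'ASIA');
INSERT INTO nation   VALUES (0, 'FRANCE', 0), (1, 'GERMANY', 0), (2, 'JAPAN', 1);
INSERT INTO customer VALUES (1, 0), (2, 1), (3, 2);
INSERT INTO supplier VALUES (10, 0), (11, 1), (12, 2);
INSERT INTO orders   VALUES (100, 1, '1995-03-01'), (101, 2, '1995-06-15'),
                            (102, 1, '1996-02-01'), (103, 3, '1995-05-05');
INSERT INTO lineitem VALUES
    (100, 10, 1000.0, 0.25),  -- French customer, French supplier: counted
    (100, 11,  500.0, 0.00),  -- French customer, German supplier: excluded
    (101, 11,  200.0, 0.50),  -- German customer, German supplier: counted
    (102, 10,  900.0, 0.00),  -- order date outside the window: excluded
    (103, 12,  300.0, 0.00);  -- Japanese pair, wrong region: excluded
"""

Q5 = """
SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer
    JOIN orders   ON c_custkey = o_custkey
    JOIN lineitem ON l_orderkey = o_orderkey
    JOIN supplier ON l_suppkey = s_suppkey
    JOIN nation   ON s_nationkey = n_nationkey
    JOIN region   ON n_regionkey = r_regionkey
WHERE c_nationkey = s_nationkey
    AND r_name = ?
    AND o_orderdate >= ?
    AND o_orderdate < ?
GROUP BY n_name
ORDER BY revenue DESC
"""


def local_supplier_revenue(region_name, start_iso):
    """Revenue per nation from orders where customer and supplier share a nation."""
    start = date.fromisoformat(start_iso)
    end = start.replace(year=start.year + 1)  # one-year window, end exclusive
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    # ISO-8601 date strings sort chronologically, so text comparison is safe here.
    rows = conn.execute(
        Q5, (region_name, start.isoformat(), end.isoformat())
    ).fetchall()
    conn.close()
    return rows


europe_1995 = local_supplier_revenue("EUROPE", "1995-01-01")
print(europe_1995)
```

On this toy data only two line items survive all the filters, one for FRANCE and one for GERMANY, which shows how much work the join conditions and the `c_nationkey = s_nationkey` predicate do before any aggregation happens.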

About Brytlyt

Brytlyt was founded in 2013. Its GPU database acceleration technology, with its patent-pending IP, features:

Astonishing Performance: Brytlyt’s GPU-accelerated database technology is transforming the way businesses use data. With Brytlyt, companies can query multi-billion row datasets in milliseconds.

Easy integration with existing systems: There’s no need for businesses to give up their current investments in code, analytics, and visualisation. Instead, they can accelerate them with Brytlyt with little to no effort.

Smooth scalability: Businesses can add and remove GPU resources at will, scaling their processing capability to suit their needs, ensuring they can massively reduce data processing costs.

Functionality-rich and easy to use: Brytlyt is built on PostgreSQL with a full SQL editor. Its deep functionality is complemented by outstanding ease of use.

Brytlyt’s mission is to empower organisations through Speed of Thought Analytics.

  • The world’s fastest database according to independent benchmarking.
  • Four years in research and development.
  • Only vendor to have patent-pending IP for JOINs.
  • Fourth generation GPUManager bridges the gap between SQL and AI.

The true value of Brytlyt lies in how this extreme performance is packaged for the end user.