You shifted to Greenplum database, NOW WHAT?

28th June 2017 by Brytlyt

Raise your hand if you are an enterprise product decision-maker and have never regretted a product purchase. Chances are the vast majority of us would keep our hands down and just look at each other and smile sheepishly, remembering a product we bought to solve one problem, only to find it created many more.

Even the most well-thought-out and carefully planned purchase decisions can go awry at times, and most CIOs and CTOs have faced this situation at some point. Sometimes a lot of money, time and resources are invested in a product that causes more headaches than it cures.

Take, for example, the Greenplum database.

In its day, Greenplum was the way to run SQL queries on Big Data via Massively Parallel Processing (MPP).

To a CIO or CTO at that time, Greenplum looked brilliant. To start with, it was based on an existing open-source relational database (PostgreSQL), with its great community-developed infrastructure and adapters. It also had an MPP back end grafted on for storage and query performance. At its peak it was widely adopted by CIOs and CTOs, and even some very high-profile enterprises like Allstate were using it.

Up to 2015, Greenplum was the leading enterprise MPP-based analytics database for the following reasons:**

  1. It was built using a shared-nothing architecture with collocated storage and compute. It supported parallel loading from diverse structured data sources and Apache Hadoop data lakes and could run massively parallel high-performance ad-hoc queries. This enabled Greenplum to be deployed in a diverse set of data pipeline processing architectures. Also, its hardware capacity could be expanded on an incremental basis with automatic or controlled load redistribution, minimising lifecycle management costs.
  2. At the time, Greenplum was the most flexible and adaptable product out there. It had polymorphic storage that enabled columnar and row-based storage simultaneously, so it could be used for scanning large volumes of data and for small lookups just as easily. This enabled a Greenplum database to scale up to handle large data sets with thousands of columns as well as scale down in terms of cost and latency to handle smaller data sets. Column-level compression, flexible indexing and partitioning gave enterprises full control to trade off performance against cost.
  3. It offered the most advanced analytics of the time. In addition to OLAP queries such as cube and grouping set operations, Greenplum database had the richest support at the time for massively parallel machine learning capabilities invoked from SQL, Python, R, etc.
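
To make the shared-nothing idea in point 1 concrete, here is a toy sketch in Python. It is not Greenplum code and not how Greenplum is implemented: the segment count, the CRC32 hash and every function name are assumptions made up for illustration. It only shows the general pattern of placing each row on one segment by hashing a distribution key, letting every segment aggregate its own rows in parallel, and merging the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SEGMENTS = 4  # hypothetical number of segments, chosen for the example


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Pick a segment by hashing the distribution key (deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments=NUM_SEGMENTS):
    """Place each (customer, amount) row on exactly one segment."""
    segments = [[] for _ in range(num_segments)]
    for customer, amount in rows:
        segments[segment_for(customer, num_segments)].append((customer, amount))
    return segments


def local_sum(segment_rows):
    """Each segment aggregates only the rows it stores."""
    totals = defaultdict(float)
    for customer, amount in segment_rows:
        totals[customer] += amount
    return dict(totals)


def parallel_sum_by_customer(rows, num_segments=NUM_SEGMENTS):
    """Scatter rows, aggregate per segment in parallel, then merge."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    merged = {}
    for partial in partials:
        # A customer lives on exactly one segment, so keys never overlap.
        merged.update(partial)
    return merged


if __name__ == "__main__":
    sales = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 1.0)]
    print(parallel_sum_by_customer(sales))
```

Because all rows for a given key land on the same segment, no segment needs to see another segment's data to compute its share of the answer. That is the property that lets this style of system add capacity by adding segments.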

So what could go wrong?

Many companies adopting Greenplum discovered the product was very difficult to tune. Greenplum tried to use all the available system resources for every single query, which led to instability when executing multiple queries in parallel. Although this was fixed to an extent, there were other problems.*

Greenplum slowed down when disk caching was enabled. Additionally, under high write load, Greenplum caused Linux journaling errors that required rebuilding the entire database, which in itself was a two-day process.

And with Pivotal, Greenplum’s parent company, effectively washing its hands of the product when it took it open-source, many companies have been left stranded. Since open-sourcing the project, Pivotal has not been spending energy on improving the product, nor does it seem likely that the open-source community will ever embrace Greenplum, leaving many companies between a rock and a hard place.

The good news is, now there are some great options that can help companies address the shortcomings of Greenplum without throwing the baby out with the bathwater.

Brytlyt’s GPU database and analytics platform is also based on PostgreSQL, so Brytlyt can help any company using Greenplum today.

What can Brytlyt offer companies using the Greenplum database?

  1. Brytlyt solves the biggest problem facing Greenplum users today: performance, especially when many users query the database at the same time. A single line of code is all that’s needed to connect Brytlyt to Greenplum, and with Brytlyt’s basis in PostgreSQL, any code originally developed for Greenplum will run out of the box on Brytlyt. Plus, Brytlyt has been designed to run large data processing workloads on Graphics Processing Units (GPUs), which means queries can run up to 100x faster. So not only can businesses continue to use the analytics, SQL code and data visualizations they developed for Greenplum, they also get the benefit of it all running an order of magnitude faster!
  2. The second biggest problem for Greenplum users is data corruption and stability. By moving the data analysis workloads to Brytlyt, Greenplum resources are released, stress on the system is reduced and data corruption therefore becomes less likely.

By combining Brytlyt with Greenplum, most technology teams can resolve the problems of Greenplum while giving a massive performance boost to their data queries, killing multiple birds with one stone and finally removing Greenplum from their list of mistakes.

*The rise and fall of GreenPlum database:
** Why MPP based analytical databases are still key for enterprises: