Skip to main content

Dump/Restore GPU Table

dump_gpu_table / restore_gpu_table

These tools provide fast dumping tables into shared memory and restoring them as well.

In general, there are two DB functions added to the gpu_manager_fdw extension:

  • dump_gpu_table(<table_name as text>)
  • restore_gpu_table(<table_name as text>)

They may be called on the coordinator and directly on the datanode, e.g.:

SELECT dump_gpu_table('lineitem')

or

EXECUTE DIRECT ON (d0) 'SELECT dump_gpu_table(''lineitem'')'

From the technical side, the first example will execute an equivalent of this:

EXECUTE DIRECT ON (d0,d1,d2,...,dN) 'SELECT dump_gpu_table(''lineitem'')'

Comments

Please note, that if you run it inside docker container, you need to run this container with –shm-size option to specify the size of
shared memory.

Tutorial

Let’s assume we have this table:

CREATE FOREIGN TABLE PART
(
       P_PARTKEY INTEGER NOT NULL,
       P_NAME VARCHAR(55) NOT NULL,
       P_MFGR VARCHAR(25) NOT NULL,
       P_BRAND VARCHAR(10) NOT NULL,
       P_TYPE VARCHAR(25) NOT NULL,
       P_SIZE INTEGER NOT NULL,
       P_CONTAINER VARCHAR(10) NOT NULL,
       P_RETAILPRICE REAL NOT NULL,
       P_COMMENT VARCHAR(23) NOT NULL,
       placeholder VARCHAR(1)
)
SERVER
       gm_fdw_server
OPTIONS
(
       max_size '10000'
)
DISTRIBUTE BY REPLICATION;

 

And we load data on all datanodes using this command (an explanation of DataLoader you can found here):

EXECUTE DIRECT ON
      ( data_datanode0, data_datanode1, data_datanode2, data_datanode3, data_datanode4, data_datanode5,
data_datanode6, data_datanode7 )'
      SELECT load_data(
      ''part'',
      ''/home/postgres/part_0.tbl'',
      ''|'',
      100000,
      ''0'',
      '''',
      ''append''
)';

 

Let’s check the amount of loaded data:

brytlyt=
# EXECUTE DIRECT ON ( data_datanode0, data_datanode1, data_datanode2, data_datanode3, data_datanod
e4, data_datanode5, data_datanode6, data_datanode7 ) 'SELECT COUNT(*) FROM part';
count
-------
10000
10000
10000
10000
10000
10000
10000
10000
(8 rows)

Now, let’s dump the data on data_datanode2 and check again the data:

EXECUTE DIRECT ON ( data_datanode2 ) 'SELECT dump_gpu_table(''part'')';

brytlyt=# EXECUTE DIRECT ON ( data_datanode2) 'SELECT COUNT(*) FROM part';
count
-------
0
(1 row)
brytlyt=
# EXECUTE DIRECT ON ( data_datanode0, data_datanode1, data_datanode2, data_datanode3, data_datanod
e4, data_datanode5, data_datanode6, data_datanode7 ) 'SELECT COUNT(*) FROM part';
count
-------
10000
10000
10000
10000
10000
10000
0
10000
(8 rows)

 

Now let’s dump entire table and do SELECT COUNT:

brytlyt=# SELECT dump_gpu_table('part');
dump_gpu_table
----------------
(1 row)
brytlyt=
# EXECUTE DIRECT ON ( data_datanode0, data_datanode1, data_datanode2, data_datanode3, data_datanod
e4, data_datanode5, data_datanode6, data_datanode7 ) 'SELECT COUNT(*) FROM part';
count
-------
0
0
0
0
0
0
0
0
(8 rows)

 

So, the data is erased from the GPU. Now, let’s restore the data on the data_datanode7:

 brytlyt=
# EXECUTE DIRECT ON ( data_datanode7 ) 'SELECT restore_gpu_table(''part'')'; restore_gpu_table
-------------------
(1 row)
brytlyt=# EXECUTE DIRECT ON ( data_datanode7) 'SELECT COUNT(*) FROM part'; count
-------
10000
(1 row)

 

And now let’s do it on all datanodes:

 brytlyt=# SELECT restore_gpu_table('part');
restore_gpu_table
-------------------
(1 row)
brytlyt=
# EXECUTE DIRECT ON ( data_datanode0, data_datanode1, data_datanode2, data_datanode3, data_datanod
e4, data_datanode5, data_datanode6, data_datanode7 ) 'SELECT COUNT(*) FROM part';
count
-------
10000
10000
10000
10000
10000
10000
10000
10000
(8 rows)

 

As you can see, data is again on GPU.