Wednesday, March 12, 2014

Learning PostgreSql: old versions of rows are stored right in the table

PostgreSql features multi-version concurrency control, aka MVCC. To implement MVCC, old versions of rows are stored right in the same table. This is very different from what SQL Server does, and it leads to some very interesting consequences. Let us play with this a little bit, but first we need to set up some test data.
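To see that bookkeeping directly, here is a minimal sketch (the mvcc_demo table is just an illustration, not part of the test data below) using the ctid and xmin system columns that PostgreSql maintains for every row version:
CREATE TABLE mvcc_demo(v INTEGER);
INSERT INTO mvcc_demo VALUES(1);
SELECT ctid, xmin, xmax, v FROM mvcc_demo; -- xmin is the inserting transaction's id
UPDATE mvcc_demo SET v = 2;
SELECT ctid, xmin, xmax, v FROM mvcc_demo; -- a different ctid: a brand new row version was written
DROP TABLE mvcc_demo;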

Setting up.

First of all, let us create a numbers table. Any production database needs one anyway:
CREATE TABLE Numbers(i INTEGER);
INSERT INTO Numbers(i) VALUES(0),(1);

CREATE OR REPLACE FUNCTION PopulateNumbers() RETURNS VOID AS $BODY$
DECLARE
    k INTEGER;
BEGIN
    FOR k IN 1..19 LOOP
        -- each pass doubles the row count by shifting the existing numbers up
        INSERT INTO Numbers(i) SELECT i + POW(2, k) FROM Numbers;
    END LOOP;
END;
$BODY$ LANGUAGE plpgsql;

SELECT PopulateNumbers();
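If the generator works as intended, the table should now hold exactly 1048576 (2^20) distinct numbers; as a side note, the built-in generate_series function could populate the same table in a single statement:
SELECT COUNT(*), MIN(i), MAX(i) FROM Numbers; -- expected: 1048576, 0, 1048575
-- one-statement alternative, assuming an empty table:
-- INSERT INTO Numbers(i) SELECT n FROM generate_series(0, 1048575) AS s(n);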
   
Next, let us use the numbers table to create another test table:
CREATE TABLE public.TestUpdate AS
SELECT i,
       2 AS SomeValue,
       CAST('Let''s add some space to make the table wider' AS VARCHAR) AS SpaceFiller
FROM public.Numbers;
ALTER TABLE public.TestUpdate ADD CONSTRAINT PK_TestUpdate PRIMARY KEY(i);
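A quick look at a couple of rows shows the shape of the data; at this point SomeValue is the constant 2 everywhere:
SELECT * FROM public.TestUpdate ORDER BY i LIMIT 2;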

Updates insert new versions of rows without modifying old ones.

Whenever we update a row, we essentially insert a new version, and the old version stays in the same table. This behavior is completely different from SQL Server, and leads to many consequences we need to be aware of.
For example, if we update all the rows in a table, it can grow twice as big. Let us see for ourselves.
At this time the test table's size is 84 MB, as returned by the following script:
SELECT pg_size_pretty(pg_relation_size('public.TestUpdate'));
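Note that pg_relation_size measures only the table's main data fork; a variant such as the following would also count indexes (our primary key) and any TOAST data:
SELECT pg_size_pretty(pg_total_relation_size('public.TestUpdate')); -- table + indexes + TOAST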
Presumably the following query scans all the rows in the table:
SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;
This query returns 2 for both MIN and MAX - this is how we populated our test table. 
EXPLAIN output clearly demonstrates that all 1M rows are scanned:
EXPLAIN SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;
"  ->  Seq Scan on testupdate  (snip) rows=1048576 "
Let us update all the rows in this table, as follows:
BEGIN TRANSACTION;
UPDATE public.TestUpdate SET SomeValue=i;
Query OK, 1048576 rows affected (execution time: 5.912 sec; total time: 5.912 sec)
After the update the test table's size is 169 MB, as returned by the following script:
SELECT pg_size_pretty(pg_relation_size('public.TestUpdate'));
Both the uncommitted new versions and the old versions are stored in the same table, which is why it now uses roughly twice as much storage: two full copies of every row, or about 2 x 84 MB.
The following query, executed from the same connection, shows that although the table still has exactly 1M current rows, about 2M row versions are scanned: both the original data and the new versions:
EXPLAIN SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;
(snip) rows=2097055
We can rerun this query from another connection and see the original values, with MIN and MAX both equal to 2, because the uncommitted new versions are invisible to other transactions:
SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;
Yet if we EXPLAIN this query, we still see that about 2M row versions are scanned.

Rollback is instantaneous.

It took the database engine almost 6 seconds to modify the table. Yet when we roll back the transaction, it completes immediately, because the original rows are still sitting in the table, untouched:
ROLLBACK;
Query OK, 0 rows affected (execution time: 0 ms; total time: 0 ms)
In SQL Server, one would expect the rollback to take considerable time, because the update took considerable time as well.
The table does not shrink, however: its size is still 169 MB.
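We can also ask the statistics collector how many live and dead row versions it believes the table holds; the figures are approximate, and they assume the autovacuum process has not yet cleaned up behind us:
SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'testupdate';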

The extra storage temporarily used by the rolled-back transaction cannot be reused yet

At this time we have 1M current versions and 1M-1 old versions in the table. Let us update the table one more time:
UPDATE public.TestUpdate SET SomeValue=i; 
The table size grows again, to 253 MB. Even though we have rolled back the previous update, the storage used by the rolled-back versions is not available for reuse yet, so new storage is allocated for the new versions.
At this time, for every current row in the table we have two old versions. As a result, even though the table still has 1M rows, exactly as before, its size has roughly tripled: 253 MB is about three times the original 84 MB.

Using VACUUM to reclaim unused storage 

To reclaim the unused storage, we need to VACUUM the table, either explicitly or via the autovacuum background process. Let us do it explicitly:
VACUUM public.TestUpdate;
Query OK, 0 rows affected (execution time: 31 ms; total time: 31 ms)
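If we want VACUUM to tell us what it actually did, the VERBOSE option makes it report, among other things, how many dead row versions were found and how much space became reusable:
VACUUM VERBOSE public.TestUpdate;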
Having reclaimed the free space, we can add 1M rows, and the table size stays the same:
INSERT INTO public.TestUpdate(i, SomeValue, SpaceFiller)
SELECT i - 1048576, SomeValue, SpaceFiller FROM public.TestUpdate;
Let us add 1M rows one more time, taking only the original 1M rows as the source, and the table size still stays the same, even though the table now has 3M rows:
INSERT INTO public.TestUpdate(i, SomeValue, SpaceFiller)
SELECT i - 2*1048576, SomeValue, SpaceFiller FROM public.TestUpdate WHERE i >= 0;
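Assuming the inserts shifted the new keys into negative territory as shown above, the row count and the unchanged size are easy to double-check:
SELECT COUNT(*) FROM public.TestUpdate; -- expected: 3145728
SELECT pg_size_pretty(pg_relation_size('public.TestUpdate')); -- still 253 MB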

VACUUM and real execution costs 

Currently our table has 3M rows. Let us delete 2M rows:
DELETE FROM public.TestUpdate WHERE i<0;
Even though the changes are committed, the selects against our table still have to scan all 3M row versions: the deleted rows are merely marked as dead, and they keep occupying space until VACUUM removes them:
EXPLAIN SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;
"  ->  Seq Scan on testupdate  (snip) rows=3735300 "
Let us VACUUM the table, and rerun the EXPLAIN. It will show that fewer rows are scanned:
(snip) rows=1591111
Yet the number of scanned rows is still about 50% more than the exactly 1M rows in our table. Clearly some space is not reclaimed yet. VACUUM FULL is our friend: after it completes, EXPLAIN shows that exactly 1M rows are scanned.
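For completeness, this is the statement, followed by a size check; note that VACUUM FULL rewrites the entire table and holds an exclusive lock while doing so, so it is not something to run casually on a busy production table:
VACUUM FULL public.TestUpdate;
SELECT pg_size_pretty(pg_relation_size('public.TestUpdate')); -- back to roughly the original size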

Conclusion

We have seen several significant differences between PostgreSql and SQL Server, caused by the way MVCC is implemented in PostgreSql. As we continue migrating functionality to PostgreSql, we need to be aware of these profound differences, so that the modifications against PostgreSql perform well.
On a side note, I am really impressed by the quality of PostgreSql online documentation.
