Wednesday, March 12, 2014

Learning PostgreSql: Embracing Change With Copying Types and VARCHAR(NO_SIZE_NEEDED)

PostgreSql 9.3 allows us to declare parameter types to match column types, aka Copying Types. Also it allows us to omit the length of VARCHAR fields, without any performance penalty. These two features make PostgreSql a great back end for agile development, because they make PL/PgSql more resilient to changes. Both features are not in SQL Server 2008 R2. I am not sure about later releases of SQL Server.
Let us discuss them in more detail and see why they are so useful.

Using Copying Types

Suppose that we have the following table:
CREATE TABLE MyData(ID SMALLINT, SomeValue REAL);INSERT INTO MyData(ID, SomeValue) VALUES(1, 1.23); 
The following two functions do the same thing. The second one uses copying types, and as such it will be more robust should the table structure change. We shall see that soon.

CREATE OR REPLACE FUNCTION GetSomeValueById(p_ID INT) RETURNS TABLE(SomeValue REAL) AS$BODY$BEGIN
    RETURN QUERY SELECT MyData.SomeValue FROM MyData
                 WHERE ID = p_ID;END;$BODY$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION GetSomeValueById_Robust(p_ID MyData.ID%TYPE) RETURNS TABLE(SomeValue MyData.SomeValue%TYPE) AS$BODY$BEGIN
    RETURN QUERY SELECT MyData.SomeValue FROM MyData
                 WHERE ID = p_ID;END;$BODY$
LANGUAGE plpgsql;

Let us change the table structure, as follows:

ALTER TABLE MyData ALTER COLUMN ID Type NUMERIC(10,2);ALTER TABLE MyData ALTER COLUMN SomeValue Type FLOAT;
INSERT INTO MyData(ID, SomeValue) VALUES(1.34, 1.234567890101112);

All we need to do to get the second function working is drop and create it:

DROP FUNCTION GetSomeValueById_Robust(MyData.ID%TYPE);
The first function needs more work - we have to explicitly change all the hard-coded types:
DROP FUNCTION GetSomeValueById(p_ID INT);
CREATE OR REPLACE FUNCTION GetSomeValueById(p_ID NUMERIC) RETURNS TABLE(SomeValue FLOAT) AS$BODY$BEGIN
    RETURN QUERY SELECT MyData.SomeValue FROM MyData
                 WHERE ID = p_ID;END;$BODY$
LANGUAGE plpgsql;
As we have seen, copying types reduces the need for tedious error-prone busywork after changes to table structure.

Using VARCHAR without explicitly specifying the size

Theoretically, we might want to specify the length of VARCHAR columns, and use that length as some kind of CHECK constraint. 
However, in practice I usually do not do it - typically this approach is not good enough, for two reasons. 
First reason: usually we do not want silent truncation of values that are too long, as shown in the following example:
DECLARE @ConfirmationCode VARCHAR(6);SET @ConfirmationCode = '1234567';SELECT @ConfirmationCode AS ConfirmationCode;
ConfirmationCode----------------123456
Usually we want the code to blow up rather than silently truncate a wrong value. CHECK constraints do exactly what we want: blow up. 
This is why we typically prefer to use CHECK constraints to validate the length of a VARCHAR value. 

Second reason: changing column length may require changes of child tables and parameters. 
Should we have to change the length of ConfirmationCode in a parent table, for example, we will have modify all the child tables that 
refer to ConfirmationCode. 
This process is slow and expensive.
Similarly, we may have to comb through all our code, modifying all the occurrencies of the parameter @ConfirmationCode VARCHAR(6). 
This is error-prone and inefficient.
This is why in our SQL Server development we typically declare our VARCHAR columns wider than necessary, 
and enforce the length limit via a CHECK constraint.

In other words, in my practice the need to specify the length of VARCHAR columns is an inconvenience.
In PostgreSql, this inconvenience is eliminated we are recommended not to specify the length of VARCHAR columns at all. 
This carries no performance penalty, as discussed here
http://people.planetpostgresql.org/dfetter/index.php?/archives/24-VARCHARn-Considered-Harmful.html
 
and here:
http://www.postgresql.org/docs/9.3/static/datatype-character.html 
Quoting from the last link: "While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead." 

No comments:

Post a Comment