Wednesday, March 12, 2014

Learning PostgreSql: UPDATE...FROM works differently

UPDATE...FROM command on PostgreSql may raise no errors, but produce completely different results. Later I shall provide a repro, but first let us briefly refresh how UPDATE...FROM works on Sql Server.

UPDATE...FROM on SQL Server ignores ambiguity 

We shall need the following test data:

CREATE TABLE #Problems(ProblemID INT NOT NULL, Problem VARCHAR(30) NOT NULL,Solution VARCHAR(30) NULL);
CREATE TABLE #SuggestedSolutions(SuggestedSolutionID INT NOT NULL,ProblemID INT NOT NULL, IsAccepted CHAR(1) NULL,Solution VARCHAR(30) NOT NULL);
INSERT INTO #Problems
        
ProblemIDProblemSolution )VALUES  0'Washer won''t drain', NULL),
1'Kitchen looks dirty', NULL);
INSERT INTO #SuggestedSolutions
        
SuggestedSolutionID ,
          
ProblemID ,
          
IsAccepted ,
          
Solution
        
)VALUES  (00'Y''Rebalance the load'),
(
10'N''Turn washer off then on'),
(
21'N''Turn off the light'),
(
31'Y''Clean up the kitchen'); 

While we are at it, let us also make sure that at most one proposed solution per problem can be accepted:

CREATE UNIQUE INDEX OneSolutionPerProblem ON #SuggestedSolutions(ProblemIDWHERE (IsAccepted='Y');

The following update ignores ambiguity: two proposed solutions match each problem, and the database engine picks one of them to update. It does not raise any errors. I have no idea how the engine chooses the value to update in case of ambiguity, so the output on your server may be different:

UPDATE #Problems SET Solution s.SolutionFROM #Problems AS p#SuggestedSolutions AS WHERE p.ProblemID s.ProblemID;
SELECT FROM #Problems;
0  Washer won''t drain   Rebalance the load1  Kitchen looks dirty   Turn off the light 

We can narrow down the WHERE clause, so that there is no ambiguity at all, and the results of UPDATE are predictable:

UPDATE #Problems SET Solution s.Solution
FROM #Problems AS p#SuggestedSolutions AS s
WHERE p.ProblemID s.ProblemID AND s.IsAccepted='Y';
SELECT FROM #Problems;
0  Washer won''t drain   Rebalance the load1  Kitchen looks dirty   Clean up the kitchen
   

As we have seen, SQL Server interprets the FROM clause in this UPDATE as a correlated subquery - if it uniquely identifies a matching row, we can predict the results of UPDATE command.

PostgreSql interprets the same UPDATE...FROM differently

Let us set up test data again:

CREATE TEMP TABLE Problems(ProblemID INT NOT NULL, Problem VARCHAR(30) NOT NULL,Solution VARCHAR(30) NULL);
CREATE TEMP TABLE SuggestedSolutions(SuggestedSolutionID INT NOT NULL,ProblemID INT NOT NULL, IsAccepted CHAR(1) NULL,Solution VARCHAR(30) NOT NULL);
CREATE UNIQUE INDEX OneSolutionPerProblem ON SuggestedSolutions(ProblemID) WHERE (IsAccepted='Y');
INSERT INTO Problems
        ( ProblemID, Problem, Solution )VALUES  ( 0, 'Washer won''t drain', NULL),
( 1, 'Kitchen looks dirty', NULL);
INSERT INTO SuggestedSolutions
        ( SuggestedSolutionID ,
          ProblemID ,
          IsAccepted ,
          Solution
        )VALUES  (0, 0, 'Y', 'Rebalance the load'),
(1, 0, 'N', 'Turn washer off then on'),
(2, 1, 'Y', 'Turn off the light'),
(3, 1, 'N', 'Clean up the kitchen');

Note: OneSolutionPerProblem used to be called a filtered index in SQL Server universe. In PostgreSql, it is a partial index.
Let us rerun the second update, which was producing predictable results on SQL Server:

UPDATE Problems SET Solution = s.SolutionFROM Problems AS p, SuggestedSolutions AS s WHERE p.ProblemID = s.ProblemID AND s.IsAccepted='Y';
SELECT * FROM Problems;
0;"Washer won't drain""Rebalance the load"
1;"Kitchen looks dirty""Rebalance the load" 

See how the second problem was updated with a solution for the first one? 
The reason is simple: PostgreSql interprets the FROM clause of this UPDATE as the following uncorrelated subquery:

SELECT s.Solution  FROM Problems AS p, SuggestedSolutions AS s WHERE p.ProblemID = s.ProblemID AND s.IsAccepted='Y';

So every row in its output is a potential match for every row in Problem, which is why we are getting these results.
To demonstrate this behavior one more time, let us add a problem without any suggested solutions, and rerun the update:

INSERT INTO Problems
        ( ProblemID, Problem, Solution )VALUES  ( 2, 'Fridge is empty', NULL); 
UPDATE Problems SET Solution = s.SolutionFROM Problems AS p, SuggestedSolutions AS s WHERE p.ProblemID = s.ProblemID AND s.IsAccepted='Y';
SELECT * FROM Problems; 
0;"Washer won't drain";"Rebalance the load"
1;"Kitchen looks dirty";"Rebalance the load"
2;"Fridge is empty";"Rebalance the load" 

Rewriting UPDATE..FROM

The following script shows the correct way to update in PostgreSql
-- erase wrong valuesUPDATE Problems SET Solution = NULL;
-- the correct UPDATEUPDATE Problems SET Solution = s.SolutionFROM SuggestedSolutions AS s WHERE Problems.ProblemID = s.ProblemID AND s.IsAccepted='Y';
SELECT * FROM Problems ORDER BY ProblemID;
0;"Washer won't drain""Rebalance the load"
1;"Kitchen looks dirty""Turn off the light"
2;"Fridge is empty""" 

Conclusion

To ensure correct results, all UPDATE...FROM commands need to be rewritten, because UPDATE...FROM is interpreted differently by PostgreSql

No comments:

Post a Comment