Deferrable SQL Constraints in Depth
August 27, 2017
One strength of relational databases is their constant vigilance over data correctness. The user can declare constraints that their data must obey, and then leave it to the database to enforce the rules. This saves a lot of procedural application code and potential errors.
Automatic constraint enforcement is a powerful feature, and should be leveraged whenever possible. However there are times when it is convenient – and even necessary – to temporarily defer enforcement.
Table of Contents
- Constraint Checking Granularities
- Why Defer?
- Reasons Not to Defer
- Deferring by Column
Constraint Checking Granularities
As I mentioned in a previous article, transactions are the units of consistency; thus the SQL standard does not allow a transaction to commit with any constraint violations. While we can defer constraint checking to some degree, the buck stops at commit.
To be more precise, PostgreSQL supports three enforcement granularities:
- By row. In this case a statement which updates multiple rows will fail immediately, possibly partway through its operation, when one of the rows violates a constraint.
- By statement. In this case a statement is allowed to make its full set of changes before the database checks for violations.
- By transaction. Any statement inside a transaction is free to violate constraints. However at commit time the constraints will be checked, and the transaction will fail if any constraints do not hold.
By default PostgreSQL checks each constraint per row or per statement, depending on the constraint’s type. In the following table the NOT DEFERRABLE column lists the default granularity for each type of constraint.
| Constraint type | NOT DEFERRABLE | DEFERRABLE INITIALLY IMMEDIATE | DEFERRABLE INITIALLY DEFERRED |
|-----------------|----------------|--------------------------------|-------------------------------|
| CHECK           | row            | row                            | row                           |
| NOT NULL        | row            | row                            | row                           |
| UNIQUE          | row            | statement                      | transaction                   |
| PRIMARY KEY     | row            | statement                      | transaction                   |
| FOREIGN KEY     | statement      | statement                      | transaction                   |
To change a constraint’s validation granularity to the value in another column, we have to explicitly declare a constraint as deferrable. Notice that some types of constraints won’t budge. There’s no way to defer CHECK and NOT NULL any later than per-row. (This is PostgreSQL behavior which violates the SQL standard.)
Before considering when/why to use deferrable constraints, let’s see how they work in general. The first step is marking a constraint deferrable like this:
ALTER TABLE foo ADD CONSTRAINT foo_bar_fk
  FOREIGN KEY (bar_id) REFERENCES bar (id)
  DEFERRABLE INITIALLY IMMEDIATE; -- the magic line
Deferrable constraints give transactions flexibility. Any transaction can choose to defer checking foo_bar_fk until the end:

BEGIN;
-- defer the constraint
SET CONSTRAINTS foo_bar_fk DEFERRED;
-- ...
-- now we can do whatever we want to bar_id
-- ...
COMMIT; -- but it better be correct at this point
Additionally, we can approach it the other way and mark the constraint DEFERRABLE INITIALLY DEFERRED. In this mode the constraint will be checked per transaction by default. If a transaction wants immediate enforcement instead, it must run SET CONSTRAINTS constraint_name IMMEDIATE.
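For instance, a sketch (re-declaring the foo_bar_fk constraint from above as initially deferred) of a transaction opting back into immediate checking:

```sql
ALTER TABLE foo ALTER CONSTRAINT foo_bar_fk
  DEFERRABLE INITIALLY DEFERRED;

BEGIN;
-- restore per-statement checking for this transaction;
-- any checks already pending are performed at this point
SET CONSTRAINTS foo_bar_fk IMMEDIATE;
-- from here on, violations of foo_bar_fk fail immediately
COMMIT;
```

Note that switching a constraint to IMMEDIATE mid-transaction also checks any rows changed earlier in the transaction, so a pending violation surfaces at the SET CONSTRAINTS statement rather than at commit.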
Without an explicit BEGIN command, each statement runs in its own single-statement transaction, where there is no difference between initially immediate and initially deferred. Attempting to defer constraints outside of a transaction does nothing except emit a warning:
WARNING: 25P01: SET CONSTRAINTS can only be used in transaction blocks.
There’s one other important caveat. As the table above shows, declaring a UNIQUE or PRIMARY KEY constraint DEFERRABLE INITIALLY IMMEDIATE actually changes it from per-row to per-statement checking. Even if a transaction does not choose to defer those constraints, their granularity is still subtly altered.
What’s the practical difference between per-row and per-statement checking? It’s best illustrated with an example.
CREATE TABLE snowflakes (
  i int UNIQUE
);
INSERT INTO snowflakes VALUES (1), (2), (3);
UPDATE snowflakes SET i = i + 1;
The UNIQUE constraint here is not deferrable, so by default an UPDATE statement will enforce it per-row and will raise an error.
ERROR: 23505: duplicate key value violates unique constraint "snowflakes_i_key"
DETAIL: Key (i)=(2) already exists.
If PostgreSQL had waited for all the rows to be updated (i.e. per-statement checking) there would be no problem. The rows would be uniformly incremented and would end up unique again. However, as it is, PostgreSQL checks for violations immediately after incrementing i=1 to i=2, and at that moment the table contains 2, 2, 3.
Checking constraints per row is fragile and depends on the physical ordering of the rows. If we had inserted the items in the opposite order (INSERT INTO snowflakes VALUES (3), (2), (1)) then the update would have worked.
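For contrast, a sketch of the snowflakes table re-created with a deferrable constraint, which makes the same UPDATE succeed regardless of physical row order:

```sql
DROP TABLE IF EXISTS snowflakes;

CREATE TABLE snowflakes (
  i int UNIQUE DEFERRABLE INITIALLY IMMEDIATE
);
INSERT INTO snowflakes VALUES (1), (2), (3);

-- checked per statement now, so the transient duplicate
-- created while incrementing row by row no longer matters
UPDATE snowflakes SET i = i + 1;
```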
To recap, declaring a constraint deferrable allows transactions to defer validation until commit time. Confusingly it also changes the semantics of some constraints even when no transaction chooses to defer them.
Why Defer?
Cyclic Foreign Keys
Sometimes you simply have to defer. The classic example is creating two rows related by cyclic consistency checks. Consider:
CREATE TABLE husbands (
  id int PRIMARY KEY,
  wife_id int NOT NULL
);
CREATE TABLE wives (
  id int PRIMARY KEY,
  husband_id int NOT NULL
);
ALTER TABLE husbands ADD CONSTRAINT h_w_fk
  FOREIGN KEY (wife_id) REFERENCES wives;
ALTER TABLE wives ADD CONSTRAINT w_h_fk
  FOREIGN KEY (husband_id) REFERENCES husbands;
Creating rows in either table requires a row in the other to already exist. There’s no way to get started, because foreign keys are checked per statement and inserting into two tables requires two statements. What we can do is defer the constraints.
ALTER TABLE husbands ALTER CONSTRAINT h_w_fk
  DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE wives ALTER CONSTRAINT w_h_fk
  DEFERRABLE INITIALLY DEFERRED;
Because the cyclic constraints essentially always need to be deferred, we make deferral their default behavior inside a transaction. We can now run:
BEGIN;
INSERT INTO husbands (id, wife_id) VALUES (1, 1);
INSERT INTO wives (id, husband_id) VALUES (1, 1);
COMMIT;
-- and they lived happily ever after
What a neat and tidy textbook example. Only there’s a dirty little secret. PostgreSQL has an alternative to deferrable foreign keys for making this example work! Check this out:
-- We won't need to defer them
ALTER TABLE husbands ALTER CONSTRAINT h_w_fk NOT DEFERRABLE;
ALTER TABLE wives ALTER CONSTRAINT w_h_fk NOT DEFERRABLE;

-- Instead do the two inserts as a single statement
WITH wife AS (
  INSERT INTO wives (id, husband_id) VALUES (2, 2)
)
INSERT INTO husbands (id, wife_id) VALUES (2, 2);
Common table expressions (CTEs) allow multiple queries to be considered as the same statement when checking constraints. Since foreign keys are checked per statement this approach works fine.
Although we found a workaround for deferring cyclic foreign keys, they’re still a good way to start thinking about deferrable constraints.
Reallocating Items, One per Group
In this example, we have a set of school classes each with exactly one assigned teacher. Suppose we would like to switch the teachers of two classes. The constraints make it hard to do.
CREATE TABLE classes (
  id int PRIMARY KEY,
  teacher_id int UNIQUE NOT NULL
);
INSERT INTO classes VALUES (1, 1), (2, 2);
The CTE trick will not work for swapping teachers because a non-deferrable uniqueness constraint is checked by row rather than by statement.
-- fails unless deferrable since PostgreSQL
-- checks per row, not per statement
WITH swap AS (
  UPDATE classes SET teacher_id = 2 WHERE id = 1
)
UPDATE classes SET teacher_id = 1 WHERE id = 2;

ERROR: 23505: duplicate key value violates unique constraint "classes_teacher_id_key"
DETAIL: Key (teacher_id)=(1) already exists.
To swap teachers without deferring the uniqueness constraint on teacher_id, we would need to assign a temporary teacher to a class.

-- first assign a temporary teacher to free
-- teacher 1 for assignment to class 2
UPDATE classes SET teacher_id = 999 WHERE id = 1;
UPDATE classes SET teacher_id = 1 WHERE id = 2;
UPDATE classes SET teacher_id = 2 WHERE id = 1;
Using a temporary teacher is a bit of a hack. It’s more natural to create the table with a deferrable constraint instead.
CREATE TABLE classes (
  id int PRIMARY KEY,
  teacher_id int NOT NULL UNIQUE
    DEFERRABLE INITIALLY IMMEDIATE
);
Then we can do a simple swap:
BEGIN;
SET CONSTRAINTS classes_teacher_id_key DEFERRED;
UPDATE classes SET teacher_id = 1 WHERE id = 2;
UPDATE classes SET teacher_id = 2 WHERE id = 1;
COMMIT;
A CTE approach with no explicit transaction will now work too, because a deferrable uniqueness constraint switches to per-statement checking.
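A sketch of that, assuming the deferrable version of the classes table above:

```sql
-- one statement, so the deferrable UNIQUE constraint is
-- checked only after both updates have been applied
WITH swap AS (
  UPDATE classes SET teacher_id = 2 WHERE id = 1
)
UPDATE classes SET teacher_id = 1 WHERE id = 2;
```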
Renumbering a List
One might model the position of todos in an ordered list using an integer position column:
CREATE TABLE todos (
  list_id int,
  position int,
  task text,
  PRIMARY KEY (list_id, position)
);
INSERT INTO todos VALUES
  (1, 1, 'write grocery list'),
  (1, 2, 'go to store'),
  (1, 3, 'buy items');
Each todo’s position is unique per list because of the compound primary key constraint.
Suppose we suddenly remember a new item for the top of the list, like “plan menus.” If we want the new item to have position 1 then we’ll need to shift the others down before inserting it. If you recall, this is the same basic problem as our earlier “snowflakes” example.
Marking the primary key as deferrable would fix the problem:
CREATE TABLE todos (
  list_id int,
  position int,
  task text,
  PRIMARY KEY (list_id, position)
    DEFERRABLE INITIALLY IMMEDIATE
);
This would have the effect of making PostgreSQL use per-statement semantics, with no actual SET ... DEFERRED needed in a transaction.

UPDATE todos SET position = position + 1 WHERE list_id = 1;
INSERT INTO todos VALUES (1, 1, 'plan menus');
However the best solution isn’t deferrable constraints, it’s better data modeling. Re-orderable positions shouldn’t be stored as integers but as fractions (rational numbers). That way there is always room to find a number between any other two. See the article User-specified Ordering with Fractions. Correct modeling in this case elegantly removes the need to defer constraints.
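To illustrate the idea (a sketch only; the todos_frac table name and the plain numeric column are assumptions here, not the linked article’s exact scheme):

```sql
CREATE TABLE todos_frac (
  list_id int,
  position numeric,
  task text,
  PRIMARY KEY (list_id, position)
);
INSERT INTO todos_frac VALUES
  (1, 1, 'write grocery list'),
  (1, 2, 'go to store');

-- no renumbering needed: any rational number
-- below 1 puts the new item first
INSERT INTO todos_frac VALUES (1, 0.5, 'plan menus');
```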
Loading Data

Deferring constraints while loading data can make the process more convenient. For instance, deferred foreign keys allow tables to be loaded in any order, children before parents. This can accommodate data arriving out of order, perhaps from the network.
Some articles online claim that deferred constraints allow more efficient bulk inserts. By my measurement this appears to be a myth. At commit time the same number of elements must be checked for consistency, and the query time balances out. To reproduce:
CREATE TABLE parent (
  id int PRIMARY KEY,
  name text NOT NULL
);
CREATE TABLE child (
  id int PRIMARY KEY,
  parent_id int REFERENCES parent
    DEFERRABLE INITIALLY IMMEDIATE,
  name text
);
INSERT INTO parent
  SELECT generate_series(1,1000000) AS id,
         md5(random()::text) AS name;
Now we insert five million child elements and measure the time to populate the data and check foreign keys. With a stock installation of PostgreSQL 9.6.3 on my wimpy laptop:
INSERT INTO child
  SELECT generate_series(1,5000000) AS id,
         generate_series(1,1000000) AS parent_id,
         md5(random()::text) AS name;
-- Time: 89064.987 ms
Trying again with the foreign key constraint deferred:
BEGIN;
SET CONSTRAINTS child_parent_id_fkey DEFERRED;
INSERT INTO child
  SELECT generate_series(1,5000000) AS id,
         generate_series(1,1000000) AS parent_id,
         md5(random()::text) AS name;
-- Time: 40828.810 ms
COMMIT;
-- Time: 47211.533 ms
-- Total: 88040.343 ms
What time we save in the insert we lose again at commit. The only way to make the ingest faster is to temporarily disable the foreign key (assuming we know ahead of time that the data to be loaded is correct). That’s generally a bad idea, because we may end up spending more time debugging inconsistencies down the line than it takes to do consistency checks during ingestion.
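For completeness, one way such temporary disabling can look in PostgreSQL is to drop the constraint, load, and re-add it with NOT VALID before validating in a separate step. A sketch using the parent/child tables above (note VALIDATE CONSTRAINT still scans the whole table; it mainly takes a weaker lock):

```sql
ALTER TABLE child DROP CONSTRAINT child_parent_id_fkey;

-- ... bulk load into child here ...

-- re-add without checking existing rows,
-- then validate separately
ALTER TABLE child ADD CONSTRAINT child_parent_id_fkey
  FOREIGN KEY (parent_id) REFERENCES parent NOT VALID;
ALTER TABLE child VALIDATE CONSTRAINT child_parent_id_fkey;
```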
Reasons Not to Defer
Deferrable constraints seem great, right? Marking them deferrable simply gives us more options, so why not mark them all that way?
Query Planner Performance Penalty
The database query planner uses facts about the data to choose execution strategies that are correct and efficient. Its first duty is returning correct results, and it will use optimizations only when database constraints guarantee correctness. By definition deferrable constraints are not guaranteed to hold at all times, so they inhibit the planner.
To learn more about this I asked Andrew Gierth, RhodiumToad, on IRC who explained, “The planner can determine that a set of conditions on a table forces uniqueness of the result. If there’s a compatible unique, non-deferrable index it can elide an otherwise necessary sort/unique or hashagg step. It can’t do that for deferrable constraints, because there might be actual duplicate values present.”
He outlined two optimizations, one in PostgreSQL 9, and one in version 10. The older feature is join elimination:
CREATE TABLE foo (
  a integer UNIQUE,
  b integer UNIQUE DEFERRABLE
);

EXPLAIN SELECT t1.*
  FROM foo t1
  LEFT JOIN foo t2 ON (t1.a = t2.a);
Notice the join disappears, and it’s a simple sequential scan:
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on foo t1  (cost=0.00..32.60 rows=2260 width=8)
Whereas joining on b, which has a deferrable uniqueness constraint, does not eliminate the join.

EXPLAIN SELECT t1.*
  FROM foo t1
  LEFT JOIN foo t2 ON (t1.b = t2.b);
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Left Join  (cost=60.85..124.53 rows=2260 width=8)
   Hash Cond: (t1.b = t2.b)
   ->  Seq Scan on foo t1  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=32.60..32.60 rows=2260 width=4)
         ->  Seq Scan on foo t2  (cost=0.00..32.60 rows=2260 width=4)
PostgreSQL 10 provides another optimization that turns the usual semi-join of an IN subquery into a regular join when the column of the subquery is guaranteed unique.
EXPLAIN SELECT * FROM foo WHERE a IN ( SELECT a FROM foo );
It detects that a is unique:

                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=60.85..121.97 rows=2260 width=8)
   Hash Cond: (foo.a = foo_1.a)
   ->  Seq Scan on foo  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=32.60..32.60 rows=2260 width=4)
         ->  Seq Scan on foo foo_1  (cost=0.00..32.60 rows=2260 width=4)
Let’s try it on b:

EXPLAIN SELECT * FROM foo
 WHERE b IN ( SELECT b FROM foo );
The deferrable constraint prevents the optimization:
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Semi Join  (cost=60.85..124.53 rows=2260 width=8)
   Hash Cond: (foo.b = foo_1.b)
   ->  Seq Scan on foo  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=32.60..32.60 rows=2260 width=4)
         ->  Seq Scan on foo foo_1  (cost=0.00..32.60 rows=2260 width=4)
In addition to helping the planner, non-deferrable constraints are required for target columns of foreign keys. In order to point a foreign key at a column with a uniqueness or primary key constraint, that constraint must be non-deferrable or else the database throws an error.
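A quick sketch of the restriction (the table names t1 and t2 are made up for illustration):

```sql
CREATE TABLE t1 (
  id int UNIQUE DEFERRABLE INITIALLY IMMEDIATE
);

-- fails: the referenced uniqueness constraint
-- must be non-deferrable
CREATE TABLE t2 (
  t1_id int REFERENCES t1 (id)
);
```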
Getting errors only after a bunch of statements finish makes troubleshooting harder. The resulting error does not pinpoint which statement caused the problem. You may or may not be able to determine the statement based on the message.
CREATE TABLE u (
  i int UNIQUE DEFERRABLE INITIALLY IMMEDIATE
);

BEGIN;
SET CONSTRAINTS u_i_key DEFERRED;
INSERT INTO u (i) VALUES (1), (2);
-- ... other statements
INSERT INTO u (i) VALUES (2), (3);
-- ... other statements
INSERT INTO u (i) VALUES (3), (4);
COMMIT;
At commit it says merely,
ERROR: 23505: duplicate key value violates unique constraint "u_i_key"
DETAIL: Key (i)=(2) already exists.
In this case the message provides enough detail to trace it back to the offending statement, but it would be more difficult if the statements lacked literal values.
Not only can errors at commit time confuse people, they can also mess up object-relational mappers (ORMs). ORMs are often designed for simplistic database access and may not be programmed with enough knowledge of the SQL spec to handle transaction-level constraint failures properly.
Also any work performed after a deferred constraint violation will ultimately be lost when the transaction rolls back. Deferred constraints can waste processing power.
Finally there’s the matter of expressing our intentions through the database schema. In a perfect world there would be no performance penalties for using deferrable constraints, and all constraints would be deferrable (initially immediate).
It’s an unfortunate implementation detail that we’re forced to choose manually between performance and flexibility. But given that the distinction exists, marking a constraint deferrable documents that deferring it is a worthwhile or necessary action. This is not the case for the vast majority of database constraints and labelling all as deferrable would hide the distinction.
Deferring by Column
Here’s one last trick for fun. The SET CONSTRAINTS command acts on constraints by name, but it can be convenient to defer eligible constraints by column instead. PostgreSQL’s information schema allows us to look up constraints by column.
-- A handy view
CREATE VIEW deferrables AS
  SELECT table_schema, table_name, column_name, conname, contype
    FROM pg_constraint, information_schema.constraint_column_usage
   WHERE constraint_name = conname
     AND condeferrable = TRUE;

-- Defer all constraints for a column
CREATE FUNCTION defer_col_constraints(
  t_name information_schema.sql_identifier,
  c_name name
) RETURNS void AS $$
DECLARE
  names text;
BEGIN
  names := (
    SELECT array_to_string(array_agg(conname), ', ')
      FROM deferrables
     WHERE table_name = $1
       AND column_name = $2
  );
  EXECUTE format('SET CONSTRAINTS %s DEFERRED', names);
END;
$$ LANGUAGE plpgsql;
In our example of loading parent and child data, we could use this function to defer the child’s foreign key constraint like so:
BEGIN;
SELECT defer_col_constraints('child', 'parent_id');
-- ...
COMMIT;