We have defined a series of configurations, where, driven by a RESTful API, end-users can build up new revisions. Some of the components of the configuration can have more than one value; a revision involves multiple tables with one-to-many relationships.
Because the configuration is shipped off elsewhere, revisions are marked as deployed and become immutable. Users have to create a new revision (which can be cloned from an existing one) if they want to make changes to a configuration. One revision per configuration can be marked as ‘current’; this allows the users to switch between past revisions at will, or disable the configuration entirely by not picking any revision. The current revision is the deployed one; marking a different revision as ‘current’ replaces the deployed config.
We already have everything in place to enforce immutability of deployed revisions; the deployed column is automatically transitioned to TRUE when you first use a revision as the current revision, and all further DELETE operations concerning rows that match a deployed revision id in revision-related tables are blocked.
However, any value used for the name column in the public_name table must be unique across all the ‘current’ revisions of all configurations. I’m trying to figure out the best strategy to enforce this.
If this were a plain one-to-many relationship from config to public names, this would be solved by using a unique constraint on the name column. This is, instead, a one-to-many-to-many pattern with revision acting as the bridge table, and the current_revision_id "collapses" the one-to-many-to-many to a virtual one-to-many relationship from config to public names.
Here is a simplified set of tables that illustrate our situation:
-- Configurations
CREATE TABLE config (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    current_revision_id INT
);

-- Have multiple revisions
CREATE TABLE revision (
    id INT PRIMARY KEY,
    config_id INT NOT NULL REFERENCES config(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    description VARCHAR,
    foo INT NOT NULL,
    bar BOOLEAN NOT NULL,
    deployed BOOLEAN NOT NULL DEFAULT FALSE
);

-- A configuration has one _current_ revision
ALTER TABLE config
    ADD CONSTRAINT current_revision_id_fk
    FOREIGN KEY (current_revision_id) REFERENCES revision(id);

-- Revisions are automatically numbered in a view
CREATE VIEW numbered_revision AS (
    SELECT
        *,
        row_number() OVER (
            PARTITION BY config_id
            ORDER BY created_at, id
        ) AS number
    FROM revision
);

-- Configurations have multiple 'public names'
CREATE TABLE public_name (
    id INT PRIMARY KEY,
    revision_id INT NOT NULL REFERENCES revision(id),
    name VARCHAR(100),
    UNIQUE (revision_id, name)
);
The view only serves to provide revisions with gapless numbers per config (revisions are never deleted).
As an ERD diagram:
Some sample data to illustrate the setup:
INSERT INTO config (id, name) VALUES
    (17, 'config_foo'),
    (42, 'config_bar');

INSERT INTO revision (id, config_id, created_at, description, foo, bar) VALUES
    (11, 17, '2021-05-29 09:07:18', 'Foo configuration, first draft', 81, TRUE),
    (19, 17, '2021-05-29 10:42:17', 'Foo configuration, second draft', 73, TRUE),
    (23, 42, '2021-05-29 09:36:52', 'Bar configuration, first draft', 118, FALSE);

INSERT INTO public_name (id, revision_id, name) VALUES
    -- public names for foo configuration, first draft
    (83, 11, 'some.name'),
    (84, 11, 'other.name'),
    -- public names for foo configuration, second draft
    (85, 19, 'revised.name'),
    (86, 19, 'other.name'),
    (87, 19, 'third.name'),
    -- public names for bar configuration, first draft;
    -- some of the names here are the same used by foo configurations
    (88, 23, 'some.name'),
    (89, 23, 'unique.name'),
    (90, 23, 'other.name');

-- Foo configuration has a current, published revision:
UPDATE config SET current_revision_id = 19 WHERE id = 17;
UPDATE revision SET deployed = TRUE WHERE id IN (11, 19);
Here is a query showing the sample dataset:
SELECT
    c.name AS config,
    rev.number AS revision,
    rev.deployed,
    CASE
        WHEN c.current_revision_id = rev.id THEN 'ACTIVE'
        ELSE ''
    END AS status,
    string_agg(p.name, ', ' ORDER BY p.name) AS names
FROM config c
JOIN numbered_revision AS rev ON c.id = rev.config_id
JOIN public_name p ON p.revision_id = rev.id
GROUP BY c.id, rev.id, rev.number, rev.deployed
ORDER BY c.id, rev.number;

config     | revision | deployed | status | names
-----------+----------+----------+--------+-------------------------------------
config_foo | 1        | t        |        | other.name, some.name
config_foo | 2        | t        | ACTIVE | other.name, revised.name, third.name
config_bar | 1        | f        |        | other.name, some.name, unique.name
In the above output table, the second row represents a "current" revision, made public (deployed), and that row has been given exclusive access to the public names in the names column.
The third row represents a configuration with a draft revision. Any attempts to set it as current for config_bar should fail because the name other.name is already in use for config_foo, revision 2. If, in the future, config_foo were to create a new revision that doesn’t include other.name, only then could config_bar revision 1 be made current.
We do pre-validate this constraint; the API runs some checks and blocks marking a revision as current when pre-conditions are not met. Names in the public_name table are also constrained to be unique per revision (UNIQUE (revision_id, name)). Neither of these prevents a race condition; they just reduce the rate at which race conditions happen.
I was hoping a CONSTRAINT TRIGGER on config, firing on UPDATEs of the current_revision_id column, would be sufficient to enforce this constraint:
CREATE OR REPLACE FUNCTION unique_current_names() RETURNS trigger
    LANGUAGE plpgsql AS
$$BEGIN
    IF EXISTS (
        SELECT 1
        FROM public_name p
        WHERE p.revision_id = NEW.current_revision_id
          AND p.name IN (
              SELECT pp.name
              FROM config AS pc
              JOIN public_name pp
                ON pp.revision_id = pc.current_revision_id
               AND pc.id != OLD.id
          )
    ) THEN
        RAISE EXCEPTION 'Public name is already published';
    END IF;
    RETURN NEW;
END;$$;

DROP TRIGGER IF EXISTS unique_current_names_trig ON config;

CREATE CONSTRAINT TRIGGER unique_current_names_trig
    AFTER UPDATE OF current_revision_id ON config
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE unique_current_names();
(Note that the relationship between config and public_name is, in the general case, a many-to-many connection, but for the more specific current_revision_id case, it is a one-to-many connection, and you can use config.current_revision_id = public_name.revision_id to list the names directly.)
My concern is that, even though this trigger fires at the very end of a transaction, there is still the possibility of a race condition, wherein another connection also tries to make a revision current with conflicting public names.
OTOH, because all updates and inserts are the results of RESTful API operations, there will never be a transaction that includes multiple operations (updates of public_name, and setting current_revision_id). Is that enough to prevent race conditions here, or are there corner cases I missed?
Another option might be to copy the public names of the current revision into a separate “published names” table (with a trigger; delete all old names, insert all new names), with a UNIQUE constraint on the name column there. Would that work better than the constraint trigger?
Note that we can’t use namespaces or other additions to the names (which are hostnames, on the public internet) to make them unique. The names must be unique entirely on their own, once deployed.
We are aware the design allows a configuration to reference a current revision_id that belongs to a different configuration. That’s a possibility we explicitly guard against at the application level, but a trigger could also handle that.
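As a hedged alternative to a trigger for that last point, a composite foreign key could let the database itself reject a current_revision_id that belongs to another configuration. This is only a sketch against the simplified tables above, and the constraint names are made up for illustration:

```sql
-- Sketch only: both constraint names are hypothetical.
-- revision.id is already unique, so (id, config_id) is unique as well;
-- declaring it gives the composite foreign key below something to reference.
ALTER TABLE revision
    ADD CONSTRAINT revision_id_config_id_key UNIQUE (id, config_id);

-- (config.current_revision_id, config.id) must match an existing
-- (revision.id, revision.config_id) pair. With the default MATCH SIMPLE
-- behaviour the check is skipped while current_revision_id is NULL,
-- so a configuration without a current revision is still allowed.
ALTER TABLE config
    ADD CONSTRAINT current_revision_same_config_fk
    FOREIGN KEY (current_revision_id, id)
    REFERENCES revision (id, config_id);

-- With the sample data this should now be rejected, because
-- revision 23 belongs to config 42, not config 17:
-- UPDATE config SET current_revision_id = 23 WHERE id = 17;
```

If this were adopted, the single-column current_revision_id_fk constraint would become redundant, since the composite key implies it.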
How to solve :
I did not follow the exact details of your data model, but a deferred constraint trigger is always subject to race conditions unless you operate with the SERIALIZABLE transaction isolation level.
The reason is that concurrent updates of config could cause the trigger function to run in parallel in two sessions, where they cannot see the effects of the other transaction, because no transaction has committed yet. Defining the trigger as INITIALLY DEFERRED narrows the window for the race condition, but it does not close it.
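A minimal sketch of what running the promotion under SERIALIZABLE could look like, using the sample ids from the question. The retry step is an assumption about how the client would be written; PostgreSQL reports a serialization failure with SQLSTATE 40001, and the usual response is to rerun the whole transaction:

```sql
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

UPDATE config SET current_revision_id = 23 WHERE id = 42;

-- The deferred constraint trigger runs here, at commit time.
COMMIT;

-- If any statement above, including COMMIT, fails with SQLSTATE 40001
-- (serialization_failure), the client should ROLLBACK and retry the
-- whole transaction from BEGIN.
```

Note that this only helps if every transaction that changes current_revision_id runs at the SERIALIZABLE level; a single writer at a lower level can still slip past the check.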
As an alternative to using SERIALIZABLE, you could modify your trigger function so that it takes locks that prevent it from running more than once at the same time. Transaction-level advisory locks come to mind as a simple way to do that.
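The advisory lock idea could be sketched as follows. This is the trigger function from the question with one added statement; the lock key 1234 is an arbitrary value chosen for illustration, and the sketch assumes the default READ COMMITTED isolation level, so that the EXISTS query takes a fresh snapshot after the lock has been granted:

```sql
CREATE OR REPLACE FUNCTION unique_current_names() RETURNS trigger
    LANGUAGE plpgsql AS
$$BEGIN
    -- Assumption: 1234 is an arbitrary key reserved for this check.
    -- Only one transaction at a time can hold the lock; it is released
    -- automatically when the transaction commits or rolls back.
    PERFORM pg_advisory_xact_lock(1234);

    -- Same check as in the question, now run while holding the lock.
    IF EXISTS (
        SELECT 1
        FROM public_name p
        WHERE p.revision_id = NEW.current_revision_id
          AND p.name IN (
              SELECT pp.name
              FROM config AS pc
              JOIN public_name pp
                ON pp.revision_id = pc.current_revision_id
               AND pc.id != OLD.id
          )
    ) THEN
        RAISE EXCEPTION 'Public name is already published';
    END IF;
    RETURN NEW;
END;$$;
```

Because the trigger is deferred, the lock is taken close to commit and held only briefly, at the cost of serializing all promotions of a current revision.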
The general idea in this solution is to add a little denormalization so the desired restrictions can all be enforced with regular foreign keys, check constraints, and filtered unique indexes.
I don’t know Postgres well enough, so this is a SQL Server implementation.
The main features are:
- Current revision moved from config to revisions
- Current revision is zero if not current, otherwise matches revision id
- Current revision denormalized to public names with cascade sync
- Tricky uniqueness constraints implemented with filtered indexes
This design may or may not be acceptable, but perhaps it will at least provoke some thought.
DROP TABLE IF EXISTS dbo.PublicNames, dbo.Revisions, dbo.Configs;

CREATE TABLE dbo.Configs
(
    ConfigID integer NOT NULL,
    ConfigName varchar(100) NOT NULL,

    CONSTRAINT [PK dbo.Configs ConfigID]
        PRIMARY KEY CLUSTERED (ConfigID),

    -- Assuming configuration names are unique
    CONSTRAINT [UQ dbo.Configs ConfigName]
        UNIQUE NONCLUSTERED (ConfigName)
);

CREATE TABLE dbo.Revisions
(
    RevisionID integer NOT NULL,
    ConfigID integer NOT NULL,
    CurrentRevisionID integer NOT NULL
        -- Revision is current if CurrentRevisionID = RevisionID
        -- Zero otherwise (see check constraints)
        DEFAULT 0,
    CreatedAt datetimeoffset NOT NULL
        DEFAULT SYSDATETIMEOFFSET(),
    RevisionDescription varchar(200) NULL,
    Foo integer NOT NULL,
    Bar bit NOT NULL,
    -- Convenient computed column, persisted or not
    IsCurrent AS
        CONVERT(bit, IIF(CurrentRevisionID = RevisionID, 'true', 'false')),
    IsDeployed bit NOT NULL
        DEFAULT CONVERT(bit, 'false'),

    CONSTRAINT [PK dbo.Revisions RevisionID]
        PRIMARY KEY CLUSTERED (RevisionID),

    CONSTRAINT [FK dbo.Revisions -> dbo.Configs ConfigID]
        FOREIGN KEY (ConfigID)
        REFERENCES dbo.Configs (ConfigID),

    CONSTRAINT [CK dbo.Revisions Valid CreatedAt]
        CHECK (CreatedAt <= SYSDATETIMEOFFSET()),

    -- RevisionID = 0 is a reserved value, must not be used
    CONSTRAINT [CK dbo.Revisions Valid RevisionID]
        CHECK (RevisionID != 0),

    -- CurrentRevisionID must be zero or match RevisionID
    CONSTRAINT [CK dbo.Revisions Valid CurrentRevisionID]
        CHECK (CurrentRevisionID IN (0, RevisionID)),

    -- A revision can only be deployed if it is current
    CONSTRAINT [CK dbo.Revisions Only Deployed If Current]
        CHECK (IsDeployed = 'false'
            OR (CurrentRevisionID = RevisionID AND IsDeployed = 'true')),

    -- For denormalization via FK to dbo.PublicNames
    CONSTRAINT [UQ dbo.Revisions RevisionID, CurrentRevisionID]
        UNIQUE NONCLUSTERED (RevisionID, CurrentRevisionID),

    -- Unique current revision per config
    INDEX [UQ dbo.Revisions One Current Revision Per Config]
        UNIQUE (ConfigID, IsCurrent)
        INCLUDE (CurrentRevisionID)
        WHERE CurrentRevisionID != 0
);

CREATE TABLE dbo.PublicNames
(
    PublicNameID integer NOT NULL,
    RevisionID integer NOT NULL,
    CurrentRevisionID integer NOT NULL DEFAULT 0,
    PublicName varchar(100) NOT NULL,

    CONSTRAINT [PK dbo.PublicNames PublicNameID]
        PRIMARY KEY CLUSTERED (PublicNameID),

    CONSTRAINT [FK dbo.PublicNames -> dbo.Revisions RevisionID]
        FOREIGN KEY (RevisionID)
        REFERENCES dbo.Revisions (RevisionID),

    -- Denormalized, kept in sync via cascade
    CONSTRAINT [FK dbo.PublicNames -> dbo.Revisions RevisionID, CurrentRevisionID]
        FOREIGN KEY (RevisionID, CurrentRevisionID)
        REFERENCES dbo.Revisions (RevisionID, CurrentRevisionID)
        ON UPDATE CASCADE,

    -- Public names unique within a revision
    CONSTRAINT [UQ dbo.PublicNames PublicName, RevisionID]
        UNIQUE NONCLUSTERED (PublicName, RevisionID),

    -- To support foreign key
    INDEX [IX dbo.PublicNames RevisionID, CurrentRevisionID]
        NONCLUSTERED (RevisionID, CurrentRevisionID),

    -- Public names unique across all current revisions
    INDEX [UQ dbo.PublicNames Unique Current Public Names]
        UNIQUE NONCLUSTERED (PublicName)
        INCLUDE (CurrentRevisionID)
        WHERE CurrentRevisionID != 0
);
Sample data and state as provided in the question:
INSERT INTO dbo.Configs (ConfigID, ConfigName)
VALUES
    (17, 'config_foo'),
    (42, 'config_bar');

INSERT INTO dbo.Revisions
    (RevisionID, ConfigID, CreatedAt, RevisionDescription, Foo, Bar)
VALUES
    (11, 17, '2021-05-29 09:07:18', 'Foo configuration, first draft', 81, 'true'),
    (19, 17, '2021-05-29 10:42:17', 'Foo configuration, second draft', 73, 'true'),
    (23, 42, '2021-05-29 09:36:52', 'Bar configuration, first draft', 118, 'false');

INSERT INTO dbo.PublicNames (PublicNameID, RevisionID, PublicName)
VALUES
    -- public names for foo configuration, first draft
    (83, 11, 'some.name'),
    (84, 11, 'other.name'),
    -- public names for foo configuration, second draft
    (85, 19, 'revised.name'),
    (86, 19, 'other.name'),
    (87, 19, 'third.name'),
    -- public names for bar configuration, first draft;
    -- some of the names here are the same used by foo configurations
    (88, 23, 'some.name'),
    (89, 23, 'unique.name'),
    (90, 23, 'other.name');

-- Foo configuration has a current, published revision:
UPDATE dbo.Revisions
SET CurrentRevisionID = 19,
    IsDeployed = 'true'
WHERE ConfigID = 17
    AND RevisionID = 19;
The status query:
WITH NumberedRevisions AS
(
    SELECT
        R.*,
        Revision =
            ROW_NUMBER() OVER (
                PARTITION BY R.ConfigID
                ORDER BY R.CreatedAt, R.RevisionID)
    FROM dbo.Revisions AS R
)
SELECT
    C.ConfigName,
    R.Revision,
    R.IsDeployed,
    R.IsCurrent,
    Names =
        STRING_AGG(P.PublicName, ',')
            WITHIN GROUP (ORDER BY P.PublicName)
FROM dbo.Configs AS C
JOIN NumberedRevisions AS R
    ON R.ConfigID = C.ConfigID
JOIN dbo.PublicNames AS P
    ON P.RevisionID = R.RevisionID
GROUP BY
    C.ConfigID,
    C.ConfigName,
    R.Revision,
    R.IsDeployed,
    R.IsCurrent
ORDER BY
    C.ConfigID,
    Revision;
Attempt to set the third row current:
UPDATE dbo.Revisions
SET CurrentRevisionID = 23
WHERE ConfigID = 42
    AND RevisionID = 23;

Msg 2601, Level 14, State 1
Cannot insert duplicate key row in object 'dbo.PublicNames' with unique index 'UQ dbo.PublicNames Unique Current Public Names'. The duplicate key value is (other.name).
The statement has been terminated.
The above is a fairly standard technique to allow engine-enforced denormalization, so we can use constraints and filtered indexes to enforce data integrity—regardless of concurrency or other concerns (product bugs aside). See Denormalizing to enforce business rules by Alexander Kuznetsov for more.
There will usually be application-side impacts, but these often turn out to be surprisingly manageable in practice. In SQL Server, that is often done by abstracting any additional implementation complexity into a module (stored procedure or function, usually). That preserves a clean and consistent API.
One can’t beat the good night’s sleep you get, safe in the knowledge the database is consistent at all times.
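For readers on PostgreSQL, the core of this design might translate roughly as below. This is an untested sketch applied to the question's tables rather than a verified port: the constraint and index names are invented for illustration, and PostgreSQL's partial unique indexes stand in for SQL Server's filtered indexes.

```sql
-- Sketch only; all constraint and index names are hypothetical.

-- Current marker moves onto revision: 0 when not current, else equal to id.
ALTER TABLE revision
    ADD COLUMN current_revision_id INT NOT NULL DEFAULT 0;
ALTER TABLE revision
    ADD CONSTRAINT revision_valid_current CHECK (current_revision_id IN (0, id));
ALTER TABLE revision
    ADD CONSTRAINT revision_id_current_key UNIQUE (id, current_revision_id);

-- At most one current revision per configuration.
CREATE UNIQUE INDEX revision_one_current_per_config
    ON revision (config_id)
    WHERE current_revision_id <> 0;

-- Denormalized copy of the marker on public_name, kept in sync by the cascade.
ALTER TABLE public_name
    ADD COLUMN current_revision_id INT NOT NULL DEFAULT 0;
ALTER TABLE public_name
    ADD CONSTRAINT public_name_revision_current_fk
    FOREIGN KEY (revision_id, current_revision_id)
    REFERENCES revision (id, current_revision_id)
    ON UPDATE CASCADE;

-- Public names unique across all current revisions.
CREATE UNIQUE INDEX public_name_unique_current
    ON public_name (name)
    WHERE current_revision_id <> 0;

-- Marking a revision current then becomes a single update; the cascade
-- touches its public_name rows and the partial index rejects duplicates:
-- UPDATE revision SET current_revision_id = id WHERE id = 23;
```

In such a port, config.current_revision_id from the question would no longer be needed, since the marker lives on revision.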
Using a constraint trigger will not prevent a race condition, as confirmed by Laurenz Albe, unless we switch the transaction isolation level to SERIALIZABLE. That would complicate our application logic, as we’d have to retry commits.
Instead of using a constraint trigger, we are now simply copying the published public names into a new table using a BEFORE INSERT OR UPDATE trigger. The unique constraint is then enforced on that table, and so is not subject to the same transaction isolation issues. If two transactions were to attempt to promote a revision with a conflicting name to being public, one or the other transaction will fail on the unique constraint.
This is what the published_public_name table and the trigger on config look like:
CREATE TABLE published_public_name (
    id INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    config_id INT NOT NULL REFERENCES config(id) ON DELETE CASCADE,
    name VARCHAR(100) NOT NULL UNIQUE
);

CREATE OR REPLACE FUNCTION copy_published_public_names()
RETURNS TRIGGER AS
$BODY$
BEGIN
    -- OLD / NEW is a row in the config table
    -- Trigger is called for both updates and inserts
    IF TG_OP = 'UPDATE' THEN
        IF OLD.current_revision_id IS NOT DISTINCT FROM NEW.current_revision_id THEN
            -- Nothing changed
            RETURN NEW;
        END IF;
        IF OLD.current_revision_id IS NOT NULL THEN
            DELETE FROM published_public_name
            WHERE id IN (
                SELECT ppn.id
                FROM published_public_name ppn
                WHERE ppn.config_id = OLD.id
                ORDER BY ppn.name
            );
        END IF;
    END IF;
    IF NEW.current_revision_id IS NOT NULL THEN
        INSERT INTO published_public_name(config_id, name)
        SELECT NEW.id, pn.name
        FROM public_name pn
        WHERE pn.revision_id = NEW.current_revision_id
        ORDER BY pn.name;
    END IF;
    RETURN NEW;
END
$BODY$
LANGUAGE plpgsql;

CREATE TRIGGER copy_published_public_names_trigger
    BEFORE INSERT OR UPDATE OF current_revision_id ON config
    FOR EACH ROW EXECUTE FUNCTION copy_published_public_names();
The trigger handles both INSERT and UPDATE. DELETEs are handled by ON DELETE CASCADE on the published_public_name.config_id foreign key.
I included an ORDER BY clause in the INSERT statements in the trigger to avoid a potential deadlock if more than one name is involved. I’m not 100% certain that the DELETE ordering is required, but it does no harm if not.
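One case the trigger does not cover is a configuration that already has a current revision when published_public_name is first created. Assuming that situation applies, a one-off backfill along these lines could populate the table, followed by a query that should return no rows when the copy matches the current revisions. Both statements are a sketch, not something taken from the fiddle:

```sql
-- One-off backfill for configurations that were already current before
-- the trigger existed. It fails with a unique violation if two current
-- revisions already share a name, which surfaces existing conflicts.
INSERT INTO published_public_name (config_id, name)
SELECT c.id, pn.name
FROM config c
JOIN public_name pn ON pn.revision_id = c.current_revision_id
ORDER BY pn.name;

-- Consistency check: names of current revisions missing from the copy.
-- An empty result means every current name has been published.
SELECT c.id AS config_id, pn.name
FROM config c
JOIN public_name pn ON pn.revision_id = c.current_revision_id
EXCEPT
SELECT config_id, name
FROM published_public_name;
```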
When trying to publish a revision with a name that’s already used by another config, the statement fails with:
ERROR: duplicate key value violates unique constraint "published_public_name_name_key"
DETAIL: Key (name)=(other.name) already exists.
CONTEXT: SQL statement "INSERT INTO published_public_name(config_id, name) SELECT NEW.id, pn.name FROM public_name pn WHERE pn.revision_id = NEW.current_revision_id"
PL/pgSQL function copy_published_public_names() line 18 at SQL statement
See my new db<>fiddle revision.