Constraint to prevent immediate loop in hierarchy

All we need is an easy explanation of the problem, so here it is.

I have a companies table:

CREATE TABLE companies (
 id bigserial,
 name varchar(255) NOT NULL
 PRIMARY KEY (id)
)

I want companies to have parents and childs. A company might have many parent and many child companies, so I created the following table:

CREATE TABLE parent_companies (
 id bigserial,
 parent_company_id bigint,
   CONSTRAINT parent_companies_parent_company_id_fkey
   FOREIGN KEY (parent_company_id)
   REFERENCES companies(id),
 child_company_id bigint,
   CONSTRAINT parent_companies_child_company_id_fkey
   FOREIGN KEY (child_company_id)
   REFERENCES companies(id)
 PRIMARY KEY (id)
)

I want parent and child to be unique, so I added the following constraint:

CREATE UNIQUE INDEX parent_companies_parent_company_id_child_company_id_index ON parent_companies (parent_company_id, child_company_id)

This will not allow the same parent and child to be created twice in the database. However, I would like to also prevent a child company being saved as a parent of its parent company. For example, I would to prevent this from happening:

-[ RECORD 1 ]-----+--------------------
id                | 1
parent_company_id | 1
child_company_id  | 2
-[ RECORD 2 ]-----+--------------------
id                | 2
parent_company_id | 2
child_company_id  | 1

Since in record 1 it’s defined that company 1 is the parent of company 2, I want a way of preventing company 2 being recorded as parent of company 1.

I thought it could be the case of using the EXCLUDE or the CHECK constraints, but I couldn’t figure out a way of making either work.

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

What you describe is a many-to-many relationship between a table and itself (with additional restrictions). See:

Could be implemented like this:

CREATE TABLE company (
  id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, name text NOT NULL
)

CREATE TABLE hierarchy (
  parent_id integer NOT NULL REFERENCES company
, child_id  integer NOT NULL REFERENCES company
, PRIMARY KEY (parent_id, child_id)
, CONSTRAINT no_1step_loop CHECK (child_id <> parent_id)
);

CREATE UNIQUE INDEX hierarchy_no_2step_loop ON hierarchy (LEAST(child_id, parent_id), GREATEST(child_id, parent_id);

The unique index hierarchy_no_2step_loop rules out loops over two steps like you addressed in your question. Related:

The CHECK constraint no_1step_loop rules out loops over one step (rows referencing themselves directly).

But ruling out loops over more steps is not as simple. You could have a trigger follow the chain of links to the root and raise an exception if a loop is detected. That’s more expensive and inherently unsafe against concurrent write operations, though. To make it bullet-proof, you’d have to operate with elevated transaction isolation (like SERIALIZABLE) or write locks on all rows in the chain. But that is very susceptible to deadlocks under heavy write load …

Method 2

So you want some kind of "anti-foreign key"?

CREATE FUNCTION chk_parent_companies(bigint) RETURNS boolean LANGUAGE SQL
AS $$
SELECT NOT EXISTS(SELECT 1 FROM parent_companies WHERE id = $1);
$$;

ALTER TABLE companies ADD CONSTRAINT chk_parent_companies CHECK(chk_parent_companies(id));

Is that what you’re trying to achieve?

Method 3

Could you not do this with a single table and a self-referencing column?

CREATE TABLE companies (
   id         bigserial,
   name       varchar(255) NOT NULL,
   parent_id  bigint           NULL REFERENCES companies (id)

   PRIMARY KEY (id)
);

Then you can write some data:

INSERT INTO companies (name, parent_id)
VALUES ('Big Parent Company', NULL),
       ('Smaller Company', 1),
       ('Tokyo Subsidiary', 1),
       ('Zurich Subsidiary', 1),
       ('Zurich Manufacturer', 4),
       ('Zurich Distributor', 4),
       ('Another Big Company', NULL),
       ('Spanish Subsidiary', 7);

And you’ll see that a subsidiary record can only have one parent record:

id  name                parent_id
--  --------------—---  --------- 
1   Big Parent Company       NULL
2   Smaller Company             1
3   Tokyo Subsidiary            1
4   Zurich Subsidiary           1
5   Zurich Manufacturer         4
6   Zurich Distributor          4
7   Another Big Company      NULL
8   Spanish Subsidiary          7

If every company needs to have a unique name (so that you don’t have two subsidiaries with the same name), then another constraint can be added … though this might create some unfortunate consequences elsewhere.


Note: Updated after a bit of clarification in the comments.

If a company can somehow have multiple parent organisations, then perhaps a second table could be used like this:

CREATE TABLE company_parent (
   company_id bigint       NOT NULL REFERENCES companies (id),
   parent_id  bigint       NOT NULL REFERENCES companies (id),

   ... { additional attributes } ...

   PRIMARY KEY (company_id, parent_id)
);

This would ensure uniqueness for any company_id+parent_id combination, but will not (by itself) ensure correctness of the data.

Of course, any company that does not have any record in the company_parent table could be considered the top-level organisation.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply