Query hierarchical file permissions

All we need is an easy explanation of the problem, so here it is.

I have a table representing some files with their paths and another table with file sharings by user and files.

Below is Table files, simplified. Example:

  • I have a directory A (path /A) containing some files.
  • Directory B is inside A and its path is /A/B and it contains some files.
  • Files C (path /A/B/C) and D (path /A/B/D) are inside B.
id path
1 /A
2 /A/B
3 /A/B/C
4 /A/B/D

Below is table file_sharings, simplified. If a user has a file sharing with path=’/A’ and readable=true, it means that he can read every files with a path starting with ‘/A’

Now I have that in this file (it’s only an example, it can be more complex):

id user_id file_id path readable
1 1 1 ‘/A’ true
2 1 2 ‘/A/B’ false
3 1 4 ‘/A/B/D’ true

So, it means that user 1 can see everything in the directory A, but he can’t see directory B and file C but he can see file D (the same way we can have with google drive for instance)

What I want is a query to have all files/directories that user 1 can see.

For the moment I have something like this:

SELECT *
FROM files
WHERE path LIKE ANY (SELECT fs.path || '%'
    FROM file_sharings fs
    WHERE fs.user_id='1'
    AND fs.readable=true)
AND (path NOT LIKE ANY (SELECT fs.path || '%'
    FROM file_sharings fs
    WHERE fs.user_id='1'
    AND fs.readable=false)
);

but it returns only:

/A

If I do:

SELECT *
FROM files
WHERE path LIKE ANY (SELECT fs.path || '%'
    FROM file_sharings fs
    WHERE fs.user_id='1'
    AND fs.readable=true)
AND (path NOT IN (SELECT fs.path
    FROM file_sharings fs
    WHERE fs.user_id='1'
    AND fs.readable=false)
);

… it returns:

/A
/A/B/C
/A/B/D

I don’t know how to have all files that user 1 can read except the ones he can’t read. Here I must have /A and /A/B/D, but in reality the tree files can be more complex but the tenet is the same that in this simple example.

It’s easy in the real life but I’m stuck to create the good request 😀

Here a dbfiddle

When I have no row in the DB, it means the user has no readable rights.

When I have a row with readable=true, it means the user has a readable right on the file and the children (the files in the repository).

When I have a row with readable=false, it means that the file and its children are specifically forbidden to the user.

I do this way because I don’t want to create billion rows when I share a root directory to another user.

file_id is a foreign key used only to recalculate the file_sharings path when I move a file/repository into another one.

So I can have a readable right on a file whereas it is not explicitly in the DB

Here a picture of the fiddle example for user1:

Query hierarchical file permissions

With this request:

SELECT *
FROM files
WHERE path LIKE ANY (SELECT fs.path || '%'
    FROM file_sharings fs
    WHERE fs.user_id='1'
    AND fs.readable=true)
AND (path NOT LIKE ANY (SELECT fs.path || '%'
    FROM file_sharings fs
    WHERE fs.user_id='1'
    AND fs.readable=false)
);

It returns A1, B2 and C3. I should also have C2.

With the second request:

SELECT *
FROM files
WHERE path LIKE ANY (SELECT fs.path || '%'
    FROM file_sharings fs
    WHERE fs.user_id='1'
    AND fs.readable=true)
AND (path NOT IN (SELECT fs.path
    FROM file_sharings fs
    WHERE fs.user_id='1'
    AND fs.readable=false)
);

It returns A1, B2, C3, C2 and C1. I should not have C1.

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

You didn’t declare any constraints except NOT NULL, so I assumed reasonable constraints as shown in my fiddle below.
I didn’t understand the role of file_id, so I ignored it completely. (Seems orthogonal to the question.)

Your logic boils down to this: each file has permissions according to the longest matching path. So match every file to rows in file_sharings with LIKE and pick the row with the longest path. Only files with a readable flag survive:

SELECT (f).*
FROM  (
   SELECT DISTINCT ON (f.id)
          f, s.readable
   FROM   files f
   JOIN   file_sharings s ON f.path LIKE s.path || '%'
   WHERE  s.user_id = 1
   ORDER  BY f.id, s.path DESC  -- choose the longest matching path !
   ) sub
WHERE  readable
ORDER  BY (f).path;

Produces your desired result.

db<>fiddle here

I simplified your fiddle, added the required UNIQUE constraint, and fixed a data error to be in sync with the question.

It’s essential to select the match with the longest path in a subquery before filtering the readable ones.

About DISTINCT ON:

Depending on undisclosed cardinalities and data distribution, there may be ways to improve performance …

Aside

I select the whole row f in the subquery for convenience – so that I don’t have to spell out all column names of table file in the outer SELECT.
Here is a more verbose form without that trick:

SELECT id, name, path
FROM  (
   SELECT DISTINCT ON (f.id)
          f.*, s.readable
   FROM   files f
   JOIN   file_sharings s ON f.path LIKE s.path || '%'
   WHERE  s.user_id = 1
   ORDER  BY f.id, s.path DESC  -- choose the longest matching path !
   ) sub
WHERE  readable
ORDER  BY path;

A subtle difference (besides short code) surfaces after changing the table definition of file later: the first query keeps returning all columns as listed in the table definition (result changes with the changed table), while the second query keeps returning columns as explicitly listed. "Late binding" vs. "early binding".

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply