How to write a query for exact pairs of groups

All we need is an easy explanation of the problem, so here it is.

I am given the schema:

EMPLOYEE(Fname,Ssn,Dno)

DEPARTMENT(Dname,Dnumber,Mgr_ssn,Mgr_start_date)

PROJECT(Pname,Pnumber,Plocation,Dno)

WORKS_ON(Essn,Pno)

fiddle: https://www.db-fiddle.com/f/9LnJN32WLVu1HsdmG8NUAc/5

Primary keys are given in bold.

The question: Is it possible to define a view that will report the following detail for each employee: Ssn and the number of employees who work on exactly the same set of projects? Explain why not or given an SQL query.

My idea was to find unique pairs of employees and check if projects worked by first employee equal those of the second one (this was done by using set operation except as if s1 except s2 union s2 except s1 is empty than it seems like s1 and s2 are equal)

My attempt for this was:

SELECT e1.ssn,count(e2.ssn)
FROM EMPLOYEE e1, EMPLOYEE e2
WHERE e1.ssn < e2.ssn
AND NOT EXISTS
(
 (
 (SELECT Pno
 FROM WORKS_ON w1
 WHERE w1.Essn = e1.Ssn)
 EXCEPT ALL
 (SELECT Pno
 FROM WORKS_ON w2
 WHERE w2.Essn = e2.Ssn)
 )

 UNION ALL
 
 (
 (SELECT Pno
 FROM WORKS_ON w2
 WHERE w2.Essn = e2.Ssn)
 EXCEPT ALL
 (SELECT Pno
 FROM WORKS_ON w1
 WHERE w1.Essn = e1.Ssn)
 )
)
GROUP BY(e1.ssn)

Does this query look correct? Also is there a neater way of doing this?

As a final side note what exactly determines if it is impossible for us to construct an SQL query for a problem?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

A simple join should be sufficient.

Now i collect every project_id an user worked on in an aeeay, and compare


Query #1

WITH CTE AS (SELECT Essn,array_agg(Pno ORDER BY Pno) AS AerPno FROM WORKS_ON
GROUP BY Essn)
SELECT c1.Essn, COUNT(*) FROM CTE c1 JOIN CTE c2 on c1.AerPno = c2.AerPno AND c1.Essn <> c2.Essn
GROUP BY c1.Essn;
essn count
5 1
1 1

View on DB Fiddle

Method 2

This is a question of Relational Division Without Remainder. Joe Celko has a very good article on this.

There are a number of ways to slice this cake. Here are a couple:

1. Get a windowed count of each employee’s projects (you can also do this as a scalar subquery).
Left-join employees with each other, matching on Pno.
Group up by each pair of employees, exclude any which either have too many or too few Projects than the other.
Then do a final grouping of each employee, counting up the matching employees.

WITH WORKS_ONwithCount AS (
    SELECT
        *,
        COUNT(*) OVER (PARTITION BY Essn) AS CountPno
    FROM WORKS_ON
),
EmployeesCrossJoin AS (
    SELECT this.Essn,
        other.Essn
    FROM cte this
    LEFT JOIN cte other ON other.Essn <> this.Essn AND other.Pno = this.Pno
    GROUP BY this.Essn, other.Essn, other.CountPno
    HAVING COUNT(*) = COUNT(other.Pno) AND COUNT(*) = other.CountPno
)
SELECT emp.Essn, COUNT(*) AS CountMatching
FROM EmployeesCrossJoin emp
GROUP BY emp.Essn;

2. Group up each employee, with their project numbers in an array.
Then take the count of other employees who: after un-nesting and full-joining both arrays, there are no NULL values on either side.

WITH cte AS (
    SELECT
        Essn,
        array_agg(Pno ORDER BY Pno) AS AerPno
    FROM WORKS_ON
    GROUP BY Essn
)
SELECT Essn,
    (SELECT COUNT(*)
    FROM cte other
    WHERE this.Essn <> other.Essn
        AND NOT EXISTS (SELECT 1
        FROM unnest(this.AerPno) AS nThis(value)
        FULL JOIN unnest(other.AerPno) AS nOther(value) ON nOther.value = nThis.value
        WHERE nOther.value IS NULL OR nThis.value IS NULL
    ) AS 
FROM cte this;

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply