Return NULL when distinct values else return value

Below is an example of the type of data I get (collected by different users):

name surname
Moe Momo
Moe Momo
Jack JAJA
Jack Jacky

I would like to find when two users have collected different surnames for the same name.

The output I’m trying to get is:

name surname
Moe Momo
Jack NULL

I would like to see the surname if all users collected the same one, and NULL if there are differences.

I tried searching the internet, but I’m not able to describe properly what I’m searching for.

I tried a request using CASE, but with no success.

How to solve:

Method 1

This can be solved using COUNT(DISTINCT ...). Group the results by name and count the distinct surnames per name. If the count differs from 1, show the surname as NULL; otherwise show the actual surname, e.g. using MAX, like this:

SELECT
  name
, surname = CASE COUNT(DISTINCT surname) WHEN 1 THEN MAX(surname) END
FROM
  dbo.People
GROUP BY
  name
;

You have to apply an aggregate function to surname because the grouping is by name only. Since you only show it when the distinct count is 1, it should not matter much which instance you pick, since they are all the same. MIN would work as well.
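To check the logic without a SQL Server instance, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite, the table definition, and the use of the portable `AS` alias form instead of T-SQL's `surname = ...` are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical in-memory table mirroring the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (name TEXT NOT NULL, surname TEXT)")
conn.executemany(
    "INSERT INTO People (name, surname) VALUES (?, ?)",
    [("Moe", "Momo"), ("Moe", "Momo"), ("Jack", "JAJA"), ("Jack", "Jacky")],
)

# A CASE without ELSE yields NULL when no WHEN branch matches,
# so any group with more than one distinct surname comes back as NULL.
query = """
SELECT
  name,
  CASE COUNT(DISTINCT surname) WHEN 1 THEN MAX(surname) END AS surname
FROM People
GROUP BY name
ORDER BY name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Jack', None), ('Moe', 'Momo')]
```

Python shows SQL NULL as None, so Jack's differing surnames collapse to None while Moe keeps Momo.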

Method 2

You can just group by first_name, and compare the MIN with the MAX and see if they are the same.

SELECT
  n.first_name,
  CASE WHEN MIN(n.last_name) = MAX(n.last_name) THEN MIN(n.last_name) END AS last_name
FROM #names n
GROUP BY
  n.first_name;

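The MIN/MAX comparison can be tried out in the same way. The sketch below assumes an in-memory SQLite table named names (standing in for the #names temp table) and adds one hypothetical row, Ann Lee, to show what happens when a name was collected only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT NOT NULL, last_name TEXT)")
conn.executemany(
    "INSERT INTO names (first_name, last_name) VALUES (?, ?)",
    [
        ("Moe", "Momo"), ("Moe", "Momo"),
        ("Jack", "JAJA"), ("Jack", "Jacky"),
        ("Ann", "Lee"),  # hypothetical extra row: a name collected only once
    ],
)

# If the smallest and largest last_name in a group are equal,
# every non-NULL last_name in that group is the same value.
query = """
SELECT
  n.first_name,
  CASE WHEN MIN(n.last_name) = MAX(n.last_name) THEN MIN(n.last_name) END AS last_name
FROM names AS n
GROUP BY n.first_name
ORDER BY n.first_name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Ann', 'Lee'), ('Jack', None), ('Moe', 'Momo')]
```

With this approach a name that appears in a single row keeps its last name, since MIN and MAX are trivially equal.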

Method 3

Another solution would be to use a window function:

with cte as 
( select n.*, 
         count(*) over (partition by first_name, last_name) as cnt
from #names n
) select distinct first_name, 
         case when cnt > 1 then last_name else NULL end as last_name
  from cte ;

The partition by first_name, last_name part counts the rows for each first_name/last_name combination. If cnt > 1, the same first_name/last_name pair was collected more than once, so last_name is shown; a pair that occurs only once yields NULL.

Note: if the dataset contains a first_name whose last_name appears twice with one value and once with a different value, the query returns two rows for that first_name: one with the repeated last_name and one with NULL for the differing one. For example, given:

Moe  | Momo
Moe  | Momo
Jack | JAJA
Jack | Jacky
Jack | Jacky

The result would be:

first_name  last_name
 Jack        null
 Jack        Jacky
 Moe         Momo

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=1f604e29be13490818f5e1875e8719eb

Method 4

There are more than a few ways to do this, but to move forward with your CASE statement approach, you just need to include some aggregates, similar to the following:

SELECT first_name, CASE WHEN repeats > 1 THEN last_name ELSE NULL END as last_name
FROM
(
    SELECT first_name, last_name, COUNT(*) AS repeats
    FROM #names
    GROUP BY first_name, last_name
) t
GROUP BY first_name, CASE WHEN repeats > 1 THEN last_name ELSE NULL END

There are probably more efficient approaches out there as well, but this should at least get you the limited results you’re looking for.
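To see where the "limited results" show up, the following sketch runs the same query shape on an in-memory SQLite table through Python's sqlite3 module. The table name names, the extra Jack Jacky row, and the single Ann Lee row are hypothetical additions chosen to exercise the edge cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT NOT NULL, last_name TEXT)")
conn.executemany(
    "INSERT INTO names (first_name, last_name) VALUES (?, ?)",
    [
        ("Moe", "Momo"), ("Moe", "Momo"),
        ("Jack", "JAJA"), ("Jack", "Jacky"), ("Jack", "Jacky"),
        ("Ann", "Lee"),  # hypothetical: a name collected only once
    ],
)

# The inner query counts rows per (first_name, last_name) pair;
# the outer query shows last_name only for pairs seen more than once.
query = """
SELECT first_name,
       CASE WHEN repeats > 1 THEN last_name ELSE NULL END AS last_name
FROM (
    SELECT first_name, last_name, COUNT(*) AS repeats
    FROM names
    GROUP BY first_name, last_name
) AS t
GROUP BY first_name, CASE WHEN repeats > 1 THEN last_name ELSE NULL END
ORDER BY 1, 2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Under these assumptions Jack comes back twice (once with Jacky, once with NULL) and Ann comes back with NULL, because the query counts repeated pairs rather than distinct last names per first name.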

Method 5

This is a modification of Andriy’s answer (the COUNT(DISTINCT ...) approach in Method 1). It considers a NULL in the second column to be distinct:

CREATE TABLE #names
(
    first_name varchar(255) NOT NULL,
    last_name varchar(255) NULL
);

INSERT #names
    (first_name, last_name)
VALUES
    ('Moe', 'Momo'),
    ('Moe', 'Momo'),
    ('Jack', 'JAJA'),
    ('Jack', 'Jacky'),
    ('Paul', 'White'),
    ('Paul', NULL);

SELECT 
    N.first_name, 
    last_name =
        CASE
            WHEN COUNT_BIG(DISTINCT N.last_name) = 1      -- one non-null value
                AND COUNT_BIG(*) = COUNT_BIG(N.last_name) -- no nulls
            THEN MAX(N.last_name)
            ELSE NULL
        END
FROM #names AS N
GROUP BY N.first_name;

The result is:

first_name last_name
Jack NULL
Moe Momo
Paul NULL
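The effect of the extra NULL check is easiest to see next to the plain COUNT(DISTINCT ...) expression. The sketch below is an illustration on an in-memory SQLite table via Python's sqlite3 module. It uses COUNT in place of COUNT_BIG because SQLite has no COUNT_BIG, and the column aliases nulls_ignored and nulls_distinct are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT NOT NULL, last_name TEXT)")
conn.executemany(
    "INSERT INTO names (first_name, last_name) VALUES (?, ?)",
    [
        ("Moe", "Momo"), ("Moe", "Momo"),
        ("Jack", "JAJA"), ("Jack", "Jacky"),
        ("Paul", "White"), ("Paul", None),
    ],
)

query = """
SELECT
  first_name,
  -- Method 1 style: COUNT(DISTINCT ...) skips NULLs
  CASE COUNT(DISTINCT last_name) WHEN 1 THEN MAX(last_name) END AS nulls_ignored,
  -- Method 5 style: additionally require that no last_name in the group is NULL
  CASE
    WHEN COUNT(DISTINCT last_name) = 1 AND COUNT(*) = COUNT(last_name)
    THEN MAX(last_name)
  END AS nulls_distinct
FROM names
GROUP BY first_name
ORDER BY first_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Jack', None, None)
# ('Moe', 'Momo', 'Momo')
# ('Paul', 'White', None)
```

Paul is the only group where the two expressions disagree: ignoring the NULL leaves a single distinct value (White), while treating it as a difference yields NULL.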

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.