Can't get this GROUP BY or DISTINCT right

On MySQL 5.7.34 I have this simple table. The important columns are the domain name (link_to_domain) and the domain's total inbound link count (link_to_domain_total_links_in). Entries are duplicated, with one row for each link found from the various sources. I need to select each domain only once and sort the result by link_to_domain_total_links_in DESC. Here are the table definition and some sample data:

CREATE TABLE IF NOT EXISTS `domain_to_domain_links` (
  `id` int(11) NOT NULL,
  `link_to_domain_hash` varchar(16) NOT NULL,
  `link_to_domain_total_links_in` int(11) NOT NULL DEFAULT '0',
  `link_to_domain` varchar(128) NOT NULL,
  `link_from_domain` varchar(128) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=270245 DEFAULT CHARSET=utf8;


INSERT INTO `domain_to_domain_links` (`id`, `link_to_domain_hash`, `link_to_domain_total_links_in`, `link_to_domain`, `link_from_domain`) VALUES
(1, 'c9b13094745bae79', 3, 'example.com', 'from-other-site1.com'),
(1, 'c9b13094745bae79', 3, 'example.com', 'from-other-site2.com'),
(1, 'c9b13094745bae79', 3, 'example.com', 'from-other-site3.com'),
(2, 'c43f16c897f72994', 2, 'foo.com', 'from-other-site4.com'),
(3, 'c43f16c897f72994', 2, 'foo.com', 'from-other-site5.com');

The following query returns exactly what I need, but as I understand it, it counts the rows (the links) at query time. I would rather read the value from link_to_domain_total_links_in so that the query runs faster:

SELECT link_to_domain, COUNT(*) AS my_links_counter 
FROM domain_to_domain_links 
GROUP BY link_to_domain 
ORDER BY COUNT(*) DESC;

Also, link_to_domain_hash is indexed, so a query that can take advantage of that index should be faster than one that selects on link_to_domain, but this is not critical.

How to solve:

Method 1

If I understand your table correctly, link_to_domain_total_links_in represents the number of rows that exist in the table for that particular link_to_domain.

Instead of COUNT() you can use the MAX() or MIN() aggregate function on the link_to_domain_total_links_in column (since it’ll always be the same for any instance of link_to_domain).
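
As a minimal sketch of the MAX() variant, assuming link_to_domain_total_links_in really does hold the same value on every row for a given link_to_domain (the query keeps the my_links_counter alias from the question):

```sql
-- Read the stored total instead of counting rows at query time.
-- MAX() and MIN() give the same result when the value is constant per domain.
SELECT
    link_to_domain,
    MAX(link_to_domain_total_links_in) AS my_links_counter
FROM domain_to_domain_links
GROUP BY link_to_domain
ORDER BY my_links_counter DESC;
```

With the sample rows from the question, this should return example.com with 3 followed by foo.com with 2.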

Alternatively, since the value is the same on every row for a domain, you can add link_to_domain_total_links_in to the GROUP BY clause, which then allows you to SELECT and/or ORDER BY it.
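
A sketch of that form, under the same assumption that the stored total never differs between rows of the same domain:

```sql
-- Both selected columns appear in GROUP BY, so no aggregate function is
-- needed and the query is valid under MySQL 5.7's ONLY_FULL_GROUP_BY mode.
SELECT
    link_to_domain,
    link_to_domain_total_links_in AS my_links_counter
FROM domain_to_domain_links
GROUP BY link_to_domain, link_to_domain_total_links_in
ORDER BY link_to_domain_total_links_in DESC;
```

If the stored totals ever disagreed for one domain, this query would return one row per distinct (domain, total) pair rather than one row per domain.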

At that point you are no longer aggregating anything, so you don't need a GROUP BY clause at all; you can apply the DISTINCT keyword to the two columns instead:

SELECT DISTINCT 
    link_to_domain,
    link_to_domain_total_links_in AS my_links_counter 
FROM domain_to_domain_links 
ORDER BY link_to_domain_total_links_in DESC;
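
The question also mentions the index on link_to_domain_hash. Below is a rough sketch of grouping on that column instead. It assumes each hash corresponds to exactly one link_to_domain, which the sample data suggests but the question does not state. Whether the optimizer actually uses the index, and whether that is faster on your data, is worth checking with EXPLAIN rather than taking for granted.

```sql
-- Group on the indexed hash column (assumption: one hash per domain).
-- MAX() is applied to the other columns only to satisfy ONLY_FULL_GROUP_BY;
-- under the assumption above every row in a group carries the same values.
SELECT
    MAX(link_to_domain) AS link_to_domain,
    MAX(link_to_domain_total_links_in) AS my_links_counter
FROM domain_to_domain_links
GROUP BY link_to_domain_hash
ORDER BY my_links_counter DESC;
```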

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
