All we need is an easy explanation of the problem, so here it is.
Given database of the website “eMovies” that manage data about cinema movies.
Write in SQL query the most common genre (that have the biggest number of movies)
The tables:
Actors
ActorID PK
Firstname
LastName
Directors
DirectorID PK
Firstname
LastName
Genres
GenreID PK
GenreDescr
Movies
MovieID PK
MovieName
Year
ActorsMovies
MovieID PK
ActorID PK
DirectorsMovies
MovieID PK
DirectorID PK
GenresMovies
MovieID PK
GenreID PK
My attempt:
SELECT GenresDescr
FROM GenresMovies NATURAL JOIN Genres
WHERE GenresMovies.GenresID=Genres.GenresID
GROUP BY GenreDescr
HAVING MAX(COUNT(GenresMovies.MovieID));
I’m stuck, my code does not work.
EDIT: attampt number 2 after using @Darwin von Corax hint:
SELECT Genres.GenreDescr,COUNT(GenresMovies.GenreID) AS MCount
FROM GenresMovies,Genres
WHERE GenresMovies.GenreID = Genres.GenreID
GROUP BY GenresMovies.GenreID,Genres.GenreDescr
HAVING MCount=MAX(MCount);
But still the code does not work
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
Method 1
I think this one needs a subquery, although someone else may prove me wrong.
Lets take it step-by-step: First, lets get the genre of every movie in GenresMovies
, along with the description:
-- query 1 --
SELECT gm.GenreID, g.GenreDescr
FROM GenresMovies gm, Genres g
WHERE gm.GenreID = g.GenreID;
Note this is an alternate syntax for performing an inner join; I prefer it simply because it’s what I’m used to.
Now use an aggregate function to count the occurrences of each genre:
-- query 2 --
SELECT COUNT(gm.GenreID) AS movieCount, g.GenreDescr
FROM GenresMovies gm, Genres g
WHERE gm.GenreID = g.GenreID
GROUP BY gm.GenreID;
Note that most databases require all columns not being aggregated to be listed in the GROUP BY
clause, in which case you would use
GROUP BY gm.GenreID, g.GenreDescr;
which in this case will produce the same result. Also note I’ve given the COUNT()
column an alias; this will be handy in the next step.
You now have a list of genres, along with the number of movies in each genre, and you want to find the largest value in that list. One way to do this is to treat the above query as though it were a table, and query it. This is known as a subquery (or derived table in this case); you simply take the previous query, wrap it in parentheses, and drop it into the outer query’s FROM
clause in place of a table name:
-- query 3 --
SELECT MAX(res.movieCount) AS movieCount, res.descr
FROM (
SELECT COUNT(gm.GenreID) AS movieCount, g.GenreDescr AS descr
FROM GenresMovies gm, Genres g
WHERE gm.GenreID = g.GenreID
GROUP BY gm.GenreID, g.GenreDescr
) res;
The outer query has no GROUP BY
clause, as you want to aggregate a single value from the entire result of the inner query. The res
at the end simply gives a name to the subquery; this is required for syntactic completeness.
This solution returns a single row. For the situation where several genres are tied for first place, see below.
Note my use of indentation; the database doesn’t care about indentation, but you should, as it helps you to visualize the structure of your code.
Now the problem with query 3 is that – except for MySQL and not all its versions, even there – it doesn’t work. It will give some syntax error because for the second aggregation, this part:
SELECT MAX(res.movie_count) AS movie_count, res.descr
FROM (...) res;
the descr
is in the SELECT
list but not in the GROUP BY
list. And this is not allowed because we are aggregating many rows into one (to find the MAX
of the movie_count
) but if there are more than one, different descriptions, which one should be shown? The query is rejected. In MySQL, where it allowed, you’d get a random indeterminate result, one of the descriptions.
So, lets correct the query to get only the maximum of the movie_count:
-- query 4 --
SELECT MAX(res.movieCount) AS movieCount
FROM (
SELECT COUNT(gm.GenreID) AS movieCount, g.GenreDescr as descr
FROM GenresMovies gm, Genres g
WHERE gm.GenreID = g.GenreID
GROUP BY gm.GenreID, g.GenreDescr
) res;
We can improve it a bit by removing the join to Genres
, since we don’t use any of its columns in the result:
-- query 5 --
SELECT MAX(res.movieCount) AS movieCount
FROM (
SELECT COUNT(gm.GenreID) AS movieCount
FROM GenresMovies gm
GROUP BY gm.GenreID
) res;
We now have two working queries that give us the maximum count of movies per genre. But we don’t have the genres themselves!
There are many ways to solve this. One would be to use this result and JOIN
it back to our original query. It gets a bit complicated but remember that we can just use any result as a table. Just remember to put it in parentheses and give a name (alias). check how we are combining queries 2 and 5:
-- query 6 --
SELECT q2.movieCount, q2.GenreDescr
FROM
( SELECT COUNT(gm.GenreID) AS movieCount, g.GenreDescr
FROM GenresMovies gm, Genres g
WHERE gm.GenreID = g.GenreID
GROUP BY gm.GenreID, g.GenreDescr
) q2,
( SELECT MAX(res.movieCount) AS movieCount
FROM (
SELECT COUNT(gm.GenreID) AS movieCount
FROM GenresMovies gm
GROUP BY gm.GenreID
) res
) q5
WHERE q2.movieCount = q5.movieCount ;
This is one way to do it; there are others, but I hope this helps you understand the process of developing a solution.
It turns out that Access has the TOP
operator which can be combined with ORDER BY
to give for example, using TOP 1
, the first result only. But Access has a rather peculiar way of dealing with ties. Instead of returning only the 1st row, it returns all tied rows (similar to TOP 1 WITH TIES
in SQL Server). So, we can modify query 2 to get the same result as query 7 but with a far more simpler syntax:
-- query 8 --
SELECT TOP 1 COUNT(gm.GenreID) AS movieCount, g.GenreDescr AS descr
FROM GenresMovies gm, Genres g
WHERE gm.GenreID = g.GenreID
GROUP BY gm.GenreID, g.GenreDescr
ORDER BY movieCount DESC ;
In this query, the result, after the FROM
and GROUP BY
and the SELECT
have been calculated, are orderer by the movieCount
in reverse order. First the higher movie counts, then the lower, until the smallest count. Then, due to the TOP 1
only the rows that tie in first place, so only the highest movie counts are returned.
Method 2
This code is working well:
SELECT GenreDescr
FROM (SELECT Genres.GenreDescr,COUNT(GenresMovies.GenreID) AS MCount
FROM GenresMovies,Genres
WHERE GenresMovies.GenreID =Genres.GenreID
GROUP BY GenresMovies.GenreID,Genres.GenreDescr)
WHERE MCount=
(
SELECT MAX(MCount)
FROM (SELECT COUNT(GenresMovies.GenreID) AS MCount
FROM GenresMovies, Genres
WHERE GenresMovies.GenreID =Genres.GenreID
GROUP BY GenresMovies.GenreID)
);
Method 3
THis will only give you count of movies
after that you can apply having or where to get max etc
and for movie or genre name use inner join with their parent tables
Select Count(*),MovieID
From GenresMovies
Group By MovieID
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0