Finding the most common genre for "eMovies"

All we need is an easy explanation of the problem, so here it is.

Given database of the website “eMovies” that manage data about cinema movies.

Write in SQL query the most common genre (that have the biggest number of movies)

The tables:

Actors 
    ActorID      PK
    Firstname 
    LastName

Directors 
    DirectorID   PK
    Firstname 
    LastName

Genres 
    GenreID      PK
    GenreDescr 

Movies 
    MovieID      PK
    MovieName 
    Year

ActorsMovies 
    MovieID      PK
    ActorID      PK 

DirectorsMovies 
    MovieID      PK
    DirectorID   PK 

GenresMovies 
    MovieID      PK
    GenreID      PK 

My attempt:

 SELECT GenresDescr
 FROM GenresMovies  NATURAL JOIN Genres
 WHERE GenresMovies.GenresID=Genres.GenresID
 GROUP BY GenreDescr
 HAVING  MAX(COUNT(GenresMovies.MovieID));

I’m stuck, my code does not work.


EDIT: attampt number 2 after using @Darwin von Corax hint:

  SELECT Genres.GenreDescr,COUNT(GenresMovies.GenreID) AS MCount
  FROM GenresMovies,Genres
  WHERE GenresMovies.GenreID = Genres.GenreID
  GROUP BY GenresMovies.GenreID,Genres.GenreDescr
  HAVING MCount=MAX(MCount);

But still the code does not work

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

I think this one needs a subquery, although someone else may prove me wrong.

Lets take it step-by-step: First, lets get the genre of every movie in GenresMovies, along with the description:

-- query 1 --
SELECT gm.GenreID, g.GenreDescr
  FROM GenresMovies gm, Genres g
  WHERE gm.GenreID = g.GenreID;

Note this is an alternate syntax for performing an inner join; I prefer it simply because it’s what I’m used to.

Now use an aggregate function to count the occurrences of each genre:

-- query 2 --
SELECT COUNT(gm.GenreID) AS movieCount, g.GenreDescr
  FROM GenresMovies gm, Genres g
  WHERE gm.GenreID = g.GenreID
  GROUP BY gm.GenreID;

Note that most databases require all columns not being aggregated to be listed in the GROUP BY clause, in which case you would use

  GROUP BY gm.GenreID, g.GenreDescr;

which in this case will produce the same result. Also note I’ve given the COUNT() column an alias; this will be handy in the next step.

You now have a list of genres, along with the number of movies in each genre, and you want to find the largest value in that list. One way to do this is to treat the above query as though it were a table, and query it. This is known as a subquery (or derived table in this case); you simply take the previous query, wrap it in parentheses, and drop it into the outer query’s FROM clause in place of a table name:

-- query 3 --
SELECT MAX(res.movieCount) AS movieCount, res.descr
  FROM (
    SELECT COUNT(gm.GenreID) AS movieCount, g.GenreDescr AS descr
      FROM GenresMovies gm, Genres g
      WHERE gm.GenreID = g.GenreID
      GROUP BY gm.GenreID, g.GenreDescr
  ) res;

The outer query has no GROUP BY clause, as you want to aggregate a single value from the entire result of the inner query. The res at the end simply gives a name to the subquery; this is required for syntactic completeness.

This solution returns a single row. For the situation where several genres are tied for first place, see below.

Note my use of indentation; the database doesn’t care about indentation, but you should, as it helps you to visualize the structure of your code.

Now the problem with query 3 is that – except for MySQL and not all its versions, even there – it doesn’t work. It will give some syntax error because for the second aggregation, this part:

SELECT MAX(res.movie_count) AS movie_count, res.descr
  FROM (...) res;

the descr is in the SELECT list but not in the GROUP BY list. And this is not allowed because we are aggregating many rows into one (to find the MAX of the movie_count) but if there are more than one, different descriptions, which one should be shown? The query is rejected. In MySQL, where it allowed, you’d get a random indeterminate result, one of the descriptions.

So, lets correct the query to get only the maximum of the movie_count:

-- query 4 --
SELECT MAX(res.movieCount) AS movieCount
  FROM (
    SELECT COUNT(gm.GenreID) AS movieCount, g.GenreDescr as descr
      FROM GenresMovies gm, Genres g
      WHERE gm.GenreID = g.GenreID
      GROUP BY gm.GenreID, g.GenreDescr
  ) res;

We can improve it a bit by removing the join to Genres, since we don’t use any of its columns in the result:

-- query 5 --
SELECT MAX(res.movieCount) AS movieCount
  FROM (
    SELECT COUNT(gm.GenreID) AS movieCount
      FROM GenresMovies gm
      GROUP BY gm.GenreID
  ) res;

We now have two working queries that give us the maximum count of movies per genre. But we don’t have the genres themselves!

There are many ways to solve this. One would be to use this result and JOIN it back to our original query. It gets a bit complicated but remember that we can just use any result as a table. Just remember to put it in parentheses and give a name (alias). check how we are combining queries 2 and 5:

-- query 6 --
SELECT q2.movieCount, q2.GenreDescr
  FROM  
    ( SELECT COUNT(gm.GenreID) AS movieCount, g.GenreDescr
        FROM GenresMovies gm, Genres g
        WHERE gm.GenreID = g.GenreID
        GROUP BY gm.GenreID, g.GenreDescr
    ) q2,
    ( SELECT MAX(res.movieCount) AS movieCount
        FROM (
          SELECT COUNT(gm.GenreID) AS movieCount
            FROM GenresMovies gm
            GROUP BY gm.GenreID
        ) res
    ) q5 
  WHERE q2.movieCount = q5.movieCount ;

This is one way to do it; there are others, but I hope this helps you understand the process of developing a solution.

It turns out that Access has the TOP operator which can be combined with ORDER BY to give for example, using TOP 1, the first result only. But Access has a rather peculiar way of dealing with ties. Instead of returning only the 1st row, it returns all tied rows (similar to TOP 1 WITH TIES in SQL Server). So, we can modify query 2 to get the same result as query 7 but with a far more simpler syntax:

-- query 8 --
SELECT TOP 1 COUNT(gm.GenreID) AS movieCount, g.GenreDescr AS descr
  FROM GenresMovies gm, Genres g
  WHERE gm.GenreID = g.GenreID
  GROUP BY gm.GenreID, g.GenreDescr
  ORDER BY movieCount DESC ;

In this query, the result, after the FROM and GROUP BY and the SELECT have been calculated, are orderer by the movieCount in reverse order. First the higher movie counts, then the lower, until the smallest count. Then, due to the TOP 1 only the rows that tie in first place, so only the highest movie counts are returned.

Method 2

This code is working well:

  SELECT GenreDescr
  FROM (SELECT Genres.GenreDescr,COUNT(GenresMovies.GenreID) AS MCount
  FROM GenresMovies,Genres
  WHERE GenresMovies.GenreID =Genres.GenreID 
  GROUP  BY GenresMovies.GenreID,Genres.GenreDescr)  
  WHERE MCount=
 (
  SELECT MAX(MCount)
  FROM (SELECT COUNT(GenresMovies.GenreID) AS MCount
  FROM  GenresMovies, Genres
 WHERE GenresMovies.GenreID =Genres.GenreID
 GROUP BY GenresMovies.GenreID)
 );

Method 3

THis will only give you count of movies
after that you can apply having or where to get max etc
and for movie or genre name use inner join with their parent tables

Select Count(*),MovieID
From     GenresMovies 
Group By MovieID

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply