All we need is an easy explanation of the problem, so here it is.
I’m looking to keep a simple database for movies, with details like Name, YearofRelease, Genre etc. The key factor is the genres entry, where a movie could fall into multiple genres. Using a sql based approach, one way is to have a separate table for each genre and enter to movie id. Using a nosql mongodb approach, the generes can be stored as an array.
From the purpose of both storing and searching by genres, what is the better approach?
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
The only real questions to mainly be considered when choosing between NoSQL and a Relational Database Management System (RDBMS) are:
- "Does the data have a well defined structure or is it very unstructured / variable in structure?"
- "Are the questions I need to ask of the data always simple?"
- And less important but valid: "Do I need to distribute the data across multi-nodes of servers in a cluster, or having one centralized server is sufficient?"
For #1 it sounds like you have a well structured set of data. In fact, it’s a very simple structure and should be very easy to store and maintain in a RDBMS. And if you go with a RDBMS then to support multiple
Genres for multiple
Movies all you would need is one additional linking table (aka bridge / junction table) called
MoviesGenres that would be many-to-many with
Movies by storing the
GenreId, one row per combination. You wouldn’t need a "separate table for each genre" to solve this, just the one additional table
MoviesGenres. So if
Genre4 then there would be two rows in your
MoviesGenres table for the records
(MovieABC, Genre1) and
For #2, that’s going to be up to your use cases. If you ever want to ask more complex questions like "How many movies with 3 or more genres were released after 2020?" or "Which movies have both Genre7 and Genre9, and were released before 2000?", etc, then a RDBMS makes it much easier to query these kinds of things and probably are more efficient at doing so (generally speaking).
For #3, this just depends on the scale of your data and your preference of how to maintain the hardware behind it. If you plan to have many terabytes of data and multi-billions of rows, do you prefer to pay for a multi-node cluster to horizontally scale the data at the tradeoff that it may not always be consistently in sync across every node in the cluster, or would you prefer to centralize all the data to one main server and scale vertically as the data grows. This is kind of a moot point nowadays, since even RDBMS now exist with features to support horizontal scaling too. So unless you foresee storing a very large amount of data upfront, I wouldn’t be as concerned with this question as the prior two.
In general, based on what information you have provided, my recommendation would be to use a SQL (RDBMS) solution so you have the flexibility to query it as you need in a performant manner, given the fact you have a well defined and simple structure for your data.
Your proposal would be a poor design. Have a table Genres holding the list of possibilities – horror, action, romcom etc. This table can also have a code or surrogate ID if they would prove useful. There are strong arguments both for and against.
Have a second table Movie_Genres which has one row for each genre applicable to each movie. For example Price And Predudice And Zombies may have three rows one each for action, horror and romance.
A relational DBMS would be perfect for this.
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂