Choosing between NoSQL and Relational databases while having both flexibility and consistency requirements

I’ve never worked with databases professionally and have only toyed with database coding using a small SQLite database for my own music collection. Now I’ve been asked to build an internal database system at work for multiple users. After some basic reading on NoSQL vs. relational databases, I’d like some ideas on how to choose the right technology.

Context

  • Before this database, the users work with spreadsheets entirely.
  • Those sheets differ in structure (schema) from project to project.
  • Sometimes the same project has several different sheets that need to be isolated but still exist in the same database.
  • We want to import all those sheets into a database, with new fields added and old field names/values updated on the fly, without breaking the original spreadsheets.
  • When source control operations happen, we want to be able to connect to the database and update certain fields regarding the source control management status.
  • The database will be used by various users from different locations, all through the in-house network for now. There won’t be a lot of users like on Facebook or Twitter.
  • All the data must be localized into multiple languages and ideally co-exist in the same database.

Questions

  • Since we’ll have different sheet structures, is NoSQL a better option than a relational database?
  • The users will go on using spreadsheets and may change their sheet structure on the fly. Will this be a lot of trouble for NoSQL or relational databases such as SQLite?
  • NoSQL does not support the ACID of relational databases, does it mean that data may get corrupted when multiple users work on the same record? We normally would like consistency.
  • If we maintain an optimized database but still want to be able to export selected portions or entire as CSV or JSON files? Will NoSQL or relational work better?
  • How to handle access control from scratch if we do it online? Will this affect the choice of database tech?
  • For this type of system, should we hire a database expert dev and/or maintainer? Can we build on some free/open-source infrastructure that requires the least amount of backend/frontend, security tech R&D?

Thanks for your input in advance!

How to solve :

Method 1

To answer your questions directly:

  1. "Since we’ll have different sheet structures, is NoSQL a better option than a relational database?"

    A. Probably not, unless those sheet structures are so tightly related, as parts of one application, that they’d all be stored in the same few tables. Otherwise it sounds like you just have a series of structured datasets (multiple applications), each of which can be represented by its own set of tables. This is quite normal and can easily be implemented in either a Relational Database Management System (RDBMS) or a NoSQL database, so there is no differentiation here.

  2. "The users will go on using spreadsheets and may change their sheet structure on the fly. Will this be a lot of trouble for NoSQL or relational databases such as SQLite?"

    A. Are the spreadsheets also consumers of the data saved to the database, i.e. besides writing to the database, do they also need to read back from it? If so, changing structures is something you’ll need to handle regardless of which type of database you use; the problem just occurs at different layers of the application. With an RDBMS, you need a process that applies schema changes made at the client to the database, otherwise the database won’t store the data entered in the new fields. With a NoSQL database, the problem exists in the other direction: when data is pulled back into the client’s new schema, older records stored in NoSQL may still have the old structure and not match the client. Excel may actually be more forgiving here and just leave those columns blank.

    Keep in mind that consistency also works differently in many NoSQL databases should you choose to distribute the data across multiple nodes. Such a setup is often eventually consistent, which means one user of a spreadsheet may receive a different version of the schema than another user of that same spreadsheet if the data in the newer structure hasn’t been replicated to the other node yet. More on this in the answer to your next question.
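The RDBMS-side process described in this answer, picking up columns that were added at the client, can be sketched roughly as follows. This is only a minimal illustration using Python’s built-in `sqlite3` module; the table name `project_a`, the helper names, and the choice to store every new field as `TEXT` are assumptions made for the example, not part of the original setup.

```python
import sqlite3

def quote_ident(name):
    """Quote a table/column name so arbitrary sheet headers are safe in SQL."""
    return '"' + name.replace('"', '""') + '"'

def existing_columns(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # SQLite column names are case-insensitive, so compare in lowercase.
    rows = conn.execute(f"PRAGMA table_info({quote_ident(table)})")
    return {row[1].lower() for row in rows}

def import_rows(conn, table, rows):
    """Insert dict rows, adding a TEXT column for each header not seen before."""
    known = existing_columns(conn, table)
    for row in rows:
        for col in row:
            if col.lower() not in known:
                conn.execute(
                    f"ALTER TABLE {quote_ident(table)} "
                    f"ADD COLUMN {quote_ident(col)} TEXT"
                )
                known.add(col.lower())
        cols = ", ".join(quote_ident(c) for c in row)
        marks = ", ".join("?" for _ in row)
        conn.execute(
            f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({marks})",
            list(row.values()),
        )
    conn.commit()

# Hypothetical sheet: the second import carries a column the table lacks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project_a (id INTEGER PRIMARY KEY, title TEXT)")
import_rows(conn, "project_a", [{"title": "Intro"}])
import_rows(conn, "project_a", [{"title": "Outro", "reviewer": "Kim"}])

rows = conn.execute(
    "SELECT title, reviewer FROM project_a ORDER BY id"
).fetchall()
print(rows)  # [('Intro', None), ('Outro', 'Kim')]
```

Rows imported before the column existed simply read back as `NULL` for it. Renamed or removed columns are not handled here and would need their own rules.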

  3. "NoSQL does not support the ACID of relational databases, does it mean that data may get corrupted when multiple users work on the same record? We normally would like consistency."

    A. NoSQL databases generally follow the BASE principles (and are bound by the CAP theorem), which means that, unlike an RDBMS that follows the ACID principles, they are typically eventually consistent. A change to the database, including a change to the data itself, is not guaranteed to be replicated instantly across all the nodes the database is distributed across, but it will eventually be replicated and consistent on all of them. As for your question, that does not mean data is any more likely to be corrupted (from the system’s perspective) if multiple users change the same record in a NoSQL database rather than an RDBMS. It just means that the same record could be in different states on different nodes at the same time, until the last change in (generally) has been synchronized across all nodes and the database becomes consistent. An RDBMS generally employs a locking mechanism to ensure that consistency is immediate and that the same record is never in more than one state at a time.
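To make the "multiple users change the same record" case concrete, here is a small sketch of an application-level version check on top of a relational table. Note that this is optimistic concurrency control, a different technique from the database’s internal locking mentioned above: it relies on a single `UPDATE` being atomic and uses a `version` column to detect stale edits. The `cell` table, its columns, and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell (id INTEGER PRIMARY KEY, value TEXT, "
    "version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO cell (id, value, version) VALUES (1, 'draft', 1)")
conn.commit()

def read_cell(conn, cell_id):
    """Return (value, version) as currently stored."""
    return conn.execute(
        "SELECT value, version FROM cell WHERE id = ?", (cell_id,)
    ).fetchone()

def save_cell(conn, cell_id, new_value, seen_version):
    """Write only if the row still has the version the caller read."""
    cur = conn.execute(
        "UPDATE cell SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, cell_id, seen_version),
    )
    conn.commit()
    # rowcount is 0 when someone else bumped the version first.
    return cur.rowcount == 1

# Two users read the same record before either of them saves.
_, version_seen_by_alice = read_cell(conn, 1)
_, version_seen_by_bob = read_cell(conn, 1)

alice_ok = save_cell(conn, 1, "edit by alice", version_seen_by_alice)
bob_ok = save_cell(conn, 1, "edit by bob", version_seen_by_bob)

print(alice_ok, bob_ok, read_cell(conn, 1))
# True False ('edit by alice', 2)
```

The second save is rejected rather than silently overwriting the first, so the application can ask that user to reload and retry.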

  4. "If we maintain an optimized database but still want to be able to export selected portions or entire as CSV or JSON files? Will NoSQL or relational work better?"

    A. At a high level, I don’t believe you’ll find any difference between the two here. It’s an equally achievable goal for both types of database system, and any differences will only show up in low-level, implementation-specific details once you get into it. Some RDBMSs natively support storing and/or exporting data in JSON and CSV formats, and NoSQL databases typically already store (or return) data as JSON; modern implementations should offer ways to convert results to CSV.
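As a rough illustration of the relational side, exporting a selected portion of a table to both formats takes only the Python standard library. The `strings` table, its columns (including a `lang` column for localized text), and the function names below are assumptions made for this sketch.

```python
import csv
import io
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows expose column names via keys()
conn.execute(
    "CREATE TABLE strings (id INTEGER PRIMARY KEY, project TEXT, "
    "lang TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO strings (project, lang, text) VALUES (?, ?, ?)",
    [("alpha", "en", "Hello"), ("alpha", "fr", "Bonjour"), ("beta", "en", "Bye")],
)
conn.commit()

def fetch_project(conn, project):
    """Select one project's rows as a list of plain dicts."""
    cur = conn.execute(
        "SELECT id, project, lang, text FROM strings "
        "WHERE project = ? ORDER BY id",
        (project,),
    )
    return [dict(row) for row in cur]

def to_json(rows):
    # ensure_ascii=False keeps localized text readable in the output.
    return json.dumps(rows, ensure_ascii=False, indent=2)

def to_csv(rows):
    """Render rows as CSV text with a header line."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

selected = fetch_project(conn, "alpha")
print(to_json(selected))
print(to_csv(selected))
```

Exporting the entire table instead of a portion only means dropping the `WHERE` clause.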

  5. "How to handle access control from scratch if we do it online? Will this affect the choice of database tech?"

    A. By "doing it online", do you mean a cloud-based solution? If so, this shouldn’t change anything from a security standpoint. Either way, all modern database systems have a way to set up accounts, either directly in the database system itself or mapped to another security system. For example, Microsoft SQL Server lets you either create dedicated Logins for the database or leverage Active Directory, so that you can define access control within the database on top of Windows Users and Groups, and this still works the same way when using Azure (Microsoft’s cloud solution) to host your database. Similar ideas apply to other modern database systems too.

  6. "For this type of system, should we hire a database expert dev and/or maintainer? Can we build on some free/open-source infrastructure that requires the least amount of backend/frontend, security tech R&D?"

    A. It sounds like you have a lot of questions and thinking to do around the problems you’re planning to solve, so if it’s possible, hiring someone with strong database experience (especially someone who has worked with both RDBMS and NoSQL) would only benefit you. With the information you’ve provided and the questions you’ve asked, I don’t objectively see any distinguishing factor for choosing one type of database system over the other. My only subjective opinion would be to go with an RDBMS, since it sounds like your data is at least structured (even if the structure changes, which is fine in an RDBMS) and the eventual consistency of NoSQL could be an issue for your use cases. If you’re looking for a free and open-source RDBMS, I’d recommend looking into PostgreSQL.

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.