All we need is an easy explanation of the problem, so here it is.
H2 is a single threaded database with a good reputation regarding performance. Other databases are multi-threaded.
My question is: when does a multi-thread database become more interesting than an single thread database? How many users? How many processes? What is the trigger? Anyone has experience to share?
- The usual bottleneck is disk access
- SSD’s are fast, but fragile (failure procedure is a must)
- One long query on a single thread system will block all others
- Configuring multi-threading system can be tricky
- Multithreaded databases are beneficial even on single core systems
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
Here is my opinion:
Usually the bottleneck (or slowest part) of a DB system is the disk. The CPU only spikes during arithmetic operations, processing, or any other task that the CPU does. With proper architecture, multithreading can help to offset the load of a query onto the CPU instead of doing the slow disk reads/writes. There are cases where it is faster to calculate a value using the CPU cycles rather than to create a computed column (that was previously saved to disk) and read this column from disk.
In some RDBMS there is a temporary DB (tempdb) that is used by all the DBs on that instance for sorting, hashing, temporary variables, etc… Multithreading and splitting up this tempdb files can be used to improve the throughput of the tempdb, thereby improving overall server performance.
Using multithreading (parallelism), the result set of a query can be split up to be processed on the different cores of the server, rather than using one core alone. This feature does not always improve the performance, but there are cases where it does, and hence the feature is available.
The threads available to the DB are used for many purposes: reading/writing to disk, user connections, background jobs, locking/latching, network IO, etc… Depending on the OS architecture the threads are preemptively fed to the CPU and are managed using waits and queues. If the CPU can crunch these threads pretty quickly then the wait times will be low. A multi-threaded DB will be faster than a single-threaded DB, as in a single-threaded DB there will be the overhead of recycling only one thread rather than having other treads readily available.
Scalability also becomes an issue, as more threads will be required to manage and execute the scaled DB system.
If there is one thing I can say about MySQL is that InnoDB, its transactional (ACID-compliant) storage engine, is indeed multithreaded. However, it is as multithreaded as YOU CONFIGURE IT !!! Even right “out of the box,” InnoDB performs great in a single CPU environment given its default settings. To take advantage of InnoDB multithreading capabilities, you must remember to activate a lot of options.
innodb_thread_concurrency sets the upper bound on number of concurrent threads that InnoDB can hold open. Best round number to set for this is (2 X Number of CPUs) + Number of Disks. UPDATE : As I learned firsthand from the Percona NYC Conference, you should set this to 0 in order to alert InnoDB Storage Engine to find the best number of threads for the environment it is running in.
innodb_concurrency_tickets sets the number of threads that can bypass concurrency checking with impunity. After that limit is reached, thread concurrency checking becomes the norm again.
innodb_commit_concurrency sets the number of concurrent transactions that can be committed. Since the default is 0, not setting this allows any number of transactions to commit simultaneously.
innodb_thread_sleep_delay sets the number of milliseconds an InnoDB thread can be dormant before reentering the InnoDB queue. Default is 10000 (10 sec).
innodb_replication_delay imposes thread delay on a slave is innodb_thread_concurrency is reached.
innodb_read_ahead_threshold allows linear readings of the set number of extents (64 pages [page = 16K]) before switching to asynchronous reading.
Time would escape me if I named more options. You can read about them in MySQL’s Documentation.
Most people are unaware of these features and a quite satisfied with InnoDB just doing ACID-compliant transactions. If you tweak any of these options, you do so at your own peril.
I have played with MySQL 5.5 Multiple Buffer Pool Instances (162GB in 9 buffer pools instances) and have attempted to have data auto-partitioned in memory this way. Some experts say that this should give you 50% performance improvement. What I got was a ton of thread locking that actually made InnoDB crawl. I switched to 1 buffer (162GB) and all was well again in the world. I guess you need Percona experts at your disposal to set this. I’ll be at the Percona MySQL Conference in New York tomorrow and will ask about this if opportunity affords itself.
In conclusion, InnoDB behaves well now in a multi CPU server given its default settings for multithreaded operations. Tweaking them takes great care, great patience, great documentation, and great coffee (or Red Bull, Jolt, etc.).
Good morning, good evening, and good night !!!
UPDATE 2011-05-27 20:11
Came back from Percona MySQL Conference in New York on Thursday. What a conference. Learned a great deal, but I got an answer I will look into concerning InnoDB. I was informed by Ronald Bradford that setting innodb_thread_concurrency to 0 will let InnoDB decide the best course of action internally with thread concurrency. I will experiment with this further in MySQL 5.5.
UPDATE 2011-06-01 11:20
As far as one long query goes, InnoDB is ACID-compliant and operates very well using MultiVersion Concurrency Control. Transactions should be able carry isolation levels (repeatable reads by default) that prevents blocking others from accessing data.
As for multi core systems, InnoDB has come a long way. In the past, InnoDB could not perform well in a multicore environment. I remember having to run multiple mysql instances on a single server to get the multiple cores to distribute the multiple mysqld processes across the CPUs. This is no longer necessary, thanks to Percona, and later MySQL (eh, Oracle, saying that still makes me gag), as they have developed InnoDB into a more mature storage engine that can access the cores with simplicity without much tuning. The current instance of InnoDB today can operate well in a single core server.
As soon as you have multiple concurrent users or processes, or even a single process with multi-threaded database access, having a database that supports threading will become potentially interesting.
H2 is thread-safe, but serializes all requests to the database, which may become a potential performance issue in a heavy load scenario. Whether this is actually the case for a particular project depends on a combination of your performance requirements, the number of threads/users/processes accessing the database, the frequency of queries executed by these threads, and the average and worst-case performance of your queries.
For instance if your performance requirements are to have a response within a second, you have no more than 10 concurrent users executing a single query that takes 0.05 seconds to execute, a single-threaded database would still allow you to hit those goals (though multithreaded would likely already give a noticeable performance boost). Given the same scenario with a single potential query with a worst-case performance of half a second though, serializing your database access won’t allow you to meet your performance goals anymore.
If you’re currently using H2 on your project, I would advise you to run a profiler against your codebase under a load scenario (just kick off an x number of threads hitting your code concurrently using some typical usecases). This will give you actual metrics regarding the performance and bottlenecks in your codebase, instead of just theorizing. If this shows your requests spending a large percentage of their time just waiting to access the database, it’s time to move to a threaded database.
From what I can tell, “single-threaded” is a bit of a misnomer for H2. The point is that it serializes all transactions (ie does them one at a time).
The crucial question regarding whether that is “ok” or not for your application is not “How many users?” or even “How many processes?”, but “How long are my transactions going to take?”
If all your transactions are sub-second that may be fine, if some take several hours to complete, that may not be fine as all other pending transactions will be waiting for them to finish. The decision about whether that is “fine” or not will depend on your own performance requirements – ie how long is an acceptable wait for my users hitting the database with transactions.
It seems that H2 doesn’t really serialize transactions – just DML. In other words lots of short updates within a single long transaction will not block other updates. However unless you are using the experimental MVCC feature, table locking means this has a similar effect in practice. There is also an experimental “multi_threaded” feature but it cannot be used at the same time as MVCC
Quoting bits and pieces from the PostgreSQL site… Please note that I’ve absolutely no idea of the merits of these arguments — they merely didn’t fit in a comment.
From the Developer FAQ ("Why threads aren’t used…"):
Threads are not currently used instead of multiple processes for backends because:
- An error in one backend can corrupt other backends if they’re threads
within a single process
- Speed improvements using threads are small compared to the remaining
backend startup time.
- Sharing of read-only executable mappings and the use of shared_buffers
means processes, like threads, are very memory efficient
- Regular creation and destruction of processes helps protect against memory
fragmentation, which can be hard to manage in long-running processes
From the Todo list ("Features we do not want"):
All backends running as threads in a single process (not wanted)
This eliminates the process protection we get from the current setup. Thread creation is usually the same overhead as process creation on modern systems, so it seems unwise to use a pure threaded model, and MySQL and DB2 have demonstrated that threads introduce as many issues as they solve. (…)
So, again… I’ve absolutely no idea of the merits of the above. It was merely too long to fit in a comment.
A multithreaded database will only benefit you when you have more than 1 parallel query going to the database. It depends on number of users you have. If you have more than ten users working on the application at the same time, most likely they are going to produce more than one query on the database at the same time.
Moreover, a multithreaded database can only benefit when there’s multi core on the CPU. If there’s single core, multi threaded database has to queue the job and execute them sequentially on the single core. When there’s multi-core, each core can run one thread in parallel. Thus better performance.
Does this answer your query?
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂