All we need is an easy explanation of the problem, so here it is.
I am using MySQL Percona server.
This table needs to do fast bulk inserts using ‘INSERT INTO … VALUES’ for smaller requests (about 100k records) and using ‘LOAD DATA INFILE’ for larger (up to 10kk records).
Also, there will be selects with full table scan (using indexes there will be like 10-30 million of records out of total 100kk records), they also need to be done as fast as possible.
I tried myisam and got around 50k records inserted per second, with InnoDB that was only about 25k. This numbers are OK, but i would be glad, if i could improve them.
How would you organize such table and what setting to mysql server you would apply? We’ve got a server with 128G of RAM, i think it might be helpfull.
I did some research on this matter, but still feel myself uncompetent. If you have managed similar tables – please, share your expirience, thank you.
How to solve :
The problem with MySQL’s InnoDB engine is that the storage architecture is essentially using index-organized tables, which will unfortunately experience performance issues for large numbers of inserts when the target table is large. The only case in which this can be mitigated is if you have sequential inserts, that is, the primary key of the set of inserts follows a sequence.
For high ingestion rates of data, different technologies are recommended, such as Cassandra which uses a log-structured merge tree, or in your case, I might recommend TokuDB, which has been acquired by Percona, and which uses a technology called a fractal tree index, which is essentially creative way to cache, pre-sort, and batch insert into your index-organized table (of course, there’s much more to it than that, but no need to go into the details here.)
Long story short, if you have moderately high ingestion rates, you need a heap organized table structure implemented by databases like Oracle and PostgreSQL, and if you have very high ingestion rates, you need to use a database like Cassandra or TokuDB.
The question is much too broad and vague.
Is this a Data Warehouse application and you are loading the Fact table? Will there be Summary queries? Summary Tables?
Is this a logging application?
Do you purge the data after some period of time? (This factors into whether partitioning should be set up now.)
What settings did you try on InnoDB?
innodb_flush_log_on_trx_commit = ? Load into MyISAM, then process into InnoDB?
How many secondary indexes are there? This can have a big impact on performance, especially if the cache (key_buffer_size or innodb_buffer_pool_size) is not big enough.
Settings for *_instances?
Once you have ingested the data, what will you do with it? Sometimes that controls the choice of
Did you find that
INSERT INTO worked faster for smaller sets than
LOAD DATA? If not, then why not use
LOAD DATA always?
Size –> speed. Let’s see
SHOW CREATE TABLE and discuss whether you can shrink the datatypes.
Are you normalizing the data as it comes in? (Sounds like not.) Would normalization shrink the data significantly? If so, let’s discuss how you can fold that into the ingestion with minimal impact. http://mysql.rjweb.org/doc.php/staging_table
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂