There is a table, t_count, with a compound primary key:
pageType VARCHAR(32), collation ascii_bin
pageId VARCHAR(255), collation utf8_bin
count INT
PRIMARY KEY (pageType, pageId)
Its purpose is to track how many times each page, uniquely identified by (pageType, pageId), has been viewed. So each query should either insert a new row or increment count on the existing one. How can this be done efficiently in MySQL?
The table will be updated from multiple concurrent threads serving pages for different clients, so efficiency is measured by how little locking the query causes and how well its design holds up in a highly concurrent environment.
Database version is:
How to solve:
That is exactly what "IODKU" (INSERT ... ON DUPLICATE KEY UPDATE, aka "upsert") is made for:
INSERT INTO t_count (pageType, pageId, `count`) VALUES (?, ?, 1) ON DUPLICATE KEY UPDATE `count` = `count` + 1
You can have multiple concurrent threads running that query "simultaneously" without any problems.
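To see the upsert semantics end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite (3.24 or later) spells the same idea as INSERT ... ON CONFLICT (...) DO UPDATE, where an unqualified column name means the stored row and excluded.count plays the role of MySQL's VALUES(`count`). The in-memory database and the record_view helper are assumptions for illustration only, and the sketch says nothing about InnoDB's locking behaviour.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the schema mirrors t_count.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE t_count (
           pageType TEXT NOT NULL,
           pageId   TEXT NOT NULL,
           count    INTEGER NOT NULL,
           PRIMARY KEY (pageType, pageId)
       )"""
)

# SQLite's equivalent of MySQL's
#   INSERT ... ON DUPLICATE KEY UPDATE `count` = `count` + 1
UPSERT = """
INSERT INTO t_count (pageType, pageId, count) VALUES (?, ?, 1)
ON CONFLICT (pageType, pageId) DO UPDATE SET count = count + 1
"""


def record_view(page_type, page_id):
    """Hypothetical helper: one page view = one short, self-contained upsert."""
    with conn:  # commits right after the single statement
        conn.execute(UPSERT, (page_type, page_id))


for _ in range(3):
    record_view("article", "42")
record_view("article", "7")

rows = conn.execute(
    "SELECT pageType, pageId, count FROM t_count ORDER BY pageId"
).fetchall()
print(rows)
```

Writing the update as excluded.count + 1 instead (the analogue of VALUES(`count`) + 1 in MySQL) would add 1 to the value being inserted, which is always 1, so every repeatedly viewed page would be stuck at 2. Incrementing the stored column is what makes the counter accumulate.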
Scaling: If you expect to increment the counter more than 100 times per second (1000 if using SSD), then we need to discuss alternatives. (Spoiler: I will discuss a variation on http://mysql.rjweb.org/doc.php/summarytables )
Another tip: run this statement in a transaction by itself (for example, with autocommit on). If it were part of a bigger transaction, it would hold its row lock until that whole transaction commits, blocking other threads unnecessarily long.
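For the high-rate case mentioned above, one common variation, sketched here as an assumption rather than as the exact method from the linked summary-tables article, is to count views in application memory and periodically flush each page's accumulated delta with a single upsert. A hot page then costs one row update per flush instead of one per view. The ViewBuffer class, its methods, and the flush timing are hypothetical, and SQLite again stands in for MySQL. In MySQL the flush statement would end with ON DUPLICATE KEY UPDATE `count` = `count` + VALUES(`count`).

```python
import sqlite3
import threading
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_count (pageType TEXT NOT NULL, pageId TEXT NOT NULL,"
    " count INTEGER NOT NULL, PRIMARY KEY (pageType, pageId))"
)

# Adds a pre-aggregated delta rather than 1 (SQLite spelling of MySQL's
# `count` = `count` + VALUES(`count`)).
FLUSH_UPSERT = """
INSERT INTO t_count (pageType, pageId, count) VALUES (?, ?, ?)
ON CONFLICT (pageType, pageId) DO UPDATE SET count = count + excluded.count
"""


class ViewBuffer:
    """Hypothetical in-process buffer; unflushed views are lost if the process dies."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = Counter()

    def record(self, page_type, page_id):
        with self._lock:  # cheap in-memory increment, no database round trip
            self._pending[(page_type, page_id)] += 1

    def flush(self, db):
        with self._lock:  # swap out the batch so recording can continue
            batch, self._pending = self._pending, Counter()
        # Sorted so concurrent flushers would touch rows in the same order.
        params = [(pt, pid, n) for (pt, pid), n in sorted(batch.items())]
        with db:  # one short transaction per flush
            db.executemany(FLUSH_UPSERT, params)


buf = ViewBuffer()


def simulate_views(n):
    for _ in range(n):
        buf.record("article", "42")


workers = [threading.Thread(target=simulate_views, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
buf.record("article", "7")
buf.flush(conn)
buf.record("article", "42")
buf.flush(conn)

print(conn.execute("SELECT pageId, count FROM t_count ORDER BY pageId").fetchall())
```

The trade-off is durability: anything still buffered when the process stops is never counted, which is often acceptable for view counters but worth deciding deliberately.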
Do you need the count as you go along? If not, drop the count column: insert a row for each event as it happens, and aggregate the rows afterwards in an asynchronous batch process. You might add a monotonic, system-generated number if you want to process each event exactly once. If you use a datetime instead of an integer for that, you can aggregate over time as well as across pages.
Up front I’m going to say I’m no expert on MySQL’s internals. Take the following as unproven until you’ve researched the documentation and tested it in your own environment for yourself.
The type of workload you describe can be slow because each write requires the previous thread’s write to complete so it knows what new value to write. If the previous thread wrote value 6 my thread has to read that 6 in order to increment it to 7, and the thread after me has to wait on my write so it can read 7 and increment it to 8. In the worst case, with a single page being served by all clients, the system becomes, essentially, single-threaded at this count.
By performing inserts instead of updates the inter-thread dependency is removed. Each thread can write its event when it’s ready to. Throughput is limited only by the ability of the storage to persist bytes.
This approach treats the receiving table as a log. You want to choose an implementation which facilitates this. Likely there will be no indexes on this table as they increase the number of IOs required per event and are a further source of possible contention. Different storage engines have different support and performance characteristics for un-sequenced, store-it-anywhere-you-can writes.
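A minimal sketch of the log-then-aggregate approach, again with Python's sqlite3 standing in for MySQL: the request path does a plain insert into an append-only table with no secondary indexes, and a batch job folds the rows above a high-water mark into per-page totals. The page_view_log and agg_state tables, the log_view and aggregate_new_events functions, and the use of an AUTOINCREMENT id as the "monotonic system-generated number" are illustrative assumptions; the viewedAt column is there only so time-based aggregates remain possible later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Append-only event log: one row per view, no secondary indexes.
    CREATE TABLE page_view_log (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonic, never reused
        pageType TEXT NOT NULL,
        pageId   TEXT NOT NULL,
        viewedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP  -- enables time-based rollups
    );
    CREATE TABLE t_count (
        pageType TEXT NOT NULL,
        pageId   TEXT NOT NULL,
        count    INTEGER NOT NULL,
        PRIMARY KEY (pageType, pageId)
    );
    -- Remembers the last log id already folded into t_count.
    CREATE TABLE agg_state (last_id INTEGER NOT NULL);
    INSERT INTO agg_state VALUES (0);
    """
)

ADD_DELTA = """
INSERT INTO t_count (pageType, pageId, count) VALUES (?, ?, ?)
ON CONFLICT (pageType, pageId) DO UPDATE SET count = count + excluded.count
"""


def log_view(page_type, page_id):
    """Request path: a plain insert, no read-modify-write on a shared row."""
    with conn:
        conn.execute(
            "INSERT INTO page_view_log (pageType, pageId) VALUES (?, ?)",
            (page_type, page_id),
        )


def aggregate_new_events():
    """Batch job: fold events above the high-water mark into t_count once."""
    with conn:  # new totals and the new high-water mark commit together
        (last_id,) = conn.execute("SELECT last_id FROM agg_state").fetchone()
        (max_id,) = conn.execute(
            "SELECT COALESCE(MAX(id), ?) FROM page_view_log", (last_id,)
        ).fetchone()
        deltas = conn.execute(
            "SELECT pageType, pageId, COUNT(*) FROM page_view_log"
            " WHERE id > ? AND id <= ? GROUP BY pageType, pageId",
            (last_id, max_id),
        ).fetchall()
        conn.executemany(ADD_DELTA, deltas)
        conn.execute("UPDATE agg_state SET last_id = ?", (max_id,))


for page in ["42", "42", "7"]:
    log_view("article", page)
aggregate_new_events()
log_view("article", "42")
aggregate_new_events()
aggregate_new_events()  # nothing new above the mark, so nothing is double counted

print(conn.execute("SELECT pageId, count FROM t_count ORDER BY pageId").fetchall())
```

Because only the batch job touches t_count, the per-row contention described above moves out of the request path; the log rows can be deleted or archived once they fall below the high-water mark.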