I wrote a Node.js function to SELECT 150 million records, process the data, and UPDATE those same 150 million records using
UPDATE `table` SET value1='somevalue' WHERE idRo2 = 1;
one such UPDATE per row, all concatenated into a single query string, with multipleStatements enabled on the DB connection.
I've been encountering multiple errors, such as:
- Error: Connection lost: The server closed the connection.
- RangeError: Invalid string length.
- Killed process.
I think I might not be using the right technologies or programming techniques.
The data processing I need to do is: take the `wholeApiResponse` column value, parse that string, and insert the parsed values into newly added columns (askPrice, lastPrice, lowPrice, highPrice, volumePrice, platformTimestamp). This ends up modifying each existing row by adding new values derived from an existing string.
How to solve:
Big Updates (and Deletes) are problematic.
Plan A: Avoid the big update. If value1 should always be 'somevalue' when idRo2 = 1, then don't store it in the table; store it elsewhere and use a JOIN. Then, instead of changing 150M rows, you are changing exactly 1 row.
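Plan A can be sketched in SQL. The lookup table `ro2_values` and its columns are hypothetical names for illustration, not something from the question:

```sql
-- Hypothetical lookup table: one row per idRo2 instead of 150M copies.
CREATE TABLE ro2_values (
  idRo2 INT NOT NULL PRIMARY KEY,
  value1 VARCHAR(64) NOT NULL
);

-- "Updating" value1 for idRo2 = 1 now touches exactly one row:
UPDATE ro2_values SET value1 = 'somevalue' WHERE idRo2 = 1;

-- Readers pick the value up via a JOIN:
SELECT t.*, v.value1
FROM `table` AS t
JOIN ro2_values AS v USING (idRo2);
```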
Plan B: Do the Update in chunks of 1K rows at a time. This avoids timeouts, and a number of other potential problems. Details: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . Even 10K can be problematic, hence my recommendation of 1K. Anyway, going beyond 1K is getting into "diminishing returns".
Plan C: Tell us what `idRo2` is really called. That might lead to some specific suggestions.
Minor issues with datatypes; shrinking the table size will help performance some:
- `INT SIGNED` has a limit of 2 billion, and 150M is getting kinda close to it. Keep your eye on any `INT` id columns approaching that limit.
- `idCurrencyPair` may be wasting space. (`INT` takes 4 bytes; there are smaller datatypes.)
- `DOUBLE` can lead to rounding errors, and it takes 8 bytes. Consider a smaller or exact type.
- `platformTimestamp DOUBLE` -- Huh?
- I'll bet that the `LONGTEXT` column is rarely used? We should discuss better ways to store it.
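To make the shrinking concrete, here is an illustrative (not prescriptive) `ALTER TABLE`; every target type below is a guess that must be checked against the real value ranges first:

```sql
-- Illustrative only: verify each column's actual range before shrinking.
ALTER TABLE `table`
  MODIFY idCurrencyPair SMALLINT UNSIGNED NOT NULL,  -- 2 bytes, if < 65,536 pairs
  MODIFY askPrice  DECIMAL(14,6) NOT NULL,           -- exact, no DOUBLE rounding
  MODIFY lastPrice DECIMAL(14,6) NOT NULL,
  MODIFY platformTimestamp TIMESTAMP(3) NOT NULL;    -- smaller than DOUBLE, ms precision
```

Note that an `ALTER TABLE` on 150M rows is itself a big, slow operation; do it once, during a quiet period, not per batch.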