This is my current table schema in MySQL.
We are not facing any performance issues in our application, but there are operational issues.
The size of this table is 7 TB (6 TB data and 1 TB index) and it has 4 billion rows.
Because this one table is so big, we are not able to run any ALTER TABLE on it.
We have to use Percona, which takes 1 week to complete.
So to handle this, we have decided to break this table into 3.
There are two columns that store XML files, which we want to move to a separate table; these two columns alone take 2.5 TB of storage:
DETAILS longtext, and
SUMMARY varchar(4000) DEFAULT NULL,
Along with that, we also want to move a few more columns to another table so that all three tables become lighter.
For the third table, we want to move:
USES_TYPE varchar(255) NOT NULL,
STEP_TYPE varchar(255) NOT NULL,
NAME varchar(1500) DEFAULT NULL, and
REMARKS varchar(1000) DEFAULT NULL,
CREATE TABLE `app_data` (
  `ID` varchar(255) NOT NULL,
  `USES_TYPE` varchar(255) NOT NULL,
  `STEP_TYPE` varchar(255) NOT NULL,
  `CUST_ID` varchar(255) DEFAULT NULL,
  `DETAILS` longtext,
  `DATE_TIME` datetime(6) DEFAULT NULL,
  `GROUP_ID` varchar(255) DEFAULT NULL,
  `SYSTEM_ID` varchar(255) DEFAULT NULL,
  `NAME` varchar(1500) DEFAULT NULL,
  `CUSTOMER_ID` varchar(255) DEFAULT NULL,
  `REMARKS` varchar(1000) DEFAULT NULL,
  `SUMMARY` varchar(4000) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `IDX_APP_DATA_CID_OT` (`CUST_ID`,`USES_TYPE`) USING BTREE,
  KEY `IDX_APP_DATA_SYSTEM_ID` (`SYSTEM_ID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Now my final tables will be something like this.

Table one:

CREATE TABLE `app_data_table1` (
  `ID` varchar(255) NOT NULL,
  `DETAILS` longtext,
  `SUMMARY` varchar(4000) DEFAULT NULL
);

Table two would be like:

CREATE TABLE `app_data_table2` (
  `ID` varchar(255) NOT NULL,
  `USES_TYPE` varchar(255) NOT NULL,
  `STEP_TYPE` varchar(255) NOT NULL,
  `NAME` varchar(1500) DEFAULT NULL,
  `REMARKS` varchar(1000) DEFAULT NULL
);

And table three:

CREATE TABLE `app_data` (
  `ID` varchar(255) NOT NULL,
  `CUST_ID` varchar(255) DEFAULT NULL,
  `DATE_TIME` datetime(6) DEFAULT NULL,
  `GROUP_ID` varchar(255) DEFAULT NULL,
  `SYSTEM_ID` varchar(255) DEFAULT NULL,
  `CUSTOMER_ID` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `IDX_APP_DATA_CID_OT` (`CUST_ID`) USING BTREE,
  KEY `IDX_APP_DATA_SYSTEM_ID` (`SYSTEM_ID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
I am new to databases, but this is what I have come up with. I know this is not optimised, so my humble request is to please guide me on this.
Once we do this, we will need to use joins to display the data on the UI or wherever we fetch it, so will that be slower?
How to solve :
There are other things you could and should do, also.
- Normalization. For example, how many different values are there of `SYSTEM_ID`? Rather than using a string, use a `TINYINT UNSIGNED` (1 byte and a range of 0..255), or maybe a `SMALLINT UNSIGNED` (2 bytes and 0..65K). Etc.
- Are any of the columns "rare"? I mean, are they NULL in many of the rows? If so, design the split so that the new table does not hold a full 4 billion rows but, instead, has only the needed rows. Then use `LEFT JOIN` when you need to get NULLs back.
- How were the UUIDs generated? If they are standard, it is simple to compress them down to `BINARY(16)` (16 bytes) from 36 bytes.
- Don't blindly use `(255)`; find a sensible, but conservative, limit. (This has a small impact on performance.)
- What are the main queries now? And after moving to 3 tables? We should make sure the indexes are ‘right’ for all of them. If the split is done ‘wrong’ certain queries will become much slower and cannot be fixed by indexes.
- `DETAILS` takes about 2.5TB? Probably that will shrink to less than 1TB if you compress it. But do it in the client, not the server. Then put the result in a `MEDIUMBLOB`, and uncompress in the client after fetching.
- `PARTITION`. It is unlikely to help anything. However, if you purge "old" data, the process is much faster than `DELETE`. It will not help with `ALTER`.
- What do you use `ALTER` for? There are some cheap tricks to perform for some tasks.
- The PK being varchar(255) sounds awful; let’s discuss alternatives. Does any combination of other columns provide a unique key?
- Run `SELECT AVG(LENGTH(...)), AVG(LENGTH(...)), ... FROM t` to see what is likely to be the benefit from each column.
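
To make the UUID and client-side compression suggestions above concrete, here is a minimal sketch. It assumes a Python client (the question does not say what the application is written in), IDs in the standard 36-character UUID text form, and `zlib` as the compressor; the helper names and the sample values are made up for illustration.

```python
import uuid
import zlib


def uuid_to_bin(text: str) -> bytes:
    """Pack a standard 36-character UUID string into 16 raw bytes (fits BINARY(16))."""
    return uuid.UUID(text).bytes


def bin_to_uuid(raw: bytes) -> str:
    """Restore the text form from the 16 stored bytes (always lowercase hex)."""
    return str(uuid.UUID(bytes=raw))


def compress_details(xml_text: str) -> bytes:
    """Compress the XML in the client before INSERT; the result goes in a blob column."""
    return zlib.compress(xml_text.encode("utf-8"), 6)


def decompress_details(blob: bytes) -> str:
    """Reverse compress_details() in the client after SELECT."""
    return zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    key = "123e4567-e89b-12d3-a456-426614174000"  # made-up sample ID
    packed = uuid_to_bin(key)
    print(len(key), "chars ->", len(packed), "bytes")
    assert bin_to_uuid(packed) == key

    # Made-up, repetitive XML standing in for a real DETAILS value.
    details = "<steps>" + "<step name='validate' status='ok'/>" * 200 + "</steps>"
    blob = compress_details(details)
    print(len(details.encode("utf-8")), "bytes ->", len(blob), "bytes")
    assert decompress_details(blob) == details
```

How much real `DETAILS` values shrink depends on the data, so it is worth measuring on a sample of actual rows before committing to the change. If the existing IDs are not all lowercase, the round trip through `bin_to_uuid` will not reproduce them byte for byte.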
"Alter is used mainly for adding column and adding index" —
- MySQL 8.0 has faster ALTER operations. Some are "INSTANT". You need to look carefully at the details to see which are fast versus which need a copy. And some flavors block other actions more than others.
- If the only goal is to add a new column, consider adding a new table instead. This is similar to your original idea but without any of the downtime. It can also take advantage of having rows only where necessary (see my `LEFT JOIN` comment above).
- But… If you need to search the new set (with multiple tables), some queries can be efficient, some will be terribly inefficient. Please provide some concrete examples of what you have done in the past or might do in the future; I will explain in more detail. Perhaps start a new Question and show the old `CREATE TABLE` and the new one, together with the `SELECT` that involves the new column; then ask about performance.
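
The "new table instead of a new column" idea can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere, and the side table `app_data_extra` and its column `NEW_COL` are hypothetical names that do not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down stand-in for the existing big table.
    CREATE TABLE app_data (
        ID      TEXT PRIMARY KEY,
        CUST_ID TEXT
    );
    -- Hypothetical side table carrying the new column.
    -- It has a row only where a value actually exists.
    CREATE TABLE app_data_extra (
        ID      TEXT PRIMARY KEY,
        NEW_COL TEXT NOT NULL
    );
    INSERT INTO app_data VALUES ('a1', 'c1'), ('a2', 'c1'), ('a3', 'c2');
    INSERT INTO app_data_extra VALUES ('a2', 'value for a2');
""")

# Filter on the main table, then pick up the optional value by primary key.
# Rows without a side-table entry come back with NULL (None in Python).
rows = conn.execute("""
    SELECT d.ID, d.CUST_ID, x.NEW_COL
    FROM app_data AS d
    LEFT JOIN app_data_extra AS x ON x.ID = d.ID
    WHERE d.CUST_ID = 'c1'
    ORDER BY d.ID
""").fetchall()

print(rows)  # [('a1', 'c1', None), ('a2', 'c1', 'value for a2')]
```

A query of this shape filters on one table and joins by primary key, which is the cheap case. A query that has to filter on columns from both tables at once cannot be covered by a single composite index, which is where the split starts to hurt.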