How to break down a big MySQL table into multiple tables to solve an operational issue


This is my current table schema in MySQL.
We are not facing any performance issue in our application, but there is an operational issue.
The size of this table is 7 TB (6 TB data and 1 TB index) and it has 4 billion rows.

Because this one table is so big, we are not able to run any ALTER TABLE on it directly.
We have to use Percona, which takes a week to complete.

So, to handle this, we have decided to break this table into three.
There are two columns that store XML which we want to move to a separate table; these two columns alone take 2.5 TB of storage:

`DETAILS` longtext and `SUMMARY` varchar(4000) DEFAULT NULL

Along with that, we also want to move a few more columns to another table, so that all three tables become lighter.
Into that table we want to move `USES_TYPE` varchar(255) NOT NULL, `STEP_TYPE` varchar(255) NOT NULL, `NAME` varchar(1500) DEFAULT NULL, and `REMARKS` varchar(1000) DEFAULT NULL.

CREATE TABLE `app_data` (
`ID` varchar(255) NOT NULL,
`USES_TYPE` varchar(255) NOT NULL,
`STEP_TYPE` varchar(255) NOT NULL,
`CUST_ID` varchar(255) DEFAULT NULL,
`DETAILS` longtext,
`DATE_TIME` datetime(6) DEFAULT NULL,
`GROUP_ID` varchar(255) DEFAULT NULL,
`SYSTEM_ID` varchar(255) DEFAULT NULL,
`NAME` varchar(1500) DEFAULT NULL,
`CUSTOMER_ID` varchar(255) DEFAULT NULL,
`REMARKS` varchar(1000) DEFAULT NULL,
`SUMMARY` varchar(4000) DEFAULT NULL,

PRIMARY KEY (`ID`),
KEY `IDX_APP_DATA_CID_OT` (`CUST_ID`,`USES_TYPE`) USING BTREE,
KEY `IDX_APP_DATA_SYSTEM_ID` (`SYSTEM_ID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Now my final tables will be something like this.

Table one 

CREATE TABLE `app_data_table1` (
`ID` varchar(255) NOT NULL,
`DETAILS` longtext,
`SUMMARY` varchar(4000) DEFAULT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Table two would be like

CREATE TABLE `app_data_table2` (
`ID` varchar(255) NOT NULL,
`USES_TYPE` varchar(255) NOT NULL,
`STEP_TYPE` varchar(255) NOT NULL,
`NAME` varchar(1500) DEFAULT NULL,
`REMARKS` varchar(1000) DEFAULT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

and table three 

CREATE TABLE `app_data` (
`ID` varchar(255) NOT NULL,
`CUST_ID` varchar(255) DEFAULT NULL,
`DATE_TIME` datetime(6) DEFAULT NULL,
`GROUP_ID` varchar(255) DEFAULT NULL,
`SYSTEM_ID` varchar(255) DEFAULT NULL,
`CUSTOMER_ID` varchar(255) DEFAULT NULL,

PRIMARY KEY (`ID`),
KEY `IDX_APP_DATA_CID_OT` (`CUST_ID`) USING BTREE,
KEY `IDX_APP_DATA_SYSTEM_ID` (`SYSTEM_ID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

I am new to databases, but this is what I have come up with. I know this is not optimized, so my humble request is: please guide me on this.

Once we do this, we will need a JOIN whenever we display data on the UI or fetch it anywhere else. Will that be slower?
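For reference, assuming the split sketched above (with the slimmed core table keeping the name `app_data`), a per-row UI fetch would look something like this; `?` is a placeholder for the requested ID:

```sql
-- Hypothetical UI fetch after the split: one row per ID, joined on the
-- shared primary key. A single-row lookup by PK stays fast; LEFT JOIN
-- keeps the row even if a piece (e.g. the XML part) is missing.
SELECT  c.ID, c.CUST_ID, c.DATE_TIME, c.GROUP_ID, c.SYSTEM_ID,
        t2.USES_TYPE, t2.STEP_TYPE, t2.NAME, t2.REMARKS,
        t1.DETAILS, t1.SUMMARY
FROM    app_data          AS c                     -- slimmed core table
LEFT JOIN app_data_table2 AS t2 ON t2.ID = c.ID    -- type/name columns
LEFT JOIN app_data_table1 AS t1 ON t1.ID = c.ID    -- big XML columns
WHERE   c.ID = ?;
```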

How to solve :


Method 1

There are other things you could and should do, also.

  • Normalization. For example, how many different values are there of GROUP_ID or SYSTEM_ID? Rather than repeating a string in every row, store a TINYINT UNSIGNED (1 byte, range 0..255) or maybe a SMALLINT UNSIGNED (2 bytes, 0..65535) that references a lookup table. Etc.
  • Are any of the columns "rare"? I mean, are they NULL in many of the rows? If so, design the split so that the new table does not hold a full 4 billion rows, but instead has only the rows that actually need those columns. Then use LEFT JOIN when you need to get NULLs back.
  • How were the UUIDs generated? If they are standard, it is simple to compress down to BINARY(16) (16 bytes) from 36 bytes.
  • Don’t blindly use (255); find a sensible but conservative limit. (This has a small impact on performance.)
  • What are the main queries now? And after moving to 3 tables? We should make sure the indexes are ‘right’ for all of them. If the split is done ‘wrong’ certain queries will become much slower and cannot be fixed by indexes.
  • DETAILS takes about 2.5 TB? That will probably shrink to less than 1 TB if you compress it. But do the compression in the client, not the server; store the result in a MEDIUMBLOB and uncompress in the client after fetching.
  • Consider PARTITION. It is unlikely to help anything. However, if you purge "old" data, the process is much faster than DELETE. It will not help with ALTER TABLE.
  • What do you use ALTER for? There are some cheap tricks to perform for some tasks.
  • The PK being varchar(255) sounds awful; let’s discuss alternatives. Does any combination of other columns provide a unique key?
  • Do SELECT AVG(LENGTH(...)), AVG(LENGTH(...)), ... FROM t to see what is likely to be the benefit from each column.
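As a rough sketch of the normalization, UUID, and measurement points above (the lookup-table name is invented for illustration, and `UUID_TO_BIN` assumes standard UUIDs on MySQL 8.0):

```sql
-- Hypothetical lookup table for SYSTEM_ID, if it has few distinct values;
-- the big table would then store only the 2-byte code.
CREATE TABLE system_lookup (
  system_code SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 2 bytes, 0..65535
  system_id   varchar(255) NOT NULL,
  PRIMARY KEY (system_code),
  UNIQUE KEY (system_id)
) ENGINE=InnoDB;

-- A standard 36-character UUID packs into BINARY(16) (MySQL 8.0 function):
SELECT UUID_TO_BIN('3f06af63-a93c-11e4-9797-00505690773f');

-- Measure the average width of each column to estimate the saving per column:
SELECT AVG(LENGTH(DETAILS)), AVG(LENGTH(SUMMARY)), AVG(LENGTH(NAME))
FROM app_data;
```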

In reply to "ALTER is used mainly for adding columns and adding indexes":

  • MySQL 8.0 has faster ALTER operations. Some are "INSTANT". You need to look carefully at the details to see which are fast versus which need a copy. And some flavors block other actions more than others.
  • If the only goal is to add a new column, consider adding a new table instead. This is similar to your original idea but without any of the downtime. It can also take advantage of having rows only where necessary (see my LEFT JOIN comment above).
  • But… If you need to search the new set (with multiple tables), some queries can be efficient, some will be terribly inefficient. Please provide some concrete examples of what you have done in the past or might do in the future; I will explain in more detail. Perhaps start a new Question and show the old CREATE TABLE and the new one, together with the SELECT that involves the new column; then ask about performance.
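A minimal sketch of the "new table instead of a new column" idea, with a hypothetical column name:

```sql
-- Hypothetical: instead of ALTER TABLE app_data ADD COLUMN new_col ...,
-- create a side table keyed by the same ID, populated only where needed:
CREATE TABLE app_data_extra (
  ID      varchar(255) NOT NULL,
  NEW_COL varchar(255) DEFAULT NULL,   -- the column you would have added
  PRIMARY KEY (ID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

-- Reads use LEFT JOIN, so rows without the new attribute come back as NULL:
SELECT a.ID, e.NEW_COL
FROM app_data AS a
LEFT JOIN app_data_extra AS e ON e.ID = a.ID
WHERE a.ID = ?;
```

This avoids touching the 7 TB table entirely, at the cost of an extra join when the new column is needed.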


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
