I’m using MySQL and I’m wondering whether it’s a good strategy to presort my data so that, when a user accesses the information, it doesn’t have to be sorted on the fly.
Basically, I have an HTML table which is populated with paginated data from the database. The data is ordered by a particular column and the page can sometimes be a little sluggish. I was thinking about reordering the table on a nightly basis so that the ORDER BY can be removed from the query.
Is this general practice or should I avoid this?
My query is as follows:
'select keyword, position, impressions, clicks, ctr from keywords where profile_id=%s order by impressions desc limit %s, %s', (profile_id, start, end))
My table looks like this:
+---------------------+---------------+------+-----+---------+----------------+
| Field               | Type          | Null | Key | Default | Extra          |
+---------------------+---------------+------+-----+---------+----------------+
| id                  | bigint(20)    | NO   | PRI | NULL    | auto_increment |
| profile_id          | int(11)       | YES  | MUL | NULL    |                |
| landing_page_id     | int(11)       | YES  | MUL | NULL    |                |
| keyword             | varchar(2083) | YES  |     | NULL    |                |
| position            | int(11)       | YES  | MUL | NULL    |                |
| impressions         | int(11)       | YES  | MUL | NULL    |                |
| ctr                 | float         | YES  | MUL | NULL    |                |
| clicks              | int(11)       | YES  | MUL | NULL    |                |
| unique_key          | varchar(200)  | YES  | UNI | NULL    |                |
| position_30_days    | int(11)       | YES  |     | NULL    |                |
| impressions_30_days | int(11)       | YES  |     | NULL    |                |
| clicks_30_days      | int(11)       | YES  |     | NULL    |                |
| ctr_30_days         | float         | YES  |     | NULL    |                |
| position_60_days    | int(11)       | YES  |     | NULL    |                |
| impressions_60_days | int(11)       | YES  |     | NULL    |                |
| clicks_60_days      | int(11)       | YES  |     | NULL    |                |
| ctr_60_days         | float         | YES  |     | NULL    |                |
| position_90_days    | int(11)       | YES  |     | NULL    |                |
| impressions_90_days | int(11)       | YES  |     | NULL    |                |
| clicks_90_days      | int(11)       | YES  |     | NULL    |                |
| ctr_90_days         | float         | YES  |     | NULL    |                |
+---------------------+---------------+------+-----+---------+----------------+
How to solve :
Storing the data in an ordered way may be useful in some rare cases, but it doesn’t guarantee that the selected rows will come back in that order. You have to use ORDER BY to guarantee the order of the returned rows.
Is it general practice? I don’t think so.
Should I avoid this? Yes, at least for this specific case.
To make this query run faster and reduce the sorting work, add a composite index on profile_id ASC and impressions DESC:
ALTER TABLE keywords ADD INDEX (profile_id ASC, impressions DESC);
IMPORTANT: drop the other index on profile_id (the name of the index you should drop will be displayed if you run “SHOW CREATE TABLE keywords”).
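The effect of the composite index can be sketched with a query plan. The example below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a cut-down hypothetical copy of the keywords table, and an assumed index name idx_profile_impressions. On MySQL the equivalent check would be EXPLAIN on the real query, looking for "Using filesort" to disappear.

```python
import sqlite3

# SQLite stands in for MySQL; the table is a reduced, hypothetical copy of `keywords`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keywords ("
    " id INTEGER PRIMARY KEY, profile_id INTEGER,"
    " keyword TEXT, impressions INTEGER)"
)
conn.executemany(
    "INSERT INTO keywords (profile_id, keyword, impressions) VALUES (?, ?, ?)",
    [(i % 5, f"kw{i}", (i * 37) % 101) for i in range(500)],
)

QUERY = (
    "SELECT keyword, impressions FROM keywords "
    "WHERE profile_id = ? ORDER BY impressions DESC LIMIT 10"
)


def plan(connection):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(conn)  # no usable index: table scan plus a separate sort step
conn.execute(
    "CREATE INDEX idx_profile_impressions "
    "ON keywords (profile_id, impressions DESC)"
)
plan_after = plan(conn)  # the index supplies both the filter and the order

print(plan_before)
print(plan_after)
```

In this sketch the first plan contains a temporary B-tree for the ORDER BY and the second does not, which is the sorting work the composite index is meant to remove.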
Other factors that could affect the performance:
The cardinality, or data distribution. For example, some profiles may have many more rows than others. A useful way to check that is:
`SELECT profile_id, count(*) cc FROM keywords GROUP BY profile_id ORDER BY cc ASC LIMIT 10;`
`SELECT profile_id, count(*) cc FROM keywords GROUP BY profile_id ORDER BY cc DESC LIMIT 10;`
If the numbers are hugely different, the same query may vary in performance depending on how many rows a profile has.
If a profile has a huge number of rows, using LIMIT x, y will gradually get slower as x (the offset) gets higher.
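As a small illustration of the LIMIT x, y form: MySQL reads x as the offset (rows to skip) and y as the row count, not as a start and an end position. The helper below is a hypothetical sketch, with limit_args an invented name, that turns a 1-based page number into that pair. It also makes the cost visible: the offset grows linearly with the page number, and the server has to step over that many rows before returning anything.

```python
from typing import Tuple


def limit_args(page: int, page_size: int) -> Tuple[int, int]:
    """Translate a 1-based page number into MySQL's (offset, row_count) pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size  # rows the server must skip first
    return offset, page_size


# Hypothetical usage with a DB-API cursor (names assumed, not from the question):
# cursor.execute(sql, (profile_id, *limit_args(page, 50)))
print(limit_args(3, 50))
```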
(DESCRIBE is not as descriptive as SHOW CREATE TABLE; we can’t see what indexes you have.)
The SELECT would benefit from this ‘composite’ index:
INDEX(profile_id, impressions) -- in that order.
Do you have a keyword that is 2083 characters long? If not, why have such a big VARCHAR?
Why have both id and unique_key? Is unique_key some form of UUID? They are notoriously inefficient when the table gets huge.
LIMIT ?, ? ... ($start, $end): the two numbers in LIMIT are the offset and the row count, not a start and an end.
By using the index above and changing to “remember where you left off”, you can make the ORDER BY...LIMIT work a lot faster. This suggestion, if practical for your application, will be faster (at least after the first ‘page’) than your original question about ordering the data could ever be! Why? Because OFFSET (the first number in LIMIT) requires work. My blog shows how to get rid of that work.
When you could have multiple rows with the same value, and you need to be deterministic in ordering:
ORDER BY profile_id DESC, impressions DESC, id DESC
INDEX(profile_id, impressions, id)
- In the ORDER BY, all the items are in the same direction (DESC is usually what is wanted).
- Mixing ASC and DESC prevents use of the index (until 8.0).
- Since you are looking for a single profile_id, ASC or DESC on it has identical effect.
To deal with a ‘compound’ $leftoff, let’s look at the above example. Assuming that profile_id is constant, we want to remember where you left off as a pair, $impression and $id, then do
WHERE impressions <= $impression AND ( impressions < $impression OR id < $id )
Alternatively (it is unclear whether these optimize differently in different versions of MySQL):
WHERE ( impressions = $impression AND id < $id OR impressions < $impression )
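The "remember where you left off" idea can be sketched end to end. This is a minimal illustration, not the blog's code: it uses sqlite3 in place of MySQL, a reduced hypothetical keywords table (with the column named impressions, as in the table definition), and invented names such as all_pages and idx_pii. It walks every page using the first WHERE form and checks that the pages, concatenated, match one full ordered read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keywords "
    "(id INTEGER PRIMARY KEY, profile_id INTEGER, impressions INTEGER)"
)
# Only 7 distinct impressions values, so ties are common and id must break them.
conn.executemany(
    "INSERT INTO keywords (profile_id, impressions) VALUES (?, ?)",
    [(1, i % 7) for i in range(40)],
)
conn.execute("CREATE INDEX idx_pii ON keywords (profile_id, impressions, id)")

PAGE_SIZE = 6
FIRST_PAGE = (
    "SELECT impressions, id FROM keywords WHERE profile_id = ? "
    "ORDER BY impressions DESC, id DESC LIMIT ?"
)
NEXT_PAGE = (
    "SELECT impressions, id FROM keywords WHERE profile_id = ? "
    "AND impressions <= ? AND (impressions < ? OR id < ?) "
    "ORDER BY impressions DESC, id DESC LIMIT ?"
)


def all_pages(connection, profile_id):
    """Walk every page by remembering the last (impressions, id) pair seen."""
    rows = connection.execute(FIRST_PAGE, (profile_id, PAGE_SIZE)).fetchall()
    seen = []
    while rows:
        seen.extend(rows)
        last_impressions, last_id = rows[-1]
        rows = connection.execute(
            NEXT_PAGE,
            (profile_id, last_impressions, last_impressions, last_id, PAGE_SIZE),
        ).fetchall()
    return seen


paged = all_pages(conn, 1)
full = conn.execute(
    "SELECT impressions, id FROM keywords WHERE profile_id = ? "
    "ORDER BY impressions DESC, id DESC",
    (1,),
).fetchall()
print(paged == full)
```

No page needs an OFFSET here; each query starts directly after the last row of the previous page, so later pages should cost about the same as the first.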
You can’t remove the ORDER BY from the query. SQL is set-based and as such is unordered. What you could do, however, if you can’t optimise your query, is use MySQL’s “poor man’s materialized view”: a secondary table that is regularly updated from your primary table.
How often your secondary table needs to be updated depends on how often the data in your primary table changes and how quickly you need that change to be reflected on your web page.
DROP TABLE IF EXISTS my_secondary_table;
CREATE TABLE my_secondary_table
  SELECT * FROM my_primary_table ORDER BY my_ordering;
Now you need to schedule this to happen once a night or once an hour, depending on your needs.
Note that the secondary table is inaccessible while it is dropped and created again.
You can specify your columns and create a primary key for your secondary table that aligns with your ordering. Since MySQL stores the data of a table “behind” its primary key, retrieval in that order should be quite fast.
If you cannot use any (combination) of your columns as the primary key, you can emulate a row numbering in your select and use that as the primary key, as follows.
CREATE TABLE my_secondary_table (pk INT, PRIMARY KEY(pk))
  SELECT @rownum := @rownum + 1 AS pk, mpt.*
  FROM (
    SELECT * FROM my_primary_table ORDER BY my_ordering
  ) mpt
  CROSS JOIN (SELECT @rownum := 0) rn;
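The row-number approach might look like this in application code. The sketch is an assumption-laden illustration: sqlite3 stands in for MySQL, the tables are reduced hypothetical versions, the ordering is taken to be impressions DESC with id as tie-breaker, and rebuild_secondary and fetch_page are invented names. The rank is computed in Python with enumerate rather than with a @rownum user variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_primary_table (id INTEGER PRIMARY KEY, impressions INTEGER)"
)
conn.executemany(
    "INSERT INTO my_primary_table (impressions) VALUES (?)",
    [((i * 13) % 50,) for i in range(30)],
)


def rebuild_secondary(connection):
    """Recreate my_secondary_table with pk = position in the desired ordering."""
    ordered = connection.execute(
        "SELECT id, impressions FROM my_primary_table "
        "ORDER BY impressions DESC, id DESC"
    ).fetchall()
    connection.execute("DROP TABLE IF EXISTS my_secondary_table")
    connection.execute(
        "CREATE TABLE my_secondary_table "
        "(pk INTEGER PRIMARY KEY, id INTEGER, impressions INTEGER)"
    )
    with connection:  # commit the inserts on success, roll back on error
        connection.executemany(
            "INSERT INTO my_secondary_table (pk, id, impressions) VALUES (?, ?, ?)",
            [(rank, row[0], row[1]) for rank, row in enumerate(ordered, start=1)],
        )


def fetch_page(connection, page, page_size):
    """Read one page by primary-key range; the ORDER BY on pk stays explicit."""
    first = (page - 1) * page_size + 1
    return connection.execute(
        "SELECT pk, id, impressions FROM my_secondary_table "
        "WHERE pk BETWEEN ? AND ? ORDER BY pk",
        (first, first + page_size - 1),
    ).fetchall()


rebuild_secondary(conn)
page_two = fetch_page(conn, 2, 10)
print(page_two[0], page_two[-1])
```

The page query still carries an ORDER BY, in line with the point that it cannot be dropped; it is cheap because it follows the primary key.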