Converting a MySQL Subquery to a JOIN for performance


I took a look at my old accounting system, and its performance is slowing down the daily work of the employees who use it. I traced the problem to a subquery. From what I have read and tested, a JOIN seems to be around 100x faster now that our databases hold a huge amount of data.

How can I convert this subquery into a JOIN?

I’m asking for help because I have been trying without success, and I’m starting to think that it is not possible.

$sql = "SELECT  orders.order_id, orders.order_time, orders.order_user,
        orders.payment_state, orders.order_state, orders.area_name,
        ( SELECT  COUNT(*)
            FROM  order_item
            WHERE  order_item.order_id = orders.order_id
        ) AS items_number
        FROM  orders
        WHERE  orders.order_state = 1
        AND  order_time BETWEEN DATE_SUB(NOW(), INTERVAL 365 DAY) AND NOW()";

To be specific, the query retrieves every row from the orders table that was created in the last year and has order_state = 1, together with the number of items purchased in each order. That number is returned by the subquery as items_number, which counts the rows in order_item whose order_id matches the order.

How to solve:


Method 1

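I would not do what the other answer (Method 2) suggests: it involves the "explode-implode" pattern, where the JOIN first multiplies the rows and the GROUP BY then collapses them again, which is even slower.

I would start by checking whether this index helps much: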

 orders:  INDEX(order_state, order_time)
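
The index notation above is shorthand. A minimal sketch of the corresponding statements, assuming the index does not exist yet; the name idx_state_time is a placeholder and does not come from the question:

```sql
-- Composite index: the equality column (order_state) first,
-- then the range column (order_time).
ALTER TABLE orders
    ADD INDEX idx_state_time (order_state, order_time);

-- Check whether the optimizer uses it for the original query.
EXPLAIN
SELECT  orders.order_id, orders.order_time,
        ( SELECT  COUNT(*)
            FROM  order_item
            WHERE  order_item.order_id = orders.order_id
        ) AS items_number
    FROM  orders
    WHERE  orders.order_state = 1
      AND  orders.order_time BETWEEN DATE_SUB(NOW(), INTERVAL 365 DAY) AND NOW();
```

In the EXPLAIN output, the key column of the orders row shows which index, if any, was chosen.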

Your correlated subquery is not necessarily inefficient in this case. (That 100X quote is based on too few examples to be trustworthy.)

The following query avoids the explode-implode and turns the subquery into a "derived table", which is similar but needs to be executed only once.

SELECT  o.order_id, o.order_time, o.order_user,
        o.payment_state, o.order_state, o.area_name,
        i.items_number
    FROM ( SELECT order_id,
                  COUNT(*) AS items_number
               FROM order_item
               GROUP BY order_id ) AS i
    JOIN orders AS o  ON o.order_id = i.order_id
    WHERE  order_state = 1
      AND  order_time >= NOW() - INTERVAL 365 DAY

If you need "zero" values for orders without items, a minor change can achieve that; let me know.
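
One possible form of that change, as a sketch rather than the answerer's own version: start from orders, LEFT JOIN the derived table, and turn the missing counts into 0 with COALESCE.

```sql
SELECT  o.order_id, o.order_time, o.order_user,
        o.payment_state, o.order_state, o.area_name,
        -- Orders with no rows in order_item get NULL from the LEFT JOIN;
        -- COALESCE reports them as 0.
        COALESCE(i.items_number, 0) AS items_number
    FROM orders AS o
    LEFT JOIN ( SELECT order_id,
                       COUNT(*) AS items_number
                    FROM order_item
                    GROUP BY order_id ) AS i  ON i.order_id = o.order_id
    WHERE  o.order_state = 1
      AND  o.order_time >= NOW() - INTERVAL 365 DAY;
```

The filters stay in WHERE because they apply to orders, the preserved side of the LEFT JOIN, so they do not remove the zero-item orders.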

The index above is needed here. Also, if you don’t already have order_id indexed in order_item, add that.
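
A short sketch of how the order_item index could be checked and added. The index name idx_order_id is a placeholder; if order_item already has a foreign key on order_id, InnoDB will normally have created a suitable index already.

```sql
-- List the existing indexes on order_item and look for one
-- whose first column is order_id.
SHOW INDEX FROM order_item;

-- If there is none, add one (placeholder name).
ALTER TABLE order_item
    ADD INDEX idx_order_id (order_id);
```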

Method 2

A simple way to do this would be like so:

SELECT ord.order_id, ord.order_time, ord.order_user, ord.payment_state, ord.order_state,
       ord.area_name, COUNT(itm.item_id) AS items_number
  FROM orders ord INNER JOIN order_item itm ON ord.order_id = itm.order_id
 WHERE ord.order_state = 1
   AND ord.order_time BETWEEN DATE_SUB(NOW(), INTERVAL 365 DAY) AND NOW()
 GROUP BY ord.order_id, ord.order_time, ord.order_user, ord.payment_state, ord.order_state,
          ord.area_name

This query assumes that every orders record has at least one order_item record, which makes sense; because of the INNER JOIN, an order with no items would simply be missing from the result. Be sure to change item_id to whatever the primary key is for the order_item table.

As an aside, if you would like the BETWEEN range to start at a fixed time of day, you can use DATE_FORMAT() so that the start is always 00:00:00 rather than the time of day at which the query was run:

and ord.order_time BETWEEN DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 365 DAY), '%Y-%m-%d 00:00:00') AND NOW()
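
An alternative sketch for the same midnight boundary, assuming order_time is a DATETIME or TIMESTAMP column: CURDATE() has no time part, so the lower bound compares as 00:00:00 without any string formatting. The column list is shortened here for brevity.

```sql
SELECT ord.order_id, ord.order_time
  FROM orders ord
 WHERE ord.order_state = 1
   -- CURDATE() - INTERVAL 365 DAY is a DATE, i.e. midnight 365 days ago.
   AND ord.order_time >= CURDATE() - INTERVAL 365 DAY
   AND ord.order_time <= NOW();
```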


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
