How to speed up query with exists clause?

All we need is an easy explanation of the problem, so here it is.

I have requirement to identify customers using multiple emails for the same billing address. There are 414375 records. I let the query run for 20 minutes and it still did not return. How do I speed it up?

select so.customer_email
from sales_order so
inner join sales_order_address soab on soab.parent_id = so.entity_id and soab.address_type ='billing'
where exists
(
    select 1 FROM sales_order so2
    inner join sales_order_address soab2 on soab2.parent_id = so2.entity_id and soab2.address_type ='billing'
    where so2.customer_email <> so.customer_email
    and soab.firstname = soab2.firstname
    and soab.lastname = soab2.lastname
    and soab.street = soab2.street
    and soab.city = soab2.city
    and soab.region = soab2.region
)

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

See if this gives what you need, but faster:

SET @@group_concat_max_len = 8000;

SELECT GROUP_CONCAT(customer_email SEPARATOR '  ')
    FROM sales_order AS so
    WHERE EXISTS ( SELECT 1 FROM sales_order_address 
                       WHERE address_type = 'billing'
                         AND parent_id = so.entity_id )
    GROUP BY first_name, last_name, street, city, region
    HAVING COUNT(*) > 1;

(The output won’t be formatted the same.)

Another possibility:

SELECT GROUP_CONCAT(customer_email SEPARATOR '  ')
    FROM ( SELECT so.first_name, so.last_name, so.street, so.city, so.region,
                  so.customer_email
            FROM sales_order AS so
            JOIN sales_order_address AS soa
            WHERE soa.address_type = 'billing'
              AND soa.parent_id = so.entity_id ) AS x
    GROUP BY first_name, last_name, street, city, region
    HAVING COUNT(*) > 1;

Indexes that may help:

sales_order_address:  INDEX(address_type, parent_id)
sales_order:          INDEX(entity_id)   -- unless it is the PRIMARY KEY

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply