Query with EXISTS-subquery taking about 280 ms in 80 % of cases, and less than 1 ms in 20 % of cases

All we need is an easy explanation of the problem, so here it is.

I noticed a query that seemed to take way too long most of the time, so I investigated it with the Laravel query logger (to get the time and the actual SQL from the query builder).

Excerpt:

>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::findAllByNameOrAlias('Frankfurt University of Applied Sciences'); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where (`name` = ? or exists (select * from `company_aliases` where `companies`.`id` = `company_aliases`.`company_id` and `name` = ?)) and `companies`.`deleted_at` is null",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 278.46,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::findAllByNameOrAlias('Frankfurt University of Applied Sciences'); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where (`name` = ? or exists (select * from `company_aliases` where `companies`.`id` = `company_aliases`.`company_id` and `name` = ?)) and `companies`.`deleted_at` is null",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.72,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::findAllByNameOrAlias('Frankfurt University of Applied Sciences'); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where (`name` = ? or exists (select * from `company_aliases` where `companies`.`id` = `company_aliases`.`company_id` and `name` = ?)) and `companies`.`deleted_at` is null",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.67,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::findAllByNameOrAlias('Frankfurt University of Applied Sciences'); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where (`name` = ? or exists (select * from `company_aliases` where `companies`.`id` = `company_aliases`.`company_id` and `name` = ?)) and `companies`.`deleted_at` is null",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 298.88,
     ],
   ]

Notice the query times.

The use case: I am looking for companies by name, but we want to be able to specify aliases that identify the same company. The companies table has an index on name, and the company_aliases table has an index on name, too. company_aliases.company_id has a foreign key constraint referencing companies.id.

Cherry on top: The table company_aliases is empty.
EXPLAIN for the above query:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY companies ALL companies_name_index NULL NULL NULL 241068 Using where
2 MATERIALIZED company_aliases ref company_aliases_company_id_foreign,company_aliases_name_index company_aliases_name_index 1022 const 1 Using index condition

Leaving out the company_aliases table and searching in companies only, gives sub 1 ms run times consistently.

>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.84,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.95,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.83,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.73,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.6,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.9,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.86,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.64,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.53,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.47,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.55,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.53,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.66,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.79,
     ],
   ]
>>> DB::flushQueryLog(); DB::enableQueryLog(); Company::where('name', 'Frankfurt University of Applied Sciences')->first(); DB::getQueryLog()
=> [
     [
       "query" => "select * from `companies` where `name` = ? and `companies`.`deleted_at` is null limit 1",
       "bindings" => [
         "Frankfurt University of Applied Sciences",
       ],
       "time" => 0.84,
     ],
   ]

A search in company_aliases is as fast as a search in companies, so it’s a viable workaround to query companies and, if no result was found, query company_aliases. The total query time is still below 2 ms. Essentially, I am just moving the lazy evaluation of the OR into my PHP code. But this shouldn’t be faster than executing the same logic entirely in the DB.

What’s the reason?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

"A search in company_aliases is as fast as a search in companies, so it’s a viable workaround to query companies and, if no result was found, query company_aliases. The total query time is still below 2 ms. Essentially, I am just moving the lazy evaluation of the OR into my PHP code. But this shouldn’t be faster than executing the same logic entirely in the DB."
That’s not the same logic. The query is saying give me the rows in companies that either have the same name as this value or can be found in company_aliases with this value. There’s nothing which tells it to stop when it finds a match.

You can expand the OR condition manually:

select * 
from  `companies` 
where  `name` = ? 
and    `companies`.`deleted_at` is null
union all
select * 
from  `companies` 
where  exists (select null 
               from   `company_aliases` 
               where  `companies`.`id` = `company_aliases`.`company_id` 
               and    `name` = ?
              )
and    `companies`.`deleted_at` is null
and   ( `name` <> ? or `name` is null)

This should give you a plan which uses an index on companies.name to satisfy the first part, and it should use an index on company_aliases.name followed by a loop to companies based on id for the second part. Assuming those indexes exist, you should be fine.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply