how to properly index a varchar column that won't use LIKE

All we need is an easy explanation of the problem, so here it is.

I’ve been experimenting with EXPLAIN and noticed that the row count was big for our query so I tried adding an index to the VARCHAR column. First I added FULLTEXT but it ended up just making the query longer so I checked it with EXPLAIN and it was still showing all the rows. but when I tried to do EXPLAIN on the same table but using another indexed column which is an INT the row count was only 1.

I also tried changing from FULLTEXT to just a normal INDEX but it still showed the same result.

Will VARCHAR always result in a high row count even if you INDEX it? Is there any way to improve its performance?

Also, the column is not unique.

Query Time Before FULLTEXT:

'query' => 'select `id` from `pdf_packages` where `private_reference_number` = ? and `pdf_packages`.`deleted_at` is null limit 1',
    'bindings' => 
    array (
      0 => '558376067',
    ),
    'time' => 64.42,

Query Time After FULLTEXT:

'query' => 'select `id` from `pdf_packages` where `private_reference_number` = ? and `pdf_packages`.`deleted_at` is null limit 1',
    'bindings' => 
    array (
      0 => '292314725',
    ),
    'time' => 129.89,

EXPLAIN WITH FULLTEXT:

how to properly index a varchar column that won't use LIKE

EXPLAIN WITH INDEX:

how to properly index a varchar column that won't use LIKE

EXPLAIN WITH INDEX INT COLUMN:

how to properly index a varchar column that won't use LIKE

I’ve already removed the INDEX for the column "private_reference_number" so you won’t see it below.

SHOW CREATE TABLE:

CREATE TABLE `pdf_packages` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `organiser_id` int(11) NOT NULL,
  `pro_event_id` int(11) NOT NULL,
  `pro_order_id` int(11) NOT NULL,
  `pro_order_item_id` int(11) NOT NULL,
  `package_id` int(11) NOT NULL,
  `package_item_item_id` int(11) NOT NULL,
  `private_reference_number` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `reference_index` int(11) NOT NULL,
  `pro_attendee_id` int(11) DEFAULT NULL,
  `member_card_id` int(11) DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `deleted_at` timestamp NULL DEFAULT NULL,
  `is_printed` int(11) NOT NULL DEFAULT '0',
  `has_arrived` tinyint(4) NOT NULL DEFAULT '0',
  `arrival_time` datetime DEFAULT NULL,
  `seat_details` longtext COLLATE utf8_unicode_ci,
  `svg_seat_details` longtext COLLATE utf8_unicode_ci,
  `is_shared` tinyint(1) NOT NULL DEFAULT '0',
  `share_from_id` int(11) DEFAULT NULL,
  `owner_id` int(11) DEFAULT NULL,
  `gates` text COLLATE utf8_unicode_ci,
  `pro_order_edit_history_id` int(11) DEFAULT NULL,
  `hard_ticket_sent` tinyint(4) NOT NULL DEFAULT '0',
  `hard_ticket_sent_on` datetime DEFAULT NULL,
  `sent_by` int(11) DEFAULT NULL,
  `pdf_sent` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `from_access` tinyint(1) NOT NULL DEFAULT '0',
  `pro_promo_code_id` int(11) DEFAULT NULL,
  `printed_at` datetime DEFAULT NULL,
  `bundle_index` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `pdf_packages_organiser_id_pro_event_id_pro_order_id_index` (`organiser_id`,`pro_event_id`,`pro_order_id`),
  KEY `pdf_packages_pro_order_item_id_package_id_index` (`pro_order_item_id`,`package_id`),
  KEY `pdf_packages_package_item_item_id_pro_attendee_id_index` (`package_item_item_id`,`pro_attendee_id`),
  KEY `pdf_packages_member_card_id_share_from_id_index` (`member_card_id`,`share_from_id`),
  KEY `pdf_packages_pro_order_edit_history_id_index` (`pro_order_edit_history_id`),
  KEY `pdf_packages_pro_attendee_id_index` (`pro_attendee_id`),
  KEY `pdf_packages_pro_order_item_id_index` (`pro_order_item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=184980 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

Given your query, I would add this index:

ALTER TABLE pdf_packages ADD INDEX (private_reference_number, deleted_at);

That matches the two conditions you have in your WHERE clause, so it should do the best job at reducing the number of examined rows. An index can help searching a VARCHAR column as well as an INT column.

However, if your query searches for a value that happens to occur on most of the rows in the table, the optimizer may reason that using the index isn’t going to help, so it just does a table-scan.

By analogy, this is the reason why common words aren’t included in the index at the back of a book. Why bother to list the word "the" in the index, followed by a list of every page number? It’s quicker to just read the book cover to cover.

In practice, MySQL’s optimizer skips the index if the value you search for occurs on about 20% of the rows or more. This is not a documented threshold, just something I’ve noticed.

If you think the optimizer has made a bad choice, you can override it by using an index hint. But most of the time, it makes the right choice.

A FULLTEXT index will be used only if you use the MATCH() AGAINST() predicate. I don’t see that in your query, so you don’t need a FULLTEXT index.

Method 2

(Focusing on the Title)

LIKE is useful in testing VARCHARs.

x LIKE 'abc'   -- this turns in to  x = 'abc'
x LIKE 'abc%'  -- will _probably_ use INDEX(x)
x LIKE '%abc'  -- cannot use INDEX(x)
x = 123        -- cannot use INDEX(x) because not comparing to string

The general rule about the last example: "Leading wildcard prevents use of an index."

FULLTEXT works only with MATCH(...) AGAINST(...)

(And I agree with Bill’s answer plus magicianiam’s Comment.)

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply