How to improve performance of a 12-million-row table?

All we need is an easy explanation of the problem, so here it is.

I have performance issues with queries on a big table (12 million records) based on GeoNames. It's a read-only database, so no DELETE, UPDATE or INSERT, only SELECT.

Every now and then I run queries that filter by columns that are not keys (latitude and longitude, fcode and country, just the name, etc.), and with my server resources they take more than 30 seconds to complete.

I have made views and small tables (clones of the big table containing only the data for one country) to see how to improve things.

With the views I get results similar to the big table, and using EXPLAIN I have seen that the views examine the same number of rows as the big table (12 million).

On one of the small tables the same queries take less than 200 milliseconds, more or less depending on the table size.

I’m not a database expert, but duplicating data into small tables feels awkward, and I’m not sure it’s the best approach here.

All queries are being sent from my backend, so no stored procedures.

Queries that filter by the primary key are blazing fast, though!

Thanks in advance for any advice!

UPDATE FOR COMMENTS

  • Table definition
CREATE TABLE `geoname` (
  `geonameid` INT,
  `name` VARCHAR(200),
  `asciiname` VARCHAR(200),
  `alternatenames` VARCHAR(4000),
  `latitude` DECIMAL(10,7),
  `longitude` DECIMAL(10,7),
  `fclass` VARCHAR(1),
  `fcode` VARCHAR(10),
  `country` VARCHAR(2),
  `cc2` VARCHAR(60),
  `admin1` VARCHAR(20),
  `admin2` VARCHAR(80),
  `admin3` VARCHAR(20),
  `admin4` VARCHAR(20),
  `population` INT,
  `elevation` INT,
  `gtopo30` INT,
  `timezone` VARCHAR(40),
  `moddate` DATE,
  PRIMARY KEY (geonameid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • What kind of indexes could I add there?

How to solve:

We know this problem is frustrating, so we are here to help! Take a deep breath and look at the explanation of your problem. There are several solutions below, but we recommend starting with the first method, as it is a tried-and-true approach.

Method 1

I think I have used that table. It is clunky when you want to look at states/provinces and other things like that.

Sure, break out a country if that is all you need, but don’t plan on breaking out all ~250 countries into separate tables (plus continents, etc.).

VIEWs are not performance enhancers; they can only hide the clumsy nature of a table like that one (especially due to the fcode checks).

This may help:

INDEX(fcode, country)

WHERE fcode LIKE 'PCL%' AND ...
WHERE fcode = 'ADM1' AND country = 'ES'
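
For reference, adding that index to the posted table would look something like this (the index name is just a placeholder; the column names are taken from the CREATE TABLE above):

-- Composite index covering the fcode + country filters
ALTER TABLE geoname ADD INDEX idx_fcode_country (fcode, country);

-- EXPLAIN should then show the index being used instead of a full table scan
EXPLAIN SELECT geonameid, name
FROM geoname
WHERE fcode = 'ADM1' AND country = 'ES';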

If you would care to provide the desired queries (not the views) and the desired table(s), I may be able to provide more suggestions.

Lat/lng searches

Lat/lng needs more work than simply a composite index. I suggest you start with a "bounding box" and these two composite indexes:

INDEX(latitude, longitude),
INDEX(longitude, latitude)

If such searches are not fast enough, then look at more complex methods in http://mysql.rjweb.org/doc.php/find_nearest_in_mysql
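
As a rough illustration of the bounding-box idea (the ±0.5 degree box and the example coordinates are placeholder values, not tuned ones):

-- The two composite indexes on the coordinate columns
ALTER TABLE geoname
  ADD INDEX idx_lat_lng (latitude, longitude),
  ADD INDEX idx_lng_lat (longitude, latitude);

-- Candidates inside a small box around the target point, closest first;
-- the optimizer can use whichever composite index filters better.
SELECT geonameid, name, latitude, longitude
FROM geoname
WHERE latitude  BETWEEN 40.4 - 0.5 AND 40.4 + 0.5
  AND longitude BETWEEN -3.7 - 0.5 AND -3.7 + 0.5
ORDER BY POW(latitude - 40.4, 2) + POW(longitude - (-3.7), 2)
LIMIT 10;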

Method 2

Hey, I made a couple of changes that have improved the overall performance of my SELECT queries on the big table.

As I said before, I’m only doing reads, so I looked around for a couple of tips.

I applied some of them and they worked!

  • Changed my default table engine to MyISAM; it had been set to InnoDB by default

  • Added the columns I use for filtering as KEYs (indexes) on the table; for example, asciiname, latitude and longitude are now keys in the geoname table

  • Limited the size of the result set with LIMIT; for example, if I know I’m looking for only one result, the database doesn’t need to walk all 12 million records, it can stop as soon as the limit is reached

  • I kept partitioning the big table into smaller per-country tables for focused country queries (a sketch of these changes follows this list)
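
As a sketch, the engine, index and LIMIT changes described above amount to statements along these lines (the index name and the 'Madrid' lookup value are placeholders):

-- Switch the table's storage engine (what the first bullet describes)
ALTER TABLE geoname ENGINE=MyISAM;

-- Index one of the filtering columns
ALTER TABLE geoname ADD INDEX idx_asciiname (asciiname);

-- Stop scanning as soon as a single match is found
SELECT geonameid, name, latitude, longitude
FROM geoname
WHERE asciiname = 'Madrid'
LIMIT 1;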

Although the performance of queries using latitude and longitude improved, I think it’s still not enough.

Some people said that for the coordinates I should maybe use the OpenGIS geometry model and spatial data types, which can represent polygons made of latitude and longitude points.

The idea would be to generate a table containing only geonameid and a spatially indexed point, look up coordinates in that table by calculating the polygon to match, and return the geonameid to join back against the big table.
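
A minimal sketch of that lookup-table idea, assuming a MySQL version that supports the spatial functions used below (the table, column and index names, and the example polygon, are placeholders):

-- Lookup table holding only the id and a spatial point
CREATE TABLE geoname_point (
  geonameid INT NOT NULL PRIMARY KEY,
  pt POINT NOT NULL,
  SPATIAL INDEX (pt)
) ENGINE=MyISAM;

-- Populate it from the big table
INSERT INTO geoname_point (geonameid, pt)
SELECT geonameid, POINT(longitude, latitude) FROM geoname;

-- Find ids inside a bounding polygon, then join back to the big table
SELECT g.*
FROM geoname_point p
JOIN geoname g USING (geonameid)
WHERE MBRContains(
        ST_GeomFromText('POLYGON((-4.2 39.9, -3.2 39.9, -3.2 40.9, -4.2 40.9, -4.2 39.9))'),
        p.pt);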

What do you think of that approach?

I provide users with a list of countries, so the country ISO code is already known; I use it to point to the appropriate country table.

I decided to partition it since I’m not worried about data growth, and it’s not really a CRUD database anyway.

Some of my queries now look like this and are blazing fast (a couple of hundred milliseconds compared to almost 20 seconds):

SELECT * FROM geonames.country_".strtoupper($country_iso)."
WHERE asciiname = ? AND alternatenames <>'' AND fclass = 'P'
GROUP BY country;
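
For comparison, along the lines of Method 1, the equivalent query against the single big table would look roughly like this, assuming a composite index such as INDEX(asciiname, fclass, country) exists (untested on my data):

-- Same filters on the one big table, relying on a composite index
SELECT *
FROM geoname
WHERE asciiname = ?
  AND country = ?
  AND fclass = 'P'
  AND alternatenames <> ''
LIMIT 1;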

Note: Use and implement Method 1, as that method has been fully tested on our system.
Thank you 🙂

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
