Poor Performance when using Spatial Indexes in MySQL

I'm trying a little experiment at pushing a data set which is not geo-spatial but fits it quite well and am finding the results somewhat unsettling. The data set is genomic data e.g. the Human Genome where we have a region of DNA where elements like genes occupy specific start and stop coordinates (our X axis). We have multiple regions of DNA (chromosomes) which occupy the Y axis. The goal is to bring back all the items which intersect two X coordinates along a single Y coordinate e.g. LineString(START 1, END 2).

The theory seemed sound so I pushed it into an existing MySQL based genome project and came up with a table structure like:

CREATE TABLE `spatial_feature` ( `spatial_feature_id` int(10) unsigned NOT NULL AUTO_INCREMENT, `external_id` int(10) unsigned NOT NULL, `external_type` int(3) unsigned NOT NULL, `location` geometry NOT NULL, PRIMARY KEY (`spatial_feature_id`), SPATIAL KEY `sf_location_idx` (`location`) ) ENGINE=MyISAM;

external_id represents the identifier of the entity we have encoded into this table & external_type encodes the source of this. Everything looked good and I pushed in some preliminary data (30,000 rows) which seemed to work well. When this increased past the 3 million row mark MySQL refused to use the spatial index and was slower when it was forced to use it (40 seconds vs. 5 seconds using a full table scan). When more data was added the index started to be used but the performance penalty persisted. Forcing the index off brought the query down to 8 seconds. The query I'm using looks like:

select count(*) from spatial_feature where MBRIntersects(GeomFromText('LineString(7420023 1, 7420023 1)'), location);

The data going into this is be very dense along the Y dimensions (think of it like you've recorded the position of every building, telephone box, post box and pigeon on a very long road). I've done tests of how R-Indexes behave with this data in Java as well as others in the field have applied them to flat-file formats with success. However no one has applied them to databases AFAIK which is the goal of this test.

Has anyone out there seen a similar behaviour when adding large quantities of data to a spatial model which is not very disparate along a particular axis? The problem persists if I reverse the coordinate usage. I'm running the following setup if that's a cause

  • MacOS 10.6.6
  • MySQL 5.1.46

Help!

Also bringing in explain plan in

+----+-------------+-----------------+------+-----------------+------+---------+------+---------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------------+------+-----------------+------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | spatial_feature | ALL | sf_location_idx | NULL | NULL | NULL | 3636060 | 33.33 | Using where | +----+-------------+-----------------+------+-----------------+------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.00 sec)

The re-written SQL looks like the following

select count(0) AS `count(*)` from `arabidopsis_thaliana_core_7_60_9`.`spatial_feature` where intersects(geometryfromtext('LineString(7420023 1, 7420023 1)'),`arabidopsis_thaliana_core_7_60_9`.`spatial_feature`.`location`)

Still not highlighting why this query's performance is so poor

After reading the article posted by @Fraser from rickonrails it seems like the problem is all to do with the index not being in memory. If I apply similar techniques to those mentioned in the article (making key buffer very big indeed) and I then force the query to use the index query times plumet. We still see a lag between querying a region & then searching for a subset of the region but it's all pointing to getting the load of the indexes correct.

What's the moral of the story? R-Indexes in MySQL have quite poor performance until they are in memory and then they have excellent performance. Not really a good solution for what I wanted to do wit them but still it provides an interesting angle on MySQL.

Thanks for all the help people.

-------------Problems Reply------------

From the EXPLAIN planwe see that although the spatial might be used for the query ('possible_keys' column) , it is not used (NULL in 'key' column). I am not sure why it is not selected automatically, but you may explicitly instruct MySql to use the index by specifying it in the query using a 'force index' clause:

select count(*)
from spatial_feature
force index (sf_location_idx) -- <== this is the 'force index' clause
where MBRIntersects(GeomFromText('LineString(7420023 1, 7420023 1)'), location);

The goal is to bring back all the items which intersect two X coordinates along a single Y coordinate

Have you considered using an index wit multiple fields? Like:

CREATE INDEX spacial_search ON spatial_feature(y, x)

If you are working with a limited set of y values this is the way to.

I have a degree in Genetics and I am a programmer, you don't need be using an X and a Y as your nomenclature it'll get far too faffy... you need a start and a stop position (not an "axis") and a chromosome number. You index by the chromosome number first then the position and then you index the position then the chromosome. (Question: Are you dealing in eukaryotes or chromosomes that can have two reading frames?)

EG: (where "x" = position and "y" = chromosome)

CREATE INDEX spatial_index_1 ON spatial_feature(chromosome, position);
CREATE INDEX spatial_index_2 ON spatial_feature(position, chromosome);

Incidentally Chromosomes are very long strings (just like data) you can (to speed things up dump them as such as blobs (i.e. coding genes and junk DNA)

Are you sure a relational database is the way to go? If I were you I'd look at pushing your datasets to Solr or Elastic Search (probably storing the master datasets elsewhere). These engines are built for indexing, you will notice the difference in response times.

Category:mysql Views:3 Time:2011-01-21

Related post

  • Error creating spatial index on MySql BLOB column 2009-08-15

    I am trying to add a spatial index to a table column named Location of type BLOB. If I try this: ALTER TABLE route ADD SPATIAL INDEX(Location); I get: Error: BLOB/TEXT column 'Location' used in key specification without a key length But in the offici

  • Spatial Index in MySQL - ERROR - Cannot get geometry object from data you send to the GEOMETRY field 2011-05-03

    I am new to the whole 'spatial index' thing, but it seems to be the best solution for filtering based on latitude/longitude. So I added a column to my table: So I created a geometry field: ALTER TABLE `addresses` ADD `point` POINT NOT NULL And then I

  • Poorly performing Mysql subquery - can I turn it into a Join? 2010-10-28

    I have a subquery problem that is causing poor performance... I was thinking that the subquery could be re-written using a join, but I'm having a hard time wrapping my head around it. The gist of the query is this: For a given combination of EmailAdd

  • Designing a normalize database for getting 5 nearest neighbor using MySQL spatial index 2011-03-11

    I will be using MySQL with spatial index for a database, then i will query for 5 nearest neighbors from a given point(lat/long), these points will be interest points in Dominican Republic. Once that i get the 5 nearest points i will need the Name of

  • How to use foreign keys and a spatial index inside a MySQL table? 2010-02-02

    Our project keeps world data base in tree-structure inside INNODB table of MySQL db. Earth is the root, then countries, then country regions and the locations are the leaves. A foreign-key is used to provide a fast access to children (for example cit

  • In MySQL, does putting SELECT foo increase in performance if foo is indexed? 2010-09-04

    In MySQL, does putting SELECT foo increase in performance if foo is indexed? At RedditMirror.cc, I have a database with 1.2 million records in the GrabbedSites table, a number that increases by approx 500-2000 per day. Early in my career, I was mento

  • MySQL spatial index doesn't work when equating MBRContains() to TRUE? 2010-10-08

    I have a SQL query that seems to be producing correct results but according to EXPLAIN isn't using the spatial index and so is taking much longer than necessary to return all the rows. SELECT * FROM listings2 WHERE MBRContains( GeomFromText('POLYGON(

  • How can I go about diagnosing a poor performing Rails application running on mysql? 2011-01-21

    I've taken over the maintenance of a Rails 2.3.4 application running on ruby 1.8.7 and mysql. The application is a modified Radiant CMS 0.8.1 site with custom extensions. Currently, the production environment is experiencing frequent periods of slow

  • SQL Server 2008 Performance on nullable geography column with spatial index 2011-02-10

    I'm seeing some strange performance issues on SQL Server 2008 with a nullable geography column with a spatial index. Each null value is stored as a root node within the spatial index. E.g. A table with 5 000 000 addresses where 4 000 000 has a coordi

  • Selecting a good SQL Server 2008 spatial index with large polygons 2010-05-27

    I'm having some fun trying to pick a decent SQL Server 2008 spatial index setup for a data set I am dealing with. The dataset is polygons, representing contours over the whole globe. There are 106,000 rows in the table, the polygons are stored in a g

  • Spatial index slowing down query 2012-03-14

    Background I have a table that contains POLYGONS/MULTIPOLYGONS which represent customer territories: The table contains roughly 8,000 rows Approximately 90% of the polygons are circles The remainder of the polygons represent one or more states, provi

  • indexes in mysql SELECT AS or using Views 2009-02-23

    I'm in over my head with a big mysql query (mysql 5.0), and i'm hoping somebody here can help. Earlier I asked how to get distinct values from a joined query http://stackoverflow.com/questions/508707/mysql-count-only-for-distinct-values-in-joined-que

  • Please can anyone explain How to read a Execution Plan for Poorly Performed Query in SQL Server? 2009-04-12

    Please can anyone explain How to read a Execution Plan for Poorly Performed Query in SQL Server? --------------Solutions------------- Check out some of the MANY resources on execution plans on the web, e.g.: Execution plan basics Graphical execution

  • Spatial Index Query in php 2009-06-18

    I'm trying to write a query in PHP for a spatial index. But I'm having trouble writing the query for the following line of code... SET @p = CONCAT('Polygon((',lat1,'',lon1,',',lat1,'',lon2,',',lat2,'',lon2,',',lat2,'',lon1,',',lat1,'',lon1,'))'); ---

  • Can I create a "Covering, Spatial" index in SQL Server 2008? 2009-08-28

    I currently have a site with a table that has Lat/Long float columns, and an index over those 2 columns plus another one I need to retrieve. I'm constantly querying this table to get the rows that fall within a radius from a certain point (I'm actual

  • What is a SPATIAL INDEX and when should I use it? 2010-02-13

    Like most of the average PHP web developers I use MySql as a RDBMS. MySql (as other RDBMS also) offers SPATIAL INDEX features, but I'm don't get it very well. I have googled for it but didn't find clear real world examples to clarify my bad knowledge

  • Google app engine: Poor Performance with JDO + Datastore 2010-04-17

    I have a simple data model that includes USERS: store basic information (key, name, phone # etc) RELATIONS: describe, e.g. a friendship between two users (supplying a relationship_type + two user keys) COMMENTS: posted by users (key, comment text, us

  • How to add indexes to MySQL tables? 2010-06-09

    I've got a very large MySQL table with about 150,000 rows of data. Currently, when I try and run SELECT * FROM table WHERE id = '1'; the code runs fine as the ID field is the primary index. However, recently for a development in the project, I have t

  • Is it possible to drop a primary key index in MySQL? 2010-08-13

    Is it possible to drop a primary key index in MySQL (version 5.1.47). This is to optimize the insert performance. --------------Solutions------------- If you're sure this is what you want to do (note Paul Tomblin's comment on your question), then fro

Copyright (C) dskims.com, All Rights Reserved.

processed in 0.087 (s). 11 q(s)