Saturday, May 11, 2013

MySQL & GIS & The Haversine formula and distance between two points.

MySQL is not the database that comes to mind first when people think of GIS.  The databases listed below are:

  • Oracle
  • Microsoft SQL Server
  • IBM DB2
  • IBM Informix
  • PostgreSQL

http://webhelp.esri.com/arcgisserver/9.3/java/index.htm#geodatabases/types_of_geodatabases.htm
http://webhelp.esri.com/arcgisserver/9.3/java/index.htm#geodatabases/determ-1479045992.htm


MySQL has been working with GIS for sometime now, it goes back to MySQL 4.1:
Mark Maunder's post above is a great look at GIS. This blog post will focus more on formulas but I wanted to point out Mark's nice post.

The use of a SPATIAL index works with the MyISAM storage engine, so if you are on MySQL 5.5 or above keep that in mind as Innodb is the default storage engine.

> CREATE TABLE geom (
-> lat float(10,7) NOT NULL,
-> lon float(10,7) NOT NULL,
-> g GEOMETRY NOT NULL,
-> SPATIAL INDEX(g)
-> ) ENGINE=Innodb;
ERROR 1464 (HY000): The used table type doesn't support SPATIAL indexes
> CREATE TABLE geom (
-> lat float(10,7) NOT NULL,
-> lon float(10,7) NOT NULL,
-> g GEOMETRY NOT NULL,
-> SPATIAL INDEX(g)
-> ) ENGINE=MyISAM;

So with Mark's schema design I populated the lat/long of some cities in China.
This data was gathered via here http://www.infoplease.com/ipa/A0001769.html

CREATE TABLE china (
cityname varchar(50) DEFAULT NULL,
lat float(10,7) NOT NULL,
lon float(10,7) NOT NULL,
g GEOMETRY NOT NULL,
SPATIAL INDEX(g)
) ENGINE=MyISAM;

INSERT INTO china VALUES ('Beijing', 39.55, 116.25, GeomFromText('POINT(39.55 116.25)'));
INSERT INTO china VALUES ('Canton', 23.7, 113.15, GeomFromText('POINT(23.7 113.15)'));
INSERT INTO china VALUES ('Chongqing', 29.46, 106.34, GeomFromText('POINT(29.46, 106.34)'));
INSERT INTO china VALUES ('Hong Kong', 22.20, 114.11, GeomFromText('POINT( 22.20 114.11)'));
INSERT INTO china VALUES ('Shanghai', 31.10, 121.28, GeomFromText('POINT(31.10 121.28)')); 

Just to make sure it all worked....

select lat, lon from china;
+------------+-------------+
| lat        | lon         |
+------------+-------------+
| 39.5499992 | 116.2500000 |
| 23.7000008 | 113.1500015 |
| 22.2000008 | 114.1100006 |
| 31.1000004 | 121.2799988 |
+------------+-------------+


So lets us check the distance from Beijing to Hong Kong.
A quick look online tells us it is 1963 km but let us check our data.


Now it was time to dig into the world of distance formulas..  I am not a GIS focused DBA, I admit that no problem. I enjoyed more of the range of answers to the math formulas I encountered, otherwise I would have used the table information above more. Instead it is just a reference for you.  I took at look at the widely debated ways to calculate distance between two lat/long points. You will find a lot of different functions, procedures and etc online to calculate this.

First I set some variables because I wanted to test the formulas

SET @lat1 =39.55;
SET @long1 =116.25;
SET @lat2 =22.20;
SET @long2 =114.11;

For example:

This website has a function for Distance. I have added it below to ensure that you can cut and paste it with the Delimiter options in place for you. But does it work?  BTW I also updated the radius of the earth value to 3959.
http://www.sqlexamples.info/SPAT/mysql_distance.htm

delimiter //
CREATE FUNCTION fn_distance
(p_x1 FLOAT, p_y1 FLOAT, p_x2 FLOAT, p_y2 FLOAT)
RETURNS FLOAT
DETERMINISTIC
BEGIN
DECLARE v_dist FLOAT;
DECLARE A FLOAT; DECLARE B FLOAT;
DECLARE C FLOAT; DECLARE D FLOAT;
/*
returns distance calculation between two points in LAT-LONG coordinates
*/

SET v_dist = 0;

-- convert to radians
SET A = p_x1 / 57.29577951;
SET B = p_y1 / 57.29577951;
SET C = p_x2 / 57.29577951;
SET D = p_y2 / 57.29577951;

IF (A = C && B = D) THEN
SET v_dist = 0;
ELSEIF ((sin(A)*sin(C)+cos(A)*cos(C)*cos(B - D)) > 1) THEN
SET v_dist = 3959 * acos(1);
ELSE
SET v_dist = 3959 *acos(sin(A)*sin(C) + cos(A)*cos(C)*cos(B - D));
END IF;

SET v_dist = v_dist * 1.609;

/* return distance in km. */
RETURN v_dist;

END;
//
delimiter ;

> SELECT fn_distance (@lat1, @long1, @lat2 , @long2) AS dist_km;
+-------------------+
| dist_km |
+-------------------+
| 1939.5457763671875 |
+-------------------+

Another query test shows

> SELECT ( GLength( LineString(( PointFromWKB( POINT( @lat1, @long1 ))), ( PointFromWKB( POINT( @lat2, @long2 ) ))))) * 100 AS distance;
+--------------------+
| distance |
+--------------------+
| 1748.1478770401545 |
+--------------------+

Yet another found here http://www.posteet.com/view/1555 :

set log_bin_trust_function_creators=TRUE;
DELIMITER |
CREATE FUNCTION GeoDistKM( lat1 FLOAT, lon1 FLOAT, lat2 FLOAT, lon2 FLOAT ) RETURNS float
BEGIN
DECLARE pi, q1, q2, q3 FLOAT;
DECLARE rads FLOAT DEFAULT 0;
SET pi = PI();
SET lat1 = lat1 * pi / 180;
SET lon1 = lon1 * pi / 180;
SET lat2 = lat2 * pi / 180;
SET lon2 = lon2 * pi / 180;
SET q1 = COS(lon1-lon2);
SET q2 = COS(lat1-lat2);
SET q3 = COS(lat1+lat2);
SET rads = ACOS( 0.5*((1.0+q1)*q2 - (1.0-q1)*q3) );
RETURN 6378.388 * rads;
END;
|
DELIMITER ;
select geodistkm(
@lat1, @long1, @lat2 , @long2) as distance;
+----------------------------------------+
| distance |
+----------------------------------------+
| 1942.0909423828125 |
+----------------------------------------+


Yet this is still wrong.  Because the distance from Beijing to Hong King is 1224.9 miles with his 1963km, according to this website: http://www.timeanddate.com/worldclock/distances.html?n=102 .  So the first and last results are very close.

So after wasting a lot of time, yes I do admit that, I still wanted a result using mathematical formulas that I could easily compare against.

The great circle distance d between two points with coordinates {lat1,lon1} and {lat2,lon2} is given by:
d=acos(sin(lat1)*sin(lat2)+cos(lat1)*cos(lat2)*cos(lon1-lon2))
> SELECT ACOS(
-> SIN(@lat1) * SIN(@lat2) + COS(@lat1) * COS(@lat2) * COS( @long1 - @long2 )
-> ) *1000 as Aviation_forumula_DISTANCE;
+----------------------------+
| Aviation_forumula_DISTANCE |
+----------------------------+
| 1923.0473470093848 |
+----------------------------+


Another formula is the Haversine Formula
> SELECT 3956* 2 * ASIN ( SQRT (POWER(SIN((@lat1 - @lat2)*pi()/180 / 2),2) + COS(@lat1 * pi()/180) * COS(@lat2 *pi()/180) * POWER(SIN((@long1 - @long2) *pi()/180 / 2), 2) ) ) as Haversine_Formula_distance;
+----------------------------+
| Haversine_Formula_distance |
+----------------------------+
| 1204.5222518763514 |
+----------------------------+

Once again you see very different results. I expected better results with the Haversine formula. 
So after  playing with these formulas I went with the GeoDistKM function because it appears to me to be the closest to the  Haversine formula which I believe to be the correct formula to use when implemented correctly. This is followed by a close second to the Aviation formula I wrote based on Williams information. 

While 1942 is not the 1963 result I gathered via the online search, who is to say that they calculated it correctly as well. The curvature of the earth and the lat,lon coming closer together at the poles will allow for some errors in different formulas. So I will stick with this currently:

select geodistkm(@lat1, @long1, @lat2 , @long2) as distance;
+----------------------------------------+
| distance |
+----------------------------------------+
| 1942.0909423828125 |
+----------------------------------------+




As an additional note:
Look into MariaDB if you are curious about more options with GIS  https://kb.askmonty.org/en/gis-functionality/