Monday, January 27, 2014

Use your Index even with a varchar || char

I recently noticed a post on the forums.mysql.com site: "How to fast search in 3 millions record?"
The example given used a LIKE '%eed' pattern.

That will not take advantage of an index and will do a full table scan.
Below is an example using the world database, so not 3 million records, but it is enough to show how this works.

> explain select * from City where Name LIKE '%dam' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4188
        Extra: Using where
1 row in set (0.01 sec)

[world]> select count(*) FROM City;
+----------+
| count(*) |
+----------+
|     4079 |
+----------+

> select * from City where Name LIKE '%dam';
+------+------------------------+-------------+----------------+------------+
| ID   | Name                   | CountryCode | District       | Population |
+------+------------------------+-------------+----------------+------------+
|    5 | Amsterdam              | NLD         | Noord-Holland  |     731200 |
|    6 | Rotterdam              | NLD         | Zuid-Holland   |     593321 |
| 1146 | Ramagundam             | IND         | Andhra Pradesh |     214384 |
| 1318 | Haldwani-cum-Kathgodam | IND         | Uttaranchal    |     104195 |
| 2867 | Tando Adam             | PAK         | Sind           |     103400 |
| 3122 | Potsdam                | DEU         | Brandenburg    |     128983 |
+------+------------------------+-------------+----------------+------------+
To show the point further:

> explain select * from City where Name LIKE '%dam%' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4188
        Extra: Using where
1 row in set (0.00 sec)

> select * from City where Name LIKE '%dam%';
+------+------------------------+-------------+----------------+------------+
| ID   | Name                   | CountryCode | District       | Population |
+------+------------------------+-------------+----------------+------------+
|    5 | Amsterdam              | NLD         | Noord-Holland  |     731200 |
|    6 | Rotterdam              | NLD         | Zuid-Holland   |     593321 |
|  380 | Pindamonhangaba        | BRA         | São Paulo      |     121904 |
|  625 | Damanhur               | EGY         | al-Buhayra     |     212203 |
| 1146 | Ramagundam             | IND         | Andhra Pradesh |     214384 |
| 1318 | Haldwani-cum-Kathgodam | IND         | Uttaranchal    |     104195 |
| 1347 | Damoh                  | IND         | Madhya Pradesh |      95661 |
| 2867 | Tando Adam             | PAK         | Sind           |     103400 |
| 2912 | Adamstown              | PCN         | –              |         42 |
| 3122 | Potsdam                | DEU         | Brandenburg    |     128983 |
| 3177 | al-Dammam              | SAU         | al-Sharqiya    |     482300 |
| 3250 | Damascus               | SYR         | Damascus       |    1347000 |
+------+------------------------+-------------+----------------+------------+
12 rows in set (0.00 sec)

The explain output above shows that no indexes are being used.

CREATE TABLE `City` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB

So for grins, let us put a key on the Name column. Notice I do not index the entire column, just the first few characters. This is of course dependent on your data.
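
Before adding the key, a quick way to gauge how long the prefix needs to be is to compare how many distinct values each candidate prefix length keeps. This is only a rough sketch; the closer a prefix count gets to the full distinct count, the more selective the prefix index will be:

-- compare the selectivity of a few candidate prefix lengths on Name
SELECT COUNT(DISTINCT LEFT(Name, 3))  AS prefix3,
       COUNT(DISTINCT LEFT(Name, 5))  AS prefix5,
       COUNT(DISTINCT LEFT(Name, 10)) AS prefix10,
       COUNT(DISTINCT Name)           AS full_name
FROM City;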

> ALTER TABLE City ADD KEY name_key(`Name`(5));
Query OK, 0 rows affected (0.54 sec)

CREATE TABLE `City` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  KEY `name_key` (`Name`(5)),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB 

So will this even matter?

> explain select * from City where Name LIKE '%dam' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4188
        Extra: Using where
1 row in set (0.00 sec)

No, it will not matter, because the leading wildcard in LIKE '%dam' forces a full table scan regardless.

> EXPLAIN select * from City where Name LIKE '%dam' AND CountryCode = 'IND' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: ref
possible_keys: CountryCode
          key: CountryCode
      key_len: 3
          ref: const
         rows: 341
        Extra: Using index condition; Using where
1 row in set (0.00 sec)

Notice the difference in the explain output above. This query is using an index. It is not the index on Name, but the CountryCode index. So how can you take advantage of the index on the Name column?

> EXPLAIN select * from City where Name LIKE 'Ra%'  \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: range
possible_keys: name_key
          key: name_key
      key_len: 5
          ref: NULL
         rows: 35
        Extra: Using where
1 row in set (0.00 sec)

 The above query will use the name_key index.

The point is that you have to be careful how you write your SQL queries and ensure that you run explains to find the best index choice for your query.
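
For completeness, one workaround for suffix searches like the '%eed' example that started this post, and one not covered above, is to keep a reversed copy of the column and index that, so the wildcard moves to the end of the pattern. This is only a sketch: the NameRev column and name_rev_key index are made-up names, and you would have to keep the reversed column in sync yourself with triggers or application code.

ALTER TABLE City ADD COLUMN NameRev CHAR(35) NOT NULL DEFAULT '';
UPDATE City SET NameRev = REVERSE(Name);
ALTER TABLE City ADD KEY name_rev_key (NameRev(5));

-- A suffix search on Name becomes a prefix search on NameRev,
-- so '%dam' turns into 'mad%' and the range access on name_rev_key applies.
SELECT * FROM City WHERE NameRev LIKE 'mad%';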

Sunday, January 19, 2014

Can MySQL Replication catch up

So replication was recently improved in MySQL 5.6. However, people are still using 5.1 and 5.5, so some of those improvements will have to wait to hit the real world.

I recently helped move in this direction with a geo-located replication solution. One part of the country had a MySQL 5.1 server and the other part of the country had a new MySQL 5.6 server installed.

After dealing with the issues of getting the initial data backup from the primary to the secondary server (it took several hours, to say the least), I had to decide whether replication could catch up and keep up. The primary server had some big queries, and optimization is always a good place to start, but I had to get the secondary server pulling and applying as fast as I could first.

So here are a few things to check and keep in mind when it comes to replication. I have added some links below that help support my thoughts as I worked on this.

Replication can be very I/O heavy, depending on your application. A blog site does not have that many writes, so its replication I/O is light, but a heavily written and updated primary server is going to lead to a secondary writing a lot of relay logs, and binary logs if they are enabled. Binary logs can be enabled on the secondary to allow you to run backups there, or because you want that server to act as a primary to others.

I split the logs onto a different partition from the data directory.
This is set in the my.cnf file with the relay-log option.
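
As a rough sketch (the paths below are made up for illustration), the relevant my.cnf entries on the secondary would look something like this:

[mysqld]
datadir         = /data/mysql
# relay logs written to a separate partition from the data directory
relay-log       = /logs/mysql/relay-bin
relay-log-index = /logs/mysql/relay-bin.index
# only if binary logging is enabled on the secondary (backups, chaining)
# log-bin       = /logs/mysql/bin-log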

The InnoDB buffer pool was already set to a value over 10GB, which was plenty for this server.
The server was still over 90,000 seconds behind.

So I started to make some adjustments to the server and ended up with the settings below. Granted, every server is different.

mysql> select @@sync_relay_log_info\G
*************************** 1. row ***************************
@@sync_relay_log_info: 0
1 row in set (0.08 sec)

mysql>  select @@innodb_flush_log_at_trx_commit\G
*************************** 1. row ***************************
@@innodb_flush_log_at_trx_commit: 2
1 row in set (0.00 sec)

mysql> select @@log_slave_updates\G
*************************** 1. row ***************************
@@log_slave_updates: 0

mysql> select @@sync_binlog\G
*************************** 1. row ***************************
@@sync_binlog: 0
1 row in set (0.00 sec)

mysql> select @@max_relay_log_size\G
*************************** 1. row ***************************
@@max_relay_log_size: 268435456
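
For reference, persisting the values shown above across a restart would mean my.cnf entries roughly like these (a sketch based on the variables above, not a recommendation for every server):

[mysqld]
sync_relay_log_info            = 0
innodb_flush_log_at_trx_commit = 2
log_slave_updates              = 0
sync_binlog                    = 0
max_relay_log_size             = 268435456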

I turned binary logging off as I monitored different settings and options to help replication catch up. It took a while, and some of the settings you see above may or may not have been applied yet as I worked within this time frame, but it did catch up to 0 seconds behind. You might notice that a lot of these settings relate to binary logging, so I ran a little test: I restarted with the binary logs enabled, checked in on the server later, and found it 10,000+ seconds behind. I then restarted once more with the binary logs disabled, and it caught up (0 seconds behind) with the primary server in under 15 minutes. I used Aurimas' tool to watch it catch up as well. If you have not used it before, it is a very nice and handy tool.

What this all means is that the primary server must remain the ACID-compliant one; with this setup the secondary also depends on the OS for caching and cleanup. This server will be used primarily as a read server to feed information to others. It also means that yes, geo-located replication can stay up to date with a primary server.

What if you need to stop the slave? Will it still catch up quickly?

How and why are you stopping the slave? That is my first response. You should get into the habit of using STOP SLAVE SQL_THREAD; instead of STOP SLAVE;. This keeps the I/O thread running, so the relay logs continue to gather events from the primary; they just are not applied on the secondary. If you can take advantage of that, it will help reduce the time it takes to populate the relay logs later.
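
For example:

-- stop only the applier; the I/O thread keeps pulling events into the relay logs
STOP SLAVE SQL_THREAD;

-- ... do whatever required the stop ...

-- resume applying and watch the secondary catch up
START SLAVE SQL_THREAD;
SHOW SLAVE STATUS\G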

Some additional reading for you:

Friday, January 3, 2014

Hard work that goes unnoticed ....

I took a moment today and updated one of my Linux distributions.  In this distribution I happen to have Percona 5.6 installed as the MySQL Database. I have mentioned before how you can set up your choice of MySQL via a Yum repository.

My point here, though, is this: how do we ever thank these people for all the work they do?

Many of these repositories are run by companies, and these people get paid for what they do. Yet, through surveys and general observation of (and questions from) the Linux (including Debian/Ubuntu) community, a majority of people will not upgrade until it is available in their distribution. I happen to be one who wants to stay on top of security and bug fixes, so I have the yum repository update from the source as often as possible.

My point is, a lot of work goes into packaging these files for distribution, and for the most part it looks like a pretty thankless job. I recall the older (not old, but older) days of tar and gzip, when you had to dig and find the dependencies yourself. -- ./configure.. nope, need something else, go download and install that, then try again.....

I just upgraded 25 different packages in a few moments, which would have taken some time before. While yum and apt-get are far from new, and I sound like an old timer here, I just thought it might be nice to say thanks to all of the people who work behind the scenes to make all of our Linux experiences, let alone the related MySQL installs, easier and smoother.

I will point out that Oracle does have 5.6 packages now available.


I recall that my previous post mentioned how they had not been available yet.

Wednesday, January 1, 2014

A MySQL DBA looks at PostgreSQL part3 PostgreSQL To MySQL

So I recently posted: A MySQL DBA looks at PostgreSQL and part 2:  MySQL to PostgreSQL.

This post will explore the migration from PostgreSQL to MySQL. Once again, the long-term goal with these posts is to show how the data works within the different databases, as well as how to solve similar issues in each database as examples arise.

MySQL pushes MySQL Workbench as the tool for database migration. I have to admit that I am curious why MySQL Utilities does not provide a command-line option. The previous blog post (part 2) showed how easy the migration via the command line was for MySQL to PostgreSQL. Keep in mind, though, that when bringing data back to MySQL the storage engine has to be considered.

To put it simply, if you are going to bring data back into MySQL from PostgreSQL, a fast option is likely MySQL Workbench. But that is not an overall solution, since we often prefer to stay within our terminal windows. So you will be doing the following:
  • Dump the Schema from PostgreSQL into a file
    • Review and edit the file for MySQL. 
  • Per desired Schema and tables export out as a csv file. 
  • Import back into MySQL

Anyway, first, we still have the data in PostgreSQL from the World database example.

world=> \dt
              List of relations
 Schema |      Name       | Type  |  Owner
--------+-----------------+-------+----------
 public | city            | table | testuser
 public | country         | table | testuser
 public | countrylanguage | table | testuser
(3 rows)

world=> select count(ID) from City;
 count
-------
  4079
(1 row)

world=>


Dump out the schema:

$ pg_dump -s  world > world_schema.pgsql


Using pg_dump, take the --data-only and --inserts options to build a standard SQL file of the data.

pg_dump --data-only --inserts world > world_data.pgsql

You will see later that doing a dump per table would have been better, but this works; a per-table version is shown below.
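
For example, pg_dump can limit the dump to a single table with --table, which produces the per-table files used further down:

$ pg_dump --data-only --inserts --table=city world > world_data_city.pgsql
$ pg_dump --data-only --inserts --table=country world > world_data_country.pgsql
$ pg_dump --data-only --inserts --table=countrylanguage world > world_data_countrylanguage.pgsql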

Create a database in MySQL to place the data back into, as well as to test your new schema.

mysql> CREATE DATABASE world_back;
Query OK, 1 row affected (0.01 sec)


Edit your schema file: vi world_schema.pgsql
With the new MySQL database created, you can test the converted tables as you go.


CREATE TABLE city (
    id integer DEFAULT nextval('city_id_seq'::regclass) NOT NULL,
    name character(35) DEFAULT ''::bpchar NOT NULL,
    countrycode character(3) DEFAULT ''::bpchar NOT NULL,
    district character(20) DEFAULT ''::bpchar NOT NULL,
    population integer DEFAULT 0 NOT NULL
);

CREATE TABLE country (
    code character(3) DEFAULT ''::bpchar NOT NULL,
    name character(52) DEFAULT ''::bpchar NOT NULL,
    continent character varying DEFAULT 'Asia'::character varying NOT NULL,
    region character(26) DEFAULT ''::bpchar NOT NULL,
    surfacearea double precision DEFAULT 0::double precision NOT NULL,
    indepyear smallint,
    population integer DEFAULT 0 NOT NULL,
    lifeexpectancy double precision,
    gnp double precision,
    gnpold double precision,
    localname character(45) DEFAULT ''::bpchar NOT NULL,
    governmentform character(45) DEFAULT ''::bpchar NOT NULL,
    headofstate character(60) DEFAULT NULL::bpchar,
    capital integer,
    code2 character(2) DEFAULT ''::bpchar NOT NULL,
    CONSTRAINT country_continent_check CHECK (((continent)::text = ANY ((ARRAY['Asia'::character varying, 'Europe'::character varying, 'North America'::character varying, 'Africa'::character varying, 'Oceania'::character varying, 'Antarctica'::character varying, 'South America'::character varying])::text[])))
);
ALTER TABLE ONLY city
    ADD CONSTRAINT city_pkey PRIMARY KEY (id);

CREATE INDEX city_countrycode_idx ON city USING btree (countrycode);


You will need to review the file for all related keys, and you will need to understand MySQL, so you can turn these into valid CREATE TABLE statements.


CREATE TABLE city (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` char(35) NOT NULL DEFAULT '',
  `countrycode` char(3) NOT NULL DEFAULT '',
  `district` char(20) NOT NULL DEFAULT '',
  `population` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
);

CREATE TABLE `country` (
  `code` char(3) NOT NULL DEFAULT '',
  `name` char(52) NOT NULL DEFAULT '',
  `continent`  char(5) NOT NULL DEFAULT '',
  `region` char(26) NOT NULL DEFAULT '',
  `surfaceArea` float(10,2) NOT NULL DEFAULT '0.00',
  `indepyear` smallint(6) DEFAULT NULL,
  `population` int(11) NOT NULL DEFAULT '0',
  `lifeexpectancy` float(3,1) DEFAULT NULL,
  `gnp` float(10,2) DEFAULT NULL,
  `gnpold` float(10,2) DEFAULT NULL,
  `localname` char(45) NOT NULL DEFAULT '',
  `governmentform` char(45) NOT NULL DEFAULT '',
  `headofstate` char(60) DEFAULT NULL,
  `capital` int(11) DEFAULT NULL,
  `code2` char(2) NOT NULL DEFAULT '',
  PRIMARY KEY (`code`)
);

It is of course up to you, but once you work out the PRIMARY KEY per table, I would create ALTER statements to apply the remaining changes to the new schema so you can ensure you catch everything. While most of it could be added directly into the first CREATE statement as you process the PostgreSQL dump file, making ALTERs can keep you in check.

Some examples of the needed Alter statements:

ALTER TABLE city ENGINE=InnoDB;
ALTER TABLE country ENGINE=InnoDB;
ALTER TABLE countrylanguage ENGINE=InnoDB;

ALTER TABLE country DROP  continent;
ALTER TABLE country ADD continent enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') NOT NULL DEFAULT 'Asia' AFTER name;

ALTER TABLE city ADD KEY `countrycode` (`countrycode`);
ALTER TABLE city ADD CONSTRAINT `city_ibfk_1` FOREIGN KEY (`countrycode`) REFERENCES `country` (`code`);


Once all of your schema is updated and valid, you can put the saved data back.

Edit world_data.pgsql (vi world_data.pgsql) to remove the SET statements at the top:
--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

SET search_path = public, pg_catalog;
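
If you would rather not edit the file by hand, stripping those SET lines with sed is one option (shown here as a sketch; keep a copy of the original file first):

$ sed -i '/^SET /d' world_data.pgsql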

Copy the data file out per table, needed in this case because of the CONSTRAINTs, and edit each copy so that it only contains the data for its table. I should have just dumped it that way in the first place, or dumped again per table.

$ cp world_data.pgsql world_data_city.pgsql
$ cp world_data.pgsql world_data_countrylanguage.pgsql
$ cp world_data.pgsql world_data_country.pgsql

$ mysql -u root -p world_back < world_data_country.pgsql
Enter password:
$ mysql -u root -p world_back < world_data_countrylanguage.pgsql
Enter password:
$ mysql -u root -p world_back < world_data_city.pgsql


So, simply put, it is not as easy (I should say, not as automated) to migrate into MySQL via the command line, because the schema changes will require your attention, but it can be done.

mysql> select count(id) from city;
+-----------+
| count(id) |
+-----------+
|      4079 |
+-----------+
1 row in set (0.14 sec)

MySQL Workbench's database migration can of course handle the same process, and you can learn more about that tool here - http://www.mysql.com/products/workbench/migrate/