Thursday, July 4, 2013

UPDATE OPTIONS with LIKE REGEXP SUBSTRING and LOCATE

A recent forum post made me stop and think for a moment..
http://forums.mysql.com/read.php?10,589573,589573#msg-589573

The problem was the user wanted to update just the word audi and not the word auditor.
It was solved by taking advantage of the period easily once I stopped trying to use SUBSTRING and LOCATE.  They wanted a quick and easy fix after all.


root@localhost [test]> CREATE TABLE `forumpost` (
    ->   `name` varchar(255) DEFAULT NULL
    -> ) ENGINE=InnoDB;

root@localhost [test]> insert into forumpost value ('An auditor drives an audi.'),('An auditor drives a volvo.');

root@localhost [test]> select * from forumpost;
+----------------------------+
| name                       |
+----------------------------+
| An auditor drives an audi. |
| An auditor drives a volvo. |
+----------------------------+


So now let us update it the quick and easy way by taking advantage of the period

root@localhost [test]>UPDATE forumpost SET name = REPLACE(name, 'audi.', 'toyota.') WHERE name LIKE '%audi.';
Query OK, 1 row affected (0.20 sec)
Rows matched: 1  Changed: 1  Warnings: 0

root@localhost [test]> select * from forumpost;
+------------------------------+
| name                         |
+------------------------------+
| An auditor drives an toyota. |
| An auditor drives a volvo.   |
+------------------------------+


But... what about the valid options of SUBSTRING and LOCATE.....


root@localhost [test]> insert into forumpost value ('An auditor drives an audi.');
root@localhost [test]> insert into forumpost value ('An auditor drives an audi car');
root@localhost [test]> select * from forumpost;
+-------------------------------+
| name                          |
+-------------------------------+
| An auditor drives an toyota.  |
| An auditor drives a volvo.    |
| An auditor drives an audi.    |
| An auditor drives an audi car |
+-------------------------------+


First test your options so you make sure you can find what you are after..


root@localhost [test]> SELECT * FROM forumpost WHERE name REGEXP 'audi car$';
+-------------------------------+
| name                          |
+-------------------------------+
| An auditor drives an audi car |
+-------------------------------+

root@localhost [test]> SELECT * FROM forumpost WHERE name LIKE '%audi car%';
+-------------------------------+
| name                          |
+-------------------------------+
| An auditor drives an audi car |
+-------------------------------+


That really did not do too much since we just changed the period for the word car.  So keep going....

We need to pull just the word audi from the line with audi car.

root@localhost [test]> SELECT SUBSTRING(name,-8,4), name FROM forumpost WHERE SUBSTRING(name,-8,4) = 'audi';
+----------------------+-------------------------------+
| SUBSTRING(name,-8,4) | name                          |
+----------------------+-------------------------------+
| audi                 | An auditor drives an audi car |
+----------------------+-------------------------------+


The SUBSTRING allowed me to pull the first 4 characters after I counted back 8 characters from the end.

So what if you do not know the location of the characters?
To start with you should review you data to make sure you know what you are after. But the characters might move around your string so lets work with LOCATE.

I will add another row just for tests.

root@localhost [test]>  insert into forumpost value ('An auditor drives an red audi car');
Query OK, 1 row affected (0.04 sec)

root@localhost [test]> select * from forumpost;
+------------------------------------+
| name                               |
+------------------------------------+
| An auditor drives an toyota.       |
| An auditor drives a volvo.         |
| An auditor drives an audi.         |
| An auditor drives an audi car      |
| An auditor drives an audi blue car |
| An auditor drives an red audi car  |
+------------------------------------+


So regardless of the ending we can see that audi always after auditor so we just need to skip over that word. The word auditor is in the first 8 character so skip those.

root@localhost [test]> SELECT LOCATE('audi', name,8), name FROM forumpost WHERE LOCATE('audi', name,8) > 0 ;
+------------------------+------------------------------------+
| LOCATE('audi', name,8) | name                               |
+------------------------+------------------------------------+
|                     22 | An auditor drives an audi.         |
|                     22 | An auditor drives an audi car      |
|                     22 | An auditor drives an audi blue car |
|                     26 | An auditor drives an red audi car  |
+------------------------+------------------------------------+


OK so we found the ones we are after. Now we need to write the update statement.

We cannot use the replace this time.

 UPDATE forumpost SET name = REPLACE(name, LOCATE('audi', name,8), 'mercedes') WHERE LOCATE('audi', name,8) > 0 ;
Query OK, 0 rows affected (0.02 sec)
Rows matched: 4  Changed: 0  Warnings: 0

Notice it found the rows but did not change anything.

So try again and I do not want to assume the location 8. I want the 2nd value of audi.
So  a test shows that with SUBSTRING_INDEX I can skip the 1st one and USE CONCAT

SELECT name , CONCAT ( SUBSTRING_INDEX(name, 'audi', 2) , ' mercedes ' , SUBSTRING_INDEX(name, 'audi', -1) )  as newvalue
FROM forumpost
WHERE LOCATE('audi', name,10) > 0 ;
+-----------------------------------+-----------------------------------------+
| name                              | newvalue                                |
+-----------------------------------+-----------------------------------------+
| An auditor drives an audi.        | An auditor drives an  mercedes .        |
| An auditor drives an audi.        | An auditor drives an  mercedes .        |
| An auditor drives an audi car     | An auditor drives an  mercedes  car     |
| An auditor drives an red audi car | An auditor drives an red  mercedes  car |
+-----------------------------------+-----------------------------------------+

root@localhost [test]> UPDATE forumpost SET name = CONCAT(SUBSTRING_INDEX(name, 'audi', 2) , ' mercedes ' , SUBSTRING_INDEX(name, 'audi', -1) )
WHERE LOCATE('audi', name,10) > 0 ;
Query OK, 4 rows affected (0.03 sec)
Rows matched: 4  Changed: 4  Warnings: 0

root@localhost [test]> select * from forumpost;
+-----------------------------------------+
| name                                    |
+-----------------------------------------+
| An auditor drives an  mercedes .        |
| An auditor drives a volvo.              |
| An auditor drives an  mercedes .        |
| An auditor drives an  mercedes  car     |
| An auditor drives an red  mercedes  car |
+-----------------------------------------+
5 rows in set (0.00 sec)


Now, granted the grammer with the use of "an" is invalid but that is another story.

More information on these can be found here: