Friday, December 26, 2014

Find/parse a string from within a string

So I noticed a few different questions and posts about parsing a string out of another string recently. While some solutions included creating new functions and etc it can also be done within a single query in some cases.

For example, let us say that we are looking to pull out the domain from a URL. I will try to go into detail as to why and how this works.
We have the following table.

CREATE TABLE `parse_example` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `urldemo` varchar(150) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
+----+----------------------------+
| id | urldemo                    |
+----+----------------------------+
|  1 | http://www.mysql.com/      |
|  2 | http://www.percona.com/    |
|  3 | https://tools.percona.com/ |
|  4 | https://mariadb.com/       |
|  5 | http://planet.mysql.com/   |
|  6 | http://dev.mysql.com/doc/  |
+----+----------------------------+


The goal for this example is to disregard the http:// or https:// and anything after the .com. So we use LOCATE to find the locations.

The .com reference is easy since that is constant so we can start with that.

SELECT LOCATE('.com', urldemo), urldemo FROM parse_example;
+-------------------------+----------------------------+
| LOCATE('.com', urldemo) | urldemo                    |
+-------------------------+----------------------------+
|                      17 | http://www.mysql.com/      |
|                      19 | http://www.percona.com/    |
|                      22 | https://tools.percona.com/ |
|                      16 | https://mariadb.com/       |
|                      20 | http://planet.mysql.com/   |
|                      17 | http://dev.mysql.com/doc/  |
+-------------------------+----------------------------+


OK so we want to remove the / , what location is that?

SELECT LOCATE('.com', urldemo) as start, LOCATE('.com', urldemo) +4 as end, SUBSTRING(urldemo FROM LOCATE('.com', urldemo) + 4 )  AS resulting , urldemo FROM parse_example;
+-------+-----+-----------+----------------------------+
| start | end | resulting | urldemo                    |
+-------+-----+-----------+----------------------------+
|    17 |  21 | /         | http://www.mysql.com/      |
|    19 |  23 | /         | http://www.percona.com/    |
|    22 |  26 | /         | https://tools.percona.com/ |
|    16 |  20 | /         | https://mariadb.com/       |
|    20 |  24 | /         | http://planet.mysql.com/   |
|    17 |  21 | /doc/     | http://dev.mysql.com/doc/  |
+-------+-----+-----------+----------------------------+

This gives us our end position, I only put the field aliases in to make the results easier to follow.

Now sorting out after http and https is actually very easy as well as they both have :// after them, so we just need the location of the second / in the string.


SELECT LOCATE('/', urldemo) as first, LOCATE('/', urldemo) +1 as second, urldemo
FROM parse_example;
+-------+--------+----------------------------+
| first | second | urldemo                    |
+-------+--------+----------------------------+
|     6 |      7 | http://www.mysql.com/      |
|     6 |      7 | http://www.percona.com/    |
|     7 |      8 | https://tools.percona.com/ |
|     7 |      8 | https://mariadb.com/       |
|     6 |      7 | http://planet.mysql.com/   |
|     6 |      7 | http://dev.mysql.com/doc/  |
+-------+--------+----------------------------+


These queries are just showing what the different aspects of the final query will be doing. So let us put it all together.


SELECT
TRIM(TRAILING SUBSTRING(urldemo FROM LOCATE('.com', urldemo) + 4 )
FROM SUBSTRING(urldemo FROM LOCATE('/', urldemo) + 2 ) )  AS parsed_domain ,
urldemo as original_url
FROM parse_example;
+-------------------+----------------------------+
| parsed_domain     | original_url               |
+-------------------+----------------------------+
| www.mysql.com     | http://www.mysql.com/      |
| www.percona.com   | http://www.percona.com/    |
| tools.percona.com | https://tools.percona.com/ |
| mariadb.com       | https://mariadb.com/       |
| planet.mysql.com  | http://planet.mysql.com/   |
| dev.mysql.com     | http://dev.mysql.com/doc/  |
+-------------------+----------------------------+


Now hopefully that helps you be able to parse out whatever you need. This example is limited to a url. But since some examples of functions already here is my example of a function that you can use to parse whatever you need.



CREATE FUNCTION PARSE_STRING(delimiterA VARCHAR(50), delimiterB VARCHAR(50),  passed_string VARCHAR(255) )
RETURNS VARCHAR(255) DETERMINISTIC
RETURN
TRIM(TRAILING SUBSTRING(passed_string FROM LOCATE(delimiterB, passed_string)  )
FROM SUBSTRING(passed_string FROM LOCATE(delimiterA, passed_string) + CHAR_LENGTH(delimiterA) ) ) ;

SELECT PARSE_STRING('//','.com', urldemo) FROM parse_example;
+------------------------------------+
| PARSE_STRING('//','.com', urldemo) |
+------------------------------------+
| www.mysql                          |
| www.percona                        |
| tools.percona                      |
| mariadb                            |
| planet.mysql                       |
| dev.mysql                          |
+------------------------------------+


Pull a last name from a full name field:

SELECT PARSE_STRING('John ','', 'John Smith') ;
+----------------------------------------+
| PARSE_STRING('John ','', 'John Smith') |
+----------------------------------------+
| Smith                                  |
+----------------------------------------+


Pull the first name

SELECT PARSE_STRING('',' Smith', 'John Smith') ;
+-----------------------------------------+
| PARSE_STRING('',' Smith', 'John Smith') |
+-----------------------------------------+
| John                                    |
+-----------------------------------------+


Granted with the name examples you would need to know the delimiter values. But this is just an example you can build on.

Thursday, December 18, 2014

A MySQL PARTITION and SUBPARTITION Example

So this is just a simple example of how to set up a PARTITION and a SUBPARTITION in MySQL. The concept here is that you have data in a table with numerous values in a datetime field. You might have data that is spread across numerous years  (most likely you do). So one way to partition this data is to sort it by year but then also sort it by month within that yearly partition.

Below is an example that you can use for consideration.

Consider the test table. Your table with have many more fields of course.

CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `date_time` datetime DEFAULT NOW(),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;


First I will populate the test table with random values for the  date_time field.

delimiter //
CREATE PROCEDURE populate_t1( IN rowsofdata INT )
BEGIN

SET @A = 1;
SET @B = 25 - @A;

  WHILE rowsofdata > 0 DO
    SELECT  FLOOR( @A + (RAND() * @B )) INTO @randvalue;
    INSERT INTO t1
    SELECT NULL, NOW() - INTERVAL @randvalue MONTH;
    SET rowsofdata = rowsofdata - 1;
  END WHILE;
 END//
delimiter ;
call populate_t1(1000);


Check to see what kind of values I ended up with:

> SELECT COUNT(*) FROM t1 WHERE date_time BETWEEN '2012-01-01 00:00:00' AND '2013-01-01 00:00:00'\G
*************************** 1. row ***************************
COUNT(*): 43
1 row in set (0.00 sec)

> SELECT COUNT(*) FROM t1 WHERE date_time BETWEEN '2013-01-01 00:00:00' AND '2014-01-01 00:00:00'\G
*************************** 1. row ***************************
COUNT(*): 529
1 row in set (0.00 sec)

> SELECT COUNT(*) FROM t1 WHERE date_time BETWEEN '2014-01-01 00:00:00' AND NOW() \G
*************************** 1. row ***************************
COUNT(*): 428
1 row in set (0.00 sec)


Now I can alter the table so I can add my partitions and then test the values counts via the partition.

ALTER TABLE t1  DROP PRIMARY KEY, ADD PRIMARY KEY (`id`,`date_time`), LOCK=SHARED;
ALTER TABLE t1
 PARTITION BY RANGE( YEAR(date_time) )
    SUBPARTITION BY HASH(MONTH(date_time) ) (

        PARTITION p2012 VALUES LESS THAN (2013) (
            SUBPARTITION dec_2012,
            SUBPARTITION jan_2012,
            SUBPARTITION feb_2012,
            SUBPARTITION mar_2012,
            SUBPARTITION apr_2012,
            SUBPARTITION may_2012,
            SUBPARTITION jun_2012,
            SUBPARTITION jul_2012,
            SUBPARTITION aug_2012,
            SUBPARTITION sep_2012,
            SUBPARTITION oct_2012,
            SUBPARTITION nov_2012
        ),

        PARTITION p2013 VALUES LESS THAN (2014) (
            SUBPARTITION dec_2013,
            SUBPARTITION jan_2013,
            SUBPARTITION feb_2013,
            SUBPARTITION mar_2013,
            SUBPARTITION apr_2013,
            SUBPARTITION may_2013,
            SUBPARTITION jun_2013,
            SUBPARTITION jul_2013,
            SUBPARTITION aug_2013,
            SUBPARTITION sep_2013,
            SUBPARTITION oct_2013,
            SUBPARTITION nov_2013

        ),
        PARTITION p2014 VALUES LESS THAN (2015) (
            SUBPARTITION dec_2014,
            SUBPARTITION jan_2014,
            SUBPARTITION feb_2014,
            SUBPARTITION mar_2014,
            SUBPARTITION apr_2014,
            SUBPARTITION may_2014,
            SUBPARTITION jun_2014,
            SUBPARTITION jul_2014,
            SUBPARTITION aug_2014,
            SUBPARTITION sep_2014,
            SUBPARTITION oct_2014,
            SUBPARTITION nov_2014
        ),

        PARTITION pmax VALUES LESS THAN MAXVALUE (
            SUBPARTITION dec_max,
            SUBPARTITION jan_max,
            SUBPARTITION feb_max,
            SUBPARTITION mar_max,
            SUBPARTITION apr_max,
            SUBPARTITION may_max,
            SUBPARTITION jun_max,
            SUBPARTITION jul_max,
            SUBPARTITION aug_max,
            SUBPARTITION sep_max,
            SUBPARTITION oct_max,
            SUBPARTITION nov_max
        )
 );


My Show create table is very different now.

> show create table t1;
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `date_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`,`date_time`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE ( YEAR(date_time))
SUBPARTITION BY HASH (MONTH(date_time))
(PARTITION p2012 VALUES LESS THAN (2013)
 (SUBPARTITION dec_2012 ENGINE = InnoDB,
  SUBPARTITION jan_2012 ENGINE = InnoDB,
  SUBPARTITION feb_2012 ENGINE = InnoDB,
  SUBPARTITION mar_2012 ENGINE = InnoDB,
  SUBPARTITION apr_2012 ENGINE = InnoDB,
  SUBPARTITION may_2012 ENGINE = InnoDB,
  SUBPARTITION jun_2012 ENGINE = InnoDB,
  SUBPARTITION jul_2012 ENGINE = InnoDB,
  SUBPARTITION aug_2012 ENGINE = InnoDB,
  SUBPARTITION sep_2012 ENGINE = InnoDB,
  SUBPARTITION oct_2012 ENGINE = InnoDB,
  SUBPARTITION nov_2012 ENGINE = InnoDB),
 PARTITION p2013 VALUES LESS THAN (2014)
 (SUBPARTITION dec_2013 ENGINE = InnoDB,
  SUBPARTITION jan_2013 ENGINE = InnoDB,
  SUBPARTITION feb_2013 ENGINE = InnoDB,
  SUBPARTITION mar_2013 ENGINE = InnoDB,
  SUBPARTITION apr_2013 ENGINE = InnoDB,
  SUBPARTITION may_2013 ENGINE = InnoDB,
  SUBPARTITION jun_2013 ENGINE = InnoDB,
  SUBPARTITION jul_2013 ENGINE = InnoDB,
  SUBPARTITION aug_2013 ENGINE = InnoDB,
  SUBPARTITION sep_2013 ENGINE = InnoDB,
  SUBPARTITION oct_2013 ENGINE = InnoDB,
  SUBPARTITION nov_2013 ENGINE = InnoDB),
 PARTITION p2014 VALUES LESS THAN (2015)
 (SUBPARTITION dec_2014 ENGINE = InnoDB,
  SUBPARTITION jan_2014 ENGINE = InnoDB,
  SUBPARTITION feb_2014 ENGINE = InnoDB,
  SUBPARTITION mar_2014 ENGINE = InnoDB,
  SUBPARTITION apr_2014 ENGINE = InnoDB,
  SUBPARTITION may_2014 ENGINE = InnoDB,
  SUBPARTITION jun_2014 ENGINE = InnoDB,
  SUBPARTITION jul_2014 ENGINE = InnoDB,
  SUBPARTITION aug_2014 ENGINE = InnoDB,
  SUBPARTITION sep_2014 ENGINE = InnoDB,
  SUBPARTITION oct_2014 ENGINE = InnoDB,
  SUBPARTITION nov_2014 ENGINE = InnoDB),
 PARTITION pmax VALUES LESS THAN MAXVALUE
 (SUBPARTITION dec_max ENGINE = InnoDB,
  SUBPARTITION jan_max ENGINE = InnoDB,
  SUBPARTITION feb_max ENGINE = InnoDB,
  SUBPARTITION mar_max ENGINE = InnoDB,
  SUBPARTITION apr_max ENGINE = InnoDB,
  SUBPARTITION may_max ENGINE = InnoDB,
  SUBPARTITION jun_max ENGINE = InnoDB,
  SUBPARTITION jul_max ENGINE = InnoDB,
  SUBPARTITION aug_max ENGINE = InnoDB,
  SUBPARTITION sep_max ENGINE = InnoDB,
  SUBPARTITION oct_max ENGINE = InnoDB,
  SUBPARTITION nov_max ENGINE = InnoDB))


So can we still count our value as expected?

> SELECT count(*) FROM t1 PARTITION (p2012) \G
*************************** 1. row ***************************
count(*): 43
> SELECT count(*) FROM t1 PARTITION (p2013) \G
*************************** 1. row ***************************
count(*): 529
> SELECT count(*) FROM t1 PARTITION (p2014) \G
*************************** 1. row ***************************
count(*): 428


So far so good, all the values matched up to the count we had before. So we can also count or select per the subpartition.


> SELECT * FROM t1 PARTITION (dec_2012) limit 5;
+-----+---------------------+
| id  | date_time           |
+-----+---------------------+
|  59 | 2012-12-19 00:59:57 |
|  68 | 2012-12-19 00:59:58 |
|  93 | 2012-12-19 00:59:59 |
| 105 | 2012-12-19 00:59:59 |
| 111 | 2012-12-19 00:59:59 |
+-----+---------------------+

> SELECT * FROM t1 PARTITION (jan_2013) limit 5;
+-----+---------------------+
| id  | date_time           |
+-----+---------------------+
|   6 | 2013-01-19 00:59:55 |
|  29 | 2013-01-19 00:59:56 |
|  55 | 2013-01-19 00:59:57 |
|  79 | 2013-01-19 00:59:58 |
| 100 | 2013-01-19 00:59:59 |
+-----+---------------------+

> SELECT * FROM t1 PARTITION (jan_2014) limit 5;
+-----+---------------------+
| id  | date_time           |
+-----+---------------------+
|  16 | 2014-01-19 00:59:55 |
| 190 | 2014-01-19 01:00:04 |
| 191 | 2014-01-19 01:00:04 |
| 229 | 2014-01-19 01:00:05 |
| 234 | 2014-01-19 01:00:06 |
+-----+---------------------+

> SELECT * FROM t1 PARTITION (jun_2014) limit 5;
+-----+---------------------+
| id  | date_time           |
+-----+---------------------+
|  13 | 2014-06-19 00:59:55 |
| 189 | 2014-06-19 01:00:04 |
| 221 | 2014-06-19 01:00:05 |
| 222 | 2014-06-19 01:00:05 |
| 238 | 2014-06-19 01:00:06 |
+-----+---------------------+

> SELECT * FROM t1 PARTITION (dec_2013) limit 5;
+-----+---------------------+
| id  | date_time           |
+-----+---------------------+
|  50 | 2013-12-19 00:59:57 |
|  74 | 2013-12-19 00:59:58 |
|  98 | 2013-12-19 00:59:59 |
| 107 | 2013-12-19 00:59:59 |
| 167 | 2013-12-19 01:00:02 |
+-----+---------------------+


This is great and handy but what happens when the year 2015 or  2016 comes around? All of that data would be in the pmax partition. So how do we add a new partition in between p2014 and pmax?

If you had no data in pmax you could drop it and add a new partition onto the end. But it is just as easy to reorganize the partition. This will take the pmax partition and alter it into our new partitions.


ALTER TABLE t1 REORGANIZE PARTITION pmax INTO (
     PARTITION p2015 VALUES LESS THAN (2016) (
            SUBPARTITION dec_2015,
            SUBPARTITION jan_2015,
            SUBPARTITION feb_2015,
            SUBPARTITION mar_2015,
            SUBPARTITION apr_2015,
            SUBPARTITION may_2015,
            SUBPARTITION jun_2015,
            SUBPARTITION jul_2015,
            SUBPARTITION aug_2015,
            SUBPARTITION sep_2015,
            SUBPARTITION oct_2015,
            SUBPARTITION nov_2015
        ),
       PARTITION pmax VALUES LESS THAN MAXVALUE (
            SUBPARTITION dec_max,
            SUBPARTITION jan_max,
            SUBPARTITION feb_max,
            SUBPARTITION mar_max,
            SUBPARTITION apr_max,
            SUBPARTITION may_max,
            SUBPARTITION jun_max,
            SUBPARTITION jul_max,
            SUBPARTITION aug_max,
            SUBPARTITION sep_max,
            SUBPARTITION oct_max,
            SUBPARTITION nov_max
        )
);


Hope this helps, best of luck.