Goal was to create a empty table and poplulate it with optimizer table and index stats from a poplulated table to try and get the optimizer to give the same explain plan and costs on the empty table as the populated table. The purpose of this was to be able to modify the empty table by adding indexes quickly and cheaply and still be able to get correct optimizer costs. This worked in the case of table access but unfortunately didn’t work when bring in an index.
Create procedure to populate test table
DROP PROCEDURE IF EXISTS populate_data;
DELIMITER //
CREATE PROCEDURE populate_data(max_val int)
BEGIN
DECLARE int_val INT DEFAULT 0;
test_loop : LOOP
IF (int_val = max_val) THEN
LEAVE test_loop;
END IF;
INSERT INTO source(id) VALUES (FLOOR(RAND()*(1000000+1)+1));
SET int_val = int_val +1;
END LOOP;
END;//
DELIMITER ;
Create source table and populate it. Create source_clone that will be empty but we want to manually fill in optimizer statistics to mimic source table
CREATE TABLE source(id integer);
call populate_data(1000000);
CREATE TABLE source_clone (id integer) ;
comparing table and index stats on source and source_clone we can see as expected they are different:
select * from mysql.innodb_table_stats where table_name like 'sourc%';
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
| kyle | source | 2020-01-22 18:49:48 | 9980 | 21 | 0 |
| kyle | source_clone | 2020-01-22 18:49:44 | 0 | 1 | 0 |
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
select * from mysql.innodb_index_stats where table_name like 'sourc%';
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | n_diff_pfx01 | 9980 | 20 | DB_ROW_ID |
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | n_leaf_pages | 20 | NULL | Number of leaf pages in the index |
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | size | 21 | NULL | Number of pages in the index |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:49:44 | n_diff_pfx01 | 0 | 1 | DB_ROW_ID |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:49:44 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:49:44 | size | 1 | NULL | Number of pages in the index |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
running the explain plan we get different rows returned as expected
--
mysql> explain select id from kyle.source where id > 2056;
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | source | ALL | NULL | NULL | NULL | NULL | 9980 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.05 sec)
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | source_clone | ALL | NULL | NULL | NULL | NULL | 1 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.04 sec)
Now let’s copy the stats from the populated source table to the empty source_clone data
-- TABLE STATS
update mysql.innodb_table_stats AS `dest`,
(SELECT n_rows , clustered_index_size
FROM mysql.innodb_table_stats
WHERE
table_name = 'source'
) AS `src`
SET dest.n_rows = src.n_rows ,
dest.clustered_index_size = src.clustered_index_size
WHERE
table_name = 'source_clone';
-- INDEX STATS
UPDATE mysql.innodb_index_stats AS `dest`,
(SELECT stat_value, stat_name
FROM mysql.innodb_index_stats
WHERE
table_name = 'source'
and index_name = 'GEN_CLUST_INDEX'
) AS `src`
SET dest.stat_value = src.stat_value
WHERE dest.table_name = 'source_clone'
and dest.index_name = 'GEN_CLUST_INDEX'
and dest.stat_name = src.stat_name;
The optimizer table and index statistics are now the same
mysql> select * from mysql.innodb_table_stats where table_name like 'sourc%';
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
| kyle | source | 2020-01-22 18:49:48 | 9980 | 21 | 0 |
| kyle | source_clone | 2020-01-22 18:54:31 | 9980 | 21 | 0 |
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
--
mysql> select * from mysql.innodb_index_stats where table_name like 'sourc%';
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | n_diff_pfx01 | 9980 | 20 | DB_ROW_ID |
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | n_leaf_pages | 20 | NULL | Number of leaf pages in the index |
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | size | 21 | NULL | Number of pages in the index |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:54:46 | n_diff_pfx01 | 9980 | 1 | DB_ROW_ID |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:54:46 | n_leaf_pages | 20 | NULL | Number of leaf pages in the index |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:54:46 | size | 21 | NULL | Number of pages in the index |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
Running explain we are still getting different rows returned on the source clone
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | source_clone | ALL | NULL | NULL | NULL | NULL | 1 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
But by doing a flush table on the source_clone we not get the same rows returned in the explain as the source
FLUSH TABLE source_clone;
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | source_clone | ALL | NULL | NULL | NULL | NULL | 9980 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
Now let’s create an index and try to manually update the optimizer index stats
CREATE INDEX source_id ON source (id);
CREATE INDEX source_clone_id ON source_clone (id);
--
mysql> explain select id from kyle.source where id > 2056;
+----+-------------+--------+-------+---------------+-----------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-----------+---------+------+------+--------------------------+
| 1 | SIMPLE | source | range | source_id | source_id | 5 | NULL | 4990 | Using where; Using index |
+----+-------------+--------+-------+---------------+-----------+---------+------+------+--------------------------+
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | source_clone | range | source_clone_id | source_clone_id | 5 | NULL | 1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
--
UPDATE mysql.innodb_index_stats AS `dest`,
(SELECT stat_value, stat_name
FROM mysql.innodb_index_stats
WHERE
table_name = 'source'
and index_name = 'source_id'
) AS `src`
SET dest.stat_value = src.stat_value
WHERE dest.table_name = 'source_clone'
and dest.index_name = 'source_clone_id'
and dest.stat_name = src.stat_name;
--
mysql> select * from mysql.innodb_index_stats where table_name like 'sourc%';
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | n_diff_pfx01 | 9980 | 20 | DB_ROW_ID |
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | n_leaf_pages | 20 | NULL | Number of leaf pages in the index |
| kyle | source | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | size | 21 | NULL | Number of pages in the index |
| kyle | source | source_id | 2020-01-22 18:56:46 | n_diff_pfx01 | 9950 | 11 | id |
| kyle | source | source_id | 2020-01-22 18:56:46 | n_diff_pfx02 | 10000 | 11 | id,DB_ROW_ID |
| kyle | source | source_id | 2020-01-22 18:56:46 | n_leaf_pages | 11 | NULL | Number of leaf pages in the index |
| kyle | source | source_id | 2020-01-22 18:56:46 | size | 12 | NULL | Number of pages in the index |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | n_diff_pfx01 | 9980 | 1 | DB_ROW_ID |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | n_leaf_pages | 20 | NULL | Number of leaf pages in the index |
| kyle | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | size | 21 | NULL | Number of pages in the index |
| kyle | source_clone | source_clone_id | 2020-01-22 18:57:38 | n_diff_pfx01 | 9950 | 1 | id |
| kyle | source_clone | source_clone_id | 2020-01-22 18:57:38 | n_diff_pfx02 | 10000 | 1 | id,DB_ROW_ID |
| kyle | source_clone | source_clone_id | 2020-01-22 18:57:38 | n_leaf_pages | 11 | NULL | Number of leaf pages in the index |
| kyle | source_clone | source_clone_id | 2020-01-22 18:57:38 | size | 12 | NULL | Number of pages in the index |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | source_clone | range | source_clone_id | source_clone_id | 5 | NULL | 1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
--
FLUSH TABLE source_clone;
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | source_clone | range | source_clone_id | source_clone_id | 5 | NULL | 1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
--
Flush tables;
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | source_clone | range | source_clone_id | source_clone_id | 5 | NULL | 1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
-- REBOOT database
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| 1 | SIMPLE | source_clone | range | source_clone_id | source_clone_id | 5 | NULL | 1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
The optimizer index stats are the same, we have flushed the tables, and even did a reboot of the database, but the explain plan still shows different results on the source_clone. The optimizer must be basing costs in this case on things other than mysql.innodb_index_stats and mysql.innodb_table_stats
Comments