Finding duplicate MySQL indexes using Percona Toolkit

Why are duplicate indexes bad?

  • They slow down insert, update, and delete queries (DML queries).
  • The storage engine needs to maintain more index statistics.
  • More disk space is needed to store the redundant indexes.
  • The optimizer phase gets slower because more query plans are examined.
  • The buffer pool for InnoDB might be used for useless index data.
  • Duplicate indexes are deprecated (as of MySQL 5.6) and will be removed in a future version of MySQL. Version 5.6 issues a note if an index is created that duplicates an existing index, version 5.7 issues a warning.

Percona Toolkit to the rescue

The Percona Toolkit provides a tool called pt-duplicate-key-checker. This tool examines MySQL tables for duplicate or redundant indexes and foreign keys.

Note: The MySQL utilities provided by Oracle include a tool called mysqlindexcheck to identify potentially redundant table indexes. It is similar to Percona’s tool, but it does not detect all cases.

Installing Percona Toolkit

Installing the Percona Toolkit on a Mac using Homebrew is a simple task, but you might need to install a Perl module from the CPAN repository first:

sudo cpan -i DBD::mysql
brew install percona-toolkit

Using Percona Toolkit

Imagine a database called “cms”. To run the duplicate key checker on your localhost, try the following command. (If you do not use a password for your local database, you can omit the --ask-pass parameter.)

pt-duplicate-key-checker --user root --ask-pass --databases cms

Running the command, you will get something like this:

# ########################################################################
# cms.user_organization                                 
# ########################################################################

# user_id is a left-prefix of PRIMARY
# Key definitions:
#   KEY `user_id` (`user_id`),
#   PRIMARY KEY (`user_id`,`organization_id`),
# Column types:
#	  `user_id` int(10) unsigned not null
#	  `organization_id` int(10) unsigned not null
# To remove this duplicate index, execute:
ALTER TABLE `cms`.`user_organization` DROP INDEX `user_id`;

# ########################################################################
# Summary of indexes                                                      
# ########################################################################

# Size Duplicate Indexes   59044
# Total Duplicate Indexes  1
# Total Indexes            543

If a table has a multiple-column index, any leftmost prefix of the index can be used by the MySQL optimizer to find rows. Because the column user_id is a leftmost prefix of the primary key in our example, we do not need to add a separate index and we can drop the duplicate index.

Basically, if you have a three-column index on (A, B, C), you have indexed search capabilities on (A), (A, B), and (A, B, C) covered.

If you are not familiar with multiple-column indexes, you should have a look at the chapter “Multiple-Column Indexes” in the MySQL reference manual.

Do manual checks

Always do a manual check before removing indexes. For example, you might have set the index to the wrong column, but actually need the index:

# ########################################################################
# energy.performancedata                                               
# ########################################################################

# speed is a duplicate of height
# Key definitions:
#   KEY `speed` (`height`),
#   KEY `height` (`height`),
# Column types:
#	  `height` float default null
# To remove this duplicate index, execute:
ALTER TABLE `energy`.`performancedata` DROP INDEX `speed`;

In the example above, there are two indexes set to the column height, but the name of the index speed already suggests that it should be an index over the column speed. So do not blindlessly remove indexes as suggested by the toolkit, but always do a manual check first!