MySQL Query Optimization
Optimizing at the Database Level
The most important factor in making a database application fast is its basic design:
- Are the tables structured properly? For example, applications that perform frequent updates often have many tables with few columns, while applications that analyze large amounts of data often have few tables with many columns.
- Are the right indexes in place to make queries efficient?
- Are you using the appropriate storage engine for each table, and taking advantage of the strengths and features of each storage engine you use? In particular, the choice of a transactional storage engine such as InnoDB or a nontransactional one such as MyISAM can be very important for performance and scalability.
- Are all memory areas used for caching sized correctly? That is, large enough to hold frequently accessed data, but not so large that they overload physical memory and cause paging. The main memory areas to configure are the InnoDB buffer pool and the MyISAM key cache.
Optimizing at the Hardware Level
Any database application eventually hits hardware limits as the database becomes more and more busy. A DBA must evaluate whether it is possible to tune the application or reconfigure the server to avoid these bottlenecks, or whether more hardware resources are required. System bottlenecks typically arise from these sources:
- Disk seeks.
- Disk reading and writing.
- CPU cycles.
- Memory bandwidth.
8.2.1 Optimizing SELECT Statements
- To make a slow SELECT ... WHERE query faster, the first thing to check is whether you can add an index. Set up indexes on columns used in the WHERE clause, to speed up evaluation, filtering, and the final retrieval of results. To avoid wasted disk space, construct a small set of indexes that speed up many related queries used in your application.
- Indexes are especially important for queries that reference different tables, using features such as joins and foreign keys. You can use the EXPLAIN statement to determine which indexes are used for a SELECT (see the sketch after this list).
- Isolate and tune any part of the query, such as a function call, that takes excessive time. Depending on how the query is structured, a function could be called once for every row in the result set, or even once for every row in the table, greatly magnifying any inefficiency.
- Minimize the number of full table scans in your queries, particularly for big tables.
- Keep table statistics up to date by using the ANALYZE TABLE statement periodically, so the optimizer has the information needed to construct an efficient execution plan.
- You can optimize single-query transactions for InnoDB tables, using the technique in Section 8.5.3, “Optimizing InnoDB Read-Only Transactions”.
- Avoid transforming the query in ways that make it hard to understand, especially if the optimizer does some of the same transformations automatically.
- If a performance issue is not easily solved by one of the basic guidelines, investigate the internal details of the specific query by reading the EXPLAIN plan and adjusting your indexes, WHERE clauses, join clauses, and so on. (When you reach a certain level of expertise, reading the EXPLAIN plan might be your first step for every query.)
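As a minimal sketch of the first three points (the orders table, its columns, and the idx_customer index are hypothetical and not part of this excerpt), you might add an index on the column tested in the WHERE clause, refresh statistics, and check the plan with EXPLAIN:

-- Hypothetical index on the column used in the WHERE clause, so the
-- optimizer can avoid a full table scan.
CREATE INDEX idx_customer ON orders (customer_id);

-- Refresh table statistics so the optimizer has current information.
ANALYZE TABLE orders;

-- EXPLAIN shows whether the new index is chosen and roughly how many
-- rows are examined.
EXPLAIN SELECT order_id, total
  FROM orders
  WHERE customer_id = 42;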
Optimizing InnoDB Read-Only Transactions
InnoDB can avoid some of the overhead of a read-write transaction when a transaction is known to be read-only. One way to mark a transaction as read-only is to start it with the START TRANSACTION READ ONLY statement.
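For example, a reporting query could be wrapped like this (a sketch; the orders table is a placeholder):

START TRANSACTION READ ONLY;
-- Only reads are allowed inside this transaction, so InnoDB can skip
-- the bookkeeping needed for read-write transactions.
SELECT COUNT(*) FROM orders;
COMMIT;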
WHERE Clause Optimization
This section discusses optimizations that can be made for processing WHERE clauses. The examples use SELECT statements, but the same optimizations apply for WHERE clauses in DELETE and UPDATE statements.
- COUNT(*) on a single table without a WHERE is retrieved directly from the table information for MyISAM and MEMORY tables. This is also done for any NOT NULL expression when used with only one table.
- Early detection of invalid constant expressions. MySQL quickly detects that some SELECT statements are impossible and returns no rows (see the example after this list).
- HAVING is merged with WHERE if you do not use GROUP BY or aggregate functions (COUNT(), MIN(), and so on).
- If you use the SQL_SMALL_RESULT modifier, MySQL uses an in-memory temporary table.
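As a small illustration of the early detection of impossible conditions (a sketch; any table name works here), EXPLAIN is expected to report the impossible condition rather than an access plan:

-- The condition can never be true, so MySQL returns no rows without
-- reading the table; the Extra column of EXPLAIN typically reports
-- an impossible WHERE.
EXPLAIN SELECT * FROM tbl_name WHERE 1 = 2;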
Some examples of queries that are very fast:
SELECT COUNT(*) FROM tbl_name;

SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;

SELECT MAX(key_part2) FROM tbl_name
  WHERE key_part1=constant;

SELECT ... FROM tbl_name
  ORDER BY key_part1,key_part2,... LIMIT 10;

SELECT ... FROM tbl_name
  ORDER BY key_part1 DESC, key_part2 DESC, ... LIMIT 10;
MySQL resolves the following queries using only the index tree, assuming that the indexed columns are numeric:
SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;

SELECT COUNT(*) FROM tbl_name
  WHERE key_part1=val1 AND key_part2=val2;

SELECT MAX(key_part2) FROM tbl_name GROUP BY key_part1;
The following queries use indexing to retrieve the rows in sorted order without a separate sorting pass:
SELECT ... FROM tbl_name
  ORDER BY key_part1,key_part2,... ;

SELECT ... FROM tbl_name
  ORDER BY key_part1 DESC, key_part2 DESC, ... ;
Index Merge Optimization
The Index Merge access method retrieves rows with multiple range scans and merges their results into one. This access method merges index scans from a single table only, not scans across multiple tables. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans.
Example queries for which Index Merge may be used:
SELECT * FROM tbl_name WHERE key1 = 10 OR key2 = 20;

SELECT * FROM tbl_name
  WHERE (key1 = 10 OR key2 = 20) AND non_key = 30;

SELECT * FROM t1, t2
  WHERE (t1.key1 IN (1,2) OR t1.key2 LIKE 'value%')
  AND t2.key1 = t1.some_col;

SELECT * FROM t1, t2
  WHERE t1.key1 = 1
  AND (t2.key1 = t1.some_col OR t2.key2 = t1.some_col2);
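For the first of these queries, assuming separate single-column indexes on key1 and key2 (not shown in this excerpt), EXPLAIN can confirm whether the Index Merge union algorithm is chosen:

-- If Index Merge is used, the type column shows index_merge and the
-- Extra column is expected to mention a union of the key1 and key2 scans.
EXPLAIN SELECT * FROM tbl_name WHERE key1 = 10 OR key2 = 20;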
8.2.1.5 Engine Condition Pushdown Optimization
This optimization improves the efficiency of direct comparisons between a nonindexed column and a constant. In such cases, the condition is “pushed down” to the storage engine for evaluation. This optimization can be used only by the NDB storage engine.
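As a brief sketch (assuming an NDB Cluster setup; the ndb_demo table is hypothetical), the comparison on the nonindexed column b is the condition that can be pushed down:

CREATE TABLE ndb_demo (a INT, b INT, KEY(a)) ENGINE=NDB;

-- b is not indexed, so the constant comparison b = 10 can be evaluated
-- inside the NDB storage engine; EXPLAIN notes the pushed condition.
EXPLAIN SELECT a, b FROM ndb_demo WHERE b = 10;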
Condition Filtering
In EXPLAIN output, the rows column indicates the row estimate for the chosen access method, and the filtered column reflects the effect of condition filtering. filtered values are expressed as percentages. The maximum value is 100, which means no filtering of rows occurred. Values decreasing from 100 indicate increasing amounts of filtering.
The prefix row count (the number of rows estimated to be passed from the current table in a join to the next) is the product of the rows and filtered values. That is, the prefix row count is the estimated row count, reduced by the estimated filtering effect. For example, if rows is 1000 and filtered is 20%, condition filtering reduces the estimated row count of 1000 to a prefix row count of 1000 × 20% = 1000 × .2 = 200.
ORDER BY Optimization
A query such as the following can use an index on (key_part1, key_part2) to retrieve the rows in order and skip a separate sorting pass:

SELECT * FROM t1 ORDER BY key_part1, key_part2;

When the query also has a WHERE clause, the index is used for sorting only if the WHERE clause is selective enough to make an index range scan cheaper than a table scan.
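As an illustration of the latter case (a sketch, assuming key_part1 is the leading column of the index), a range condition on that column can both filter rows and deliver them already sorted:

-- The range scan on key_part1 returns rows in index order, so no
-- additional sort is needed for the ORDER BY.
SELECT * FROM t1
  WHERE key_part1 > constant
  ORDER BY key_part1 ASC;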
GROUP BY Optimization
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (as is true, for example, for a BTREE index, but not for a HASH index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.
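A minimal sketch (the grp_demo table and its index are hypothetical): because the GROUP BY columns form a leftmost prefix of a BTREE index, the grouping can be satisfied by reading the index in order instead of building a temporary table.

CREATE TABLE grp_demo (
  c1 INT NOT NULL,
  c2 INT NOT NULL,
  c3 INT NOT NULL,
  KEY idx_c1_c2 (c1, c2)
);

-- The GROUP BY columns c1, c2 match the leftmost prefix of idx_c1_c2,
-- so the optimizer can group by scanning the index in key order.
EXPLAIN SELECT c1, c2, COUNT(*) FROM grp_demo GROUP BY c1, c2;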
Partition Pruning
The optimization known as partition pruning is based on a relatively simple concept which can be described as “Do not scan partitions where there can be no matching values”. Suppose a partitioned table t1 is created by this statement:
CREATE TABLE t1 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob DATE NOT NULL
)
PARTITION BY RANGE( region_code ) (
PARTITION p0 VALUES LESS THAN (64),
PARTITION p1 VALUES LESS THAN (128),
PARTITION p2 VALUES LESS THAN (192),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
Suppose that you wish to obtain results from a SELECT statement such as this one:
SELECT fname, lname, region_code, dob
FROM t1
WHERE region_code > 125 AND region_code < 130;
It is easy to see that none of the rows which ought to be returned are in either of the partitions p0 or p3; that is, we need search only in partitions p1 and p2 to find matching rows. By limiting the search, it is possible to expend much less time and effort in finding matching rows than by scanning all partitions in the table. This “cutting away” of unneeded partitions is known as pruning. When the optimizer can make use of partition pruning in performing this query, execution of the query can be an order of magnitude faster than the same query against a nonpartitioned table containing the same column definitions and data.
The optimizer can perform pruning whenever a WHERE condition can be reduced to either one of the following two cases:
partition_column = constant
partition_column IN (constant1, constant2, ..., constantN)
Pruning can also be applied to short ranges, which the optimizer can convert into equivalent lists of values.
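As a quick check (a sketch using the t1 table defined above), EXPLAIN lists the partitions the optimizer actually examines in its partitions column; for the range on region_code it is expected to show only p1 and p2:

EXPLAIN SELECT fname, lname, region_code, dob
  FROM t1
  WHERE region_code > 125 AND region_code < 130;

-- The partitions column is expected to list p1,p2, confirming that
-- p0 and p3 were pruned.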
Example EXPLAIN Queries
The first example below displays the execution plan in tree format with EXPLAIN FORMAT=TREE; the second uses EXPLAIN ANALYZE, which executes the query and reports measured costs and timings alongside the estimates.
EXPLAIN FORMAT=TREE
    SELECT *
    FROM t1
    JOIN t2
        ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
    JOIN t3
        ON (t2.c1 = t3.c1)\G
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON (t1.c1 = t2.c2)\G