MySQL Query Optimization

Abhishek Rathore
6 min readDec 5, 2023

--

Optimizing at the Database Level

The most important factor in making a database application fast is its basic design:

  • Are the tables structured properly? For example, applications that perform frequent updates often have many tables with few columns, while applications that analyze large amounts of data often have few tables with many columns.
  • Are the right indexes in place to make queries efficient?
  • Are you using the appropriate storage engine for each table, and taking advantage of the strengths and features of each storage engine you use? In particular, the choice of a transactional storage engine such as InnoDB or a nontransactional one such as MyISAM can be very important for performance and scalability.
  • Are all memory areas used for caching sized correctly? That is, large enough to hold frequently accessed data, but not so large that they overload physical memory and cause paging. The main memory areas to configure are the InnoDB buffer pool and the MyISAM key cache.

Optimizing at the Hardware Level

Any database application eventually hits hardware limits as the database becomes more and more busy. A DBA must evaluate whether it is possible to tune the application or reconfigure the server to avoid these bottlenecks, or whether more hardware resources are required. System bottlenecks typically arise from these sources:

  • Disk seeks.
  • Disk reading and writing.
  • CPU cycles.
  • Memory bandwidth.

8.2.1 Optimizing SELECT Statements

  • To make a slow SELECT ... WHERE query faster, the first thing to check is whether you can add an index. Set up indexes on columns used in the WHERE clause, to speed up evaluation, filtering, and the final retrieval of results. To avoid wasted disk space, construct a small set of indexes that speed up many related queries used in your application.
  • Indexes are especially important for queries that reference different tables, using features such as joins and foreign keys. You can use the EXPLAIN statement to determine which indexes are used for a SELECT.
  • Isolate and tune any part of the query, such as a function call, that takes excessive time. Depending on how the query is structured, a function could be called once for every row in the result set, or even once for every row in the table, greatly magnifying any inefficiency.
  • Minimize the number of full table scans in your queries, particularly for big tables.
  • Keep table statistics up to date by using the ANALYZE TABLE statement periodically, so the optimizer has the information needed to construct an efficient execution plan.
  • You can optimize single-query transactions for InnoDB tables, using the technique in Section 8.5.3, “Optimizing InnoDB Read-Only Transactions”.
  • Avoid transforming the query in ways that make it hard to understand, especially if the optimizer does some of the same transformations automatically.
  • If a performance issue is not easily solved by one of the basic guidelines, investigate the internal details of the specific query by reading the EXPLAIN plan and adjusting your indexes, WHERE clauses, join clauses, and so on. (When you reach a certain level of expertise, reading the EXPLAIN plan might be your first step for every query.)

Optimizing InnoDB Read-Only Transactions

The transaction is started with the START TRANSACTION READ ONLY statement.

WHERE Clause Optimization

This section discusses optimizations that can be made for processing WHERE clauses. The examples use SELECT statements, but the same optimizations apply for WHERE clauses in DELETE and UPDATE statements.

  • COUNT(*) on a single table without a WHERE is retrieved directly from the table information for MyISAM and MEMORY tables. This is also done for any NOT NULL expression when used with only one table.
  • Early detection of invalid constant expressions. MySQL quickly detects that some SELECT statements are impossible and returns no rows.
  • HAVING is merged with WHERE if you do not use GROUP BY or aggregate functions (COUNT(), MIN(), and so on).\
  • If you use the SQL_SMALL_RESULT modifier, MySQL uses an in-memory temporary table.

Some examples of queries that are very fast:

SELECT COUNT(*) FROM tbl_name;SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;SELECT MAX(key_part2) FROM tbl_name
WHERE key_part1=constant;
SELECT ... FROM tbl_name
ORDER BY key_part1,key_part2,... LIMIT 10;
SELECT ... FROM tbl_name
ORDER BY key_part1 DESC, key_part2 DESC, ... LIMIT 10;

MySQL resolves the following queries using only the index tree, assuming that the indexed columns are numeric:

SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;SELECT COUNT(*) FROM tbl_name
WHERE key_part1=val1 AND key_part2=val2;
SELECT MAX(key_part2) FROM tbl_name GROUP BY key_part1;

The following queries use indexing to retrieve the rows in sorted order without a separate sorting pass:

SELECT ... FROM tbl_name
ORDER BY key_part1,key_part2,... ;
SELECT ... FROM tbl_name
ORDER BY key_part1 DESC, key_part2 DESC, ..

Index Merge Optimization

The Index Merge access method retrieves rows with multiple range scans and merges their results into one. This access method merges index scans from a single table only, not scans across multiple tables. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans.

Example queries for which Index Merge may be used:

SELECT * FROM tbl_name WHERE key1 = 10 OR key2 = 20;SELECT * FROM tbl_name
WHERE (key1 = 10 OR key2 = 20) AND non_key = 30;
SELECT * FROM t1, t2
WHERE (t1.key1 IN (1,2) OR t1.key2 LIKE 'value%')
AND t2.key1 = t1.some_col;
SELECT * FROM t1, t2
WHERE t1.key1 = 1
AND (t2.key1 = t1.some_col OR t2.key2 = t1.some_col2);

8.2.1.5 Engine Condition Pushdown Optimization

This optimization improves the efficiency of direct comparisons between a nonindexed column and a constant. In such cases, the condition is “pushed down” to the storage engine for evaluation. This optimization can be used only by the NDB storage engine.

Condition Filtering

In EXPLAIN output, the rows column indicates the row estimate for the chosen access method, and the filtered column reflects the effect of condition filtering. filtered values are expressed as percentages. The maximum value is 100, which means no filtering of rows occurred. Values decreasing from 100 indicate increasing amounts of filtering.

The prefix row count (the number of rows estimated to be passed from the current table in a join to the next) is the product of the rows and filtered values. That is, the prefix row count is the estimated row count, reduced by the estimated filtering effect. For example, if rows is 1000 and filtered is 20%, condition filtering reduces the estimated row count of 1000 to a prefix row count of 1000 × 20% = 1000 × .2 = 200.

ORDER BY Optimization

SELECT * FROM t1 ORDER BY key_part1, key_part2;

avoids sorting if the WHERE clause is selective enough to make an index range scan cheaper than a table scan:

GROUP BY Optimization

The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (as is true, for example, for a BTREE index, but not for a HASH index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.

Partition Pruning

The optimization known as partition pruning is based on a relatively simple concept which can be described as “Do not scan partitions where there can be no matching values”. Suppose a partitioned table t1 is created by this statement:

CREATE TABLE t1 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob DATE NOT NULL
)
PARTITION BY RANGE( region_code ) (
PARTITION p0 VALUES LESS THAN (64),
PARTITION p1 VALUES LESS THAN (128),
PARTITION p2 VALUES LESS THAN (192),
PARTITION p3 VALUES LESS THAN MAXVALUE

Suppose that you wish to obtain results from a SELECT statement such as this one:

SELECT fname, lname, region_code, dob
FROM t1
WHERE region_code > 125 AND region_code < 130;

It is easy to see that none of the rows which ought to be returned are in either of the partitions p0 or p3; that is, we need search only in partitions p1 and p2 to find matching rows. By limiting the search, it is possible to expend much less time and effort in finding matching rows than by scanning all partitions in the table. This “cutting away” of unneeded partitions is known as pruning. When the optimizer can make use of partition pruning in performing this query, execution of the query can be an order of magnitude faster than the same query against a nonpartitioned table containing the same column definitions and data.

The optimizer can perform pruning whenever a WHERE condition can be reduced to either one of the following two cases:

  • partition_column = constant
  • partition_column IN (constant1, constant2, ..., constantN)

Pruning can also be applied to short ranges, which the optimizer can convert into equivalent lists of values.f

https://dev.mysql.com/doc/refman/8.0/en/explain.html

Queries

EXPLAIN FORMAT=TREE -> SELECT * -> FROM t1 -> JOIN t2 -> ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2) -> JOIN t3 -> ON (t2.c1 = t3.c1)\G

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON (t1.c1 = t2.c2)\G

--

--

Abhishek Rathore
Abhishek Rathore

Written by Abhishek Rathore

Software Engineer, constant learner and observer.

No responses yet