论count使用不当的罪名和分页的优化

作者

digoal

日期

2016-05-06

背景

分页是一个非常常见的应用场景，然而恐怕没有多少人想过其优化方法。

确一味的责怪为什么数据库用count(*)计算分页数是如此的慢。

很多开发人员喜欢用count先算一下结果集的大小，然后就知道需要排多少页。

然后再从数据库取出对应的数据，并展示给用户。

问题1

count会扫一遍数据，然后取数据又扫一遍数据。重复劳动。

问题2，很多人喜欢用order by offset limit来展示分页。

其实也是一个非常大的问题，因为扫描的数据也放大了，即使在order by 的列上用到了索引也会放大扫描的数据量。

因为offset的row也是需要扫的。

分页总数count的优化手段

使用评估行数，方法如下

创建一个函数，从explain中抽取返回的记录数

CREATE FUNCTION count_estimate(query text) RETURNS INTEGER AS  
$func$  
DECLARE  
    rec   record;  
    ROWS  INTEGER;  
BEGIN  
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP  
        ROWS := SUBSTRING(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');  
        EXIT WHEN ROWS IS NOT NULL;  
    END LOOP;  
  
    RETURN ROWS;  
END  
$func$ LANGUAGE plpgsql;

评估的行数和实际的行数相差不大，精度和柱状图有关。

PostgreSQL autovacuum进程会根据表的数据量变化比例自动对表进行统计信息的更新。

而且可以配置表级别的统计信息更新频率以及是否开启更新。

评估行数

postgres=# select count_estimate('select * from sbtest1 where id between 100 and 100000');  
 count_estimate   
----------------  
         102166  
(1 row)  
  
postgres=# explain select * from sbtest1 where id between 100 and 100000;  
                                      QUERY PLAN                                         
---------------------------------------------------------------------------------------  
 Index Scan using sbtest1_pkey on sbtest1  (cost=0.43..17398.14 rows=102166 width=190)  
   Index Cond: ((id >= 100) AND (id <= 100000))  
(2 rows)

实际的行数

postgres=# select count(*) from sbtest1 where id between 100 and 100000;  
 count   
-------  
 99901  
(1 row)

也就是说，应用程序完全可以使用评估的记录数来评估分页数。

这样做就不需要扫描表了，性能提升尤为可观。

分页数据获取的优化手段

问题2其实表现在数据可能被多次扫描，使用游标就能解决。

未优化的情况，取前面的记录很快。

postgres=# explain analyze select * from sbtest1 where id between 100 and 1000000 order by id offset 0 limit 100;  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..9.74 rows=100 width=190) (actual time=0.019..0.088 rows=100 loops=1)  
   ->  Index Scan using sbtest1_pkey on sbtest1  (cost=0.43..93450.08 rows=1003938 width=190) (actual time=0.018..0.051 rows=100 loops=1)  
         Index Cond: ((id >= 100) AND (id <= 1000000))  
 Planning time: 0.152 ms  
 Execution time: 0.125 ms  
(5 rows)

取后面的记录，因为前面的记录也要扫描，所以明显变慢。

postgres=# explain analyze select * from sbtest1 where id between 100 and 1000000 order by id offset 900000 limit 100;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=83775.21..83784.52 rows=100 width=190) (actual time=461.941..462.009 rows=100 loops=1)  
   ->  Index Scan using sbtest1_pkey on sbtest1  (cost=0.43..93450.08 rows=1003938 width=190) (actual time=0.025..308.865 rows=900100 loops=1)  
         Index Cond: ((id >= 100) AND (id <= 1000000))  
 Planning time: 0.179 ms  
 Execution time: 462.053 ms  
(5 rows)

如果有很多个分页，效率下降可想而知。

优化手段

postgres=# begin;  
BEGIN  
Time: 0.152 ms  
postgres=# declare cur1 cursor for select * from sbtest1 where id between 100 and 1000000 order by id;  
DECLARE CURSOR  
Time: 0.422 ms  
postgres=# fetch 100 from cur1;

。。。
获取到数据末尾时，效率也是一样的不会变化。

Count

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

20160506_01.md

20160506_01.md

论count使用不当的罪名和分页的优化

作者

日期

标签

背景

分页总数count的优化手段

分页数据获取的优化手段

Files

20160506_01.md

Latest commit

History

20160506_01.md

File metadata and controls

论count使用不当的罪名 和 分页的优化

作者

日期

标签

背景

分页总数count的优化手段

分页数据获取的优化手段

论count使用不当的罪名和分页的优化