Planner's choice 
Author Message
 Planner's choice


Quote:
> where chat_post_timeuser_idx is defined on the columns (time,poster_id)
> and chat_post_usertime_idx is defined on the columns (poster_id,time)
> Why is the planner not choosing the user_time index [for]
> avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by time desc limit 2;
> NOTICE:  QUERY PLAN:
> Limit  (cost=0.00..32.40 rows=2 width=46) (actual time=96204.53..96204.71 rows=2 loops=1)
>   ->  Index Scan Backward using chat_post_time_idx on chat_post  (cost=0.00..42370.93 rows=2616 width=46) (actual time=96204.49..96204.64 rows=3 loops=1)
> Total runtime: 96205.18 msec

If you'd said "order by poster_id desc, time desc" then that index would be
considered to match the ORDER BY clause, and so would be usable in this
same type of plan.  As-is, the index is only useful for matching
poster_id and not for obtaining the required order, so the only plan
type considered for it involves an explicit sort step, which isn't
considered a win for the estimated number of rows matching the poster_id.

Quote:
> My plan now is to maintain my own set of poster_id stats and use one
> of several query variants depending on what they say but this requires
> at least some understanding of the choices made by the planner.

Rather than maintaining your own stats, consider boosting the statistics
target for the poster_id column.  You probably want the pg_stats info to
cover all the poster_ids that account for more than 1% of the entries.
The n_distinct value should improve too, producing a better estimate for
the infrequent poster_ids even though they're not explicitly stored.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.***.com/



Tue, 03 May 2005 11:27:20 GMT
 Planner's choice

Ok, I'm not sure why the planner is making the choices it is.

Given:

From pg_stats:

 tablename |   attname   | null_frac | avg_width | n_distinct |                                                        most_common_vals                                                        |                                           most_common_freqs                                           |                                                                                                                                   histogram_bounds                                                                                                                                   | correlation
-----------+-------------+-----------+-----------+------------+--------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 chat_post | poster_id   |         0 |         2 |        341 | {2149,1130,731,2595,1879,2473,1842,688,521,1656}                                                                               | {0.066,0.039,0.0276667,0.0256667,0.0253333,0.023,0.0226667,0.022,0.021,0.021}                         | {4,252,582,896,1162,1526,1747,1907,2114,2472,2946}                                                                                                                                                                                                                                   |    0.036261
 chat_post | time        |         0 |         8 |  -0.417335 | {"1998-07-08 15:09:00-04","1999-02-26 19:31:00-05","2000-01-27 12:07:00-05","2001-05-24 14:30:00-04","2002-01-22 10:04:00-05"} | {0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}                                         | {"1998-03-05 22:47:00-05","1998-08-29 11:49:00-04","1999-03-19 19:39:00-05","1999-08-23 09:26:00-04","2000-03-07 13:45:00-05","2000-11-29 15:48:00-05","2001-04-11 11:32:00-04","2001-08-31 15:43:00-04","2002-02-01 12:40:00-05","2002-06-19 06:32:00-04","2002-11-12 04:44:00-05"} |           1

with the table definition:

                 Table "chat_post"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 session_id  | smallint                 | not null
 poster_id   | smallint                 | not null
 time        | timestamp with time zone | not null
 post_number | smallint                 | not null
 fts         | txtidx                   |
Indexes: chat_post_text_idx,
         chat_post_time_idx,
         chat_post_timeuser_idx,
         chat_post_user_idx,
         chat_post_usertime_idx
Primary key: chat_post_pkey
Triggers: RI_ConstraintTrigger_23712080,
          RI_ConstraintTrigger_23706474

where chat_post_timeuser_idx is defined on the columns (time,poster_id)
and chat_post_usertime_idx is defined on the columns (poster_id,time)

Why is the planner not choosing the user_time index? For the first of these I
suspect it's the sky high correlation number on the time column but for the
other two?

avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by time desc limit 2;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..32.40 rows=2 width=46) (actual time=96204.53..96204.71 rows=2 loops=1)
  ->  Index Scan Backward using chat_post_time_idx on chat_post  (cost=0.00..42370.93 rows=2616 width=46) (actual time=96204.49..96204.64 rows=3 loops=1)
Total runtime: 96205.18 msec

EXPLAIN

avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by user, time desc limit 2;
NOTICE:  QUERY PLAN:

Limit  (cost=10262.07..10262.07 rows=2 width=46) (actual time=17400.89..17400.95 rows=2 loops=1)
  ->  Sort  (cost=10262.07..10262.07 rows=2616 width=46) (actual time=17400.85..17400.88 rows=3 loops=1)
        ->  Index Scan using chat_post_user_idx on chat_post  (cost=0.00..10113.60 rows=2616 width=46) (actual time=99.53..16327.97 rows=3372 loops=1)
Total runtime: 17450.35 msec

EXPLAIN
avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by time desc, user limit 2;
NOTICE:  QUERY PLAN:

Limit  (cost=10262.07..10262.07 rows=2 width=46) (actual time=1109.68..1109.73 rows=2 loops=1)
  ->  Sort  (cost=10262.07..10262.07 rows=2616 width=46) (actual time=1109.65..1109.67 rows=3 loops=1)
        ->  Index Scan using chat_post_user_idx on chat_post  (cost=0.00..10113.60 rows=2616 width=46) (actual time=1.47..598.56 rows=3372 loops=1)
Total runtime: 1166.65 msec
               (don't forget this is a bogus time due to the caching)

EXPLAIN

This is part of a larger query/issue. By redoing a query sprinkling 'limit's
throughout I managed to make a >90s query complete in 1s (with the loss of the
total number of results that would be returned). However, I then noticed that
was a feature of the particular poster_id I'd been testing with. Picking a far
less frequent poster_id pushed the time up to >300s. The reason for this was
the planner's choice of chat_post_time_idx which for low numbers of occurances
of poster_id was effectively kicking off a seqscan through an indexscan. My
plan now is to maintain my own set of poster_id stats and use one of several
query variants depending on what they say but this requires at least some
understanding of the choices made by the planner.

BTW, the largest number of occurances of a single poster_id is still only 6% of
the entire table. So changing the stats gathering shouldn't make any
difference? There's also >2000 distinct values of poster_id, not the 341
estimated by the stats, but again this shouldn't matter?

--
Nigel J. Andrews

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Tue, 03 May 2005 11:14:11 GMT
 
 [ 2 post ] 

 Relevant Pages 

1. Planner choices

2. Planner making poor choices?

3. Puzzling planner choice (non-urgent)

4. Visual Basic Programmer's Journal 1997 Reader's Choice Award

5. Visual Basic Programmer's Journal 1997 Reader's Choice Award

6. Visual Basic Programmer's Journal 1997 Reader's Choice Award

7. Visual Basic Programmer's Journal 1997 Reader's Choice Award

8. Visual Basic Programmer's Journal 1997 Reader's Choice Award

9. Visual Basic Programmer's Journal 1997 Reader's Choice Award

10. Visual Basic Programmer's Journal 1997 Reader's Choice Award

11. Visual Basic Programmer's Journal 1997 Reader's Choice Award

12. Visual Basic Programmer's Journal 1997 Reader's Choice Award


 
Powered by phpBB® Forum Software