Retrieve index statistics (estimated results for search term)? 
Author Message
 Retrieve index statistics (estimated results for search term)?

Hi,

I know this must be possible since Query Analyzer offers such
information. If I look at the execution plan of a CONTAINSTABLE query,
the execution plan displays a "remote scan" with an estimated number
of rows, which would be a valuable information for me.

I require this since my queries usually have a date range term plus a
fulltext search term. I limit the fulltext search results to 10.000
with a join like

JOIN FULLTEXTABLE(mytable, myfield, '"searchword"', 10000) ft ON
ft.[KEY] = mytable.id

Now if "searchword" appears 200.000 times and I retrieve only the top
10000 ordered by RANK as in this example and limit the search to the
last month, I might not get a single hit (although there are many)
since the TOP 10000 BY RANK hits may be older than a month. And sadly
- as far as I know - there is no other ordering possible here than BY
RANK.

In this situation, I cannot even tell the user to limit his search,
since I wouldn't know that he really did get no results since his
searchword is too common.

If however I could peek the index statistics, I could inform the user
of the problem with his fulltext search.

Any hint would be great.

Regards

DC



Tue, 17 May 2005 19:18:07 GMT
 Retrieve index statistics (estimated results for search term)?

the Index Tuning Wizard gets its info by running the
query.  

So other than knowing this info in advance and storing it
in a table, I can't think of a way to do this.

Quote:
>-----Original Message-----
>Hi,

>I know this must be possible since Query Analyzer offers
such
>information. If I look at the execution plan of a

CONTAINSTABLE query,
Quote:
>the execution plan displays a "remote scan" with an
estimated number
>of rows, which would be a valuable information for me.

>I require this since my queries usually have a date range
term plus a
>fulltext search term. I limit the fulltext search results
to 10.000
>with a join like

>JOIN FULLTEXTABLE(mytable, myfield, '"searchword"',
10000) ft ON
>ft.[KEY] = mytable.id

>Now if "searchword" appears 200.000 times and I retrieve
only the top
>10000 ordered by RANK as in this example and limit the
search to the
>last month, I might not get a single hit (although there
are many)
>since the TOP 10000 BY RANK hits may be older than a
month. And sadly
>- as far as I know - there is no other ordering possible
here than BY
>RANK.

>In this situation, I cannot even tell the user to limit
his search,
>since I wouldn't know that he really did get no results
since his
>searchword is too common.

>If however I could peek the index statistics, I could
inform the user
>of the problem with his fulltext search.

>Any hint would be great.

>Regards

>DC
>.



Tue, 17 May 2005 21:15:12 GMT
 Retrieve index statistics (estimated results for search term)?
Hi Hillary,

running the query in one of my examples takes a few minutes. However,
the query analyzer displays the execution plan within a few seconds.
So it must take some shortcut.

I also know no way to get the information that the query analyzer
gathers about the ft index statistics via T-SQL. So it probably uses a
different method to look up this information. If there is a T-SQL
method (like querying the systables) to get the estimated amount of
hits for a fulltext search phrase, then this would already solve my
problem.

Kind Regards

DC

Quote:

> the Index Tuning Wizard gets its info by running the
> query.  

> So other than knowing this info in advance and storing it
> in a table, I can't think of a way to do this.
> >-----Original Message-----
> >Hi,

> >I know this must be possible since Query Analyzer offers
>  such
> >information. If I look at the execution plan of a
>  CONTAINSTABLE query,
> >the execution plan displays a "remote scan" with an
>  estimated number
> >of rows, which would be a valuable information for me.

> >I require this since my queries usually have a date range
>  term plus a
> >fulltext search term. I limit the fulltext search results
>  to 10.000
> >with a join like

> >JOIN FULLTEXTABLE(mytable, myfield, '"searchword"',
>  10000) ft ON
> >ft.[KEY] = mytable.id

> >Now if "searchword" appears 200.000 times and I retrieve
>  only the top
> >10000 ordered by RANK as in this example and limit the
>  search to the
> >last month, I might not get a single hit (although there
>  are many)
> >since the TOP 10000 BY RANK hits may be older than a
>  month. And sadly
> >- as far as I know - there is no other ordering possible
>  here than BY
> >RANK.

> >In this situation, I cannot even tell the user to limit
>  his search,
> >since I wouldn't know that he really did get no results
>  since his
> >searchword is too common.

> >If however I could peek the index statistics, I could
>  inform the user
> >of the problem with his fulltext search.

> >Any hint would be great.

> >Regards

> >DC
> >.



Thu, 19 May 2005 20:13:10 GMT
 Retrieve index statistics (estimated results for search term)?
DC,
While the Index Tuning Wizard (ITW) is useful for providing helpful hints
about "normal" SQL Server indexes (clustered and non-clustered) via
"workloads", the ITW knows nothing at all about FT Indexes, nor will the
Query Analyzer (QA) be of much help as all FT Search queries reference
"remote scan" process, i.e., the remote "Microsoft Search" (mssearch.exe)
service scanning the related FT Catalog files.

Since, you're using CONTAINSTABLE along with the Top_N_Rank parameter, I'm
assuming you've read the KB Article , Q240833 "FIX: Full-Text Search
Performance Improved via Support for TOP"
http://support.microsoft.com/directory/article.asp?ID=KB;EN-US;Q240833

You are correct, in regards to "...there is no other ordering possible here
than BY RANK" and if you want to "limit the search to the
'last' [or current] month [first]" and still use CONTAINSTABLE with
Top_N_Rank and TOP, you will need to 'partition your table into multiple
tables by month/year and then depending upon the FTS query, you would return
the "latest" year/month table results first and perhaps ask your user do
they want more past history results.

Regards,
John


Quote:
> Hi Hillary,

> running the query in one of my examples takes a few minutes. However,
> the query analyzer displays the execution plan within a few seconds.
> So it must take some shortcut.

> I also know no way to get the information that the query analyzer
> gathers about the ft index statistics via T-SQL. So it probably uses a
> different method to look up this information. If there is a T-SQL
> method (like querying the systables) to get the estimated amount of
> hits for a fulltext search phrase, then this would already solve my
> problem.

> Kind Regards

> DC




- Show quoted text -

Quote:
> > the Index Tuning Wizard gets its info by running the
> > query.

> > So other than knowing this info in advance and storing it
> > in a table, I can't think of a way to do this.
> > >-----Original Message-----
> > >Hi,

> > >I know this must be possible since Query Analyzer offers
> >  such
> > >information. If I look at the execution plan of a
> >  CONTAINSTABLE query,
> > >the execution plan displays a "remote scan" with an
> >  estimated number
> > >of rows, which would be a valuable information for me.

> > >I require this since my queries usually have a date range
> >  term plus a
> > >fulltext search term. I limit the fulltext search results
> >  to 10.000
> > >with a join like

> > >JOIN FULLTEXTABLE(mytable, myfield, '"searchword"',
> >  10000) ft ON
> > >ft.[KEY] = mytable.id

> > >Now if "searchword" appears 200.000 times and I retrieve
> >  only the top
> > >10000 ordered by RANK as in this example and limit the
> >  search to the
> > >last month, I might not get a single hit (although there
> >  are many)
> > >since the TOP 10000 BY RANK hits may be older than a
> >  month. And sadly
> > >- as far as I know - there is no other ordering possible
> >  here than BY
> > >RANK.

> > >In this situation, I cannot even tell the user to limit
> >  his search,
> > >since I wouldn't know that he really did get no results
> >  since his
> > >searchword is too common.

> > >If however I could peek the index statistics, I could
> >  inform the user
> > >of the problem with his fulltext search.

> > >Any hint would be great.

> > >Regards

> > >DC
> > >.



Fri, 20 May 2005 05:26:36 GMT
 Retrieve index statistics (estimated results for search term)?
Hi,

to better illustrate my point, I have uploaded two Query Analyzer
screenshots.

Here is the execution plan of a simple full text query:

http://www.upsize.de/qa_ft1.gif

Sorry that this is the German version. As you can see in the lower
right corner, creating this execution plan took about 1 second. I
point my mouse on the remote scan and get the yellow info box, which
in the line "Zeilenanzahl (geschaetzt)" (= estimated number of rows)
display a count of "18.098". This estimated hit count for the
expression "bank*" is what I would like to retrieve from witin a
stored procedure.

http://www.upsize.de/qa_ft1.gif

As you can see here, the estimated number of rows is accurate - as
long as there is no ongoing maintenance of the fulltext index I guess:

http://www.upsize.de/qa_ft2.gif

Also note, that this query took about 4 seconds to execute. A complex
fulltext query may take minutes to execute, still the execution plan
with the estimated number of hits is available within seconds.

If I can query this estimated number, then I can give the user the
option to search within two fulltext indexed columns. I limit both
(with top-n-by-rank) queries to 10.000 hits. I let the user know, that
the first term would reveals e.g. about 30.000 hits and the second
term about 200 hits, and that he should insert a better search term
for the first term since he may not get a result otherwise (although
there may be up to 200 documents matching both terms BUT these 200
documents did accidentally not go through the top-n-by-rank filter of
term one).

Best regards,

DC

Quote:

> DC,
> While the Index Tuning Wizard (ITW) is useful for providing helpful hints
> about "normal" SQL Server indexes (clustered and non-clustered) via
> "workloads", the ITW knows nothing at all about FT Indexes, nor will the
> Query Analyzer (QA) be of much help as all FT Search queries reference
> "remote scan" process, i.e., the remote "Microsoft Search" (mssearch.exe)
> service scanning the related FT Catalog files.

> Since, you're using CONTAINSTABLE along with the Top_N_Rank parameter, I'm
> assuming you've read the KB Article , Q240833 "FIX: Full-Text Search
> Performance Improved via Support for TOP"
> http://support.microsoft.com/directory/article.asp?ID=KB;EN-US;Q240833

> You are correct, in regards to "...there is no other ordering possible here
> than BY RANK" and if you want to "limit the search to the
> 'last' [or current] month [first]" and still use CONTAINSTABLE with
> Top_N_Rank and TOP, you will need to 'partition your table into multiple
> tables by month/year and then depending upon the FTS query, you would return
> the "latest" year/month table results first and perhaps ask your user do
> they want more past history results.

> Regards,
> John



> > Hi Hillary,

> > running the query in one of my examples takes a few minutes. However,
> > the query analyzer displays the execution plan within a few seconds.
> > So it must take some shortcut.

> > I also know no way to get the information that the query analyzer
> > gathers about the ft index statistics via T-SQL. So it probably uses a
> > different method to look up this information. If there is a T-SQL
> > method (like querying the systables) to get the estimated amount of
> > hits for a fulltext search phrase, then this would already solve my
> > problem.

> > Kind Regards

> > DC



> > > the Index Tuning Wizard gets its info by running the
> > > query.

> > > So other than knowing this info in advance and storing it
> > > in a table, I can't think of a way to do this.
> > > >-----Original Message-----
> > > >Hi,

> > > >I know this must be possible since Query Analyzer offers
>  such
> > > >information. If I look at the execution plan of a
>  CONTAINSTABLE query,
> > > >the execution plan displays a "remote scan" with an
>  estimated number
> > > >of rows, which would be a valuable information for me.

> > > >I require this since my queries usually have a date range
>  term plus a
> > > >fulltext search term. I limit the fulltext search results
>  to 10.000
> > > >with a join like

> > > >JOIN FULLTEXTABLE(mytable, myfield, '"searchword"',
>  10000) ft ON
> > > >ft.[KEY] = mytable.id

> > > >Now if "searchword" appears 200.000 times and I retrieve
>  only the top
> > > >10000 ordered by RANK as in this example and limit the
>  search to the
> > > >last month, I might not get a single hit (although there
>  are many)
> > > >since the TOP 10000 BY RANK hits may be older than a
>  month. And sadly
> > > >- as far as I know - there is no other ordering possible
>  here than BY
> > > >RANK.

> > > >In this situation, I cannot even tell the user to limit
>  his search,
> > > >since I wouldn't know that he really did get no results
>  since his
> > > >searchword is too common.

> > > >If however I could peek the index statistics, I could
>  inform the user
> > > >of the problem with his fulltext search.

> > > >Any hint would be great.

> > > >Regards

> > > >DC
> > > >.



Sat, 21 May 2005 21:14:50 GMT
 
 [ 5 post ] 

 Relevant Pages 

1. Retrieving Row Index as part of Result set??

2. exp: why have statistics=estimate option?

3. Estimate or analyze statistics

4. Estimate statistics

5. Estimate statistics - response

6. estimate statistics

7. searches and indexing...too many results

8. Retrieving specific matched terms

9. Searching within Search Results

10. Searching previous search results

11. Checking query results against selectivity estimate

12. How to remember search results and search within them


 
Powered by phpBB® Forum Software