Item10149: versions query with SEARCH on an RCS store hangs
| Priority: | Enhancement |
| CurrentState: | Confirmed |
| AppliesTo: | Engine |
| Component: | SEARCH |
| WaitingFor: | |
I've not had much luck with the versions query in SEARCH. It works fine for individual topics with QUERY, but SEARCH is just too slow unless you only have a few dozen topics.
I think we need a strategy before we ship this. At a minimum, disable versions in SEARCH with RCS stores. Alternatively, introduce a new feature that makes SEARCH abort long-running queries gracefully. Could we just return a shorter-than-normal result set and log/emit an error?
The query I ran on trunk.foswiki.org was something like:
%SEARCH{
"versions[author='PaulHarvey']"
web="Development"
limit="50"
}%
It ran for several minutes before Koen killed the process for me.
--
PaulHarvey - 11 Dec 2010
Yeah, well, TBH I'm not that surprised. A versions query has to load a huge amount of information (potentially every revision of every topic in scope) just to evaluate the query, and that's not efficient. As you say, the RCS store just isn't built for this kind of query.
Should this be a fix just for the versions query, or is there a more general problem? Namely, should a user be able to put a limit on the time spent on a query? A general mechanism would also cover other kinds of bad query.
--
CrawfordCurrie - 11 Apr 2011
I don't have any good ideas on how to control a hypothetical "search timeout" feature. Configure setting? Macro param? URL param? What should be the default?
On my own wiki, I have some reports that take tens of seconds, and a dot graph that takes minutes. I need to be able to run those from a cron job, which saves the output back into a "cache" topic that is refreshed every hour.
Let's say we have a default SEARCH timeout of 10s. I need a way for my cron-job scenario to override it so that it runs to completion from the CLI.
Additionally: I don't know about general Foswiki practice, but the biggest troubles on my wiki are nested searches. Suppose an outer SEARCH spawns 200 inner SEARCHes and each of them still takes around 2s. How do we apply a "timeout" in that case?
Hmm, so at first glance it seems that a general solution is a can of worms.
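For the nested case, one possible direction is a single per-request deadline. Instead of each SEARCH getting its own timeout, the outer SEARCH and every SEARCH it spawns share one deadline. A cron or CLI run simply passes no limit.

This is a hedged Python sketch. The `Deadline` class and the `run_search` function are hypothetical stand-ins, not Foswiki code.

```python
import time
from typing import Callable, Optional


class Deadline:
    """A per-request time budget shared by an outer SEARCH and its nested SEARCHes."""

    def __init__(self, seconds: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # None means "no limit", e.g. for a cron job run from the CLI.
        self._expires = None if seconds is None else clock() + seconds

    def expired(self) -> bool:
        return self._expires is not None and self._clock() >= self._expires

    def remaining(self) -> Optional[float]:
        """Seconds left in the budget, or None when unlimited."""
        if self._expires is None:
            return None
        return max(0.0, self._expires - self._clock())


def run_search(query: str, deadline: Deadline, nested: int = 0) -> int:
    """Hypothetical search that returns how many (nested) searches actually ran."""
    if deadline.expired():
        return 0  # skip: the shared budget is already spent
    ran = 1
    for i in range(nested):
        # Nested searches receive the SAME deadline object, not a fresh timeout,
        # so 200 inner searches cannot each consume a full budget.
        ran += run_search(f"{query}/{i}", deadline)
    return ran
```

With this design, the default budget would live in one place, for example a configure setting. The CLI or cron entry point would construct `Deadline(None)`, so individual macros would not each need their own timeout parameter.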
If I get time to invest in this, I think I'd rather try to ship a SearchAlgorithm that continues to work with the RCS store but perhaps caches just the %META part of every topic version somewhere in the working/ directory. It would reproduce the data/ directory layout, except that Topic.txt,v would be a directory.
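The cache layout described above might be sketched as follows. The cache mirrors data/ under working/, but each Topic.txt,v becomes a directory holding one %META-only file per revision.

Several details are assumptions: the `versions_cache` root name, the `<rev>.meta` file naming, and the helper names. Keeping only lines that start with `%META:` is also a simplification of how Foswiki actually parses topic text.

```python
from pathlib import Path


def meta_cache_path(working_dir: Path, web: str, topic: str, rev: int) -> Path:
    """Map one topic revision to its (hypothetical) cache file.

    data/<Web>/<Topic>.txt,v  ->  <working>/versions_cache/<Web>/<Topic>.txt,v/<rev>.meta
    Here '<Topic>.txt,v' is a directory rather than an RCS file.
    """
    return working_dir / "versions_cache" / web / f"{topic}.txt,v" / f"{rev}.meta"


def extract_meta(topic_text: str) -> str:
    """Keep only the %META:...% lines of one revision's topic text (simplified)."""
    return "".join(line for line in topic_text.splitlines(keepends=True)
                   if line.startswith("%META:"))


def cache_revision(working_dir: Path, web: str, topic: str, rev: int,
                   topic_text: str) -> Path:
    """Write the %META part of one revision into the cache and return its path."""
    path = meta_cache_path(working_dir, web, topic, rev)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(extract_meta(topic_text), encoding="utf-8")
    return path
```

A versions query would then read only the small `.meta` files for each revision. It would not need to check out every full revision from the ,v file.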
--
PaulHarvey - 11 Apr 2011
I have added a caveat emptor to the QuerySearch topic to warn of the performance risks. Having done this, I think it is valid to regrade this report from 'Urgent' to 'Enhancement'.
--
CrawfordCurrie - 21 Jan 2013
I think caching the RCS log information plus the %META would be a good basic feature of the RCS-based store. Anything that needs to dip into the RCS log records is horribly slow, for example the display of the comment field for attachments. Being able to access the topic metadata history without a full RCS pass would be very helpful. Maybe store it alongside file and file,v as file,meta.
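The file,meta suggestion can be sketched as a sidecar file. Per-revision metadata (RCS log fields plus %META) is stored next to file and file,v and rebuilt only when file,v is newer than the sidecar.

Several details are illustrative assumptions: the JSON Lines format, the entry fields, and the mtime-based staleness rule. The `rebuild` callback stands in for the expensive full RCS pass and is hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, List


def sidecar_path(rcs_file: Path) -> Path:
    """Map file,v to file,meta in the same directory."""
    name = rcs_file.name
    if not name.endswith(",v"):
        raise ValueError(f"not an RCS file: {rcs_file}")
    return rcs_file.with_name(name[:-2] + ",meta")


def load_history(rcs_file: Path,
                 rebuild: Callable[[Path], List[Dict]]) -> List[Dict]:
    """Return per-revision metadata, doing the full RCS pass only when stale.

    Staleness is judged by comparing mtimes; this is a simplification, since
    filesystems with coarse timestamps could hide a very recent update.
    """
    meta = sidecar_path(rcs_file)
    if meta.exists() and meta.stat().st_mtime >= rcs_file.stat().st_mtime:
        with meta.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]
    history = rebuild(rcs_file)  # the slow path: a full pass over the RCS file
    tmp = meta.with_name(meta.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as fh:
        for entry in history:
            fh.write(json.dumps(entry) + "\n")
    os.replace(tmp, meta)  # atomic rename so readers never see a half-written file
    return history
```

Keeping the sidecar next to file,v means it moves and gets deleted together with the topic. The trade-off compared with a separate cache tree under working/ is that it adds files to data/.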
--
GeorgeClark - 21 Jan 2013