Feature Proposal: Next steps in improving the data store abstraction and search support
Motivation
Searching is the major bottleneck for any large wiki application. Many times we've said "WIBNIF we could use SQL", the idea being that you could then search over a database store (or a database cache of the store) using SQL queries.
Description and Documentation
Impact
Core changes are required to introduce the concept of a "Store cache". This can be done entirely within VC.
Need to add cache update triggers in the core.
Implementation
I have implemented an algorithm that analyses Foswiki %SEARCH statements and hoists SQL statements out of them. It then runs the queries over the database, narrowing down the set of topics to which the un-hoistable parts of the query have to be applied. This is very similar in approach to the existing
BruteForce query algorithm (unsurprising, since I wrote that as well).
The implementation - call it a "reference implementation" - uses sqlite, which does not scale well to large data, but the performance is pretty good on relatively small data. Moving to another SQL DB should be fairly trivial - it's just a case of mapping SQL syntax differences.
The schema used in the DB is extracted directly from
Foswiki::Meta::VALIDATE
, with the addition of a field
tid
to every generated table, and the following table:
CREATE TABLE 'topic' (
'tid' TEXT,
'web' TEXT,
'name' TEXT,
'text' TEXT,
UNIQUE (tid)
);
Note that the extraction is done
before any plugin or extension has a chance to call
registerMETA
, and the tables currently do
not auto-extend (scope for further work there).
--
Contributors: CrawfordCurrie - 15 Sep 2010
Discussion
Out of Off topic. Why does rev 1 of this topic only have 4 fields visible?
--
KennethLavrsen - 15 Sep 2010
Looks like empty fields are omitted.
--
ArthurClemens - 15 Sep 2010
I look forward to seeing what you've done - I managed something similar in the
MongoDBPlugin - but using the core code as it is in 1.1 - I'm curious what more improvements you've made, and how I can leverage them
wrt cache triggers - I'm not sure you need to - foswiki already has them in spades - see
MongoDBPlugin - the Plugins API....
mmm, mind you, I suspect it'd be better to have the cache triggers in Store / Meta - as VC can be bypassed already? (i'm reaching into deep memory atm the girls are screaming)
--
SvenDowideit - 16 Sep 2010
Yes, I read, the
MongoDBPlugin code before starting, thanks ;-). I also started out trying to use the plugin triggers - specifically the afterSaveHandler - but stopped because the afterSaveHandler is only called from
save
, not from
saveAs
- this is a legacy problem - so not all save paths are covered by the handler. It may be possible to connect to the cache via the afterSaveHandler, but I haven't tried it (yet; that's my strategy for providing this on Foswiki 1.1 and older). And yes, I wanted the hooks as close to the store as possible. There's really no need for the core to know anything about this cache, in any way; it should be entirely behind the Store interface.
Note that I decided not to attempt to implement any part of the store except the search. My thinking was KISS. I've also been using sqlite, which I acknowledge is nowhere near as efficient as other DB impls, but has the advantage of being simple(r) and easy to use at this early stage.
One thing I'm acutely conscious of is that the update step I currently have is monstrous. Basically I'm doing a "remove" of the existing object and "insert" of the new object. When the DB has indexes (there are none there at the moment) this could come to be a pain.
--
CrawfordCurrie - 16 Sep 2010
A number of pre-store2 prep steps, and the already-deprecated deep store listeners were implemented under this proposal. I'm switching it to
UnderConstruction and slotting it into 1.2. Please open a "Phase 2" proposal if another one is needed, and doesn't already exist.
--
GeorgeClark - 23 Feb 2012
The early work I did on this has since been subsumed by other efforts, such as MongoDB. I don't think there's anything more useful that can be done here. So I'm rejecting it.
--
CrawfordCurrie - 09 Mar 2012