Solr Plugin
Enterprise Search Engine for Foswiki based on Solr
About Solr
Solr is an open source enterprise search server based on the Lucene Java search
library, with XML/HTTP and JSON APIs, hit highlighting, faceted search,
caching, replication, and a web administration interface. It runs in a Java
servlet container such as Tomcat.
This extension comes with a Jetty server preconfigured to run a Solr webapp
right away.
Installation
First follow the normal plugin installation instructions below.
You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server.
Open configure, and open the "Extensions" section. Use "Find More Extensions" to get a list of available extensions. Select "Install".
If you have any problems, or if the extension isn't available in
configure, then you can still install manually from the command-line. See
http://foswiki.org/Support/ManuallyInstallingExtensions for more help.
Downloading SolrPlugin-bin
SolrPlugin is distributed in two parts:
- SolrPlugin: the Foswiki specific part and
- SolrPlugin-bin: java binary package containing the latest stable solr, jetty and other required jar packages
Both have to be downloaded and installed. They are available at
Foswiki:Extensions/SolrPlugin.
After downloading, both have to be unpacked in your Foswiki installation directory (<foswiki-dir>).
The reason for this split is to keep the Foswiki plugin lightweight in terms of download size, whereas the
SolrPlugin-bin part comprises the most current stable release of Solr. The bin part will most likely only
change when there is a new release of the Solr software itself, whereas the Foswiki part might change more often.
So you won't need to download SolrPlugin-bin again when upgrading SolrPlugin.
Starting the Solr webservice
SolrPlugin sends all content to be indexed to a Solr webservice via HTTP.
So you will need a Solr webservice running in a servlet container
of your choice, e.g. Tomcat or Jetty.
SolrPlugin comes with a ready-to-use Jetty engine configured
to start a Solr server on the same host that Foswiki is running on.
This server can either be started manually using the solrstart tool
or be launched automatically if SolrPlugin can't ping the server.
By default Solr listens on port 8983 on localhost and is configured to only
allow connections from localhost for security reasons. You can change these
settings in
<foswiki-dir>/solr-run/etc/jetty.xml
After checking these settings, start the Solr daemon in the background using
<foswiki-dir>/tools/solrstart
Make sure that Foswiki is configured to contact the Solr server at the correct
URL in configure.
Now test that SolrSearch is working. Note that it won't show any search results
yet, as no content has been indexed so far.
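Before opening SolrSearch, it can help to confirm from the shell that the Solr daemon answers at all. The following is a minimal sketch and not part of SolrPlugin: the function names are hypothetical, and the /solr/admin/ping path assumes that Solr's standard ping handler is enabled under the default /solr context. Adjust host, port and path to match your jetty.xml and the URL set in configure.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with SolrPlugin.

# Build the ping URL for a Solr server (defaults: localhost, port 8983).
# Assumption: the webapp is mounted at /solr and the ping handler is enabled.
solr_ping_url() {
    host=${1:-localhost}
    port=${2:-8983}
    printf 'http://%s:%s/solr/admin/ping\n' "$host" "$port"
}

# Return 0 if the server answers the ping with an HTTP success status.
# curl -f maps HTTP errors to a non-zero exit status, -s silences progress output.
solr_is_up() {
    curl -fs -o /dev/null --max-time 5 "$(solr_ping_url "$@")"
}
```

For example, `solr_is_up localhost 8983 && echo "Solr is reachable"` prints the message only when the ping succeeds.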
Indexing existing content offline
Before using SolrSearch you will need to index your content completely. Afterwards
Foswiki will keep the index up to date whenever you change a topic or upload another attachment.
Let's first test if the indexer is working fine by indexing a single topic:
<foswiki-dir>/tools/solrindex topic=Main.WebHome
Now check if this topic shows up in SolrSearch.
If that worked, run a full index:
<foswiki-dir>/tools/solrindex mode=full optimize=on
This will crawl all webs, topics and attachments and submit them to the Solr server, which in turn
will build up the search index. This can take a while depending on the amount of content and the number of users
already registered on your site. During this process attachments are "stringified" using the
StringifierContrib, that is,
they are converted into a plain text format that Solr can read. SolrPlugin will cache
the stringified version of all attachments and only process them again if the corresponding
binary version has changed. Thus the next full index run will be significantly faster.
SolrPlugin reads the access rights of all users to a document while indexing it.
This access control information is indexed together with the document to secure entries
in the database. Any request takes these into account so that only users with
view rights to a document can retrieve it using SolrSearch.
Setting up immediate indexing
Whenever a topic or attachment in Foswiki changes, Solr has to read the changed documents and update
its index. This can be done immediately when a topic is saved, or when an attachment is uploaded or moved
to a different location. Each of these cases can be configured to your needs:
Enable/disable updates on save:
$Foswiki::cfg{SolrPlugin}{EnableOnSaveUpdates} = 0;
Enable/disable updates when a new attachment has been uploaded:
$Foswiki::cfg{SolrPlugin}{EnableOnUploadUpdates} = 0;
Enable/disable updates when a topic or attachment has been moved or deleted:
$Foswiki::cfg{SolrPlugin}{EnableOnRenameUpdates} = 1;
All but the last are disabled by default. That's because updating Solr's index might take a noticeable amount of time
when clicking on "save" in the wiki editor. This is even more so when the saved topic has a lot of attachments, because all of its attachments are reindexed whenever the topic is updated.
EnableOnRenameUpdates is enabled by default as moving or deleting content significantly influences the Solr index and the way it displays search results.
Setting up offline indexing
In any case it is strongly recommended to
fully reindex all of your documents regularly, for example once every 24 hours at night.
To achieve this, install a cron job like this one:
0 0 * * * <foswiki-dir>/tools/solrjob full
This will read all existing webs one by one and re-index all topics and attachments. Afterwards the index is optimized for size and performance.
In addition, you will want to set up delta indexing at a rather short interval. It reindexes all documents that changed since
the last time the delta indexing was performed. This is set up using a cron job like this:
0-59/5 * * * * <foswiki-dir>/tools/solrjob delta
This will start solrindex in delta mode every 5 minutes. You could shorten this interval even further. However, 5 minutes seems to be a good tradeoff between wasting resources and having all content updated in a timely manner.
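If the interval is shortened, a delta run may still be in progress when cron starts the next one. Whether solrjob protects itself against overlapping runs is not stated here, so the following sketch assumes it does not and wraps the call in flock(1) from util-linux. The function name and the lock file path in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with SolrPlugin.
# Usage: run_exclusive LOCKFILE COMMAND [ARGS...]
# Runs COMMAND only if no other process holds the lock on LOCKFILE.
run_exclusive() {
    lockfile=$1
    shift
    # flock -n gives up immediately (exit status 1) instead of waiting
    # when the lock is already held; otherwise it returns COMMAND's status.
    flock -n "$lockfile" "$@"
}
```

A script called from cron could then run `run_exclusive /tmp/solrjob-delta.lock <foswiki-dir>/tools/solrjob delta`, so that a new delta run is skipped while the previous one is still working.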
Instead of waiting for cron to trigger the delta indexing job, you can use iwatch, which gives you near-realtime indexing. Iwatch is available on Linux systems that implement the inotify kernel service. With it, the underlying operating system triggers the solrjob script as soon as a file has changed. An example of an iwatch.xml file triggering a delta index job looks like this:
<?xml version="1.0" ?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >
<config>
<guard email="root@localhost" name="IWatch"/>
<watchlist>
<title>Foswiki</title>
<contactpoint email="root@localhost" name="Administrator"/>
<path type="recursive" alert="off" syslog="on" exec="su -c <foswiki-dir>/tools/solrjob <httpd-user>"><foswiki-dir>/data</path>
<path type="regexception">\.svn|\.lease|\.lock|,$|\.changes|,v|^_[0-9]</path>
</watchlist>
</config>
Make sure to replace <foswiki-dir> and <httpd-user> with the appropriate values on your platform.
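The replacement can be done by hand or scripted. The following is a minimal sketch using sed. The function name is hypothetical, and the example values in the usage line (/var/www/foswiki, www-data, the template file name and the /etc/iwatch/iwatch.xml target) are assumptions that will differ between platforms.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with SolrPlugin.
# Usage: fill_placeholders FOSWIKI_DIR HTTPD_USER < template > iwatch.xml
# Replaces the literal placeholders <foswiki-dir> and <httpd-user> read from stdin.
# Assumption: neither value contains '|', '&' or '\', which are special to sed here.
fill_placeholders() {
    sed -e "s|<foswiki-dir>|$1|g" -e "s|<httpd-user>|$2|g"
}
```

For example, `fill_placeholders /var/www/foswiki www-data < iwatch.xml.template > /etc/iwatch/iwatch.xml` writes a configuration with both placeholders filled in.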
Note that delta indexing is strongly recommended when one of the settings
EnableOnSaveUpdates, EnableOnUploadUpdates or EnableOnRenameUpdates is disabled.
Usage
Faceted search interface
Macros
SOLRSEARCH
SOLRFORMAT
SOLRSIMILAR
REST interface
search
terms
similar
autocomplete
Command-line tools
solrstart
solrindex
solrdelete
Perl interface
registerIndexTopicHandler()
registerIndexAttachmentHandler()
Templates
Structure of SolrSearchBaseTemplate
Replacing WebSearch and WebChanges
Creating custom search interfaces
Solr indexing schema
Screenshots
Plugin Info
| Author: | Foswiki:Main.MichaelDaum |
| Copyright: | © 2009-2010, Michael Daum http://michaeldaumconsulting.com |
| License: | GPL (GNU General Public License) |
| Release: | 1.10 |
| Version: | 10340 (2010-12-16) |
| Home: | Foswiki:Extensions/SolrPlugin |
| Support: | Foswiki:Support/SolrPlugin |
| Change History: | |
| 16 Dec 2010: | added state field to schema used for approval workflows; added solrjob to ease running indexing from cron; added documentation on how to use iwatch for almost-realtime indexing; fixed dependencies to include Foswiki:Extensions/FilterPlugin as well; fixed mapping facet values to their display title in search interface; fixed delta updates not properly removing outdated attachment entries when these were moved/renamed; and some minor html improvements |
| 03 Dec 2010: | fixed solr-based WebChanges and SiteChanges using PatternSkin |
| 01 Dec 2010: | adjustments due to changes in stringifier api; fixed removal of deleted webs from search index |
| 22 Nov 2010: | fixes integration with pattern skin |
| 18 Nov 2010: | initial public release |