Out of the box, the Lucid query parser comes with a basic plural stemmer that converts most plural words to their singular form, which should be sufficient for most applications. The stemming rules are kept in a text file whose format is easy to read and write: new rules can be added, and individual words can be protected from stemming or mapped specially. This gives enough flexibility for many more specialized applications.
If the administrator wishes to use an alternative stemmer, the change can be made manually in the schema.xml file. Any stemming filter can be specified, but Lucid KStem is a typical alternative.
If you open schema.xml and search for the text_en field type, you should see that both its index and query analyzers contain XML entries for the stemming filter.
The com.lucid.analysis.LucidPluralStemFilterFactory class represents the default plural stemmer. The rules parameter specifies the name of the text file that contains the plural stemming rules.
The com.lucid.analysis.LucidKStemFilterFactory class represents the Lucid KStem stemmer, which is disabled by default using the standard <!-- and --> comment markers.
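As a rough sketch of what the default entries described above might look like inside one analyzer, consider the fragment below. The two class names come from this section. The rules file name STEM_RULES_FILE.txt is only a placeholder for whatever file your installation actually references, and any further attributes on either filter are omitted here.

```xml
<!-- Default state: plural stemmer active, Lucid KStem commented out. -->
<!-- STEM_RULES_FILE.txt is a placeholder name, not the shipped file name. -->
<filter class="com.lucid.analysis.LucidPluralStemFilterFactory" rules="STEM_RULES_FILE.txt"/>
<!-- <filter class="com.lucid.analysis.LucidKStemFilterFactory"/> -->
```

The same pair of entries would be expected in both the index analyzer and the query analyzer of the field type.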
To disable the default plural stemmer and enable Lucid KStem, remove the comment markers from the latter and add them to the former. Make the same change in both the index and query analyzers.
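After the swap, the entries in each analyzer might look roughly like the sketch below. As before, STEM_RULES_FILE.txt is a placeholder for the rules file name in your own schema.xml, and attributes other than those named in this section are left out.

```xml
<!-- Edited state: plural stemmer commented out, Lucid KStem active. -->
<!-- STEM_RULES_FILE.txt is a placeholder name, not the shipped file name. -->
<!-- <filter class="com.lucid.analysis.LucidPluralStemFilterFactory" rules="STEM_RULES_FILE.txt"/> -->
<filter class="com.lucid.analysis.LucidKStemFilterFactory"/>
```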
Be sure that you have chosen the same stemmer class for both the index and query analyzers. If the stemmer classes do not match, some queries can fail to match because terms were indexed according to different rules than those applied at query time by the Lucid query parser.
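To make the matching requirement concrete, here is an abbreviated sketch of a field type in which both analyzers use the same stemmer class. It is a fragment and not a complete definition: the tokenizer and all other filters are omitted, and class="solr.TextField" is assumed here as the usual Solr text field class and is not taken from this section.

```xml
<!-- Abbreviated sketch: tokenizers and other filters are omitted. -->
<fieldType name="text_en" class="solr.TextField">
  <analyzer type="index">
    <!-- tokenizer and other filters go here -->
    <filter class="com.lucid.analysis.LucidKStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- tokenizer and other filters go here -->
    <filter class="com.lucid.analysis.LucidKStemFilterFactory"/>
  </analyzer>
</fieldType>
```

A quick check after editing is to confirm that the active stemming filter line is identical in the two analyzer elements.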
In general, it is best to delete the index and fully re-index the data collection whenever an index analyzer is substantially changed, as is the case when stemming filters or rules are changed.
Other alternative stemming filters, such as Snowball and Porter, can be enabled using the same technique.
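For instance, a minimal sketch using the Snowball filter from the standard Solr distribution might look like the following. The solr.SnowballPorterFilterFactory and solr.PorterStemFilterFactory classes are standard Solr factories, but whether they are available in a given installation is an assumption, and STEM_RULES_FILE.txt is again only a placeholder for the real rules file name.

```xml
<!-- Default plural stemmer disabled. STEM_RULES_FILE.txt is a placeholder name. -->
<!-- <filter class="com.lucid.analysis.LucidPluralStemFilterFactory" rules="STEM_RULES_FILE.txt"/> -->
<!-- Standard Solr Snowball stemmer configured for English. -->
<filter class="solr.SnowballPorterFilterFactory" language="English"/>
<!-- Alternative: the standard Solr Porter stemmer. -->
<!-- <filter class="solr.PorterStemFilterFactory"/> -->
```

Only one stemming filter should be active in an analyzer at a time, and the same choice should appear in both the index and query analyzers.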