LucidWorks Search includes several integration points for plugging in business rules engines, as well as a default rules engine implementation based on Drools, the open source rules engine from Red Hat. LucidWorks implemented these integration points using Drools 5.3.0. The integration is designed to let engines alter search results, facets, sorting and filtering criteria, and the documents themselves, at both index time and query time. Many users will not need every detail covered in this section, but the components described here appear in the default configuration, and understanding them helps when customizing how rules apply to queries or to documents as they are indexed. The section on Drools Integration has more information on the LucidWorks Search integration with Drools, including how to write the rules files themselves.
As mentioned previously, all business rules engines depend on facts to analyze and take actions. In LucidWorks Search, the FactCollector collects these facts and adds them to the knowledge session. LucidWorks Search comes with two FactCollector implementations, but the default one is used in most cases. The section on Drools Integration has more information about the default FactCollector; the default facts for each integration point are also listed below.
Search Integration
Search integration consists of two main functionalities:
- RulesComponent: The ability to modify incoming parameters for searching, faceting, sorting, etc., as well as to modify the intermediate results.
- LandingPageComponent: The ability to bypass the rest of the processing and return a landing page based on some matching criteria.
Applying Rules at Query Time (RulesComponent)
The primary mechanism for applying rules at query time (that is, to any request that is not a document indexing request) is a Solr searchComponent called RulesComponent. The RulesComponent can be placed anywhere in the searchComponent chain, but it is typically best to make it the first item after the filter by role component. Rules often need to make decisions based on the application's input parameters (such as the query or sort), and their changes should take effect before the other components process those parameters. For instance, you may have a rule that fires when the user query is equal to "title:dogs" and changes the query to "title:dogs AND category:pets". With the component placed early in the chain, the rule can change the query before it is parsed, which avoids writing extra rules that rearrange complex Query objects.
Configuration
searchComponents are configured in the solrconfig.xml file for each collection (found in $LWE_HOME/conf/solr/cores/collection/conf/ where collection is the name of the specific collection). The default configuration for the RulesComponent is:
<searchComponent class="com.lucid.rules.RulesComponent" name="firstRulesComp">
  <str name="requestHandler">/rulesMgr</str>
  <str name="engine">first</str>
  <!-- The handle can be used to turn on or off explicit rules components in the case
       when you have multiple rules at different stages of the component ordering-->
  <str name="handle">first</str>
</searchComponent>
<searchComponent class="com.lucid.rules.RulesComponent" name="lastRulesComp">
  <str name="requestHandler">/rulesMgr</str>
  <str name="engine">last</str>
  <str name="handle">last</str>
</searchComponent>
This can then be hooked into the requestHandler using the first-components, components or last-components capability. For example, here is the component list for the default /lucid requestHandler in solrconfig.xml:
<arr name="components">
  <str>filterbyrole</str>
  <str>landingPage</str>
  <str>firstRulesComp</str>
  <str>query</str>
  <str>mlt</str>
  <str>stats</str>
  <str>feedback</str>
  <!-- Note: highlight needs to be after feedback -->
  <str>highlight</str>
  <!-- Note: facet also needs to be after feedback -->
  <str>facet</str>
  <str>spellcheck</str>
  <str>lastRulesComp</str>
  <str>debug</str>
</arr>
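As an alternative to listing every component, Solr's first-components and last-components arrays add components before or after a handler's default chain. The fragment below is a minimal sketch of that approach, not part of the shipped solrconfig.xml: the handler name /myRules is hypothetical, and the component names are the ones defined in the default configuration above. It also leaves out filterbyrole and landingPage, which the default /lucid handler includes.

```xml
<!-- Hypothetical handler: "/myRules" is an illustrative name, not a shipped default -->
<requestHandler name="/myRules" class="solr.SearchHandler">
  <!-- Placed before Solr's default search components -->
  <arr name="first-components">
    <str>firstRulesComp</str>
  </arr>
  <!-- Placed after Solr's default search components -->
  <arr name="last-components">
    <str>lastRulesComp</str>
  </arr>
</requestHandler>
```

An explicit components list, as in the /lucid example, gives full control over ordering; first-components and last-components are shorter when the default Solr chain is otherwise acceptable.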
Input Parameters
The following request parameters control when rules are fired.
| Parameter | Type | Description | Default | Example |
|---|---|---|---|---|
| rules | boolean | Turn on or off the RulesComponent | false | &rules=false |
| rules.<handle name> | boolean | Turn on or off a specific RulesComponent instance using the handle name | true | &rules.first=false |
| rules.prepare | boolean | Turn rule processing on or off for the prepare phase | true | &rules.prepare=false |
| rules.process | boolean | Turn rule processing on or off for the process phase | true | &rules.process=false |
| rules.finishStage | boolean | Turn rule processing on or off for the finishStage phase | true | &rules.finishStage=false |
The system does not currently allow you to turn off individual phases of a single instance (unless it is the only instance configured). In other words, if two RulesComponent instances are configured, it is not possible to turn off the process phase of only one of them.
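These parameters are normally passed on the request URL, as in the Example column above. If the same values are wanted on every request, one option is to set them as requestHandler defaults. The sketch below assumes that the RulesComponent reads these values from the ordinary Solr request parameters, so that handler defaults apply to them; that behavior is an assumption, not something this section confirms. The handler name /myRules is hypothetical.

```xml
<!-- Hypothetical handler: "/myRules" is an illustrative name, not a shipped default -->
<requestHandler name="/myRules" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Turn rules processing on unless the request says otherwise -->
    <str name="rules">true</str>
    <!-- Switch off only the instance whose handle is "last" -->
    <str name="rules.last">false</str>
  </lst>
  <arr name="components">
    <str>firstRulesComp</str>
    <str>query</str>
    <str>facet</str>
    <str>lastRulesComp</str>
  </arr>
</requestHandler>
```

Because these are defaults, a request can still override them, for example by passing &rules.last=true.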
Facts Collected for the RulesComponent
The facts collected for the RulesComponent are:
- The ResponseBuilder object
- The SolrQueryRequest object
- The schema for the index
- The context information of the request (including the phase of processing, like “process” or “prepare”)
- The SolrQueryResponse object
- The query response NamedList
- The request parameters map as a ModifiableSolrParams instance (can be edited by rules)
- The generated query object, which is the same as the parsed query. In some cases, clauses of the query will be added to the knowledge session to allow the rules engine to evaluate any part of the query.
- The filter queries
- Response results (the DocListAndSet instance)
- The sort spec
- The grouping spec
- Facet counts
Some of the items on this list will only be available to the rules engine if the RulesComponent is placed after the searchComponent that produces the fact. For example, to make facet information available to the rules engine, the RulesComponent has to be placed after the facet component in the searchComponents chain for the requestHandler.
Landing Pages (LandingPageComponent)
The LandingPageComponent is very similar to the RulesComponent. It does not itself turn off other components in the chain, but the rules engine can generally do so. For example, to disable faceting, a rule would set facet=false; to disable the query component, a rule would set query=false. The exact parameters to set depend on the search components you have enabled. See also the section Search Components API for one approach to finding the enabled search components for the requestHandler in use.
Placing the landing page in the output is also the responsibility of the rule writer. In essence, all the LandingPageComponent does is guarantee that it is called as part of rules and fact preparation and that the rules used can be configured separately from other rules.
Configuration
The LandingPageComponent is also configured in the solrconfig.xml file for each collection. The default LucidWorks Search integration includes this section:
<searchComponent class="com.lucid.rules.LandingPageComponent" name="landingPage">
  <str name="requestHandler">/rulesMgr</str>
  <str name="engine">landing</str>
  <!-- The handle can be used to turn on or off explicit rules components in the case
       when you have multiple rules at different stages of the component ordering -->
  <str name="handle">landing</str>
</searchComponent>
As with the RulesComponent, this can then be hooked into the requestHandler using the first-components, components or last-components capability. See also the example in the section Applying Rules at Query Time (RulesComponent).
Input Parameters
Like the RulesComponent, the LandingPageComponent has several parameters. Note that the LandingPageComponent is only executed in the prepare phase of rules execution, so the parameters for the other phases will likely not be required for your implementation.
| Parameter | Type | Description | Default | Example |
|---|---|---|---|---|
| landing | boolean | Turn on or off the LandingPageComponent | false | &landing=false |
| landing.<handle name> | boolean | Turn on or off a specific LandingPageComponent instance using the handle name | true | &landing.first=false |
| landing.prepare | boolean | Turn rule processing on or off for the prepare phase | true | &landing.prepare=false |
| landing.process | boolean | Turn rule processing on or off for the process phase | true | &landing.process=false |
| landing.finishStage | boolean | Turn rule processing on or off for the finishStage phase | true | &landing.finishStage=false |
Facts Collected for the LandingPageComponent
The facts collected for the LandingPageComponent are:
- The ResponseBuilder object
- The SolrQueryRequest object
- The schema for the index
- The context information of the request (including the phase of processing, like “process” or “prepare”)
- The SolrQueryResponse object
- The query response NamedList
- The request parameters map as a ModifiableSolrParams instance (can be edited by rules)
- The generated query object, which is the same as the parsed query. In some cases, clauses of the query will be added to the knowledge session to allow the rules engine to evaluate any part of the query.
- The filter queries
- Response results (the DocListAndSet instance)
- The sort spec
- The grouping spec
- Facet counts
Some of the items on this list will only be available to the rules engine if the LandingPageComponent is placed after the searchComponent that produces the fact. For example, to have facet information available, the LandingPageComponent has to be placed after the facet component in the searchComponents chain for the requestHandler.
Altering Documents
Just as it is possible to alter queries and sort parameters, it is also possible to alter documents, both at index time and at query time.
Altering Documents at Index Time (RulesUpdateProcessor)
The RulesUpdateProcessor(Factory) is a Solr UpdateRequestProcessorFactory that can be configured to be a part of the update processor chain to alter documents as they are about to be indexed. Like any UpdateRequestProcessorFactory, it has access to the AddUpdateCommand, which can then be exposed to the RulesEngine via the prepareDocument() method.
By default, the RulesUpdateProcessor is configured in the lucid-update-chain and can be enabled or disabled with a request parameter of the form rules.<handle name>. For instance, if the processor has a handle of docProc, then &rules.docProc=false disables the processor, and processing continues down the chain. Rule processing is on by default.
Like the query-related rules processing, altering documents relies on facts during the knowledge session. The section on Drools Integration has more information about the default FactCollector, including the types of facts available to the rules engine.
Configuration
Here is the default configuration for the lucid-update-chain in the solrconfig.xml file for each collection:
<updateRequestProcessorChain name="lucid-update-chain">
  <processor class="com.lucid.update.CommitWithinUpdateProcessorFactory" />
  <processor class="com.lucid.update.FieldMappingUpdateProcessorFactory" />
  <processor class="com.lucid.rules.RulesUpdateProcessorFactory">
    <str name="requestHandler">/rulesMgr</str>
    <!-- we re-use the engine, but we could have an independent one-->
    <str name="engine">docs</str>
    <!-- Each one should have its own handle, as you can have multiple in the chain -->
    <str name="handle">docProc</str>
  </processor>
  <processor class="com.lucid.update.DistributedUpdateProcessorFactory">
    <!-- example configuration...
         "shards" should be in the *same* order for every server in a cluster.
         Only "self" should change to represent what server *this* is.
    <str name="self">localhost:8983/solr</str>
    <arr name="shards">
      <str>localhost:8983/solr</str>
      <str>localhost:7574/solr</str>
    </arr>
    -->
  </processor>
  <processor class="solr.LogUpdateProcessorFactory">
    <int name="maxNumToLog">10</int>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
To disable rules processing for the whole chain, remove or comment out the processor element whose class is com.lucid.rules.RulesUpdateProcessorFactory, including its nested parameters.
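As an illustration of the comment-out approach, the sketch below shows the same chain with the rules processor disabled. It is based on the default configuration shown above, with the example comment inside the com.lucid.update.DistributedUpdateProcessorFactory entry omitted for brevity. The inner comments from the rules processor entry are left out because XML comments cannot be nested.

```xml
<updateRequestProcessorChain name="lucid-update-chain">
  <processor class="com.lucid.update.CommitWithinUpdateProcessorFactory" />
  <processor class="com.lucid.update.FieldMappingUpdateProcessorFactory" />
  <!-- Rules processing disabled: the RulesUpdateProcessorFactory entry is commented out
  <processor class="com.lucid.rules.RulesUpdateProcessorFactory">
    <str name="requestHandler">/rulesMgr</str>
    <str name="engine">docs</str>
    <str name="handle">docProc</str>
  </processor>
  -->
  <processor class="com.lucid.update.DistributedUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory">
    <int name="maxNumToLog">10</int>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

To skip rules for individual update requests only, the rules.<handle name> request parameter described earlier avoids editing solrconfig.xml.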
Facts Collected for the RulesUpdateProcessor
The facts collected for the RulesUpdateProcessor are:
- The AddUpdateCommand as received in the UpdateRequestProcessor.processAdd(AddUpdateCommand) method
- The SolrInputDocument being added
- The schema for the index
Altering Documents at Query Time (RulesDocTransformer)
In Solr, a DocTransformer is invoked as part of response writing to allow applications to inject and/or modify the fields on a document before being returned to the client. For instance, it can be used to include a new value for a field calculated by a rule or to do things like alter a price for a specific user.
Again, altering documents relies on facts during the knowledge session. The section on Drools Integration has more information about the default FactCollector, including the types of facts available to the rules engine.
Note: Altering a field will not cause an item to be re-sorted. If, for example, you are sorting by price and you change one of the documents' prices, this will not cause a re-sort. If you want that behavior, we suggest you use Solr's Sort by Function capability.
Facts Collected for the RulesDocTransformer
The facts collected for the RulesDocTransformer are:
- The SolrInputDocument being transformed
- The docId of the document being transformed (the Lucene internal docId, not Solr’s uniqueKey)
- The schema for the index
The RulesEngineManagerHandler
The RulesEngineManagerHandler is the Solr requestHandler that holds references to the various rules engine instances specified in the Solr configuration. The manager maintains a map of engine names to engines. Most components are configured with the name of this requestHandler and then ask it for the engine they need by name. The default configuration is:
<requestHandler class="com.lucid.rules.RulesEngineManagerHandler" name="/rulesMgr">
  <!-- Engines can be shared, but they don't have to be. A SearchComponent or other
       consumer can specify the engine they want by name. -->
  <lst name="engines">
    <lst name="engine">
      <str name="name">first</str>
      <str name="class">com.lucid.rules.drools.stateless.StatelessDroolsRulesEngine</str>
      <lst name="rules">
        <str name="file">rules/defaultFirst.drl</str>
      </lst>
      <!-- The fact collector defines what facts get injected into the rules engine's working memory -->
      <!--<lst name="factCollector">
        <str name="class">com.lucid.rules.drools.FactCollector</str>
      </lst>-->
    </lst>
    <lst name="engine">
      <str name="name">landing</str>
      <str name="class">com.lucid.rules.drools.stateless.StatelessDroolsRulesEngine</str>
      <lst name="rules">
        <str name="file">rules/defaultLanding.drl</str>
      </lst>
    </lst>
    <!-- Engine is using rules that are designed to be called after all other components -->
    <lst name="engine">
      <str name="name">last</str>
      <str name="class">com.lucid.rules.drools.stateless.StatelessDroolsRulesEngine</str>
      <lst name="rules">
        <str name="file">rules/defaultLast.drl</str>
      </lst>
    </lst>
    <lst name="engine">
      <str name="name">docs</str>
      <str name="class">com.lucid.rules.drools.stateless.StatelessDroolsRulesEngine</str>
      <lst name="rules">
        <str name="file">rules/defaultDocs.drl</str>
      </lst>
    </lst>
  </lst>
</requestHandler>
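Because components look up engines by name, adding a separate set of rules should mostly be a matter of registering another engine and pointing a component at it. The fragment below is a sketch of that pattern, following the shape of the default entries. The engine name promotions, the rules file rules/promotions.drl, and the component name promotionsRulesComp are all hypothetical and do not ship with LucidWorks Search.

```xml
<!-- Add inside <lst name="engines"> of the /rulesMgr requestHandler.
     The name "promotions" and the file rules/promotions.drl are hypothetical. -->
<lst name="engine">
  <str name="name">promotions</str>
  <str name="class">com.lucid.rules.drools.stateless.StatelessDroolsRulesEngine</str>
  <lst name="rules">
    <str name="file">rules/promotions.drl</str>
  </lst>
</lst>

<!-- A searchComponent that asks /rulesMgr for that engine by name.
     The component name and handle are also hypothetical. -->
<searchComponent class="com.lucid.rules.RulesComponent" name="promotionsRulesComp">
  <str name="requestHandler">/rulesMgr</str>
  <str name="engine">promotions</str>
  <str name="handle">promotions</str>
</searchComponent>
```

The new component would still need to be added to the component list of a requestHandler before any of its rules fire, and its handle (here promotions) is what a request would use in &rules.promotions=false to switch it off.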