This is the documentation for LucidWorks Search v2.5, the latest release.

The Settings API allows for accessing and modifying settings for a given collection. Note that some of the settings listed below cannot be changed by customers with LucidWorks Search hosted on AWS or Azure.

API Entry Points

/api/collections/collection/settings: get all settings for a collection or update settings.

/api/collections/collection/settings/name: get a particular setting.
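
As a rough sketch, the two entry points can be built from a base URL and a collection name. The helper names settings_url and setting_url and the LWS_BASE variable are hypothetical; the host and port are taken from the examples on this page and may differ in your installation.

```shell
# Assumption: the REST API is reachable at this base URL, as in this page's examples.
LWS_BASE="${LWS_BASE:-http://localhost:8888}"

# Print the URL that addresses all settings of a collection.
# $1 = collection name
settings_url() {
    printf '%s/api/collections/%s/settings' "$LWS_BASE" "$1"
}

# Print the URL that addresses one named setting of a collection.
# $1 = collection name, $2 = setting name
setting_url() {
    printf '%s/%s' "$(settings_url "$1")" "$2"
}
```

With these helpers, a single setting could be fetched with curl "$(setting_url collection1 query_parser)".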

Get All Settings for a Collection

GET /api/collections/collection/settings

Input

Path Parameters

Key Description
collection The collection name.

Query Parameters

None.

Output

Output Content

Key Type Description
auto_complete boolean Is true if auto-complete is enabled for use in the LucidWorks Search default search interface. Note that this also requires setting the auto-complete activity to run at regular intervals. For more information, see Auto-Complete of User Queries.
boosts Solr function query Defines the boost to apply to each query. The default boost for the Lucid Query Parser prefers more recent documents.
boost_recent boolean Is true if the lucid request handler should boost recent documents.
click_enabled boolean Is true if Click Scoring is enabled. If enabling this feature with NearRealTime (NRT) search (the update_handler_autosoftcommit_* parameters discussed below), please refer to the Click Scoring Relevance Framework section for more information about how Click Scoring and NRT impact document updates. This feature is available in LucidWorks Search on-premise only.
click_boost_data string The path to Click Scoring boost data. This feature is available in LucidWorks Search on-premise only.
click_boost_field string The field name prefix used by Click fields. This feature is available in LucidWorks Search on-premise only.
click_index_location string The path to Click boost index (LucidWorks Search on-premise only).
de_duplication string In LucidWorks Search, duplicates can be identified by calculating a hash that identifies very similar documents.

While this setting enables de-duplication generally, specific fields should be selected as being used for de-duplication, which can be done with the Fields API or the Field Configuration screen. If no fields are selected as being the basis for determining duplicate documents, then all fields of a document are used as the basis for judging duplicate documents.

You can choose from three possible methods of handling duplicates:
  • Off does not identify duplicate documents within the index. 
  • Tag identifies duplicates with a unique tag stored in the signatureField, but does not remove duplicate documents from the index. This approach is recommended, although it does require using field collapsing or another method to remove duplicates from the search results for users.
  • Overwrite overwrites duplicate documents with incoming documents. This should only be used if you are sure that the duplicate detection is working the way you expect.

    Note that de-duplication does not work properly in SolrCloud mode.
default_sort string Default sort method - valid values are: relevance, date, random.
display_facets boolean Is true if the LucidWorks Search default search interface should display facets.
display_fields string Defines the fields to use for display of results to users. Primarily used to add pseudo-fields to documents, but could be used with "real" fields also. This parameter only applies when using the lucid handler (query parser).
elevations JSON map Defines the documents to be elevated or excluded from results for a specific query. It uses Solr's QueryElevationComponent, which is enabled by default in LucidWorks Search. This API is an interface to manage the elevate.xml file, which stores the elevation definitions that are used for queries. The elevations file is located in the conf directory for each collection ($LWE_HOME/conf/solr/cores/collection/conf).

The elevate.xml file is an XML file that defines each query and the IDs of the documents to be elevated or excluded for it. The API uses a JSON map to write to this file, with the following structure:

{"elevations":
   {"query":
      [{"doc":"docID","exclude":false}]
   }
}

It is also possible to define elevations or exclusions using the built-in Search UI, which includes a "pin" or a "minus" next to every result to allow you to add it to the elevations list as either a required document or an excluded document.

The list of documents included or excluded is not synchronized with the index, which means that if a document is listed in the elevate.xml file, and then is removed from the index, it does not get removed from the file.

main_index_lock_type string Defines which Lucene LockFactory to use. When applying changes to an index, the IndexWriter requires a lock on the directory. The options are:
  • single: using the SingleInstanceLockFactory. This is suggested for a read-only index or when there is no possibility of another process trying to modify the index.
  • native: using the NativeFSLockFactory. This uses the OS native file locking. It should not be used when there are multiple Solr web applications in the same JVM that are attempting to share a single index.
  • simple: using the SimpleFSLockFactory. This uses a plain lock file for locking.
    More information is available in the Lucene Wiki: http://wiki.apache.org/lucene-java/AvailableLockFactories.
main_index_max_buffered_docs integer Allows setting the maxBufferedDocs parameter in the solrconfig.xml file for the collection, which sets the number of document updates to buffer in memory before they are flushed to disk and added to the current index segment. It is generally preferable to use main_index_ram_buffer_size_mb, but if both settings are defined, a flush will occur when either limit is reached.
main_index_max_merge_docs integer Allows setting the maxMergeDocs parameter in the solrconfig.xml file for the collection, which sets the maximum number of documents for a single segment. Once this limit is reached, the segment is closed and a new one is created. A segment merge, as defined by main_index_merge_factor, may also occur at this time.
main_index_merge_factor integer Allows setting the mergeFactor parameter in the solrconfig.xml file for the collection, which defines how many segments the index is allowed to have before they are coalesced into one segment. When the index is updated, the new data is added to the most recently opened segment. When that segment is full, a new segment is created and subsequent updates are placed there (the main_index_max_buffered_docs and main_index_ram_buffer_size_mb settings define when a segment is full). When the main_index_merge_factor is reached, the segments are merged into a single larger segment. See the section on mergeFactor in the Solr Reference Guide for more information.
main_index_ram_buffer_size_mb integer Allows setting the ramBufferSizeMB parameter in the solrconfig.xml file for the collection, which sets the amount of memory space (in megabytes) document updates can use before they are flushed to the current index segment. This setting is generally preferable to main_index_max_buffered_docs, but if both settings are defined, a flush will occur when either limit is reached.
main_index_term_index_interval integer Allows setting the termIndexInterval for the index, which determines the amount of computation required per query term, regardless of the number of documents. This allows some level of control over the time query processing takes. Larger values cause less memory to be used by the IndexReader, but slow random access to terms. Smaller values cause more memory to be used by the IndexReader, but speed up random access to terms. A large index with user-entered queries may benefit from a larger main_index_term_index_interval because query processing is dominated by frequency and positional data processing and not by term lookup. A system that handles many wildcard queries may benefit from a smaller value for this setting.
main_index_use_compound_file boolean Allows you to set the useCompoundFile parameter in the solrconfig.xml file for the collection. Setting this to true combines the multiple index files on disk into a single file. This setting can help avoid hitting an open file limit on systems that restrict the number of open files allowed per process. See the section on useCompoundFile in the Solr Reference Guide for more information.
main_index_write_lock_timeout integer Defines the maximum time to wait for a write lock.
query_parser string Which query parser the lucid search request handler will use - valid values are: lucid, dismax, extended dismax, lucene.
query_time_stopwords boolean Is true if stopwords will be removed at query time.
query_time_synonyms boolean Is true if synonyms should be added to queries. This will only be used if the 'lucid' query parser is selected as the default or used in the query request.
search_server_list list:string A list of Solr core URLs that the lucid request handler will use for distributed search - pass an empty list to disable distributed search.
show_similar boolean Is true if a "Find Similar" link should be displayed next to user's search results.
spellcheck boolean Is true if the LucidWorks Search default search interface should suggest spelling corrections.
stopword_list list:string A list of stopwords that will be used if 'query_time_stopwords' is enabled.
synonym_list list:string A list of synonym rules that will be used if 'query_time_synonyms' is enabled.
unknown_type_handling string A valid field type from the core's schema to use for unrecognized fields - default is text_en.
unsupervised_feedback boolean Is true if unsupervised feedback is enabled.
unsupervised_feedback_emphasis string Defines whether unsupervised feedback should emphasize "relevancy", which does an "AND" of the original query and neither includes nor excludes additional documents, or "recall", which does an "OR" of the original query and permits the feedback terms to expand the set of documents matched. The default is "relevancy".
update_handler_autocommit_max_docs integer Allows setting the maxDocs parameter for autocommit definitions in the solrconfig.xml file for the collection. This setting defines the number of documents to queue before pushing them to the index. It works in conjunction with the update_handler_autocommit_max_time parameter in that if either limit is reached, the pending updates will be pushed to the index.
update_handler_autocommit_max_time integer Allows setting the maxTime parameter for autocommit definitions in the solrconfig.xml file for the collection. This setting defines the number of milliseconds to wait before pushing documents to the index. It works in conjunction with the update_handler_autocommit_max_docs parameter in that if either limit is reached, the pending updates will be pushed to the index.
update_handler_autocommit_open_searcher boolean Provides the option to not open a searcher on hard commit. This may be useful to minimize the size of transaction logs that keep track of uncommitted updates. The default is true; change this to false to not open a searcher.
update_handler_autosoftcommit_max_docs integer Allows setting the maxDocs parameter for autosoftcommit definitions in the solrconfig.xml file for the collection. "Soft" commits are used in Solr's Near RealTime search. This setting defines the number of documents to queue before pushing them to the index. It works in conjunction with the update_handler_autosoftcommit_max_time parameter in that if either limit is reached, the documents will be pushed to the index.
update_handler_autosoftcommit_max_time integer Allows setting the maxTime parameter for autosoftcommit definitions in the solrconfig.xml file for the collection. "Soft" commits are used in Solr's Near RealTime search. This setting defines the number of milliseconds to wait before pushing documents to the index. It works in conjunction with the update_handler_autosoftcommit_max_docs parameter in that if either limit is reached, the documents will be pushed to the index.
update_server_list complex A map that contains two keys: 'server_list' and 'self'. 'server_list' is list:string of servers that the lucid update chain will use for distributed updates and 'self' should either be null if this server will not receive updates, or it should be a string value containing this server address if this server will receive updates - pass an empty list of servers to disable distributed update.

Return Codes

200: OK

Examples

Get the existing settings for the collection:

Input

curl http://localhost:8888/api/collections/collection1/settings

Output

{
    "auto_complete": true,
    "boost_recent": true,
    "boosts": [
        "recip(rord(lastModified),1,1000,1000)"
    ],
    "click_boost_data": "click-data",
    "click_boost_field": "click",
    "click_enabled": false,
    "de_duplication": "off",
    "default_sort": "relevance",
    "display_facets": true,
    "display_fields": [
        "id","url","author","data_source_type","lastModified",
        "mimeType","pageCount","title"],
    "elevations": {},
    "main_index_lock_type": "native",
    "main_index_max_buffered_docs": -1,
    "main_index_max_merge_docs": 2147483647,
    "main_index_merge_factor": 10,
    "main_index_ram_buffer_size_mb": 64.0,
    "main_index_term_index_interval": 32,
    "main_index_use_compound_file": false,
    "main_index_write_lock_timeout": 1000,
    "query_parser": "lucid",
    "query_time_stopwords": true,
    "query_time_synonyms": true,
    "search_server_list": [],
    "show_similar": true,
    "spellcheck": true,
    "stopword_list": [
        "a","an","and","are","as","at","be","but","by","for","if","in","into",
        "is","it","no","not","of","on","or","s","such","t","that","the",
        "their","then","there","these","they","this","to","was","will","with"],
    "synonym_list": [
        "lawyer, attorney","one, 1","two, 2","three, 3","ten, 10",
        "hundred, 100","thousand, 1000","tv, television"],
    "unknown_type_handling": "text_en",
    "unsupervised_feedback": false,
    "unsupervised_feedback_emphasis": "relevancy",
    "update_handler_autocommit_max_docs": null,
    "update_handler_autocommit_max_time": 3600000,
    "update_handler_autocommit_open_searcher": true,
    "update_handler_autosoftcommit_max_docs": null,
    "update_handler_autosoftcommit_max_time": null,
    "update_server_list": null
}

Get a Particular Setting

GET /api/collections/collection/settings/name

Returns a map containing the requested setting and its current value.

Input

Path Parameters

Key Description
collection The collection name.
name The name of the setting to return.

Query Parameters

None.

Output

Return Codes

200: OK

Examples

Determine the default parser for the collection.

Input

curl 'http://localhost:8888/api/collections/collection1/settings/query_parser'

Output

{
   "query_parser":"lucid"
}
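
When scripting against this entry point, the value usually has to be pulled out of the returned map. The sketch below does that with sed, for string-valued settings only. The function name extract_string_setting is hypothetical, and a real JSON parser is a safer choice for anything beyond simple values.

```shell
# Print the value of a string-valued setting from a JSON map read on stdin.
# $1 = setting name
# Assumption: the key and its quoted value sit on one line and the value
# contains no escaped quotes. This is a text match, not a JSON parser.
extract_string_setting() {
    sed -n 's/.*"'"$1"'" *: *"\([^"]*\)".*/\1/p'
}
```

For example, the response of the curl command above could be piped into extract_string_setting query_parser to print only the parser name.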

Update Settings

PUT /api/collections/collection/settings

Input

Path Parameters

Key Description
collection The collection name.

Query Parameters

None.

Input Content

JSON block with values for keys to be updated.

Key Type Description
auto_complete boolean Is true if auto-complete is enabled for use in the LucidWorks Search default search interface. Note that this also requires setting the auto-complete activity to run at regular intervals. For more information, see Auto-Complete of User Queries.
boosts Solr function query Defines the boost to apply to each query. The default boost for the Lucid Query Parser prefers more recent documents.
boost_recent boolean Is true if the lucid request handler should boost recent documents.
click_enabled boolean Is true if Click is enabled (LucidWorks Search on-premise only).
click_boost_data string The path to Click boost data (LucidWorks Search on-premise only).
click_boost_field string The field name prefix used by Click fields (LucidWorks Search on-premise only).
click_index_location string The path to Click boost index (LucidWorks Search on-premise only).
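de_duplication string Controls how duplicate documents are handled. Valid values are: off, which does not de-duplicate; overwrite, which overwrites duplicate documents with incoming documents; and tag, which tags duplicates with a unique signature. Note that de-duplication does not work properly in SolrCloud mode.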
default_sort string Default sort method - valid values are: relevance, date, random.
display_facets boolean Is true if the LucidWorks Search default search interface should display facets.
display_fields string Defines the fields to use for display of results to users. Primarily used to add pseudo-fields to documents, but could be used with "real" fields also. This parameter only applies when using the lucid handler (query parser).
elevations JSON map Defines the documents to be elevated or excluded from results for a specific query. It uses Solr's QueryElevationComponent, which is enabled by default in LucidWorks. This API is an interface to manage the elevate.xml file, which stores the elevation definitions that are used for queries. The elevations file is located in the conf directory for each collection ($LWE_HOME/conf/solr/cores/collection/conf).

The elevate.xml file is an XML file that defines each query and the IDs of the documents to be elevated or excluded for it. The API uses a JSON map to write to this file, with the following structure:

{"elevations":
   {"query":
      [{"doc":"docID", "exclude":true}]
   }
}


The exclude attribute allows using the QueryElevationComponent to explicitly omit certain documents for specific queries. The default is false, so if this attribute is not defined for a document, the document will be elevated to the top of the result set for the specified query. Multiple documents elevated for a single query are elevated in the order they are listed in the elevate.xml file. Set exclude to true to exclude the document from the specified query.

It is also possible to define elevations or exclusions using the built-in Search UI, which includes a "pin" or a "minus" next to every result to allow you to add it to the elevations list as either a required document or an excluded document.

The list of documents included or excluded is not synchronized with the index, which means that if a document is listed in the elevate.xml file, and then is removed from the index, it does not get removed from the file.

main_index_lock_type string Defines which Lucene LockFactory to use. When applying changes to an index, the IndexWriter requires a lock on the directory. The options are:
  • single: using the SingleInstanceLockFactory. This is suggested for a read-only index or when there is no possibility of another process trying to modify the index.
  • native: using the NativeFSLockFactory. This uses the OS native file locking. It should not be used when there are multiple Solr web applications in the same JVM that are attempting to share a single index.
  • simple: using the SimpleFSLockFactory. This uses a plain lock file for locking.
    More information is available in the Lucene Wiki: http://wiki.apache.org/lucene-java/AvailableLockFactories.
main_index_max_buffered_docs integer Allows setting the maxBufferedDocs parameter in the solrconfig.xml file for the collection, which sets the number of document updates to buffer in memory before they are flushed to disk and added to the current index segment. It is generally preferable to use main_index_ram_buffer_size_mb, but if both settings are defined, a flush will occur when either limit is reached.
main_index_max_merge_docs integer Allows setting the maxMergeDocs parameter in the solrconfig.xml file for the collection, which sets the maximum number of documents for a single segment. Once this limit is reached, the segment is closed and a new one is created. A segment merge, as defined by main_index_merge_factor, may also occur at this time.
main_index_merge_factor integer Allows setting the mergeFactor parameter in the solrconfig.xml file for the collection, which defines how many segments the index is allowed to have before they are coalesced into one segment. When the index is updated, the new data is added to the most recently opened segment. When that segment is full, a new segment is created and subsequent updates are placed there (the main_index_max_buffered_docs and main_index_ram_buffer_size_mb settings define when a segment is full). When the main_index_merge_factor is reached, the segments are merged into a single larger segment. See the section on mergeFactor in the Solr Reference Guide for more information.
main_index_ram_buffer_size_mb integer Allows setting the ramBufferSizeMB parameter in the solrconfig.xml file for the collection, which sets the amount of memory space (in megabytes) document updates can use before they are flushed to the current index segment. This setting is generally preferable to main_index_max_buffered_docs, but if both settings are defined, a flush will occur when either limit is reached.
main_index_term_index_interval integer Allows setting the termIndexInterval for the index, which determines the amount of computation required per query term, regardless of the number of documents. This allows some level of control over the time query processing takes. Larger values cause less memory to be used by the IndexReader, but slow random access to terms. Smaller values cause more memory to be used by the IndexReader, but speed up random access to terms. A large index with user-entered queries may benefit from a larger main_index_term_index_interval because query processing is dominated by frequency and positional data processing and not by term lookup. A system that handles many wildcard queries may benefit from a smaller value for this setting.
main_index_use_compound_file boolean Allows you to set the useCompoundFile parameter in the solrconfig.xml file for the collection. Setting this to true combines the multiple index files on disk into a single file. This setting can help avoid hitting an open file limit on systems that restrict the number of open files allowed per process. See the section on useCompoundFile in the Solr Reference Guide for more information.
main_index_write_lock_timeout integer Defines the maximum time to wait for a write lock.
query_parser string Which query parser the lucid search request handler will use - valid values are: lucid, dismax, extended dismax, lucene.
query_time_stopwords boolean Is true if stopwords will be removed at query time.
query_time_synonyms boolean Is true if synonyms should be added to queries. This will only be used if the 'lucid' query parser is selected as the default or used in the query request.
search_server_list list:string A list of Solr core URLs that the lucid request handler will use for distributed search - pass an empty list to disable distributed search.
show_similar boolean Is true if a "Find Similar" link should be displayed next to user's search results.
spellcheck boolean Is true if the LucidWorks Search default search interface should suggest spelling corrections.
stopword_list list:string A list of stopwords that will be used if 'query_time_stopwords' is enabled.
synonym_list list:string A list of synonym rules that will be used if 'query_time_synonyms' is enabled.
unsupervised_feedback boolean Is true if unsupervised feedback is enabled.
unsupervised_feedback_emphasis string Defines whether unsupervised feedback should emphasize "relevancy", which does an "AND" of the original query and neither includes nor excludes additional documents, or "recall", which does an "OR" of the original query and permits the feedback terms to expand the set of documents matched. The default is "relevancy".
unknown_type_handling string A valid field type from the core's schema to use for unrecognized fields - default is text_en.
update_handler_autocommit_max_docs integer Allows setting the maxDocs parameter for autocommit definitions in the solrconfig.xml file for the collection. This setting defines the number of documents to queue before pushing them to the index. It works in conjunction with the update_handler_autocommit_max_time parameter in that if either limit is reached, the pending updates will be pushed to the index.
update_handler_autocommit_max_time integer Allows setting the maxTime parameter for autocommit definitions in the solrconfig.xml file for the collection. This setting defines the number of milliseconds to wait before pushing documents to the index. It works in conjunction with the update_handler_autocommit_max_docs parameter in that if either limit is reached, the pending updates will be pushed to the index.
update_handler_autocommit_open_searcher boolean Provides the option to not open a searcher on hard commit. This may be useful to minimize the size of transaction logs that keep track of uncommitted updates. The default is true; change this to false to not open a searcher.
update_handler_autosoftcommit_max_docs integer Allows setting the maxDocs parameter for autosoftcommit definitions in the solrconfig.xml file for the collection. "Soft" commits are used in Solr's Near RealTime searching. This setting defines the number of documents to queue before pushing them to the index. It works in conjunction with the update_handler_autosoftcommit_max_time parameter in that if either limit is reached, the documents will be pushed to the index.
update_handler_autosoftcommit_max_time integer Allows setting the maxTime parameter for autosoftcommit definitions in the solrconfig.xml file for the collection. "Soft" commits are used in Solr's Near RealTime searching. This setting defines the number of milliseconds to wait before pushing documents to the index. It works in conjunction with the update_handler_autosoftcommit_max_docs parameter in that if either limit is reached, the documents will be pushed to the index.
update_server_list complex A map that contains two keys: 'server_list' and 'self'. 'server_list' is list:string of servers that the lucid update chain will use for distributed updates and 'self' should either be null if this server will not receive updates, or it should be a string value containing this server address if this server will receive updates - pass an empty list of servers to disable distributed update.
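
To make the elevations structure from the table above concrete, here is a sketch that builds a payload for one query and one document. It assumes that "query" in that structure stands for the literal query text and "docID" for the document's unique ID. The function name elevation_payload and the sample values are hypothetical.

```shell
# Print a JSON payload that elevates or excludes one document for one query.
# $1 = query text, $2 = document ID, $3 = true to exclude, false to elevate
# Assumption: neither value contains characters that need JSON escaping.
elevation_payload() {
    printf '{"elevations":{"%s":[{"doc":"%s","exclude":%s}]}}' "$1" "$2" "$3"
}
```

The output can be passed to curl as the request body, for example with -d "$(elevation_payload ipod doc-42 false)" on a PUT to the settings entry point.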

Output

Output Content

None.

Return Codes

204: No Content
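
Because a successful update returns no body, a script has to rely on the status code. The following sketch wraps the PUT in a function that succeeds only on 204. The names put_settings and CURL are hypothetical; -s, -o, -w '%{http_code}', -X, -H, and -d are standard curl options.

```shell
# Send a settings update and succeed only if the server answers 204.
# $1 = collection name, $2 = JSON body
# Assumption: the API is at localhost:8888, as in this page's examples.
# CURL may name another command to run in place of curl (useful for dry runs).
put_settings() {
    status=$("${CURL:-curl}" -s -o /dev/null -w '%{http_code}' \
        -X PUT -H 'Content-type: application/json' \
        -d "$2" \
        "http://localhost:8888/api/collections/$1/settings")
    [ "$status" = "204" ]
}
```

A call such as put_settings collection1 '{"default_sort":"date"}' || echo "update failed" would then report any other status as a failure.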

Examples

Turn on spell-checking for the collection.

Input

curl -X PUT -H 'Content-type: application/json' \
  -d '{"spellcheck":true}' \
  http://localhost:8888/api/collections/collection1/settings

Output

None. Retrieve the setting with a GET request to confirm the change.
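
One way to confirm an update is to read the setting back through the single-setting entry point. This sketch assumes the response contains the setting name followed by its value, as in the examples on this page. The names confirm_setting and CURL are hypothetical.

```shell
# Succeed if a setting currently has the expected (unquoted) JSON value.
# $1 = collection name, $2 = setting name, $3 = expected value, e.g. true
# Assumption: the API is at localhost:8888, as in this page's examples.
# CURL may name another command to run in place of curl (useful for dry runs).
confirm_setting() {
    "${CURL:-curl}" -s "http://localhost:8888/api/collections/$1/settings/$2" |
        grep -q "\"$2\" *: *$3"
}
```

After the update above, confirm_setting collection1 spellcheck true should succeed if the change was applied.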
