The Indexing Settings screen allows you to configure a few general options to fine-tune your index. This screen also allows you to schedule, start, and stop index-related activities.
Several options are available at the top of the page. These settings can also be modified with the Settings API, and the related API attribute names are provided below.
| Parameter | Settings API Attribute name | Description |
|---|---|---|
| De-duplication | de_duplication | In LucidWorks Search, duplicates can be identified by calculating a hash that identifies very similar documents. This setting enables de-duplication generally, but the specific fields used for de-duplication should be selected on the Field Configuration screen or with the Fields API. If no fields are selected as the basis for determining duplicate documents, all fields of a document are used. Three methods of handling duplicates are available. |
| Default field type | unknown_type_handling | If fields are found in documents that do not correspond with defined LucidWorks fields, this setting assigns a default field type in order to parse text found in that field. |
| Auto-commit max docs | update_handler_autocommit_max_docs | This setting defines the number of documents to queue before pushing them to the index, also known as the maxDocs parameter for autocommit definitions in the solrconfig.xml file for the collection. It works in conjunction with the "Auto-commit max time" parameter: when either limit is reached, the pending updates are pushed to the index. |
| Auto-commit max time (ms) | update_handler_autocommit_max_time | This setting defines the number of milliseconds to wait before pushing documents to the index, also known as the maxTime parameter for autocommit definitions in the solrconfig.xml file for the collection. It works in conjunction with the "Auto-commit max docs" parameter in that if either limit is reached, the pending updates will be pushed to the index. |
| Auto-soft-commit max docs | update_handler_autosoftcommit_max_docs | This setting defines the number of documents to queue before performing a "soft commit", used with Solr's NearRealTime searching, and pushing the documents to the index. This setting is also known as the maxDocs parameter for autosoftcommit definitions in the solrconfig.xml file for the collection. It works in conjunction with the "Auto-soft-commit max time" parameter: when either limit is reached, the documents are pushed to the index. |
| Auto-soft-commit max time (ms) | update_handler_autosoftcommit_max_time | This setting defines the number of milliseconds to wait before performing a "soft commit", used with Solr's NearRealTime searching, and pushing the documents to the index. This setting is also known as the maxTime parameter for autosoftcommit definitions in the solrconfig.xml file for the collection. It works in conjunction with the "Auto-soft-commit max docs" parameter: when either limit is reached, the documents are pushed to the index. |
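
The following is a minimal sketch, not the product's own client code. It shows how a JSON body for the Settings API might be assembled from the attribute names in the table, and illustrates the "either limit" rule shared by the max docs and max time settings. The helper names `build_settings_payload` and `commit_due` are hypothetical, the numeric values are illustrative rather than recommended defaults, and the sketch deliberately stops short of sending a request because the endpoint and HTTP method are not given here.

```python
import json

# Attribute names taken from the table above.
KNOWN_ATTRIBUTES = {
    "de_duplication",
    "unknown_type_handling",
    "update_handler_autocommit_max_docs",
    "update_handler_autocommit_max_time",
    "update_handler_autosoftcommit_max_docs",
    "update_handler_autosoftcommit_max_time",
}


def build_settings_payload(**settings):
    """Return a JSON body containing only attributes listed in the table.

    Hypothetical helper: rejects misspelled attribute names before any
    request would be made.
    """
    unknown = set(settings) - KNOWN_ATTRIBUTES
    if unknown:
        raise ValueError(f"Unknown settings attribute(s): {sorted(unknown)}")
    return json.dumps(settings, sort_keys=True)


def commit_due(pending_docs, elapsed_ms, max_docs, max_time_ms):
    """Illustrate the 'either limit' rule: a commit is due as soon as the
    queued document count or the elapsed time reaches its maximum."""
    return pending_docs >= max_docs or elapsed_ms >= max_time_ms


# Illustrative values only.
payload = build_settings_payload(
    update_handler_autocommit_max_docs=1000,
    update_handler_autocommit_max_time=60000,
)
print(payload)

# The document limit is reached first, so a commit is due even though
# only 5 seconds of the 60-second window have elapsed.
print(commit_due(pending_docs=1000, elapsed_ms=5000,
                 max_docs=1000, max_time_ms=60000))
```

Validating attribute names locally is a design choice for the sketch: a misspelled key fails early with a clear message instead of depending on how the server handles unknown attributes.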
The second half of the page allows configuration and monitoring of some essential system processes, called Activities. Next to each process name, the status of the process is shown, along with when it was last run and how long the run took. If a schedule has been set for the activity, that will be shown, and the Edit button will allow setting a new schedule or editing an existing one.
You can schedule these activities:
| Activity | Description |
|---|---|
| Optimize index | Optimizes the internal Apache Lucene data structures for better performance in searching. Optimizing a large index can take a long time, so it should be done judiciously based on when indexing completes. |
| Process click logs | Processes the click.log to create a file for Click Scoring to use in relevancy ranking calculations. |
| Generate Auto-Complete index | Creates the index required for implementing automatic suggestions for user queries as they type. |
Times shown in the drop-down menu are saved in GMT but are displayed on the Indexing Settings screen in the time zone configured for each user. If different users have different time zones configured via the User Management screen, each user sees the schedule for the activity in their own local time.
These activities can also be scheduled using the Activities API.
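
As a rough illustration of the GMT storage described for scheduled times, the sketch below converts a time chosen in one user's time zone to GMT and then renders it for a second user. It uses only the Python standard library. The two fixed-offset zones and the helper names `to_gmt` and `for_display` are assumptions made for the example, and fixed offsets ignore daylight saving time.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc


def to_gmt(local_time):
    """Convert a timezone-aware time to GMT, the form in which it is saved."""
    return local_time.astimezone(GMT)


def for_display(stored_gmt, user_tz):
    """Convert a stored GMT time to a given user's configured time zone."""
    return stored_gmt.astimezone(user_tz)


# Hypothetical users with fixed UTC offsets (no daylight saving handling).
user_a_tz = timezone(timedelta(hours=-5))
user_b_tz = timezone(timedelta(hours=1))

# User A schedules an activity for 22:00 in their local time.
scheduled = datetime(2024, 1, 15, 22, 0, tzinfo=user_a_tz)
stored = to_gmt(scheduled)

print(stored.isoformat())                          # 2024-01-16T03:00:00+00:00
print(for_display(stored, user_b_tz).isoformat())  # 2024-01-16T04:00:00+01:00
```

Both users refer to the same instant; only the rendering differs, which is why the same schedule can appear under a different clock time, or even a different date, for each user.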