The Data Sources Overview page shows all configured data sources and allows you to configure new data sources of several types. For each data source it displays the name, the number of documents in the index from that source, the current status, and the start time and duration of the most recent crawl. Clicking the data source name opens the Data Source Details screen. Start Crawl initiates a crawl, and Stop Crawl ends one that is already running. Delete removes the data source and all of its indexed documents.
The following statuses may be displayed:
- Finished: the crawl ran to completion.
- Running: the crawl is in progress.
- Stopped: the crawl was aborted before finishing.
- Error: the crawl encountered a fatal error before finishing.
When working with data sources and their content, it helps to understand how content is handled during the initial crawl and during subsequent re-crawls that update the index with new, updated, or removed content. Most of the crawlers keep track of documents they have "seen" before, which speeds up later crawls because unchanged content is not processed again. However, this can produce confusing results if configuration settings change between crawls, because unchanged documents are not reprocessed with the new settings.
In some cases, you may need to remove the crawl history to get the results you want. For example, if the "Add failed docs" setting in the "Edit Mapping" screen is not enabled for the initial crawl of a repository, documents that failed during that crawl will be skipped on subsequent crawls unless they have been modified in some way. Other examples include, but are not limited to, settings that map fields from incoming documents to other fields, options that add LucidWorks Search-specific fields to documents, changes to the fields themselves, and dynamic field rules. If you change a data source configuration after its content has already been crawled and indexed, review the options in the section on Reindexing Content for possible approaches.
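To make the crawl-history behavior concrete, here is a minimal Python sketch of a hypothetical crawler that records a content hash for every document it has seen. It is not LucidWorks Search's implementation: the `Crawler` class, its `add_failed_docs` flag, and the rule that content starting with "CORRUPT" counts as a failed document are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Crawler:
    # Hypothetical model: crawl history maps document id -> content hash.
    history: dict = field(default_factory=dict)
    # Hypothetical stand-in for the "Add failed docs" setting.
    add_failed_docs: bool = False

    def crawl(self, docs):
        """Return the ids of documents sent to the index during this crawl."""
        processed = []
        for doc_id, content in docs.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self.history.get(doc_id) == digest:
                # Seen before and unchanged: skipped, so new settings never reach it.
                continue
            # Assumption for illustration: content starting with "CORRUPT" fails parsing.
            failed = content.startswith("CORRUPT")
            # Mark the document as seen, whether or not it ends up indexed.
            self.history[doc_id] = digest
            if failed and not self.add_failed_docs:
                continue
            processed.append(doc_id)
        return processed

    def clear_history(self):
        """Forget every seen document so the next crawl processes all content."""
        self.history.clear()


docs = {"a.pdf": "hello", "b.pdf": "CORRUPT bytes"}
crawler = Crawler()
print(crawler.crawl(docs))  # ['a.pdf']
crawler.add_failed_docs = True
print(crawler.crawl(docs))  # []
crawler.clear_history()
print(crawler.crawl(docs))  # ['a.pdf', 'b.pdf']
```

In this model, enabling `add_failed_docs` after the first crawl has no effect on the previously failed document, because its unchanged content hash is already in the history; only clearing the history (or modifying the document) makes the crawler process it again.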
Information about the settings for each data source is available in the following data source-specific Help pages: