This is the documentation for LucidWorks Platform v2.0. The latest release is v2.1.

Errors during crawling are recorded in the core.<date>.log file, which is located in the $LWE_HOME/data/logs directory. Serious exceptions are reported to the LucidWorksLogs collection, which you can search like any other collection. You can also view log events on the Server Log page (Status -> Server Log).
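
As a starting point for inspecting these files, the following is a minimal Python sketch that lists the core logs for an installation. The function name find_core_logs is hypothetical, and the sketch assumes that the <date> part of the file name contains no path separators, so that a glob on core.*.log matches every dated log.

```python
import os
from pathlib import Path


def find_core_logs(lwe_home=None):
    """Return the core.<date>.log files under <lwe_home>/data/logs, sorted by name."""
    # Fall back to the LWE_HOME environment variable when no path is given.
    root = Path(lwe_home or os.environ["LWE_HOME"])
    log_dir = root / "data" / "logs"
    # Assumption: a simple glob on "core.*.log" matches every dated core log.
    return sorted(log_dir.glob("core.*.log"))
```

Calling find_core_logs() with no argument reads the location from the LWE_HOME environment variable; passing a path explicitly is useful when inspecting logs copied from another machine.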

Documents may be skipped because no extractor is available for the file type, or because the file size exceeds the maximum set during crawl configuration. Skipped documents are not recorded in the LucidWorksLogs collection. They appear in the log file in a format like this:

INFO filesystem.FileSystemCrawler - File <file-URL> exceeds the maximum size specified for this data source. Skipping.

WARN No extractor for <file format>; Skipping: <document-URI>
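
Because skipped documents only appear in the log file, it can help to collect them with a short script. The sketch below is one possible approach, not part of the product: the patterns are built from the two sample messages above, the names find_skipped, TOO_LARGE, and NO_EXTRACTOR are hypothetical, and real log lines may carry a timestamp or other prefix, so the patterns are searched for rather than anchored to the start of the line.

```python
import re

# Patterns derived from the two sample messages; not an official log grammar.
TOO_LARGE = re.compile(
    r"File (?P<uri>.+?) exceeds the maximum size specified for this data source"
)
NO_EXTRACTOR = re.compile(r"No extractor for (?P<fmt>.+?); Skipping: (?P<uri>.+)")


def find_skipped(lines):
    """Return (reason, document URI) pairs for skipped-document messages."""
    skipped = []
    for line in lines:
        match = TOO_LARGE.search(line)
        if match:
            skipped.append(("too_large", match.group("uri").strip()))
            continue
        match = NO_EXTRACTOR.search(line)
        if match:
            skipped.append(("no_extractor", match.group("uri").strip()))
    return skipped
```

The function accepts any iterable of lines, so an open log file can be passed to it directly.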

Possible Errors

For each of the errors below, the exact cause cannot be determined from the message alone. This information is provided to help you find the errors in the log file; precise troubleshooting requires information about the documents and the system environment. If a document causes an error (other than being too large or the system running out of memory), it may help to isolate the document and crawl it again, to confirm that the document itself is causing the problem rather than some other system error that occurred at the same time.

Each of the errors below lists the document URI. For files this is the path and filename; for websites it is the URL; for other data sources it is whatever you assigned as the document URI when the data source was configured.
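
Several of the error types listed in the following sections include a line of the form "Doc failed: <document-URI> - cause: <message>". To build a list of documents to isolate and retry, those lines can be grouped by URI. The following is a hedged sketch: the names DOC_FAILED and failed_documents are hypothetical, and the pattern assumes that the URI itself does not contain the text " - cause: ".

```python
import re

# Matches only the "Doc failed: <document-URI> - cause: <message>" form.
# The stack-trace form of "Doc failed:" has no " - cause: " part and is ignored.
DOC_FAILED = re.compile(r"Doc failed: (?P<uri>.+?) - cause: (?P<cause>.*)")


def failed_documents(lines):
    """Map each failed document URI to the list of causes logged for it."""
    failures = {}
    for line in lines:
        match = DOC_FAILED.search(line)
        if match:
            cause = match.group("cause").strip()
            failures.setdefault(match.group("uri"), []).append(cause)
    return failures
```

A URI that shows up with the same cause across several crawls is a reasonable candidate for being retried on its own.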

Exception

WARN Exception while crawling: <document-URI> <exception-with-stack-trace>
WARN Doc failed: <exception-with-stack-trace>
WARN Doc failed: <document-URI> - cause: <exception-cause-message>

PDF files are notorious for causing exceptions during processing, primarily in file system crawls.

Out of memory

WARN File caused an Out of Memory Exception, skipping: <document-URI> <exception-with-stack-trace>
WARN Doc failed: <exception-with-stack-trace>
WARN Doc failed: <document-URI> - cause: <OOM-exception-message>

SubCrawlerException

WARN Doc failed: <exception-with-stack-trace>
WARN Doc failed: <document-URI> - cause: <exception-message>

Unknown file type

WARN Doc failed: Could not find extractor: <document-URI>

This warning appears in the logs but is not reported in the LucidWorksLogs collection.

I/O error

WARN IO Exception processing: <document-URI> <exception-with-stack-trace>
WARN Doc failed: <exception-with-stack-trace>
WARN Doc failed: <document-URI> - cause: <exception-message>

HTML/XML/XHTML parsing errors

WARN Doc failed: <exception-with-stack-trace>
WARN Doc failed: <document-URI> - cause: <exception-cause-message>

This is another case where a warning will be seen in the logs but will not be reported in the LucidWorksLogs collection.
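
To get a rough picture of which error types dominate a crawl, the distinctive first line of each message group can be counted. The sketch below uses marker text copied from the sample messages in this section; the names MARKERS and count_error_types are hypothetical. SubCrawlerException and HTML/XML/XHTML parsing errors only produce the generic "Doc failed:" lines shown above, so this sketch does not attempt to tell them apart.

```python
from collections import Counter

# Marker text taken from the sample messages, paired with the heading
# each one appears under in this section.
MARKERS = [
    ("File caused an Out of Memory Exception", "Out of memory"),
    ("Exception while crawling:", "Exception"),
    ("Doc failed: Could not find extractor:", "Unknown file type"),
    ("IO Exception processing:", "I/O error"),
]


def count_error_types(lines):
    """Count log lines per error type, based on the first marker found in each line."""
    counts = Counter()
    for line in lines:
        for marker, label in MARKERS:
            if marker in line:
                counts[label] += 1
                break
    return counts
```

Lines that match none of the markers, including the generic "Doc failed:" lines, are left uncounted rather than guessed at.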
