This is the documentation for LucidWorks Platform v2.0. The latest release is v2.1.

When LucidWorks Platform crawls a data source, it extracts the target data and stores it in fields in the index. The specific mapping from the source data to the indexed fields is determined by the crawler you are using, which is in turn determined by the data source type. For a list of file types supported by LWE, see Supported Filetypes. Consider two common cases, both processed by the Aperture crawler: a website and a Microsoft Word document.

For the website, consider a case where you have crawled http://lucidimagination.com with a crawl depth of zero, which means that only the first page is indexed. The Aperture crawler maps the web page as follows (note that this example is not complete or exhaustive):

Data Source Field   Mapping    Field Content
url                 url        http://lucidimagination.com
content-type        mimeType   text/html
title               title      The Company for Apache Lucene Solr Open Source Search | Lucid Imagination
body                body       The Future Of Search

And so on.

For the Microsoft Word document, consider a short example document, which the Aperture crawler maps as follows:

Data Source Field   Mapping    Field Content
mimetype            mimeType   application/vnd.openxmlformats-officedocument.wordprocessingml
title               title      Example Word Doc
author              author     Drew Wheeler
body                body       This Is The Heading This is some text. It is very interesting.
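
The two examples above amount to a lookup from source field names to index field names. The following Python sketch illustrates that idea only; it is not the Aperture crawler's code or any LucidWorks API. The names FIELD_MAPPING and map_fields are hypothetical, and dropping unmapped source fields is an assumption made to keep the example short.

```python
# Hypothetical mapping table: source field name -> index field name.
# The pairs come from the examples above; the structure itself is an
# illustration, not the crawler's real configuration format.
FIELD_MAPPING = {
    "url": "url",
    "content-type": "mimeType",
    "mimetype": "mimeType",
    "title": "title",
    "author": "author",
    "body": "body",
}


def map_fields(source_doc, mapping=FIELD_MAPPING):
    """Return a new dict keyed by index field names.

    Source field names are matched case-insensitively. Source fields
    with no entry in the mapping are dropped (an assumption of this
    sketch, not documented crawler behavior).
    """
    indexed = {}
    for name, value in source_doc.items():
        target = mapping.get(name.lower())
        if target is not None:
            indexed[target] = value
    return indexed


# Source fields of the example Word document shown above.
word_doc = {
    "mimetype": "application/vnd.openxmlformats-officedocument.wordprocessingml",
    "title": "Example Word Doc",
    "author": "Drew Wheeler",
    "body": "This Is The Heading This is some text. It is very interesting.",
}

indexed_doc = map_fields(word_doc)
print(sorted(indexed_doc))
```

Note that two different source names (content-type for the web page, mimetype for the Word document) end up in the same index field, mimeType, which is why the mapping is expressed per source field rather than per index field.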

For information on which crawlers handle which data source types, see the Data Source REST API documentation. If you use the Admin UI, you do not need to know the crawler type. To change how documents are mapped to fields, use the field mapping screens in the Admin UI or the Data Sources API. For more information on fields in LucidWorks Enterprise, see the Table of Fields in the section Customizing the Schema.
