This is the documentation for LucidWorks Platform v2.0; the latest release is v2.1.


Hyphenated terms, such as plug-in or CD-ROM, are indexed without their hyphens, both as a sequence of sub-words and as a single combined term formed by concatenating the sub-words. The combined term is stored at the position of the final sub-word. Document authors are not always consistent about using hyphens, so the goal of the Lucid query parser is to match either form in a document regardless of which form appears in the query. To come as close to this as possible, the Lucid query parser expands any hyphenated query term into a Boolean OR of two alternatives: the sub-words as a phrase, and the combined term.
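The indexing behavior described above can be illustrated with a minimal Python sketch. It is not the actual LucidWorks analysis chain: the function name {{index_terms}}, the whitespace tokenization, and the lowercasing are assumptions made for illustration only.

```python
import re

def index_terms(text):
    """Return (term, position) pairs for a whitespace-separated text.

    Hypothetical model: a hyphenated token yields each sub-word at its own
    position, plus the concatenated form at the position of the last sub-word.
    """
    postings = []
    position = 0
    for token in text.lower().split():
        # Strip leading/trailing punctuation but keep internal hyphens.
        token = re.sub(r"^\W+|\W+$", "", token)
        if not token:
            continue
        sub_words = [w for w in token.split("-") if w]
        for word in sub_words:
            postings.append((word, position))
            position += 1
        if len(sub_words) > 1:
            # The combined term shares the position of the final sub-word.
            postings.append(("".join(sub_words), position - 1))
    return postings

print(index_terms("This is the plug-in."))
```

In this model, plug and in occupy consecutive positions, and plugin shares the position of in.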

h3. Simple Hyphenated Terms

A query of plug-in will automatically be interpreted as ("plug in" OR plugin). If we have these mini-documents:
* Doc #1: This is a plugin.
* Doc #2: This is the plug-in.
* Doc #3: Where is my plug in?

The query will match all three documents.

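A query of plugin (without the hyphen) matches only the first two documents: an unhyphenated query term is not expanded, and Doc #3 contains only the separate words plug and in. This is a limitation of the heuristic. The results are still better than they would be without this feature, even if they are not ideal.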
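The matching behavior for the three mini-documents can be reproduced with a small Python sketch. It is a simplified model rather than the Lucid query parser itself: the helper names {{index_positions}}, {{expand}}, and {{matches}} are hypothetical, and the model assumes that a hyphenated document term is indexed as its sub-words plus the combined term at the final sub-word's position.

```python
import re

def index_positions(text):
    """Map each token position to the set of terms indexed there."""
    positions = []
    for token in re.findall(r"\w+(?:-\w+)*", text.lower()):
        sub_words = token.split("-")
        positions.extend({w} for w in sub_words)
        if len(sub_words) > 1:
            # Combined term is stored at the final sub-word's position.
            positions[-1].add("".join(sub_words))
    return positions

def expand(term):
    """Return the query alternatives for a term, each as a list of words."""
    sub_words = term.lower().split("-")
    if len(sub_words) == 1:
        return [sub_words]  # unhyphenated terms are not expanded
    return [sub_words, ["".join(sub_words)]]  # phrase OR combined term

def matches(term, text):
    """True if any alternative occurs at consecutive positions in the text."""
    positions = index_positions(text)
    for words in expand(term):
        for start in range(len(positions) - len(words) + 1):
            if all(w in positions[start + i] for i, w in enumerate(words)):
                return True
    return False

docs = ["This is a plugin.", "This is the plug-in.", "Where is my plug in?"]
print([matches("plug-in", d) for d in docs])
print([matches("plugin", d) for d in docs])
```

Under these assumptions, plug-in matches all three documents, while plugin misses Doc #3.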

h3. Hyphenated Terms within Quoted Phrases

Quoted phrases may contain any number of hyphenated terms. In that case, the Lucene "span query" feature is used both for the entire phrase and for the individual hyphenated terms, which are expanded as described above.

A query of:
* "buy a cd-rom with plug-in software"

would match any of the following mini-documents:
* Doc #1: I want to buy a cdrom with plugin software
* Doc #2: I want to buy a cdrom with plug-in software
* Doc #3: I want to buy a cd-rom with plugin software
* Doc #4: I want to buy a cd-rom with plug-in software

In terms of the new proximity operators, this query is equivalent to:
* buy a before:0 cd-rom before:0 with before:0 plug-in software

which is equivalent to:

* buy a before:0 ("cd rom" or cdrom) before:0 with before:0 ("plug in" or plugin) before:0 software
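The way such a phrase matches all four mini-documents can be sketched in Python as a strict-adjacency matcher in which each hyphenated element may match either alternative. This only models the idea and does not use the Lucene span query API. The helper names are hypothetical, and the index model (sub-words plus the combined term at the final sub-word's position) is an assumption taken from the description at the top of this page.

```python
import re

def index_positions(text):
    """Map each token position to the set of terms indexed there."""
    positions = []
    for token in re.findall(r"\w+(?:-\w+)*", text.lower()):
        sub_words = token.split("-")
        positions.extend({w} for w in sub_words)
        if len(sub_words) > 1:
            positions[-1].add("".join(sub_words))
    return positions

def alternatives(term):
    """Sub-words as a phrase, or the combined term, for hyphenated terms."""
    sub_words = term.lower().split("-")
    if len(sub_words) == 1:
        return [sub_words]
    return [sub_words, ["".join(sub_words)]]

def match_from(elements, positions, start):
    """True if the elements match consecutive positions beginning at start."""
    if not elements:
        return True
    for words in alternatives(elements[0]):
        end = start + len(words)
        if end <= len(positions) and all(
            w in positions[start + i] for i, w in enumerate(words)
        ):
            # The next element must begin immediately after this one.
            if match_from(elements[1:], positions, end):
                return True
    return False

def phrase_matches(phrase, text):
    elements = phrase.split()
    positions = index_positions(text)
    return any(match_from(elements, positions, s) for s in range(len(positions)))

docs = [
    "I want to buy a cdrom with plugin software",
    "I want to buy a cdrom with plug-in software",
    "I want to buy a cd-rom with plugin software",
    "I want to buy a cd-rom with plug-in software",
]
print([phrase_matches("buy a cd-rom with plug-in software", d) for d in docs])
```

Each element consumes one position for the combined term or several positions for the sub-word phrase, which is why both spellings satisfy the adjacency constraint.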

h3. Multiple Hyphens in Terms

Some hyphenated terms have more than two sub-words. For example:
* on-the-run and never-to-be-forgotten

will be interpreted as:
* ("on the run" OR ontherun) and ("never to be forgotten" OR nevertobeforgotten)

Multiple hyphens also occur in various special formats, such as phone numbers. For example:
* 646-414-1593 1-800-555-1212

will be interpreted as:
* ("646 414 1593" OR 6464141593) AND ("1 800 555 1212" OR 18005551212)

Social Security numbers and ISBNs also have multiple hyphens. For example:
* 101-23-1234 and 978-3-16-148410-0

will be interpreted as:
* ("101 23 1234" OR 101231234) and ("978 3 16 148410 0" OR 9783161484100)

Part numbers and various ID formats also tend to contain more than one hyphen. These would be treated similarly to the examples above.
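The textual expansion shown in these examples can be approximated with a short Python sketch. The function names {{expand_term}} and {{expand_query}} are hypothetical and not part of the Lucid query parser. The sketch rewrites each hyphenated term in place and leaves every other token untouched, so it does not render the implicit AND that the parser applies between adjacent terms.

```python
import re

# A term counts as hyphenated if it is two or more word runs joined by hyphens.
HYPHENATED = re.compile(r"^\w+(?:-\w+)+$")

def expand_term(term):
    """Rewrite a hyphenated term as ("sub words" OR subwords)."""
    if not HYPHENATED.match(term):
        return term
    sub_words = term.split("-")
    return '("{}" OR {})'.format(" ".join(sub_words), "".join(sub_words))

def expand_query(query):
    """Expand every hyphenated term in a whitespace-separated query."""
    return " ".join(expand_term(t) for t in query.split())

print(expand_query("on-the-run and never-to-be-forgotten"))
print(expand_query("101-23-1234 and 978-3-16-148410-0"))
```

A part number such as the made-up AB-12-XY would be rewritten the same way, as ("AB 12 XY" OR AB12XY).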
