A proximity query matches documents based on whether terms occur near each other or in a specified order, rather than simply on whether the terms occur in the document at all.
Exact phrase matching is a powerful query tool, but relevant documents frequently do not use exactly the same phrasing as the query. They commonly contain extra terms, or the terms may appear in a different order. In other cases, the phrase terms may be relatively near each other but separated by quite a few extra words. For example, the following two queries may return different results even though they are semantically equivalent:
"team development"
"development of teams"
The difference between the two is an extra word in the middle and a reversal of the two key terms.
We can write a single phrase proximity query that will match both phrases:
"team development"~3
The tilde ("~") placed after a quoted phrase indicates a phrase proximity search. It is followed by an integer (whole number) that gives the maximum editing distance at which a phrase in a document still matches the query phrase. The editing distance treats each term as a single unit and counts how many positions the terms must be moved to transform one phrase into the other. In this case, it takes one move to shift "team" to the position of "of", a second move to shift it to the position of "development", and a third move to shift it before "development".
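As an illustration only, the editing distance can be modeled similarly to how Lucene measures phrase slop: query term i is expected at offset i, and a candidate match that places term i at document position p_i has distance max(p_i - i) - min(p_i - i). The Python sketch below is not the Lucid implementation; it assumes a hypothetical `tokenize` analyzer (lowercase, punctuation dropped) and that stemming has already reduced "teams" to "team", so it uses "development of team" as the document text.

```python
import itertools
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Hypothetical analyzer: lowercase, keep runs of letters, digits, and hyphens."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def phrase_distance(phrase: str, text: str) -> Optional[int]:
    """Smallest editing distance (slop) at which `text` matches `phrase`.

    Returns None if some phrase term never occurs in the text.
    """
    terms = tokenize(phrase)
    if not terms:
        return None
    tokens = tokenize(text)
    # Every document position at which each query term occurs.
    candidates = [[p for p, tok in enumerate(tokens) if tok == t] for t in terms]
    best: Optional[int] = None
    # Brute force over all position assignments; fine for short examples.
    for combo in itertools.product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # two query terms cannot occupy the same document position
        shifted = [p - i for i, p in enumerate(combo)]
        distance = max(shifted) - min(shifted)
        if best is None or distance < best:
            best = distance
    return best


def phrase_matches(phrase: str, slop: int, text: str) -> bool:
    """True if the query "phrase"~slop would match `text` under this model."""
    distance = phrase_distance(phrase, text)
    return distance is not None and distance <= slop


print(phrase_distance("team development", "development of team"))  # 3
print(phrase_distance("John Doe", "John Q. Doe"))                  # 1
print(phrase_distance("John Doe", "Doe, John"))                    # 2
```

Under this model, swapping two adjacent terms costs 2, and each extra word inserted between two terms costs 1.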
To query for two terms that are within 50 words of each other:
"x y"~50
To query a person's name and allow for an optional middle initial:
"John Doe"~1 matches "John Doe" and "John Q. Doe"
To query a person's name and allow either first-name-first or last-name-first order:
"John Doe"~2 matches "John Doe" and "Doe, John", as well as "John Q. Doe"
The Lucid query parser also supports advanced proximity operators that specify more elaborate sequences of terms and control both the order of the terms and how many intervening terms are permitted. The advanced proximity operator keywords are:
| Advanced Operator | Sample Query | Matches |
|---|---|---|
| NEAR | x near y | Documents containing "x" within 15 terms of "y", either before or after |
| BEFORE | x before y | Documents containing the term "x" no more than 15 terms before the term "y" |
| AFTER | x after y | Documents containing the term "x" no more than 15 terms after the term "y" |
These operator keywords are case-insensitive and may be written in upper, lower, or mixed case. However, if the opUp configuration setting is "true", these keywords (and all other operator keywords) are recognized as operators only when written entirely in upper case; otherwise they are treated as normal terms.
By default, up to 15 terms may intervene between the two operands of a proximity operator. This default distance is controlled by the nearSlop configuration setting. If you need more or fewer intervening terms for a specific proximity operator, specify the limit by writing a colon (":") and the number immediately after the operator name. For example,
x before:3 y
matches documents containing "x" no more than three terms before "y".
A distance of 0 (zero) means no intervening terms. For example,
x before:0 y
is the same as the quoted phrase:
"x y"
which matches documents where the terms are adjacent and in that order.
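For two single terms, these distance rules can be sketched in Python by comparing term positions. This is an illustrative model under stated assumptions (input already tokenized into lowercase terms; the default of 15 mirrors the nearSlop default described above); the function names are hypothetical, not Lucid APIs.

```python
from typing import List


def positions(tokens: List[str], term: str) -> List[int]:
    """All positions at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def before(tokens: List[str], x: str, y: str, n: int = 15) -> bool:
    """x before:n y: some "x" precedes some "y" with at most n terms between."""
    return any(0 <= py - px - 1 <= n
               for px in positions(tokens, x)
               for py in positions(tokens, y))


def after(tokens: List[str], x: str, y: str, n: int = 15) -> bool:
    """x after:n y: some "x" follows some "y" with at most n terms between."""
    return before(tokens, y, x, n)


def near(tokens: List[str], x: str, y: str, n: int = 15) -> bool:
    """x near:n y: "x" and "y" in either order, at most n terms between."""
    return before(tokens, x, y, n) or before(tokens, y, x, n)


doc = "x a b c y".split()
print(before(doc, "x", "y", 3))            # True: three intervening terms
print(before(doc, "x", "y", 2))            # False
print(after(doc, "y", "x", 3))             # True
print(before("x y".split(), "x", "y", 0))  # True: adjacent, like "x y"
```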
The advanced proximity operators can be composed (or "daisy-chained") to match more complex term sequences. For example:
x before y before z
matches documents containing "x" followed by "y" with no more than 15 intervening terms, followed in turn by "z" with no more than 15 intervening terms after "y".
The distance limit can be controlled for each proximity operator, such as:
x before:10 y before:100 z
which requires that there be no more than 10 terms between "x" and "y", but "z" can be up to 100 terms after "y".
Any combination of any number of NEAR, BEFORE, and AFTER proximity operators can be composed into a sequence, such as
cat near dog before:50 fox after fish near:3 bat before zebra
When multiple advanced proximity operators are composed, they are evaluated left to right, unless parentheses are used to specify the evaluation order explicitly. So, the earlier example x before y before z is evaluated as:
(x before y) before z
In fact, the evaluation order does not matter in that example, which could also be written as:
x before (y before z)
But evaluation order does matter with:
x near:3 (y before:50 z)
where the intent is that "x" may occur within three terms before the start or after the end of the "y"/"z" sequence. In contrast,
x near:3 y before:50 z
would evaluate as:
(x near:3 y) before:50 z
which requires "x" to be close to "y", but not necessarily close to "z".
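One way to see why evaluation order matters is to model each proximity operand as a set of spans, in the style of Lucene span queries. In this Python sketch (an assumption about the semantics, not the Lucid code; all function names are hypothetical), a span is a (start, end) pair with end exclusive, and intervening terms are counted between the end of the left span and the start of the right span.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive


def term(tokens: List[str], t: str) -> Set[Span]:
    """One single-position span per occurrence of `t`."""
    return {(i, i + 1) for i, tok in enumerate(tokens) if tok == t}


def before(a: Set[Span], b: Set[Span], n: int = 15) -> Set[Span]:
    """a before:n b: a span from a, then at most n terms, then a span from b."""
    return {(s1, e2) for s1, e1 in a for s2, e2 in b if 0 <= s2 - e1 <= n}


def after(a: Set[Span], b: Set[Span], n: int = 15) -> Set[Span]:
    """a after:n b is the same as b before:n a."""
    return before(b, a, n)


def near(a: Set[Span], b: Set[Span], n: int = 15) -> Set[Span]:
    """Either order, at most n intervening terms."""
    return before(a, b, n) | before(b, a, n)


doc = "y a b c d e f z x".split()
x, y, z = term(doc, "x"), term(doc, "y"), term(doc, "z")

grouped = near(x, before(y, z, 50), 3)        # x near:3 (y before:50 z)
left_to_right = before(near(x, y, 3), z, 50)  # (x near:3 y) before:50 z
print(sorted(grouped))        # [(0, 9)]: "x" follows the y..z sequence
print(sorted(left_to_right))  # []: "x" is not within 3 terms of "y"
```

For a plain chain such as x before y before z, both groupings yield the same spans in this model, which matches the observation that evaluation order does not matter there.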
Within parentheses used for operands of proximity operators, only the OR and proximity operators can be used. Other operators will be treated as if they were the OR operator.
Quoted phrases with any number of terms can be used as the operands of the proximity operators. For example,
"First step" before:200 "last step"
The terms in the quoted phrase must occur in order, with no intervening terms between the quoted terms.
Quoted phrases may specify a maximum number of terms that may appear between the terms of the phrase, using the usual quoted phrase proximity query notation of a tilde ("~") and the number of terms permitted. For example,
"proposal development"~3 near:50 project
would match the terms "proposal" and "development" (in that order) with no more than three intervening terms, occurring no more than 50 terms before or after "project".
Unlike normal quoted proximity phrases, quoted phrases used as operands of proximity operators must have their terms in order. So, this example will not match "development proposal...project".
When several different terms are permitted at a position in a proximity sequence, the alternative terms can be specified using the OR operator and parentheses for either or both operands of the proximity operator. For example,
(cd-rom or dvd) before:1 drive
would match documents with the term "drive" preceded by either "cd-rom" or "dvd" with at most one intervening term. Alternatives can also be used with composed proximity operators. For example,
(cd-rom or dvd) before:1 ((built-in or external) before:0 drive)
which requires "built-in" or "external" to immediately precede "drive", but an intervening term is permitted after "cd-rom" or "dvd".
Alternatives can also be quoted phrases. For example,
("In the beginning" or "At the start" or "Starting out") before:1000 "the end"
will match documents containing the phrase "the end" preceded by one of the phrases "In the beginning", "At the start", or "Starting out", with up to 1,000 intervening terms.
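Quoted-phrase operands and OR alternatives fit the same span model. In the following Python sketch (an assumption, not the Lucid implementation; all names are hypothetical), an operand phrase matches only when its terms occur in order with at most `slop` intervening terms in total, and OR simply takes the union of the alternatives' spans.

```python
import itertools
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive


def phrase(tokens: List[str], text: str, slop: int = 0) -> Set[Span]:
    """Spans where the words of `text` occur in order with at most `slop`
    intervening terms in total; slop=0 means an exact phrase."""
    words = text.lower().split()
    if not words:
        return set()
    candidates = [[i for i, tok in enumerate(tokens) if tok == w] for w in words]
    spans = set()
    for combo in itertools.product(*candidates):
        in_order = all(p < q for p, q in zip(combo, combo[1:]))
        gaps = combo[-1] - combo[0] + 1 - len(combo)
        if in_order and gaps <= slop:
            spans.add((combo[0], combo[-1] + 1))
    return spans


def before(a: Set[Span], b: Set[Span], n: int = 15) -> Set[Span]:
    """a before:n b, counting terms between the end of a and the start of b."""
    return {(s1, e2) for s1, e1 in a for s2, e2 in b if 0 <= s2 - e1 <= n}


def near(a: Set[Span], b: Set[Span], n: int = 15) -> Set[Span]:
    """Either order, at most n intervening terms."""
    return before(a, b, n) | before(b, a, n)


def any_of(*alternatives: Set[Span]) -> Set[Span]:
    """OR of alternatives: the union of their spans."""
    return set().union(*alternatives)


# "proposal development"~3 near:50 project
doc = "a proposal for new development of the project".split()
print(bool(near(phrase(doc, "proposal development", 3), phrase(doc, "project"), 50)))  # True

doc = "development proposal for the project".split()
print(bool(near(phrase(doc, "proposal development", 3), phrase(doc, "project"), 50)))  # False

# (cd-rom or dvd) before:1 drive
doc = "an external dvd rw drive".split()
print(bool(before(any_of(phrase(doc, "cd-rom"), phrase(doc, "dvd")), phrase(doc, "drive"), 1)))  # True
```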
A phrase that is not enclosed within quotes is known as a term list and may be used as either of the operands of a proximity operator, where it will be treated as if it were a quoted phrase. For example,
pets before animal judgments before book
will match the same documents as:
pets before "animal judgments" before book
Although term lists with proximity operators may seem like a mere convenience to avoid typing the quotes around a phrase, the construct is much more powerful. Each of the terms in a proximity term list can be one of:
- A single term (but not a wildcard or fuzzy term)
- A quoted phrase
- Parentheses enclosing:
  - One or more proximity operators (evaluated left to right)
  - Another term list
  - A list of term alternatives separated by OR operators. Each alternative can be a full proximity expression, including nested parentheses.
For example,
red (light or sign) picture near street
would be equivalent to:
("red light picture" or "red sign picture") near street
which could also be written using nested term lists:
(red light picture or red sign picture) near street
which is also equivalent to:
((red light picture) or (red sign picture)) near street
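The equivalence between a term list with alternatives and an OR of quoted phrases is essentially a cartesian product. This Python sketch uses hypothetical helper names and handles only single-term alternatives (the parser also allows full proximity expressions as alternatives); it expands a term list into the equivalent query text.

```python
import itertools
from typing import List, Union

TermListItem = Union[str, List[str]]  # a single term, or a group of OR alternatives


def expand_term_list(items: List[TermListItem]) -> List[str]:
    """All phrases obtained by picking one alternative at each position."""
    choices = [[item] if isinstance(item, str) else item for item in items]
    return [" ".join(words) for words in itertools.product(*choices)]


def as_or_of_phrases(items: List[TermListItem]) -> str:
    """Render the expansion as a parenthesized OR of quoted phrases."""
    phrases = expand_term_list(items)
    return "(" + " or ".join(f'"{p}"' for p in phrases) + ")"


# red (light or sign) picture near street
print(as_or_of_phrases(["red", ["light", "sign"], "picture"]) + " near street")
# ("red light picture" or "red sign picture") near street
```

The number of expanded phrases is the product of the group sizes, which is why writing the alternatives inline is more compact than listing every phrase.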
Although field names can be used for terms within a proximity expression, only the first field name is used and the others are ignored, because an entire proximity expression is evaluated within a single field. For example,
title:x after (body:y near author:z)
is evaluated as:
title: x after (y near z)
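The field-name rule can be illustrated with a small text transformation. This Python sketch is a hypothetical model of the behavior described above, not the parser's code: it treats an identifier followed by a colon as a field prefix unless the colon is followed by a digit (which would be an operator distance such as before:3), keeps the first field name, and drops the rest.

```python
import re
from typing import Optional, Tuple

# Identifier + ":" not followed by (optional spaces and) a digit, plus trailing spaces.
FIELD_PREFIX = re.compile(r"\b([A-Za-z_]\w*):(?!\s*\d)\s*")


def scope_to_first_field(expr: str) -> Tuple[Optional[str], str]:
    """Return (first field name or None, expression with all field prefixes removed)."""
    fields = FIELD_PREFIX.findall(expr)
    stripped = FIELD_PREFIX.sub("", expr)
    return (fields[0] if fields else None), stripped


field, body = scope_to_first_field("title:x after (body:y near author:z)")
print(f"{field}: {body}")  # title: x after (y near z)
```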
A proximity query with no field, or with the DEFAULT field, is run against all of the fields listed in the qf (query fields) request parameter. The proximity query is evaluated against each field in turn, and the per-field results are combined with a disjunction max query. Even so, the full proximity expression is still evaluated on only one field at a time.
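To illustrate the per-field evaluation, this Python sketch scores x near:n y separately in each qf field and combines the results the way a disjunction max query does: the highest field score plus a tie-breaker multiplier (`tie`, 0.0 by default, as in Lucene's DisjunctionMaxQuery) times the sum of the other scores. The 1/(1 + gap) per-field score and all function names are hypothetical placeholders, not Lucene or Lucid relevance scoring.

```python
from typing import Dict, List, Optional


def min_gap(tokens: List[str], x: str, y: str) -> Optional[int]:
    """Fewest intervening terms between an "x" and a "y", in either order."""
    gaps = [abs(i - j) - 1
            for i, a in enumerate(tokens) if a == x
            for j, b in enumerate(tokens) if b == y and i != j]
    return min(gaps) if gaps else None


def field_score(tokens: List[str], x: str, y: str, n: int) -> float:
    """Hypothetical per-field score: 1/(1 + gap) if within n terms, else 0."""
    gap = min_gap(tokens, x, y)
    return 0.0 if gap is None or gap > n else 1.0 / (1 + gap)


def dismax_near(doc: Dict[str, List[str]], qf: List[str], x: str, y: str,
                n: int = 15, tie: float = 0.0) -> float:
    """Evaluate x near:n y in each qf field, then combine with disjunction max."""
    scores = [field_score(doc.get(f, []), x, y, n) for f in qf]
    best = max(scores, default=0.0)
    return best + tie * (sum(scores) - best)


doc = {"title": "red light".split(), "body": "a red stop light".split()}
print(dismax_near(doc, ["title", "body"], "red", "light"))  # 1.0 (title wins)

# Terms split across fields never match: each field is evaluated on its own.
split_doc = {"title": ["red"], "body": ["light"]}
print(dismax_near(split_doc, ["title", "body"], "red", "light"))  # 0.0
```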
Multiple proximity expressions, each with its own field, can be used within a single query simply by combining them with the AND, OR, or NOT boolean operators. The precedence of the boolean operators is such that entire proximity expressions will be evaluated before the surrounding boolean operators. So,
title: red before light or body: empty before tank
would evaluate as:
(title: red before light) or (body: empty before tank)
The AND operator can be used to require a set of proximity queries to be satisfied, such as:
(title: red before blue) and (body: night after day) and (town near city)
where "red" must occur before "blue" in the title field, "night" must occur after "day" in the body field, and "town" must occur near "city" in any one of the fields listed in the qf parameter.