|
Before Solr 1.3, the Standard Request Handler called the standard query parser as the default query parser. In versions since Solr 1.3, the Standard Request Handler calls the DisMax query parser as the default query parser. You can configure Solr to call the standard query parser instead, if you like. The advantage of the standard query parser is that it enables users to specify very precise queries. The disadvantage is that it is less tolerant of syntax errors than the DisMax query parser. The DisMax query parser is designed to throw as few errors as possible. |
Topics covered in this section:
|
Standard Query Parser Parameters
In addition to the Common Query Parameters, Faceting Parameters, Highlighting Parameters, and MoreLikeThis Parameters, the standard query parser supports the parameters described in the table below.
| Parameter | Description |
|---|---|
| q | Defines a query using standard query syntax. This parameter is mandatory. |
| q.op | Specifies the default operator for query expressions, overriding the default operator specified in the schema.xml file. Possible values are "AND" or "OR". |
| df | Specifies a default field, overriding the definition of a default field in the schema.xml file. |
Default parameter values are specified in solrconfig.xml, or overridden by query-time values in the request.
The Standard Query Parser's Response
By default, the response from the standard query parser contains one <result> block, which is unnamed. If the debugQuery parameter is used, then an additional <lst> block will be returned, using the name "debug". This will contain useful debugging info, including the original query string, the parsed query string, and explain info for each document in the <result> block. If the explainOther parameter is also used, then additional explain info will be provided for all the documents matching that query.
Sample Responses
This section presents examples of responses from the standard query parser.
The URL below submits a simple query and requests the XML Response Writer to use indentation to make the XML response more readable.
http://yourhost.tld:9999/solr/select?q=id:SP2514N&version=2.1&indent=1
Results:
<?xml version="1.0" encoding="UTF-8"?> <response> <responseHeader><status>0</status><QTime>1</QTime></responseHeader> <result numFound="1" start="0"> <doc> <arr name="cat"><str>electronics</str><str>hard drive</str></arr> <arr name="features"><str>7200RPM, 8MB cache, IDE Ultra ATA-133</str> <str>NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</str></arr> <str name="id">SP2514N</str> <bool name="inStock">true</bool> <str name="manu">Samsung Electronics Co. Ltd.</str> <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str> <int name="popularity">6</int> <float name="price">92.0</float> <str name="sku">SP2514N</str> </doc> </result> </response>
Here's an example of a query with a limited field list.
http://yourhost.tld:9999/solr/select?q=id:SP2514N&version=2.1&indent=1&fl=id+name
Results:
<?xml version="1.0" encoding="UTF-8"?> <response> <responseHeader><status>0</status><QTime>2</QTime></responseHeader> <result numFound="1" start="0"> <doc> <str name="id">SP2514N</str> <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str> </doc> </result> </response>
Specifying Terms for the Standard Query Parser
A query to the standard query parser is broken up into terms and operators. There are two types of terms: single terms and phrases.
- A single term is a single word such as "test" or "hello"
- A phrase is a group of words surrounded by double quotes such as "hello dolly"
Multiple terms can be combined together with Boolean operators to form more complex queries (as described below).
| It is important that the analyzer used for queries parses terms and phrases in a way that is consistent with the way the analyzer used for indexing parses terms and phrases; otherwise, searches may produce unexpected results. |
Term Modifiers
Solr supports a variety of term modifiers that add flexibility or precision, as needed, to searches. These modifiers include wildcard characters, characters for making a search "fuzzy" or more general, and so on. The sections below describe these modifiers in detail.
Wildcard Searches
Solr's standard query parser supports single and multiple character wildcard searches within single terms. Wildcard characters can be applied to single terms, but not to search phrases.
| Wildcard Search Type | Special Character | Example |
|---|---|---|
| Single character (matches a single character) | ? | The search string te?t would match both test and text. |
| Multiple characters (matches zero or more sequential characters) | * | The wildcard search: tes* would match test, testing, and tester. You can also use wildcard characters in the middle of a term. For example: te*t would match test and text. *est would match pest and test. |
| As of Solr 1.4, you can use a * or ? symbol as the first character of a search with the standard query parser. |
Fuzzy Searches
Solr's standard query parser supports fuzzy searches based on the Levenshtein Distance or Edit Distance algorithm. Fuzzy searches discover terms that are similar to a specified term without necessarily being an exact match. To perform a fuzzy search, use the tilde ~ symbol at the end of a single-word term. For example, to search for a term similar in spelling to "roam," use the fuzzy search:
roam~
This search will match terms like foam and roams. It will also match the word "roam" itself.
An optional, additional parameter specifies the degree of similarity required for a match in a fuzzy search. The value must be between 0 and 1. When set closer to 1, the optional parameter causes only terms with a higher similarity to be matched. For example, the search below requires a high degree of similarity to the term "roam" in order for Solr to return a match:
roam~0.8
If this numerical parameter is omitted, Lucene performs the search as though the parameter were set to 0.5. The sample query above is not very scalable. Upon parsing this query will check the quasi-edit distance for every term in the index. As a result, this query is practical for only very small indexes.
| In many cases, stemming (reducing terms to a common stem) can produce similar effects to fuzzy searches and wildcard searches. |
Proximity Searches
A proximity search looks for terms that are within a specific distance from one another.
To perform a proximity search, add the tilde character ~ and a numeric value to the end of a search phrase. For example, to search for a "apache" and "jakarta" within 10 words of each other in a document, use the search:
"jakarta apache"~10
The distance referred to here is the number of term movements needed to match the specified phrase. In the example above, if "apache" and "jakarta" were 10 spaces apart in a field, but "apache" appeared before "jakarta", more than 10 term movements would be required to move the terms together and position "apache" to the right of "jakarta" with a space in between.
Range Searches
A range search specifies a range of values for a field (a range with an upper bound and a lower bound). The query matches documents whose values for the specified field or fields fall within the range. Range queries can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically, except on numeric fields. For example, the range query below matches all documents whose mod_date field has a value between 20020101 and 20030101, inclusive.
mod_date:[20020101 TO 20030101]
Range queries are not limited to date fields or even numerical fields. You could also use range queries with non-date fields:
title:{Aida TO Carmen}
This will find all documents whose titles are between Aida and Carmen, but not including Aida and Carmen.
The brackets around a query determine its inclusiveness.
- Square brackets [ ] denote an inclusive range query that matches values including the upper and lower bound.
- Curly brackets { } denote an exclusive range query that matches values between the upper and lower bounds, but excluding the upper and lower bounds themselves.
Boosting a Term with ^
Lucene/Solr provides the relevance level of matching documents based on the terms found. To boost a term use the caret symbol ^ with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.
Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for
"jakarta apache" and you want the term "jakarta" to be more relevant, you can boost it by adding the ^ symbol along with the boost factor immediately after the term. For example, you could type:
jakarta^4 apache
This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example:
"jakarta apache"^4 "Apache Lucene"
By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, it could be 0.2).
Specifying Fields in a Query to the Standard Query Parser
Data indexed in Solr is organized in fields, which are defined in the Solr schema.xml file. Searches can take advantage of fields to add precision to queries. For example, you can search for a term only in a specific field, such as a title field.
The schema.xml file defines one field as a default field. If you do not specify a field in a query, Solr searches only the default field. Alternatively, you can specify a different field or a combination of fields in a query.
To specify a field, type the field name followed by a colon ":" and then the term you are searching for within the field.
For example, suppose an index contains two fields, title and text,and that text is the default field. If you want to find a document called "The Right Way" which contains the text "don't go this way," you could include either of the following terms in your search query:
title:"The Right Way" AND text:go
title:"Do it right" AND go
Since text is the default field, the field indicator is not required; hence the second query above omits it.
The field is only valid for the term that it directly precedes, so the query title:Do it right will find only "Do" in the title field. It will find "it" and "right" in the default field (in this case the text field).
Boolean Operators Supported by the Standard Query Parser
Boolean operators allow you to apply Boolean logic to queries, requiring the presence or absence of specific terms or conditions in fields in order to match documents. The table below summarizes the Boolean operators supported by the standard query parser.
| Boolean Operator | Alternative Symbol | Description |
|---|---|---|
| AND | && | Requires both terms on either side of the Boolean operator to be present for a match. |
| NOT | ! | Requires that the following term not be present. |
| OR | || | Requires that either term (or both terms) be present for a match. |
| + | Requires that the following term be present. | |
| - | Prohibits the following term (that is, matches on fields or documents that do not include that term). The - operator is functional similar to the Boolean operator !. Because it's used by popular search engines such as Google, it may be more familiar to some user communities. |
Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators.
| When specifying Boolean operators with keywords such as AND or NOT, the keywords must appear in all uppercase. |
| The standard query parser supports all the Boolean operators listed in the table above. The DisMax query parser supports only + and -. |
The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exist in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.
In the schema.xml file, you can specify which symbols can take the place of Boolean operators such as OR. To search for documents that contain either "jakarta apache" or just "jakarta," use the query:
"jakarta apache" jakarta
or
"jakarta apache" OR jakarta
The Boolean Operator +
The + symbol (also known as the "required" operator) requires that the term after the + symbol exist somewhere in a field in at least one document in order for the query to return a match.
For example, to search for documents that must contain "jakarta" and that may or may not contain "lucene," use the following query:
+jakarta lucene
| This operator is supported by both the standard query parser and the DisMax query parser. |
The Boolean Operator AND (&&)
The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.
To search for documents that contain "jakarta apache" and "Apache Lucene," use either of the following queries:
"jakarta apache" AND "Apache Lucene"
"jakarta apache" && "Apache Lucene"
The Boolean Operator NOT (!)
The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.
The following queries search for documents that contain the phrase "jakarta apache" but do not contain the phrase "Apache Lucene":
"jakarta apache" NOT "Apache Lucene"
"jakarta apache" ! "Apache Lucene"
The Boolean Operator -
The - symbol or "prohibit" operator excludes documents that contain the term after the - symbol.
For example, to search for documents that contain "jakarta apache" but not "Apache Lucene," use the following query:
"jakarta apache" -"Apache Lucene"
Escaping Special Characters
Solr gives the following characters special meaning when they appear in a query:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : /
To make Solr interpret any of these characters literally, rather as a special character, precede the character with a backslash character \. For example, to search for (1+1):2 without having Solr interpret the plus sign and parentheses as special characters for formulating a sub-query with two terms, escape the characters by preceding each one with a backslash:
\(1\+1\)\:2
Grouping Terms to Form Sub-Queries
Lucene/Solr supports using parentheses to group clauses to form sub-queries. This can be very useful if you want to control the Boolean logic for a query.
The query below searches for either "jakarta" or "apache" and "website":
(jakarta OR apache) AND website
This adds precision to the query, requiring that the term "website" exist, along with either term "jakarta" and "apache."
Grouping Clauses within a Field
To apply two or more Boolean operators to a single field in a search, group the Boolean clauses within parentheses. For example, the query below searches for a title field that contains both the word "return" and the phrase "pink panther":
title:(+return +"pink panther")
Differences between Lucene Query Parser and the Solr Standard Query Parser
Solr's standard query parser differs from the Lucene Query Parser in the following ways:
- A * may be used for either or both endpoints to specify an open-ended range query
- field:[* TO 100] finds all field values less than or equal to 100
- field:[100 TO *] finds all field values greater than or equal to 100
- field:[* TO *] matches all documents with the field
- Pure negative queries (all clauses prohibited) are allowed (only as a top-level clause)
- -inStock:false finds all field values where inStock is not false
- -field:[* TO *] finds all documents without a value for field
- A hook into FunctionQuery syntax. You'll need to use quotes to encapsulate the function if it includes parentheses, as shown in the second example below:
- val:myfield
- val:"recip(rord(myfield),1,2,3)"
- Support for any type of query parser. Prior to Solr 4.1, the "magic" field "_query_ needed to be used to nest another query parser. However, with Solr 4.1, other query parsers can be used directly using the local parameters syntax.
- {!geodist d=10 p=20.5,30.2}
- Range queries ("[a TO z]"), prefix queries ("a*"), and wildcard queries ("a*b") are constant-scoring (all matching documents get an equal score). The scoring factors TF, IDF, index boost, and "coord" are not used. There is no limitation on the number of terms that match (as there was in past versions of Lucene).
Specifying Dates and Times
If you use the Solr DateField type, any queries on those fields (typically range queries) should use the TrieDate Field. Here are some examples of valid parameters using syntax appropriate for the DateField type:
- timestamp:[*TO NOW]
- createdate:[1976-03-06T23:59:59.999Z TO *]
- createdate:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]
- pubdate:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]
- createdate:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]
- createdate:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]