The ExternalFileField type makes it possible to specify the values for a field in a file outside the Solr index. For such a field, the file contains mappings from a key field to the field value. Another way to think of this is that, instead of specifying the field in documents as they are indexed, Solr finds values for this field in the external file.
External fields are not searchable. They can be used only for function queries or display. For more information on function queries, see the section on Function Queries.
The ExternalFileField type is handy for cases where you want to update a particular field in many documents more often than you want to update the rest of the documents. For example, suppose you have implemented a document rank based on the number of views. You might want to update the rank of all the documents daily or hourly, while the rest of the contents of the documents might be updated much less frequently. Without ExternalFileField, you would need to update each document just to change the rank. Using ExternalFileField is much more efficient because all document values for a particular field are stored in an external file that can be updated as frequently as you wish.
In schema.xml, the definition of this field type might look like this:
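A minimal sketch of such a definition follows. The key field name pkId, the default value, and the stored/indexed settings are assumptions made for illustration; the name entryRankFile is chosen to match the file name used later in this section, and a float field type named pfloat is assumed to be defined elsewhere in the schema.

```xml
<!-- Sketch only: pkId is an assumed key field, and a float field type
     named "pfloat" is assumed to exist elsewhere in this schema. -->
<fieldType name="entryRankFile"
           class="solr.ExternalFileField"
           keyField="pkId"
           defVal="0"
           valType="pfloat"
           stored="false"
           indexed="false"/>

<!-- A field of that type. It reuses the name entryRankFile so that it
     lines up with a file named external_entryRankFile. -->
<field name="entryRankFile" type="entryRankFile"/>
```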
The keyField attribute names the field that serves as the key in the external file. It is usually the unique key for the index, but it does not need to be, as long as the keyField can be used to identify documents in the index. The defVal attribute defines a default value that is used if the external file has no entry for a particular document.
The valType attribute specifies the type of the values found in the file. It must be a float field type, so valid values for this attribute are pfloat, float, or tfloat. This attribute can be omitted.
The file itself is located in Solr's index directory, which by default is $SOLR_HOME/data. The name of the file should be external_fieldname or external_fieldname.*. For the example above, then, the file could be named external_entryRankFile or external_entryRankFile.txt.
If any files matching the name pattern external_fieldname.* (such as external_fieldname.txt) appear, the last one (after sorting by name) will be used and previous versions will be deleted. This behavior supports implementations on systems where one may not be able to overwrite a file (for example, on Windows, if the file is in use).
The file contains entries that map a key field, on the left of the equals sign, to a value, on the right. Here are a few example entries:
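For illustration, assuming hypothetical document keys doc33, doc34, and doc40, a file with three such entries could look like the following. The keys and values are invented sample data; each value is a float, in line with the valType restriction described above.

```text
doc33=1.414
doc34=3.14159
doc40=42
```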
The keys listed in this file do not need to be unique. The file does not need to be sorted, but Solr will be able to perform the lookup faster if it is.
As of Solr 4.1, it is possible to define an event listener that reloads an external file either when a searcher is reloaded or when a new searcher is started. See the section Query Related Listeners for more information, but a sample definition in solrconfig.xml might look like this:
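A sketch of such listener definitions follows. It assumes the ExternalFileFieldReloader class from the org.apache.solr.schema package and that the elements are placed inside the query section of solrconfig.xml; adjust both to match your installation.

```xml
<!-- Sketch only: reload external files when the first searcher is started
     and again each time a new searcher replaces the current one. -->
<listener event="firstSearcher"
          class="org.apache.solr.schema.ExternalFileFieldReloader"/>
<listener event="newSearcher"
          class="org.apache.solr.schema.ExternalFileFieldReloader"/>
```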
The PreAnalyzedField type provides a way to send serialized token streams to Solr, optionally with independent stored values of a field, and have this information stored and indexed without any additional text processing applied in Solr. This is useful if a user wants to submit field content that was already processed by an existing external text processing pipeline (e.g., it has been tokenized, annotated, stemmed, had synonyms inserted, etc.), while still using all the rich per-token attributes that Lucene's TokenStream provides.
The serialization format is pluggable using implementations of the PreAnalyzedParser interface. There are two out-of-the-box implementations:
- JsonPreAnalyzedParser: as the name suggests, it parses content that uses JSON to represent a field's content. This is the default parser to use if the field type is not configured otherwise.
- SimplePreAnalyzedParser: uses a simple strict plain text format, which in some situations may be easier to create than JSON.
There is only one configuration parameter, parserImpl. The value of this parameter should be the fully qualified class name of a class that implements the PreAnalyzedParser interface. The default value of this parameter is org.apache.solr.schema.JsonPreAnalyzedParser.
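As a rough sketch, a field type that selects the plain text parser instead of the default might be declared as follows. The field type name preanalyzed is hypothetical, and the example assumes that SimplePreAnalyzedParser lives in the same org.apache.solr.schema package as the default parser.

```xml
<!-- Sketch only: "preanalyzed" is a hypothetical field type name.
     Omitting parserImpl would leave the default JSON parser in effect. -->
<fieldType name="preanalyzed"
           class="solr.PreAnalyzedField"
           parserImpl="org.apache.solr.schema.SimplePreAnalyzedParser"/>
```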