| As of Solr 4.0, SolrCloud is the preferred way to distribute indexes for redundancy, failover, and improved performance. Index Replication and Distributed Search are considered obsolete technologies; while still supported, they are not in active development. See the section on Using SolrCloud in LucidWorks for more information on using SolrCloud with LucidWorks Search. |
Index Replication distributes complete copies of a master index to one or more slave servers. The master server continues to manage updates to the index, while all querying is handled by the slaves. This division of labor lets Solr scale to stay responsive under high query volumes.
LucidWorks Search supports index replication, but it is not configured through the Admin UI. Instead, replication configuration requires editing XML configuration files in the Solr release included with LucidWorks Search. This section explains how replication works and how to edit the configuration files. Detailed examples are provided, so even if you're new to XML and Solr configuration, you should be able to set up and configure master/slave replication servers with ease.
| When the Click Scoring Relevance Framework is enabled, LucidWorks ensures that the click boost data is also replicated along with the index files. See the section on Click Scoring Tools and Index Replication for more information. |
Configuring Replication on the Master Server
To set up replication, you will need to edit the solrconfig.xml file on the master server. To edit the file, you can use an XML editor or even a simpler tool such as Notepad on a PC or TextEdit on a Mac.
Within the solrconfig.xml file, you will edit the definition for a Request Handler. A Request Handler is a Solr process that responds to requests. In this case, you will be configuring the Replication RequestHandler, which processes requests specific to replication.
The example below shows how to configure the Replication RequestHandler on a master server.
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'optimize'. Other values can be 'commit', 'startup'.
         It is possible to have multiple entries of this config string -->
    <str name="replicateAfter">optimize</str>

    <!-- Create a backup after 'optimize'. Other values can be 'commit', 'startup'.
         It is possible to have multiple entries of this config string.
         Note that this is just for backup, replication does not require this. -->
    <!-- <str name="backupAfter">optimize</str> -->

    <!-- If configuration files need to be replicated give the names here,
         separated by comma -->
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>

    <!-- The default value of reservation is 10 secs. See the documentation below.
         Normally, you should not need to specify this -->
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>
Operations that Trigger Replication
The value of the replicateAfter parameter in the ReplicationHandler configuration determines which types of events should trigger the creation of snapshots for use in replication.
The replicateAfter parameter can be specified more than once; add one entry for each event that should trigger replication.
| replicateAfter Setting | Description |
|---|---|
| startup | Triggers replication whenever the master index starts up. |
| commit | Triggers replication whenever a commit is performed on the master index. |
| optimize | Triggers replication whenever the master index is optimized. |
If you use the startup setting for replicateAfter, you must also add a commit or optimize entry if you want replication to be triggered on later commits or optimizes. If only the startup option is given, replication is triggered once at startup and not again after subsequent commits or optimizes.
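Because replicateAfter may appear more than once, the startup-plus-commit combination can be illustrated by generating the master block programmatically. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The helper name build_master_config is hypothetical and is not part of Solr or LucidWorks Search; it only mirrors the hand-written XML shown earlier.

```python
import xml.etree.ElementTree as ET

# Trigger names taken from the replicateAfter table above.
VALID_TRIGGERS = {"startup", "commit", "optimize"}


def build_master_config(triggers, conf_files=()):
    """Build a /replication requestHandler element for a master.

    Hypothetical helper for illustration: emits one
    <str name="replicateAfter"> entry per trigger.
    """
    unknown = set(triggers) - VALID_TRIGGERS
    if unknown:
        raise ValueError(f"unsupported replicateAfter values: {sorted(unknown)}")

    handler = ET.Element(
        "requestHandler",
        {"name": "/replication", "class": "solr.ReplicationHandler"},
    )
    master = ET.SubElement(handler, "lst", {"name": "master"})
    for trigger in triggers:
        entry = ET.SubElement(master, "str", {"name": "replicateAfter"})
        entry.text = trigger
    if conf_files:
        conf = ET.SubElement(master, "str", {"name": "confFiles"})
        conf.text = ",".join(conf_files)
    return handler


if __name__ == "__main__":
    # Replicate once at startup and again after every later commit.
    element = build_master_config(["startup", "commit"], ["schema.xml", "stopwords.txt"])
    print(ET.tostring(element, encoding="unicode"))
```

The printed element contains two replicateAfter entries, which is the shape you would paste into the master list of solrconfig.xml by hand.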
Configuring Replication on Slave Servers
The code below shows how to configure a ReplicationHandler on a slave server.
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Fully qualified url for the replication handler of master. It is possible
         to pass on this as a request param for the fetchindex command -->
    <str name="masterUrl">http://master.solr.company.com:8983/solr/corename/replication</str>

    <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
         If this is absent slave does not poll automatically. But a fetchindex
         can be triggered from the admin or the http API -->
    <str name="pollInterval">00:00:20</str>

    <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED -->

    <!-- To use compression while transferring the index files. The possible
         values are internal|external. If the value is 'external' make sure that
         your master Solr has the settings to honor the accept-encoding header.
         See here for details: http://wiki.apache.org/solr/SolrHttpCompression
         If it is 'internal' everything will be taken care of automatically.
         USE THIS ONLY IF YOUR BANDWIDTH IS LOW.
         THIS CAN ACTUALLY SLOW DOWN REPLICATION IN A LAN -->
    <str name="compression">internal</str>

    <!-- The following values are used when the slave connects to the master to
         download the index files. Default values implicitly set as 5000ms and
         10000ms respectively. The user DOES NOT need to specify these unless the
         bandwidth is extremely low or if there is an extremely high latency -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">10000</str>

    <!-- If HTTP Basic authentication is enabled on the master, then the slave
         can be configured with the following -->
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>
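The comments in the slave configuration note that a fetchindex can be triggered through the HTTP API and that masterUrl can be passed as a request parameter. As a rough illustration, the Python sketch below builds such a request URL. The function name fetchindex_url is hypothetical, and the host names and core name are placeholders; the sketch only constructs the URL and leaves sending the request to you.

```python
from urllib.parse import urlencode


def fetchindex_url(slave_replication_url, master_url=None):
    """Return the URL that asks a slave's ReplicationHandler to fetch the index.

    slave_replication_url: the slave's own /replication handler URL.
    master_url: optional value passed as the masterUrl request parameter.
    """
    params = {"command": "fetchindex"}
    if master_url is not None:
        params["masterUrl"] = master_url
    return f"{slave_replication_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and core name; substitute your own.
    print(fetchindex_url("http://slave.solr.company.com:8983/solr/corename/replication"))
```

Opening the resulting URL in a browser, or requesting it with any HTTP client, is one way to start a replication on a slave that has no pollInterval configured.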
The master server is unaware of the slaves. Each slave server polls the master at the interval set by the pollInterval parameter to check the master's current index version. If the slave finds that the master has a newer version of the index, it initiates a replication process. The steps are as follows:
- The slave issues a filelist command to get the list of the files. This command returns the names of the files as well as some metadata (e.g., size, a lastmodified timestamp, an alias if any).
- The slave checks whether any of those files already exist in its local index. It then runs the filecontent command to download the missing files. This uses a custom format (akin to HTTP chunked encoding) to download the full content or a part of each file. If the connection breaks partway through, the download resumes from the point at which it failed. The slave tries 5 times before giving up on a replication altogether.
- The files are downloaded into a temp directory, so that if either the slave or the master crashes during the download process, no files will be corrupted. Instead, the replication process will simply abort.
- After the download completes, all the new files are moved ('mv') to the live index directory, and each file's timestamp is set to be identical to that of its counterpart on the master.
- A commit command is issued on the slave by the slave's ReplicationHandler, and the new index is loaded.
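The file-selection step in the list above can be sketched as follows. This is a simplified illustration in Python, not Solr's implementation: the FileInfo record and the files_to_download function are hypothetical, and the comparison rule (fetch a file when it is absent locally or when its size or timestamp differs) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Metadata of the kind returned by the filelist command (hypothetical shape)."""
    name: str
    size: int
    last_modified: int  # timestamp, for example epoch milliseconds


def files_to_download(master_files, local_files):
    """Return the master files the slave still needs, in the master's order.

    Assumed rule for this sketch: fetch a file when it is missing locally
    or when its size or timestamp differs from the master's copy.
    """
    local_by_name = {f.name: f for f in local_files}
    needed = []
    for remote in master_files:
        local = local_by_name.get(remote.name)
        if local is None or (local.size, local.last_modified) != (
            remote.size,
            remote.last_modified,
        ):
            needed.append(remote)
    return needed


if __name__ == "__main__":
    master = [FileInfo("_1.fdt", 2048, 1000), FileInfo("_2.fdt", 4096, 2000)]
    local = [FileInfo("_1.fdt", 2048, 1000)]
    for info in files_to_download(master, local):
        print(info.name)  # prints _2.fdt
```

Only the files returned by files_to_download would then be requested with the filecontent command; files the slave already holds are skipped.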
Configuring Replication on a Repeater Server
A master can serve only so many slaves before performance suffers. Some organizations have deployed slave servers across multiple data centers. If each slave downloads the index from a remote data center, the resulting downloads may consume too much network bandwidth. To avoid performance degradation in cases like this, you can configure one or more slaves as repeaters. A repeater is simply a node that acts as both a master and a slave.

To configure a server as a repeater, the definition of the Replication requestHandler in the solrconfig.xml file must include both the master list and the slave list (the lst elements named "master" and "slave"). Be sure to set the replicateAfter parameter to commit, even if replicateAfter is set to optimize on the main master. This is because on a repeater (or any slave), a commit is called only after the index is downloaded, and the optimize command is never called on slaves. Optionally, you can use the compression parameter to have the repeater fetch compressed files from the master, which reduces the index download time.
Here's an example of a ReplicationHandler configuration for a repeater:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master.solr.company.com:8983/solr/corename/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
Replicating Configuration Files
To replicate configuration files, list them with the confFiles parameter in the master's configuration. Only files found in the conf directory of the master's Solr instance will be replicated.
Solr replicates configuration files only when the index itself is replicated. Even if a configuration file is changed on the master, that file will be replicated only after a new commit or optimize on the master's index.
As a precaution when replicating configuration files, Solr copies configuration files to a temporary directory before moving them into their ultimate location in the conf directory. The old configuration files are then renamed and kept in the same conf/ directory. The ReplicationHandler does not automatically clean up these old files.
Unlike the index files, where the timestamp is good enough to determine whether two files are identical, configuration files are compared by checksum. If a replication involved downloading at least one configuration file with a modified checksum, the ReplicationHandler issues a core-reload command instead of a commit command.
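The decision described above can be illustrated with a short sketch. It is written in Python for illustration only: the function names are hypothetical, and CRC-32 from zlib stands in for whatever checksum Solr actually computes.

```python
import zlib


def checksum(data: bytes) -> int:
    """Stand-in checksum (CRC-32); the algorithm choice is an assumption."""
    return zlib.crc32(data)


def post_replication_action(master_conf, slave_conf):
    """Return "core-reload" if any replicated conf file changed, else "commit".

    Both arguments map file name -> file content (bytes). A file that the
    slave does not have yet counts as changed in this sketch.
    """
    for name, content in master_conf.items():
        local = slave_conf.get(name)
        if local is None or checksum(local) != checksum(content):
            return "core-reload"
    return "commit"


if __name__ == "__main__":
    master = {"schema.xml": b"<schema v2/>", "stopwords.txt": b"a\nthe\n"}
    slave = {"schema.xml": b"<schema v1/>", "stopwords.txt": b"a\nthe\n"}
    print(post_replication_action(master, slave))  # prints core-reload
```

When every listed file has an unchanged checksum, the sketch returns "commit", matching the ordinary index-only replication path.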
Replicating the solrconfig.xml File
To keep the configuration of the master servers and slave servers in sync, you can configure the replication process to copy configuration files from the master server to the slave servers. In the solrconfig.xml on the master server, include a confFiles value like the following:
<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>
This ensures that the master's local file solrconfig_slave.xml is saved as solrconfig.xml on the slave. All other files are saved with their original names. On the master server, the slave configuration file can have any name, as long as that name is correctly identified in the confFiles string; on the slave, it is saved under whatever file name appears after the colon ':'.
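To make the naming rule concrete, here is a minimal sketch in Python of how a confFiles value maps file names on the master to file names on the slave. The parse_conf_files function is hypothetical and only mirrors the rule described above; it is not Solr's own parser.

```python
def parse_conf_files(value):
    """Map each master-side file name to the name it gets on the slave.

    Entries are comma-separated. "source:alias" saves source under alias;
    a bare name keeps its original name.
    """
    mapping = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        source, _, alias = entry.partition(":")
        source = source.strip()
        alias = alias.strip()
        mapping[source] = alias if alias else source
    return mapping


if __name__ == "__main__":
    result = parse_conf_files("solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml")
    for source, target in result.items():
        print(f"{source} -> {target}")
```

For the confFiles value shown earlier, solrconfig_slave.xml maps to solrconfig.xml, while x.xml and y.xml map to themselves.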
Related Topics
- Using SolrCloud in LucidWorks
- Scaling and Distribution chapter from the Apache Solr Reference Guide