Note: As of Solr 4.0, SolrCloud is the preferred way to distribute indexes for redundancy, failover, and improved performance. Index Replication and Distributed Search are considered obsolete technologies; while still supported, they are not in active development. See the section on Using SolrCloud in LucidWorks for more information on using SolrCloud with LucidWorks Search.
Index Replication distributes complete copies of a master index to one or more slave servers. The master server continues to manage updates to the index, while all querying is handled by the slaves. This division of labor enables Solr to scale and to remain responsive to queries against large search volumes.
LucidWorks Search supports index replication, but it is not configured through the Admin UI. Instead, replication configuration requires editing XML configuration files in the Solr release included with LucidWorks Search. This section explains how replication works and how to edit the configuration files. Detailed examples are provided, so even if you're new to XML and Solr configuration, you should be able to set up and configure master/slave replication servers with ease.
Note: When the Click Scoring Relevance Framework is enabled, LucidWorks ensures that click boost data is replicated along with the index files. See the section on Click Scoring Tools and Index Replication for more information.
To set up replication, you will need to edit the solrconfig.xml file on the master server. To edit the file, you can use an XML editor or even a simpler tool such as Notepad on a PC or TextEdit on a Mac.
Within the solrconfig.xml file, you will edit the definition for a Request Handler. A Request Handler is a Solr process that responds to requests. In this case, you will be configuring the Replication RequestHandler, which processes requests specific to replication.
The example below shows how to configure the Replication RequestHandler on a master server.
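The following sketch shows a typical master configuration; the replicateAfter values and the confFiles list are illustrative and should be adapted to your deployment:

```xml
<!-- In solrconfig.xml on the master server.
     The replicateAfter and confFiles values shown here are illustrative. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Trigger replication at startup and after every commit -->
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <!-- Configuration files to replicate along with the index -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```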
The value of the replicateAfter parameter in the ReplicationHandler configuration determines which types of events should trigger the creation of snapshots for use in replication.
The replicateAfter parameter can accept multiple arguments.
- startup: Triggers replication whenever the master index starts up.
- commit: Triggers replication whenever a commit is performed on the master index.
- optimize: Triggers replication whenever the master index is optimized.
If you use the startup setting for replicateAfter, you must also include commit or optimize if you want replication to be triggered on future commits or optimizes. If only the startup option is given, replication runs once when the master starts up and is not triggered by subsequent commits or optimizes.
The code below shows how to configure a ReplicationHandler on a slave server.
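A slave configuration can be sketched as follows; the host name, port, and pollInterval value are placeholders to replace with your own:

```xml
<!-- In solrconfig.xml on each slave server.
     Replace master_host and the port with your own values. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- URL of the master's replication handler -->
    <str name="masterUrl">http://master_host:8983/solr/replication</str>
    <!-- How often to poll the master for a newer index (HH:mm:ss) -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```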
The master server is unaware of the slaves. Each slave server polls the master at regular intervals (controlled by the pollInterval parameter) to check the master's current index version. If the slave finds that the master has a newer version of the index, it initiates a replication process. The steps are as follows:
- The slave issues a filelist command to get the list of the files. This command returns the names of the files as well as some metadata (e.g., size, a lastmodified timestamp, an alias if any).
- The slave checks its local index for each of those files and then runs the filecontent command to download the missing ones. This command uses a custom format (akin to HTTP chunked encoding) to download the full content or a part of each file. If the connection breaks, the download resumes from the point of failure. The slave retries a failed download five times before abandoning the replication altogether.
- The files are downloaded into a temp directory, so that if either the slave or the master crashes during the download process, no files will be corrupted. Instead, the replication process will simply abort.
- After the download completes, all the new files are moved ('mv'ed) into the live index directory, and each file's timestamp is set to be identical to that of its counterpart on the master.
- The slave's ReplicationHandler issues a commit command, and the new index is loaded.
A master may be able to serve only so many slaves before performance degrades. Some organizations deploy slave servers across multiple data centers; if each slave downloads the index from a remote data center, the downloads may consume too much network bandwidth. To avoid performance degradation in cases like this, you can configure one or more slaves as repeaters. A repeater is simply a node that acts as both a master and a slave.

To configure a server as a repeater, the definition of the Replication requestHandler in the solrconfig.xml file must include both the master and slave configuration sections. Be sure to set the replicateAfter parameter to commit, even if replicateAfter is set to optimize on the main master. This is because on a repeater (or any slave), a commit is called only after the index is downloaded; the optimize command is never called on slaves. Optionally, you can configure the repeater to fetch compressed files from the master through the compression parameter to reduce the index download time.
Here's an example of a ReplicationHandler configuration for a repeater:
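This sketch combines the master and slave sections in one handler definition; the host name, port, and confFiles list are placeholders to adapt to your deployment:

```xml
<!-- In solrconfig.xml on the repeater: both master and slave sections. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- On a repeater, replicateAfter must be commit -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <!-- URL of the main master's replication handler -->
    <str name="masterUrl">http://master_host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```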
To replicate configuration files, list them with the confFiles parameter in the master's configuration. Only files found in the conf directory of the master's Solr instance will be replicated.
Solr replicates configuration files only when the index itself is replicated. Even if a configuration file is changed on the master, that file will be replicated only after a new commit or optimize on the master's index.
As a precaution when replicating configuration files, Solr copies configuration files to a temporary directory before moving them into their ultimate location in the conf directory. The old configuration files are then renamed and kept in the same conf/ directory. The ReplicationHandler does not automatically clean up these old files.
Unlike index files, where the timestamp is enough to determine whether two files are identical, configuration files are compared by checksum. If a replication involves downloading at least one configuration file with a changed checksum, the ReplicationHandler issues a core-reload command instead of a commit command.
To keep the configuration of the master servers and slave servers in sync, you can configure the replication process to copy configuration files from the master server to the slave servers. In the solrconfig.xml on the master server, include a confFiles value like the following:
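For example, a confFiles entry along these lines renames solrconfig_slave.xml on its way to the slave; the other file names listed are illustrative:

```xml
<!-- In the master section of the ReplicationHandler definition.
     solrconfig_slave.xml is saved as solrconfig.xml on the slave;
     the remaining files keep their original names. -->
<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml</str>
```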
This ensures that the local configuration solrconfig_slave.xml will be saved as solrconfig.xml on the slave. All other files will be saved with their original names. On the master server, the file name of the slave configuration file can be anything, as long as the name is correctly identified in the confFiles string; then it will be saved as whatever file name appears after the colon ':'.