This is the User Guide for LucidWorks Enterprise v1.6. The latest version is v2.1.


Index replication distributes complete copies of a master index to one or more slave servers. The master server continues to handle all updates to the index, while all queries are handled by the slaves. This division of labor lets Solr scale so that it remains responsive under high query volumes against large indexes.

When the Click Scoring Relevance Framework is enabled, LucidWorks Enterprise also replicates the click boost data together with the index files.

LucidWorks Enterprise supports index replication. In LucidWorks Enterprise 1.6, however, replication is not configured through the administration user interface; instead, you configure it by editing the XML configuration files of the Solr release included with LucidWorks Enterprise. This chapter explains how replication works and how to edit those files. Detailed examples are provided, so even if you are new to XML and Solr configuration, you should be able to set up master/slave replication servers.

Note
Remember that LucidWorks Enterprise uses Solr multi-core technology with "collection1" as the default core, so paths take the form http://host:port/solr/collection1/* rather than the http://host:port/solr/* form shown in the Solr documentation.

Configuring Replication on the Master Server

To set up replication, you'll need to edit solrconfig.xml on the master server. You can use an XML editor or a simpler tool such as Notepad on Windows or TextEdit on a Mac.

Within solrconfig.xml, you'll edit the definition of a request handler. A request handler is a Solr component that processes a particular type of request. In this case, you'll be configuring the ReplicationHandler, which processes requests specific to replication.

The example below shows how to configure the ReplicationHandler on a master server.

<requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="master">
  <!-- Replicate on 'optimize'. Other values can be 'commit', 'startup'.
       It is possible to have multiple entries of this config string -->
  <str name="replicateAfter">optimize</str>
  <!-- Create a backup after 'optimize'. Other values can be 'commit', 'startup'.
       It is possible to have multiple entries of this config string.
       Note that this is just for backup, replication does not require this.
   -->
  <!-- <str name="backupAfter">optimize</str> -->
  <!-- If configuration files need to be replicated give the names here,
       separated by comma -->
  <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
  <!-- How long a commit point is reserved while slaves download it.
       The default is 10 seconds. Normally, you should not need to specify this -->
  <str name="commitReserveDuration">00:00:10</str>
 </lst>
</requestHandler>

Operations that Trigger Replication

The value of the replicateAfter parameter in the ReplicationHandler configuration determines which types of events make a new version of the master's index available for replication to the slaves.

The replicateAfter parameter can be specified more than once, with one event type per entry.

replicateAfter Setting | Description
startup  | Triggers replication whenever the master server starts up.
commit   | Triggers replication whenever a commit is performed on the master index.
optimize | Triggers replication whenever the master index is optimized.

If you use the startup setting for replicateAfter, also add commit or optimize if you want later commits or optimizations to trigger replication. If startup is the only value given, replication happens once after the master starts and is not triggered by any subsequent commit or optimization.
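For example, a master that should make its index available to slaves when it starts and again after every commit could list both events, one replicateAfter entry per event. This is a minimal sketch of the master list only; the confFiles value is illustrative and should name files that actually exist in your master's conf directory.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="master">
  <!-- Make the current index available as soon as the master starts... -->
  <str name="replicateAfter">startup</str>
  <!-- ...and again after every subsequent commit -->
  <str name="replicateAfter">commit</str>
  <!-- Illustrative list of configuration files to copy to the slaves -->
  <str name="confFiles">schema.xml,stopwords.txt</str>
 </lst>
</requestHandler>
```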

Configuring Replication on Slave Servers

The code below shows how to configure a ReplicationHandler on a slave server.

<requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="slave">
  <!-- Fully qualified URL of the master's replication handler.
       This can also be passed as a request parameter to the
       fetchindex command
    -->
  <str name="masterUrl">http://localhost:port/solr/corename/replication</str>
  <!-- Interval at which the slave polls the master. Format is HH:mm:ss.
       If this is absent, the slave does not poll automatically,
       but a fetchindex can still be triggered from the admin UI or the HTTP API
    -->
  <str name="pollInterval">00:00:20</str>
  <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED -->
  <!-- To use compression while transferring the index files.
       The possible values are internal|external
       if the value is 'external' make sure that your master Solr
       has the settings to honor the accept-encoding header.
       see here for details http://wiki.apache.org/solr/SolrHttpCompression
       If it is 'internal' everything will be taken care of automatically.

       USE THIS ONLY IF YOUR BANDWIDTH IS LOW.
       THIS CAN ACTUALLY SLOW DOWN REPLICATION IN A LAN -->
  <str name="compression">internal</str>
  <!-- The following values are used when the slave connects to the
       master to download the index files.
       Default values implicitly set as 5000ms and 10000ms respectively.
       The user DOES NOT need to specify these unless the bandwidth
       is extremely low or if there is an extremely high latency
    -->
  <str name="httpConnTimeout">5000</str>
  <str name="httpReadTimeout">10000</str>
  <!-- If HTTP Basic authentication is enabled on the master,
       then the slave can be configured with the following -->
  <str name="httpBasicAuthUser">username</str>
  <str name="httpBasicAuthPassword">password</str>
 </lst>
</requestHandler>
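Because of the multi-core layout described in the Note above, a LucidWorks Enterprise slave's masterUrl must include the core name. A minimal slave configuration that polls the default collection1 core once a minute might look like the sketch below. The host name master.example.com and the port 8888 are placeholders, not values taken from this guide; substitute the address of your own master server. All the optional parameters are left at their defaults.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="slave">
  <!-- Placeholder host and port; note the collection1 core name in the path -->
  <str name="masterUrl">http://master.example.com:8888/solr/collection1/replication</str>
  <!-- Check the master for a newer index version once a minute (HH:mm:ss) -->
  <str name="pollInterval">00:01:00</str>
 </lst>
</requestHandler>
```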

The master server is unaware of the slaves. Each slave polls the master at the interval set by the pollInterval parameter to check the master's current index version. If the master has a newer version of the index, the slave starts the replication process, which proceeds as follows:

  1. The slave issues a filelist command to get the list of the files. This command returns the names of the files as well as some metadata (e.g., size, a lastmodified timestamp, an alias if any).
  2. The slave checks which of those files are already present in its local index, then issues the filecontent command to download the missing files. This command uses a custom format (akin to HTTP chunked encoding) to download the full content or a part of each file. If the connection breaks mid-download, the download resumes from the point where it failed. The slave retries up to 5 times before abandoning the replication altogether.
  3. The files are downloaded into a temporary directory, so that if either the slave or the master crashes during the download, the live index is not corrupted; the replication process simply aborts.
  4. After the download completes, all the new files are moved into the live index directory, and each file's timestamp is set to match that of its counterpart on the master.
  5. The slave's ReplicationHandler issues a commit on the slave, and the new index is loaded.

Configuring Replication on a Repeater Server

A master can serve only a limited number of slaves before its performance suffers. Some organizations deploy slave servers across multiple data centers; if each slave downloads the index from a remote data center, the downloads may consume too much network bandwidth. To avoid this kind of degradation, you can configure one or more slaves as repeaters. A repeater is simply a node that acts as both a master and a slave. To configure a server as a repeater, the definition of the ReplicationHandler in its solrconfig.xml file must include both a master list and a slave list. Be sure to set the repeater's replicateAfter parameter to commit, even if replicateAfter is set to optimize on the main master. This is because on a repeater (as on any slave), only a commit is issued after the index is downloaded; optimize is never called on slaves. Optionally, you can configure the repeater to fetch compressed files from the master through the compression parameter to reduce index download time.

Here's an example of a ReplicationHandler configuration for a repeater:

<requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="master">
  <str name="replicateAfter">commit</str>
  <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
 </lst>
 <lst name="slave">
  <str name="masterUrl">http://master.solr.company.com:8983/solr/replication</str>
  <str name="pollInterval">00:00:60</str>
 </lst>
</requestHandler>
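When the repeater sits in a different data center from the main master, the compression option described for slaves can be added to the repeater's slave list to reduce download time over the wide-area link. The sketch below uses the same hypothetical master host as the example above, with the LucidWorks collection1 core path; adjust the URL for your own servers, and enable compression only if bandwidth between the sites is actually low.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="master">
  <!-- A repeater only ever commits after a download, so replicate after commit -->
  <str name="replicateAfter">commit</str>
  <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
 </lst>
 <lst name="slave">
  <str name="masterUrl">http://master.solr.company.com:8983/solr/collection1/replication</str>
  <str name="pollInterval">00:01:00</str>
  <!-- Compress index files in transit; worthwhile only on a low-bandwidth link -->
  <str name="compression">internal</str>
 </lst>
</requestHandler>
```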

Replicating Configuration Files

To replicate configuration files, list them with the confFiles parameter in the master's configuration. Only files found in the conf directory of the master's Solr instance will be replicated.

Solr replicates configuration files only when the index itself is replicated. Even if a configuration file is changed on the master, that file will be replicated only after a new commit or optimize on the master's index.

As a precaution when replicating configuration files, Solr copies configuration files to a temporary directory before moving them into their ultimate location in the conf directory. The old configuration files are then renamed and kept in the same conf/ directory. The ReplicationHandler does not automatically clean up these old files.

Unlike index files, for which the timestamp is enough to determine whether two copies are identical, configuration files are compared by checksum. If a replication downloads at least one configuration file with a changed checksum, the ReplicationHandler issues a core reload instead of a commit.

Replicating the solrconfig.xml File

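To keep the configuration of the master and slave servers in sync, you can have the replication process copy configuration files from the master to the slaves. The solrconfig.xml file needs special handling, because the master and slave versions differ: the master's copy defines the master list of the ReplicationHandler, while each slave's copy defines the slave list. You therefore keep the slave version on the master under a different name and map it to solrconfig.xml. In the solrconfig.xml on the master server, include a confFiles value like the following: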

<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>

This ensures that the master's local file solrconfig_slave.xml is saved as solrconfig.xml on each slave. All other files are saved with their original names. On the master, the slave configuration file can have any name, as long as that name appears in the confFiles string; on the slave it is saved under whatever file name follows the colon (':').
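Put together, the master's ReplicationHandler might look like the following sketch. Here solrconfig_slave.xml is a hypothetical file you create in the master's conf directory; it would contain a slave list for the ReplicationHandler (with the slaves' masterUrl and pollInterval) in place of the master list. The other file names are illustrative.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
 <lst name="master">
  <str name="replicateAfter">commit</str>
  <!-- solrconfig_slave.xml is copied to each slave as solrconfig.xml;
       schema.xml and stopwords.txt keep their original names -->
  <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt</str>
 </lst>
</requestHandler>
```

Remember that the slaves pick up the new files only after the next commit or optimize on the master's index.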

More Information

For more information about configuring index replication, please see Chapter 10 of the LucidWorks for Solr Certified Distribution Guide.
