SolrCloud is a set of Solr features that expands the capabilities of Solr's distributed search, simplifying the creation and management of Solr clusters. SolrCloud is still under active development, but already supports features such as central cluster configuration, automatic load balancing, and fail-over for distributed queries.
The Apache Solr Reference Guide includes an extensive section on SolrCloud, which includes background information and configuration instructions. Some changes have been made for LucidWorks Search, however, which are described below.
LucidWorks Search implements SolrCloud as a purely Solr feature; to manage SolrCloud shards and replicas, you should refer to and use instructions designed for a purely Solr installation. There are only a few caveats and modifications for LucidWorks Search, detailed below, specifically for bootstrapping ZooKeeper and the cluster nodes.
The standard instructions for starting SolrCloud are modified slightly for LucidWorks Search. While much of the SolrCloud documentation in the Apache Solr Reference Guide and in the Solr Wiki can be used, it is important to start LucidWorks Search in SolrCloud mode only with the instructions included here.
To start LucidWorks Search in SolrCloud mode, you need to bootstrap the configuration and start ZooKeeper. To show how to do this, we'll use a very simple two-node cluster as an example.
Because we need two servers for this example, we will make two installations of LucidWorks, one on the server "example" and the other on the server "example2". During installation, do not start LucidWorks Search. Instead, start them manually, as shown below.
|We recommend that you only install LucidWorks using the installer application; copying the LucidWorksSearch directory to another directory to create another server may cause conflicts with ports. Information on installing LucidWorks is available in the section on Installation.|
The installation on example should use port 8983 for the LWE-Core component, which is selected during the installation process. The installation on example2 should use the default port (8888) for the LWE-Core component. If you enable other components, be sure to modify their ports for each installation as well. If you are new to LucidWorks, see the section on Working With LucidWorks Search Components. Your port selections might look like this:
|Component||example Ports||example2 Ports|
|LWE-Core||8983||8888|
ZooKeeper will run on the LWE-Core port + 1000, so in this scenario we expect ZooKeeper to run on port 9983. It's important to keep that in mind while planning the installation ports so there isn't an inadvertent conflict with LucidWorks Search ports.
|SolrCloud uses ZooKeeper to manage nodes, and it's worth taking a look at the ZooKeeper website to understand how ZooKeeper works before configuring SolrCloud. Solr can embed ZooKeeper, but for production use, it's recommended to run a ZooKeeper ensemble, as described in the ZooKeeper section of the SolrCloud wiki page.|
First we start the node on example, which defines example as the leader node. We use the usual LucidWorks start script, but pass some Java options to it: bootstrap_conf copies the configuration files for each collection to ZooKeeper, zkRun starts the embedded ZooKeeper, and numShards defines how many shards the index will be split across. Be sure to set numShards accurately, as Solr cannot yet easily increase the number of shards without re-bootstrapping the cluster. A sketch of the command follows.
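This is a sketch of what the first start on example might look like; the exact option string is an assumption based on the flags described above, and -DnumShards=2 assumes the two-node, two-shard cluster used in this example:

./start.sh -lwe_core_java_opts "-Dbootstrap_conf=true -DzkRun -DnumShards=2"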
We only need to pass bootstrap_conf and numShards the first time LucidWorks Search is started in SolrCloud mode. On subsequent restarts, start this leader node with ./start.sh -lwe_core_java_opts "-DzkRun". The -DzkRun option could also be added to master.conf, in which case the start.sh script alone would start ZooKeeper each time.
For each node of the cluster, we use a slightly different command. This would start example2:
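A sketch, assuming the leader's embedded ZooKeeper is reachable on the host example at port 9983 (the port rule is explained below):

./start.sh -lwe_core_java_opts "-DzkHost=example:9983"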
Note that the port defined as the zkHost is the port of the LWE-Core component + 1000. So, if LWE-Core was defined at port 8983, ZooKeeper would be started at port 9983.
|The above instructions assume a Linux-based operating system. For Windows-based systems, use start.bat, as in the examples below.|
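These sketches mirror the Linux commands above under the same assumptions (a two-shard cluster, with the leader's embedded ZooKeeper at example:9983):

rem First start of the leader node (example):
start.bat -lwe_core_java_opts "-Dbootstrap_conf=true -DzkRun -DnumShards=2"
rem Starting an additional node (example2):
start.bat -lwe_core_java_opts "-DzkHost=example:9983"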
This table outlines the differences between the Solr instructions for bootstrapping SolrCloud mode and the LucidWorks Search instructions. It is meant as a summary for readers already familiar with how SolrCloud works.
|Solr Instructions||LucidWorks Search Instructions|
|Use start.jar||Use start.sh or start.bat with -lwe_core_java_opts defined|
|Use bootstrap_confdir to upload configuration files to ZooKeeper||Use bootstrap_conf=true|
|Use collection.configName||Not needed with bootstrap_conf=true|
|Default configuration directory is ./solr/collection1/conf||Default configuration directory is $LWE_HOME/conf/solr/cores/collection1_0/conf|
There are some caveats to using SolrCloud with LucidWorks Search, as it is so far only partially integrated with the system. Future releases of LucidWorks Search will provide tighter integration with SolrCloud functionality.
When running LucidWorks Search in SolrCloud mode, some LucidWorks Search-specific features are not yet fault tolerant or highly available. While the index and configuration files are fully supported by SolrCloud, the following are not currently replicated across the nodes of the cluster:
- Data sources and their related metadata (such as crawl history)
- The LucidWorks user database, which stores manually created users (such as the default "admin" user)
- User alerts
- LDAP configuration files
- SSL configuration
Even though these features aren't replicated, they can still be used with LucidWorks Search in SolrCloud mode. The files that hold this metadata are in the $LWE_HOME/conf folder and could be copied to the other nodes in the cluster to act as a backup if the main node goes down for any length of time. This is a manual process and is not yet automated by LucidWorks Search.
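As a minimal sketch of such a manual copy, assuming SSH access between the nodes and that $LWE_HOME resolves to the same path on both (the destination directory name conf-backup is a placeholder):

scp -r $LWE_HOME/conf example2:$LWE_HOME/conf-backup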
To accommodate this limitation, we recommend that you do a full LucidWorks Search installation (i.e., all components) on every machine in your cluster. You should then choose one node to use for the Admin UI; this is the node that will store your data sources and associated metadata. Another node can be chosen as the node that does crawling, or you can use the same node as the Admin UI. Document updates will still be sent to all nodes via the index update processes that make up SolrCloud functionality. If the Admin UI node goes down, you can choose another node to act as the Admin UI node, but unless the related configuration files have been copied to that node, you will not have the same user accounts and data sources on the other nodes. Once you bring the original Admin UI node back online, it should still have your data sources and other LucidWorks-specific metadata.
|The following LucidWorks features may encounter significant problems when working in SolrCloud mode:
|When creating a new collection (with either the Admin UI or the API) while working in SolrCloud mode, you can specify the number of shards to split it into. This number, however, cannot be higher than the number of shards defined when LucidWorks Search was bootstrapped.|
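For illustration only, the underlying Solr Collections API (available since Solr 4.0) expresses the same setting through its numShards parameter; the host and port, and the collection name mycollection, are assumptions:

curl "http://example:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2"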
If you review the Solr Reference Guide or any of the Solr documentation about SolrCloud, you may notice that using the Apache ZooKeeper instance that is included with Solr is not recommended for production systems. This is because the embedded ZooKeeper does not provide sufficient failover: each ZooKeeper instance is dependent on its Solr instance, so if a Solr instance is shut down, the associated ZooKeeper instance is shut down with it.
The LucidWorks Apache Solr Reference Guide describes how to use a stand-alone ZooKeeper instance in the section Setting Up an External ZooKeeper Ensemble, which is worth reviewing before installing one. The same instructions apply when used with LucidWorks Search, with the exception of the bootstrapping instructions described in Starting LucidWorks Search above.
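When the ensemble is in place, each node is pointed at it instead of at an embedded ZooKeeper. A sketch, assuming a three-node ensemble on hosts zk1, zk2, and zk3 using the default ZooKeeper client port 2181:

./start.sh -lwe_core_java_opts "-DzkHost=zk1:2181,zk2:2181,zk3:2181"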
|When using a stand-alone ZooKeeper with LucidWorks Search, take care to keep your ZooKeeper version in sync with the version distributed with Solr and LucidWorks Search. Since you are running it as a stand-alone application, it does not get upgraded when you upgrade LucidWorks Search.
Solr 4.0 and LucidWorks 2.5.0 and 2.5.1 use Apache ZooKeeper v3.3.6.
Solr 4.1 and LucidWorks 2.5.2 use Apache ZooKeeper v3.4.5.|