Monitoring is an important part of running a production system. Most system administrators already use various tools to make sure everything is in order, from the health of the server's filesystem to the temperature of the CPUs. LucidWorks Enterprise provides additional capabilities for integrating application-level statistics into these monitoring tools.
JMX is a standard way to manage and monitor all varieties of software components in Java applications. JMX uses objects called MBeans (Managed Beans) to expose data and resources from your application. LucidWorks Enterprise provides a number of read-only monitoring beans that expose useful statistical and performance information. Combined with JVM (platform JMX MBeans) and OS-level information, they become a powerful tool for monitoring.
By default, JMX is enabled in LucidWorks Enterprise for local access only. If you want to connect to and monitor the application remotely, you need to change the lwecore.jvm.params parameter in the LWE_HOME/conf/master.conf file and add the following JVM parameters:
Where 3000 is an unused TCP port number.
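As a hedged illustration, the Python sketch below assembles the standard JVM system properties that expose JMX on a TCP port. The helper name jmx_remote_jvm_params is hypothetical, and turning off authentication and SSL is an assumption that only suits a firewalled network; the exact string expected by lwecore.jvm.params may differ in your installation.

```python
def jmx_remote_jvm_params(port, authenticate=False, ssl=False):
    """Return standard JVM system properties that enable remote JMX.

    Hypothetical helper: it only builds the flag string, it does not
    edit master.conf.
    """
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    flags = [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        # Assumption: no password authentication unless requested.
        f"-Dcom.sun.management.jmxremote.authenticate={str(authenticate).lower()}",
        # Assumption: no SSL unless requested.
        f"-Dcom.sun.management.jmxremote.ssl={str(ssl).lower()}",
    ]
    return " ".join(flags)


print(jmx_remote_jvm_params(3000))
```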
You might want to secure remote JMX access, either by configuring a software or hardware firewall to allow connections to the specified port only from your hosts or network, or by configuring password authentication and/or SSL encryption. For more information about the various security options, please refer to the JMX documentation.
There are a number of JMX clients you can use to connect to the LucidWorks Enterprise server and browse the available information.
JConsole is the standard graphical monitoring tool (part of the JDK) for the Java Virtual Machine (JVM) and Java applications. It provides a convenient way to display memory and CPU information as well as MBeans from arbitrary applications.
Jmxterm is an open source, command-line, interactive JMX client. It lets you navigate JMX MBeans on remote servers without running a graphical interface or opening a JMX port. It can also be integrated with scripting languages such as Bash, Perl, Python, and Ruby. See the following as an example of how it can be used:
LucidWorks Enterprise provides a number of useful JMX MBeans, some in Solr and some in LucidWorks Enterprise:
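One way to script Jmxterm is to generate its command input from another program. The Python sketch below builds a small command script from the jmxterm commands open, get, close, and quit. The function name, the host, and the collection name collection1 are hypothetical, and the object name follows the solr/<collection_name> domain pattern used for the Solr MBeans.

```python
def jmxterm_script(host, port, bean, attributes):
    """Build the text of a jmxterm command script that reads attributes.

    Hypothetical helper: it only produces text and does not run jmxterm.
    """
    lines = [f"open {host}:{port}"]
    # One "get" per attribute, addressed to the bean with the -b option.
    lines += [f"get -b {bean} {attr}" for attr in attributes]
    lines += ["close", "quit"]
    return "\n".join(lines) + "\n"


# Assumption: a collection named "collection1" and JMX on port 3000.
searcher_bean = (
    "solr/collection1:type=searcher,"
    "id=org.apache.solr.search.SolrIndexSearcher"
)
print(jmxterm_script("localhost", 3000, searcher_bean, ["numDocs", "warmupTime"]))
```

The resulting text could then be fed to jmxterm running in non-interactive mode; check the command-line options of the jmxterm release you use before relying on a particular invocation.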
|solr/<collection_name>||type=updateHandler,id=org.apache.solr.update.DirectUpdateHandler2||cumulative_adds, cumulative_deletesById, cumulative_deletesByQuery, cumulative_errors, commits, autocommits, optimizes, rollbacks, docsPending, etc||This MBean provides comprehensive information about indexing activity like number of added documents, number of errors, number of commits, autocommits and optimize operations. It is really useful to plot that information into graphs in your monitoring system. The cumulative_errors parameter shows the number of low level IO exceptions.|
|solr/<collection_name>||type=/update,id=org.apache.solr.handler.XmlUpdateRequestHandler||requests, errors, avgTimePerRequest, etc||If you use the Solr API directly, there are separate beans for each type of handler you can use to index documents into the system, such as the XML, CSV, and JSON request handlers. It makes sense to add this update request handler information to your indexing graphs as well. You might also set up a monitoring alert on the number of errors for a particular update handler to make sure LucidWorks Enterprise clients don't hit errors during indexing, such as invalid field names or types or missing required fields in indexed documents.|
|solr/<collection_name>||type=/lucid,id=org.apache.solr.handler.StandardRequestHandler||requests, errors, timeouts, avgTimePerRequest||This MBean represents the default LucidWorks Enterprise Solr request handler and provides statistics about the number of search requests, errors, and timeouts, and the average response time for search requests. It is useful to display this information on monitoring graphs and to set up monitoring alerts such as "notify the administrator if the average response time is more than 0.5 seconds or the total number of errors and timeouts is more than 1% of total requests".|
|solr/<collection_name>||type=searcher,id=org.apache.solr.search.SolrIndexSearcher||numDocs, warmupTime||numDocs is the total number of documents in the index. warmupTime is the amount of time a new Searcher takes to warm. When LucidWorks Enterprise commits new data to the index, a new Searcher is opened and warmed. The warming operation regenerates caches from the previous Searcher instance and runs queries predefined in solrconfig.xml to warm up the filesystem IO cache and load the Lucene FieldCache into memory. This attribute effectively determines how long it takes after a commit before new data is available to users. It makes sense to monitor this parameter and set up a trigger to alert the LucidWorks Enterprise administrator if warming takes longer than you expect.|
|solr/<collection_name>||type=filterCache,id=org.apache.solr.search.LRUCache||cumulative_evictions, cumulative_hitratio, cumulative_hits, cumulative_inserts, cumulative_lookups, warmupTime, etc||Solr caches the results of popular filter queries (such as fq=category:IT) as unordered sets of document IDs. This technique significantly improves search filtering and faceting performance. size is the current number of cached filter queries. cumulative_hitratio shows how well the cache is being used: it is the ratio of successful cache hits to the overall number of lookups. If it stays low (such as < 0.3, or 30%) over a long period of time, you might want to either increase the cache size or disable the cache altogether to reduce performance overhead.|
|solr/<collection_name>||type=queryResultCache,id=org.apache.solr.search.LRUCache||cumulative_evictions, cumulative_hitratio, cumulative_hits, cumulative_inserts, cumulative_lookups, warmupTime, etc||This cache stores ordered sets of document IDs and the top N results of a query ordered by some criteria. It has the same attributes as filterCache.|
|solr/<collection_name>||type=documentCache,id=org.apache.solr.search.LRUCache||cumulative_evictions, cumulative_hitratio, cumulative_hits, cumulative_inserts, cumulative_lookups, etc||The documentCache stores Lucene Document objects that have been fetched from disk.|
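The alert rule quoted for the search handler and the hit ratio guideline for the caches can be expressed as simple checks on the attribute values. The Python sketch below is only an illustration: the function names are hypothetical, the thresholds are the example values from the table, and it assumes avgTimePerRequest is reported in milliseconds.

```python
def search_handler_problems(requests, errors, timeouts, avg_time_ms,
                            max_avg_ms=500.0, max_failure_ratio=0.01):
    """Return a list of problems for a search request handler.

    Assumption: avg_time_ms is avgTimePerRequest in milliseconds.
    """
    problems = []
    if avg_time_ms > max_avg_ms:
        problems.append("average response time is above the limit")
    # Errors and timeouts together, as a share of all requests.
    if requests > 0 and (errors + timeouts) / requests > max_failure_ratio:
        problems.append("errors and timeouts exceed the allowed share of requests")
    return problems


def cache_hit_ratio(hits, lookups):
    """Ratio of successful cache hits to lookups (0.0 when unused)."""
    return hits / lookups if lookups else 0.0


def cache_is_underused(hits, lookups, min_ratio=0.3):
    """True when the cache has been queried but rarely answers."""
    return lookups > 0 and cache_hit_ratio(hits, lookups) < min_ratio


print(search_handler_problems(1000, 8, 5, 650.0))
print(cache_is_underused(30, 120))
```

In a real monitoring system these rules would normally live in trigger definitions rather than in a script, and the hit ratio should be judged over a long period, not from a single sample.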
LucidWorks Enterprise MBeans
|lwe||id=crawlers,name=<data_source_id>,type=datasources||total_runs, total_time, num_total, num_new, num_updated, num_unchanged, num_failed, num_deleted||This MBean displays crawler statistics for a specific data source (such as the number of processed documents and the number of errors). If you have a data source that is scheduled to run periodically or that runs for a long time, you might want to monitor it and raise an alert if there is a problem with the underlying source (web site, SharePoint server, etc), or track how well optimized your incremental crawl is (the percentage of num_unchanged to num_total), for example.|
|lwe||id=crawlers,name=<collection_name>,type=collections||total_runs, total_time, num_total, num_new, num_updated, num_unchanged, num_failed, num_deleted||If you have multiple data sources and don't want to monitor each one individually, but want to keep an eye on aggregate numbers for the whole collection, you might want to use this bean.|
|lwe||id=crawlers,type=total||total_runs, total_time, num_total, num_new, num_updated, num_unchanged, num_failed, num_deleted||You can use this MBean if you have multiple collections (homogeneous collections or a multi-tenant architecture) and want to monitor at the instance level.|
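The crawler counters lend themselves to derived metrics such as the share of unchanged documents mentioned above. A minimal Python sketch, with hypothetical function names and made-up sample numbers:

```python
def unchanged_percentage(num_unchanged, num_total):
    """Percentage of documents an incremental crawl found unchanged."""
    if num_total == 0:
        return 0.0
    return 100.0 * num_unchanged / num_total


def failed_percentage(num_failed, num_total):
    """Percentage of documents that failed during the crawl."""
    if num_total == 0:
        return 0.0
    return 100.0 * num_failed / num_total


# Sample values only; real numbers come from the crawler MBeans.
print(unchanged_percentage(900, 1000))
print(failed_percentage(25, 1000))
```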
JConsole and JmxTerm are a good way to explore the information exposed through JMX, but what you really need is to monitor your application automatically: record historical information, display it in graphical form, configure parameter thresholds as triggers, and send alerts in case of service outages or performance problems. There are various standard sysadmin tools for that, and integrating LucidWorks Enterprise with them is no different than integrating any other Java application. The idea is that you retrieve application information and send it to an external monitoring system. Our documentation provides two examples of integrating the LucidWorks Enterprise server with popular open source monitoring tools: Zabbix and Nagios.
Zabbix is an enterprise-class open source distributed monitoring solution for networks and applications. It comes with pre-defined templates for almost all operating systems as well as various open source applications. It also has a useful JVM template that contains the most vital statistics of an arbitrary Java application. There are different ways to integrate LucidWorks Enterprise with Zabbix, and the best approach depends on the Zabbix release version.
Releases from 2.0 onward (currently in beta) come with built-in support for monitoring Java applications (the Zabbix Java proxy). For more information, please see the JMX Monitoring section of the Zabbix manual.
If you are handy with scripting and command line tools then you can also gather and send all the JMX information using either:
- UserParameter: You can configure the Zabbix system agent to send custom monitored items using the UserParameter configuration parameter. To retrieve JMX statistics you can use either the cmdline-jmxclient or the jmxterm command line client.
- zabbix_sender tool: If you have a large number of JMX monitored items, or you need to monitor some items quite frequently, then spawning a Java Virtual Machine process to get a single object or attribute can be too expensive. In this case consider scripting JMX interactions using the JMXTerm command line tool and your favorite scripting language. The solution below is in Ruby, but it could be implemented in any scripting language. The main idea is that your script runs the JMXTerm Java application and communicates with it over its stdin and stdout streams using an expect library.
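To illustrate the data flow behind the zabbix_sender approach, here is a minimal Python sketch of two of its steps: parsing attribute lines of the form "name = value;" as printed by jmxterm, and turning them into the "host key value" lines that zabbix_sender reads from an input file. Managing the long-running jmxterm process is left out. The function names, the host name, and the item key prefix are hypothetical, and only simple (non-composite) attribute values are handled.

```python
import re

# Matches lines such as "numDocs = 12345;" (assumed jmxterm output shape).
_ATTR_LINE = re.compile(r"^\s*(\w+)\s*=\s*(.+?);?\s*$")


def parse_jmxterm_output(text):
    """Collect simple 'name = value;' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        match = _ATTR_LINE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def zabbix_sender_lines(host, key_prefix, values):
    """Format values as '<host> <key> <value>' lines, sorted by name.

    Assumption: host and keys contain no spaces, so no quoting is needed.
    """
    return [
        f"{host} {key_prefix}.{name} {value}"
        for name, value in sorted(values.items())
    ]


sample = "numDocs = 12345;\nwarmupTime = 87;\n"
for line in zabbix_sender_lines("lwe-host", "solr.searcher",
                                parse_jmxterm_output(sample)):
    print(line)
```

The lines could be written to a file and passed to zabbix_sender with its -z (server) and -i (input file) options. Batching many attributes into one run is what avoids starting a JVM for every single value.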
This section is a step-by-step guide to integrating LucidWorks Enterprise with the Zabbix 2.0 (1.9.x) release. It won't work with previous releases (1.8.x) because they lack built-in JMX support.
- Download and install the 2.0 (1.9.x) release according to the official documentation. To build the Zabbix JMX proxy, build the Zabbix package with the --enable-java configuration option, such as ./configure --enable-server --with-mysql --enable-java.
- After make install, copy the example init.d start script from misc/init.d/debian/zabbix-server into the /etc/init.d directory and edit it to start the JMX proxy daemon as well. To do that, add <install_dir>/sbin/zabbix_java/startup.sh and <install_dir>/sbin/zabbix_java/shutdown.sh calls to the corresponding options in the init.d script.
- Configure the JMX proxy in /etc/zabbix/zabbix_server.conf (see the JavaProxy, JavaProxyPort and StartJavaPollers parameters). Verify that you're using the same port that is configured in the <install_dir>/sbin/zabbix_java/settings.sh file. It is also recommended to enable verbose logging for the JMX proxy: edit the <install_dir>/sbin/zabbix_java/lib/logback.xml file, change the file element to point to your log file directory, and set the level attribute to debug.
- Import the sample Zabbix templates found in $LWE_HOME/app/examples/zabbix called lwe_zabbix_templates.xml (there are 3 in that file).
- Install the Zabbix agent on the server where LucidWorks Enterprise is installed and configure it to connect to the Zabbix server.
- Add a Zabbix host and assign the proper template for the OS (linux, freebsd, etc.).
- Assign the imported templates (Template_JVM, Template_Solr, Template_LWE) to that host.
- Enable JMX monitoring in LucidWorks Enterprise and allow the Zabbix server to connect to the JMX interface over the network.
- Add a JMX interface to the host where LucidWorks Enterprise is installed.
- Start any activity in the LucidWorks Enterprise server (crawling, indexing, serving) and check the graphs for the monitored host (see screenshots below).
- Total number of documents in search index
- Solr index operations (commits, optimizes, rollbacks)
- Solr document operations (adds, deletes by id or query)
- Crawling activity - number of total documents processed, number of failures (retrieve, parsing), number of new documents
- Search activity - number of search requests
- Search Average Response Time
- Searcher Warmup Time (how fast committed docs become visible/searchable)
- Java Heap Memory Usage
- Caches stats
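If you add your own items on top of the imported templates, each JMX item is identified by an object name and an attribute. The Python sketch below builds an item key in the jmx["object name","attribute"] form used by released Zabbix 2.0 versions; whether the 1.9.x pre-releases and the sample templates use exactly this form is an assumption, and the helper name is hypothetical.

```python
def zabbix_jmx_key(object_name, attribute):
    """Build a Zabbix JMX item key with both parameters quoted."""
    def quote(part):
        # Escape embedded double quotes inside a quoted key parameter.
        return '"' + part.replace('"', '\\"') + '"'
    return f"jmx[{quote(object_name)},{quote(attribute)}]"


# Platform MBean example: used heap memory of the JVM.
print(zabbix_jmx_key("java.lang:type=Memory", "HeapMemoryUsage.used"))
```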
Nagios is a popular open source application for monitoring computer systems and networks. It watches hosts and services, alerting users when things go wrong and again when they get better. Several Nagios plugins allow you to monitor Java applications through the JMX interface. We recommend the Syabru Nagios JMX Plugin as the most mature of them: it supports different data types (integers, floats, string regular expressions) and the advanced Nagios threshold syntax. To install the Syabru Nagios JMX Plugin, copy check_jmx and check_jmx.jar from the downloaded package to the Nagios plugins directory, and then either add the check_jmx command definition to the global commands.cfg configuration file or put the jmx.cfg file into the nagios_plugins configuration directory. The next step is to define Nagios services, as in this example:
After you set up your services and reload the Nagios configuration, you can monitor the application state in the Nagios web UI or receive email notifications.
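When choosing warning and critical values for these services, it helps to know how the Nagios threshold range syntax is interpreted. The Python sketch below follows the range format from the Nagios plugin development guidelines (for example "10", "10:", "~:10", "10:20", "@10:20"); it is an illustration with hypothetical function names, not the plugin's own code.

```python
def parse_range(spec):
    """Parse a Nagios range into (low, high, alert_inside)."""
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        # A bare number N means the range 0..N.
        low_text, high_text = "0", spec
    if low_text == "~":
        low = float("-inf")
    elif low_text == "":
        low = 0.0
    else:
        low = float(low_text)
    high = float(high_text) if high_text else float("inf")
    return low, high, alert_inside


def is_alert(value, spec):
    """True if value should raise an alert for the given range."""
    low, high, alert_inside = parse_range(spec)
    in_range = low <= value <= high
    # Without "@" the alert fires outside the range, with "@" inside it.
    return in_range if alert_inside else not in_range


print(is_alert(650, "500"))   # above the 0..500 range
print(is_alert(0.2, "0.3:"))  # below the minimum of 0.3
```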
- Nagios UI screenshot (thresholds on the screenshots are lowered to trigger critical state as an example)
- Nagios email alert
- OS file system cache: A frequent problem with LucidWorks Enterprise and other Lucene/Solr applications is that a large index combined with too little free memory leaves the file system cache short of space, which shows up as poor performance. The IO cache is a crucial resource for search applications, so it makes sense to monitor this parameter and display it in graphs alongside other memory information such as free memory, JVM heap memory, and swap. This parameter is part of the OS-level monitoring in Zabbix (the item name is vm.memory.size[cached]).
- File descriptors: Your application can sometimes hit the OS-wide or per-process file descriptor limits. It is recommended to monitor these parameters and set trigger thresholds for them.
- CPU usage: The default Zabbix templates have triggers for CPU load average numbers. You might want to tune the thresholds for your server based on the number of CPUs and the expected load.
- Heap memory usage and garbage collector statistics: The Zabbix Java template contains multiple items and triggers for memory usage and garbage collector invocation counts. You should tune these as well to match your scenario.
- Solr index size and free disk space: Monitor both and set thresholds so that you are alerted before you hit "Out Of Disk Space" errors.
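As an example of the last check, the Python sketch below compares the size of an index directory with the free space on its disk. The function names and the headroom factor of 2 are assumptions (merges and optimize operations can temporarily need extra space), so adjust the factor to your own index.

```python
import os
import shutil


def directory_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def has_disk_headroom(index_dir, headroom=2.0):
    """True if free space is at least headroom times the index size."""
    free = shutil.disk_usage(index_dir).free
    return free >= headroom * directory_size(index_dir)


# Example call on the current directory; point it at your index directory.
print(has_disk_headroom("."))
```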