LucidWorks Big Data

This is the documentation for LucidWorks Big Data v1.2.

LucidWorks Big Data is an integrated platform for enhancing data-driven decisions by leveraging tools for search, discovery and analysis of massive data or content sets.

Built on leading open source components such as Apache Solr, Hadoop, Mahout, and others, together with LucidWorks Search, it brings previously disparate tools into a single distribution with a unified, secure REST API for application integration and administration. This open core approach lets application developers focus on integration, while giving data scientists, architects, search developers, and big data engineers the ability to plug in scalable algorithms customized to meet business goals without having to wire together the low-level pieces of the Hadoop ecosystem time and again. LucidWorks Big Data (LWBD) also provides the tools DevOps staff need to manage and monitor a large-scale, distributed cluster.

Common search and discovery capabilities are provided out of the box and work on the data (content and logs) in the system with minimal client input. Discovery algorithms such as clustering, named entity recognition, and statistically interesting phrases complement commonly used metrics such as mean reciprocal rank, popular queries and terms, and queries with few results, giving users deeper insight into both the content and the ways users interact with it.
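As an illustration of one of the metrics named above, the following is a minimal sketch of mean reciprocal rank (MRR) using its standard definition: for each query, take 1 divided by the rank of the first relevant result (or 0 if none is relevant), then average over all queries. This is not LWBD's implementation; the function names and the input shape (a ranked list of document IDs plus a set of relevant IDs per query) are assumptions made for the example.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked_results: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant result, or 0.0 if none is relevant."""
    for position, doc_id in enumerate(ranked_results, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    queries: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over all (ranked_results, relevant) pairs."""
    scores = [reciprocal_rank(results, relevant) for results, relevant in queries]
    if not scores:
        return 0.0  # Convention chosen for this sketch: no queries means a score of 0.
    return sum(scores) / len(scores)


# Hypothetical query log: three queries against the same ranked list.
example_queries = [
    (["a", "b", "c"], {"a"}),  # first result is relevant: 1/1
    (["a", "b", "c"], {"b"}),  # second result is relevant: 1/2
    (["a", "b", "c"], {"z"}),  # no relevant result: 0
]
example_mrr = mean_reciprocal_rank(example_queries)
```

A higher MRR means that users tend to find a relevant result nearer the top of the list, which is why it pairs naturally with usage metrics such as popular queries and queries with few results.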

Data acquisition, often a major problem in getting big data projects off the ground, is also greatly simplified by combining LWBD's prebuilt Hadoop data loading workflows with LucidWorks Search's extensible connector framework. These capabilities allow applications to bring a large number of disparate data sources into LucidWorks Big Data, including Hadoop-supported sequence files, databases, Amazon S3, websites, SolrXML, WARC files, SharePoint directories, Windows Shares, FTP servers, and other types of file systems. Data can also be pushed directly to the system using JSON over REST. In short, all the methods of acquiring content in Hadoop (including WebHDFS), LucidWorks, and Solr are supported.
