This is the documentation for LucidWorks Big Data v1.1.

Current version by Cassandra Targett on Mar 04, 2013 13:43.

h5. High-Volume HDFS Attributes

The High Volume HDFS (HV-HDFS) data source uses a [MapReduce|http://wiki.apache.org/hadoop/MapReduce]-enabled crawler designed to take advantage of the scalability of [Apache Hadoop|http://hadoop.apache.org] while indexing content.

To achieve this, HV-HDFS runs a series of MapReduce-enabled jobs that convert raw content into documents that can be indexed. These jobs rely on the [Behemoth|https://github.com/digitalpebble/behemoth] project (specifically the [LWE|https://github.com/gsingers/behemoth/tree/LWE] fork of it) for MapReduce-ready document conversion via [Apache Tika|http://tika.apache.org] and for writing documents to LucidWorks.
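The job chain described above (raw content, then documents, then the index) can be pictured with a minimal, in-memory Python sketch. This is not how HV-HDFS or Behemoth is implemented: `run_job`, `convert_map`, `make_index_reduce`, `crawl` and the other names are hypothetical, decoding bytes as UTF-8 stands in for Tika extraction, and a plain dict stands in for the LucidWorks index.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one MapReduce-style job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for key in sorted(groups):  # reducers see keys in sorted order, as in Hadoop
        output.extend(reducer(key, groups[key]))
    return output


def convert_map(path, raw_bytes):
    # Hypothetical stand-in for Tika extraction: decode the raw bytes as UTF-8 text.
    yield path, {"id": path, "text": raw_bytes.decode("utf-8", errors="replace")}


def keep_first(key, values):
    # Emit one document per input path.
    yield key, values[0]


def identity_map(key, value):
    yield key, value


def make_index_reduce(index):
    def index_reduce(doc_id, docs):
        index[doc_id] = docs[0]  # stand-in for writing the document to LucidWorks
        yield doc_id, "indexed"
    return index_reduce


def crawl(raw_files, index):
    """Chain two jobs: raw content -> documents, then documents -> index."""
    documents = run_job(raw_files.items(), convert_map, keep_first)
    return run_job(documents, identity_map, make_index_reduce(index))
```

For example, `crawl({"/data/a.txt": b"hello"}, index)` fills `index` with one document whose `text` is `"hello"`. Splitting conversion and indexing into separate jobs mirrors the idea of a series of jobs, where each stage's output becomes the next stage's input.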

The HV-HDFS data source is currently marked "Early Access", so how it works may change in future releases.