This is the documentation for LucidWorks Search v2.5, the latest release.

The Data Source Status API reports whether a data source is currently being processed. It returns the same information as the Data Source Jobs API, and is intended for periodically checking the progress of a crawl job.

API Entry Points

/api/collections/collection/datasources/id/status: Get this data source's status

Get the Status of a Data Source

GET /api/collections/collection/datasources/id/status

Input

Path Parameters

Key Description
collection The collection name.
id The data source ID.
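
As a rough illustration of how the two path parameters fit into the entry point, the following Python sketch assembles the status URL. The function name `status_url` and the base address are assumptions made for this example, not part of the LucidWorks API; only the path layout comes from the entry point documented above.

```python
from urllib.parse import quote


def status_url(base_url, collection, datasource_id):
    """Build the status URL for one data source.

    base_url is an assumption (for example "http://localhost:8888");
    the path layout follows the documented entry point.
    """
    # Percent-encode each path segment so unusual names stay valid in a URL.
    collection_part = quote(str(collection), safe="")
    id_part = quote(str(datasource_id), safe="")
    return (
        f"{base_url.rstrip('/')}/api/collections/{collection_part}"
        f"/datasources/{id_part}/status"
    )


print(status_url("http://localhost:8888", "collection1", 6))
# prints http://localhost:8888/api/collections/collection1/datasources/6/status
```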

Query Parameters

None

Output

Output Content

Key Description
batch_job If false, the content crawled will be added to the index.
crawl_started The date and time the crawl started.
crawl_state The current state of the job. Possible values are FINISHED, STOPPED, or RUNNING.
crawl_stopped The date and time the crawl stopped.
id The unique ID of the data source.
job_id The ID of the job itself.
num_access_denied The number of documents that could not be accessed because of file permissions or wrong authentication.
num_deleted The number of documents that were removed from the index.
num_failed The number of documents that could not be parsed.
num_filter_denied The number of documents that were not crawled because of inclusion or exclusion rules.
num_new The number of documents considered "new".
num_not_found The number of documents the crawler expected to find (because of a link from a known document, a symlink, or a redirect) but for which the remote server responded with HTTP 404 NOT_FOUND or "file missing".
num_robots_denied The number of documents that could not be crawled because of robots.txt rules.
num_total The total number of documents found during the last crawl.
num_unchanged The number of documents that were not changed.
num_updated The number of documents that were updated.
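
The fields above can be read programmatically once the response has been parsed into a dictionary. The sketch below is one possible way to do that, not an official client: the helper names `is_done` and `skipped_or_failed` are hypothetical, and treating FINISHED and STOPPED as "no longer running" is an interpretation of the crawl_state description rather than documented behavior.

```python
# States in which the crawl is assumed to be no longer running.
DONE_STATES = {"FINISHED", "STOPPED"}

# Counters for documents that were denied, missing, filtered out, or unparseable.
SKIP_KEYS = (
    "num_access_denied",
    "num_failed",
    "num_filter_denied",
    "num_not_found",
    "num_robots_denied",
)


def is_done(status):
    """Return True when crawl_state is FINISHED or STOPPED."""
    return status.get("crawl_state") in DONE_STATES


def skipped_or_failed(status):
    """Return only the non-zero counters listed in SKIP_KEYS."""
    return {key: status[key] for key in SKIP_KEYS if status.get(key, 0)}


# A trimmed-down status dictionary, shaped like the documented output.
sample = {"crawl_state": "RUNNING", "num_failed": 2, "num_new": 227, "num_total": 229}
print(is_done(sample))            # prints False
print(skipped_or_failed(sample))  # prints {'num_failed': 2}
```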

Examples

Input

curl 'http://localhost:8888/api/collections/collection1/datasources/6/status'

Output

While the data source is being processed:

{
    "batch_job": false, 
    "crawl_started": "2012-02-06T18:40:12+0000", 
    "crawl_state": "RUNNING", 
    "crawl_stopped": null, 
    "id": 6, 
    "job_id": "6", 
    "num_access_denied": 0, 
    "num_deleted": 0, 
    "num_failed": 2, 
    "num_filter_denied": 0, 
    "num_new": 227, 
    "num_not_found": 0, 
    "num_robots_denied": 0, 
    "num_total": 229, 
    "num_unchanged": 0, 
    "num_updated": 0
}

After processing is finished, and the data source is idle:

{
    "batch_job": false,
    "crawl_started": "2012-02-06T18:40:12+0000",
    "crawl_state": "FINISHED",
    "crawl_stopped": "2012-02-06T18:42:19+0000",
    "id": 6,
    "job_id": "6",
    "num_access_denied": 0,
    "num_deleted": 0,
    "num_failed": 2,
    "num_filter_denied": 0,
    "num_new": 1099,
    "num_not_found": 0,
    "num_robots_denied": 0,
    "num_total": 1101,
    "num_unchanged": 0,
    "num_updated": 0
}
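
Because this endpoint is meant for checking progress at intervals, a client would typically call it in a loop until crawl_state leaves RUNNING. The following Python sketch shows one way that might look. All function names, the polling interval, and the poll limit are assumptions chosen for illustration; `fetch_status_http` also assumes the server needs no authentication, which may not hold for a given installation.

```python
import json
import time
from datetime import datetime
from urllib.request import urlopen

# Timestamp layout used by crawl_started and crawl_stopped in the examples.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def fetch_status_http(url):
    """GET the status URL and parse the JSON body (assumes no authentication)."""
    with urlopen(url) as response:
        return json.load(response)


def poll_until_done(fetch_status, interval_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Call fetch_status() repeatedly until crawl_state is FINISHED or STOPPED.

    fetch_status is any zero-argument callable returning the parsed status,
    for example: lambda: fetch_status_http(url).
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if status.get("crawl_state") in ("FINISHED", "STOPPED"):
            return status
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"crawl still running after {max_polls} polls")


def crawl_duration(status):
    """Return crawl_stopped minus crawl_started, or None if either is missing."""
    started, stopped = status.get("crawl_started"), status.get("crawl_stopped")
    if not started or not stopped:
        return None
    return datetime.strptime(stopped, TIMESTAMP_FORMAT) - datetime.strptime(
        started, TIMESTAMP_FORMAT
    )


# Timestamps taken from the finished example above.
finished = {
    "crawl_state": "FINISHED",
    "crawl_started": "2012-02-06T18:40:12+0000",
    "crawl_stopped": "2012-02-06T18:42:19+0000",
}
print(crawl_duration(finished))  # prints 0:02:07
```

Passing the fetch function in as a parameter keeps the loop independent of the HTTP layer, so it can be exercised against canned responses before being pointed at a live server.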
