This is the documentation for LucidWorks Search v2.5, the latest release.

The Data Source History API returns historical statistics for previous data source runs. History only returns information about prior crawls for a data source. Use Data Source Jobs or Data Source Status for details on currently running crawls.

Note that some crawlers (such as the lucid.fs crawler) are "stateless", meaning the crawler cannot tell which documents were deleted or modified between crawls. In crawl statistics, this can mean that deleted or updated documents are not counted as such, or that the sum of "new" documents across two different crawls does not equal the number of documents in the index.
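As a rough illustration of the counting caveat above, the sketch below sums num_new across two history entries and compares the result with num_total from the most recent run. The entries are hypothetical and only mirror the shape of the API output; the assumption is that a stateless crawler reports the same six documents as "new" in both crawls.

```python
# Hypothetical history entries, shaped like the API output (values are illustrative).
history = [
    {"num_new": 6, "num_updated": 0, "num_deleted": 0, "num_total": 6},
    {"num_new": 6, "num_updated": 0, "num_deleted": 0, "num_total": 6},
]


def summed_new_documents(runs):
    """Add up num_new across all runs in the list."""
    return sum(run["num_new"] for run in runs)


naive_total = summed_new_documents(history)   # sum of "new" documents over both crawls
latest_total = history[-1]["num_total"]       # documents found by the most recent crawl

print(naive_total, latest_total)  # prints: 12 6
```

Under that assumption the summed num_new value overstates the document count, so it should not be read as the size of the index.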

API Entry Points

/api/collections/collection/datasources/id/history: Get statistics for the last 50 runs of the given data source.

Get Data Source History

GET /api/collections/collection/datasources/id/history

Input

Path Parameters

Key         Description
collection  The collection name.
id          The data source ID.

Query Parameters

None
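
The request path can be assembled from the two path parameters. The sketch below is one way to do that in Python; the helper name history_url and the base URL http://localhost:8888 are assumptions for illustration (the base URL is borrowed from the example on this page), and percent-encoding the parameters is a defensive choice rather than something the API documentation requires.

```python
from urllib.parse import quote


def history_url(base_url, collection, datasource_id):
    """Build the history endpoint URL for one data source.

    base_url is an assumed server root such as http://localhost:8888.
    """
    return "{}/api/collections/{}/datasources/{}/history".format(
        base_url.rstrip("/"),
        quote(collection, safe=""),          # percent-encode the collection name
        quote(str(datasource_id), safe=""),  # data source IDs may be passed as int or str
    )


print(history_url("http://localhost:8888", "myCollection", 8))
# prints: http://localhost:8888/api/collections/myCollection/datasources/8/history
```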

Output

Output Content

Key            Type            Description
id             integer         The ID of the data source.
crawl_started  date string     When the crawl began.
crawl_stopped  date string     When the crawl finished.
crawl_state    string          The current state of the crawl (RUNNING, FINISHED, or STOPPED).
num_unchanged  32-bit integer  The number of documents found that were not modified and did not need to be indexed.
num_deleted    32-bit integer  The number of documents that were removed from the index because they were no longer found in the source.
num_new        32-bit integer  The number of new documents that were found in the source and added to the index.
num_updated    32-bit integer  The number of existing documents that were found in the source and updated in the index because they were modified since the last time they were indexed.
num_failed     32-bit integer  The number of documents from which the crawler failed to extract text.
num_total      32-bit integer  The total number of documents found.
batch_job      boolean         If false, documents found will be indexed after crawling.
job_id         integer         The ID of the job.
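
The crawl_started and crawl_stopped values can be turned into a crawl duration. The following is a minimal sketch that assumes the date strings follow the pattern seen in the example on this page (for instance 2011-03-17T22:16:46+0000); the helper name crawl_duration_seconds is hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout, inferred from the example output on this page.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def crawl_duration_seconds(entry):
    """Return the seconds elapsed between crawl_started and crawl_stopped."""
    started = datetime.strptime(entry["crawl_started"], TIMESTAMP_FORMAT)
    stopped = datetime.strptime(entry["crawl_stopped"], TIMESTAMP_FORMAT)
    return (stopped - started).total_seconds()


entry = {
    "crawl_started": "2011-03-17T22:16:46+0000",
    "crawl_stopped": "2011-03-17T22:16:51+0000",
}
print(crawl_duration_seconds(entry))  # prints: 5.0
```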

Examples

Input

curl 'http://localhost:8888/api/collections/myCollection/datasources/8/history'

Output

[
    {
        "id": 2,
        "crawl_started": "2011-03-17T22:16:46+0000",
        "num_unchanged": 0,
        "crawl_state": "FINISHED",
        "crawl_stopped": "2011-03-17T22:16:51+0000",
        "num_updated": 0,
        "num_new": 6,
        "num_failed": 0,
        "num_deleted": 0,
        "num_total": 6,
        "batch_job": false,
        "job_id": 3
    },
    {
        "id": 2,
        "crawl_started": "2011-03-18T03:25:04+0000",
        "num_unchanged": 0,
        "crawl_state": "FINISHED",
        "crawl_stopped": "2011-03-18T03:25:12+0000",
        "num_updated": 0,
        "num_new": 6,
        "num_failed": 0,
        "num_deleted": 0,
        "num_total": 6,
        "batch_job": false,
        "job_id": 2
    }
]
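
A client might decode the response and pick out the most recent run. The sketch below is one possible approach, not part of the product: fetch_history and latest_run are hypothetical helper names, the sample data is a trimmed, illustrative response with the same keys as the output table, and the timestamp format is inferred from the example. fetch_history is defined for completeness but is not called here, since it needs a running server.

```python
import json
from datetime import datetime
from urllib.request import urlopen

# Assumed timestamp layout, inferred from the example output.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def fetch_history(url):
    """Request the history endpoint and decode the JSON list (needs a live server)."""
    with urlopen(url) as response:
        return json.load(response)


def latest_run(history):
    """Return the entry with the most recent crawl_started timestamp."""
    return max(
        history,
        key=lambda run: datetime.strptime(run["crawl_started"], TIMESTAMP_FORMAT),
    )


# Illustrative response body with a subset of the documented keys.
sample = json.loads("""
[
    {"id": 2, "crawl_started": "2011-03-17T22:16:46+0000",
     "crawl_state": "FINISHED", "num_new": 6, "num_failed": 0, "job_id": 3},
    {"id": 2, "crawl_started": "2011-03-18T03:25:04+0000",
     "crawl_state": "FINISHED", "num_new": 6, "num_failed": 0, "job_id": 2}
]
""")

latest = latest_run(sample)
print(latest["job_id"], latest["crawl_state"], latest["num_new"])  # prints: 2 FINISHED 6
```

Parsing the timestamps before comparing them avoids relying on string ordering, which would only be safe if every entry used the same UTC offset.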