The Data Source History API returns statistics for previous runs of a data source. It only reports on prior crawls; use Data Source Jobs or Data Source Status for details on crawls that are currently running.
Note that some crawlers (such as the lucid.fs crawler) are "stateless", meaning the crawler cannot detect documents that were deleted or modified between crawls. In crawl statistics, this can mean that deleted or updated documents are not counted as such, or that the sum of "new" documents reported by two different crawls does not equal the number of documents in the index.
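To make the effect of a stateless crawler concrete, here is a minimal Python sketch. The two entries follow the shape of the history output documented in this section, but the scenario (the same six documents reported as new on both runs) and the `index_doc_count` value are assumptions chosen for illustration, not output from a real system.

```python
# Hypothetical history entries for two runs of a stateless crawler.
# Only the fields needed for the illustration are included.
history = [
    {"job_id": 3, "num_new": 6, "num_updated": 0, "num_deleted": 0},
    {"job_id": 2, "num_new": 6, "num_updated": 0, "num_deleted": 0},
]

# Assumed number of documents actually in the index (illustrative value).
index_doc_count = 6


def total_reported_new(runs):
    """Sum the num_new counter across all runs."""
    return sum(run["num_new"] for run in runs)


reported = total_reported_new(history)
print(f"sum of num_new across runs: {reported}")
print(f"documents in index:         {index_doc_count}")
print(f"counts match: {reported == index_doc_count}")
```

In this sketch the per-run counters add up to twice the assumed index size, which is why history totals from a stateless crawler should not be treated as an index document count.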
API Entry Points
/api/collections/collection/datasources/id/history: Get statistics for the last 50 runs of the given data source.
Get Data Source History
GET /api/collections/collection/datasources/id/history
Input
Path Parameters
| Key | Description |
|---|---|
| collection | The collection name. |
| id | The data source ID. |
Query Parameters
None
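As a rough illustration of how the two path parameters fit into the request, the following Python sketch builds the history URL and fetches it with the standard library. The helper names `history_url` and `fetch_history` are hypothetical, and the base URL `http://localhost:8888` is taken from the example in this section; adjust both for your installation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def history_url(base_url, collection, datasource_id):
    """Build the history endpoint URL from the two path parameters."""
    return (
        f"{base_url.rstrip('/')}/api/collections/{quote(str(collection), safe='')}"
        f"/datasources/{quote(str(datasource_id), safe='')}/history"
    )


def fetch_history(base_url, collection, datasource_id, timeout=10):
    """GET the history and decode the JSON array of run statistics.

    Needs a reachable server, so it is defined here but not called.
    """
    url = history_url(base_url, collection, datasource_id)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(history_url("http://localhost:8888", "myCollection", 8))
```

The path segments are percent-encoded so that a collection name containing spaces or slashes still produces a valid URL.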
Output
Output Content
| Key | Type | Description |
|---|---|---|
| id | integer | The ID of the data source. |
| crawl_started | date string | When the crawl began. |
| crawl_stopped | date string | When the crawl finished. |
| crawl_state | string | The state of the crawl (RUNNING, FINISHED, or STOPPED). |
| num_unchanged | 32-bit integer | The number of documents found that were not modified and did not need to be indexed. |
| num_deleted | 32-bit integer | The number of documents that were removed from the index because they were no longer found in the source. |
| num_new | 32-bit integer | The number of new documents that were found in the source and added to the index. |
| num_updated | 32-bit integer | The number of existing documents that were found in the source and updated in the index because they were modified since the last time they were indexed. |
| num_failed | 32-bit integer | The number of documents from which the crawler failed to extract text. |
| num_total | 32-bit integer | The total number of documents found. |
| batch_job | boolean | If false, documents found will be indexed after crawling. |
| job_id | integer | The ID of the job. |
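The fields above can be combined into a short per-run summary. The Python sketch below is one way to do that, assuming the date strings follow the pattern seen in the sample output (for example `2011-03-17T22:16:46+0000`); the `DATE_FORMAT` constant and the helper names `crawl_duration_seconds` and `summarize` are hypothetical and not part of the API.

```python
from datetime import datetime

# Format inferred from the sample timestamps; treat it as an assumption.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def crawl_duration_seconds(entry):
    """Return the elapsed seconds between crawl_started and crawl_stopped."""
    started = datetime.strptime(entry["crawl_started"], DATE_FORMAT)
    stopped = datetime.strptime(entry["crawl_stopped"], DATE_FORMAT)
    return (stopped - started).total_seconds()


def summarize(entry):
    """Format one history entry as a single human-readable line."""
    return (
        f"job {entry['job_id']}: {entry['crawl_state']}, "
        f"{entry['num_new']} new, {entry['num_updated']} updated, "
        f"{entry['num_deleted']} deleted, {entry['num_failed']} failed, "
        f"{crawl_duration_seconds(entry):.0f}s"
    )


# Sample entry using the documented field names.
entry = {
    "id": 2,
    "crawl_started": "2011-03-17T22:16:46+0000",
    "crawl_stopped": "2011-03-17T22:16:51+0000",
    "crawl_state": "FINISHED",
    "num_unchanged": 0,
    "num_deleted": 0,
    "num_new": 6,
    "num_updated": 0,
    "num_failed": 0,
    "num_total": 6,
    "batch_job": False,
    "job_id": 3,
}

print(summarize(entry))
```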
Examples
Input
curl 'http://localhost:8888/api/collections/myCollection/datasources/8/history'
Output
[
  {
    "id": 2,
    "crawl_started": "2011-03-17T22:16:46+0000",
    "num_unchanged": 0,
    "crawl_state": "FINISHED",
    "crawl_stopped": "2011-03-17T22:16:51+0000",
    "num_updated": 0,
    "num_new": 6,
    "num_failed": 0,
    "num_deleted": 0,
    "num_total": 6,
    "batch_job": false,
    "job_id": 3
  },
  {
    "id": 2,
    "crawl_started": "2011-03-18T03:25:04+0000",
    "num_unchanged": 0,
    "crawl_state": "FINISHED",
    "crawl_stopped": "2011-03-18T03:25:12+0000",
    "num_updated": 0,
    "num_new": 6,
    "num_failed": 0,
    "num_deleted": 0,
    "num_total": 6,
    "batch_job": false,
    "job_id": 2
  }
]