The examples/perl directory contains utilities demonstrating many of the LucidWorks REST API features from Perl code. These utilities can help you manage a LucidWorks installation, or serve as examples of how to write Perl code in customer applications that interact with LucidWorks and Solr.
Dependencies
All of these tools require that the "JSON" Perl module be installed.
All of these tools assume that the main URL for LWE is "http://localhost:8888". If LWE is running elsewhere, set the LWE_URL environment variable accordingly in the shell where you will be using these tools.
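As a minimal sketch of the override described above, the following sets LWE_URL for the current shell session. The host name lwe.example.com is a placeholder assumption, not a real server; substitute the address where your LWE instance actually runs.

```shell
# Placeholder host (an assumption for illustration); replace with your own.
LWE_URL="http://lwe.example.com:8888"
export LWE_URL

# Confirm the value that tools started from this shell will inherit.
echo "LWE_URL is set to: $LWE_URL"
```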
Basic Usage
All of these tools can be run without any arguments to see "help" info about their usage.
Get some basic info about the running instance of LWE...
collection.pl show
collection.pl show index_num_docs index_size free_disk_space
View, Modify Settings...
settings.pl show
settings.pl show boost_recent stopword_list
settings.pl update boost_recent=false stopword_list=a stopword_list=an stopword_list=the
(note that creating a list is done by specifying the same setting key multiple times)
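When the list values come from elsewhere (a file, another command), the repeated key=value arguments can be generated rather than typed. The helper below is a hypothetical convenience, not part of the examples/perl tools; it only builds the argument string and does not contact LWE.

```shell
# Build "key=v1 key=v2 ..." from a key followed by any number of values.
# build_list_args is a hypothetical helper name, not part of the tools.
build_list_args() {
    key=$1
    shift
    out=""
    for value in "$@"; do
        # Add a separating space only when out already has content.
        out="${out:+$out }$key=$value"
    done
    printf '%s\n' "$out"
}

# Possible use (relies on shell word splitting, so values must not
# contain whitespace):
#   settings.pl update $(build_list_args stopword_list a an the)
```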
View, Create, Modify, Delete Data Sources...
ds.pl show
ds.pl show id=74
ds.pl show name=simple
ds.pl create name=simple type=FileSystemDataSource path=/usr/share/gtk-doc/html
ds.pl create name=docs type=FileSystemDataSource path=/usr/share/gtk-doc/html crawl_depth=100 include_subdirectories=true
ds.pl update id=74 crawl_depth=999
ds.pl update name=simple crawl_depth=999
ds.pl update id=74 name=new_name crawl_depth=999
ds.pl delete id=74
ds.pl delete name=simple
ds.pl delete-all YES YES YES
Modify the schedule of an existing Data Source...
ds.pl schedule id=74 active=true period=60 start_time=2076-03-06T12:34:56-0800
ds.pl schedule id=74 active=true period=60 start_time=now
ds.pl schedule name=simple active=true period=60 start_time=now
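The recipes later in this document use period=1800 for "every 30 minutes" and period=600 for "every 10 minutes", which suggests that period is expressed in seconds. Assuming that holds, a small hypothetical helper (not part of the tools) can do the conversion:

```shell
# Convert minutes to a period value, assuming period is in seconds.
minutes_to_period() {
    echo $(( $1 * 60 ))
}

# Example: print (not run) the schedule command for a 30-minute period.
echo "ds.pl schedule name=simple active=true period=$(minutes_to_period 30) start_time=now"
```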
View the status and indexing history of existing Data Sources...
ds.pl status
ds.pl status id=74
ds.pl status name=simple
ds.pl history id=74
ds.pl history name=simple
View, Create, Modify, Check, Delete Alerts...
alerts.pl show username=bob
alerts.pl show username=bob id=68
alerts.pl create username=bob query=gnome name=gnome_alert
alerts.pl update username=bob id=68 update_interval=5
alerts.pl check username=bob id=68
alerts.pl delete username=bob id=68
View, Create, Modify, Delete Activities...
activities.pl show
activities.pl show id=68
activities.pl create type=click active=true period=60 start_time=2076-03-06T12:34:56-0800
activities.pl create type=click active=true period=60 start_time=now
activities.pl update id=68 active=true period=300
activities.pl delete id=68
View the status and history of existing Activities...
activities.pl status
activities.pl status id=68
activities.pl history id=68
View, Create, Modify, Delete Fields...
fields.pl show
fields.pl show name=mimeType
fields.pl create name=category field_type=string facet=true
fields.pl update name=category search_by_default=true
fields.pl delete name=category
View, Create, Modify, Delete Users...
users.pl show
users.pl show username=admin
users.pl create username=jim first_name=Jim last_name=Bo email=jim@bo.com password=jpass
users.pl update username=jim first_name=James
users.pl delete username=jim
View, Create, Modify, Delete Roles...
roles.pl show
roles.pl show name=ROLE_SEARCH
roles.pl create name=ROLE_SECRET users=hank users=sam filters=status:secret
roles.pl update name=ROLE_SEARCH filters=status:public
roles.pl append name=ROLE_ADMIN users=jim users=sam groups=executives
roles.pl delete name=ROLE_SECRET users=hank
roles.pl delete name=ROLE_OLD
Pause, Resume All Background Jobs...
maintenance.pl pause
maintenance.pl resume ds=5 ds=7 activity=9
Execute Searches (with optional filters)
search.pl "gtk gnome"
search.pl "gtk -gnome"
search.pl "+gtk +gnome" "mimeType:text/html"
Recipes
Indexing Some Data Sources
- Start up LWE
- Create a datasource using files on the same server as LWE
ds.pl create name=localdocs type=FileSystemDataSource path=/usr/share/gtk-doc/html crawl_depth=100 include_subdirectories=true
- Schedule the 'localdocs' datasource to be indexed every 30 minutes starting now
ds.pl schedule name=localdocs active=true period=1800 start_time=now
- Create a datasource using a remote HTTP server
ds.pl create name=solrwiki type=WebDataSource url=http://wiki.apache.org/solr/ crawl_depth=1
- Schedule the 'solrwiki' datasource to be indexed once right now
ds.pl schedule name=solrwiki active=true period=0 start_time=now
- Periodically check the 'status' of your datasources to see when the initial indexing is done (look for: "running" : false)
ds.pl status
- Execute some searches in your browser, e.g.: http://localhost:8989/search?q=configuration
- Searches can also be executed via the REST API using search.pl, e.g.:
search.pl configuration
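The "periodically check the status" step in this recipe can be automated with a small polling loop. The sketch below assumes that the output of ds.pl status contains the text "running" : false once indexing has finished, as the step above describes; the function names, the default interval, and the attempt limit are all illustrative choices, not part of the tools.

```shell
# Thin wrapper around the status command; kept separate so it can be
# swapped for a stub when trying the loop without a running LWE.
ds_status() {
    ds.pl status "$@"
}

# Poll until the status output contains "running" : false.
# Arguments: datasource name, seconds between checks (default 30),
# maximum number of checks (default 120). Returns 1 if the limit is hit.
wait_for_indexing() {
    name=$1
    interval=${2:-30}
    attempts=${3:-120}
    while [ "$attempts" -gt 0 ]; do
        if ds_status "name=$name" | grep -Eq '"running"[[:space:]]*:[[:space:]]*false'; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}
```

With this in place, something like "wait_for_indexing localdocs && search.pl configuration" would run the search only after the initial crawl reports that it is no longer running.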
Indexing and Activating "Filters" for Certain Users
- Start up LWE
- Modify ROLE_SEARCH so that, by default, users who can load the search UI can only search HTML files
roles.pl update name=ROLE_SEARCH filters=mimeType:text/html
- Create a new user named jim
users.pl create username=jim first_name=Jim last_name=Bo email=jim@bo.com password=jimpass
- Give user jim access to the search UI
roles.pl append name=ROLE_SEARCH users=jim
- Give user jim special access to all docs via a new role
roles.pl create name=ROLE_SEE_ALL users=jim filters=*:*
- Create a datasource for a directory containing HTML files as well as other plain text files
ds.pl create name=simple type=FileSystemDataSource path=/usr/share/gtk-doc/html crawl_depth=100 include_subdirectories=true
- Schedule the datasource to be indexed once right now
ds.pl schedule name=simple active=true period=0 start_time=now
- Periodically check the 'status' of your datasource to see when the initial indexing is done (look for: "running" : false)
ds.pl status name=simple
- Execute a search in your browser, e.g.: http://localhost:8989/search?q=*:*
- As you execute various searches you should only see HTML documents (note the "Type" facet in the right-hand navigation column)
- Click the "Login" link in the upper right corner of search pages to go to the Login page: http://localhost:8989/login
- Login as user "jim" with password "jimpass"
- Execute the same searches as before, e.g.: http://localhost:8989/search?q=*:*
- As you execute various searches you should now see all documents (note the "Type" facet in the right-hand navigation column)
Indexing and Activating "Click Boosting"
- Start up LWE
- Update your settings to enable click tracking
settings.pl update click_enabled=true
- Create a datasource
ds.pl create name=local_click_ds type=FileSystemDataSource path=/usr/share/gtk-doc/html crawl_depth=100 include_subdirectories=true
- Schedule the datasource to be indexed every 30 minutes starting now
ds.pl schedule name=local_click_ds active=true period=1800 start_time=now
- Schedule the click processing activity to run every 10 minutes
activities.pl create type=click active=true period=600 start_time=now
- Periodically check the 'status' of your datasource to see when the initial indexing is done (look for: "running" : false)
ds.pl status name=local_click_ds
- Execute a search in your browser, e.g.: http://localhost:8989/search?q=gnome
- As you execute searches and click on results, you should see the documents you click on filter up to the top of those searches as the click processing activity runs every 10 minutes.
Pause and Resume All Background Jobs for Maintenance
- Start up LWE
- Update your settings to enable click tracking
settings.pl update click_enabled=true
- Create a datasource
ds.pl create name=local_click_ds type=FileSystemDataSource path=/usr/share/gtk-doc/html crawl_depth=100 include_subdirectories=true
- Schedule the datasource to be indexed every 30 minutes starting now
ds.pl schedule name=local_click_ds active=true period=1800 start_time=now
- Schedule the click processing activity to run every 10 minutes
activities.pl create type=click active=true period=600 start_time=now
- Pause all active datasource schedules and activities, blocking until any currently running datasources and activities are finished
maintenance.pl pause
This command should output something like the following...
$ maintenance.pl pause
De-Activating activity #9: http://localhost:8888/api/collections/collection1/activities/9
De-Activating schedule of ds#5: http://localhost:8888/api/collections/collection1/datasources/5/schedule
Waiting for any currently running Activities to finish...
...Done!
Waiting for any currently running DataSources to finish...
...Done!
Run this command to resume everything that was de-activated...
maintenance.pl resume activity=9 ds=5
- Perform whatever maintenance is needed
- When you are ready, run the command mentioned in the output of the "Pause" step to resume scheduled datasource and activity processing
maintenance.pl resume activity=9 ds=5
- Your datasource and click activity will now continue to be run on the previously specified schedules.
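If the pause and resume steps are scripted, the resume command can be captured from the pause output instead of being copied by hand. The sketch below assumes the output looks like the sample shown in this recipe, with the "maintenance.pl resume ..." command appearing at the end of a line; extract_resume_command is a hypothetical helper, not part of the tools.

```shell
# Read "maintenance.pl pause" output on stdin and print the last
# "maintenance.pl resume ..." command found in it (nothing if none).
extract_resume_command() {
    sed -n 's/.*\(maintenance\.pl resume .*\)$/\1/p' | tail -n 1
}

# Possible use (requires a running LWE; pause.log is an arbitrary file name):
#   resume_cmd=$(maintenance.pl pause | tee pause.log | extract_resume_command)
#   ... perform maintenance ...
#   $resume_cmd    # unquoted on purpose so the shell splits it into words
```

Keeping a copy of the full pause output (here via tee) is a cautious choice, since the captured command is the only record of which schedules and activities were de-activated.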