Solr

From Wildsong

Apache Solr is a search platform built on Apache Lucene.

I have the official Solr 8.0.0 image running in a Docker container, and I am now learning how to load data into it.

Docs

I've been watching the videos at https://factorpad.com/tech/solr/tutorial/solr-tutorial.html. The author says the standard Solr tutorials jump in too fast, and I tend to agree, but his go a bit too far in the other direction. They are lightweight, but they form a good starting point if you follow up each video with the reference guide.

official Solr Docker repo

Solr Reference Guide; includes getting started instructions.

How are docker volumes used?

I am keeping Solr's data in a Docker volume that can be found at /home/docker/volumes/solr_data/_data on Bellman. It's mounted at /var/lib/solr inside the container.

Where's the web server?

The web interface lets you perform administrative tasks and run queries. Right now I have it running behind a reverse proxy and behind my firewall; I access it at https://solr.wildsong.biz/solr

How to do stuff

Get shell access?

To get a bash shell so you can look around:

docker exec -it --user=solr solr bash

Normal management is done through the REST API, so you don't usually get much benefit from using a shell.

Create a core?

From the command line, you do it with the bin/solr command. The following assumes the core is named "films" and uses the sample data that comes with the Solr download.

Create the core, local command line

cd source/solr/solr-8.0.0
bin/solr create_core -c films

Create the core, dockerized command line version

docker exec -it solr bin/solr create_core -c films

Create the core, API version

Don't have it documented yet.
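One possible API version uses the CoreAdmin API's CREATE action. This is a minimal sketch, not something I have run against this container: it assumes Solr answers on localhost:8983 and that the _default configset is visible to the server. SOLR_URL, DRY_RUN and solr_core_create are names made up for this example. As far as I know, a core created this way uses the configset in place rather than getting its own copy as bin/solr create_core does, so treat it as a starting point.

```shell
# Base URL of the Solr server (assumption: default port on localhost).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Create a core through the CoreAdmin API using the _default configset.
# solr_core_create is a hypothetical helper, not part of Solr.
# Set DRY_RUN=1 to print the curl command instead of running it.
solr_core_create() {
    ${DRY_RUN:+echo} curl -s \
        "$SOLR_URL/admin/cores?action=CREATE&name=$1&configSet=_default"
}
# Usage: solr_core_create films
```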

Edit the schema per the tutorial. First, using the GUI, add a field called "name", set its type to "text_general", and uncheck "indexed" and "uninvertible".

Adding a "copy field" copies all searchable data into one field called _text_, so that queries that don't name a field still match anything.

The "source":"*" could be refined to copy only selected fields.

curl -XPOST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' 'http://localhost:8983/solr/films/schema'

Add data to it, command line version

bin/post -c films example/films/films.json

Add data via curl,
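A sketch of the curl equivalent of bin/post: send the JSON file to the core's update handler and commit in the same request. It assumes Solr is on localhost:8983 and that the command is run from the Solr directory so the sample file path resolves. SOLR_URL, DRY_RUN and solr_post_json are hypothetical names for this example.

```shell
# Base URL of the Solr server (assumption: default port on localhost).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Post a JSON file of documents to a core's /update handler and commit.
# solr_post_json is a hypothetical helper, not part of Solr.
# Set DRY_RUN=1 to print the curl command instead of running it.
solr_post_json() {
    core="$1"
    file="$2"
    ${DRY_RUN:+echo} curl -s -H 'Content-Type: application/json' \
        --data-binary "@$file" "$SOLR_URL/$core/update?commit=true"
}
# Usage: solr_post_json films example/films/films.json
```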

Delete a core

In Docker,

docker exec -it solr bin/solr delete -c corename

I bet there is a curl command too.
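The CoreAdmin API has an UNLOAD action that can also remove the core's files. A minimal sketch, assuming Solr on localhost:8983; SOLR_URL, DRY_RUN and solr_core_delete are names invented here. The three delete flags are what make it remove the index, data and instance directories; leave them off to only unload the core.

```shell
# Base URL of the Solr server (assumption: default port on localhost).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Unload a core and delete its index, data dir and instance dir.
# solr_core_delete is a hypothetical helper, not part of Solr.
# Set DRY_RUN=1 to print the curl command instead of running it.
solr_core_delete() {
    ${DRY_RUN:+echo} curl -s \
        "$SOLR_URL/admin/cores?action=UNLOAD&core=$1&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true"
}
# Usage: solr_core_delete corename
```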

Show the fields in a core's schema

In this case, for "films" core:

curl 'http://localhost:8983/solr/films/schema/fields'

Query

Go to the query page for taxlots and click "Execute Query". You'll get the first 10 records, because the default query is q = *:* (match everything) and 10 rows are returned by default.

Enter a query in the 'q' field. Try these

  • owner:leornal -- search only the owner field
  • leornal -- search everywhere
  • 27539
  • michelle && gardner
  • "river point"
  • "walter p"
  • !state:or
  • owner:null -- matches the literal string "null", not records with no data in owner.
  • wilson~ && !wilson -- fuzzy search: find words within a small edit distance of wilson that are not wilson itself, for example "watson".
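The same queries can be run from the command line against the select handler. This sketch assumes the core is named "taxlots" and Solr is on localhost:8983; SOLR_URL, DRY_RUN and solr_query are hypothetical names. Using curl's -G with --data-urlencode lets curl escape characters such as && and quotes, which would otherwise break the URL.

```shell
# Base URL of the Solr server (assumption: default port on localhost).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Run a query against a core and return the first 10 matches.
# solr_query is a hypothetical helper, not part of Solr.
# Set DRY_RUN=1 to print the curl command instead of running it.
solr_query() {
    core="$1"
    q="$2"
    ${DRY_RUN:+echo} curl -s -G "$SOLR_URL/$core/select" \
        --data-urlencode "q=$q" --data-urlencode 'rows=10'
}
# Usage: solr_query taxlots 'michelle && gardner'
```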

Okay, that's enough of that; let's move on.

How to be a client

I have taxlot data loaded into a Solr instance, and I can query it from the web interface. That's all nice, but not much use to me by itself. Now it's time to put together a search tool that runs in a web browser. I will be doing this in my react-bootstrap-test app, which has a search menu item.

On the search page, I added a simple controlled text input box.

  1. I want it to do that cool completion-as-you-type thing via Ajax.
  2. I want it to build a results table once you pick something.
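Before wiring up the React side, the request behind the completion box can be tried from the command line. One simple approach (an assumption on my part, not the only way; Solr also has a dedicated suggester) is a prefix query on a single field, returning only that field for a handful of rows. The sketch assumes a "taxlots" core with an "owner" field, Solr on localhost:8983, and a one-word prefix; SOLR_URL, DRY_RUN and solr_complete are made-up names.

```shell
# Base URL of the Solr server (assumption: default port on localhost).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Fetch up to 5 values of one field that start with the typed prefix.
# solr_complete is a hypothetical helper, not part of Solr.
# Set DRY_RUN=1 to print the curl command instead of running it.
solr_complete() {
    core="$1"
    field="$2"
    prefix="$3"
    ${DRY_RUN:+echo} curl -s -G "$SOLR_URL/$core/select" \
        --data-urlencode "q=$field:$prefix*" \
        --data-urlencode "fl=$field" \
        --data-urlencode 'rows=5'
}
# Usage: solr_complete taxlots owner leo
```

The browser version would send the same parameters to the select handler each time the text input changes, then build the results table from the response once an entry is picked.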