[http://lucene.apache.org/solr/ Apache Solr] is a search platform built on Apache Lucene.


I have the official Solr 8.0.0 running in a Docker container.
I am learning how to put data into it now.


== Docs ==

I've been watching this guy's videos: https://factorpad.com/tech/solr/tutorial/solr-tutorial.html
He says the standard Solr tutorials jump in too fast, and I tend to agree, but these are a bit too far in the other direction. They are a bit lightweight, so follow them up with the reference guide. They form a good starting point.


[https://hub.docker.com/_/solr/ official Solr Docker repo]


[https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ Solr Reference Guide]; includes getting started instructions.
=== Solr + GeoServer ===
This is done via a community-supported extension; see https://docs.geoserver.org/stable/en/user/community/solr/index.html
I added the extension to my docker-compose project.


== How are docker volumes used? ==


I am keeping solr's data in a volume that can be found at /home/docker/volumes/solr_data/_data on [[Bellman]].
It's mounted at /var/lib/solr inside the container.
 
On the Mac, I tried running Docker and it would not start, so (for expediency) I installed Debian in VirtualBox
and then installed Docker in Debian. From there things are similar to Bellman. I can ssh into the Debian
machine.
 
Since I am using Docker Compose, I don't have to create the volumes. Compose does that.
 
So on the Mac I see /var/lib/volumes/solr_solr_data
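If you are not sure where Compose actually put a volume, Docker itself will tell you; the volume name below is the solr_solr_data name mentioned above, and the Mountpoint field in the output is the on-disk path:

  docker volume ls
  docker volume inspect solr_solr_data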


== Where's the web server? ==
== Where's the web server? ==


Solr ships with its own embedded web server (Jetty), so there is nothing extra to install; its web interface lets you perform administrative tasks and run queries too.
I have it running behind a reverse proxy and behind my firewall right now; I access it at https://solr.wildsong.biz/solr
 
== How to do stuff ==


=== Get shell access ===

To get a bash shell, so you can look around:


  docker exec -it --user=solr solr bash


"Cores" are the setups containing data and configuration. This worked to create the "taxlots" core.
Normal management is via REST API so you don't usually get much benefit from using a shell.
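For example, listing the cores and their status doesn't need a shell at all; the CoreAdmin API answers over plain HTTP:

  curl '<nowiki>http://localhost:8983/solr/admin/cores?action=STATUS</nowiki>'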
 
=== Create a core ===
 
In the command line environment, you'd do it with the bin/solr command.
The following assumes the core name is "films" and that we're using the sample data that came with the Solr download.

Create the core, local command line:
  cd source/solr/solr-8.0.0
  bin/solr create_core -c films

Create the core, dockerized command line version:
  docker exec -it solr bin/solr create_core -c films
 
Create the core, API version
''Don't have it documented yet''
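That said, the CoreAdmin API should be able to do it. A sketch, untested here, where the configSet name is an assumption:

  curl '<nowiki>http://localhost:8983/solr/admin/cores?action=CREATE&name=films&configSet=_default</nowiki>'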
 
=== Edit the schema per the tutorial ===
 
First, using the GUI, add a field called "name", set its type to "text_general", and uncheck "indexed" and "uninvertible".
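The same change can be made with the Schema API instead of the GUI; a sketch that mirrors the choices above (untested here):

  curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field" : {"name":"name","type":"text_general","indexed":false,"uninvertible":false}}' '<nowiki>http://localhost:8983/solr/films/schema</nowiki>'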
 
Adding a "copy field" will copy all searchable data into one field called _text_ so that queries on anything work.
 
The "source:*" could be refined to search only selected fields.
 
curl -XPOST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' '<nowiki>http://localhost:8983/solr/films/schema</nowiki>'
 
===  Index some data ===
 
Add data to it, command line version:
  bin/post -c films example/films/films.json
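The dockerized equivalent would presumably be the following, assuming the example data is also present inside the container (I haven't verified the path):

  docker exec -it solr bin/post -c films example/films/films.json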
 
Add data via curl. This did not work; I think the file is too big.
  ogr2ogr -f "GeoJSON" taxlots_accounts.json taxlots_accounts.shp
  curl -XPOST -H 'Content-type: application/json' --data @taxlots_accounts.json <nowiki>'http://localhost:8983/solr/taxlots/update/json/docs'</nowiki>

So instead, convert the shapefile to CSV with just the fields I want:
  ogr2ogr -f CSV \
  -sql 'select OBJECTID1 as id,TAXLOTKEY,MAPNUM,SITUS_ADDR,SITUS_CITY,OWNER_LINE,OWNER_LL_1,OWNER_LL_2,STREET_ADD,CITY,STATE,ZIP_CODE FROM taxlots_accounts' \
  -lco geometry=AS_WKT -s_srs "EPSG:2913" -t_srs "EPSG:4326" \
  taxlots_accounts.csv taxlots_accounts.shp

  curl -XPOST -H 'Content-type: application/json' --data @taxlots_accounts.json <nowiki>'http://localhost:8983/solr/taxlots/update/json/docs'</nowiki>
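Presumably the CSV should then go to Solr's CSV update handler rather than repeating the JSON post. A sketch, untested here, ignoring for now the question of what to do with the WKT geometry column:

  curl -X POST -H 'Content-type: application/csv' --data-binary @taxlots_accounts.csv '<nowiki>http://localhost:8983/solr/taxlots/update?commit=true</nowiki>'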
 
If I run this exact same command twice, do I end up with twice the data, or is it smart enough to overwrite old data?
What if the schema changes?
 
=== Delete a core ===
 
Normally I do this from the Web admin page.
 
In Docker, though, the command would be:

  docker exec -it solr bin/solr delete -c ''corename''
 
I bet there is a curl command too.
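There is; the CoreAdmin API can unload a core and remove its files. A sketch, untested here:

  curl '<nowiki>http://localhost:8983/solr/admin/cores?action=UNLOAD&core=corename&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true</nowiki>'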
 
=== Show the fields in a core's schema ===
 
In this case, for the "films" core:
  curl <nowiki>'http://localhost:8983/solr/films/schema/fields'</nowiki>
 
=== Query ===
 
Go to [https://solr.wildsong.biz/solr/#/taxlots/query the query page for taxlots] and click "Execute Query". You'll get the first 10 records because q defaults to *:*

Enter a query in the 'q' field. Try these:
* owner:leornal -- search only the owner field
* leornal -- search everywhere
* 27539
* michelle && gardner
* "river point"
* "walter p"
* !state:or
* owner:null -- the string "null" not "no data in owner".
* wilson~ && !wilson -- find words that sound like wilson but are not wilson, for example "watson".
 
Okay, that's enough of that; let's move on and read the reference guide!
 
== How to be a client ==
 
I have taxlot data loaded into a Solr instance. I can query it from the web interface. All nice, but by itself quite useless to me.
Now it's time to put together a web browser search tool. I will be doing this in my [https://github.com/brian32768/react-bootstrap-test react-bootstrap-test] app; it has a search menu item.
 
On the search page, I added a simple controlled text input box.
 
# I want it to do that cool command completion via ajax thing.
# I want it to build a results table once you pick something.
 
The query generator (see the previous section) shows you how to construct queries as URLs. Here is one:
 
https://solr.wildsong.biz/solr/taxlots/select?q=sears
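From the app's point of view this is just an HTTP GET that returns JSON, so the usual query parameters apply; for example (rows and wt are standard Solr parameters, the values here are just illustrative):

  curl '<nowiki>https://solr.wildsong.biz/solr/taxlots/select?q=sears&rows=5&wt=json</nowiki>'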


Try looking at this next: http://www.flax.co.uk/blog/2016/06/29/simple-solr-connector-react-js/
