Solr
From Wildsong
Brian Wilson (talk | contribs)
Latest revision as of 00:08, 17 September 2019

Apache Solr is a search platform built on Apache Lucene.

I have the official Solr 8.0.0 running in a Docker container. I am learning how to put data into it now.

== Docs ==

I've been watching the videos at https://factorpad.com/tech/solr/tutorial/solr-tutorial.html. The author says the standard Solr tutorials jump in too fast, and I tend to agree, but these go a bit too far in the other direction. They are a bit lightweight, so follow up the videos with the reference guide; together they form a good starting point.

official Solr Docker repo

[https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ Solr Reference Guide]; includes getting started instructions.

=== Solr + GeoServer ===

This is available via a community-supported extension; see https://docs.geoserver.org/stable/en/user/community/solr/index.html. I added the extension to my docker-compose project.

== How are docker volumes used? ==

I am keeping Solr's data in a volume that can be found at /home/docker/volumes/solr_data/_data on [[Bellman]]. It's mounted at /var/lib/solr inside the container.

On the Mac, I tried running Docker and it would not start, so (expediency) I installed Debian in VirtualBox and then installed Docker in Debian. From there things are similar to Bellman; I can ssh into the Debian machine.

Since I am using Docker Compose, I don't have to create the volumes; Compose does that. So on the Mac I see /var/lib/volumes/solr_solr_data.
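Rather than remembering these paths, Docker can report them; the volume name solr_solr_data is the one Compose generated for me, so adjust as needed:

```shell
# Ask Docker where a named volume actually lives on the host.
docker volume inspect --format '{{ .Mountpoint }}' solr_solr_data
```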

== Where's the web server? ==

The web interface lets you perform administrative tasks and run queries. I have it running behind a reverse proxy and behind my firewall right now; I access it at https://solr.wildsong.biz/solr

== How to do stuff ==

=== Get shell access ===

To get a bash shell, so you can look around:

  docker exec -it --user=solr solr bash

Normal management is via the REST API, so you don't usually get much benefit from using a shell.

=== Create a core ===

In the command-line environment, you'd do it with the bin/solr command. The following assumes the core name is "films" and that we're using the sample data that came with the Solr download.

Create the core, local command-line version:

  cd source/solr/solr-8.0.0
  bin/solr create_core -c films

Create the core, dockerized command-line version:

  docker exec -it solr bin/solr create_core -c films

Create the core, API version: ''don't have it documented yet''.
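It's probably something like the following, via the CoreAdmin API. Untested sketch; configSet=_default (the stock Solr 8 configset) is an assumption on my part:

```shell
# Create a "films" core through the CoreAdmin API instead of bin/solr.
# Untested sketch against my local container.
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=films&configSet=_default'
```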

=== Edit the schema per the tutorial ===

First, using the GUI, add a field called "name", set its type to "text_general", and uncheck "indexed" and "uninvertible".

Adding a "copy field" copies all searchable data into one field called _text_ so that queries on anything work. The "source":"*" could be narrowed to copy only selected fields.

  curl -XPOST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' '<nowiki>http://localhost:8983/solr/films/schema</nowiki>'

=== Index some data ===

Add data, command-line version:

  bin/post -c films example/films/films.json

Add data via curl. This did not work; the file is too big, I think. (Plain --data also strips newlines from the file, so --data-binary would be safer for JSON.)

  ogr2ogr -f "GeoJSON" taxlots_accounts.json taxlots_accounts.shp
  curl -XPOST -H 'Content-type: application/json' --data @taxlots_accounts.json <nowiki>'http://localhost:8983/solr/taxlots/update/json/docs'</nowiki>

The CSV route:

  ogr2ogr -f CSV \
  -sql 'select OBJECTID1 as id,TAXLOTKEY,MAPNUM,SITUS_ADDR,SITUS_CITY,OWNER_LINE,OWNER_LL_1,OWNER_LL_2,STREET_ADD,CITY,STATE,ZIP_CODE FROM taxlots_accounts' \
  -lco geometry=AS_WKT -s_srs "EPSG:2913" -t_srs "EPSG:4326" \
  taxlots_accounts.csv taxlots_accounts.shp
  curl -XPOST -H 'Content-type: application/json' --data @taxlots_accounts.json <nowiki>'http://localhost:8983/solr/taxlots/update/json/docs'</nowiki>
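Note the last curl posts the JSON file again; if the point is to try the CSV, Solr's update handler accepts CSV directly, so the matching step should be something like this (untested sketch):

```shell
# Post the CSV to the update handler; text/csv tells Solr how to parse it.
# Untested sketch; commit=true makes the new docs visible immediately.
curl -XPOST -H 'Content-type: text/csv' --data-binary @taxlots_accounts.csv \
  'http://localhost:8983/solr/taxlots/update?commit=true'
```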

If I run this exact same command twice do I end up with twice the data or is it smart enough to overwrite old data? What if the schema changes?
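My understanding, worth checking against the reference guide, is that documents with a matching uniqueKey (id here) get overwritten on re-post, so running the same command twice should not duplicate data. A schema change generally means wiping and reindexing; everything in a core can be removed with a delete-by-query, roughly:

```shell
# Remove every document from the taxlots core, then commit.
# Untested sketch of a delete-by-query against my local instance.
curl -XPOST -H 'Content-type: application/json' \
  --data-binary '{"delete":{"query":"*:*"}}' \
  'http://localhost:8983/solr/taxlots/update?commit=true'
```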

=== Delete a core ===

Normally I do this from the Web admin page. In Docker, though, the command would be:

  docker exec -it solr bin/solr delete -c ''corename''

I bet there is a curl command too.
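There is, if I'm reading the CoreAdmin API right: UNLOAD with the delete flags set should be the equivalent (untested sketch; replace "corename" as appropriate):

```shell
# Unload a core and delete its index and instance directory.
# Untested sketch; flags per the CoreAdmin API.
curl 'http://localhost:8983/solr/admin/cores?action=UNLOAD&core=corename&deleteIndex=true&deleteInstanceDir=true'
```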

=== Show the fields in a core's schema ===

In this case, for the "films" core:

  curl <nowiki>'http://localhost:8983/solr/films/schema/fields'</nowiki>
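The response comes back as one dense line of JSON; piping it through Python's json.tool makes it readable (assumes python3 on the host):

```shell
# Pretty-print the schema fields; json.tool ships with Python 3.
curl 'http://localhost:8983/solr/films/schema/fields' | python3 -m json.tool
```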

=== Query ===

Go to [https://solr.wildsong.biz/solr/#/taxlots/query the query page for taxlots] and click "Execute Query". You'll get the first 10 records because q = *:*

Enter a query in the 'q' field. Try these:
* owner:leornal -- search only the owner field
* leornal -- search everywhere
* 27539
* michelle && gardner
* "river point"
* "walter p"
* !state:or
* owner:null -- matches the string "null", not "no data in owner"
* wilson~ && !wilson -- find words that sound like wilson but are not wilson, for example "watson"
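The same queries work from the command line; curl's -G and --data-urlencode take care of escaping characters like the quotes and && (a sketch against my local container):

```shell
# Run the owner:leornal query from the shell; -G sends the data as
# URL query parameters and --data-urlencode escapes them.
curl -G 'http://localhost:8983/solr/taxlots/select' \
  --data-urlencode 'q=owner:leornal' \
  --data-urlencode 'rows=5'
```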

Okay that's enough of that let's move on; read the reference guide!

== How to be a client ==

I have taxlot data loaded into a Solr instance. I can query it from the web interface. All nice, but actually quite useless to me. Now it's time to put together a web-browser search tool. I will be doing this in my [https://github.com/brian32768/react-bootstrap-test react-bootstrap-test] app; it has a search menu item.

On the search page, I added a simple controlled text input box.

# I want it to do that cool command completion via ajax thing.
# I want it to build a results table once you pick something.

The query generator (see the previous section) shows you how to construct queries as URLs. Here is one:

https://solr.wildsong.biz/solr/taxlots/select?q=sears

Try looking at this next: http://www.flax.co.uk/blog/2016/06/29/simple-solr-connector-react-js/