Solr

From Wildsong, by Brian Wilson
[http://lucene.apache.org/solr/ Apache Solr] is a search platform built on Apache Lucene.

I have the official Solr 8.0.0 running in a Docker container.
I am learning how to put data into it now.


== Docs ==
I've been watching the videos at https://factorpad.com/tech/solr/tutorial/solr-tutorial.html
The author says the standard Solr tutorials jump in too fast, and I tend to agree, but these go a bit too far in the other direction. They are somewhat lightweight, so follow them up with the reference guide; together they make a good starting point.


[https://hub.docker.com/_/solr/ official Solr Docker repo]


[https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ Solr Reference Guide]; includes getting started instructions.
 
=== Solr + GeoServer ===
 
Solr support in GeoServer comes via a community-supported extension; see https://docs.geoserver.org/stable/en/user/community/solr/index.html.
I added the extension to my docker-compose project.
 
== How are docker volumes used? ==
 
I am keeping Solr's data in a volume, which can be found at /home/docker/volumes/solr_data/_data on [[Bellman]].
It's mounted at /var/lib/solr inside the container.
 
On the Mac, Docker would not start, so (for expediency) I installed Debian in VirtualBox
and then installed Docker inside Debian. From there, things are similar to Bellman; I can ssh into the
Debian machine.
 
Since I am using Docker Compose, I don't have to create the volumes. Compose does that.
 
So on the Mac I see /var/lib/volumes/solr_solr_data
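
As a sketch, the relevant part of the docker-compose file looks something like this (service and volume names here are hypothetical, not copied from my actual project):

```yaml
version: "3"
services:
  solr:
    image: solr:8.0.0
    ports:
      - "8983:8983"
    volumes:
      # Named volume: Compose creates it, and Docker keeps the data under
      # the host's volumes directory (hence the solr_solr_data name).
      - solr_data:/var/lib/solr
volumes:
  solr_data: {}
```

The `project_volume` naming (solr_solr_data) comes from Compose prefixing the volume name with the project name.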
 
== Where's the web server? ==
 
The web server hosts the admin UI, where you can perform administrative tasks and also run queries.
I have it running behind a reverse proxy and behind my firewall right now; I access it at https://solr.wildsong.biz/solr
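
For reference, the reverse-proxy part is roughly a block like this (a minimal nginx sketch, assuming nginx is the proxy; my actual config differs):

```nginx
# Hypothetical nginx server block proxying to the Solr container.
server {
    listen 443 ssl;
    server_name solr.wildsong.biz;

    location /solr/ {
        proxy_pass http://localhost:8983/solr/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```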
 
== How to do stuff ==
 
=== Get shell access ===
 
To get a bash shell so you can look around:
 
docker exec -it --user=solr solr bash
 
Normal management is via the REST API, so you don't usually get much benefit from using a shell.
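
For example, the CoreAdmin API can report core status with no shell at all. A sketch, assuming Solr answers on localhost:8983:

```shell
# Build the CoreAdmin STATUS request URL (wt=json asks for a JSON response).
URL='http://localhost:8983/solr/admin/cores?action=STATUS&wt=json'
echo "$URL"
# curl "$URL"   # run this against a live instance
```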
 
=== Create a core ===
 
On the command line, you'd do it with the bin/solr command.
The following assumes the core name is "films" and that we're using the sample data that came with the Solr download.
 
Create the core, local command line
cd source/solr/solr-8.0.0
bin/solr create_core -c films
 
Create the core, dockerized command line version
docker exec -it solr bin/solr create_core -c films
 
Create the core, API version
''Don't have it documented yet''
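
As a sketch of the API route, core creation can also go through the CoreAdmin API (this assumes a standalone Solr on localhost:8983 and the built-in _default configset):

```shell
# Build a CoreAdmin CREATE request for a new core named "films".
CORE=films
URL="http://localhost:8983/solr/admin/cores?action=CREATE&name=${CORE}&configSet=_default"
echo "$URL"
# curl "$URL"   # run this against a live instance
```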
 
=== Edit the schema per the tutorial. ===
 
First, using the GUI, add a field called "name", set its type to "text_general", and uncheck "indexed" and "uninvertible".
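
The same edit can presumably be made with the Schema API instead of the GUI. A sketch, with the flags mirroring the checkboxes above (not verified against my instance):

```shell
# Build the Schema API payload matching the GUI steps above.
JSON='{"add-field":{"name":"name","type":"text_general","indexed":false,"uninvertible":false}}'
echo "$JSON"
# curl -X POST -H 'Content-type:application/json' --data-binary "$JSON" \
#   'http://localhost:8983/solr/films/schema'
```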
 
Adding a "copy field" will copy all searchable data into one field called _text_ so that queries on anything work.
 
The "source":"*" could be refined to copy only selected fields.
 
curl -XPOST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' '<nowiki>http://localhost:8983/solr/films/schema</nowiki>'
 
===  Index some data ===
 
Add data to it, command line version
bin/post -c films example/films/films.json
 
Add data via curl. This does not work; the file is too big, I think. (Note also that curl's --data strips newlines from the file; --data-binary is safer for JSON.)
ogr2ogr -f "GeoJSON" taxlots_accounts.json taxlots_accounts.shp
curl -XPOST -H 'Content-type: application/json' --data @taxlots_accounts.json <nowiki>'http://localhost:8983/solr/taxlots/update/json/docs'</nowiki>
 
ogr2ogr -f CSV \
-sql 'select OBJECTID1 as id,TAXLOTKEY,MAPNUM,SITUS_ADDR,SITUS_CITY,OWNER_LINE,OWNER_LL_1,OWNER_LL_2,STREET_ADD,CITY,STATE,ZIP_CODE FROM taxlots_accounts' \
-lco geometry=AS_WKT -s_srs "EPSG:2913" -t_srs "EPSG:4326" \
taxlots_accounts.csv taxlots_accounts.shp
 
curl -XPOST -H 'Content-type: application/csv' --data-binary @taxlots_accounts.csv <nowiki>'http://localhost:8983/solr/taxlots/update?commit=true'</nowiki>
 
If I run this exact same command twice do I end up with twice the data or is it smart enough to overwrite old data?
What if the schema changes?
 
=== Delete a core ===
 
Normally I do this from the Web admin page.
 
In Docker though the command would be:
 
docker exec -it solr bin/solr delete -c ''corename''
 
I bet there is a curl command too.
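
There is indeed a curl-able call: CoreAdmin UNLOAD. A sketch; the delete* flags make it remove the core's files too, like bin/solr delete does (assumes localhost:8983):

```shell
# Build a CoreAdmin UNLOAD request that also deletes the core's files.
CORE=films
URL="http://localhost:8983/solr/admin/cores?action=UNLOAD&core=${CORE}&deleteIndex=true&deleteInstanceDir=true"
echo "$URL"
# curl "$URL"   # run this against a live instance
```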
 
=== Show the fields in a core's schema ===
 
In this case, for "films" core:
curl <nowiki>'http://localhost:8983/solr/films/schema/fields'</nowiki>
 
=== Query ===
 
Go to [https://solr.wildsong.biz/solr/#/taxlots/query the query page for taxlots] and click "Execute Query". You'll get the first 10 records, because q = *:*
 
Enter a query in the 'q' field. Try these:
* owner:leornal -- search only the owner field
* leornal -- search everywhere
* 27539
* michelle && gardner
* "river point"
* "walter p"
* !state:or
* owner:null -- matches the literal string "null", not "no data in owner".
* wilson~ && !wilson -- find words that sound like wilson but are not wilson, for example "watson".
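
The same queries work as plain select URLs once the reserved characters are percent-encoded. A minimal sketch with a hand-rolled encoder (illustration only; it handles just the characters used above):

```shell
# Percent-encode the few reserved characters that appear in the queries
# above, then build a select URL for the taxlots core.
urlencode() {
    local s="$1"
    s="${s//:/%3A}"
    s="${s// /%20}"
    s="${s//\"/%22}"
    s="${s//!/%21}"
    printf '%s' "$s"
}
echo "http://localhost:8983/solr/taxlots/select?q=$(urlencode 'owner:leornal')"
```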
 
Okay, that's enough of that; let's move on. Read the reference guide!
 
== How to be a client ==
 
I have taxlot data loaded into a Solr instance. I can query it from the web interface. All nice, but not much use to me yet.
Now it's time to put together a web-browser search tool. I will be doing this in my [https://github.com/brian32768/react-bootstrap-test react-bootstrap-test] app, which has a search menu item.
 
On the search page, I added a simple controlled text input box.


# I want it to do that cool command completion via ajax thing.
# I want it to build a results table once you pick something.

The query generator (see the previous section) shows you how to construct queries as URLs. Here is one:

https://solr.wildsong.biz/solr/taxlots/select?q=sears

Try looking at this next: http://www.flax.co.uk/blog/2016/06/29/simple-solr-connector-react-js/

Latest revision as of 00:08, 17 September 2019