Solr
Latest revision as of 00:08, 17 September 2019
Apache Solr is a search platform built on Apache Lucene.
I have the official Solr 8.0.0 running in a Docker container. I am learning how to put data into it now.
Docs
I've been watching this guy's videos: https://factorpad.com/tech/solr/tutorial/solr-tutorial.html He says the standard Solr tutorials jump in too fast, and I tend to agree, but these go a bit too far in the other direction. They are a bit lightweight, so follow them up with the reference guide. They form a good starting point.
Solr Reference Guide; includes getting started instructions.
Solr + GeoServer
This is done via a community-supported extension; see https://docs.geoserver.org/stable/en/user/community/solr/index.html. I added the extension to my docker-compose project.
How are docker volumes used?
I am keeping solr's data in a volume that can be found at /home/docker/volumes/solr_data/_data on Bellman. It's mounted at /var/lib/solr inside the container.
On the Mac, I tried running Docker and it would not start, so (expediency) I installed Debian in VirtualBox and then installed Docker in Debian. From there things are similar to Bellman. I can ssh into the Debian machine.
Since I am using Docker Compose, I don't have to create the volumes. Compose does that.
So on the Mac I see /var/lib/volumes/solr_solr_data
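A minimal sketch of the Compose wiring described above. The service and volume names here are assumptions for illustration, not copied from my actual file; the mount point matches the /var/lib/solr path mentioned earlier.

```yaml
# Hypothetical docker-compose.yml fragment; names are illustrative.
version: "3"
services:
  solr:
    image: solr:8.0.0
    ports:
      - "8983:8983"
    volumes:
      # Named volume; Compose creates it and prefixes the project name,
      # which is why the host-side directory shows up as solr_solr_data.
      - solr_data:/var/lib/solr
volumes:
  solr_data:
```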
Where's the web server?
From the web interface you can perform administrative tasks and run queries too. I have it running behind a reverse proxy and behind my firewall right now; I access it at: https://solr.wildsong.biz/solr
How to do stuff
Get shell access
To get a bash shell, so you can look around,
docker exec -it --user=solr solr bash
Normal management is via REST API so you don't usually get much benefit from using a shell.
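For example, the CoreAdmin API will report the status of every core; this assumes Solr is reachable on the default port 8983 on localhost, so adjust host and port to match your setup.

```shell
# List all cores and their status via the CoreAdmin API.
# Assumes Solr is reachable on localhost:8983.
curl 'http://localhost:8983/solr/admin/cores?action=STATUS&wt=json'
```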
Create a core
In the command line environment, you'd do it with the bin/solr command. The following assumes the core name is "films" and we're using the sample data that comes with the Solr download.
Create the core, local command line
cd source/solr/solr-8.0.0
bin/solr create_core -c films
Create the core, dockerized command line version
docker exec -it solr bin/solr create_core -c films
Create the core, API version
Don't have it documented yet
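As an untested sketch of what the API version might look like: the CoreAdmin API can create a core from a configset, and I believe a `_default` configset ships with Solr 8, but verify this against the reference guide before relying on it.

```shell
# Create a core named "films" from the bundled _default configset.
# Untested sketch; standalone Solr only (SolrCloud uses the Collections API).
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=films&configSet=_default'
```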
Edit the schema per the tutorial.
First, using the GUI, add a field called "name", set its type to "text_general", and uncheck "indexed" and "uninvertible".
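The same field can be added without the GUI via the Schema API. The property flags below are my translation of the GUI step above, so double-check them before posting:

```shell
# Add the "name" field via the Schema API instead of the GUI.
# Properties mirror the GUI step: type text_general, not indexed,
# not uninvertible. Verify these match what you actually want.
curl -X POST -H 'Content-type:application/json' --data-binary \
  '{"add-field": {"name":"name", "type":"text_general", "indexed":false, "uninvertible":false, "stored":true}}' \
  'http://localhost:8983/solr/films/schema'
```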
Adding a "copy field" will copy all searchable data into one field called _text_ so that queries on anything work.
The "source:*" could be refined to search only selected fields.
curl -XPOST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' 'http://localhost:8983/solr/films/schema'
Index some data
Add data to it, command line version
bin/post -c films example/films/films.json
Add data via curl. This does not work; the file is too big, I think.
ogr2ogr -f "GeoJSON" taxlots_accounts.json taxlots_accounts.shp
curl -XPOST -H 'Content-type: application/json' --data @taxlots_accounts.json 'http://localhost:8983/solr/taxlots/update/json/docs'
ogr2ogr -f CSV \
  -sql 'select OBJECTID1 as id,TAXLOTKEY,MAPNUM,SITUS_ADDR,SITUS_CITY,OWNER_LINE,OWNER_LL_1,OWNER_LL_2,STREET_ADD,CITY,STATE,ZIP_CODE FROM taxlots_accounts' \
  -lco geometry=AS_WKT -s_srs "EPSG:2913" -t_srs "EPSG:4326" \
  taxlots_accounts.csv taxlots_accounts.shp
curl -XPOST -H 'Content-type: application/json' --data @taxlots_accounts.json 'http://localhost:8983/solr/taxlots/update/json/docs'
If I run this exact same command twice, do I end up with twice the data, or is it smart enough to overwrite old data? What if the schema changes?
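As far as I know, re-posting a document whose uniqueKey (usually `id`) already exists replaces the old copy, so repeating the same command should not double the data. A schema change, though, generally means wiping the core and re-indexing. The standard delete-by-query form for wiping a core looks like this (core name taken from the example above):

```shell
# Delete every document in the "taxlots" core and commit immediately.
# Standard Solr delete-by-query against the update handler.
curl -X POST -H 'Content-type: application/json' \
  --data-binary '{"delete": {"query": "*:*"}}' \
  'http://localhost:8983/solr/taxlots/update?commit=true'
```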
Delete a core
Normally I do this from the Web admin page.
In Docker though the command would be:
docker exec -it solr bin/solr delete -c corename
I bet there is a curl command too.
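There is: the CoreAdmin UNLOAD action, which takes flags to also remove the data on disk. The flags below are from my memory of the CoreAdmin docs, so verify them before pointing this at anything precious.

```shell
# Unload the core and delete its index, data dir, and instance dir.
# Substitute your actual core name for "corename".
curl 'http://localhost:8983/solr/admin/cores?action=UNLOAD&core=corename&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true'
```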
Show the fields in a core's schema
In this case, for "films" core:
curl 'http://localhost:8983/solr/films/schema/fields'
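The JSON response is verbose; if jq is installed, the field names alone can be pulled out. The `.fields[].name` path matches the response shape I've seen from this endpoint, so adjust it if yours differs.

```shell
# Extract just the field names from the schema/fields response.
curl -s 'http://localhost:8983/solr/films/schema/fields' | jq -r '.fields[].name'
```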
Query
Go to the query page for taxlots and click "Execute Query". You'll get the first 10 records because q = *:*
Enter a query in the 'q' field. Try these:
- owner:leornal -- search only the owner field
- leornal -- search everywhere
- 27539
- michelle && gardner
- "river point"
- "walter p"
- !state:or
- owner:null -- matches the string "null", not "no data in owner".
- wilson~ && !wilson -- find words that sound like wilson but are not wilson, for example "watson".
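The same queries work over plain HTTP against the select handler; for example, the owner search from the list above (localhost endpoint assumed):

```shell
# Run the owner:leornal query via the select handler.
# rows limits how many results come back.
curl 'http://localhost:8983/solr/taxlots/select?q=owner:leornal&rows=5'
```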
Okay, that's enough of that; let's move on. Read the reference guide!
How to be a client
I have taxlot data loaded into a Solr instance. I can query it from the web interface. All nice, but actually quite useless to me. Now it's time to put together a web browser search tool. I will be doing this in my react-bootstrap-test app; it has a search menu item.
On the search page, I added a simple controlled text input box.
- I want it to do that cool command completion via ajax thing.
- I want it to build a results table once you pick something.
The query generator (see the previous section) shows you how to construct queries as URLs. Here is one.
https://solr.wildsong.biz/solr/taxlots/select?q=sears
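A few standard select parameters make the response easier for a client to consume. The parameter names (wt, rows, fl) are standard Solr; the field list is illustrative, picked from the taxlot columns in the ogr2ogr example above.

```shell
# Same query, but ask for JSON, limit the rows, and pick only the
# fields a search-results table would need. Field list is illustrative.
curl 'https://solr.wildsong.biz/solr/taxlots/select?q=sears&wt=json&rows=10&fl=id,SITUS_ADDR'
```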
Try looking at this next: http://www.flax.co.uk/blog/2016/06/29/simple-solr-connector-react-js/