Notebook Servers

From Wildsong
Brian Wilson
Latest revision as of 22:55, 31 October 2022

Basically Jupyter already runs as a server on your local machine, but now there are a bunch of other ways to run "notebooks".

I am looking at alternatives to the ArcGIS Notebook Server because it's $20,000 + $5,000/year for what appears to be basically a Docker manager. Esri uses the commercial version of Docker, which means they have to license it from Mirantis.

I've never had a need to license Docker, I just use the community version.

Resources

DataScienceNotebook -- I started making a list of options but then I found https://datasciencenotebook.org/ which was created by someone at the Deepnote project. I marched through the ones marked "OpenSource". The problem is that the emphasis there is not on Python notebooks. It's on "data science" notebooks, some of which include Python support.

I found many links to people giving suggestions on how to set up just a plain old Jupyter server, for example https://ipython.org/ipython-doc/3/notebook/public_server.html. I need other features though.

My requirements

  1. Must support Conda so that I can install arcgis.
  2. Can I schedule jobs to run?
  3. Is there a dark mode?
  4. Can I store notebooks in git?

Okay maybe the last one is not a requirement.

I just decided to look at Deepnote first. It's running already on someone else's server.


Binder

Binder (https://mybinder.org) appears to be 100% free, at least for today!

It appears to be hosted, at least in part, by GESIS, the Leibniz Institute for the Social Sciences: https://gesis.org/

I created a test project, set up a conda environment on my local machine, saved the environment, pushed it up to a git repo, and then fed the repository URL to mybinder.org
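Feeding the repository to mybinder.org amounts to opening a launch URL built from the repository location. Here is a minimal sketch of that, assuming the repository is on GitHub and that the default branch is the one to launch. The helper name `binder_url` and the `HEAD` default are my own choices for illustration, not something recorded above.

```python
from urllib.parse import quote


def binder_url(owner: str, repo: str, ref: str = "HEAD") -> str:
    """Build a mybinder.org launch URL for a GitHub repository."""
    # GitHub repositories are launched from /v2/gh/<owner>/<repo>/<ref>.
    # Each part is percent-encoded so a branch name with a slash stays one segment.
    parts = [quote(part, safe="") for part in (owner, repo, ref)]
    return "https://mybinder.org/v2/gh/" + "/".join(parts)


print(binder_url("brian32768", "hello-binder"))
```

Opening the printed link in a browser should start a build of the environment stored in the repository.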

I was able to visualize a map in Visual Studio but not in the MyBinder browser window. I guess I can't have everything, but I can run a Jupyter notebook and the arcgis module for free in a browser, and that's still very cool. It could be a great instructional tool.

My github repository is here: https://github.com/brian32768/hello-binder


Deepnote

They don't charge for it, so does it do what I need?

YES, in fact it appears to check all the boxes. I have not tried storing a project in GitHub or running a local copy yet.

I used my brian32768@github account to access it.

I was able to run an arcgis task in it.

Update: when I ran it today, it ran for a few seconds and then said I had used up all my time for 24 hours. So, forget it.

Can I install the arcgis module?

Of course you can.

Create a notebook and install conda:

# 1. Install Conda and make Conda packages available in current environment

!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!sudo bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local

import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

Install a package:

!sudo conda install -y arcgis -c esri
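The `sys.path` line above hard-codes Python 3.7. As a rough sketch, the same path can be derived from the running interpreter, and the install can be checked before trying to use it. The helper names `site_packages_for` and `is_importable` are hypothetical. This also assumes that the Python installed by Miniconda matches the notebook kernel's version, which may not hold.

```python
import importlib.util
import sys


def site_packages_for(prefix: str, version=None) -> str:
    """Return <prefix>/lib/pythonX.Y/site-packages for the given (major, minor)."""
    # Default to the interpreter that is running this notebook.
    major, minor = (version or sys.version_info)[:2]
    return f"{prefix}/lib/python{major}.{minor}/site-packages"


def is_importable(module_name: str) -> bool:
    """True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


path = site_packages_for("/usr/local")
if path not in sys.path:
    sys.path.append(path)
print(path, "arcgis importable:", is_importable("arcgis"))
```

If the last line reports False after the conda install, a version mismatch between conda's Python and the kernel is the first thing I would check.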

Use it:

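from arcgis.gis import GIS  # GIS is a class inside the arcgis.gis module

# Fill in your own portal URL and credentials.
gis = GIS(url="", username="", password="")
cm = gis.content
maps = cm.search("", item_type='Web Map', outside_org=False, max_items=-1)
# Print every web map title, numbered from 1.
for thismap, webmap in enumerate(maps, start=1):
    print(f"{thismap}: {webmap.title}")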
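Typing a password into a notebook cell is awkward if the notebook ends up in git. One possible approach, sketched here, is to read the connection settings from environment variables. The variable names `ARCGIS_PORTAL_URL`, `ARCGIS_USERNAME` and `ARCGIS_PASSWORD` and the helper `portal_credentials` are made up for this example. I am also assuming the `GIS` class accepts `url`, `username` and `password` keywords.

```python
import os


def portal_credentials(env=None) -> dict:
    """Collect connection settings from environment variables."""
    env = os.environ if env is None else env
    names = {
        "url": "ARCGIS_PORTAL_URL",      # hypothetical variable names
        "username": "ARCGIS_USERNAME",
        "password": "ARCGIS_PASSWORD",
    }
    # Fail early with a readable message instead of a failed login later.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

With the variables set, the connection line would become something like `gis = GIS(**portal_credentials())`.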

Okay, so that took all of 10 minutes.

Point goes to Deepnote.

What about scheduling?

Yes, another point for Deepnote. See "How to schedule a notebook": https://docs.deepnote.com/features/scheduling

Running locally

In theory I can run it in a Docker container (see https://stackoverflow.com/questions/65151990/installing-conda-on-deepnote), but I have to set up access to the Google docker repos.

It does not seem like they want me to do this, so I won't. I will use Zeppelin.

Put this in a Dockerfile

FROM gcr.io/deepnote-200602/templates/deepnote
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
RUN bash ~/miniconda.sh -b -p $HOME/miniconda
ENV PATH $HOME/miniconda/bin:$PATH
RUN conda install python=3.7 ipykernel -y
RUN conda install <insert packages here> -y
RUN python -m ipykernel install --user --name=conda
ENV DEFAULT_KERNEL_NAME "conda"

docker build -t deepnote .

The build fails because I don't have access to the Google container registry (gcr.io) that hosts the base image. See https://docs.deepnote.com/integrations/google-container-repository

Git integration

You can store projects in GitHub. See https://docs.deepnote.com/integrations/github

Zeppelin

https://zeppelin.apache.org/

Can I do the same things I did in Deepnote to install the arcgis module?

Here is how I launch it to test it.

docker run -p 8080:8080 --rm --name zeppelin apache/zeppelin:0.10.0
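The container takes a little while before the web UI answers. This is a small sketch of a poller that waits until the server responds. The function name `wait_for_http` and the timeout values are my own. It only assumes the UI is reachable over plain HTTP on the port mapped by the docker run line.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until it gives any HTTP response; return False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            pass  # Nothing is listening yet; retry after a pause.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the container above, the call would be `wait_for_http("http://localhost:8080/")`, since 8080 is the host port in the `-p 8080:8080` mapping.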

I started a Jupyter note, typed in a couple of lines of Python, hit Shift-Enter and waited. On the first run it takes some time to load the Python kernel. After that it was fast. The first line in a "note" needs to be %python so that it loads a Python kernel.

It works the same way if you load a Python note directly, but no need to tell it the kernel name.


Conda

Run these in a PYTHON kernel, NOT a Jupyter kernel.

This works. It installs into base, but what it installs is an old version (1.6); it should be at least 1.8.

TIP: NO NEWLINES!!!

%python.conda install -c esri arcgis

This fails:

%python.conda create --name gis -c esri arcgis
%python.conda activate gis
%python.conda list
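Since the install into base produced an old version, a quick check in a Python paragraph can say whether the installed version is new enough. This is only a sketch. The helper names are hypothetical, the 1.8 minimum comes from the note above, and the parsing ignores suffixes such as "rc1".

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn '1.8.2' into (1, 8, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.8") -> bool:
    """Compare versions numerically, so that 1.10 counts as newer than 1.8."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("1.6.2"), meets_minimum("1.8.3"), meets_minimum("1.10.0"))
```

In a note, I would expect to be able to call `meets_minimum(arcgis.__version__)` after `import arcgis`, assuming the package exposes `__version__`.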

Polynote

Runs on Apache Spark.

Python support depends on pip, which is strangely awkward. Moving on.

Install: https://polynote.org/latest/docs/installation/

In Docker, they give me a blank page with an "edit" pencil. Huh. https://polynote.org/latest/docs/docker/

See also https://hub.docker.com/r/polynote/polynote :-)

And for actual instructions, see https://github.com/polynote/polynote/tree/master/docker

cat > config.yml <<'EOF'
listen:
  host: 0.0.0.0

storage:
  dir: /opt/notebooks
  mounts:
    examples:
      dir: examples
EOF

Then run this. If you don't create 'notebooks' first, Docker will create it owned by root and it won't be writable.

mkdir notebooks
docker run --rm -it -p 8192:8192 -p 4040-4050:4040-4050 -v `pwd`/config.yml:/opt/config/config.yml -v `pwd`/notebooks:/opt/notebooks/ polynote/polynote:latest --config /opt/config/config.yml
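The same "create it first" step can be done from Python, with a check that the directory really is writable before starting the container. This is a sketch with a hypothetical helper name, `ensure_writable_dir`. It checks permissions for the host user only, and the user inside the container may differ.

```python
import os
from pathlib import Path


def ensure_writable_dir(path) -> Path:
    """Create the directory if needed and confirm the current user can write to it."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    # W_OK allows creating files; X_OK is needed to enter the directory.
    if not os.access(directory, os.W_OK | os.X_OK):
        raise PermissionError(f"{directory} exists but is not writable")
    return directory.resolve()
```

Calling `ensure_writable_dir("notebooks")` in the directory that holds config.yml would stand in for the mkdir line.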

Then go to http://cc-testmaps:8192/

I might be able to create my own image with arcgis pre-installed in it?

I was able to download and install Miniconda interactively, which means I should be able to run it in a Dockerfile?

JupyterHub

Looks insanely complicated.

CoCalc

"On Prem" = $999 / year

The list says it's "Open Source" but it looks like that is no longer true.

nteract (sic)

Not even sure what this is.

Querybook

Forget this, there is no option to use Python. Querybook is "science for dummies". It might be a great way to experiment with SQL queries.

It's friendly though. I wonder if I can tone that down: FRIENDLINESS_LEVEL=10 # Default:10 Set to an integer, 0-10

Looks like it wants a lot of memory.

git clone https://github.com/pinterest/querybook.git
cd querybook
make

http://cc-testmaps.clatsop.co.clatsop.or.us:10001/

Did not complete. If I can't start a service in Docker, I'm thinking it's time to move on. That's unfair of me, and yet I've already seen two candidates for this project that look promising, Deepnote and Zeppelin.

Okay okay I gave it a second try. I just want to see its web page. It's spitting out lots of warning messages but in fact it did start. I can see redis, mysql, elasticsearch, a "scheduler", a "worker", and a web server.