Docker Swarm

From Wildsong
Revision as of 21:51, 29 November 2023 by Brian Wilson

Docker Swarm is an orchestrator for Docker containers, and so is Kubernetes. Today I spent an hour examining Kubernetes, and it just adds more complexity that I don't need.

I ran Swarm at home for about a year, then went back to Docker Compose. Swarm is extra work I don't need.

Initialization

Here is the command to turn Bellman into a Docker Swarm manager (which is also a node). Because Bellman has more than one network interface, the command made me pick an address with --advertise-addr; 192.168.123.2 is Bellman's primary network interface.

bellman> docker swarm init --advertise-addr 192.168.123.2
Swarm initialized: current node (isk0jocx0rb37yonoafstyvoj) is now a manager.

To add a worker to this swarm, run the following command:

   docker swarm join --token SWMTKN-1-5b81dywl9xkis6769fxnsvjahfy361w2kxkz69nc35bz3nxt6s-43jxeopl6inw8xur1vpcl23w7 192.168.123.2:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

I can add more nodes on other machines using that token. I won't be doing that today, but it would look like this:

tern> docker swarm join --token SWMTKN-1-5b81dywl9xkis6769fxnsvjahfy361w2kxkz69nc35bz3nxt6s-43jxeopl6inw8xur1vpcl23w7 192.168.123.2:2377
This node joined a swarm as a worker.
bellman> docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
isk0jocx0rb37yonoafstyvoj *   bellman             Ready               Active              Leader              19.03.5
vjbx2h8n8280ecib2btzkwcxw     tern                Ready               Active                                  18.09.1
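The token does not have to be copied out of the init output; a manager can print it again later. A minimal sketch, assuming a bash-like shell on a manager node. worker_join_cmd is a hypothetical helper of my own, not a Docker command; it rebuilds the join line from the bare token that docker swarm join-token -q worker prints.

```shell
# Hypothetical helper: rebuild the "docker swarm join" line for a new worker.
# Run it on a manager; the argument is the manager's advertised address.
worker_join_cmd() {
  if [ $# -ne 1 ]; then
    echo "usage: worker_join_cmd MANAGER_ADDR" >&2
    return 2
  fi
  local token
  # -q prints only the worker token, not the full instructions
  token=$(docker swarm join-token -q worker) || return 1
  # 2377 is the Swarm management port, as in the init output above
  printf 'docker swarm join --token %s %s:2377\n' "$token" "$1"
}
```

On Bellman, worker_join_cmd 192.168.123.2 prints a line to paste on Tern. Going the other way, running docker swarm leave on Tern and then docker node rm tern on Bellman would take it out again. Since the token is visible on this page, docker swarm join-token --rotate worker would invalidate it.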

Create a network

bellman> docker network create -d overlay --attachable testing
shaboxhgakqer14j1ve7zyysj

The --attachable option lets standalone containers, ones not started as swarm services, connect to this overlay network. I will need that soon, when I run bash in a Debian container for tests.
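To double-check that the network came out the way I wanted before attaching a plain container to it, docker network inspect can report the driver and the attachable flag. A sketch; is_attachable_overlay is a hypothetical helper name, not part of Docker.

```shell
# Hypothetical helper: succeed only if the named network is an overlay
# network that was created with --attachable.
is_attachable_overlay() {
  local info
  info=$(docker network inspect --format '{{.Driver}} {{.Attachable}}' "$1") || return 1
  [ "$info" = "overlay true" ]
}
```

For example, is_attachable_overlay testing && docker run -it --rm --network testing debian:bullseye bash only starts the test container if the network is ready for it.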

Spinning up my first Swarm-managed container

To try things out, I will spin up 4 copies of a simple web server. When I add Tern, Swarm will spread them over the two nodes. I want them to use that "testing" network.

docker service create --name web --replicas 4 -p 80:80 --network testing --detach nginx:latest

Now I have 4 copies of nginx running. They publish port 80, but the containers themselves live on the swarm's overlay network, so how do I reach them? The -p ("publish") option also exposes the port directly on the host, through Swarm's ingress routing mesh, so "curl http://localhost" works. To kill them off, I can get the service ID from "docker service ls" (or just use the name "web") and remove the service:

docker service ls
docker service rm pmbrvm6wow7q
curl http://localhost
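Once Tern joins, docker service ps shows which node each replica landed on. A small sketch that tallies running replicas per node; replicas_per_node is a hypothetical helper, not part of Docker.

```shell
# Hypothetical helper: count a service's running replicas on each node.
# The desired-state filter hides old tasks that have already been shut down.
replicas_per_node() {
  docker service ps --filter desired-state=running --format '{{.Node}}' "$1" |
    sort | uniq -c
}
```

replicas_per_node web would print a count next to each node name. To change the number of copies without removing the service, docker service scale web=2 works.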

When I run curl with the nginx replicas shut down, I get the page served by Varnish (still running under Compose), which shows me the Home Assistant instance. So it appears that a port published by the swarm takes precedence over whatever Compose is running on the same port. This bit me when I put the nginx test on port 81 and accidentally masked Psono, which I run on that port.

I also tried leaving out the "--network testing" parameter, and everything still worked. Still, creating a separate network lets me isolate containers, just as with plain Docker and Docker Compose. For example, there is no reason for my Home Assistant and Pihole containers to know about each other.

docker run -it --rm --network testing debian:bullseye bash
# apt update
# apt install -y bind9-dnsutils
# nslookup web
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   web
Address: 10.0.1.27
# apt install -y curl
# curl http://10.0.1.27
# curl http://web/

Both curl commands return the standard nginx page, so I know the requests reach a replica and that the service answers to the name "web" that I assigned. The container can see my local LAN too; for example, from the Debian instance I can "curl http://bellman.wildsong.biz:8123/" and get the Home Assistant page. So far, all this is easy.
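A note on the address nslookup returned: with Swarm's default endpoint mode, the name "web" resolves to a single virtual IP that load-balances across the replicas, so 10.0.1.27 is not any one nginx. The embedded DNS server also answers "tasks.web" with one address per running replica. A sketch to run inside the Debian test container; replica_ips is a hypothetical helper, and it uses getent, which is already in the base image.

```shell
# Hypothetical helper: list the IP of every running replica of a service.
# "tasks.<service>" is resolved by Swarm's embedded DNS (127.0.0.11).
replica_ips() {
  # getent prints one line per address/socket type; keep unique IPs only
  getent ahostsv4 "tasks.$1" | awk '{print $1}' | sort -u
}
```

Then curl against each address that replica_ips web prints hits one specific replica instead of letting the virtual IP pick one.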

Healthcheck

With nginx I can build my own Docker image and bake the healthcheck right into it. I know curl is not the best way to do this (see https://blog.sixeyed.com/docker-healthchecks-why-not-to-use-curl-or-iwr/), but for now it's what I am using!

In my Dockerfile I have this:

FROM nginx:latest
HEALTHCHECK CMD curl --fail http://localhost || exit 1

Then I build the image and replace the test service with a single replica of it:

$ docker build -t wildsong/nginx .
$ docker service rm web
$ docker service create --name web --replicas 1 -p 80:80 --network testing --detach wildsong/nginx
$ docker ps | grep web
01a3f36f7580   wildsong/nginx:latest      "/docker-entrypoint.…"   About a minute ago   Up About a minute (healthy)   80/tcp   web.1.ssssfmwmwp8je7g1dnsabevew

When I create the service I get a warning, because I have not pushed that image (wildsong/nginx) to a registry and other nodes would have no way to pull it. It still works because I am running only one node for now. In the "docker ps" output, the container is marked "(healthy)".
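docker ps only shows the summary "(healthy)". A sketch for checking every container Swarm started for a service, assuming Swarm's com.docker.swarm.service.name label on its task containers; health_of_service is a hypothetical helper.

```shell
# Hypothetical helper: print name and health status of each container that
# Swarm started for a service. Swarm labels its task containers with
# com.docker.swarm.service.name. Only works if the image defines a HEALTHCHECK.
health_of_service() {
  local id
  for id in $(docker ps -q --filter "label=com.docker.swarm.service.name=$1"); do
    docker inspect --format '{{.Name}} {{.State.Health.Status}}' "$id"
  done
}
```

health_of_service web should print each web container with its status. For the recent probe results, docker inspect --format '{{json .State.Health}}' followed by a container ID dumps the health log. The same check could also be set on the service instead of baked into the image, using docker service create flags such as --health-cmd, --health-interval and --health-retries.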

I can reach the service at localhost or at the address I advertised with the "docker swarm init" command; with nginx running, pointing a browser at http://192.168.123.2/ works.

I suspected this meant Varnish would pass traffic to it, in which case https://bellman.wildsong.biz/ should work. It did not; for some reason the request went to Home Assistant.

It turned out some rule in Varnish was kicking in. I'm not sure which one, but after I added Bellman support to the Varnish configuration, it works as expected.

Replicas 0, Problems 1

Earlier in my testing, I would create a stack and get lots of services listed, but each one showed "Replicas 0/1". When I searched for the containers with "docker ps", there was nothing. I kept going and got past this; when I figure out (in 5 minutes) what I did wrong, I will write it up here. But that is what 0/1 means: zero replicas running out of the one requested for that service.
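Since "0/1" is the symptom, it helps to list every service stuck at zero running replicas and then ask Swarm why. A sketch; zero_replica_services is a hypothetical helper built on the .Name and .Replicas fields of docker service ls --format.

```shell
# Hypothetical helper: list services whose running replica count is 0
# while at least one replica is requested (the "0/1" case above).
zero_replica_services() {
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    awk '{ split($2, r, "/"); if (r[1] == 0 && r[2] > 0) print $1 }'
}
```

For each name it prints, docker service ps --no-trunc NAME shows the tasks Swarm tried to start, and the ERROR column usually says why they failed (for example, an image that could not be pulled); docker service logs NAME shows the output of any that did run.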

Service migration plan

To try Swarm out, I moved most of the services I normally run on Bellman into it for about a year:

  1. Varnish (and hitch)
  2. Pihole
  3. Psono (including PostgreSQL) (retired 9/9/2023 in favor of Bitwarden)
  4. Unifi (retired 9/9/2023)
  5. Logitech Media Server (Squeezebox)
  6. Home Assistant (including Mosquitto). This cannot move to Swarm because it currently uses a USB device for Zigbee.

Psono has to be accessible from the Internet, but none of the others do. Some of them are also behind Varnish, but currently only for testing.

I also see a copy of MySQL running; I wonder what it belongs to.

Before I can proceed, I need persistent data, and I need to deal with these issues.

Persistence: How do volumes work in a swarm?

Normally my life revolves around file storage; I never noticed there are also block storage and object storage options. For example, you can use a block device with Docker's btrfs storage driver, and there's another one called "devicemapper". All so exotic, and beyond my attention span at present. Go read here: https://docs.docker.com/storage/storagedriver/select-storage-driver/

In a Docker Compose setup, the Docker Engine manages the volumes and I generally ignore how it does that. These are called "local" volumes, after the default "local" volume driver.

With Swarm, I would have to consider what happens when replicas are spread across several physical servers. But mine aren't.

Refer to chapter 13 of Nigel Poulton's "Docker Deep Dive". He suggests putting the volumes on a shared NFS server, and briefly notes that you can corrupt files quickly this way. Yeah, I can see that. Ha. Fortunately I mount most volumes READ ONLY whenever I can.
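For the NFS approach, Docker's built-in local volume driver can mount an NFS export itself, and a service's --mount option can pass it the NFS options so that each node creates the volume when a task lands there. A sketch assuming a bash-like shell; nfs_mount_opt is a hypothetical helper, and the server address, export path and volume name are placeholders, not anything from my setup.

```shell
# Hypothetical helper: build a --mount value that has each node create a
# "local" volume backed by an NFS export. All arguments are placeholders;
# pass "ro" as the fifth argument to mount the volume read-only.
nfs_mount_opt() {
  local vol="$1" target="$2" server="$3" export_path="$4" mode="${5:-rw}"
  local opt="type=volume,source=$vol,target=$target,volume-driver=local"
  opt="$opt,volume-opt=type=nfs,volume-opt=device=:$export_path"
  opt="$opt,volume-opt=o=addr=$server"
  if [ "$mode" = "ro" ]; then
    opt="$opt,readonly"
  fi
  printf '%s\n' "$opt"
}
```

Something like docker service create --name lms --mount "$(nfs_mount_opt lms-music /music NFS_SERVER /export/music ro)" IMAGE would then give every replica the same read-only view of the export, so at least for that volume no replica can write to it.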

Bring in Compose

By that I mean deploying a stack of containers using a docker-compose.yml file as the configuration. So far I have not needed it; if I start just one container per project, plain "docker service" commands are fine.
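For when I do need it, here is a sketch of the "web" test service above written as a stack file. The helper write_demo_stack and the stack name demo are hypothetical. "testing" is marked external because that overlay network already exists, and Compose file version 3.8 needs Docker Engine 19.03 or newer, which Bellman has.

```shell
# Hypothetical helper: write the earlier 4-replica nginx service as a
# Compose file that "docker stack deploy" can read.
write_demo_stack() {
  cat > "${1:-docker-compose.yml}" <<'EOF'
version: "3.8"
services:
  web:
    image: nginx:latest
    ports:
      - "80:80"
    networks:
      - testing
    deploy:
      replicas: 4
networks:
  testing:
    external: true
EOF
}
```

write_demo_stack docker-compose.yml && docker stack deploy -c docker-compose.yml demo would create a service named demo_web; docker stack services demo shows its replica count, and docker stack rm demo takes the whole stack down.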