Bellman


Bellman is a very small Linux server. Currently its primary job is to run Docker containers.

"What's the good of Mercator's North Poles and Equators,
Tropics, Zones, and Meridian Lines?"
So the Bellman would cry: and the crew would reply
"They are merely conventional signs!"
--Lewis Carroll, The Hunting of the Snark

todo

backups

8/5/20 There is a partial rsync of Supermicro Bellman on Wenda in ~bwilson/bellman.

Audio notes

I have one pair of speakers in my lab but many computers.

Tenrec line out => line in Murre line out => line in Bellman line out => Speakers

This worked on the ASRock hardware, but the NUC has no line in, so I need a mixer. In theory I can do all this in software, right? Can I send all audio from Windows to PulseAudio on Bellman? I think I can from Tenrec, anyway.

Magic command from https://askubuntu.com/questions/211136/get-the-audio-from-line-in-to-output-to-the-speaker

#manually start the module-loopback
pactl load-module module-loopback
#configure your system to load module-loopback on startup
#this places load-module module-loopback at the end of
#the /etc/pulse/default.pa pulseaudio configuration file.
sudo sh -c ' echo "load-module module-loopback" >>  /etc/pulse/default.pa '

Now sound should be passed through as above. Since Bellman is always running, he gets to be connected directly to the speakers.
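
One catch with the append above: running it a second time leaves two load-module lines in the file. A minimal sketch of an idempotent version follows. The ensure_line helper is a name made up here, not part of PulseAudio, and it needs root to touch the real file.

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
    file=$1
    line=$2
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root):
#   ensure_line /etc/pulse/default.pa "load-module module-loopback"
```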

pavucontrol

Software

Ubuntu Server 20.04 LTS

Custom install

Node.js LTS, to support an agent for Sematext Cloud:

https://github.com/nodesource/distributions/blob/master/README.md

apt packages

This is a list of things added after a basic installation of Ubuntu. The goal this time is to put every service into a Docker container, so the list of added packages is minimal.

emacs-nox
docker
docker-compose
mlocate 
pulseaudio pulseaudio-utils avahi-daemon
pavumeter pavucontrol ubuntu-sounds
nfs-common (needed to mount synology volumes locally, notably /green/music)

From Supermicro Bellman

conda
yaml-mode
dnsutils
net-tools
nfs-common
vnc4server (this installs tiger vnc)

VNC server

On the temporary version I had running on Tern I did this

vncserver -localhost no -geometry=2048x1024 -depth 24
vncserver --list

Firewall

This is all different under Ubuntu. Hmm.

See https://blog.daknob.net/debian-firewall-docker/ for ideas.

I use my own bash script to load iptables rules. See /usr/local/bin, /etc/network, and /var/lib/vastra.

Printing

The Brother printer is currently connected to Wenda, not Bellman. When it was connected to Bellman, I found the Linux drivers for my HL-L2320D printer didn't work, so I set up a raw driver on Bellman and then used the appropriate driver (manually selected) on client computers. It works fine.

Allow remote access

cupsctl --remote-admin --remote-any --share-printers

I also had to edit /etc/cups/cupsd.conf and add

HostNameLookups on

and then

systemctl restart cups

Backups

Disk

sudo mkdir bellman
bwilson@bellman:/green/BACKUPS$ cd bellman
bwilson@bellman:/green/BACKUPS/bellman$ sudo rsync -av --exclude /proc --exclude /var/tmp --exclude /sys --exclude /dev --exclude /home --exclude /green / .
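
The same backup could be wrapped in a small script so the exclude list lives in one place and a dry run is possible first. This is only a sketch: exclude_args and backup_root are hypothetical helper names, and the exclude list is taken from the command above.

```shell
# Paths to skip. They are anchored at the transfer root because the source is "/".
EXCLUDES="/proc /sys /dev /var/tmp /home /green"

# Print one --exclude=PATH option per line for the given paths.
exclude_args() {
    for p in "$@"; do
        printf '%s\n' "--exclude=$p"
    done
}

# backup_root DEST: copy / into DEST. Set DRY_RUN=1 to only list what would change.
backup_root() {
    dest=$1
    # Word splitting on the helper output is intentional: one option per word.
    set -- $(exclude_args $EXCLUDES)
    if [ "${DRY_RUN:-0}" = 1 ]; then
        set -- --dry-run "$@"
    fi
    sudo rsync -av "$@" / "$dest"
}
```

Running DRY_RUN=1 backup_root /green/BACKUPS/bellman first shows what rsync would copy without writing anything.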

Back up MySQL. The important databases are asterisk and owncloud; everything else can go. Well, okay, I guess maybe phpmyadmin can stay too.

sudo mkdir bellman_mysql
cd bellman_mysql
for i in asterisk mysql owncloud phpmyadmin yaris ; do 
  mysqldump "$i" > "$i.sql"
done
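
A slightly more defensive version of the loop might look like this. dump_databases is a hypothetical helper name. --single-transaction is a real mysqldump option that takes a consistent snapshot of InnoDB tables, but whether these databases use InnoDB is an assumption.

```shell
# dump_databases OUTDIR DB...: write OUTDIR/DB.sql for each database and
# stop at the first failure so a broken dump does not go unnoticed.
dump_databases() {
    outdir=$1
    shift
    for db in "$@"; do
        if ! mysqldump --single-transaction "$db" > "$outdir/$db.sql"; then
            echo "dump of $db failed" >&2
            return 1
        fi
    done
}

# Example:
#   dump_databases /green/BACKUPS/bellman_mysql asterisk owncloud phpmyadmin
```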

BBR congestion

See https://www.cyberciti.biz/cloud-computing/increase-your-linux-server-internet-speed-with-tcp-bbr-congestion-control/ for example.

Is kernel ready?

uname -a
Linux bellman 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux
grep 'CONFIG_TCP_CONG_BBR' /boot/config-$(uname -r)
grep 'CONFIG_NET_SCH_FQ' /boot/config-$(uname -r)
grep -E 'CONFIG_TCP_CONG_BBR|CONFIG_NET_SCH_FQ' /boot/config-$(uname -r)
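
The greps above can be folded into a single yes/no check. A sketch, with kernel_has_bbr as a made-up function name; it accepts either =y (built in) or =m (module) for both options.

```shell
# kernel_has_bbr CONFIG: succeed if CONFIG enables both BBR and the fq qdisc.
kernel_has_bbr() {
    config=$1
    grep -Eq '^CONFIG_TCP_CONG_BBR=[ym]$' "$config" &&
    grep -Eq '^CONFIG_NET_SCH_FQ=[ym]$' "$config"
}

# Example:
#   kernel_has_bbr "/boot/config-$(uname -r)" && echo "kernel is ready for BBR"
```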

sudo -s 
cat > /etc/sysctl.d/10-custom-kernel-bbr.conf <<EOF
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
EOF

sysctl --system
* Applying /etc/sysctl.d/10-custom-kernel-bbr.conf ...
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
* Applying /etc/sysctl.d/30-postgresql-shm.conf ...
* Applying /etc/sysctl.d/99-sysctl.conf ...
net.ipv4.ip_forward = 1
* Applying /etc/sysctl.d/asterisk.conf ...
kernel.core_uses_pid = 1
kernel.core_pattern = /tmp/core-%e-%s-%u-%g-%p-%t
fs.suid_dumpable = 2
* Applying /etc/sysctl.conf ...
net.ipv4.ip_forward = 1

That's that.

Services that run here

  • git (see Running my own git server)
  • gpsd. Well, not in Astoria, not yet anyway. No antennas in this rental. :-(
  • cups to share Brother and Canon printers
  • ssh to allow remote access
  • fail2ban to cut off break-in attempts via ssh

I wasted time trying to get "nut" installed to monitor the Cyberpower 1500AVR and failed. It's not worth the effort. (2017-09-07)

Network syslog

To allow devices such as Grandstream GXV3240 phones to send logging information, I enable rsyslog reception from remote hosts. In /etc/rsyslog.conf you must uncomment 2 lines and restart rsyslogd:

# provides UDP syslog reception
module(load="imudp")
input(type="imudp" port="514")
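
If the stock file still has these two lines commented out with a leading "#" (which, as far as I know, is the Debian and Ubuntu default), the edit can be scripted. A sketch only: enable_udp_syslog is a hypothetical helper name, and it keeps a .bak copy of the file it edits.

```shell
# enable_udp_syslog CONF: uncomment the imudp module and input lines in CONF.
enable_udp_syslog() {
    conf=$1
    sed -i.bak \
        -e 's/^#[[:space:]]*\(module(load="imudp")\)/\1/' \
        -e 's/^#[[:space:]]*\(input(type="imudp" port="514")\)/\1/' \
        "$conf"
}

# Example (as root):
#   enable_udp_syslog /etc/rsyslog.conf && systemctl restart rsyslog
```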

Services that run in Docker containers

Install Docker: https://docs.docker.com/install/linux/docker-ce/debian/

Set up /etc/default/docker - use this

DOCKER_OPTS="--dns 192.168.123.2"

Use the docker "--restart" option to run services; it's far easier than messing with systemd config files.
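
As an illustration of the --restart approach, a small wrapper might look like this. run_service is a made-up helper name, and the choice of the unless-stopped policy is an assumption (always and on-failure are the other common policies).

```shell
# run_service NAME IMAGE [EXTRA docker-run ARGS...]
# Start a detached container that Docker brings back after a crash or reboot,
# unless it was stopped by hand.
run_service() {
    name=$1
    image=$2
    shift 2
    docker run --detach --restart unless-stopped --name "$name" "$@" "$image"
}

# Example:
#   run_service web nginx:stable --publish 80:80
```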

Service             Notes
Asterisk in Docker  always a work in progress
Dnsmasq             Stack deploy
ElasticSearch       Kibana and ElasticSearch in compose.
GeoServer           includes PostGIS, GeoServer, Nginx in compose.
Home Assistant      Stack deploy includes Mosquitto and Node Red
Squeezebox          Squeezebox server (SqueezeBox). See Streaming media for installation notes.
Nginx               web server.
Unifi               manage my Ubiquiti WiFi access point.
Vault               secure storage of credentials.

Other things come and go. I am working on the GeoServer Docker services now.

History

2020-08-05 Intel NUC edition hardware arrived.

2020-07-29 Bellman, Supermicro edition completely went dark. I set up Tern as a temporary replacement and ordered an Intel NUC.

2020-01-20 - started generating errors: NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0. Then it would not restart. I ended up replacing the CR2032 (did not help) and then reseating the connectors. Fixed. This system is getting long in the tooth.

2019-10-04 - fix for networking, applied on Dart too.

update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy

2019-10-03 - Upgraded to Debian Buster (10) and this broke Docker because of the move from iptables to nftables. Must fix ASAP. All Dockers are broken. How dismaying. Repeat after me: "It's only a hobby."

2019-05-09 - Added docker-compose for elasticsearch and per recommendations of ElasticSearch docker docs, changed vm.max_map_count; it was 65530 and I set it: sysctl -w vm.max_map_count=262144

2019-04-26 - Moving NVMe WDC Black drive from Murre to Bellman.

2018-03-20 - Installed 8TB Archive drive, for TimeMachine and Owncloud storage. Moved from 120GB SSD to 750GB Samsung Evo 840. Installed clean copy of Stretch on the SSD.

2017-09-06 - Upgrade to 32GB RAM, yay! I need to do something with all that space. I did move /tmp to RAM; see SSD optimizations. I also removed a lot of dead code including lightdm (how'd that get in there?)

bwilson@bellman:~$ free
              total        used        free      shared  buff/cache   available
Mem:       32937080     2287376    27811208       25700     2838496    30153064


2017-08-25 - Migrated mariadb and owncloud to Docker

2017-07-25 - Migrated logitech media server to Docker

2017-07-25 - Upgraded to Debian 9 (Stretch)

2016-10-16 - Seeing disk errors in the WDC. It's 6 years old! REPLACE!!! Installed new Seagate Barracuda ST2000DM006 2TB $70 10-26-16 Added a fan in the hard drive section of the case, too.

2016-01-26 - Installed VirtualBox 5.0.14 and Vagrant 1.8.1 (from DEB files, repos are too old) and started migration of services.

2015-12-?? - Moved to hardware formerly used for Vastra2

2015-07-10 - Added lm-sensors and added temperature tracking to Cacti.

2015-07-01 - Replaced APC UPS with Cyberpower. Installed monitoring software.

2015-06-19 - reconnected the MX330 printer and shared it.

2015-06-18 - upgraded to Debian 8 Jessie

2013-12-29 - returned from X-Mas and discovered Bellman won't boot. Snarks about a degraded RAID. Darn.

2013 Mar - Installed Linux Mint 14 so that I could use Makerware with my new Replicator 2

2013 Jan - Seagate Barracuda 2TB Green drive died. ST2000DL003 S/N 5YD77CTE Replaced with a Barracuda 2TB mirror

2011 Dec - Been doing PostGIS experiments so I upgraded the hardware.

2010 Jan - I just started this section but I have had this machine online for at least a couple years now.

2015-06-19 back up

Note this includes /home but not /green.

cd /
tar --one-file-system -czvf /mnt/bellman_root.tar.gz .
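
It is worth checking that the archive can be read back before relying on it. A minimal sketch, with verify_archive as a hypothetical helper name: gzip -t tests the compressed stream and tar -t walks the member list without extracting anything.

```shell
# verify_archive FILE: succeed only if FILE is a readable gzip-compressed tar.
verify_archive() {
    gzip -t "$1" && tar -tzf "$1" > /dev/null
}

# Example:
#   verify_archive /mnt/bellman_root.tar.gz && echo "archive looks readable"
```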

2013-12-29 Rescue from boot fail

I no longer need a desktop environment on the small server, because I moved my main desktop next to the 3D printer. So I put Debian back on the server again. So I am going to try a Debian rescue image.

Diagnosis

Step 1. Build rescue thumbdrive. Download from http://debian.osuosl.org/ and copy image to thumbdrive

sudo cp debian-live-7.2-amd64-rescue.iso /dev/sdX
sudo sync
sudo eject /dev/sdX

where X is the appropriate drive letter. Do NOT use the wrong letter!
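
One way to guard against the wrong letter is to refuse any target the kernel does not flag as removable. This is a sketch under assumptions: is_removable and write_image are made-up names, SYSFS_ROOT exists only so the check can be tried against a fake tree, and some USB disks report removable=0, so a refusal is not proof the letter is wrong.

```shell
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# is_removable DEV: succeed if the kernel marks DEV (e.g. /dev/sdd) as removable.
is_removable() {
    dev=${1##*/}                                  # /dev/sdd -> sdd
    flag="$SYSFS_ROOT/block/$dev/removable"
    [ -r "$flag" ] && [ "$(cat "$flag")" = 1 ]
}

# write_image IMAGE DEV: copy IMAGE onto DEV, but only if DEV is removable.
write_image() {
    image=$1
    target=$2
    if ! is_removable "$target"; then
        echo "refusing to write: $target is not marked removable" >&2
        return 1
    fi
    sudo cp "$image" "$target" && sudo sync
}
```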

Step 2. Boot Bellman with the thumb drive

Step 3. Look around

Using hdparm -i

  • sda Vertex SSD S/N OCZ-9UDI676M56Z4IR8P
  • sdb Seagate 2TB ST2000DM001-9YN164 S/N Z240BVP5
  • sdc Seagate 2TB ST2000DM001-9YN164 S/N Z240A0H1
  • sdd rescue drive

# fdisk -l /dev/sda

Disk /dev/sda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009c7c9

  Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   218460159   109229056   83  Linux
/dev/sda2       218462206   234440703     7989249    5  Extended
/dev/sda5       218462208   234440703     7989248   82  Linux swap / Solaris

sdb and sdc don't have partition tables because they are used in a RAID (see the 2013 Jan entry).

See LVM page

cat /proc/mdstat 
Personalities : [raid1] 
md126 : active raid1 sda[1]
      117218240 blocks [2/1] [_U]
      
md127 : active raid1 sdb[0] sdc[1]
      1953514496 blocks [2/2] [UU]
      
unused devices: <none>

mdadm --detail /dev/md126
/dev/md126:
        Version : 0.90
  Creation Time : Thu Feb 21 06:23:36 2013
     Raid Level : raid1
     Array Size : 117218240 (111.79 GiB 120.03 GB)
  Used Dev Size : 117218240 (111.79 GiB 120.03 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 126
    Persistence : Superblock is persistent

    Update Time : Thu Feb 21 06:30:49 2013
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 9f48e120:81a0f612:edd8d016:611227ea
         Events : 0.12

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        0        1      active sync   /dev/sda

mdadm --detail /dev/md127
/dev/md127:
        Version : 0.90
  Creation Time : Mon Jan  7 04:12:45 2013
     Raid Level : raid1
     Array Size : 1953514496 (1863.02 GiB 2000.40 GB)
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 127
    Persistence : Superblock is persistent

    Update Time : Mon Dec 30 17:21:21 2013
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 462f6c0c:68770b3a:b268e686:64f77a36
         Events : 0.131

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc

Looks like there are 2 RAIDs, and md126 is the broken one. It should be the SSD and something else? Time to open the box and see what's in there.
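
Spotting the degraded array by eye works for two arrays, but the same check is easy to script. A sketch, with degraded_arrays as a made-up name: it prints every md device whose status brackets in /proc/mdstat contain an underscore (a missing member).

```shell
# degraded_arrays [FILE]: list md devices with a missing member.
# FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                     # remember the current array
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }    # e.g. [_U] or [U_]
    ' "${1:-/proc/mdstat}"
}
```

Against the mdstat output shown above, this prints md126 and nothing else.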

fdisk /dev/md126

Command (m for help): p

Disk /dev/md126: 120.0 GB, 120031477760 bytes
255 heads, 63 sectors/track, 14592 cylinders, total 234436480 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009c7c9

      Device Boot      Start         End      Blocks   Id  System
/dev/md126p1   *        2048   218460159   109229056   83  Linux
/dev/md126p2       218462206   234440703     7989249    5  Extended
/dev/md126p5       218462208   234440703     7989248   82  Linux swap / Solaris

Command (m for help): 

Conclusion - I was planning on doing a RAID mirror and never got the second drive installed. I think I might have used it in Steller instead. Steller's drive failed and needed immediate replacement. Something failed on the SSD and now it's not booting, but from what I can tell this has nothing to do with the hardware. It complains about the RAID missing a drive, but that's not new.

2014 Jan 01 rebuild

Do as in the Linux Mint section below

Also note:

PRESERVE MYSQL!!

/etc/hdparm.conf

2013 Jan data mirror build

apt-get install mdadm lvm2
mdadm --create --metadata=0.90 --level=mirror --raid-devices=2 /dev/md0 /dev/sdb /dev/sdc
cat /proc/mdstat 
pvcreate /dev/md0 
vgcreate vg_mirror /dev/md0 
lvcreate --verbose --extents 100%FREE -n lv_mirror vg_mirror
mkfs.ext4 /dev/vg_mirror/lv_mirror 
mount /dev/vg_mirror/lv_mirror /green
dd if=/dev/zero of=/green/swapfile1 bs=1024 count=1048576
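
The dd above only creates the file; the notes do not record how it was turned into swap. Presumably the remaining steps were along these lines. This is a sketch, not a record of what was actually run: make_swapfile is a hypothetical name, and it assumes GNU dd and root privileges.

```shell
# make_swapfile PATH SIZE_MIB: create a zero-filled file, lock down its
# permissions, format it as swap, and enable it.
make_swapfile() {
    path=$1
    size_mib=$2
    dd if=/dev/zero of="$path" bs=1M count="$size_mib" status=none &&
    chmod 600 "$path" &&
    mkswap "$path" &&
    swapon "$path"
}

# Example (as root), matching the 1 GiB file above:
#   make_swapfile /green/swapfile1 1024
```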

2013 Mar Linux Mint rebuild

Had to install mdadm and lvm2, but then it recognized the LVM drives. All I had to do was mount the RAID on /green.

sudo apt-get install synaptic nfs-kernel-server ssh mysql-server phpmyadmin ntp winbind smartmontools postfix

Re-install dropbox

Re-install squeezeboxserver from Logitech. http://bellman:9000/

Set up cups again

Copy over /etc/exports file

Need AFP support for Apple Time Machine. See Netatalk 3 on Debian.

December 2011 upgrade

Bellman had an Intel Little Falls Atom 230 mini-itx main board + 2GB RAM until Dec 2011. Bellman used to be an Athlon desktop system, I recycled the name because I like it.

Hardware

Intel NUC edition, born 8/5/2020

  • Intel NUC10i5FNH (Newegg 7/29/20)
  • Kingston Technology Corp. HX429S17IBK2/32 32GB 2933MHZ DDR4 (2 16GB SODIMM) (Newegg 7/29/20)
  • WD Black 512GB Performance SSD - M.2 2280 PCIe NVMe Solid State Drive - WDS512G1X0C (moved from Supermicro)
  • Corsair Neutron 240GB, was in Dart once upon a time
  • Seagate Archive 8TB drive in external container

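Write cache ON and spin-down allowed for the Seagate HD. Note that -B120 sets the APM level (values below 128 permit spin-down); a 10-minute standby timeout would be -S120 (120 x 5 seconds):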

cat >> /etc/rc.local
hdparm -W1 -B120 /dev/sdb

Supermicro version, deceased 7/29/20

Local copy of motherboard manual File:SuperMicro SYS5018A.pdf

Audio output: NuForce UDAC 2

Newegg 03/26/2019 2 Noctua 40mm x 20mm system fans
Newegg 09/03/2017 Inv 153021116
Newegg 10/16/2016 Inv 143374043
Newegg 11/21/2014 Inv 120335149

  • SUPERMICRO SYS-5018A-FTN4 1U Rackmount Server Barebone FCBGA 1283 DDR3 1600/1333
  • SUPERMICRO MCP-220-00051-0N Single 2.5" Fixed HDD Mounting Bracket
  • 4 x Kingston 8GB 204-Pin DDR3 SO-DIMM ECC Unbuffered DDR3 1600 (PC3 12800) Server Memory Model KVR16LSE11 (3 added 2017-09-07)
  • sda = Samsung SSD 840 EVO 750GB
  • sdb = Seagate Archive 8TB (Installed 3/18/18, purchased 9/03/17)
  • WD Black 512GB Performance SSD - M.2 2280 PCIe NVMe Solid State Drive - WDS512G1X0C (moved from Murre)

eth0 00:25:90:F7:37:72

Bellman is configured to bring up a management interface on this ethernet interface too. (Optionally there is a separate management interface. This server has 5 ethernet ports, 4 on the motherboard and 1 on the management card.) Since it's a Supermicro possibly it can be pwned at any time by the Chinese. Since I don't use Bellman to control centrifuges or do weapons research I live with it.

Spin down the Seagate drive

To reduce wear on the spinning hard drive, I am setting "apm" down to 127 (default is 254) so that it can spin down the drive. This should make it last longer.

smartctl -s apm,127 /dev/sdb

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
APM set to level 127 (intermediate level with standby)

smartctl -A /dev/sdb

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   100   006    Pre-fail  Always       -       92858576
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       149013
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       619
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       32
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   056   045    Old_age   Always       -       35 (Min/Max 19/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       67
194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (0 17 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       619 (178 156 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1169083792
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       156038821

Automatic boot

I know it's possible to get this system to boot every day at a specific time because it's set to do that right now. I cannot find the setting! It's not in BIOS anywhere that I can see and I can't find it in ipmitool either.

Sometimes I shut Bellman down at night, but it needs to boot in the morning before we get up so that the Logitech radio will work. The way to set it is NOT from BIOS, there is no user interface there. It's not from the IPMI web page either.

Maybe it's in here http://www.accuratesolution.net/asd/resume.htm

IPMI

Tips from Oracle: http://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html

Using ipmitool you can connect remotely so if the system is off you can turn it on. This means I could just script the turn on from another server...

ipmitool -H 192.168.1.3 -U ADMIN -P password chassis status
System Power         : off
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : always-off
Last Power Event     : 
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
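
Scripting the power-on from another server could look roughly like this. A sketch under assumptions: power_on_if_off is a made-up name, the lanplus interface is assumed to work for this BMC, and the password is read from the IPMI_PASSWORD environment variable (ipmitool's -E option) so it stays out of the process list.

```shell
# power_on_if_off HOST USER: turn the chassis on only if it reports being off.
# Expects the BMC password in the IPMI_PASSWORD environment variable.
power_on_if_off() {
    host=$1
    user=$2
    if ipmitool -I lanplus -H "$host" -U "$user" -E chassis power status |
            grep -q 'is off'; then
        ipmitool -I lanplus -H "$host" -U "$user" -E chassis power on
    fi
}

# Example, e.g. from cron on another machine:
#   IPMI_PASSWORD=... power_on_if_off 192.168.1.3 ADMIN
```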

Read environmental sensors

ipmitool -I lanplus -H 192.168.1.3 -P password -U ADMIN sdr elist full

CPU Temp         | 01h | lnr |  3.1 | 36 degrees C
System Temp      | 0Bh | ok  |  7.1 | 33 degrees C
Peripheral Temp  | 0Ch | ok  |  7.2 | 34 degrees C
DIMMA1 Temp      | B0h | ok  | 32.64 | 29 degrees C
DIMMA2 Temp      | B1h | ns  | 32.65 | No Reading
DIMMB1 Temp      | B4h | ns  | 32.68 | No Reading
DIMMB2 Temp      | B5h | ns  | 32.69 | No Reading
FAN1             | 41h | ok  | 29.1 | 3200 RPM
FAN2             | 42h | ns  | 29.2 | No Reading
FAN3             | 43h | ns  | 29.3 | No Reading
VCCP             | 20h | ok  |  3.2 | 0.82 Volts
VDIMM            | 24h | ok  | 32.1 | 1.33 Volts
12V              | 30h | ok  |  7.17 | 12.32 Volts
5VCC             | 31h | ok  |  7.33 | 4.95 Volts
3.3VCC           | 32h | ok  |  7.32 | 3.30 Volts
VBAT             | 33h | ok  |  7.18 | 2.97 Volts
5V Dual          | 37h | ok  |  7.15 | 4.95 Volts
3.3V AUX         | 38h | ok  |  7.12 | 3.28 Volts
Chassis Intru    | AAh | ok  | 23.1 | 
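
The sensor listing is easy to filter, for example to flag anything running warm. A sketch: hot_sensors is a hypothetical name and the threshold is whatever you pass in; it reads the "sdr elist" output on standard input and only looks at rows whose reading ends in "degrees C".

```shell
# hot_sensors LIMIT: print "NAME: READING" for temperature sensors above LIMIT.
hot_sensors() {
    awk -F'|' -v limit="$1" '
        $5 ~ /degrees C/ {
            split($5, f, " ")              # f[1] is the numeric reading
            if (f[1] + 0 > limit) {
                gsub(/ +$/, "", $1)        # trim padding after the sensor name
                print $1 ": " f[1]
            }
        }'
}

# Example (password taken from IPMI_PASSWORD via -E):
#   ipmitool -I lanplus -H 192.168.1.3 -U ADMIN -E sdr elist full | hot_sensors 40
```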

System event log (SEL)

ipmitool -I lanplus -H 192.168.1.3 -P password -U ADMIN sel list last 10

  43 |  Pre-Init  |0004692099| Unknown #0xff |  | Asserted
  44 |  Pre-Init  |0004692100| Unknown #0xff |  | Asserted
  45 |  Pre-Init  |0004692106| Unknown #0xff |  | Asserted
  46 |  Pre-Init  |0004692108| Unknown #0xff |  | Asserted
  47 |  Pre-Init  |0004692109| Unknown #0xff |  | Asserted
  48 |  Pre-Init  |0004692110| Unknown #0xff |  | Asserted
  49 |  Pre-Init  |0004692116| Unknown #0xff |  | Asserted
  4a |  Pre-Init  |0004692118| Unknown #0xff |  | Asserted
  4b |  Pre-Init  |0004692119| Unknown #0xff |  | Asserted
  4c |  Pre-Init  |0004692120| Unknown #0xff |  | Asserted