Bellman

From Wildsong Wiki

Bellman is a Debian Linux server. It is on a UPS and lives in my electronics lab.


"What's the good of Mercator's North Poles and Equators,
Tropics, Zones, and Meridian Lines?"
So the Bellman would cry: and the crew would reply
"They are merely conventional signs!"
--Lewis Carroll, The Hunting of the Snark

Services that run here

I am migrating some, perhaps most, of these services into Vagrant/VirtualBox virtual machines.

  • Asterisk to run our phones
  • Festival for text to speech in Asterisk
  • MySQL for Asterisk and ownCloud
  • ownCloud server
  • Nginx for Munin and ownCloud
  • CUPS to share Brother and Canon printers
  • Logitech Squeezebox Server (SqueezeBox). See Streaming media for installation notes.
  • ssh to allow remote access
  • fail2ban to cut off break-in attempts via ssh
  • UPS daemon to monitor the UPS
  • Netatalk to support Time Machine backups

Additional software tools installed

  • X11 desktop so I can use it from my workbench.

History

2016-10-16 - Seeing disk errors on the WDC drive. It's 6 years old! REPLACE!!! On 10-26-16 I installed a new Seagate Barracuda ST2000DM006 2TB ($70) and added a fan in the hard drive section of the case, too.

2016-01-26 - Installed VirtualBox 5.0.14 and Vagrant 1.8.1 (from DEB files, repos are too old) and started migration of services.

2015-12-?? - Moved to hardware formerly used for Vastra2

2015-07-10 - Added lm-sensors and added temperature tracking to Cacti.

2015-07-01 - Replaced APC UPS with Cyberpower. Installed monitoring software.

2015-06-19 - reconnected the MX330 printer and shared it.

2015-06-18 - upgraded to Debian 8 Jessie

2013-12-29 - returned from X-Mas and discovered Bellman won't boot. Snarks about a degraded RAID. Darn.

2013 Mar - Installed Linux Mint 14 so that I could use Makerware with my new Replicator 2

2013 Jan - Seagate Barracuda 2TB Green drive died. ST2000DL003 S/N 5YD77CTE Replaced with a Barracuda 2TB mirror

2011 Dec - Been doing PostGIS experiments so I upgraded the hardware.

2010 Jan - I just started this section but I have had this machine online for at least a couple years now.

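2015-06-19 backup

Note: this includes /home but not /green, because --one-file-system keeps tar from descending into other mounted filesystems.

cd /
# GNU tar only accepts bundled old-style options (czvf) as the first argument, so use -czvf here.
tar --one-file-system -czvf /mnt/bellman_root.tar.gz .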
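The same backup can be wrapped in a helper that puts the date in the archive name, so an older backup is not overwritten by the next one. A minimal sketch, assuming GNU tar and a destination on a different filesystem; `backup_tree` and the dated file name are hypothetical, not what was run here.

```shell
# Hypothetical helper: archive one filesystem into a dated tarball.
#   $1 = directory to archive (e.g. /)
#   $2 = destination directory (ideally on a different filesystem)
# --one-file-system keeps tar from descending into other mounts such as /green.
backup_tree() {
    src=$1
    dest_dir=$2
    archive="$dest_dir/bellman_root-$(date +%F).tar.gz"
    # -C changes into $src first, so the archive stores relative paths.
    tar --one-file-system -czf "$archive" -C "$src" . || return 1
    echo "$archive"
}
```

From a root shell, `backup_tree / /mnt` would write a file such as /mnt/bellman_root-2015-06-19.tar.gz and print its path.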

2013-12-29 Rescue from boot fail

I no longer need a desktop environment on the small server, because I moved my main desktop next to the 3D printer, so I put Debian back on the server. Since it runs Debian again, I am going to try a Debian rescue image.

Diagnosis

Step 1. Build a rescue thumbdrive. Download the rescue image from http://debian.osuosl.org/ and copy it to the thumbdrive:

sudo cp debian-live-7.2-amd64-rescue.iso /dev/sdX
sudo sync
sudo eject /dev/sdX

where X is the thumbdrive's drive letter. Check it carefully: copying the image to the wrong letter overwrites that disk!
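One way to double-check the letter is to list only USB-attached disks along with their sizes. A minimal sketch, assuming util-linux `lsblk` with its TRAN (transport) column; `usb_disks` is a hypothetical helper.

```shell
# Hypothetical helper: print whole disks whose transport is USB, with their size.
# Reads `lsblk -d -n -o NAME,TRAN,SIZE` output on stdin.
usb_disks() {
    awk '$2 == "usb" { print "/dev/" $1, $3 }'
}
```

Usage: `lsblk -d -n -o NAME,TRAN,SIZE | usb_disks`. A USB hard drive would be listed too, so the size is still worth checking before running cp.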

Step 2. Boot Bellman with the thumbdrive

Step 3. Look around

Drives, as identified with hdparm -i:

  • sda Vertex SSD S/N OCZ-9UDI676M56Z4IR8P
  • sdb Seagate 2TB ST2000DM001-9YN164 S/N Z240BVP5
  • sdc Seagate 2TB ST2000DM001-9YN164 S/N Z240A0H1
  • sdd rescue drive
# fdisk -l /dev/sda

Disk /dev/sda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009c7c9

  Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   218460159   109229056   83  Linux
/dev/sda2       218462206   234440703     7989249    5  Extended
/dev/sda5       218462208   234440703     7989248   82  Linux swap / Solaris
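fdisk counts in 512-byte sectors with End inclusive, and reports Blocks in 1 KiB units, so each partition's size follows from its Start and End. A worked check using a hypothetical helper, `part_size_mib`:

```shell
# Hypothetical helper: size in MiB of a partition from its fdisk Start/End sectors.
# End is inclusive, so the sector count is end - start + 1;
# 2048 sectors of 512 bytes make 1 MiB.
part_size_mib() {
    start=$1
    end=$2
    echo $(( (end - start + 1) / 2048 ))
}
```

For sda1, `part_size_mib 2048 218460159` gives 106669 MiB (about 104 GiB), and its 218458112 sectors halved are the 109229056 shown in the Blocks column. The swap partition sda5 comes out to 7802 MiB.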

sdb and sdc don't have partition tables as they are used in a RAID (see 2013 Jan entry)

See LVM page

cat /proc/mdstat 
Personalities : [raid1] 
md126 : active raid1 sda[1]
      117218240 blocks [2/1] [_U]
      
md127 : active raid1 sdb[0] sdc[1]
      1953514496 blocks [2/2] [UU]
      
unused devices: <none>
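The `[_U]` on md126's status line is what marks it as degraded: each character is one member slot, `U` for up and `_` for missing. A small sketch that picks degraded arrays out of mdstat-format text, assuming the layout shown above where the status line follows the array's own line; `degraded_arrays` is a hypothetical helper.

```shell
# Hypothetical helper: print the names of md arrays with a missing or failed member.
#   $1 = mdstat-format file (defaults to the live /proc/mdstat)
# A status such as [_U] or [U_] contains an underscore for each missing slot.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /\[[U_]*_[U_]*]/ { if (name != "") print name }
    ' "${1:-/proc/mdstat}"
}
```

With no argument it reads /proc/mdstat; on the output above it would print md126.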

mdadm --detail /dev/md126
/dev/md126:
        Version : 0.90
  Creation Time : Thu Feb 21 06:23:36 2013
     Raid Level : raid1
     Array Size : 117218240 (111.79 GiB 120.03 GB)
  Used Dev Size : 117218240 (111.79 GiB 120.03 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 126
    Persistence : Superblock is persistent

    Update Time : Thu Feb 21 06:30:49 2013
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 9f48e120:81a0f612:edd8d016:611227ea
         Events : 0.12

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        0        1      active sync   /dev/sda

mdadm --detail /dev/md127
/dev/md127:
        Version : 0.90
  Creation Time : Mon Jan  7 04:12:45 2013
     Raid Level : raid1
     Array Size : 1953514496 (1863.02 GiB 2000.40 GB)
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 127
    Persistence : Superblock is persistent

    Update Time : Mon Dec 30 17:21:21 2013
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 462f6c0c:68770b3a:b268e686:64f77a36
         Events : 0.131

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc

There are two RAID arrays, and md126 is the broken one. It should be the SSD plus a second drive, but which one? Time to open the box and see what's in there.

fdisk /dev/md126

Command (m for help): p

Disk /dev/md126: 120.0 GB, 120031477760 bytes
255 heads, 63 sectors/track, 14592 cylinders, total 234436480 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009c7c9

      Device Boot      Start         End      Blocks   Id  System
/dev/md126p1   *        2048   218460159   109229056   83  Linux
/dev/md126p2       218462206   234440703     7989249    5  Extended
/dev/md126p5       218462208   234440703     7989248   82  Linux swap / Solaris

Command (m for help): 

Conclusion: I was planning on a RAID mirror and never got the second drive installed. I think I might have used it in Stellar instead; Stellar's drive failed and needed immediate replacement. Something on the SSD broke and now it won't boot, but from what I can tell this has nothing to do with the hardware. It complains about the RAID missing a drive, but that's not new.

2014 Jan 01 rebuild

Follow the steps in the Linux Mint section below.

Also note:

PRESERVE MYSQL!!

/etc/hdparm.conf

2013 Jan data mirror build

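apt-get install mdadm lvm2
# Mirror the two whole disks (no partition tables) using 0.90 metadata.
mdadm --create --metadata=0.90 --level=mirror --raid-devices=2 /dev/md0 /dev/sdb /dev/sdc
cat /proc/mdstat
# Put LVM on top of the mirror and give all of it to one logical volume.
pvcreate /dev/md0
vgcreate vg_mirror /dev/md0
lvcreate --verbose --extents 100%FREE -n lv_mirror vg_mirror
mkfs.ext4 /dev/vg_mirror/lv_mirror
mkdir -p /green
mount /dev/vg_mirror/lv_mirror /green
# 1 GiB swap file (1024-byte blocks x 1048576); it must be formatted and enabled before use.
dd if=/dev/zero of=/green/swapfile1 bs=1024 count=1048576
chmod 600 /green/swapfile1
mkswap /green/swapfile1
swapon /green/swapfile1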

2013 Mar Linux Mint rebuild

I had to install mdadm and lvm2, but after that it recognized the LVM volumes. All I had to do was mount the RAID on /green.

sudo apt-get install synaptic nfs-kernel-server ssh mysql-server phpmyadmin ntp winbind smartmontools postfix

Re-install Dropbox

Re-install Squeezebox Server from Logitech. http://bellman:9000/

Set up CUPS again

Copy over the /etc/exports file

Need AFP support for Apple Time Machine. See Netatalk 3 on Debian

December 2011 upgrade

Bellman had an Intel Little Falls Atom 230 mini-ITX motherboard and 2GB RAM until Dec 2011. The name Bellman originally belonged to an Athlon desktop system; I recycled it because I like it.

Hardware

Newegg 10/16/2016 Inv 143374043

  • sdb = Seagate BarraCuda ST2000DM006 2TB 64MB (Installed 10-26-16)

Newegg 11/21/2014 Inv #120335149

  • SUPERMICRO SYS-5018A-FTN4 1U Rackmount Server Barebone FCBGA 1283 DDR3 1600/1333
  • SUPERMICRO MCP-220-00051-0N Single 2.5" Fixed HDD Mounting Bracket
  • Kingston 8GB 204-Pin DDR3 SO-DIMM ECC Unbuffered DDR3 1600 (PC3 12800) Server Memory Model KVR16LSE11/8KF
  • sda = Samsung MZ7WD120HCFV-00003 120GB

eth0 00:25:90:F7:37:72

Bellman is configured to bring up the management (IPMI) interface on this Ethernet port too. (Optionally, the management interface can use its own port: this server has 5 Ethernet ports, 4 on the motherboard and 1 on the management card.)

Operating system

  • Debian 8

I'm using Btrfs on the Seagate drive now, mostly to be consistent with Tern, though this is a single drive rather than RAID 0. I partitioned the Seagate this time: partition 1 is 50GB, possibly for an OS install; partition 2 is 50GB of swap; and partition 3 is data (/green).

fstab


Printing

Canon MX330 "All in one": CUPS finds it and sets it up when you plug it in and power it on.

This is my current /etc/cups/printers.conf

# Written by cupsd
# DO NOT EDIT THIS FILE WHEN CUPSD IS RUNNING
<Printer Brother_HL-2140_series>
UUID urn:uuid:24067d9a-1b41-370d-5ecf-dbb408aaa659
Info Brother HL-2140 series
Location Electronic Chronometry Laboratory
MakeModel Brother HL-2140 Foomatic/hl1250
DeviceURI usb://Brother/HL-2140%20series?serial=J8J894840
State Idle
StateTime 1388723199
Type 8433668
Accepting Yes
Shared Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
OpPolicy default
ErrorPolicy stop-printer
</Printer>
<Printer MX330-series>
UUID urn:uuid:54a86dc0-0994-37af-7d65-f084999a7307
Info Canon MX330 series
Location Electronic Chronometry Laboratory
MakeModel Canon PIXMA MX330 - CUPS+Gutenprint v5.2.9
DeviceURI usb://Canon/MX330%20series?serial=22F601&interface=1
State Idle
StateTime 1388637781
Type 4
Accepting Yes
Shared Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
OpPolicy default
ErrorPolicy retry-job
</Printer>
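Which queues are shared can be read straight from that file. A sketch, assuming the printers.conf layout shown above (the file is normally readable only by root); `shared_printers` is a hypothetical helper.

```shell
# Hypothetical helper: list CUPS queues marked "Shared Yes" in a printers.conf file.
#   $1 = file to read (defaults to /etc/cups/printers.conf)
# Handles both <Printer name> and <DefaultPrinter name> sections.
shared_printers() {
    awk '
        /^<(Default)?Printer / { name = $2; sub(/>$/, "", name) }
        /^Shared Yes/ { print name }
    ' "${1:-/etc/cups/printers.conf}"
}
```

Against the file above it would print both Brother_HL-2140_series and MX330-series.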

Software

Media server: it hosts my music collection. I keep the files in MP3 format, ripped from my CDs using Grip. See Music collection.

File server: I keep my home directory here and NFS mount it on the desktop machine Raven. Bellman also runs Samba so that my laptop can access files on it.

I edit files with emacs23

Spin down the Seagate drive

To reduce wear on the spinning hard drive, I set its Advanced Power Management (APM) level down to 127 (the default is 254). Levels 1 to 127 allow the drive to spin down; 128 to 254 do not. I use the server mostly for ownCloud and media storage, so the drive can sleep at night and during long breaks, which should make it last longer.

smartctl -s apm,127 /dev/sdb

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
APM set to level 127 (intermediate level with standby)

smartctl -A /dev/sdb

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   100   006    Pre-fail  Always       -       92858576
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       149013
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       619
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       32
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   056   045    Old_age   Always       -       35 (Min/Max 19/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       67
194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (0 17 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       619 (178 156 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1169083792
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       156038821
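Since the APM change is about reducing wear, Start_Stop_Count is a counter worth watching over time, and the reallocated, pending, and offline-uncorrectable sector counts are the usual early warnings of a failing disk. A sketch of two hypothetical helpers, assuming the column layout above, where RAW_VALUE is the tenth field:

```shell
# Hypothetical helper: print the RAW_VALUE of one attribute from `smartctl -A` output on stdin.
smart_raw() {
    awk -v attr="$1" '$2 == attr { print $10 }'
}

# Hypothetical check: print any nonzero sector-error counters as NAME=VALUE.
smart_warnings() {
    awk '
        $2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) print $2 "=" $10
        }
    '
}
```

On the output above, `smartctl -A /dev/sdb | smart_raw Start_Stop_Count` prints 32, and `smartctl -A /dev/sdb | smart_warnings` prints nothing while those counters are zero.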

Automatic boot

I know it's possible to get this system to boot every day at a specific time, because it's set to do that right now. I cannot find the setting! It's not anywhere in the BIOS that I can see, and I can't find it in ipmitool either.

Sometimes I shut Bellman down at night, but it needs to boot in the morning before we get up so that the Logitech radio will work. The setting is NOT in the BIOS; there is no user interface for it there. It's not on the IPMI web page either.

Maybe it's in here http://www.accuratesolution.net/asd/resume.htm
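One alternative, not something tried here, is to set the RTC wake alarm from Linux right before shutting down. Whether this board powers on from the RTC alarm after a full shutdown depends on its firmware. A sketch, assuming GNU date; `next_wake_epoch` is a hypothetical helper that computes the next occurrence of a clock time.

```shell
# Hypothetical helper: epoch seconds of the next occurrence of HH:MM in local time.
#   $1 = wake time, e.g. 06:00
#   $2 = "now" in epoch seconds (optional; defaults to the current time)
next_wake_epoch() {
    hhmm=$1
    now=${2:-$(date +%s)}
    today=$(date -d "@$now" +%F)
    wake=$(date -d "$today $hhmm" +%s)
    if [ "$wake" -le "$now" ]; then
        # That time has already passed today, so use the same time tomorrow.
        tomorrow=$(date -d "@$((now + 86400))" +%F)
        wake=$(date -d "$tomorrow $hhmm" +%s)
    fi
    echo "$wake"
}
```

Usage: `sudo rtcwake -m no -t "$(next_wake_epoch 06:00)" && sudo shutdown -h now`. With `-m no`, rtcwake only sets the alarm and leaves shutting down to the next command.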

IPMI

Tips from Oracle: http://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html

With ipmitool you can connect to the management controller remotely, so you can turn the system on even when it is off. That means I could script the power-on from another server.

ipmitool -H 192.168.1.3 -U ADMIN -P password chassis status
System Power         : off
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : always-off
Last Power Event     : 
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
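The remote power-on could be scripted along these lines. A minimal sketch, assuming it runs on another machine on the LAN with ipmitool installed, the BMC at 192.168.1.3 with the ADMIN user as in the examples here, and the password exported as IPMI_PASSWORD, which `-E` tells ipmitool to read instead of taking it on the command line. `power_on_if_off` is a hypothetical name.

```shell
# Hypothetical helper: power Bellman on through its BMC if it is currently off.
#   $1 = BMC address (defaults to 192.168.1.3)
# Requires IPMI_PASSWORD in the environment because of -E.
power_on_if_off() {
    host=${1:-192.168.1.3}
    status=$(ipmitool -I lanplus -H "$host" -U ADMIN -E chassis power status) || return 1
    case "$status" in
        *"is off"*)
            ipmitool -I lanplus -H "$host" -U ADMIN -E chassis power on
            ;;
        *)
            # Already on (or some other state): just report it.
            echo "$status"
            ;;
    esac
}
```

Run from cron early in the morning on the other server, this would bring Bellman up before the radio needs it.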

Read environmental sensors

ipmitool -I lanplus -H 192.168.1.3 -P password -U ADMIN sdr elist full

CPU Temp         | 01h | lnr |  3.1 | 36 degrees C
System Temp      | 0Bh | ok  |  7.1 | 33 degrees C
Peripheral Temp  | 0Ch | ok  |  7.2 | 34 degrees C
DIMMA1 Temp      | B0h | ok  | 32.64 | 29 degrees C
DIMMA2 Temp      | B1h | ns  | 32.65 | No Reading
DIMMB1 Temp      | B4h | ns  | 32.68 | No Reading
DIMMB2 Temp      | B5h | ns  | 32.69 | No Reading
FAN1             | 41h | ok  | 29.1 | 3200 RPM
FAN2             | 42h | ns  | 29.2 | No Reading
FAN3             | 43h | ns  | 29.3 | No Reading
VCCP             | 20h | ok  |  3.2 | 0.82 Volts
VDIMM            | 24h | ok  | 32.1 | 1.33 Volts
12V              | 30h | ok  |  7.17 | 12.32 Volts
5VCC             | 31h | ok  |  7.33 | 4.95 Volts
3.3VCC           | 32h | ok  |  7.32 | 3.30 Volts
VBAT             | 33h | ok  |  7.18 | 2.97 Volts
5V Dual          | 37h | ok  |  7.15 | 4.95 Volts
3.3V AUX         | 38h | ok  |  7.12 | 3.28 Volts
Chassis Intru    | AAh | ok  | 23.1 | 
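To spot problems in that list without reading every row, the status column can be filtered. A sketch, assuming the pipe-separated layout shown above; `sensor_alerts` is a hypothetical helper.

```shell
# Hypothetical helper: list sensors whose status is neither "ok" nor "ns" (no reading)
# in `ipmitool sdr elist` output read from stdin.
# Other codes flag threshold crossings: lnr/lcr/lnc are the lower non-recoverable,
# critical and non-critical thresholds; unr/ucr/unc are the upper ones.
sensor_alerts() {
    awk -F'|' '
        {
            name = $1; status = $3
            gsub(/^ +| +$/, "", name)
            gsub(/ /, "", status)
            if (status != "ok" && status != "ns" && status != "") print name ": " status
        }
    '
}
```

Fed the output above, it reports `CPU Temp: lnr` even though the reading is 36 degrees C. `ipmitool sensor` lists each sensor's thresholds, which would help check whether that lower threshold is set oddly.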

System event log (SEL)

ipmitool -I lanplus -H 192.168.1.3 -P password -U ADMIN sel list last 10

  43 |  Pre-Init  |0004692099| Unknown #0xff |  | Asserted
  44 |  Pre-Init  |0004692100| Unknown #0xff |  | Asserted
  45 |  Pre-Init  |0004692106| Unknown #0xff |  | Asserted
  46 |  Pre-Init  |0004692108| Unknown #0xff |  | Asserted
  47 |  Pre-Init  |0004692109| Unknown #0xff |  | Asserted
  48 |  Pre-Init  |0004692110| Unknown #0xff |  | Asserted
  49 |  Pre-Init  |0004692116| Unknown #0xff |  | Asserted
  4a |  Pre-Init  |0004692118| Unknown #0xff |  | Asserted
  4b |  Pre-Init  |0004692119| Unknown #0xff |  | Asserted
  4c |  Pre-Init  |0004692120| Unknown #0xff |  | Asserted

Backups

I am about to try Bacula for backups (see Using Bacula).