Maintenance late on a Sunday night. It’s like rushing to get your homework done before school.

Sep 22, 2019 | Scripting and programming, Server administration, Technology

So, what did you do tonight? Watch something on the dodgy box? Read a book? Go for a walk?  As the dark nights roll in, I’m a bit at a loss as to what to get up to when I get back from work and put the children to bed. But I’ll leave all that for another post.  Tonight I decided that it was way past time that I went looking around the various servers I run outside of work to make sure everything was running properly.  Some of these servers hadn’t been touched in 200+ days.  That’s not to say that they aren’t getting updates or that I don’t check in on them. I have a great system called Pulseway that I use for all that kind of thing, but there are tasks that you should really check in on yourself from time to time, and with 2019 being a very busy year, I haven’t really been keeping on top of everything. I’ve been trusting that the scripts and tasks I put in place were running as expected and that if there was a failure, something would have notified me by now.

Right, let’s get to it. What did I check?  This is for the one or two people who might read this and be in any way interested.

  • Database backups.  I use a tool that does exactly what it says in the name.  It’s called SQL Backup and FTP.  I love this tool. It’s simple, fast, reliable and just does what I need without any fuss. It cost me about €36 to subscribe for the year and in my opinion, it’s money well spent.  I keep database backups indefinitely for most databases.  Yes, this uses a lot of space, but some of the data changes regularly in the applications that I’m hosting, so having the ability to go back and restore data from a year ago is very useful.  Of course, there are systems that can’t have backup retention for this long because of GDPR rules, so their backups typically get kept for a week or two at most.  I logged in tonight to make sure all the schedules and retention rules were working as expected.
  • File system backups.  I’m hosting around 4 TB of data.  Every night I do a full backup.  I would normally only keep most of these backups for about a week, but I ship every weekly backup off to another server outside Ireland.  I’ve scripted all of this using PowerShell for the most part.  My bash scripts for backups that I wrote a good few years ago are still working with a few improvements here and there. But again, although I have monitoring and alerting built in, I logged in to reassure myself and be absolutely certain that everything was working as it should.
  • Virtual machine hosts.  It might seem absolutely crazy, but I also logged into the virtual machine hosts to make sure all their volumes, all the RAM and all the CPUs were still present and accounted for.  I also checked the system, security and application logs for any weirdness.  Pulseway and other monitoring tools are fantastic, but they only tell you about what you’ve asked them to watch.  For example, in Nagios, you can lose an entire disk and not know about it unless you explicitly add that check in.  So, I always sleep a little better when I’ve checked the basics myself just to be absolutely certain.
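For anyone who would rather not do the backup part of this by hand every time, here is a rough sketch of the kind of check I mean, written in Python rather than the PowerShell and bash I actually use. It is not my real script. The function name `check_backups`, the 26 hour freshness window and the 14 day retention limit are all made up for illustration, and it assumes each backup lands as a file in one folder and that the file's modified time tells you when the backup ran.

```python
import os
import tempfile
import time
from datetime import datetime, timedelta
from pathlib import Path


def check_backups(folder, max_age, retention=None, now=None):
    """Return (fresh, stale) for one backup folder.

    fresh is True when the newest file is no older than max_age.
    stale lists the names of files older than retention, which is
    useful for spotting a clean-up job that has stopped running.
    """
    now = now or datetime.now()
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    # Age of each backup, based on the file's last modified time.
    ages = {p: now - datetime.fromtimestamp(p.stat().st_mtime) for p in files}
    # An empty folder counts as "not fresh": there is no backup at all.
    fresh = bool(ages) and min(ages.values()) <= max_age
    stale = []
    if retention is not None:
        stale = sorted(p.name for p, age in ages.items() if age > retention)
    return fresh, stale


if __name__ == "__main__":
    # Demo against a throwaway folder so nothing real is touched.
    with tempfile.TemporaryDirectory() as folder:
        tonight = Path(folder) / "tonight.bak"
        tonight.write_text("pretend backup")
        forgotten = Path(folder) / "forgotten.bak"
        forgotten.write_text("pretend backup")
        # Pretend the second file was written 30 days ago.
        month_ago = time.time() - 30 * 24 * 60 * 60
        os.utime(forgotten, (month_ago, month_ago))

        fresh, stale = check_backups(
            folder, max_age=timedelta(hours=26), retention=timedelta(days=14)
        )
        print("Newest backup is fresh:", fresh)
        print("Files past retention:", stale)
```

Leaving `retention` as `None` suits the databases I keep indefinitely, while the GDPR-limited ones would get a limit of a week or two. A check like this only looks at file dates, so it says nothing about whether a backup can actually be restored.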

So there you have it. I really am a complete and utter bore with nothing better to do on a Sunday night but check the health of my servers. In work, nearly 20,000 students are coming back to university tomorrow, so when I’m in my 9 to 5 job, I really can’t afford to have something go bump in the night in my non-9-to-5 activities.  In saying that, all the checks in the world can’t account for someone allocating the IP address I’m already using to another customer. Yes, this happened to me twice in the past month and once before in the past year. Someone has also done an A and a B test on the main core switch in the datacentre and forgot that the switch is a temporary replacement and only has an A feed.   Oh, and because it was a temporary replacement, the brilliant system administrator working for the hosting company forgot to commit his changes when he configured the switch, so I was offline for two hours while they fixed their silly mistake.  Sorry. I’ll stop ranting.

So, what did you do this evening? Walk the dog, watch television, start a new series on Netflix, get the children’s lunches ready? Tell me in the comments. Or don’t. Because most sane people would have stopped reading this absolutely mind-numbing rubbish by now. 🙂
