complex technical fuck up.

Mar 4, 2022 | Linux, Server administration, Smart home, Technology | 1 comment

Ordinarily, I would say that adding bad language into a post title is something that should be avoided. But this was indeed a complex technical fuck up. Hey, on a slightly different topic, ever feel like walking into a meeting after something goes wrong and just saying bluntly something like: “That stupid fucken problem was a pain in my ass for days because the stars aligned to screw me. It’s like some divine asshole thought to itself: ‘Hey, Darragh over there is not quite busy enough. Let’s throw way more shit at him to see what happens.’ Well, better luck, divine-o-ass. I’m not giving up that easily.” Damn. There were nested quotes in that rant. That’s a new best in shit writing, isn’t it?

Okay. Okay. I’m calm now. I just needed to get that out of my system.

I hear you ask: who pissed in Darragh’s cornflakes this morning? Well, it was Docker. For the past two days.

Here is the rough outline of the crap I have had to deal with outside of work over the past few days.

Firstly, this all relates to HomeAssistant and docker.

  • It all started at the beginning of the week. I was at a wedding for two days and during that time, I noticed that the cert for HomeAssistant had expired. That usually means that it has lost its connection to the cloud service. When I got back on Wednesday, I found that the subscription was valid but it had indeed lost the connection. I checked for logs that would indicate the source of the problem but there was no luck. Not a single log was written to suggest where the problem was. I was running on 2022-1 and 2022-3 was out, so I suspected the container either needed a reboot or it needed the latest version installed. So that’s what I did. First, I restarted the container. That didn’t work. Second, I updated. That didn’t work. Finally, I rebooted the host server. This is where the world went into free fall and everything broke.
  • The server came back up and I was met with a default “onboard” page for HomeAssistant. The air turned a shade of blue while I cursed, thinking that it had reset the HomeAssistant install or something crazy like that. But no. I was able to find my files in the container. Here’s where everything went stupidly bad.
  • I have a few other things running on this Docker host. Yes. I know that really isn’t supported by HomeAssistant but I’m confident enough with Linux to make this work. I say this. But if you keep reading, you will see that although I’m confident with Linux, maybe I have no right to be. Did I mess up? I’ll let you decide.
  • I ran docker ps to show the list of running containers. I could see hassio (short for home-assistant.io) had four running containers: hassio_audio, hassio_multicast, hassio_samba and hassio_supervised. It looked like these containers were pointing back to where I had HomeAssistant stored. But it wasn’t picking up the right config. I thought to myself, where the hell are my other containers for Pihole, streaming and Unifi? But anyway. I didn’t think much about it. This is where I completely messed up. I should have stopped, thought and realized that if those containers were running they should be shown by docker ps.
  • I relinquished thoughts of this being a quick fix, though, and set up HomeAssistant as a new installation with the intention of restoring from a backup. Do you take backups? I do. Every night. I was thankful for this. Anyway, I keep rambling. I log into the new installation only to find HomeAssistant Supervisor isn’t available. This is a Core-only install of HomeAssistant. Alarm bells begin ringing. Why the hell is this only the Core installation and where has my installation gone?
  • I try to completely uninstall this. Knowing that I had a full backup, I was willing to get a bit aggressive at this point. The problem is I get an access denied error when I try to remove any of the containers, with docker rm hassio_samba for example. I find that this is because of the hassio_apparmor service. But stopping it with systemctl stop hassio_apparmor.service doesn’t work. I found that it needs to be stopped with aa-teardown. Only then could I remove the containers.
  • So. I remove the containers and I try to install with this command:
    docker run -d --name=homeassistant --restart=always --network=host -v /etc/homeassistant:/config homeassistant/home-assistant:stable
    That didn’t work. I got errors like this:
    Failed to start hassio-apparmor.service: Unit hassio-apparmor.service has a bad unit file setting.
    I’m still not sure what caused that. But I moved on. I found that for some reason, the hassio_apparmor and hassio_supervisor files weren’t removed from /etc/systemd/system/ so I deleted these and the problem went away.
  • I was encountering lots of weird errors so I took a step back and started looking at everything on the server. During the small hours of this morning, I finally found something that triggered an “oh crap” moment. I found a tutorial that mentioned installing HomeAssistant from the snap store in Ubuntu. I know I didn’t do this. But while I was looking for HomeAssistant files during one of the many times I manually uninstalled this, I remember seeing files in /snap. So I had a moment of realization. Snap must be installed! Now, I have checked my .bash_history and that of the root account. Not once did I issue a command with the word snap in it. So I have no idea why this is installed. I ran one command and this answered all my questions.
    whereis docker
    Sure enough, there’s a second binary for Docker in /snap/docker. Running
    snap list
    shows that snap-docker is installed.
  • I remove this:
    snap remove docker
    Then I reboot.
  • Victory! Now I run docker ps and I see my missing docker containers such as the one for Ubiquiti, Pihole etc. I also see the docker containers for the proper installation of HomeAssistant. But here’s where I shot myself in the foot. I had completely mangled those containers while rampaging through the file system looking for and purging anything that could be causing conflicts during those times that I was encountering errors. The problem now is that the original and correctly set up docker containers are completely messed up. I try reinstalling using the proper version of docker but the images and the containers are in a terrible state. I’m not able to reinstall because there are images that still exist in a partial or damaged state. (Yes. I really screwed this up, didn’t I?) However, I can’t give up. I manage to delete the images by finding the IDs of each image and passing them to the docker rmi command. Sometimes these had dependencies that couldn’t be removed because they were too mangled. So I used docker rmi -f (imageID).
  • Afterwards, I used updatedb and locate to find all existing homeassistant and hassio files related to a container. I manually removed these and started the installation again.
  • For the record, I find that the most reliable way to install the HomeAssistant with HomeAssistant-Supervisor docker containers is to use these Deb installers:

Don’t do what I did. After 3am this morning, I was tired and I installed the container first, then the OS Agent. HomeAssistant complained that the supervisor wasn’t running in privileged mode. But a quick restart of the container fixed this.
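The manual removal steps in the list above can be rehearsed before anything is deleted. What follows is a minimal sketch, not the procedure actually used: the function names and the DRY_RUN variable are hypothetical, and it assumes the containers to remove are named hassio_* or homeassistant, as in the docker ps output described earlier. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: function names and the DRY_RUN variable are hypothetical.

# Keep only names that look like Home Assistant containers
# (hassio_* or homeassistant); everything else is dropped.
filter_hassio_names() {
    grep -E '^(hassio_.*|homeassistant)$' || true
}

# Read container names from stdin. By default just print the command;
# set DRY_RUN=0 to actually remove the containers.
remove_containers() {
    while IFS= read -r name; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: docker rm -f $name"
        else
            docker rm -f "$name"
        fi
    done
}

# Only touch Docker if the CLI is present on this machine.
if command -v docker >/dev/null 2>&1; then
    docker ps -a --format '{{.Names}}' 2>/dev/null \
        | filter_hassio_names \
        | remove_containers
fi
```

On a host like the one in this post, the AppArmor profiles had to be unloaded with aa-teardown before docker rm would succeed, so that step would come first. Containers for Pihole, Unifi and anything else never match the filter, which is the point of checking the printed list before setting DRY_RUN=0.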

What a complete pain in the ass. This blog post is long. But this pales in comparison to the hours and hours I spent on this until the early morning hours for the past few days.
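Most of those hours trace back to two Docker installations living side by side, which whereis docker and snap list eventually revealed. As a rough sketch of how that check could be run up front, the script below lists every docker executable found on the PATH and reports whether a Docker snap is installed. The function name list_docker_binaries is made up for illustration, and the sketch assumes both binaries are reachable through PATH, which may not hold on every system.

```shell
#!/bin/sh
# Sketch only: list_docker_binaries is a hypothetical helper name.

# Print every executable file called "docker" in a colon-separated
# directory list (defaults to $PATH). The body runs in a subshell so
# the IFS and globbing changes do not leak into the caller.
list_docker_binaries() (
    IFS=:
    set -f
    for dir in ${1:-$PATH}; do
        if [ -x "$dir/docker" ] && [ ! -d "$dir/docker" ]; then
            printf '%s\n' "$dir/docker"
        fi
    done
)

found=$(list_docker_binaries)
count=$(printf '%s' "$found" | grep -c . || true)

# More than one hit suggests two separate Docker installations.
if [ "$count" -gt 1 ]; then
    echo "Warning: $count docker binaries on PATH:"
    printf '%s\n' "$found"
fi

# Check for a snap-installed Docker, but only if snap exists at all.
if command -v snap >/dev/null 2>&1 && snap list docker >/dev/null 2>&1; then
    echo "Docker is also installed as a snap"
fi
```

If either message appears, it is worth working out which daemon docker ps is actually talking to before touching any containers.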

I will say one more thing. I read a post a few months ago where someone said that they started off with a ConBee II Zigbee USB device but then upgraded to something a little more serious. In my firm opinion, the ConBee II stick is simply amazing and I doubt there is anything else on the market like it. I restored my HomeAssistant config and because the ConBee II keeps an independent record of all the Zigbee devices that are connected to it, once the HomeAssistant config was reapplied, the ConBee stick just worked. No fuss, no complaints. Having this independent bridge outside the HomeAssistant ecosystem has saved me from a lot of work twice now. Now, of course, I regularly take backups of that config as well. Just in case.
