I’ve been using Home Assistant for a while now. It is not ideal, but it works, and you can make it do pretty much anything you need for home automation.
My HA runs on a Raspberry Pi 4, and I use a CC2531 ZigBee USB dongle to communicate with the Ikea and Philips lights, switches and motion sensors. It all works well. But the one downside of the Raspberry Pi is the super slow micro SD card it uses for storage.
A while ago I noticed that my setup was running slower than usual and the system graphs showed an increase in IO wait.
This was interesting: I had not changed anything, and while the micro SD storage is indeed slow, why would it suddenly get slower?
The Raspberry Pi does a few other things besides running HA, so there were multiple possible culprits.
The system metrics only told me that there was IO wait; who was responsible for it, and why, was not clear.
There were no other obviously noticeable changes.
Thankfully, there is Brendan Gregg’s excellent perf-tools, and I was eager to see how it would work on my RPi.
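perf-tools is just a collection of shell scripts on top of ftrace, so getting it onto the Pi is a simple clone (assuming git is already installed):

git clone https://github.com/brendangregg/perf-tools.git
cd perf-tools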
To my disappointment, some of the scripts worked and some did not, because the default Raspbian kernel is compiled without CONFIG_FTRACE_SYSCALLS enabled.
The first order of business was to build a new kernel. Thankfully the official documentation covers the RPi nuances, and building it is straightforward.
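For reference, here is roughly what that looks like: first a check of whether the running kernel already has the option, then a rebuild loosely following the official Raspberry Pi kernel build guide (the defconfig and image name below assume a 32-bit Pi 4; adjust for your model and OS version):

# check whether the running kernel has syscall tracing compiled in
sudo modprobe configs
zcat /proc/config.gz | grep FTRACE_SYSCALLS

# rebuild with the option enabled, roughly per the official guide
sudo apt install git bc bison flex libssl-dev libncurses5-dev make
git clone --depth=1 https://github.com/raspberrypi/linux
cd linux
make bcm2711_defconfig          # Raspberry Pi 4, 32-bit
make menuconfig                 # enable CONFIG_FTRACE_SYSCALLS (Kernel hacking -> Tracers)
make -j4 zImage modules dtbs
sudo make modules_install
sudo cp arch/arm/boot/dts/*.dtb /boot/
sudo cp arch/arm/boot/dts/overlays/*.dtb* /boot/overlays/
sudo cp arch/arm/boot/zImage /boot/kernel7l.img

Then reboot into the new kernel.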
Once I had the kernel with all ftrace awesomeness, I could do a bit of investigation.
Here’s the thinking.
Something is doing IO and generating IO wait, so it is probably either the disk or the network. Since the micro SD storage is slow, it is the likely culprit.
I want to know which process is responsible and what it is doing.
First, find which process is doing the IO. I used iosnoop for this and sorted the output by latency (the last column, in milliseconds):
sudo ./iosnoop -Q 60 | sort -g -r -k 7 | head
This gave me:
fx@lighthouse:~/perf-tools $ sudo ./iosnoop -Q 60 | sort -g -r -k 7 | head
hass         787    WS  179,0    20517872     4096      866.34
hass         787    WS  179,0    20514936     4096      830.58
hass         787    WS  179,0    20513552     4096      792.33
hass         787    WS  179,0    11370728     4096      754.18
hass         787    WS  179,0    22593888     4096      748.53
hass         787    WS  179,0    20534240     4096      718.24
hass         787    WS  179,0    10494416     4096      717.63
hass         787    WS  179,0    20534240     4096      697.45
hass         787    WS  179,0    10028824     4096      679.86
hass         787    WS  179,0    20534048     4096      668.75
So, OK, it is clearly Home Assistant loading my little RPi with synchronous 4 KB writes, some of them taking well over half a second. But why, and how come it did not do that before? What changed?
Now, my HA process was 528, so I ran opensnoop to find out which files it was manipulating.
One nuance: opensnoop wants to use mawk with arguments that did not work on my Raspbian, so I replaced it with gawk, which works great.
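If you hit the same thing, here are a couple of ways to make the swap (the update-alternatives line assumes awk is managed by alternatives, as it is on Debian/Raspbian; the sed line assumes the script references mawk by name):

sudo apt install gawk
sudo update-alternatives --set awk /usr/bin/gawk
# ...or substitute directly in the script
sed -i 's/mawk/gawk/g' opensnoop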
sudo ./opensnoop -p 528 -d 60
It gave me:
fx@lighthouse:~/perf-tools $ sudo ./opensnoop -p 528 -d 60
Tracing open()s issued by PID 528 for 60 seconds (buffered)...
COMM             PID      FD FILE
hass             787    0x12 /home/ha/.homeassistant/home-assistant_v2.db
hass             787    0x17 /home/ha/.homeassistant/home-assistant_v2.db-wal
hass             787    0x18 /home/ha/.homeassistant/home-assistant_v2.db-shm
hass             787    0x19 /home/ha/.homeassistant
....truncated....
Ending tracing...
The SQLite database is the culprit. What’s inside it?
I looked at what tables it has and did a simple count to see which are the biggest.
sqlite> .tables
events          recorder_runs   schema_changes  states
sqlite> select count (*) from events;
228465
sqlite> select count (*) from states;
221744
sqlite> select count (*) from recorder_runs;
67
sqlite> select count (*) from schema_changes;
1
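If you are curious what exactly is filling those tables, the schema at the time has an entity_id column on states and an event_type column on events (column names may differ between schema versions), so a couple of group-by queries give a quick breakdown:

sqlite> select entity_id, count(*) from states group by entity_id order by count(*) desc limit 10;
sqlite> select event_type, count(*) from events group by event_type order by count(*) desc limit 10;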
OK, so there you have it: the states and events tables have grown rather big, and this being a micro SD card, performance is what it is. The whole database file was ~500 MB.
I stopped HA, cleaned the tables by deleting the oldest rows, and ran vacuum. Afterwards performance was back to normal.
I guess this is what I get for not reading the fine manual on history in Home Assistant.
The cleanup commands I used on the events and states tables:
delete from events where created < '2020-02-20 03:04:53.662688';
delete from states where last_updated < '2020-02-20 03:04:53.662688';
vacuum;
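For completeness, the surrounding steps: stop HA first so nothing is writing to the database, run the statements above in the sqlite3 shell, then start HA again. The database path is from the opensnoop output; the systemd unit name below is just how my install happens to be set up, so adjust it to yours. Also note that vacuum rewrites the whole database file, so you need free disk space roughly equal to its size.

sudo systemctl stop home-assistant@ha        # unit name is setup-specific
sqlite3 /home/ha/.homeassistant/home-assistant_v2.db
sudo systemctl start home-assistant@ha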
perf-tools also helps with comparing the performance before and after.
Before cleaning the database:
fx@lighthouse:~/perf-tools $ sudo ./iolatency -Q 60
Tracing block I/O. Output every 60 seconds. Ctrl-C to end.
  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 0        |                                      |
       1 -> 2       : 19       |######                                |
       2 -> 4       : 36       |###########                           |
       4 -> 8       : 4        |##                                    |
       8 -> 16      : 7        |##                                    |
      16 -> 32      : 10       |###                                   |
      32 -> 64      : 133      |######################################|
      64 -> 128     : 84       |########################              |
     128 -> 256     : 65       |###################                   |
     256 -> 512     : 108      |###############################       |
     512 -> 1024    : 65       |###################                   |
    1024 -> 2048    : 4        |##                                    |
And after:
fx@lighthouse:~/perf-tools $ sudo ./iolatency -Q 60
Tracing block I/O. Output every 60 seconds. Ctrl-C to end.
  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 0        |                                      |
       1 -> 2       : 269      |#########################             |
       2 -> 4       : 410      |######################################|
       4 -> 8       : 305      |#############################         |
       8 -> 16      : 341      |################################      |
      16 -> 32      : 241      |#######################               |
      32 -> 64      : 396      |##################################### |
      64 -> 128     : 80       |########                              |
     128 -> 256     : 27       |###                                   |
     256 -> 512     : 15       |##                                    |
     512 -> 1024    : 12       |##                                    |
And now, it is also time to read the fine manual on how to properly configure history retention.
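For the record, the relevant knob lives in the recorder integration: purge_keep_days caps how many days of history are kept. A minimal configuration.yaml sketch (the value here is just an example):

recorder:
  purge_keep_days: 14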