CentOS Project Cancelled, and Kubernetes Is Removing Docker Support
Recent news suggests that DevOps engineers and system administrators will be working overtime for the next few months. In the run-up to becoming defunct, CentOS is becoming as unstable as a bleeding-edge beta distro.
Our server was becoming a bit sluggish for no good reason, so I decided to check how full the partitions were.
$ df -h
The output showed that the /boot partition was 100% full, most likely because of a failed update. The command given all over the net for removing old kernels no longer works on CentOS 8. The same guides also say to edit /etc/yum.conf and lower the number of kernels kept, after which all should be fine.
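A check like this can be scripted so that a full /boot is noticed before it breaks an update. The following is only a minimal sketch, not something from the original setup: the function names usage_pct and check_mount and the 90% default threshold are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the 90% default are illustrative assumptions.

# Read `df -P` output on stdin and print the Capacity column of the
# first data row with the trailing % removed.
usage_pct() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Report whether a mount point is at or above a threshold percentage.
# Returns non-zero when the threshold is reached.
check_mount() {
    mount_point=$1
    threshold=${2:-90}
    pct=$(df -P "$mount_point" | usage_pct)
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: $mount_point is ${pct}% full"
        return 1
    fi
    echo "OK: $mount_point is ${pct}% full"
}

# Example: check_mount /boot 90
```

The -P flag keeps each filesystem on a single line, which makes the column position predictable for awk.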
So I looked up how to remove some old kernels. The command for removing kernels didn't work, so I figured that lowering installonly_limit (the number of kernels kept) in /etc/yum.conf should help.
I announced to the universe that I would be doing maintenance in the morning, before people started working. I made a concrete plan to reboot, thus removing the offending kernels. Then I planned to run dnf update for good measure, as long as I was doing maintenance anyway.
In the morning I woke up and got straight to work. The reboot was supposed to be followed by dnf update, but the word “followed” took about two hours.
The Nightmare Started Only After I Woke Up
After rebooting, /boot was still full, so I tried running dnf commands. However, I was repeatedly warned that dnf was going to remove the kernel:
# dnf update
Last metadata expiration check: 1:36:12 ago on Thu 17 Dec 2020 09:07:44 AM IST.
Error:
 Problem: The operation would result in removing the following protected packages: kernel-core
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Apparently this error is caused by setting installonly_limit in yum.conf too low; I think I had set it to 1.
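According to the dnf.conf(5) man page, installonly_limit defaults to 3, must be at least 2, and 0 means unlimited; a value of 1 is not allowed because the running kernel is protected from removal. A quick sanity check before rebooting could look like the sketch below. The helper names get_installonly_limit and limit_is_valid are hypothetical and are not part of dnf.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of dnf itself.

# Print the last installonly_limit value set in a dnf/yum config file
# (prints nothing if the option is absent).
get_installonly_limit() {
    sed -n 's/^[[:space:]]*installonly_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}

# Succeed only for values the man page describes as valid:
# 0 (unlimited) or 2 and above.
limit_is_valid() {
    [ "$1" -eq 0 ] || [ "$1" -ge 2 ]
}

# Example (falls back to the documented default of 3 when unset):
#   limit=$(get_installonly_limit /etc/dnf/dnf.conf)
#   limit_is_valid "${limit:-3}" || echo "installonly_limit=$limit is too low"
```

On CentOS 8, /etc/yum.conf is a symlink to /etc/dnf/dnf.conf, so either path reads the same file.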
It turns out that passing installonly_limit as an option on the dnf command line does work:
dnf remove --oldinstallonly --setopt installonly_limit=2
Immediately, the /boot partition was cleaned up. But new surprises awaited me.
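Before removing anything, it can be reassuring to preview which kernel versions fall outside the limit. The sketch below is an illustration only: kernels_over_limit is a hypothetical helper, and it relies on GNU sort -V, which approximates RPM's version ordering but is not guaranteed to match it in every corner case.

```shell
#!/bin/sh
# Hypothetical dry-run helper; GNU `sort -V` is assumed to be available
# and only approximates RPM version comparison.

# Read kernel versions (one per line) on stdin and print the ones that
# would be left over after keeping the newest $1 versions.
kernels_over_limit() {
    keep=$1
    sort -V | awk -v keep="$keep" '
        { v[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print v[i] }'
}

# Example on an RPM-based host:
#   rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-core | kernels_over_limit 2
```

This only prints candidates; the actual removal is still left to dnf, which also knows to protect the running kernel.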
Now ZFS Got Messed Up
Apparently a recent CentOS update had neutralized and utterly removed the zfs and zpool packages, so none of the VMs could boot without their storage.
Docker Compose on the same server was using the ZFS mount for storage. Since the storage wasn't there, Docker was kind enough to create an empty directory owned by root.
Docker loves to drop empty directories in the wrong place when they are mistakenly designated as volumes. BTW, this is a feature. I already knew about these droppings, so I figured that I couldn't mount the ZFS volume while Docker's directory was sitting in the spot where the real directory was hiding inside the broken ZFS volume. Docker's replacement, CRI-O, is not supposed to run as root the way Docker does, so I hope that whatever volumes it creates by mistake can be deleted by ordinary users.
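One way to keep Docker from leaving a directory on top of a missing mount is to refuse to start the containers unless the storage is really mounted. The following is a sketch under assumptions: is_mounted is a hypothetical helper, and the path /tank/docker and the docker-compose invocation in the example are placeholders, not the real layout of this server.

```shell
#!/bin/sh
# Hypothetical guard; the example path below is a placeholder.

# Succeed if the given path appears as a mount point in a mounts table.
# Defaults to /proc/mounts; the optional second argument exists so the
# function can be exercised against a sample file.
# Note: paths containing spaces are escaped in /proc/mounts and are not
# handled here.
is_mounted() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example guard before starting containers:
#   is_mounted /tank/docker || { echo "storage not mounted" >&2; exit 1; }
#   docker-compose up -d
```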
At the end of this story, I forgot to bring Docker Compose back up, even though other apps and users do need it.
At some point in recent history, after the last reboot, the ZFS and ZPOOL yum packages had been removed because of an incompatibility with the new kernel. Reminds me of suicidal Ubuntu Gnome of Grampa Yore.
# yum install zfs
That worked, but the kernel was unaffected, so there was still no access to the VM storage.
Research showed that ZFS does not yet officially support CentOS 8.3. (CentOS 8 will NOT be LTS.)
# modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/4.18.0-193.10.el8.x86_64
modprobe zfs gave that nasty error. It also did nothing, as the output of the following was empty:
# lsmod | grep zfs
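The two symptoms above can be told apart with a small check: is the module loaded right now, and does a module file exist at all for the running kernel? This is a sketch, not part of the original troubleshooting; module_loaded and module_available are hypothetical names, and the optional second arguments exist only so the functions can be pointed at sample data.

```shell
#!/bin/sh
# Hypothetical helpers for illustration.

# Succeed if the module is listed in /proc/modules, which is the file
# that lsmod formats.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Succeed if a file such as zfs.ko or zfs.ko.xz exists anywhere under
# the module directory of the running kernel.
module_available() {
    dir=${2:-/lib/modules/$(uname -r)}
    [ -n "$(find "$dir" -name "$1.ko*" 2>/dev/null | head -n 1)" ]
}

# Example:
#   module_available zfs || echo "no zfs module built for $(uname -r)"
#   module_loaded zfs    || echo "zfs module is not loaded"
```

If the module is not available for the running kernel, installing the userland package alone will not help, which matches what happened here.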
The Solution Is To Install The Testing Version of ZFS
After much searching and stressing, I finally found out that the ordinary ZFS release will not be supported on the now-demoted CentOS 8 LTS, which itself is planned to turn into a beta-like stream.
So I ran:
yum install http://download.zfsonlinux.org/epel/zfs-release.el8_3.noarch.rpm
sudo dnf config-manager --enable zfs-testing
sudo dnf install kernel-devel zfs
Now ZFS is back up, yay, thrills!
So I moved on to launching my VMs:
virsh start vmsteve
error: operation failed: guest CPU doesn't match specification: missing features: hle,rtm,tsx-
I guess that dnf update messed up even more stuff. Even though this looks like a bug that was seen many months ago in Fedora, nothing prevented it from getting into CentOS 8 LTS.
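The error says the guest CPU definition asks for features that the host no longer offers. One way to confirm that from the host side is to look for those flags in /proc/cpuinfo. The sketch below is an assumption-laden illustration: missing_cpu_flags is a hypothetical helper, and it only inspects the first flags line.

```shell
#!/bin/sh
# Hypothetical helper for illustration.

# Print each requested flag that is absent from the first "flags" line
# of a cpuinfo-style file.
missing_cpu_flags() {
    cpuinfo=$1
    shift
    host_flags=$(awk -F: '/^flags/ { print $2; exit }' "$cpuinfo")
    for flag in "$@"; do
        case " $host_flags " in
            *" $flag "*) ;;          # flag is present on the host
            *) echo "$flag" ;;       # flag is missing
        esac
    done
}

# Example: missing_cpu_flags /proc/cpuinfo hle rtm
```

If hle and rtm are printed, the host really does not advertise them any more, and a saved guest definition that requires them will be refused.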
virsh edit vmsteve
showed that the VM should have some flexibility, since it had:
virsh managedsave-dumpxml vmsteve
....
<cpu mode='host-model' check='partial'>
  <topology sockets='2' dies='1' cores='1' threads='1'/>
</cpu>
but apparently KVM changes this in the virsh managedsave XML settings.
My older VMs were able to come up after I edited the <cpu> settings in the managedsave XML:
virsh managedsave-edit vmsteve
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Cascadelake-Server</model>
  <vendor>Intel</vendor>
  <topology sockets='2' dies='1' cores='1' threads='1'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='pku'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='amd-ssbd'/>
  <feature policy='require' name='rdctl-no'/>
  <feature policy='require' name='ibrs-all'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='mds-no'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
</cpu>
However, the newer VMs had never been managed-saved. For those, the following did work (it didn't work for the VMs that had managedsave XMLs):
virsh edit vmsteve
Putting the above XML in place of the simple <cpu> settings worked.
Now my VMs started.
Finally, No Network Connectivity
The VMs now started, but they still lacked network connectivity: they were not getting IP addresses and couldn't communicate at all.
Alas! Firewalld was disabled; enabling it made everything work as it had before this saga. The firewall handles the routing for the VMs, so they won't get any IP address from the LAN DHCP server without the firewall running on the host.
systemctl enable firewalld --now
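Since a disabled service was easy to miss here, a short post-reboot check of the units the host depends on might help. This is a sketch only: check_units is a hypothetical helper, the SYSTEMCTL override is an assumption added so the function can be tried without systemd, and the unit names in the example other than firewalld are guesses about what this host runs.

```shell
#!/bin/sh
# Hypothetical post-reboot check. SYSTEMCTL can be overridden for
# testing; by default the real systemctl is used.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Print one line per unit and return non-zero if any unit is not active.
check_units() {
    failed=0
    for unit in "$@"; do
        if $SYSTEMCTL is-active --quiet "$unit"; then
            echo "active:   $unit"
        else
            echo "INACTIVE: $unit"
            failed=1
        fi
    done
    return "$failed"
}

# Example (unit names other than firewalld are assumptions):
#   check_units firewalld libvirtd docker
```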
I fixed a little more stuff that I had expected to deal with from the beginning and everything was back as it was.
Salvation has now come to the land of Zion
“And a redeemer shall come to Zion.”