Upgrading a ZFS Raid-1 Root Drive in Proxmox
When I first built my Proxmox server I just used drives left over from previous upgrades: a pair of 120GB SanDisk SSDs as the boot mirror, two 250GB Crucial SSDs in a ZFS RAID-1 holding my VMs, and a 250GB Corsair Force MP510 set up as a Directory for backups, because I thought it would help backup speeds (it really didn't). Using an NVMe drive in the onboard M.2 slot on the Gigabyte B450M Aorus motherboard limits the number of usable SATA ports to 4, so I found myself running low on disks. The plan is to migrate the boot mirror (which Proxmox calls rpool) to the two Crucial SSDs and run most of the VMs directly off that.
The official Proxmox instructions for replacing a failed disk in the root pool covered the majority of the process, but I had to take a few extra steps since I was also upgrading the size of the pool. Once I made sure all the VMs were backed up on the NVMe, I pulled one of the SanDisks and opened up the shell in Proxmox. Since both of my boot drives were still fine, the pulled SanDisk served as my failsafe in case I messed anything up, as it wouldn't be touched until the process was complete.
root@einherjar:~# zpool status -v
pool: rpool
state: ONLINE
scan: resilvered 63.5G in 0 days 00:27:32 with 0 errors on Sat Apr 4 09:24:53 2020
config:
NAME          STATE        READ WRITE CKSUM
rpool         ONLINE          0     0     0
  mirror-0    ONLINE          0     0     0
    sdd3      UNAVAILABLE     0     0     0
    sdc3      ONLINE          0     0     0
Whatever disk you pull is going to show up as unavailable. It shows up here as partitions /dev/sdd3 and /dev/sdc3, but disk IDs can also appear; just go off whatever your pool status shows. After identifying the pulled drive, it needs to be taken offline in the pool, and then run status again to confirm.
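If the pool does list disk IDs and you want to double-check which physical device one maps to, the by-id symlinks in /dev will tell you (sdd here is just the device I was confirming, swap in whatever you're after):
root@einherjar:~# ls -l /dev/disk/by-id/ | grep sdd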
root@einherjar:~# zpool offline rpool /dev/sdd3
root@einherjar:~# zpool status -v
pool: rpool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: resilvered 63.5G in 0 days 00:27:32 with 0 errors on Sat Apr 4 09:24:53 2020
config:
NAME          STATE     READ WRITE CKSUM
rpool         ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    sdd3      OFFLINE      0     0     0
    sdc3      ONLINE       0     0     0
From here, plug the replacement drive in (my Crucial 250GB in this case) and get the disk location either in the Proxmox GUI or by running fdisk -l in the shell (we'll go with /dev/sdb for the guide).
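If you'd rather not dig through the full fdisk -l output, lsblk lines the model names up against the device names nicely; the columns listed here are just the ones I find useful for matching drives:
root@einherjar:~# lsblk -o NAME,SIZE,MODEL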
The next step was to copy the partition table over from the current SanDisk 120GB SSD to the Crucial. As the official wiki stresses, the order the drives are placed in the sgdisk command is important; getting it wrong will wipe your current boot disk and that pool, taking all the Proxmox configs with it.
sgdisk --replicate=/dev/target /dev/source
So the target disk goes first and the source second. In the shell, for my setup, that works out to
root@einherjar:~# sgdisk --replicate=/dev/sdb /dev/sdc
This should copy the partitions over. Now, if you are like me and upgrading to bigger disks, you will need to resize the ZFS partition on the new drive (which in my case is sdb3), since sgdisk copied over a 120GB layout and that leaves roughly 130GB of free space on this drive that won't be used otherwise. I took the wiki's advice and used parted to resize it, which I had to install first.
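parted isn't on a stock Proxmox install, but it's a quick apt away, and while you're at the shell sgdisk can print the copied table to confirm the replicate landed on the right disk (printing is read-only, so there's no risk of wiping anything here):
root@einherjar:~# apt install parted
root@einherjar:~# sgdisk --print /dev/sdb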
Run parted first on your current root disk to see something similar to the following
root@einherjar:~# parted /dev/sdc
GNU Parted 3.2
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: ATA CT250MX500SSD1 (scsi)
Disk /dev/sdc: 250GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  1049kB  1031kB                     bios_grub
 2      1049kB  538MB   537MB   fat32              boot, esp
 3      538MB   249GB   248GB   zfs
Of note are the disk size (Disk /dev/sdc: 250GB) and the End value for partition 3 (249GB), as that is the partition we are going to resize. Run parted or fdisk on your new disk to verify its size, then run the following commands
root@einherjar:~# parted /dev/sdb
GNU Parted 3.2
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) resizepart
Partition number? 3
Then specify the new end of the partition. I entered 248GB when it really should have been 249GB; checking the free space on the disk in fdisk afterwards, I'd left about 1GB unused. Either way, the ZFS pool partition on the new disk is now (roughly) the right size.
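In hindsight, a less error-prone way (not what I ran at the time) is to give parted the end as a percentage so it grabs all the remaining space without any math:
(parted) resizepart 3 100%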
root@einherjar:~# sgdisk --randomize-guids /dev/sdb
The operation has completed successfully.
The wiki suggests doing this so the cloned disk gets its own unique GUIDs and doesn't confuse the system, and I didn't want to deal with a kernel panic, so there it goes. The next step is the one that didn't work for me:
grub-install /dev/sdb
If this completes for you, great. If it doesn't, you need to fix it, otherwise the drive won't be able to boot the OS, as I came to find out. The easiest way I got this going was to just copy the working bios_grub and EFI partitions off the currently working boot drive, which are partitions 1 and 2. Thankfully, after running the sgdisk command the new drive has the same partitions at the same start sectors.
# dd if=/dev/sdc1 of=/dev/sdb1
The above command tells dd to use /dev/sdc1 as the input file and write it to the output file /dev/sdb1. Do this again for partition 2, and the drive should be able to boot properly now.
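For partition 2 that works out to the same command with the partition numbers bumped up:
# dd if=/dev/sdc2 of=/dev/sdb2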
We can now replace the pulled SanDisk with the Crucial drive in the rpool. The syntax for the command is
zpool replace <pool> <device> [new-device]
Here <device> is the name of the drive you pulled and [new-device] is the one we've been formatting all this time. Again, you need to write the device exactly as it shows up in zpool status -v, so if the pool lists disk IDs or UUIDs for either drive we are swapping, go with those or it won't work.
root@einherjar:~# zpool replace rpool /dev/sdd3 /dev/sdb3
You might need a -f flag at the end of the replace line; if the shell prompts for it, add it in, as I had to for this first swap. This kicks off the resilvering process, where the data from the current root drive gets mirrored over to the new one. Do not power down or reboot the server until the resilver is done. Check on the status and you should see output similar to the below, depending on your drives and pool size.
root@einherjar:~# zpool status -v
pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sat Apr 4 09:27:53 2020
27G scanned out of 64G at 4.46M/s, 10m to go
26G resilvered, 40.60% done
config:
NAME               STATE     READ WRITE CKSUM
rpool              ONLINE       0     0     0
  mirror-0         ONLINE       0     0     0
    sdc3           ONLINE       0     0     0
    replacing-1    OFFLINE      0     0     0
      sdd3         OFFLINE      0     0     0
      sdb3         ONLINE       0     0     0  (resilvering)
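If you want to keep an eye on progress without retyping the command, wrapping it in watch does the trick (the 30-second interval is just what I'd pick, nothing special):
root@einherjar:~# watch -n 30 zpool status -v rpool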
When it's finally done you should see something similar to
root@einherjar:~# zpool status -v
pool: rpool
state: ONLINE
scan: resilvered 63.5G in 0 days 00:27:32 with 0 errors on Sat Apr 4 09:37:53 2020
config:
NAME          STATE     READ WRITE CKSUM
rpool         ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    sdb3      ONLINE       0     0     0
    sdc3      ONLINE       0     0     0
Power the system off, pull the old root drive, and power it back on to make sure the new drive boots and all your server settings are intact. If they are, great; the next step is to repeat everything all over again with the second new drive to copy everything over and resilver the pool once more. This leaves you with a new boot pool and no errors in the zpool.
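Condensed, the second swap is the same handful of commands as above; /dev/new here just stands in for whatever the second Crucial enumerates as and /dev/current for the drive that's now booting the system, so substitute the names your machine actually reports:
sgdisk --replicate=/dev/new /dev/current
sgdisk --randomize-guids /dev/new
parted /dev/new (resizepart 3, same as before)
grub-install /dev/new (or dd partitions 1 and 2 off the current drive if it fails)
zpool replace rpool <pulled-device> <new-drive-partition-3>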
The last step is to turn on autoexpand for the pool; otherwise, in my case, I'm stuck with 120GB available for VMs when the disks are obviously much bigger than that now. Running zpool list will confirm what's actually available for use. A very simple
root@einherjar:~# zpool set autoexpand=on rpool
will turn autoexpand on for the boot pool (rpool being the name Proxmox gives it); when I checked earlier it had been at its default of off. Running zpool list should now show a pool size almost equal to the size of your drives (barring a bit of space for the bios_grub and EFI partitions).
root@einherjar:~# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 231G 87.4G 144G - - 2% 37% 1.00x ONLINE
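If the extra space doesn't show up right away, it's worth confirming the property actually took, and ZFS can also be told to expand individual devices by hand; the partition names below are the ones from my pool, so adjust to match yours:
root@einherjar:~# zpool get autoexpand rpool
root@einherjar:~# zpool online -e rpool sdb3 sdc3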
With that, my boot pool has been upgraded and I have some more flexibility in my setup. After migrating the VMs over from the NVMe drive to the local-zfs pool on the boot drives, I used fdisk to nuke the NVMe, rebooted so the changes would take effect, and then created an LVM storage on it to run VMs that can take advantage of the drive's higher speeds, or ones that would be negatively affected by other VMs on the local-zfs storage causing I/O delay. Anything handling things in real time, like a Minecraft server or a music/video server, can suffer delays when even SSDs are busy with scheduled backups or heavy writes from one or more of the VMs running on them. Moving it to a separate drive reduces that, and most game servers in particular benefit from running off an NVMe drive when it's available.
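For reference, doing that last part from the shell instead of the GUI looks roughly like this once the old partitions are cleared off; /dev/nvme0n1 and the nvme-vg/nvme-lvm names are just placeholders for my setup, not anything Proxmox requires:
root@einherjar:~# pvcreate /dev/nvme0n1
root@einherjar:~# vgcreate nvme-vg /dev/nvme0n1
root@einherjar:~# pvesm add lvm nvme-lvm --vgname nvme-vg --content images,rootdir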
I replaced one of the SanDisks with a 1TB hard drive I had pulled from a PS4 Pro when I upgraded it to a 2TB drive, and made a new Directory called Backup on it. Even though it's a 5400RPM drive I was still getting a constant 160MB/s in reads/writes when backing up from the SSDs, and a little faster when backing up the VMs on the NVMe. Rebuilding from it is a bit slower, but since this is for home use it's a good trade-off in exchange for more space to hold backups. The fourth slot in my drive cage is just occupied by a SanDisk I formatted empty, for airflow purposes, but I already plan to replace it with another SSD later on when I take on my next hardware upgrade for the Proxmox server: setting up a separate Windows VM with GPU passthrough to play certain games I won't have access to once I migrate my desktop OS back to Linux.
I had an original boot disk and backups of all my VMs on a separate Directory, so this was a good learning experience in managing zpools without much fear of losing everything. Since Proxmox is essentially Debian at its core, the process carries over to other Linux distros that support ZFS, so it's at least something I can call back on going forward.