Once upon a time I'd set up my infrastructure to run everything on one system, at least partly to cut down on power usage. It was a bit of a disaster.

WHY it was a bit of a disaster, though, is the interesting part. See, I was going with a hypervisor on the bare metal (specifically ESXi - I believe at the time it was 6.0), with my NAS in a VM on top of it and anything else I was running also in VMs. There were actually several iterations/configurations of this absolute nightmare.

  • First, I did it with Physical RDM - Raw Device Mappings. In theory, this takes a bare hard drive and just... passes it through to a particular VM. In reality too, actually. Result? Data on the drive can be read in another system without needing ESXi. It also runs like an absolute PIG.
  • Then, I just passed an HBA through to the VM (the "traditional" method for virtualizing a NAS that uses ZFS) and ran it that way. That worked fine. The problem was, that was ALL my bulk storage. Which meant I needed a small VM for the NAS boot drive that could sit on a single 2.5" drive (in my case a 64GB SSD), which needed to boot first, before ANY of the other VMs which were living on NAS storage could be powered up. In practice this was a nightmare after every reboot and/or power loss.
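To make that boot-order headache concrete, here's a minimal sketch of the dependency problem, not anything I actually ran at the time. The VM names are made up for illustration, and it assumes Python 3.9+ for the standard library's graphlib module. The idea is just that anything living on NAS storage has to wait for the NAS VM:

```python
from graphlib import TopologicalSorter


def boot_order(depends_on):
    """Return VM names ordered so each VM comes after everything it depends on.

    depends_on maps a VM name to the list of VMs that must be up first.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical VM names, purely for illustration.
deps = {
    "nas": [],          # boots from the local SSD, depends on nothing
    "plex": ["nas"],    # lives on NAS storage
    "vcsa": ["nas"],    # lives on NAS storage
}

order = boot_order(deps)
print(order)  # "nas" always comes first; the order of the rest may vary
```

This only works out the order. It does nothing about waiting for the NAS to actually finish exporting its storage before the next VM powers on, which was the really painful part.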

I'm once again considering this, but coming at it from a different angle this time. There are a couple of reasons for this, not least of which being hey, project! Funding for this is going to come from selling off a lot of stuff that I'm sitting on and not using at all. Still sorting out how I'm going to sell it all - I'm too lazy to pack and ship multiple servers and switches - but I'll get it figured out. I'm hoping to build a modern, high-power and high-function system that's also more efficient than my current setup.

But that's not the only motivation; I'm starting to have issues with Odin, my primary hypervisor. It's running ESXi, and functionality just seems to... lock up? All the VMs running on it are fine, including VCSA - I can access them all via web or ssh, and all the services they're providing work fine, but VCSA can't communicate with the host it's running on, and the host UI never completes login. A reboot fixes it, but that's a serious brute-force fix.

My plan as it currently exists is as follows:

  1. Software will be Ubuntu 20.04 running Cockpit, possibly Webmin (it's still janky), and specifically the Cockpit-ZFS-Manager plugin. Of course, that requires Ubuntu 20.04 to be out, but if I get the bits before April, I may just install 19.10 and upgrade. On top of that will probably be Kimchi, but possibly something else; still researching it. Basically, I'm going to take my current NAS setup, update it, and add hypervisor support. That way the hypervisor is also the storage source.
  2. Hardware: Going to go kind of all-out - for certain values of "all-out". I'm planning to go with an Epyc CPU (as one might've guessed by the banner) and an appropriate board. Ideally I'd like an Epyc 7302P, but that's pushing a thousand dollars, and I don't think I can bankroll that when I also have to get a board and RAM. Either way, I'm aiming for 16 or more cores, at least vaguely decent clock speeds, and either an ASRock EPYCD8 or a Supermicro H11SSL-i for the motherboard. I may also look at getting a 4U chassis to replace my 3U SC836 chassis. I don't strictly speaking need the extra bays, but not only are they nice, it's hard to get an inexpensive graphics card (for Plex transcodes) that will fit in a 3U chassis - the coolers all stick up just a HAIR too far. We'll see. It's an expense I really don't need, so I probably won't.

Briefly, because I know someone will want to know, I'll talk about Proxmox - or specifically why I'm not just using it. Frankly, Proxmox's ZFS support in the Web UI is primitive at best - the same can NOT be said of Cockpit-ZFS-Manager, despite its being a very young project. Also, Proxmox is a hypervisor first, and anything else would be tacked on. That's exactly backwards from what I'm doing here; the Docker containers and storage are the primary items, and the KVM/hypervisor functions are there to make life a bit simpler. Tacking on the primary goals is silly when it doesn't need to be done and gives no advantage.

Currently, I'm mostly just planning this out, doing inventory and figuring out how and where to sell most of this stuff to come out ahead. I suspect I'll end up documenting it all through here as I go.