
Importing ZFS volume fails on OSv #918

Closed
miha-plesko opened this issue Oct 13, 2017 · 11 comments

miha-plesko commented Oct 13, 2017

I have the following use case that I'd like to realize:

  1. prepare a volume test1.img with a zpool named test1-zpool whose mountpoint is set to /dev/test1
  2. attach the volume to OSv
  3. OSv should import the zpool and mount it at /dev/test1 automatically on boot

I'm able to prepare a volume as described using the following commands:

$ sudo guestfish -N disk:2GB # <--- creates empty ./test1.img
$ sudo zpool create test1-zpool -m /dev/test1 $PWD/test1.img
$ sudo zpool export test1-zpool # <---prepares $PWD/test1.img for importing
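
For reference, I attach the resulting volume to the guest roughly like this (a sketch assuming a plain QEMU invocation and an already built OSv boot image at build/release/usr.img; the exact command line depends on how the guest is started):

$ sudo qemu-system-x86_64 -m 2G -enable-kvm -nographic \
    -drive file=build/release/usr.img,if=virtio \
    -drive file=test1.img,if=virtio,format=raw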

But when I then attach it to OSv and attempt to run the CLI, I get the following error:

OSv v0.24-448-g829bf76
internal error: Value too large for data type
Aborted

[backtrace]
0x0000100000421e97 <???+4333207>
0x0000100000422716 <zpool_standard_error_fmt+742>
0x000010000041eec2 <zpool_import_props+1346>
0x000010000000615f <???+24927>
0x000010000000bde8 <???+48616>
0x0000100000004ac1 <???+19137>
0x000000000041b1b1 <osv::application::run_main(std::string, int, char**)+385>
0x000000000041da18 <osv::application::run_main()+568>
0x000000000020c5a6 <osv::application::main()+86>
0x000000000041dec2 <osv::application::start_and_join(waiter*)+322>
0x000000000041e32a <osv::application::run_and_join(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > > const*, waiter*)+234>
0x0000000000413334 <osv::run(std::string, std::vector<std::string, std::allocator<std::string> >, int*, bool, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > > const*)+36>
0x000000000042daee <???+4381422>
0x0000000000212b10 <do_main_thread(void*)+160>
0x000000000044d6d5 <???+4511445>
0x00000000003f5477 <thread_main_c+39>
0x00000000003957c5 <???+3758021>
0xc11879ffbeffffff <???+-1090519041>
0x00000000003f4b6f <???+4148079>
0xfb89485354415540 <???+1413567808>

Do you perhaps have any idea why an error like this would occur?

My debugging

In the OSv source code I can see that OSv attempts to import extra ZFS pools here:

osv/fs/vfs/main.cc

Lines 2195 to 2216 in 8f82c93

static void import_extra_zfs_pools(void)
{
    struct stat st;
    int ret;

    // The file '/etc/mnttab' is a LibZFS requirement and will not
    // exist during cpiod phase. The functionality provided by this
    // function isn't needed during that phase, so let's skip it.
    if (stat("/etc/mnttab" , &st) != 0) {
        return;
    }

    // Import extra pools mounting datasets there contained.
    // Datasets from osv pool will not be mounted here.
    vector<string> zpool_args = {"zpool", "import", "-f", "-a" };
    auto ok = osv::run("zpool.so", zpool_args, &ret);
    assert(ok);

    if (!ret) {
        debug("zfs: extra ZFS pool(s) found.\n");
    }
}
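
So at boot OSv effectively runs the equivalent of the following (a paraphrase of the snippet above; on the host, the extra -d option is needed to point zpool at file-backed vdevs in the current directory):

$ sudo zpool import -f -a       # what zpool.so is invoked with inside OSv
$ sudo zpool import -d . -f -a  # rough host-side equivalent for a file-backed pool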

And if I perform the very same import operation on my host, everything seems to be working normally:

$ sudo zpool import -d .
   pool: test1-zpool
     id: 1371947131760445919
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	test1-zpool                                ONLINE
	  ./test1.img  ONLINE
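
To double-check, the pool can also be imported by name on the host and exported again (a sketch of the commands, output omitted):

$ sudo zpool import -d . test1-zpool
$ zfs list -r test1-zpool
$ sudo zpool export test1-zpool   # export again so the image stays importable elsewhere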

So I'm not sure why OSv would have any problems with it, and I kindly ask for help.

/cc @justinc1 @gberginc


miha-plesko commented Oct 13, 2017

Turns out that if we ZFS-format a volume on Ubuntu, then it's unusable for OSv. But if we ZFS-format the volume inside OSv, then it renders as "damaged" for Ubuntu, while any OSv unikernel is able to use it with no problems at all! @nyh was OSv's /zpool.so perhaps adjusted at any point of OSv development? Should OSv's ZFS implementation play well with any ZFS implementation, or did we make it OSv-specific?

I've also noticed that /zpool.so inside OSv has fewer ZFS features listed as supported compared to zfs on my Ubuntu, but OSv crashes even if I disable them all (by providing the -d flag to the zpool create command).
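
For reference, the feature flags supported on each side can be compared like this (a sketch; I'm assuming OSv's zpool.so accepts the same upgrade -v subcommand as the host zpool binary):

/zpool.so upgrade -v   # inside OSv, e.g. from the CLI: lists the feature flags this build knows
sudo zpool upgrade -v  # on the host, for comparison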


wkozaczuk commented Oct 13, 2017 via email

@miha-plesko

Hi @wkozaczuk, please do send the notes that you've mentioned, I'm looking forward to reading them! Currently my goal is to be able to prepare a volume on Ubuntu that I can then mount in OSv (an additional volume, not the one that OSv boots from), so perhaps your approach is already good enough to cover my case.

@justinc1

I can't add any significant comment, but anyway. A long time ago (about 2 years?) I tried to prepare an OSv image without booting it - e.g. mount ZFS on a Linux host and add some files. I don't really remember all the problems I ran into, but I do remember that I gave up on that task. Similar to wkozaczuk - OSv didn't like the modified image.


rickpayne commented Nov 7, 2017

When I've needed to mount the ZFS filesystem from an image, this is what I've done:

sudo modprobe nbd max_part=63
sudo qemu-nbd -c /dev/nbd0 osv.img
sudo zpool import -d /dev osv
sudo zfs set mountpoint=/media/osv osv
sudo zfs mount osv/zfs

Then unmounting:

sudo zfs umount osv/zfs
sudo zpool export osv
sudo qemu-nbd -d /dev/nbd0

The problem is that once this is done, the resulting image is unusable for the reasons mentioned above.
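
The same cycle can be wrapped in a small script so the nbd device is always disconnected even when one of the steps fails (a sketch, untested as a whole; device, pool and mountpoint names as above):

#!/bin/bash
set -eu
sudo modprobe nbd max_part=63
sudo qemu-nbd -c /dev/nbd0 osv.img
# disconnect the nbd device no matter how the script exits
trap 'sudo qemu-nbd -d /dev/nbd0' EXIT
sudo zpool import -d /dev osv
sudo zfs set mountpoint=/media/osv osv
sudo zfs mount osv/zfs
# ... inspect or modify files under /media/osv here ...
sudo zfs umount osv/zfs
sudo zpool export osv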


wkozaczuk commented Nov 7, 2017 via email


wkozaczuk commented Apr 8, 2018

Recently I spent some time investigating this issue and found out the following:

  1. OSv can actually mount a ZFS pool created on the host system. However, it needs to be created using qemu-nbd. The original recipe that @miha-plesko tried to execute results in a zfs pool with a vdev of type file, which simply does not work with OSv. OSv requires a vdev of type disk, which can be created like this:

sudo guestfish -N disk:128M
sudo qemu-img convert -O qcow2 test1.img test1_QCOW2.img
sudo qemu-nbd --connect /dev/nbd0 test1_QCOW2.img
sudo zpool create test1-zpool -d -m /test1 nbd0
sudo zfs create test1-zpool/test1
sudo umount test1-zpool/test1
sudo zpool export test1-zpool
sudo qemu-nbd --disconnect /dev/nbd0

The only problem is that any zpool or zfs command works really slowly with qemu-nbd-connected devices. I am not sure what the solution to this is, or if there is a way to create a ZFS pool with a vdev of type disk without qemu-nbd (a way to verify which vdev type an image ended up with is sketched at the end of this comment).

  2. The OSv root ZFS disk can be mounted on the host using qemu-nbd, modified and used on OSv again. However, I discovered that mounting an OSv ZFS image on the host changes its vdev disk device name from /dev/vblk0.1 to /dev/nbd0p1. The ZFS logic in OSv uses the device name from the disk vdev to find the corresponding device in OSv, which then does not exist. That is why right now OSv fails to use a ZFS file system mounted/modified on the host. So if I hard-code this line in https://github.com/wkozaczuk/osv/blob/debug_zfs/bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c#L70-L96 it all works:

// original line in vdev_disk.c, which derives the device name from the vdev path:
//error = device_open(vd->vdev_path + 5, DO_RDWR, &dvd->device);
// hard-coded replacement that makes the host-modified image work again:
error = device_open("vblk0.1", DO_RDWR, &dvd->device);

I am not sure what the correct solution is, nor how to enforce the vdev name to be /dev/vblk0.1 after the image has been mounted on the host.

So overall it looks like OSv's ZFS is compatible with ZFS artifacts of a newer version created/modified on the host.
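
To verify which vdev type a given image actually ended up with, the vdev label can be dumped on the host with zdb, which ships with the ZFS userland tools (a sketch; run it while the image or nbd device is still accessible, and note the partition name may differ):

sudo zdb -l $PWD/test1.img | grep "type:"   # file-backed pool: shows type: 'file'
sudo zdb -l /dev/nbd0p1 | grep "type:"      # qemu-nbd-backed pool: shows type: 'disk'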


justinc1 commented Apr 9, 2018

Nice finding :) I tried

cat ./t0b.sh 
#!/bin/bash
set -eux
dd if=/dev/zero of=test1.img bs=1M count=128
sudo qemu-img convert -O qcow2 test1.img test1_QCOW2.img
sudo qemu-nbd --connect /dev/nbd0 test1_QCOW2.img
sudo zpool create test1-zpool -m /test1 nbd0 # -d
sudo zfs create test1-zpool/test1
#
echo "hello `date`" >> /test1/test1/hello.txt
#
sudo umount test1-zpool/test1
sudo zpool export test1-zpool
sudo qemu-nbd --disconnect /dev/nbd0

./t0b.sh # takes about 1 second

sudo ./scripts/run.py -d -nv -V --cloud-init-image ../test1_QCOW2.img  -e "/zpool.so list; /cli/cli.so"
...
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
osv          9.94G  7.96M  9.93G     0%  1.00x  ONLINE  -
test1-zpool   123M  1.23M   122M     1%  1.00x  ONLINE  -
...
/# cat /test1/test1/hello.txt
hello Mon Apr  9 10:52:26 CEST 2018

Seems it works without a flaw for me. One detail: zpool create test1-zpool -d -m /test1 nbd0 gave me "invalid option 'd'". I have the zfs-fuse-0.7.2.2-6.fc27.x86_64 package installed.


nyh commented Apr 15, 2018

Interesting find. If qemu-nbd on a qcow image is slow, I wonder if you can't do it much faster using a loop device (see https://en.wikipedia.org/wiki/Loop_device) on a raw image. E.g., something like this (I didn't try!):

dd if=/dev/zero of=test1.img bs=1M count=128
losetup /dev/loop0 test1.img
zpool create test1-zpool -m /test1 loop0
zfs create test1-zpool/test1
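
And presumably the matching cleanup, mirroring the qemu-nbd recipe above (also untried):

zpool export test1-zpool
losetup -d /dev/loop0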

@wkozaczuk

I think one implication of this finding is that ZFS-based images could, in theory, alternatively be built on the host without having to run OSv with zpool/zfs, etc.

@justinc1

True. I was once asked to try building an OSv image without running it, but failed due to problems with the ZFS no longer being valid for OSv (after I hand-edited it).

nyh closed this as completed in c9640a3 on Dec 23, 2019
nyh pushed a commit that referenced this issue on Jul 17, 2022
The commit c9640a3, addressing the issue
#918, tweaked the vdev disk mounting logic to default to importing the root
pool from the device /dev/vblk0.1. This was really a hack that was
satisfactory to support mounting a ZFS image created or modified on host.

However, if we want to be able to import the root pool and mount the ZFS
filesystem from an arbitrary device and partition like /dev/vblk0.2 or
/dev/vblk1.1, we have to pass the specific device path to all places
in ZFS code where it is referenced. There are 4 code paths that end up
calling vdev_alloc(), but unfortunately changing all relevant functions
and their callers to pass the device path would be quite untenable.

So instead, this patch adds a new field spa_dev_path to the spa structure
that holds the information about the Storage Pool Allocator in memory.
This new field is set to point to the device we want to import the ZFS
root pool from in the spa_import_rootpool() function called by the ZFS
mount disk process, and is then used by vdev_alloc() downstream.

Refs #1200

Signed-off-by: Waldemar Kozaczuk <[email protected]>
Message-Id: <[email protected]>