
feat: mount additional disks when using vz #1405

Merged (1 commit) on Mar 27, 2023

Conversation

@pendo324 (Contributor) commented Mar 2, 2023

This PR adds support for using the additional disks feature with Virtualization.framework. I noticed it was missing, so I figured I'd add it.

Considerations:

  • Virtualization.framework can only mount RAW disk images, while the additional disks feature currently creates QCOW2 images. Since QEMU also supports RAW images, is it OK to change the additional disks feature to create RAW images by default?
    • This change is already included in this diff, but I can remove it
  • Should Lima automatically convert QCOW2 disks to RAW before trying to mount a QCOW2 additional disk with vmType: vz? There's already precedent for runtime conversion of disks in imgutil; I just wanted to make sure we want to do it for additional disks before adding it (see the conversion sketch after this list)
    • Auto-conversion is already included in this diff, but I can remove it
    • I was having issues doing an in-place conversion (I saw in-place conversion in other uses of imgutil.QCOWToRaw), causing data loss, which is why there's an intermediate file right now. Not sure if there's a better way to do this?
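
For reference, the underlying operation is essentially a qemu-img conversion into a separate output file (the destination must not be the input, which is why an intermediate file is needed); a minimal sketch, assuming qemu-img is on the PATH and with an illustrative path:

$ qemu-img convert -f qcow2 -O raw datadisk datadisk.raw  # convert into a new file, leaving the source untouched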

Test:

$ qemu-img info --output=json ~/.lima/_disks/test/datadisk
{
    "virtual-size": 53687091200,
    "filename": "/Users/alvajus/.lima/_disks/test/datadisk",
    "cluster-size": 65536,
    "format": "qcow2",
    "actual-size": 14221312,
    "format-specific": {
        "type": "qcow2",
        "data": {
            "compat": "1.1",
            "compression-type": "zlib",
            "lazy-refcounts": false,
            "refcount-bits": 16,
            "corrupt": false,
            "extended-l2": false
        }
    },
    "dirty-flag": false
}

# Create an instance using the default fedora.yaml template with the addition of additionalDisks: [test] (note: test was created as QCOW2, as seen above)
$ ./_output/bin/limactl start --name=fedora template://fedora
...
INFO[0129] READY. Run `limactl shell fedora` to open the shell.

$ ./_output/bin/limactl shell fedora
[alvajus@lima-fedora lima]$
[alvajus@lima-fedora lima]$ ls /mnt/lima-test/
lost+found
[alvajus@lima-fedora lima]$ echo "THIS IS MY TEST FILE" | sudo tee /mnt/lima-test/testfile
THIS IS MY TEST FILE
[alvajus@lima-fedora lima]$ cat /mnt/lima-test/testfile
THIS IS MY TEST FILE
[alvajus@lima-fedora lima]$ exit
logout

$ ./_output/bin/limactl stop fedora
...
INFO[0002] [hostagent] QEMU has exited

$ ./_output/bin/limactl remove fedora
...
INFO[0000] Deleted "fedora" ("/Users/alvajus/.lima/fedora")

# Modify fedora.yaml to still use the same disk, but also include vmType: "vz" and recreate the VM
$ ./_output/bin/limactl start --name=fedora template://fedora
? Creating an instance "fedora" Proceed with the current configuration
WARN[0000] `vmType: vz` is experimental
...
INFO[0146] READY. Run `limactl shell fedora` to open the shell.

$ ./_output/bin/limactl shell fedora
[alvajus@lima-fedora lima]$ ls /mnt/lima-test/
lost+found  testfile
[alvajus@lima-fedora lima]$ cat /mnt/lima-test/testfile
THIS IS MY TEST FILE
[alvajus@lima-fedora lima]$ exit
logout

$ qemu-img info --output=json /Users/alvajus/.lima/_disks/test/datadisk
{
    "virtual-size": 53687091200,
    "filename": "/Users/alvajus/.lima/_disks/test/datadisk",
    "format": "raw",
    "actual-size": 6107136,
    "dirty-flag": false
}

@AkihiroSuda AkihiroSuda added this to the v0.16.0 milestone Mar 3, 2023
@balajiv113 (Member)

Should Lima automatically convert QCOW2 disks to RAW before trying to mount

IMO the goal of additionalDisks is to share disks between instances; doing a conversion from QCOW2 to RAW defeats that purpose.

is it ok to change the additional disk feature to create RAW images by default

My take on this would be to provide support for creating additionalDisks with a format type:

limactl disk create --type raw

If raw is used, it can be shared across drivers.

As for the default, my take would be to keep qcow2, as it has optimal storage.

pkg/vz/vz_driver_darwin.go (outdated review thread, resolved)
pkg/vz/vm_darwin.go (outdated review thread, resolved)
@pendo324 pendo324 force-pushed the additional-disk-vz-support branch from 602caab to 670a268 Compare March 3, 2023 20:53
@pendo324 (Contributor, Author) commented Mar 3, 2023

Thanks for the review, @balajiv113, I pushed a new revision to incorporate your comments.

Should Lima automatically convert QCOW2 disks to RAW before trying to mount

IMO the goal of additionalDisks is to share disks between instances; doing a conversion from QCOW2 to RAW defeats that purpose.

[...]

As for the default, my take would be to keep qcow2, as it has optimal storage.

While QCOW2 is better optimized for storage, I wonder if RAW as a default is still viable, given that RAW images can also be sparse?

RAW images are a little harder to deal with, in that if you (for example) cp them, the entire disk will be copied, even if it's not all in use. This can be avoided by using something like rsync -S, but that's not exactly obvious.

The only reason I'm proposing this is interoperability between QEMU and Virtualization.framework VMs, but it's possible that Lima can just leave this for individual users to decide, too.

Instead of automatic runtime conversion, I can:

  • remove the runtime conversion code
  • add the --format raw option to the disk create command
  • add notes on how to do the conversion manually, if you want to share data between the two VM types

Let me know if that sounds better, thanks!

@pendo324 pendo324 requested a review from balajiv113 March 3, 2023 21:03
@pendo324 pendo324 force-pushed the additional-disk-vz-support branch from 670a268 to ef0f9af Compare March 3, 2023 21:05
@jandubois (Member)

IMO the goal of additionalDisks is to share disks between instances; doing a conversion from QCOW2 to RAW defeats that purpose.

How does it defeat the purpose? The images would still work with QEMU.

I just tested this by switching an Alpine instance from QEMU to VZ, which converted the diffdisk to RAW format. I made some changes to the image, stopped the instance and switched it back to QEMU. The disk remained in RAW format, but the modified data was still there and everything seemed to work normally.

What is the functionality we would lose by switching to RAW format?

@balajiv113 (Member)

@jandubois

doing a conversion from QCOW2 to RAW defeats that purpose

With automatic conversion we will end up with two different disks (one in qcow2 format, the other in raw).

From vz, whatever we change won't be reflected in the qcow2 disk.

As you said, we can mount it with QEMU in raw format as well; in that case it is fine.
Considering this, it would be better to decide on the disk format at disk create itself (limactl disk create --format raw) instead of doing an auto-conversion to raw on the fly.

@jandubois (Member)

With automatic conversion we will end up with two different disks (one in qcow2 format, the other in raw).

Oh, I didn't catch this. To me "automatic conversion" meant: the first time the disk is mounted to a VZ instance, it is replaced with the converted RAW disk, and from then on both QEMU and VZ instances use only the RAW disk. This should be transparent to the QEMU instances and guarantee data consistency, because there is always only a single copy of the data.

I didn't look at the code in the PR yet, so I didn't realize this is not how it was implemented.

My main concern is what happens if there is an error during the conversion (e.g. you run out of disk space). The old QCOW2 disk must remain in a working state until the conversion has completed successfully.

Storage size

I don't have any knowledge about what advantages QCOW2 would have for storage size. Given that both formats have support for sparse files, how is QCOW2 optimized for storage? Do you have any references I can read up on? Is there any comparison of storage efficiency and access speed (e.g. if QCOW2 stored data in compressed format, I would expect the disk to be slower than RAW)?

@balajiv113 (Member)

Oh, I didn't catch this

My bad, I'm the culprit here. I overlooked that the PR already takes care of moving the QEMU disk to a new .qcow2 extension. So it is just as you said.

formats have support for sparse files

Again my bad :D I was looking at ls, which was showing 100G for raw disks while qcow2 showed only the used space, and that was misleading. But I checked with the disk utility now and it is fine.
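
A minimal sketch of comparing the two numbers (BSD/macOS tools; path illustrative): ls reports a sparse file's apparent size, while du reports the blocks actually allocated.

$ ls -lh ~/.lima/_disks/test/datadisk  # apparent size: the full virtual disk size for a sparse raw file
$ du -h ~/.lima/_disks/test/datadisk   # allocated size: only the blocks actually written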

@jandubois
Thanks for the details :) It's mostly things I missed on my end. The current approach of using the raw format looks good to me.

@jandubois (Member)

I was looking at ls

This is a good point, though: if you are only using QEMU, and you are not excluding the Lima VMs and disks from your backups, then you may end up backing up the full sparse file for a RAW disk if you have overallocated it significantly and much of the space remains permanently unused.

So I think it makes sense to allocate additional disks as QCOW2 by default (but provide a --raw option to override), and auto-convert to RAW the first time the disk is mounted to a VZ instance. That way everyone automatically gets the best setup:

  • Users only using QEMU will still use QCOW2
  • Users using VZ (or both) will convert to RAW and have interoperability

So this would be my recommendation, as long as the auto-conversion to RAW is done in a way that any failure keeps the QCOW2 copy unharmed.


PS: Thinking about this some more, I think we are already at risk of damaging the VM disk if we ever run out of host disk space while trying to commit additional sparse data blocks. I wonder if limactl start should display a warning when you start an instance and the remaining free space is not enough to completely allocate the sparse space for the disk (plus an additional buffer of, e.g., at least 1GB)?
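
A minimal host-side sketch of such a check, done from the shell rather than in Lima itself (jq and the 1GB buffer are illustrative assumptions; the path is, too):

$ disk=$HOME/.lima/_disks/test/datadisk
$ virtual=$(qemu-img info --output=json "$disk" | jq -r '."virtual-size"')    # bytes the disk may grow to
$ avail=$(( $(df -k "$(dirname "$disk")" | awk 'NR==2 {print $4}') * 1024 ))  # bytes free on the host volume
$ if [ "$avail" -lt $((virtual + 1024*1024*1024)) ]; then echo "WARN: free space may not cover the disk's full sparse allocation"; fi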

@pendo324 pendo324 force-pushed the additional-disk-vz-support branch 2 times, most recently from 6271dbe to 07ca3a3 Compare March 6, 2023 19:33
@pendo324 (Contributor, Author) commented Mar 6, 2023

Thank you for the great discussion. QCOW2 disks are definitely more user-friendly, in that users can just use default/familiar tools like cp and ls to interact with them. Even though RAW images can be sparse, that might not be obvious at first glance.

I updated the code to add a new --format option to disk create; it defaults to qcow2. Right now we only really need qcow2 or raw, but I figured it might be good to make it extensible now rather than changing it later if needed. Let me know what you think.
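
Hypothetical usage (--format is the flag added here; --size is the pre-existing flag, and the name and size are illustrative):

$ limactl disk create test --size 50GiB               # defaults to qcow2
$ limactl disk create test --size 50GiB --format raw  # raw, mountable by vz without conversion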

I also added a bunch of information to the docs on how best to deal with sparse RAW files. Let me know if it looks good, or if I should move that info to another location.

@jandubois (Member)

@pendo324 On macOS you can use clonefile(2) (via cp -c) to copy a sparse file. No need to use GNU tools or rsync. It even shares the data blocks between the source and destination until either of them is modified (copy-on-write), and of course preserves the sparseness as well.
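
For example (path illustrative; both files need to live on the same APFS volume):

$ cp -c ~/.lima/_disks/test/datadisk ./datadisk.bak  # instant copy-on-write clone via clonefile(2)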

@pendo324 pendo324 force-pushed the additional-disk-vz-support branch from 07ca3a3 to c803fe7 Compare March 6, 2023 20:44
@pendo324 (Contributor, Author) commented Mar 6, 2023

@pendo324 On macOS you can use clonefile(2) (via cp -c) to copy a sparse file. No need to use GNU tools or rsync. It even shares the data blocks between the source and destination until either of them is modified (copy-on-write), and of course preserves the sparseness as well.

Oh wow, I couldn't find anything like that on macOS when I looked through the docs for cp. This is much better for users. Thanks for the info! I just updated the docs.

pkg/driver/driver.go (outdated review thread, resolved)
pkg/vz/vm_darwin.go (outdated review thread, resolved)
@afbjorklund (Member)

I updated the code to add a new --format option to disk create; it defaults to qcow2. Right now we only really need qcow2 or raw, but I figured it might be good to make it extensible now rather than changing it later if needed. Let me know what you think.

I used the .vdi format for the VirtualBox driver; it should support raw as well, as long as you add a .img suffix to the file...

@pendo324 pendo324 force-pushed the additional-disk-vz-support branch from c803fe7 to 427f556 Compare March 7, 2023 17:39
@pendo324 (Contributor, Author) commented Mar 7, 2023

I used the .vdi format for the VirtualBox driver; it should support raw as well, as long as you add a .img suffix to the file...

I can add vdi to the list of supported formats, or add all of the formats from this list. That might be more suitable in a separate PR, though?

balajiv113 previously approved these changes Mar 13, 2023

@balajiv113 (Member) left a comment:

LGTM 👍

@balajiv113 (Member)

I can add vdi to the list of supported formats

I don't think it is required as of now. Maybe once we support the VirtualBox driver, we can add the required formats in a separate PR.

AkihiroSuda previously approved these changes Mar 14, 2023

@AkihiroSuda (Member) left a comment:

Thanks

@AkihiroSuda AkihiroSuda requested a review from jandubois March 14, 2023 00:22
@pendo324 (Contributor, Author)

Hey @jandubois, please let me know if there are any further changes you think should be made to get this merged, when you get a chance. Thanks!

@jandubois (Member)

@pendo324 I've been extremely busy the last 2 weeks, so I barely looked at any outstanding code reviews; sorry about that!

I've looked at the PR once more now, and there is one thing I don't get: after converting the qcow2 disk to raw format, you rename the old disk to datadisk.qcow2:

$ ls -l ~/.lima/_disks/foo/
total 9360
-rw-r--r--@ 1 jan  staff  10737418240 17 Mar 21:36 datadisk
-rw-r--r--@ 1 jan  staff       196768 17 Mar 21:35 datadisk.qcow2
lrwxr-xr-x@ 1 jan  staff           24 17 Mar 21:35 in_use_by -> /Users/jan/.lima/default

What is the point of keeping the old disk around? It will never be accessed by Lima again, and there is no way to delete it with any lima commands, except when you delete the raw disk with limactl disk delete .... Why not delete it right away, to save the additional disk space (in case it was converted after data was already copied into it)?


Probably unrelated to this PR: the first time I stopped the VM and tried to delete the disk, the in_use_by symlink was still in place. I don't understand how that could have happened, and I also couldn't reproduce it after force-deleting and re-creating the disk and then starting/stopping the VM once more. Just in case anybody else has any ideas:

$ l stop
INFO[0000] Sending SIGINT to hostagent process 205
INFO[0000] Waiting for the host agent and the driver processes to shut down
INFO[0000] [hostagent] 2023/03/17 21:17:17 tcpproxy: for incoming conn 127.0.0.1:62994, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0000] [hostagent] Received SIGINT, shutting down the host agent
INFO[0000] [hostagent] Shutting down the host agent
INFO[0000] [hostagent] Stopping forwarding "/run/lima-guestagent.sock" (guest) to "/Users/jan/.lima/default/ga.sock" (host)
INFO[0000] [hostagent] Unmounting disk "foo"
INFO[0000] [hostagent] Unmounting "/Users/jan"
INFO[0000] [hostagent] Unmounting "/tmp/lima"
INFO[0000] [hostagent] Shutting down VZ
ERRO[0000] [hostagent] dhcp: unhandled message type: RELEASE
INFO[0000] [hostagent] [VZ] - vm state change: stopped

$ l disk delete foo
WARN[0000] Skipping deleting disk "foo", disk is referenced by one or more non-running instances: ["default"]
WARN[0000] To delete anyway, run "limactl disk delete --force foo"

@pendo324 (Contributor, Author)

@pendo324 I've been extremely busy the last 2 weeks, so I barely looked at any outstanding code reviews; sorry about that!

No problem, thanks for taking a look at it again 👍.

I've looked at the PR once more now, and there is one thing I don't get: after converting the qcow2 disk to raw format, you rename the old disk to datadisk.qcow2: [...] What is the point of keeping the old disk around? [...] Why not delete it right away, to save the additional disk space (in case it was converted after data was already copied into it)?

Good point. I just updated the PR to delete the old qcow2 disk after swapping the RAW disk to be named datadisk. I think only removing the qcow2 disk after the RAW disk is in a usable state makes it more of a transaction, and ensures that at least one disk is usable (or at least recoverable) if the rename/deletion somehow fails. Let me know what you think. The alternative is to just delete the qcow2 disk as soon as the RAW disk is created; I wouldn't be opposed to that either.
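
Roughly, the ordering now looks like this (a shell sketch of the Go logic, with illustrative file names; the actual code goes through imgutil):

$ qemu-img convert -f qcow2 -O raw datadisk datadisk.raw  # 1. write the converted copy alongside the original
$ mv datadisk datadisk.qcow2                              # 2. set the qcow2 original aside
$ mv datadisk.raw datadisk                                # 3. promote the raw copy into place
$ rm datadisk.qcow2                                       # 4. remove the original only once the raw disk is usable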


Probably unrelated to this PR: the first time I stopped the VM and tried to delete the disk, the in_use_by symlink was still in place. I don't understand how that could have happened, and I also couldn't reproduce it after force-deleting and re-creating the disk and then starting/stopping the VM once more. [...]

Not sure why this would happen; I've never seen that before, but I'll try to look into it.

@pendo324 pendo324 dismissed stale reviews from AkihiroSuda and balajiv113 via c46bead March 20, 2023 15:50
@pendo324 pendo324 force-pushed the additional-disk-vz-support branch from 427f556 to c46bead Compare March 20, 2023 15:50
@pendo324 pendo324 force-pushed the additional-disk-vz-support branch from c46bead to eca1613 Compare March 20, 2023 15:51
@pendo324 pendo324 requested review from balajiv113 and AkihiroSuda and removed request for jandubois and balajiv113 March 20, 2023 16:52
@pendo324 (Contributor, Author)

I accidentally removed some reviewers, and it seems like I can't re-request for some reason. All tests passed; please take a look at this and let me know if you have any additional requests when you get a chance, thanks! @AkihiroSuda @jandubois @balajiv113

@jandubois (Member) left a comment:

Thanks, LGTM.

I have one comment, but I'm also fine with merging the PR as-is.

Comment on lines +360 to +363
qcow2Path := fmt.Sprintf("%s.qcow2", extraDiskPath)
if err = imgutil.QCOWToRaw(extraDiskPath, rawPath); err != nil {
return fmt.Errorf("failed to convert qcow2 disk %q to raw for vz driver: %w", diskName, err)
}
@jandubois (Member) commented:

Technically we don't know that the current format is qcow2, except for the fact that limactl disk create currently only creates either raw or qcow2 formatted volumes. So this code could be kept more generic, like this:

Suggested change:

-qcow2Path := fmt.Sprintf("%s.qcow2", extraDiskPath)
-if err = imgutil.QCOWToRaw(extraDiskPath, rawPath); err != nil {
-	return fmt.Errorf("failed to convert qcow2 disk %q to raw for vz driver: %w", diskName, err)
-}
+oldFormatPath := fmt.Sprintf("%s.%s", extraDiskPath, extraDiskFormat)
+if err = imgutil.ConvertToRaw(extraDiskPath, rawPath); err != nil {
+	return fmt.Errorf("failed to convert %s disk %q to raw for vz driver: %w", extraDiskFormat, diskName, err)
+}
...

@pendo324 (Contributor, Author) commented Mar 27, 2023:

Good point. I created a new issue to track this (#1438); I'll implement it after we get this merged, if that's alright.

@jandubois jandubois merged commit f94c8ec into lima-vm:master Mar 27, 2023
pendo324 added a commit to runfinch/finch that referenced this pull request Mar 28, 2023
Issue #, if available: Closes #218 

*Description of changes:*
Adds support for Lima's Virtualization.framework and Rosetta features,
through the use of new finch.yaml configuration options (`vmType` and
`rosetta`). `vmType` also sets the `mountType` to `virtiofs`, since that
is only available when using Virtualization.framework.

To support this, a few things needed to be changed on our side:
- Disk migration. Although lima-vm/lima#1405
(which adds persistent disk support to `vmType: vz` in Lima) will
auto-convert persistent disks from QCOW2 to RAW when they are first
used with `vmType: vz`, we also have to do this ourselves because of
the way our disks are persisted with symlinks
- Move some hardcoded Lima config from finch.yaml to be programmatically
toggled in `pkg/config/lima_config_applier`
- This allows things like the qemu user mode scripts to be installed
only when they are needed. Installing them all the time and then trying
to use Rosetta as a binfmt_misc handler causes conflicts
- This also opens up possibilities of future customization based on
Finch's config values

Currently, because lima-vm/lima#1405 is not
merged yet, this PR references [a specific branch of my own finch-core
repo](https://github.com/pendo324/finch-core/tree/lima-with-vz-extra-disk),
which already includes the Lima change. It also edits the Makefile to
build Lima from the submodule directly and overwrite the Lima
downloaded from the archive. These changes will be removed once the Lima
change is merged upstream.

*Testing done:*
   - unit tests
   - e2e tests
   - local testing on Intel and Apple Silicon machines

- [x] I've reviewed the guidance in CONTRIBUTING.md


#### License Acceptance

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Signed-off-by: Justin Alvarez <[email protected]>