
file volume driver still has "file pool cannot export dirty volumes" issue #7325

Closed
Rudd-O opened this issue Mar 6, 2022 · 18 comments
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: core diagnosed Technical diagnosis has been performed (see issue comments). P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. pr submitted A pull request has been submitted for this issue.

Comments

Rudd-O commented Mar 6, 2022

How to file a helpful issue

Qubes OS release

4.1

Brief summary

Trying to clone my Fedora (F34 at this point) template produces the error in the issue title.

Examination of the source code:

    def export(self):  # pylint: disable=invalid-overridden-method
        if self._export_lock is not None:
            assert self._export_lock is FileVolume._marker_running, \
                'nested calls to export()'
            raise qubes.storage.StoragePoolException(
                'file pool cannot export running volumes')
        if self.is_dirty():
            raise qubes.storage.StoragePoolException(
                'file pool cannot export dirty volumes')
        self._export_lock = FileVolume._marker_exported
        return self.path

The is_dirty() check it hits:

    def is_dirty(self):
        if self.save_on_stop:
            with suppress(FileNotFoundError), open(self.path_cow, 'rb') as cow:
                cow_used = os.fstat(cow.fileno()).st_blocks * BLKSIZE
                return (cow_used > 0 and
                        (cow_used > len(EMPTY_SNAPSHOT) or
                         cow.read(len(EMPTY_SNAPSHOT)) != EMPTY_SNAPSHOT or
                         cow_used > cow.seek(0, os.SEEK_HOLE)))
        return False

Here is what's causing the issue -- the private volume:
[Screenshot: Screenshot_20220306_143102]

@Rudd-O Rudd-O added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug labels Mar 6, 2022

Rudd-O commented Mar 6, 2022

private-cow.img does not start with the characteristic SnAp magic. It's all NULs at the beginning. WTAF.


marmarek commented Mar 6, 2022

It is created as a sparse file, so it isn't surprising that it's NULs. I wonder why 1 block is used instead of 0. Can you find anything non-NUL there? The cow_used > len(EMPTY_SNAPSHOT) comparison is false, not because the values are equal, but because cow_used is smaller.
Maybe the check should also compare the magic at the beginning? But still, without the magic (i.e. never used with dm-snapshot), usage should be 0...

Have you ever started this particular template?


Rudd-O commented Mar 6, 2022

Yes of course I have started the template, this is my default template for all VMs.

At the time of check, no loopback devices show this private-cow.img file in use.

Can I just delete the stupid file? How do I zero it out? Obviously, if there is only one disk block in use, it can't contain any valid data that could get lost.


Rudd-O commented Mar 6, 2022

Checking the stuff in the file...


Rudd-O commented Mar 6, 2022

I have truncated the file to zero bytes. It still shows 1 block used with du. WTF.


marmarek commented Mar 6, 2022

I have truncated the file to zero bytes. It still shows 1 block used with du. WTF.

That's definitely the source of confusion here... What filesystem? ext4?


Rudd-O commented Mar 6, 2022

OK, I know what it is. A file created on a ZFS file system always shows at least one block of usage, even if it was truncated to zero bytes, or to any size N. I just tested against tmpfs, and on tmpfs the result is 0 instead of 1. On ZFS, a new sparse file created with truncate always occupies 1 block, no matter what apparent size I give it.

The code depends on file system implementation details. That's the bug.
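The filesystem dependence is easy to observe. A minimal standalone sketch (not Qubes code) that creates a fully sparse file and inspects its allocated blocks; the block count printed is 0 on ext4 and tmpfs but at least 1 on ZFS, so the sketch deliberately makes no claim about its exact value:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    # Grow the apparent size to 10 MiB without ever writing data.
    os.truncate(tmp.name, 10 * 1024 * 1024)
    st = os.stat(tmp.name)
    apparent_size = st.st_size
    # st_blocks is always counted in 512-byte units, regardless of the
    # filesystem's actual block size.
    allocated = st.st_blocks * 512
    # On ext4/tmpfs, allocated == 0 for a never-written sparse file;
    # on ZFS it is at least one block, which is what trips is_dirty().
    print(apparent_size, allocated)
```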


Rudd-O commented Mar 6, 2022

If we still want to depend on file system side effects, then maybe it has to be special-cased: alongside the expected "used disk space is zero", also accept "used disk space is 1 block", a case which we know cannot be a valid snapshot backing device anyway.


marmarek commented Mar 6, 2022

Yeah, sounds like that's it. Maybe we can use SEEK_DATA / SEEK_HOLE to check for emptiness of a sparse file, instead of checking used blocks? Alternatively, some special casing for 1 block, like you said here.
@rustybird do you have some ideas?


Rudd-O commented Mar 6, 2022

My snippet of code xx.py yields False as expected for is_dirty when I check for > 512 instead of > 0. That would solve my immediate problem.
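In other words, the stopgap is to treat a single allocated 512-byte block the same as zero. A hypothetical sketch of that relaxed comparison (looks_dirty is an illustrative name, not the actual patch):

```python
BLKSIZE = 512  # st_blocks is counted in 512-byte units

def looks_dirty(cow_used):
    """Stopgap check: tolerate the single block that ZFS allocates
    for every file, instead of requiring exactly zero usage."""
    return cow_used > BLKSIZE  # was: cow_used > 0

# A ZFS-created, never-written COW file shows one allocated block:
print(looks_dirty(512))   # the relaxed check no longer flags it as dirty
print(looks_dirty(4096))  # real data still counts as dirty
```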


Rudd-O commented Mar 6, 2022

Aha! SEEK_DATA works. If the file is truly empty you get an exception:

    print(cow.seek(0, os.SEEK_DATA))
    Traceback (most recent call last):
    ...
    OSError: [Errno 6] No such device or address

Let me get you some code.


Rudd-O commented Mar 6, 2022

import errno
import os

def has_any_data(fobject):
    """Returns True if the file has any non-hole data, False if it has no data."""
    pos = fobject.tell()
    try:
        # Move the file pointer to the beginning of the next data byte.
        fobject.seek(0, os.SEEK_DATA)
        # There was a data byte -> the file is non-empty.
        return True
    except OSError as e:
        if e.errno == errno.ENXIO:
            # The file is completely empty (zero bytes or 100% sparse).
            return False
        # Some other error.
        raise
    finally:
        # Restore the file position to preserve the caller's invariant.
        fobject.seek(pos, 0)

This. Replace the checks of the st_blocks stat member with a call to this function.


marmarek commented Mar 6, 2022

According to lseek(2), you should get specifically an ENXIO error if there is no data.


Rudd-O commented Mar 6, 2022

Yes, and that error code applies to all of tmpfs, ext4, and ZFS, as I tested.

EDIT: this looks promising, and it doesn't look like we need to change any other uses of the st_blocks stat member anywhere else in this repo, or even the tests. Should I make a PR? Or should I leave this in your hands?


marmarek commented Mar 6, 2022

Please open a PR :)

@DemiMarie

@Rudd-O and please finish the ZFS pool if you want to keep using ZFS; the file pool will go away in R4.2.


Rudd-O commented Mar 6, 2022

Asks fulfilled.

@andrewdavidwong andrewdavidwong added C: core diagnosed Technical diagnosis has been performed (see issue comments). pr submitted A pull request has been submitted for this issue. labels Mar 6, 2022
@andrewdavidwong andrewdavidwong added this to the Release 4.1 updates milestone Mar 6, 2022
@andrewdavidwong andrewdavidwong added the affects-4.1 This issue affects Qubes OS 4.1. label Aug 8, 2023
@andrewdavidwong andrewdavidwong removed this from the Release 4.1 updates milestone Aug 13, 2023
@andrewdavidwong

Closing as completed. If anyone believes this issue is not yet completed, or if anyone is still affected by this issue, please leave a comment, and we'll be happy to reopen it. Thank you.
