Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshotting root of drive fails when Volume Shadow Copy enabled #3842

Open
Hakkin opened this issue May 2, 2024 · 16 comments
Open

Snapshotting root of drive fails when Volume Shadow Copy enabled #3842

Hakkin opened this issue May 2, 2024 · 16 comments

Comments

@Hakkin
Copy link

Hakkin commented May 2, 2024

Snapshotting fails with ERROR upload error: The parameter is incorrect. when snapshotting the root of the drive with --enable-volume-shadow-copy set to always. Snapshotting a sub directory in the root works correctly. All commands were run with administrator privileges.

$ kopia --version
0.17.0 build: 89c8eb47af2e1d5c1d14fe299a0cf7eaac095abf from: kopia/kopia

$ kopia policy set --enable-volume-shadow-copy always C:\
Setting policy for Hakkin@desktop:C:\
 - setting "enable volume shadow copy" to always.

$ kopia snapshot create C:\
Snapshotting Hakkin@desktop:C:\ ...
creating volume shadow copy of C:\
new volume shadow copy id {8FB0579B-5364-4138-8CEA-A2CB2F553E42}
removing volume shadow copy id {8FB0579B-5364-4138-8CEA-A2CB2F553E42}
ERROR upload error: The parameter is incorrect.

$ kopia snapshot create C:\Users\
Snapshotting Hakkin@desktop:C:\Users ...
creating volume shadow copy of C:\
new volume shadow copy id {3FF5EB4F-8E6A-441A-87DD-A4975494BBBB}
 | 3 hashing, 1778 hashed (250 MB), 0 cached (0 B), uploaded 219.1 MB, estimating...
@tekert
Copy link

tekert commented May 3, 2024

Same case on windows server 2019, doesn't seem to happen on Windows 10 on my other pc.

@Hakkin
Copy link
Author

Hakkin commented May 3, 2024

After running with debugging, I may know the issue.

DEBUG shadow copy root is \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10
DEBUG snapshotted directory     {"path":".","error":"GetFileInformationByHandleEx \\\\?\\GLOBALROOT\\Device\\HarddiskVolumeShadowCopy10: The parameter is incorrect.","dur":"512.2µs"}

I believe I have ran into an issue like this before, it is caused by the shadow volume path not ending in \. For whatever reason Windows won't treat it as a folder without the ending \. You can observe the same behavior by trying to mount a Shadow Copy volume via mklink

$ mklink /j "vss" "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy4"
Junction created for vss <<===>> \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy4

$ cd vss
The parameter is incorrect.

$ rmdir vss

$ mklink /j "vss" "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy4\"
Junction created for vss <<===>> \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy4\

$ cd vss

$ cd
C:\...\vss

Notice the trailing \ on the second mklink command, without it, it doesn't work.

@Hakkin
Copy link
Author

Hakkin commented May 4, 2024

I tried to recompile with just a quick fix in upload_os_snapshot_windows.go by appending the \ manually, but it seems the path gets normalized somewhere, so it doesn't stick. Interestingly, I found a comment that seems to indicate a similar issue is known and is handled in other parts of the code, maybe it needs further handling to fix this issue as well:

// Paths such as `\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy01`
// cause os.Lstat to fail with "Incorrect function" error unless they
// end with a separator. Retry the operation with the separator added.

@Hakkin
Copy link
Author

Hakkin commented May 11, 2024

I've traced the source of the error to the (fsd *filesystemDirectory) Iterate() function here:

func (fsd *filesystemDirectory) Iterate(ctx context.Context) (fs.DirectoryIterator, error) {
fullPath := fsd.fullPath()
f, direrr := os.Open(fullPath) //nolint:gosec

The first time this is called on a VSS snapshot, the fullPath will be the root Shadow Copy volume without any trailing \, the os.Open will work fine, but then later in (it *filesystemDirectoryIterator) Next(), we will error here on a *os.File.ReadDir call:

func (it *filesystemDirectoryIterator) Next(ctx context.Context) (fs.Entry, error) {
for {
// we're at the end of the current batch, fetch the next batch
if it.currentIndex >= len(it.currentBatch) {
batch, err := it.dirHandle.ReadDir(numEntriesToRead)
if err != nil && !errors.Is(err, io.EOF) {
// stop iteration
return nil, err //nolint:wrapcheck
}

The underlying error is returned by windows.GetFileInformationByHandleEx in the stdlib ReadDir implementation for Windows.

I managed to write a patch that fixes this issue for me, but I'm not entirely convinced this is the right part of the code to fix it in. I'd love if someone more familiar with this part of the code could look into this more...

diff --git a/fs/localfs/local_fs_os.go b/fs/localfs/local_fs_os.go
index e790b5b3..2a3ffbb2 100644
--- a/fs/localfs/local_fs_os.go
+++ b/fs/localfs/local_fs_os.go
@@ -66,7 +66,21 @@ func (it *filesystemDirectoryIterator) Close() {
 func (fsd *filesystemDirectory) Iterate(ctx context.Context) (fs.DirectoryIterator, error) {
        fullPath := fsd.fullPath()

-       f, direrr := os.Open(fullPath) //nolint:gosec
+       // Trying to list the shadow volume root directory will fail unless it contains a trailing \
+       // We detect that here and fix it
+       var openPath = fullPath
+       if runtime.GOOS == "windows" {
+               const shadowVolumePrefix = `\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy`
+               if strings.HasPrefix(fullPath, shadowVolumePrefix) {
+                       remainder := fullPath[len(shadowVolumePrefix):]
+                       // Check if we're at the root of the shadow volume, any subdirectories should contain a path separator
+                       if !strings.ContainsRune(remainder, os.PathSeparator) {
+                               openPath = fullPath + string(os.PathSeparator)
+                       }
+               }
+       }
+
+       f, direrr := os.Open(openPath) //nolint:gosec
        if direrr != nil {
                return nil, errors.Wrap(direrr, "unable to read directory")
        }

@Whissi
Copy link

Whissi commented May 25, 2024

I also encountered this bug when attempting to create a backup of an entire volume using a Volume Shadow Copy, and then I found your bug report.

Thank you for the detailed analysis.

At first, I wondered why you didn't choose a similar construct as in the NewEntry function further down:

if err != nil {
// Paths such as `\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy01`
// cause os.Lstat to fail with "Incorrect function" error unless they
// end with a separator. Retry the operation with the separator added.
var e syscall.Errno
//nolint:goconst
if runtime.GOOS == "windows" &&
!strings.HasSuffix(path, string(filepath.Separator)) &&
errors.As(err, &e) && e == 1 {
fi, err = os.Lstat(path + string(filepath.Separator))
}

However, when I tried to apply the same solution, I realized why: As you correctly analyzed, the problem occurs when calling ReadDir -- at that point, it is not possible to respond to the issue, as the handler does not allow adjusting the path there.

Nonetheless, I slightly modified your proposed solution, as I don't see the necessity for introducing the remainder variable.
To stay consistent with the existing code style, I found the following solution better -- am I overlooking something?

--- a/fs/localfs/local_fs_os.go
+++ b/fs/localfs/local_fs_os.go
@@ -66,12 +66,24 @@ func (it *filesystemDirectoryIterator) Close() {
 func (fsd *filesystemDirectory) Iterate(ctx context.Context) (fs.DirectoryIterator, error) {
        fullPath := fsd.fullPath()

+       if runtime.GOOS == "windows" {
+               const shadowVolumePrefix = `\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy`
+
+               if strings.HasPrefix(fullPath, shadowVolumePrefix) &&
+                       !strings.HasSuffix(fullPath, string(filepath.Separator)) {
+                       fullPath = fullPath + string(filepath.Separator)
+               }
+       }
+
        f, direrr := os.Open(fullPath) //nolint:gosec
        if direrr != nil {
                return nil, errors.Wrap(direrr, "unable to read directory")
        }

-       childPrefix := fullPath + string(filepath.Separator)
+       childPrefix := fullPath
+       if !strings.HasSuffix(childPrefix, string(filepath.Separator)) {
+               childPrefix += string(filepath.Separator)
+       }

        return &filesystemDirectoryIterator{dirHandle: f, childPrefix: childPrefix}, nil
 }
 

However, in principle, I wonder if this is really the ideal place: Wouldn't it be better to patch

func (e *filesystemEntry) fullPath() string {
return e.prefix + e.Name()
}

and ensure that the return value ends with a trailing backslash if e.prefix == "\\?\GLOBALROOT\Device"?

@Hakkin
Copy link
Author

Hakkin commented May 25, 2024

Thanks for the additional eyes,

I don't see the necessity for introducing the remainder variable.

The remainder part ensures we only apply the trailing path separator to the top level shadow volume root directory, no directory below it requires it. The remainder code is checking if we're at the root or in a subdirectory.

I separated openPath and fullPath because only the Open call needs the trailing slash, all the rest of Kopia's code doesn't expect a trailing slash there. I don't know if it would actually break anything, but I didn't see a reason to change the behavior. This is the same reason I didn't modify the fullPath function itself.

@Hakkin
Copy link
Author

Hakkin commented May 25, 2024

After checking your patch again, I think your childPrefix check actually handles the second part of my comment. For some reason I was under the impression that fullPath got used or returned somewhere by itself, but since it's just being used for childPrefix, your check should be sufficient.

The first part of my comment is still valid I believe.

@Janzert
Copy link

Janzert commented May 28, 2024

In case it helps anyone, this seems to work fine in 0.16.1 (and also at least 0.13.0). It might be helpful to track down what changed in 0.17 that causes it to break as well.

@Hakkin
Copy link
Author

Hakkin commented May 30, 2024

@Janzert

In case it helps anyone, this seems to work fine in 0.16.1 (and also at least 0.13.0). It might be helpful to track down what changed in 0.17 that causes it to break as well.

I just tested with 0.16.1 and got the same error, and the --enable-volume-shadow-copy option wasn't available until v0.16.0, so I'm not sure what you mean by "also at least 0.13.0". Maybe you're confusing this with manually snapshotting a Shadow Copy volume via a external script? This issue is specifically about the built in --enable-volume-shadow-copy option.

@Janzert
Copy link

Janzert commented May 31, 2024

Sorry, I got confused looking at which versions I had previously installed. 0.13.0 is apparently the version I briefly installed last year and decided to wait until VSS got built in support before giving Kopia a real try.

I have 3 policies1 setup to snapshot C:\, D:\ and E:\. I did receive errors after upgrading to 0.17.0, by downgrading to 0.16.1 it is again working fine for me.

Running with 0.17.0 it does look the same as this bug:

> kopia snapshot create C:\
Snapshotting ****@****:C:\ ...
creating volume shadow copy of C:\
new volume shadow copy id {A612AB29-6D78-46F4-9B5F-6372AD5C7B9D}
removing volume shadow copy id {A612AB29-6D78-46F4-9B5F-6372AD5C7B9D}
ERROR upload error: The parameter is incorrect.

The same command with 0.16.1 finishes without error.

From kopia policy show --global or any of the volume paths I have:

OS-level snapshot support:
  Volume Shadow Copy:              always   (defined for this target)

or

OS-level snapshot support:
  Volume Shadow Copy:              always   inherited from (global)

I never did configure the external script method and instead waited for the built in support from #3543.

Footnotes

  1. Is that the correct Kopia term?

@Hakkin
Copy link
Author

Hakkin commented May 31, 2024

The same command with 0.16.1 finishes without error.

Oh, that's very interesting! I was able to reproduce your results using the official published v0.16.1 binary, I was originally building it myself from the v0.16.1 git tag, which results in the broken behavior! I see the Go version was updated to 1.22 between v0.16.1 and v0.17.0, and I'm also using Go 1.22 locally. I wonder if that's the culprit?

In the Go 1.22 changelog, I see various changes in the os package regarding Windows:

On Windows, the Stat function now follows all reparse points that link to another named entity in the system. It was previously only following IO_REPARSE_TAG_SYMLINK and IO_REPARSE_TAG_MOUNT_POINT reparse points.

On Windows, passing O_SYNC to OpenFile now causes write operations to go directly to disk, equivalent to O_SYNC on Unix platforms.

On Windows, the ReadDir, File.ReadDir, File.Readdir, and File.Readdirnames functions now read directory entries in batches to reduce the number of system calls, improving performance up to 30%.

When io.Copy copies from a File to a net.UnixConn, it will now use Linux’s sendfile(2) system call if possible, using the new method File.WriteTo.

The 3rd one might be the culprit seeing as the error seems to be stemming from the ReadDir call.

@Hakkin
Copy link
Author

Hakkin commented May 31, 2024

I wonder if that's the culprit?

I've confirmed that this is a regression in Go 1.22. Reverting 09415e0 and downgrading to Go 1.21.10 (along with a few fix ups) allowed me to build 0.17.0 on Go 1.21, and snapshotting using Shadow Copy works correctly.

There seems to be a bug with similar symptoms filed for Go here: golang/go#61907
This has already been fixed in 1.22 though, so our issue must be something else.

Sorry for the ping @jkowalski, but would you or another of the active contributors be willing to file an upstream issue for this? Also wonder what can be done in Kopia for now, reverting 09415e0 was fairly seamless and only required 2 or 3 merge conflicts + and extra fix up of a newer int range loop.

@jkowalski
Copy link
Contributor

The range loop is kind of a big change so this is not exactly easy to rollback :) On top of that we've upgraded linter and have other changes. I'll talk to @julio-lopez to see what to do here.

I think I would prefer to fix this forward and stay on go 1.22, but would definitely need help from folks who are able to actually debug and test this (I don't use Windows much, let alone VSS).

(I have to say Windows filesystem API is full of magic and features that most people don't even realize are even there).

@Hakkin
Copy link
Author

Hakkin commented May 31, 2024

Yes, ideally this would be fixed upstream in Go since it seems to be a regression in the stdlib, and then Kopia can just bump to that version in the next release cycle, but I'm not sure how much Windows magic edge cases the Go team is willing to handle. There are also Microsoft programs which exhibit the same error (see #3842 (comment)) so it could be argued that the previous behavior was incorrect and this is actually the intended behavior, in which case it would need a special case in Kopia to handle it.

@jkowalski
Copy link
Contributor

I think special-casing in Kopia sounds good. We already have MaybePrefixLongFilenameOnWindows that are Windows-specific (also because of Golang os package limitations in this regard), so similar workarounds make sense.

I think some of the code ideas in this thread make sense to put in a PR so we can review it and merge.

Longer term I think it makes sense to completely ditch os package and rewrite localfs in terms of Win32 APIs (or Unix APIs) to get better performance, fewer memory allocations, etc. but we're not ready for that yet.

@Hakkin
Copy link
Author

Hakkin commented May 31, 2024

Hi, I've opened a pull request with a fix at #3891
If anyone else that was experiencing this issue is willing to test it, that would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants