Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In OS X 10.8 gsub in spec fails on Invalid ByteSequenceError #482

Open
nacengineer opened this issue Sep 11, 2013 · 23 comments
Open

In OS X 10.8 gsub in spec fails on Invalid ByteSequenceError #482

nacengineer opened this issue Sep 11, 2013 · 23 comments

Comments

@nacengineer
Copy link
Contributor

On this line

https://github.com/nacengineer/backup/blob/master/spec/support/sandbox_file_utils.rb#L239

The gsub is failing because this line

https://github.com/meskyanichi/backup/blob/master/spec/syncer/cloud/local_file_spec.rb#L31

@ghost
Copy link

ghost commented Sep 11, 2013

That's because the path here contains invalid UTF-8.
But, that error should not be raised if the path is within @tmpdir - which it is (or certainly should be) in this case.
Any idea why it's not?

@nacengineer
Copy link
Contributor Author

Yeah I was wondering if you were overloading the test with both a bad path and bad text all in one string. That was a bit confusing to me.

Anyway, the failure is specifically in the gsub on the HERE doc listed above.

https://github.com/nacengineer/backup/blob/master/spec/support/sandbox_file_utils.rb#L239

The bad file name gets created and passed fine all the way up through the HERE doc.

If you omit this line

https://github.com/nacengineer/backup/blob/master/spec/support/sandbox_file_utils.rb#L243

Then the gsub executes fine. Although then the test is getting a return other than what its expecting.

I also put encoding("binary") before the gsub and it returned fine, but again with data that won't pass.

I did test this in my Linux VM and the specs run fine. So its specific to OS X... so maybe a Darwin issue...

@ghost
Copy link

ghost commented Sep 11, 2013

The line above the raise, L238, should be returning true. That's the real issue.
I don't have a Mac. Wish I did, but I'm unemployed and couldn't buy a stick of gum ATM.
If you run an IRB session, do these return the same?

File.expand_path 'foo'
File.expand_path 'foo/'

And, does the result end with / or not?

@nacengineer
Copy link
Contributor Author

Both lines return the same with no trailing slash.

I threw a binding.pry call into that method at L238 and did some debugging that might help.

https://gist.github.com/nacengineer/6d914af4e70e6d2ed13e

@ghost
Copy link

ghost commented Sep 11, 2013

OK, that looks fine. What happens when you step through to when the path with the invalid UTF-8 shows up?

@nacengineer
Copy link
Contributor Author

Apologies. I had inserted the pry call wrong. I did it for the opposite case the if instead of the unless.

I updated the gist and pushed through to the gsub with the list in the HERE doc and the gsub without the list in the HERE doc.

@ghost
Copy link

ghost commented Sep 11, 2013

That's odd. Obviously you can see the problem, but why? And only for this particular FileUtils.touch call?
What do you get if you run this:

require 'tmpdir'
tmpdir = Dir.mktmpdir
puts tmpdir
Dir.chdir(tmpdir) { puts File.expand_path('') }

These should be the same.

@nacengineer
Copy link
Contributor Author

Bingo. Its suddenly located within the /private/var folder not /var

I updated the gist with the output in 002

@ghost
Copy link

ghost commented Sep 11, 2013

Is there a symlink involved here?

@nacengineer
Copy link
Contributor Author

@burns Looks like this might be the issue

http://www.pressingquestion.com/3589073/Pythons-Oschdir-And-Osgetcwd-Mismatch-When-Using-Tempfilemkdtemp-On-Mac-Osx-Lion

I can confirm that there is a symlink pointing to private/var

@ghost
Copy link

ghost commented Sep 11, 2013

Well, Dir.tmpdir calls File.expand_path on the path it finds. So it must be an issue with how the PWD is being reported.

What I don't understand is that there are other tests that do this, yet only this one that's failing?
Given this issue, they should all fail.

@ghost
Copy link

ghost commented Sep 11, 2013

What about this?

require 'tmpdir'
tmpdir = Dir.mktmpdir
puts tmpdir
Dir.chdir(tmpdir) do |path|
  puts path
  puts File.expand_path('')
  puts File.expand_path(path)
end

@nacengineer
Copy link
Contributor Author

@burns I've update the gist again with the above. In file 003

There were other test that were failing but they were related to the generated system calls.

See this fork/commit:

https://github.com/nacengineer/backup/commit/ea72cbcbe6fff261c617f98fb71b4948991c3ba2

FWIW, I've added file 004 to the gist so that you can see the test suite is finishing.

@ghost
Copy link

ghost commented Sep 12, 2013

Yeah, the problem here is that Dir.pwd dereferences symlinks, which File.expand_path uses when expanding relative links. https://gist.github.com/burns/6541800

I think this one test is the only place this is causing a problem.
Try changing the test to:

diff --git a/spec/syncer/cloud/local_file_spec.rb b/spec/syncer/cloud/local_file_spec.rb
index a8321ac..328eedd 100644
--- a/spec/syncer/cloud/local_file_spec.rb
+++ b/spec/syncer/cloud/local_file_spec.rb
@@ -30,7 +30,7 @@ describe Syncer::Cloud::LocalFile do
         end
         bad_file = "sync_dir/bad\xFFfile"
         sanitized_bad_file = "sync_dir/bad\xEF\xBF\xBDfile"
-        FileUtils.touch bad_file
+        FileUtils.touch File.join(@tmpdir, bad_file)

         Logger.expects(:warn).with(
           "\s\s[skipping] #{ File.expand_path(sanitized_bad_file) }\n" +

Note that we still must expand the sanitized_bad_file path (using the dereferenced Dir.pwd),
since LocalFile.find will expand the relative path the test is passing in.
Also note that File.expand_path will not dereference a symlink if a absolute path is given.
So for the Syncers, it's probably best to use absolute paths that don't include symlinks,
since the find command is using the -L option. Otherwise, the relative links used to key the files
in the 'local_files' Hash may be incorrect (I'd have to play with this some more to be sure).

ETA: It appears that find -L still uses the path with the symlink in it's results.

@nacengineer
Copy link
Contributor Author

Actually this causes the next expectation in that test to fail.

Updated gist with debug

@ghost
Copy link

ghost commented Sep 12, 2013

How did "bad\xFFfile" become "bad%FFfile" ?

@nacengineer
Copy link
Contributor Author

not sure. I went back and edited the gist with an additional debug step where I checked the file name and instantiated a hash with that as a key... both were fine.

@ghost
Copy link

ghost commented Sep 13, 2013

Try debugging LocalFile#initialize (before #sanitize). Could it be that find under OS X is returning this?

@ghost
Copy link

ghost commented Oct 21, 2013

@nacengineer Was this ever resolved? I'd like to make sure the tests are passing under OS X :)

@patatepartie
Copy link

@ghost The error is still there on OS X. I think you were on the right track with "bad%FFfile".
It seems to be a bug with ruby's FileUtils. The following will produce TMPDIR/bad%FFfile:

FileUtils.touch File.join(Dir.mktmpdir('backup_spec'), "bad\xFFfile")

I couldn't find anything about it.

@tombruijn tombruijn added Problem and removed Problem labels Jul 24, 2014
@tombruijn
Copy link
Member

I can confirm the bug on my machine. I repeated the scenario between @nacengineer and the account formerly known as @burns and got the same results every time. It did however differ from @patatepartie last results.

require "tmpdir"
FileUtils.touch File.join(Dir.mktmpdir('backup_spec'), "bad\xFFfile")
# => ["/var/folders/n1/z69rhgkj4w3_jb6chc1c0w_m0000gn/T/backup_spec20140724-22102-1l63j9f/bad\xFFfile"]

So no %FF, but still the \xFF

Anyway, I can make the spec pass by expecting the error rather than having the line kill the spec.

diff --git a/spec/syncer/cloud/local_file_spec.rb b/spec/syncer/cloud/local_file_spec.rb
index ba43e9a..cfa6a49 100644
--- a/spec/syncer/cloud/local_file_spec.rb
+++ b/spec/syncer/cloud/local_file_spec.rb
@@ -38,12 +38,7 @@ describe Syncer::Cloud::LocalFile do
         Dir.chdir(@tmpdir) do
           bad_file = "sync_dir/bad\xFFfile"
           sanitized_bad_file = "sync_dir/bad\xEF\xBF\xBDfile"
-          FileUtils.touch bad_file
-
-          Logger.expects(:warn).with(
-            "\s\s[skipping] #{ File.expand_path(sanitized_bad_file) }\n" +
-            "\s\sPath Contains Invalid UTF-8 byte sequences"
-          )
+          expect { FileUtils.touch bad_file }.to raise_error

           local_files = described_class.find('sync_dir')
           expect( local_files.keys.count ).to be 3

Not sure if it's the desired result though. The rest of this spec seems to pass where in the above conversation it would break the following assertions in this spec. Any other ideas? Otherwise this seems like a solution.

Edit: Needs to be tested on ubuntu. I fear it might fail there.

@tombruijn
Copy link
Member

Yep, no.. This fails on ubuntu.
But this is just weird behavior. On OSX it raises an error while on Ubuntu it returns an array with the original file name.

# OSX
$ bad_file = "sync_dir/bad\xFFfile"
$ bad_file.encoding
=> #<Encoding:UTF-8>
$ FileUtils.touch bad_file
=> ArgumentError: invalid byte sequence in UTF-8
   from /Users/tom/projects/backup/spec/support/sandbox_file_utils.rb:239:in `gsub'
# Ubuntu
$ bad_file = "sync_dir/bad\xFFfile"
$ bad_file.encoding
=> #<Encoding:ISO-8859-1>
$ FileUtils.touch bad_file
=> ["sync_dir/bad\xFFfile"]
$ bad_file.force_encoding("UTF-8")
$ bad_file.encoding
=> #<Encoding:UTF-8>
$ FileUtils.touch bad_file
=> ["sync_dir/bad\xFFfile"]

So on Ubuntu it just ignores the encoding setting, and even when it's set to UTF-8 it doesn't behave the same on OSX as on Ubuntu.

Anybody got any ideas?

tombruijn added a commit that referenced this issue Nov 5, 2016
I couldn't fix it in 2014 and don't want to break my head on it now.
Maybe later?

Reported in #482
elthariel pushed a commit to elthariel/backup that referenced this issue Jun 26, 2017
I couldn't fix it in 2014 and don't want to break my head on it now.
Maybe later?

Reported in backup#482
elthariel pushed a commit to elthariel/backup that referenced this issue Oct 12, 2018
I couldn't fix it in 2014 and don't want to break my head on it now.
Maybe later?

Reported in backup#482
@elthariel
Copy link
Contributor

I've just written a tiny C snippet to test this, OSX file system or implementation of the open stdlib function is replacing \xFF by %FF, which will prevent to make this test case working on OSX :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants