Case sensitivity of path/document names #181

Open
raucao opened this issue Apr 28, 2020 · 2 comments

@raucao
Member

raucao commented Apr 28, 2020

The question came up in remotestorage/remotestorage.js#1179 and I haven't found any mention of it in the spec. Has this been discussed before?

@michielbdejong
Member

We should definitely mention it, because there is an expectation that URLs might be case-insensitive.

E.g. on GitHub they partially are:

As a programmer, I lean towards saying the URL should be case-sensitive, because in most programming languages string literals are case-sensitive. But I think I could be persuaded either way.

@kevincox

If we want to work with providers who are using various filesystems there is a bit of a problem.

| remotestorage choice | case sensitive backend | case preserving backend | case folding backend |
| --- | --- | --- | --- |
| case sensitive | trivial | hard¹ | easy² |
| case preserving | hard³ | trivial | easy² |
| case folding | hard³ | possible | trivial |

  1. When looking up files you need to check for collisions. This can be significantly more expensive than a case-sensitive implementation would be.
  2. When making changes you need to downcase every filename.
  3. You need to find out how to store files that vary only by case. This can be done by encoding the filename. (For example, a base32 encoding with only lowercase letters.)
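
A minimal sketch of the encoding idea in note 3, assuming a backend that cannot distinguish names differing only by case. The helper names `encode_name` and `decode_name` are hypothetical and not part of any remotestorage implementation. Base32 uses only one letter case plus digits, so lowercasing its output loses no information:

```python
import base64


def encode_name(name: str) -> str:
    """Map an arbitrary name to a string of lowercase letters and digits."""
    raw = name.encode("utf-8")
    # Standard base32 emits A-Z and 2-7; lowercasing is reversible because
    # the alphabet never mixes upper and lower case.
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()


def decode_name(stored: str) -> str:
    """Invert encode_name: restore case and padding, then decode."""
    padded = stored.upper() + "=" * (-len(stored) % 8)
    return base64.b32decode(padded).decode("utf-8")


# Two names that differ only by case get distinct on-disk names.
a = encode_name("Notes.txt")
b = encode_name("notes.txt")
print(a != b, decode_name(a), decode_name(b))
```

The cost is that the stored names are no longer human-readable on the backend, which is part of why the table rates these combinations as hard.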

Looking at it this way, the optimal solution for implementers is case folding. However, this has a bunch of problems for applications and users.

  1. Users may expect to be able to store things that vary only by case. (Especially in languages with non-bijective folding rules.)
  2. At this point you are basically required to do full Unicode folding[citation needed], which makes everything more difficult. (But you probably need this anyways unless you are treating paths as byte strings.)
  3. Many applications will now need to store the case some other way (if they want to use the human names in the storage path).
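
To illustrate what "non-bijective folding" means in practice, here is a small sketch using Python's built-in string methods. The German example names are chosen only for illustration:

```python
names = ["Straße", "STRASSE", "strasse"]

# Full Unicode case folding maps all three names to the same string,
# so a case-folding server could store only one of them.
folded = {n.casefold() for n in names}

# Simple lowercasing disagrees with full folding: "ß" stays as it is.
lowered = {n.lower() for n in names}

# The mapping cannot be reversed: uppercasing "ß" yields "SS",
# and lowercasing "SS" yields "ss", not "ß".
round_trip = "ß".upper().lower()

print(len(folded), len(lowered), round_trip)
```

A server that only lowercases and a server that applies full case folding would therefore disagree about whether these names collide.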

With those things considered, I think we should treat paths as byte strings. This makes it easy for servers to provide fast, accurate implementations of remotestorage. It does pass folding and normalization on to the app developers, but I think that can be addressed with a couple of good libraries, and it will be a lot less painful than tracking down a couple of remotestorage implementations that do folding wrong (or just use an older Unicode standard).
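
As a rough sketch of what the byte-string approach would leave to applications: two visually identical paths can differ as bytes, so an app that wants them to match would canonicalize before talking to the server. The `canonical_path` helper is hypothetical, and choosing NFC plus optional case folding is an assumption, not something the spec prescribes:

```python
import unicodedata


def canonical_path(path: str, fold_case: bool = False) -> str:
    """App-side canonicalization before a path is sent to the server."""
    result = unicodedata.normalize("NFC", path)
    if fold_case:
        # Folding can produce non-normalized output, so normalize again.
        result = unicodedata.normalize("NFC", result.casefold())
    return result


composed = "/notes/caf\u00e9"     # "é" as a single code point
decomposed = "/notes/cafe\u0301"  # "e" followed by a combining acute accent

# A server comparing raw bytes sees two different documents.
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")

# After app-side normalization the two paths are byte-identical.
same_after = canonical_path(composed) == canonical_path(decomposed)

print(same_bytes, same_after)
```

With this split, the server never needs Unicode tables, and apps that care about case-insensitive matching can opt in with `fold_case=True`.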

There are downsides though:

  • Harder to do a trivial filesystem-based implementation if your filesystem isn't case-sensitive.
  • Dropbox and other shims might not work? I don't know what normalization if any they do for the case-preserving name.

We should probably also check with major remotestorage providers to see what they do and if it would be hard for them to migrate/support the standardized way.
