Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recommend that path-absolute-URL strings not be used to reference resources #1725

Merged
merged 6 commits into from
Oct 29, 2021

Conversation

mattgarrish
Copy link
Member

@mattgarrish mattgarrish commented Jun 28, 2021

Fixes #1681


Preview | Diff

@iherman
Copy link
Member

iherman commented Jun 29, 2021

Very very minor editorial remark: I would move the note in §6.1.3 to the end of that subsection; at present, it really breaks the reading flow too much for my taste. I realize it is a note regarding the preceding paragraph (and maybe it is worth adapting the language) but the really main message for the reader is the text and example after the note and that gets a bit de-emphasized by the note...

@iherman
Copy link
Member

iherman commented Jun 29, 2021

(Interestingly and somewhat annoyingly, if the file itself is reviewed, as opposed to the diff file, it now shows tons of differences that are irrelevant, essentially changes line break places. I presume that may be related to another PR having been merged since the creation of this one. Very annoying indeed, so reviewers should only look at the diff file...)

Copy link
Member

@rdeltour rdeltour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm not a big fan of this resolution. I don't see why we have to make a specific statement about path-absolute URL strings if the URI resolution mechanism is well defined. A note would be enough.

Additionally, I'm not sure where "Reasing Systems are not required to create the Root Directory when unpacking the EPUB container" is defined. Nor why it is exactly relevant (since we know RS are not required to unpack anything in the first place).
I understand it is a useful example to understand the issue with root-relative paths, but then it shouldn't be in normative text, but in a note.

I suggest that we make a resolution on #1374 before merging this and fixing #1681.

@mattgarrish
Copy link
Member Author

mattgarrish commented Jun 29, 2021

I don't see why we have to make a specific statement about path-absolute URL strings if the URI resolution mechanism is well defined. A note would be enough.

Could be, but we've been tackling the issues one at a time. There's no harm if a future resolution undoes this change as far as I'm concerned. By merging, though, we can look at the next issue in context of what we have so far instead of trying to solve multiple issues all at the same time.

I'm not sure where "Reasing Systems are not required to create the Root Directory when unpacking the EPUB container" is defined.

There isn't a requirement to do anything with the root directory so It's only mentioned in the definition:

This directory is virtual in nature: an EPUB Reading System may or may not generate a physical root directory for the contents of the OCF Abstract Container if it unzips the contents.

https://w3c.github.io/epub-specs/epub33/core/#dfn-root-directory

I still think the normative-sounding part should be a real normative statement (i.e., that the directory is "virtual in nature" should link to the normative statement after the colon which we put somewhere in the RS specification).

I understand it is a useful example to understand the issue with root-relative paths, but then it shouldn't be in normative text, but in a note.

Notes tend to be a bit more explanatory than a single example. It's a little too lengthy for an "e.g.", plus there's already a parenthetical in the preceding sentence so it would make for a lengthy run-on. I'll take a look and see if there's a way to make it more "note-worthy", though.

@rdeltour
Copy link
Member

we've been tackling the issues one at a time. There's no harm if a future resolution undoes this change as far as I'm concerned. By merging, though, we can look at the next issue in context of what we have so far instead of trying to solve multiple issues all at the same time.

I agree with taking these issues one at a time, but the order sounds counter intuitive to me 😁

…nzipping;

add statement that root directory does not have to be created to RS requirements;
fix the formatting of the container requirements list to match new pattern
@mattgarrish
Copy link
Member Author

I would move the note in §6.1.3 to the end of that subsection; at present, it really breaks the reading flow too much for my taste.

Ya, not sure if we should get into that here, though.

Took me a while to figure out that a "language specification" is just a "specification" or "format specification". Sounded like I'd dropped into a section on language codes.

I'm not 100% sure what the note is even saying. We already say the base is defined per format, so is this note saying that relative paths are also bound to specific rules of the format? If so, is that really a note or a normative requirement?

@mattgarrish
Copy link
Member Author

it now shows tons of differences that are irrelevant

Ya, looks like line formatting changed, but that it affects text we haven't updated recently is odd. Can't explain.

@mattgarrish
Copy link
Member Author

I agree with taking these issues one at a time, but the order sounds counter intuitive to me

All roads lead to somewhere in the end, right? 😉

I've added a note and changed the text to instead note that it's the variability of the root (root directory or package document location) that causes the issue. Unzipping manifests it more obviously, but even packed this appears to be the source of resources not being available.

I've also moved the text about creating the root directory being optional to the ocf container requirements.

But I'm sure there will be more changes before we're done.

If you have a proposal for closing #1374, feel free to suggest it.

@iherman
Copy link
Member

iherman commented Jun 30, 2021

I'm not 100% sure what the note is even saying. We already say the base is defined per format, so is this note saying that relative paths are also bound to specific rules of the format? If so, is that really a note or a normative requirement?

I was wondering which formats this note would refer to, but I do not think I have anything in mind right now.

Maybe nuke the note altogether?

@mattgarrish
Copy link
Member Author

looks like line formatting changed

I fixed this in main and merged into this branch so the diffs now look better.

@iherman
Copy link
Member

iherman commented Jul 2, 2021

The issue was discussed in a meeting on 2021-07-02

  • no resolutions were taken
View the transcript

2. Are root-relative paths valid?

See github issue #1681, #1374.

See github pull request #1725.

Dave Cramer: What more needs to happen or can happen in the spec for root-relative paths?

Ivan Herman: one problem we need to address is that we have a problem with iBooks and others that rely on Adobe ADE, namely that they rely on a specific way of organizing the files, which is not in the standard.
… Matt's test was done according to the standard, but iBooks and others get it wrong. We can either acknowledge that problem as a warning and keep the standard as is (iBooks doesn't conform), or we reverse-engineer and put into standard a restricted version of how files can be organized, in order to conform with iBooks. We need to decide if this will harm current eBooks.
… I personally would hate to put restrictions in the standard, but that's just me

Romain Deltour: the test was done with valid ePub with shared resources - there is still the issue of root-relative URL paths and paths that would go outside the container. I think we need the spec to address that.
… some kind of language defining the root is likely necessary.
… and review interoperability with reading systems.

John Foliot: Is an unintended consequence that a publisher would have to create two versions, one for iBooks and another for other reading systems?

Dave Cramer: I don't see huge problems around interoperability because EPUBs are consistent with folder structure, generally.

Ivan Herman: Whatever works for iBooks works for others - but there are perfectly valid ePubs that iBooks doesn't take.
… As for the questions of Romain, we have decided that path relative URLs shouldn't be used, and paths shouldn't go outside the package. We need to make this clear in the documentation but there is not a fundamental technical problem with this.

Romain Deltour: these are edge cases, we don't see this problem often if ever.
… What we have is a recommendation for authors, but we need a recommendation for reading systems on how to process URLs.
… How should a reading system deal when authors don't follow recommendation.

Ivan Herman: it would be helpful to have a clearly-worded proposal for reading systems. Hoping Romain's help with this.

Dave Cramer: everyone seems to agree that having ../.. etc. to outside the package is not a good practice.

Hadrien Gardeur: from a reading system perspective, they need to resolve URIs, and expose the HTML resource (or any resource) to web view.
… reading systems have different ways of doing this, but you need to get the web view to do what you want, and how this is achieved can impact what we are discussing.

Ivan Herman: What precisely should the recommendation in the reading system spec be to cover all implementations?

Hadrien Gardeur: we don't know how each RS works behind the scenes, we can only speculate.

Ivan Herman: If we put something in the spec, it's up to RS how to implement
… we don't have to define that.
… Whatever we do, the author of an EPUB should have a clear mental model of what's happening. The RS implementation is not under the author's influence. If we are saying EPUb is a website in a box, we should be able to clearly define the root, and stop there.

Hadrien Gardeur: On the web, we don't think about files and root containers. For reading systems, we are deciding how an EPUB behaves. So weary of this conceptual approach.

Dave Cramer: we are really talking about edge cases here. Hoping that we can build some tests based on the write-up and what we are trying to achieve.
… hoping we can get clear enough to cover our edge cases without restricting RS implementation.

Hadrien Gardeur: difficult to test everywhere - gets tricky when you have to consider different CSS, etc

Dave Cramer: let's get some proposals down with Romain's help, and get Matt to take a look at them, and proceed from there.

Ivan Herman: Must have a clear statement somewhere whether we intend to restrict EPUB content and define organization of EPUB package.

@mattgarrish mattgarrish merged commit fcdd657 into main Oct 29, 2021
@mattgarrish mattgarrish deleted the fix/issue-1681 branch October 29, 2021 17:31
@iherman
Copy link
Member

iherman commented Oct 30, 2021

The issue was discussed in a meeting on 2021-10-29

List of resolutions:

View the transcript

2.3. "IRI of the Package Document": what is this exactly? (issue epub-specs#1374)

See github issue epub-specs#1374.

Dave Cramer: See more detailed explanation.

Romain Deltour: I may summarize.
… the big problem is defining how to resolve relative URLs in an EPUB.
… most of the URLs we use are relative URLs.
… but an URL object is something which is parsed from an URL string.
… to make it absolute.
… it is done by the parsing algorith.
… I make an example.

parse("doc.xhtml", "https://example.org") == "https://example.org/doc.xhtml".

Romain Deltour: for using this algorith we have to now the base URL (https://example.org).
… the problem is that our spec doesn't define what is the URL of the EPUB (because it may be used in different locations: online, offline, ecc.).

Romain Deltour: e.g., http://example.org/publisher/mobydick.epub#/EPUB/package.opf.

Romain Deltour: I'm going to show other examples.

parse("doc.xhtml", "http://example.org/publisher/mobydick.epub#/EPUB/package.opf") == "http://example.org/publisher/doc.xhtml".

parse("../../doc.xhtml", "http://example.org/publisher/mobydick.epub#/EPUB/package.opf") == "http://example.org/doc.xhtml" // ⚠️ OUTSIDE OF CONTAINER.

Romain Deltour: in this case I'm going outside of the EPUB.

parse("/doc.html", "http://example.org/publisher/mobydick.epub#/EPUB/package.opf") == "http://example.org/doc.xhtml" // ⚠️ OUTSIDE OF CONTAINER.

Romain Deltour: that's why I think we should define which is the base URL, also for security issues.
… the solution should be unambiguious.
… the resulting URL should not go outside the container.
… resolving two relative URLs in two different EPUBs they should not resolve in the same absolute URL.
… the URL of the EPUB should not share the same origin.
… these are the 4 objectives of the ideal solution.

Ivan Herman: I remember that one solution may be to consider an EPUB as a localhost (with a unique port).
… so the localhost:port is what represents the root for the EPUB.
… but if the RS works in a streaming way, it may not work (because the EPUB is not decompressed).
… and if it goes out of the EPUB, the user gets a 404.

Romain Deltour: yes, there are different approches. One is to use domains, another is to use a custom protocol scheme:.

parse("/", "epub:/") == "epub:/".

parse("../../doc.xhtml", "epub:/EPUB/package.opf") == "epub:/doc.xhtml".

Romain Deltour: I don't know which one is better.
… from a RS point of view.

Ivan Herman: I think defining a URI scheme for that is not a good idea.

Romain Deltour: I don't think we'll come with a solution that will be used by the end user.

Brady Duga: I think there are 4 cases: local URLs, online URLs, jar URLs.
… I think is the last URL the problem.
… isn't it?

Romain Deltour: yes, but also referencing to resources outside the package.

Brady Duga: do we need to tell people how to display URLs inside on EPUBs (using fragments)?.
… I would propose to remove it.

Romain Deltour: somewhat related, a gist from @annevk about ZIP URLs (from 8 years ago): https://gist.github.com/annevk/6174119.

Hadrien Gardeur: referencing everything outside the archive is problematic specially for the content document.
… I don't think we should get to a specific resolution here, because the RSs have different solutions.

Romain Deltour: removing that paragraph about the URL of the package document won't work.
… we need to tell people how to build them.

Romain Deltour: at a minimium, we should base everything on the assumption that there is a url for the root of the container.
… and we leave it for the reading system to define.
… I'm not sure that will work.
… and we are back to the discussion that the RS spec should say something.

See github issue epub-specs#1843.

Dan Lazin: there is another issue: #1843.
… about URIs for EPUBs.
… how do specify epub in cors/iframe policy?.
… I don't know how this is managed today.
… you need some way to say, hi, I am aware of this epub can it can iframe my content.

Romain Deltour: this might not answer entirely.
… RS spec says, for scripting, reading system must associate a unique origin to the script.
… a similar mechanism could be used to answer that issue about CORS/CSP.

Dan Lazin: is it a predicable url?.

Romain Deltour: this scripting mechanism is only about an origin--could be an opaque origin, doesn't have to be a url.
… opaque origin serializes to null.

Ivan Herman: where do we go from here?.

Dave Cramer: do we ask for help?.

Ivan Herman: we have tried and failed before.
… we have been discussing these things.
… if we come up with a concrete proposal.
… and then check whether that solution is acceptable to the TAG or whoever.
… my knowledge is not good enough to write a proposal.

Romain Deltour: I was supposed to come up with a proposal.
… I can write a summary of issue with possible approaches "paths" to solutions.
… I don't know enough about URLs and security to know all the plusses and minuses.

Ivan Herman: we can't go to CR with this stuff open.
… it's unfortunate that Tess is not around any more, we might ask the TAG.
… and the TAG takes time.
… we have time pressure.

Dave Cramer: could we talk to ping?.

Romain Deltour: could we liase with Anne at WhatWG?.

Ivan Herman: I worry about that.

Tzviya Siegman: talking to Tess would be good.

Ivan Herman: if we have a proposal that romain can put together.
… my first option would be to involve Tess.

Romain Deltour: I can summarize the problem statement.

Laurent Le Meur: tests will take time.
… why don't we just say that path-absolute URLs are illegal.
… and just update epubcheck?.
… to post an error if there's a slash at the beginning of URL.

Romain Deltour: path-absolute URLs are a red herring. the issue is with any relative URL really..

Laurence Zaysser: could we have a fifth objective, easy to move to web publication?.

Romain Deltour: it's about any relative urls. Just dealing with path-relative won't solve the issue.

See github pull request epub-specs#1725.

Matt Garrish: we have 1725 PR, which forbids path-absolute URLs. Is there any reason we shouldn't merge that?.
… should we close that? Or integrate it because it deals with part of the question?.

Wendy Reid: have we exhausted this?.

Ivan Herman: to answer matt, that one can go in.

Romain Deltour: +1.

Ivan Herman: using root-relative IRIs is a bad idea for something like epub, where the root url is unclear.

Proposed resolution: Merge PR #1725. (Wendy Reid)

Romain Deltour: +1.

Ben Schroeter: +1.

Ivan Herman: +1.

Gregorio Pellegrino: +1.

Matt Garrish: +1.

Shinya Takami (高見真也): +1.

Dave Cramer: +1.

Brady Duga: +1.

Matthew Chan: +1.

Tzviya Siegman: +1.

John Roque: +1.

Bill Kasdorf: +1.

Wendy Reid: +1.

Toshiaki Koike: +1.

Laurent Le Meur: +1.

Charles LaPierre: +1.

Hadrien Gardeur: +1.

Dan Lazin: +1.

Resolution #1: Merge PR #1725.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Are root-relative paths valid?
3 participants