[Discussion] Explicitly handle origin in further revisions? #1156
Comments
Absolutely important. Labeling as deferred as we cannot tackle this for 3.2. |
The following is non-normative:
|
I think web |
Oops. That definitely sounds like a bug to me. I believe each EPUB instance should have a unique domain (with the caveat below): that is how we could really consider an EPUB as a "website in a box". N.B., if the model above was followed, then no relative URL from one content document to the other would work either!
First of all, if I look at the URL standard then the correct term may have to be a But, indeed, if we refer to "origin" as defined in the URL spec, this may be the right term to use.
So? Why is that different from using an iframe in any other content? I guess I do not see the problem. |
A pair of presentations given by @lrosenthol a few years ago are very much relevant for this discussion: Both were given at a meeting for the (now defunct) Publishing Interest Group. |
I think the statement above may be getting read out of context. It's guidance on how strongly to restrict untrusted scripts; it's not a model for serving documents. The note for that bullet says as much:
The reason for being so strict is that it would prevent a malicious third-party script from stealing an entire publication or raiding any information shared across documents. This is why only scripting within an iframe is recommended to be supported in reflowable epubs, as it limits what a script can access. Spine-level scripting is only recommended for fixed layouts, and that's because container scripting isn't terribly realistic. |
To collect some thoughts on this issue:
If we don't solve the latter issues, I don't think it really matters much what we say in the security section. I expect reading systems that are restrictive of what scripts can do at the spine level will continue to be. I think it's also somewhat inevitable that even if we define a support profile we'll still have to accept that some reading systems will take a more restrictive approach. |
I do not know whether that is the way current RS-s work. But if each content document is its own origin (which seems to be what the current text says) then scripts running in content documents cannot share data among them using storage API-s. That is why I think the current text is wrong. And.. today it might be an informative guidance: I am really considering whether that should not be a MUST for a RS. Ie, an EPUB document should behave in such a way that all resources have a single, unique origin.
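To make the difference between the two models concrete, here is a minimal sketch, assuming content documents are served over http(s). The host names are hypothetical and only illustrate the comparison; nothing here reflects how a particular reading system assigns URLs.

```javascript
// For http(s) URLs an origin is the (scheme, host, port) tuple.
// This helper is only meant for such URLs, not for opaque origins.
function sameOrigin(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  return (
    ua.protocol === ub.protocol &&
    ua.hostname === ub.hostname &&
    ua.port === ub.port
  );
}

// Model 1 (hypothetical hosts): every content document gets its own origin.
const perDocument = [
  'https://chapter1.book-a.example/chapter1.xhtml',
  'https://chapter2.book-a.example/chapter2.xhtml',
];

// Model 2 (hypothetical host): one origin for the whole EPUB.
const perPublication = [
  'https://book-a.example/OEBPS/chapter1.xhtml',
  'https://book-a.example/OEBPS/chapter2.xhtml',
];

// Origin-keyed storage is only shared when the two documents are same-origin.
const sharedInModel1 = sameOrigin(perDocument[0], perDocument[1]);
const sharedInModel2 = sameOrigin(perPublication[0], perPublication[1]);
```

Under the first model the two chapters are cross-origin, so origin-keyed storage would not be shared between them. Under the second they are same-origin.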
You mean we do not require the capability of spine-level scripting, right? You are probably right that this may not change (we can try...) but we still must specify exactly the origin as above. If a RS does not do scripting at all, it does not change anything for it, so there is no harm for them. There are browsers that do not do scripting (and there are users who switch scripting off entirely). The same way, there are RS-s that do not do scripting. Nevertheless, the Web Platform for browsers is defined with an eye on full-blown scripting; there is an analogy here.
I do not think I agree. Setting the right framework in terms of origin forms the basis of, maybe, getting to a scripting profile one day (if this is what the community wants later).
I would certainly not define a scripting profile with the features it may have; I would prefer to have a set of restrictions instead. The set of restrictions is finite (and maybe small); listing the capabilities may be impossible to handle when new API-s come to the fore every day... Actually... that there are RS-s that are more restrictive: let the market decide, eventually. I do not think the specification should include restrictions. Instead, the security and privacy sections should list potential security and privacy pitfalls that RS-s may want to be attentive about, and may restrict things (the slides of @lrosenthol list some examples). Nothing normative there, just informative (as those sections usually are). |
It doesn't say to assign a unique domain/origin to each document, though, or even that it's realistic that this can be done. It says for untrusted scripts, isolate them as if they have a unique domain/origin. It's effectively saying to sandbox content documents from each other. Is it overzealous about security? I'd say yes. Is it inherently wrong? Not if that's your approach to security. I have no issue with changing "content documents" to "epub publications" in that paragraph, as I think the threat from untrusted scripts comes within an epub from third party content and authors should be advised to sandbox such content themselves. We shouldn't expect paranoid reading systems just because bad things can happen. But origins and what scripting APIs are available are not intrinsically linked. Changing the informative guidance isn't going to produce support for the APIs that authors can't currently access. Even making it normative that epubs each have a unique origin won't require reading systems to enable such support. So while it may clean up a bit of outdated advice, we still face bigger challenges to ever getting to consistent spine-level scripting support. |
O.k., we agree on this. But, also, I may want to say that this is a MUST, i.e., this is how RS MUST operate, ie, by creating, conceptually, a sandbox as you called it. Authors should be able to rely on this.
That is correct. But how is this different from the browser world? There is a load of APIs defined out there, and web site designers take the risk on whether a specific API is implemented by a specific browser. The same holds for RS-s. The APIs are defined by the Web Platform (unless we want to add our own APIs, but I do not think this is something we would do in 3.3). We can draw attention on potentially dangerous setups or scripts from a security or privacy point of view (and we should probably do that) in an informative section, but that is as far as the specification should go imho. |
It's not. I'm just not sure what all we're trying to solve with this discussion. If it's only the origin question, then I think we're fine. If we're also trying to solve the issues like being able to share cookies, local storage, etc. between content documents in an epub, I think that's a whole other can of worms and one that we've been trying to solve without a lot of luck for as long as 3.0. If not, then just ignore my ramblings... :) |
I'm hoping to build a foundation, perhaps for future work. To do that, I think we need to identify some principles, informed by the kinds of things people hope to do in EPUB. Leonard mentioned the idea of a security boundary. To me, the fundamental security boundary should be the EPUB itself--for example, individual content documents in the EPUB should have access to the same local storage. But different EPUBs should not share the same local storage. I also believe we need to, as much as we can, describe what we expect using the language of the web security model. My straw-person proposal would be to say that each EPUB should act as if it has a single opaque origin. Would this allow the kinds of scripting people want to do, while limiting the damage a bad script can do? I don't know, but I think it's worth exploring. I would also note that some of the risk here is not only from malicious scripting, but from poorly written scripts.
document.querySelector('html').innerHTML = '<p>Call me Ishmael.</p>'; |
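The "EPUB as the security boundary" idea above can be sketched as a storage partition keyed by publication. This is only an illustration under stated assumptions: the `PublicationStorage` class and the `urn:uuid:book-a` style identifiers are hypothetical, and a real reading system would rely on the rendering engine's own origin handling rather than a hand-rolled map.

```javascript
// Hypothetical reading-system bookkeeping: one storage area per publication.
class PublicationStorage {
  constructor() {
    // publication identifier -> Map of key/value pairs
    this.areas = new Map();
  }

  // Return the storage area for a publication, creating it on first use.
  areaFor(publicationId) {
    if (!this.areas.has(publicationId)) {
      this.areas.set(publicationId, new Map());
    }
    return this.areas.get(publicationId);
  }
}

const storage = new PublicationStorage();

// A content document in book A stores a value...
storage.areaFor('urn:uuid:book-a').set('quiz-score', '7');

// ...and another content document of the same EPUB can read it back.
const seenByOtherChapter = storage.areaFor('urn:uuid:book-a').get('quiz-score');

// A different EPUB gets its own, initially empty, area.
const seenByOtherBook = storage.areaFor('urn:uuid:book-b').get('quiz-score');

// Clearing one publication's area leaves the other untouched.
storage.areaFor('urn:uuid:book-b').clear();
const stillThere = storage.areaFor('urn:uuid:book-a').has('quiz-score');
```

The point of the sketch is only the shape of the boundary: documents inside one EPUB share state, and nothing crosses between EPUBs.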
The issue was discussed in a meeting on 2021-02-18 List of resolutions:
Transcript: 3. Origin, cont'd
Wendy Reid: this is continuing from last week's meeting
Dave Cramer: i think most of the discussion is in issue 1153
Leonard Rosenthol: the thing that is most problematic is the difference between actually doing this in a browser with a content hosted on a real domain vs doing this on a device (mobile, desktop, etc.)
Dave Cramer: i hear you
Leonard Rosenthol: the problem is that you can't do that
Dave Cramer: could you solve that problem with different subdomains for each title?
Leonard Rosenthol: yes, but only in a world where all the epubs come from the same publisher
Dave Cramer: you're kind of creating a non-conforming RS in this example
Leonard Rosenthol: that would make all web-based RS non-conforming
Wendy Reid: I think dropbox actually does have an ebook reader....
Leonard Rosenthol: they're probably taking advantage of no scripting then
Wendy Reid: i think the solution that most RS have come to is just to avoid scripting entirely
Leonard Rosenthol: that doesn't solve other things, e.g. referencing
Brady Duga: this really seems like a scripting issue
Dave Cramer: Jiminy has real world examples of this sort of stuff
Brady Duga: maybe? It depends on the RS and the content
Leonard Rosenthol: if, say, you're building your own software and documents, and you control the entire system there's no reason why you wouldn't want to do it that way
Dave Cramer: one thing to do is go back to our current language
Leonard Rosenthol: can probably change that so that each epub is its own origin, like you said earlier
Matt Garrish: the original wording came at a time when we were just starting to open epub to scripting
Dave Cramer: to me it feels like a little bit of progress if we relax the current language to say "per epub" instead of "per content document"
Brady Duga: right now the spec is more restrictive, but we're already finding examples IRL where RS are not honoring it
Matt Garrish: depends where we are going with this
Dave Cramer: given all that, should we take the baby step of updating the non-normative guidance that the boundary should be "per epub"?
Brady Duga: does that include changing from "domain" to "origin"?
Dave Cramer: yes, i think so
Wendy Reid: that's everything that was on the agenda tonight
Dave Cramer: i think i do have an action item to talk to TAG about the general ideas around epub security
Wendy Reid: there is most likely going to be a special session at the business group next week about WCAG3 |
Please allow me to open and use this issue as food for thought.
This is much, much longer term, but it seems to me that not explicitly handling the origin concept has actually been an underlying issue creating compat, interoperability and security issues for all parties involved in the EPUB ecosystem – so users, authors, distributors (???), and Reading Systems.
First things first, I’d vastly prefer not to be familiar with the origin concept and its security models, which can be painful to learn, but they do exist for a lot of reasons so it’s kinda worth putting outstanding effort into understanding how it works.
That’s to say I know this is a complex issue, but it’s already dealt with ± implicitly in the spec, albeit partially i.e. remote resources, JS guidance, etc.
JS guidance is the most obvious case, as issue #873 might demonstrate. The thing is it’s been 2 years now, and the situation has not improved that much.
To be fair, kudos to the people who did put effort into that; to name a few: iBooks, to which I personally reported a security issue, Daniel Weck @ Readium, or Mantano in Bookari. I’m pretty sure I’m forgetting some people so please be assured this is much appreciated if you did too.
However, EPUB files still share the same origin in an awful lot of apps – i.e. something that is considered a security/authoring issue by other apps. It also has practical issues for authors because if, say, one file calls localStorage.clear(), all EPUB files will lose the items they previously set (cf. education → quizzes, etc.). Web Storage also has quotas, so having a lot of books using it may create other issues for authors.
This isn’t the only example though. My goal isn’t to make a super extensive list, obviously, but to point out how handling origin could help quite significantly.
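Until origins are handled per publication, an author can at least avoid wiping other books' data. The following is a minimal sketch of one defensive pattern, assuming each book prefixes its keys; the `book-a:` prefix, the `clearOwnItems` helper and the `MemoryStorage` stand-in are all hypothetical names introduced for illustration. In a content document, `window.localStorage` would be passed instead of the stand-in.

```javascript
// In-memory stand-in for the Web Storage interface so the sketch runs
// outside a browser. It mimics only the members used below.
class MemoryStorage {
  constructor() { this.items = new Map(); }
  get length() { return this.items.size; }
  key(i) { return Array.from(this.items.keys())[i] ?? null; }
  getItem(k) { return this.items.has(k) ? this.items.get(k) : null; }
  setItem(k, v) { this.items.set(k, String(v)); }
  removeItem(k) { this.items.delete(k); }
  clear() { this.items.clear(); }
}

// Remove only the keys that start with this publication's prefix,
// instead of calling storage.clear() on an origin shared with other books.
function clearOwnItems(storage, prefix) {
  const own = [];
  // Collect first: removing while iterating would shift the indices.
  for (let i = 0; i < storage.length; i++) {
    const k = storage.key(i);
    if (k !== null && k.startsWith(prefix)) own.push(k);
  }
  own.forEach((k) => storage.removeItem(k));
  return own.length;
}

// Stands in for one origin shared by every book in the app.
const shared = new MemoryStorage();
shared.setItem('book-a:quiz-1', 'done');
shared.setItem('book-b:bookmark', 'ch3');

const removed = clearOwnItems(shared, 'book-a:');
const otherBookIntact = shared.getItem('book-b:bookmark');
```

This only protects against well-behaved scripts stepping on each other. It does nothing against a script that calls `clear()` anyway, nor against the shared quota, which is why the origin itself is the real fix.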
By extension, origin could be unique but not persistent i.e. the Reading App assigns a random port @ launch, in which case you’ll have Web Storage spread across 127.0.0.1:3000 and 127.0.0.1:3001 for instance.
Another issue is when the Reading App doesn’t use a (local) server behind the scenes so the file:// scheme is used, and it comes with severe restrictions in some underlying rendering engines as the whole origin is consequently opaque (e.g. no Web Storage, restricted access to images, audio, video, iframes, etc.).
Speaking of iframes, we have issue #1061, and there’s once again little interop in RSs there, as they decide how to manage external links (e.g. opening the platform’s default browser, emptying the iframe’s content, etc.).
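The random-port and file:// cases above can be checked with the standard `URL` API. This is a small sketch under stated assumptions: the paths are hypothetical, and the serialization of a file:// origin varies between engines, so the code does not rely on a specific string for it.

```javascript
// Same book, served by a local server on a different port at each launch.
const firstLaunch = new URL('http://127.0.0.1:3000/OEBPS/chapter1.xhtml');
const secondLaunch = new URL('http://127.0.0.1:3001/OEBPS/chapter1.xhtml');

// The port is part of the origin, so these are two different origins and
// anything stored under the first is not visible under the second.
const storageSurvivesRestart = firstLaunch.origin === secondLaunch.origin;

// Same book opened straight from disk (hypothetical path).
const fromDisk = new URL('file:///books/book-a/OEBPS/chapter1.xhtml');

// file: URLs do not get an (scheme, host, port) origin like http(s) ones.
// Engines commonly serialize it as the string 'null', though some report
// something else, so only compare it against the http origins here.
const diskMatchesServer = fromDisk.origin === firstLaunch.origin;
```

Both comparisons come out false: a new port means a new origin, and a document loaded from disk never shares an origin with the locally served copy.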
And that can also apply to nodes of content the RS clones to achieve some UX – e.g. footnotes –, which are then injected into another document/a dialog element. Should something you clone from the EPUB file into a “native element” be considered opaque hence restricted to make the app more secure or not? (spoiler: yes, but I can’t give more details yet).
Those are a few examples I am familiar with but I’m pretty sure there are others, and when the app is a cloud reader or is using a Webview, the origin concept + its policies apply anyway. So it seems to me the most reasonable option would be to build on top of it, and adjust the model to EPUB whenever needed.
cc @dauwhe as he mentioned that issue in his now-famous thread on twitter.