Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transit reader produces Clojure keywords that cannot be round-tripped to EDN #32

Open
samroberton opened this issue Jul 19, 2016 · 2 comments

Comments

@samroberton
Copy link

Minimal reproduction:

(-> "[\"^ \",\"~:foo}\",\"bar\"]"
    (.getBytes "UTF-8")
    (java.io.ByteArrayInputStream.)
    (cognitect.transit/reader :json)
    cognitect.transit/read
    pr-str
    clojure.edn/read-string)

The above results in a RuntimeException from read-string ("Map literal must contain an even number of forms"), due to the } at the end of the :foo} in the transit string (which, to be clear, I've constructed by hand, rather than outputting from Transit). If you remove the read-string and replace it with keys first, you can see that Transit has not identified the keyword as invalid, and has produced a keyword :foo}. This issue is me asking for Transit instead to report that the input was malformed or is not representable as a valid Clojure data structure.

I understand that for performance reasons Clojure doesn't validate keywords when interning, and I'm not proposing to bikeshed that decision. However, for Transit, that approach has the unfortunate consequence that you have to re-validate data that Transit has read from the potentially-untrusted network before you can assume that its keywords are safe to use in everyday Clojure code.

Specifically, we have a Clojure web app with a ClojureScript front-end, and we use Transit to communicate between them. Sometimes we'll need to take some of the data received from the front-end and write it out somewhere else (to be read back in future, or to communicate it to a separate process, etc). The natural way to do that is via pr-str and clojure.edn/read. This issue introduces a new point of failure in our application -- reading the EDN back in might fail, even though it was written out via pr-str, because Transit might have given us a keyword which is not printable/readable in Clojure.

For additional context, this issue was discovered by an external pen-test, attempting an XSS attack (by manually replaying a modified request), so the answer for us can't just be "well then don't have your front end send data across the wire that you're not going to be capable of reading correctly". To give ourselves some level of confidence that our app isn't going to break when someone tries XSS, we're now having to write additional code to wrap around our calls to transit/read to ensure that the keywords it has read are valid keywords, etc. That seems like it would be better handled when reading the input in the first place.

@timewald
Copy link
Contributor

Sam,

You can address this issue by registering a read handler for keywords that
adds whatever additional checking you want prior to making and returning a
keyword.

While I understand the case you are making for adding this to Transit's
built-in keyword read handler, there are a lot of cases where either
invalid keyword values aren't possible (because you control the source of
the data too) or wouldn't cause a failure. In those cases, additional
checking would simply be overhead.

Tim-

On Mon, Jul 18, 2016 at 11:59 PM, Sam Roberton [email protected]
wrote:

Minimal reproduction:

(-> "["^ ","~:foo}","bar"]"
(.getBytes "UTF-8")
(java.io.ByteArrayInputStream.)
(cognitect.transit/reader :json)
cognitect.transit/read
pr-str
clojure.edn/read-string)

The above results in a RuntimeException from read-string ("Map literal
must contain an even number of forms"), due to the } at the end of the
:foo} in the transit string (which, to be clear, I've constructed by
hand, rather than outputting from Transit). If you remove the read-string
and replace it with keys first, you can see that Transit has not
identified the keyword as invalid, and has produced a keyword :foo}. This
issue is me asking for Transit instead to report that the input was
malformed or is not representable as a valid Clojure data structure.

I understand that for performance reasons Clojure doesn't validate
keywords when interning, and I'm not proposing to bikeshed that decision.
However, for Transit, that approach has the unfortunate consequence that
you have to re-validate data that Transit has read from the
potentially-untrusted network before you can assume that its keywords are
safe to use in everyday Clojure code.

Specifically, we have a Clojure web app with a ClojureScript front-end,
and we use Transit to communicate between them. Sometimes we'll need to
take some of the data received from the front-end and write it out
somewhere else (to be read back in future, or to communicate it to a
separate process, etc). The natural way to do that is via pr-str and
clojure.edn/read. This issue introduces a new point of failure in our
application -- reading the EDN back in might fail, even though it was
written out via pr-str, because Transit might have given us a keyword
which is not printable/readable in Clojure.

For additional context, this issue was discovered by an external pen-test,
attempting an XSS attack (by manually replaying a modified request), so the
answer for us can't just be "well then don't have your front end send data
across the wire that you're not going to be capable of reading correctly".
To give ourselves some level of confidence that our app isn't going to
break when someone tries XSS, we're now having to write additional code to
wrap around our calls to transit/read to ensure that the keywords it has
read are valid keywords, etc. That seems like it would be better handled
when reading the input in the first place.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#32, or mute the thread
https://github.com/notifications/unsubscribe-auth/AAB7nYvS6x5QdkI42wt5JZxid65UH3lwks5qXEuKgaJpZM4JPWrB
.

Datomic Team

@samroberton
Copy link
Author

Can I suggest then that at the very least this warrants calling out in the documentation?

Clojure's adopted a fairly strong position that "inputs aren't always validated, but if they're not valid, it's on you when stuff subsequently blows up". It certainly wasn't obvious to us as users of transit-clj that normal use of the library would put us on the wrong side of that line, and a warning to that effect would have been helpful.

(Better than that, from my perspective, would be an ability to set some sort of flag saying "I'm OK with the runtime overhead, please behave in a way that won't cause things to break later in unexpected ways", but I assume that that's not on the table.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants