-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Writing large object to file is slow #43
Comments
@borkdude I investigated this a bit and it seems that transit calls I think it makes sense if transit-java calls |
@strssndktn I worked around this by first writing to a |
@borkdude writing to BAOS first will require that you hold the whole output in memory which is undesireable. Instead you can try wrapping the output stream into your own OutputStream subclass that makes .flush a no-op. |
I’m hitting the same issue. This did the trick for me:
|
UPD: that actually should’ve been
Works like a charm |
@tonsky I just updated clojure-lsp to use your suggestion and improved memory usage indeed for huge maps, thanks for that! |
Thanks. I needed to add a couple more type hints to make it work without reflection: (defn no-flush-output-stream
"See https://github.com/cognitect/transit-clj/issues/43#issuecomment-1650341353"
^java.io.OutputStream [^java.io.OutputStream os]
(proxy [java.io.BufferedOutputStream] [os]
(flush [])
(close []
(let [^java.io.BufferedOutputStream this this]
(proxy-super flush)
(proxy-super close))))) |
Did the exact same change yesterday myself! Look at that, two senior Clojure devs got 10-line function right after just 3 attempts! |
On Fri, Jul 3, 2020 at 12:52 PM strssndktn ***@***.***> wrote:
@borkdude <https://github.com/borkdude> I investigated this a bit and it
seems that transit calls .flush overly aggressive, thus forcing the data
to be flushed out to disk on every element (I am not completely sure how
and when .flush gets called or what are the reasons for that)
The reason flush gets called so often is that the original use case or
transit was for a wire protocol. In that mode, we wanted to flush each
entity to the network when it was done being written. This is the same
reason that the underlying transit reader does not assume that a single
read gets you to the end of a stream. This could potentially be made an
option that gets passed in when you create a writer. If memory serves me,
it would have to flow down to the Java implementation. I am not advocating
for any change, just wanted to answer the question about why it is the way
it is.
I am glad you found a workaround that works for you.
Tim-
|
Well, the problem is not that transit flushes at the end of writing an entity. Problem is that it flushes multiple times in the middle of writing entity. I don’t think that could serve any purpose, it must’ve been an implementation bug |
It flushes for any complex entity it writes, including those that are
nested. So if you have an array of 100 maps, it will flush 101 times - once
for each map and once for the array. From a network streaming perspective,
I think that's preferable, but I can see the other perspective for sure.
…On Fri, Jul 28, 2023 at 11:29 AM Nikita Prokopov ***@***.***> wrote:
Well, the problem is not that transit flushes at the end of writing an
entity. Problem is that it flushes multiple times in the middle of writing
entity. I don’t think that could serve any purpose, it must’ve been an
implementation bug
—
Reply to this email directly, view it on GitHub
<#43 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAHXHPB3CVAX7ZJ3XN2CHDXSPLFNANCNFSM4HCMPDXQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Why is it preferable from network perspective? All these maps are wrapped in a list, and you can’t read part of the list on the other side, can you? |
You can start the process of reading on the other side. The Jackson parser
is stream oriented, so while the transit reader won't produce a result
until, say, an entire array has arrived, it can be reading and decoding the
bytes while the writer is still producing output.
…On Fri, Jul 28, 2023 at 12:57 PM Nikita Prokopov ***@***.***> wrote:
Why is it preferable from network perspective? All these maps are wrapped
in a list, and you can’t read part of the list on the other side, can you?
—
Reply to this email directly, view it on GitHub
<#43 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAHXHPMSFFXGSJQD2FN5ETXSPVQJANCNFSM4HCMPDXQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Sure, but that’s why you wrap your OutputStream in BufferedOutputStream. It’ll buffer enough for you and flush itself when buffer runs out |
Yeah we could have done that too. I can't honestly remember at this point
if we considered it or not. :)
…On Fri, Jul 28, 2023 at 1:33 PM Nikita Prokopov ***@***.***> wrote:
Sure, but that’s why you wrap your OutputStream in BufferedOutputStream.
It’ll buffer enough for you and flush itself when buffer runs out
—
Reply to this email directly, view it on GitHub
<#43 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAHXHMS7LHVCFFPFPL2QW3XSPZWVANCNFSM4HCMPDXQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Problem:
Writing a large object to a file with transit-clj was slower than I expected. It takes about 1 second with
transit-clj
and roughly 1/10th of that time with regularspit
.Test data:
test.edn.zip
Extract to test.edn.
Repro:
While writing the same EDN to a file (printing with
str
is much faster):Flamegraphs
Created with https://github.com/clojure-goes-fast/clj-async-profiler
Flamegraph of transit:
transit-flamegraph.svg.zip
Flamegraph of EDN to file:
edn-flamegraph.svg.zip
The text was updated successfully, but these errors were encountered: