-
-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove "final" modifier from CBORParser
#129
Comments
No, Out of curiosity, do you have specific use for sub-classing? |
CBORParser
I remove |
Hi cowtowncoder, Thanks for quickly fixing it! In our case, we would add a method to make CBORParser read from the specified offset of InputStream, the underling InputStream in our project support to read from a specified offset. I tried add below method to support this. Does it make sense to you? public class CBORParserEx extends CBORParser { |
HiTatu,
Thanks for quickly fixing it! May I know when will the 2.9.5 be released?
In our case, we would add a method to make CBORParser read from the
specified offset of InputStream, the underling InputStream in our
project support to read from a specified offset. I tried add below
method to support this. Does it make sense to you?
public class CBORParserEx extends CBORParser {
public void setOffset(int offset) {
final ByteInputStream in = getInputStream();
in.setOffset(offset);
_currInputProcessed = offset;
_inputPtr = _inputEnd = 0;
}
}
Regards,
Jin
…On 3/2/18 12:36 PM, Tatu Saloranta wrote:
I remove |final| modifier from |CBORParser| -- I can not recall
explicit reason to add it, and no other parser or generator
implementation is left final (at an earlier point some were).
This is for |2.9| branch and |master|, so will be included in |2.9.5|
release once that is made (and eventually 3.0).
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_FasterXML_jackson-2Ddataformats-2Dbinary_issues_129-23issuecomment-2D369817833&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=_2rUppWNPkdxJ73wkPkJySUCWTQ2RzoEFd5Z1LP_Z24&s=R0K7evEltPIgukHTgvRanLo-FpgHjUDfws-Tq2ZGHCg&e=>,
or mute the thread
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AHahsR7DC1XHvoHItBogGmPVaYJXvcIYks5taMxLgaJpZM4SZWDC&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=_2rUppWNPkdxJ73wkPkJySUCWTQ2RzoEFd5Z1LP_Z24&s=qJHOyMYTymY333y1pJncEIH-jrdGDN1bAUDN-xp0Zjc&e=>.
|
On modification: I would personally modify it by custom As to |
Hi Tatu,
Thanks for your quick response!
In our case, there is a custom InputStream which supports to set read
offset at any position, but it seems that the CBORParser has its own
internal InputBuffer that caches a batch of data read from InputStream,
so adjusting the offset of InputStream will not immediately impact on
CBORParser, it will read from its InputBuffer until no more data and
need to load more from InputStream.
Is it possible to avoid using InputBuffer in CBORParser?
Regards,
Jin
…On 3/2/18 11:45 PM, Tatu Saloranta wrote:
On modification: I would personally modify it by custom |InputStream|
instead, and not try to do that on parser. Or, if you prefer,
sub-classing |CBORFactory| and handling it here.
The only thing that would not achieve would be |_currInputProcessed|,
and if that is important, the only way would be sub-classing.
As to |2.9.5|, likely some time in March, maybe in 2 weeks or so.
Usually there's at least 2 months between patch releases at this point.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_FasterXML_jackson-2Ddataformats-2Dbinary_issues_129-23issuecomment-2D369957817&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=vmCRLNuXjGHgq0-DUE4UH5s_WF4sQkqYRxdAv4QJPRI&s=9yT4Dk-Bme9uMAPwV86mB0dRO3MmZPK-mmKesPjqYMs&e=>,
or mute the thread
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AHahsVHSuSD8PFB4SZETJBcRGapthjb0ks5taWkLgaJpZM4SZWDC&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=vmCRLNuXjGHgq0-DUE4UH5s_WF4sQkqYRxdAv4QJPRI&s=ZfQeYRl4xGh1FJMzBpWBlbK8O7PZHsuwFjVFX7aMrN4&e=>.
|
Use of input buffer is pretty essential for reasonable performance -- overhead of single byte reads would be significant. So design does not allow skipping of buffering. But I guess I am not sure which aspect of buffering would be problematic for you: location, or something else? |
Hi Tatu,
On 3/5/18 12:16 PM, Tatu Saloranta wrote:
Use of input buffer is pretty essential for reasonable performance --
overhead of single byte reads would be significant. So design does not
allow skipping of buffering.
It is possible to "flush" content buffered but not (yet) read at end
of decoding (using method |releaseBuffered()|on parser).
But I guess I am not sure which aspect of buffering would be
problematic for you: location, or something else?
Yes. it is location. In our case, the capability to read value in a
customized order is required. So we just go through the InputStream to
collects the offsets for each value, then read values in a customized
order by setting the offset of InputStream.
Based on my performance testing, using Jackson CBOR for above case is
12% slower than using serialization module implemented by ourselves.
I feel like the way we use Jackson CBOR to read value on specified
offset is not suitable for the current Jackson CBOR implementation that
uses a internal buffer to cache data. In our case, the process is like
set offset1 -> parse value1 -> set offset2 -> parse value2 -> ....
When setting offset, actually it resets the InputBufer(see below
setOffset() mehtod). During parse value, it will load 8K data to
InputBuffer, but parse 1st value only, then reset again, load another 8K
data and parse 2nd value, this process might not be that efficient I
think. What do you think?
public class CBORParserEx extends CBORParser {
public void setOffset(int offset) {
final ByteInputStream in = getInputStream();
in.setOffset(offset);
_currInputProcessed = offset;
_inputPtr = _inputEnd = 0;
}
}
Regards,
Jin
…
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_FasterXML_jackson-2Ddataformats-2Dbinary_issues_129-23issuecomment-2D370305528&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=Ju-E8nByOjLWwY9HroVs_pYLQL-pz-ZRbry44K8Ff3Q&s=iSpwPX5urwNp7mtVCNvBUewh-F6VZWzvT8wLpgt8gEI&e=>,
or mute the thread
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AHahsY4Ht9wG7DnGS7cQKyNAH2ERv0O3ks5tbLwNgaJpZM4SZWDC&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=Ju-E8nByOjLWwY9HroVs_pYLQL-pz-ZRbry44K8Ff3Q&s=nPE2Kfn_rGm2ywmwU9iahU1xDhu8WYAMtmPClBkrh9U&e=>.
|
Hi Tatu,
Thanks for your response as always!
On 3/5/18 12:16 PM, Tatu Saloranta wrote:
Use of input buffer is pretty essential for reasonable performance --
overhead of single byte reads would be significant. So design does not
allow skipping of buffering.
It is possible to "flush" content buffered but not (yet) read at end
of decoding (using method |releaseBuffered()|on parser).
But I guess I am not sure which aspect of buffering would be
problematic for you: location, or something else?
Yes. it is location. In our case, the capability to read value in a
customized order is required. So we just go through the InputStream to
collects the offsets for each value, then read values in a customized
order by setting the offset of InputStream.
Based on my performance testing, using Jackson CBOR for above case is
12% slower than using serialization module implemented by ourselves.
I feel like the way we use Jackson CBOR to read value on specified
offset is not suitable for the current Jackson CBOR implementation that
uses a internal buffer to cache data. In our case, the process is like
set offset1 -> parse value1 -> set offset2 -> parse value2 -> ....
When setting offset, actually it resets the InputBufer(see below
setOffset() mehtod). During parse value, it will load 8K data to
InputBuffer, but parse 1st value only, then reset again, load another 8K
data and parse 2nd value, this process might not be that efficient I
think. What do you think?
public class CBORParserEx extends CBORParser {
public void setOffset(int offset) {
final ByteInputStream in = getInputStream();
in.setOffset(offset);
_currInputProcessed = offset;
_inputPtr = _inputEnd = 0;
}
}
Regards,
Jin
…
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_FasterXML_jackson-2Ddataformats-2Dbinary_issues_129-23issuecomment-2D370305528&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=Ju-E8nByOjLWwY9HroVs_pYLQL-pz-ZRbry44K8Ff3Q&s=iSpwPX5urwNp7mtVCNvBUewh-F6VZWzvT8wLpgt8gEI&e=>,
or mute the thread
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AHahsY4Ht9wG7DnGS7cQKyNAH2ERv0O3ks5tbLwNgaJpZM4SZWDC&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=Ju-E8nByOjLWwY9HroVs_pYLQL-pz-ZRbry44K8Ff3Q&s=nPE2Kfn_rGm2ywmwU9iahU1xDhu8WYAMtmPClBkrh9U&e=>.
|
In general intent is not to modify underlying But your use case where random-access is required is quite different from kinds of use cases I have considered. I can see how parser, as is, would have significant overhead. If you do know the length of document (or document fragment) to parse you can prevent |
Hi Tatu,
Thanks for the analysis!
In our cases, the length of documents might be variable I think, it
looks like our random-access case is not good case using the existing
Jackson CBOR.
Regards,
Jin
…On 3/7/18 12:39 PM, Tatu Saloranta wrote:
In general intent is not to modify underlying |InputStream|, but to
construct separate |CBORParser| instances for distinct input
documents. This is safer and overhead for parser instances should be
modest.
But your use case where random-access is required is quite different
from kinds of use cases I have considered. I can see how parser, as
is, would have significant overhead.
If you do know the length of document (or document fragment) to parse
you can prevent |InputStream| from exposing more content than is
needed: so, for example, if document is known to be 400 bytes long,
only return up 400 bytes. That would avoid significant overhead of
reading 8000 bytes, discarding 7600.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_FasterXML_jackson-2Ddataformats-2Dbinary_issues_129-23issuecomment-2D371020541&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=vy9JlEKneMWD-9v0G0FoJ685qBDtl1u-z1Pj-_I0CU4&s=2UBfMbMf7L-hg7YQFMEs--47yvoGmG4UTo43GbY2d0o&e=>,
or mute the thread
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AHahsUiBVFN0mBOLWXlAYO8HMHO78TBVks5tb2R8gaJpZM4SZWDC&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=vy9JlEKneMWD-9v0G0FoJ685qBDtl1u-z1Pj-_I0CU4&s=V7RFZ6dHmn8o7DgmJ4pwkvqRmA2VVUggLpQ7MuTLKTw&e=>.
|
@jinzha Yes, it sounds like this is the case unfortunately. Optimizing for such access is tricky -- although something like this is actually implemented in Jackson JSON parser for Similarly, async parsers (for JSON, Smile; perhaps in future for CBOR) use different access pattern as input is fed, not read via blocking stream. So it would be possible to implement no-caching parser; and make that pluggable via For now your best bet may be to something else that exists, esp. if you have your own decoder. |
Hi Tatu,
Thanks you very much!
It is very nice experience to use Jackson CBOR library. I think you are
right, we probably have to keep to use own decoder for now.
Regards
Jin
…On 3/8/18 2:33 AM, Tatu Saloranta wrote:
@jinzha
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_jinzha&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=8HdcAxPS-dklEv4qsFn2_tCsPJsDq8zqc_Du0Ptdf_k&s=Z4FJXJN0_zo4WhdUImK8tBa9SUHhr49ov0LR-6Db1mg&e=>
Yes, it sounds like this is the case unfortunately. Optimizing for
such access is tricky -- although something like this is actually
implemented in Jackson JSON parser for |DataInput|. It results in -40%
performance decrease (which may be more than what would happen with
|InputStream|; perhaps |DataInput| access pattern is more difficult
for JVM to optimize), but does work with no look-ahead/read-ahead
(since |DataInput| does not have means to detect end of input; reader
must know how much is still needed).
Similarly, async parsers (for JSON, Smile; perhaps in future for CBOR)
use different access pattern as input is fed, not read via blocking
stream.
So it would be possible to implement no-caching parser; and make that
pluggable via |CBORFactory|. I don't have time to do that myself, but
if anyone wants to try that in future, I can help with integration.
For now your best bet may be to something else that exists, esp. if
you have your own decoder.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_FasterXML_jackson-2Ddataformats-2Dbinary_issues_129-23issuecomment-2D371237813&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=8HdcAxPS-dklEv4qsFn2_tCsPJsDq8zqc_Du0Ptdf_k&s=gFleEk-Z7PjcThg1kD38du7vcdaQVRq7oKif32br5Mo&e=>,
or mute the thread
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AHahsdw4UzXEmQ7IGx6mzLYsx4-2D0ofNzks5tcCgNgaJpZM4SZWDC&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=wb6jsjsd3i3-fQugULtSmsNMCoqysPMHv94nEiI-Jv8&m=8HdcAxPS-dklEv4qsFn2_tCsPJsDq8zqc_Du0Ptdf_k&s=sammE3GII5sWdKcHQY8ejLBbGHPk3td8dE7BAcznWHY&e=>.
|
Hi CBOR developers,
I would use the CBOR lib in our project, but want to extend it to add some additional functionality. It looks like CBORParser defines a bunch of protected members which implies it could be extended, but it is "final" class, so can not be extended.
Is it intended to not be extended? If not, can you please fix this by removing the final modifier?
BTW, the CBORGenerator is not final class.
The text was updated successfully, but these errors were encountered: