Failure to decode Python 2.7 Code Object #1
Comments
I am so sorry for not noticing this until now, but if you're still interested...

Rather than adding a feature flag, I feel like it would make sense to just use the

Re: strings vs bytes: any idea what encoding the strings you are encountering are in? Latin-1? Or something else? If the encoding is predictable, would converting to UTF-8 AKA

FYI: I'm happy to support as wide a range of versions and use-cases as possible. However, I'm not really working on this much anymore myself, so feel free to submit a PR.
No worries, I know how it goes :)

For full context, I started using this crate because I didn't want to write my project in Python. I also was not aware of how different the Python 2.7 and Python 3 marshal formats are. As I've worked on the project since creating this issue, some other changes have been made. You can view them here: master...landaire:master

Note that the change looks a lot larger than it really is, since somewhere along the way I had the energy to add support for serializing data, refactored some things, then quickly lost that energy. Here's a rough list of major changes:
This makes sense.
They should be UTF-8, but I think the problem is that in Python 2.7 strings aren't guaranteed to be UTF-8 -- I think they're just arbitrary bytes but I could be wrong. As I alluded to above, I ended up just using the
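On the Latin-1 question: if the bytes were known to be Latin-1, converting them to UTF-8 would be lossless, because each Latin-1 byte maps directly to the Unicode code point with the same value. A minimal sketch of that conversion, assuming Latin-1 input (which this thread does not confirm); `latin1_to_string` and `string_to_latin1` are hypothetical helper names, not part of the crate:

```rust
/// Decodes Latin-1 bytes into a Rust `String` (UTF-8).
/// Every byte 0x00..=0xFF is the code point of the same value,
/// so this conversion cannot fail.
/// Hypothetical helper, not part of the crate's API.
pub fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Encodes a string back to Latin-1.
/// Returns `None` if any character is outside U+0000..=U+00FF.
/// Hypothetical helper, not part of the crate's API.
pub fn string_to_latin1(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let code = c as u32;
            if code <= 0xFF {
                Some(code as u8)
            } else {
                None
            }
        })
        .collect()
}
```

Because the mapping is one-to-one, the original bytes can always be recovered with `string_to_latin1`, which would matter if the data ever needs to be serialized again.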
Totally understand. My work on the project has been sort of off-and-on, so when I get a chance I'll package things up and submit a PR. Hopefully you don't feel any pressure -- I've been moving along just fine with a git source specified for the crate.
Things I'm fine with:
Things I'm not fine with:
AFAICT from my testing:
To summarize: I'm happy to support Python 2, but without removing support for Python 3. I'm fine with making breaking changes if needed, though.
First off, thanks for doing the hard work for this crate. I have a project where I'd like to just read data from a marshalled Python object and this solution fits the bill.
The code object I'm trying to deserialize is from Python 2.7, which appears to have a slightly different layout for `TYPE_CODE`. If you try to deserialize this data currently, the code object will advance the stream beyond where it should be and you'll start deserializing incorrect data.

I wrote a patch to add basic support for Python 2.7:
There's a lot going on here, but the basic change is this:
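For illustration only (a hypothetical sketch, not the patch referred to above): in CPython's marshal format, a Python 2.7 code object starts with four 32-bit little-endian integers (`argcount`, `nlocals`, `stacksize`, `flags`), while Python 3.0 through 3.7 writes an extra `kwonlyargcount` right after `argcount`. Reading a 2.7 object with the Python 3 layout therefore consumes four bytes too many. The names `MarshalVersion`, `CodeHeader`, and `read_code_header` below are assumptions made for this example and are not the crate's API:

```rust
/// Hypothetical version switch; not part of the crate's actual API.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MarshalVersion {
    Python27,
    /// Models the Python 3.0 to 3.7 layout (3.8 added another field).
    Python3,
}

/// The integer fields at the start of a marshalled code object.
#[derive(Debug, PartialEq)]
pub struct CodeHeader {
    pub argcount: u32,
    /// Absent from the Python 2.7 layout.
    pub kwonlyargcount: Option<u32>,
    pub nlocals: u32,
    pub stacksize: u32,
    pub flags: u32,
}

/// Reads one little-endian 32-bit integer and advances the slice.
fn read_u32(input: &mut &[u8]) -> Option<u32> {
    let data: &[u8] = *input;
    if data.len() < 4 {
        return None;
    }
    let (head, rest) = data.split_at(4);
    *input = rest;
    Some(u32::from_le_bytes([head[0], head[1], head[2], head[3]]))
}

/// Reads the header fields, skipping `kwonlyargcount` for Python 2.7
/// so the stream ends up positioned at the `code` object that follows.
pub fn read_code_header(input: &mut &[u8], version: MarshalVersion) -> Option<CodeHeader> {
    let argcount = read_u32(input)?;
    let kwonlyargcount = match version {
        MarshalVersion::Python27 => None,
        MarshalVersion::Python3 => Some(read_u32(input)?),
    };
    let nlocals = read_u32(input)?;
    let stacksize = read_u32(input)?;
    let flags = read_u32(input)?;
    Some(CodeHeader {
        argcount,
        kwonlyargcount,
        nlocals,
        stacksize,
        flags,
    })
}
```

Passing the version explicitly, rather than using a compile-time feature flag, would let one build read both formats, which matches the preference for keeping Python 3 support alongside Python 2.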
This on its own is not enough, however. At least for the object I'm trying to deserialize, things break again once the `filename` field is deserialized. `r_object` is invoked again to read the filename string, and then you hit this block:

Note the change to `r_object_extract_string`: `Type::String` currently returns an `Obj::Bytes` but `extract_string` expects an `Obj::String`. I'm not sure what the right approach is here. I ended up just changing all of the strings to be `Vec<u8>`s since, at least in my case, the strings may contain wildly invalid UTF-8.

I wanted to ask: are you interested in supporting Python 2.7 deserialization? I wouldn't mind cleaning up this patch a bit and adding tests. I would need to study the Python 2.7 marshal code a bit more, though, to understand what the right approach is for handling these strings correctly.
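One possible middle ground between "everything is a `String`" and "everything is a `Vec<u8>`" is to attempt UTF-8 decoding and keep the raw bytes when it fails. This is only a sketch of that idea; `PyStr` and `decode_py2_str` are hypothetical names, not types from the crate:

```rust
/// A Python 2 `str` payload: valid UTF-8 text when possible,
/// otherwise the untouched raw bytes.
/// Hypothetical type, not part of the crate's API.
#[derive(Debug, PartialEq)]
pub enum PyStr {
    Text(String),
    Raw(Vec<u8>),
}

/// Tries to interpret the bytes as UTF-8 without copying.
/// On failure, the original buffer is recovered from the error,
/// so no data is lost.
pub fn decode_py2_str(bytes: Vec<u8>) -> PyStr {
    match String::from_utf8(bytes) {
        Ok(text) => PyStr::Text(text),
        Err(err) => PyStr::Raw(err.into_bytes()),
    }
}
```

With this shape, `decode_py2_str(b"spam.py".to_vec())` yields `PyStr::Text`, while a byte sequence such as `[0xff, 0xfe]` comes back as `PyStr::Raw` with its bytes intact.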