-
Notifications
You must be signed in to change notification settings - Fork 854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial muslheap port #2186
base: dev
Are you sure you want to change the base?
Initial muslheap port #2186
Conversation
This was briefly discussed with regard to glibc blobs, but I don't think a consensus was reached. I defer to @disconnect3d since it's their train set. |
This looks promising 👌
I take it this code is what relies on #2185? As an aside, 1400+ lines in one PR is a lot for a reviewer, 1400+ lines in 2 commits is a pretty wild workflow! |
Also I believe the terms of the MIT license require the following line be added to pwndbg's |
Correct, I did mention that (and another required PR) at the beginning of the PR summary.
Thanks I'll take a look. I debated breaking this up into smaller PRs (also why I sent a bunch before this PR), and I'm open to it here if it's preferred. The main reasons I didn't is it makes it easier to test. I could break it up into one PR per command and maybe try to break out some parts of the backend. The real development was not just 2 commits of course, but also not clean enough of a workflow to port here to be reviewed in logical chunks. That said, I can also try to create a new series of smaller logical commits for this PR to ease the burden of review. |
0607b9d
to
e723064
Compare
I updated the license and added type annotations as I realized I missed those and there was a recent PR adding them everywhere else. |
Trust me, Dave Farley is your man 👍
I think since much of this is copy pasted from https://github.com/xf1les/muslheap there's no need to break it down further. I can take the time to review this as-is 👍 |
@gsingh93 how come these latest tests fail but Unit tests still gets a green tick? |
#2199 should fix the unit test issue |
@@ -2,6 +2,8 @@ The MIT License (MIT) | |||
|
|||
Copyright (c) 2023 Dominik 'Disconnect3d' Czarnota | |||
|
|||
Copyright (c) 2021 xf1les |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is the right way to do this, doesn't this give them copyright over the entire project? Is there a way we're supposed to specify exactly what the copyright is over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya wasn't entirely clear to me, I just went off https://law.stackexchange.com/questions/16847/how-should-multiple-copyright-holders-be-credited-using-the-mit-license which says if you derive from someones code and aren't working with them directly you use the way I did it.
Possibly we could do something like:
The musl heap analysis functionality is derived from muslheap:
Copyright (c) 2021 xf1les
...
which is what it suggests in https://opensource.stackexchange.com/questions/10506/how-to-sub-license-a-modified-mit-licensed-project-as-a-copyright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm not sure there's specific guidance on this wrt the MIT license, I've seen a few different ways of doing it.
e.g. Microsoft just concatenate every license into one file: https://github.com/Microsoft/vscode/blob/main/ThirdPartyNotices.txt
I've seen advice stating just add the names (and sometimes dates) to the copyright holder section.
Sometimes (as in the Microsoft example) there is more detail as to what exactly was copied.
For a deeper dive into the MIT license:
https://writing.kemitchell.com/2016/09/21/MIT-License-Line-by-Line.html
One solution to the licensing issue could be to include the MIT license with the original author's name in-line within the mallocng.py file. Since that file is the only one that contains code written by the original author (correct me if I'm wrong) I don't think anyone would quibble that it wasn't being licensed properly. |
This is my initial port of muslheap to work inside of pwndbg. All credit goes to xf1les, and I've put a note in CREDITS.md.
At the moment this PR branch won't be usable without applying some of the other pending PRs I've put up (namely #2185, #2157, and #2155), so it's mostly for initial review now. Once those get sorted, I'll rebase this and update so people know they can test it.
There are 2 commits in this PR. The first is the main commands being added, in a second is the tests:
New heap commands
For now I've added a new category for musl mallocng, and changed the old heap category to glibc heap. Atm the command file is called muslheap.py but actually since this is mallocng-specific, it may be worth renaming, as perhaps support for the old musl heap allocator will get added eventually.
The functionality with the original ported is the mostly same (except I excluded the mmagic command as it's not really heap related), but the relevant parts that could be switched to use pwndbg APIs I've changed where I could figure out what to do. I've made minor changes to the names (use slot instead of chunk). I did some minor reorganization and reduce some duplication, which will make future refactoring easier. I also added more robust symbol look up handling for the group structure, since the original never worked for me.
The goal here is to just get some baseline version in that lets pwndbg users start playing with it and then we start adding more versions, static musl support, more commands, etc in future PR
There are some FIXME I added for things I'm not sure if there's better ways to do it in pwndbg, or just for things I want to add. Other XXX, etc are likely from the original author.
Tests
I've added 1.2.4 dynamic musl test and functionality, to get the ball rolling. The ported library was originally designed for 1.2.2, so I suspect this will actually work well across most versions but I haven't tested.
For the tests I added aggressive asserts for almost all output to make sure nothing breaks initially while refactoring.
Test binary blobs
There's binary blobs that this includes in order to build the binary, namely the musl 1.2.4 shared library and .debug file, as well as .o files for crt linking.
The .o files I included because zig is broken compiling musl targets, and will always compile the static version, so instead of specifying a target I manually point to the object files. Its likely that some of these can actually be removed, as for instance were not building a PIE test, etc. Also perhaps there's a better way to do this, as I haven't used zig before, so would be happy to drop the .o's if possible.
I put in the readme how I extracted those files and from where, with the sha256 hashes. Before this gets merged someone else should show that this is consistent with what they get and there are seeing the same hashes.
One other thing which I'm not sure if you folks have discussed before is that if we hypothetically add a bunch of versions of musl libraries (1.2.1 through 1.2.5 currently), and eventually also the libc.a (which is about 10M), this will get quite large to store in the repo, so it might be worth discussing if eventually some of this stuff should be in a separate repo and cloned by the developer scripts or whatever.