Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial muslheap port #2186

Draft
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

fidgetingbits
Copy link
Contributor

@fidgetingbits fidgetingbits commented May 24, 2024

This is my initial port of muslheap to work inside of pwndbg. All credit goes to xf1les, and I've put a note in CREDITS.md.

At the moment this PR branch won't be usable without applying some of the other pending PRs I've put up (namely #2185, #2157, and #2155), so it's mostly for initial review now. Once those get sorted, I'll rebase this and update so people know they can test it.

There are 2 commits in this PR. The first is the main commands being added, in a second is the tests:

New heap commands

For now I've added a new category for musl mallocng, and changed the old heap category to glibc heap. Atm the command file is called muslheap.py but actually since this is mallocng-specific, it may be worth renaming, as perhaps support for the old musl heap allocator will get added eventually.

The functionality with the original ported is the mostly same (except I excluded the mmagic command as it's not really heap related), but the relevant parts that could be switched to use pwndbg APIs I've changed where I could figure out what to do. I've made minor changes to the names (use slot instead of chunk). I did some minor reorganization and reduce some duplication, which will make future refactoring easier. I also added more robust symbol look up handling for the group structure, since the original never worked for me.

The goal here is to just get some baseline version in that lets pwndbg users start playing with it and then we start adding more versions, static musl support, more commands, etc in future PR

There are some FIXME I added for things I'm not sure if there's better ways to do it in pwndbg, or just for things I want to add. Other XXX, etc are likely from the original author.

Tests

I've added 1.2.4 dynamic musl test and functionality, to get the ball rolling. The ported library was originally designed for 1.2.2, so I suspect this will actually work well across most versions but I haven't tested.

For the tests I added aggressive asserts for almost all output to make sure nothing breaks initially while refactoring.

Test binary blobs

There's binary blobs that this includes in order to build the binary, namely the musl 1.2.4 shared library and .debug file, as well as .o files for crt linking.

The .o files I included because zig is broken compiling musl targets, and will always compile the static version, so instead of specifying a target I manually point to the object files. Its likely that some of these can actually be removed, as for instance were not building a PIE test, etc. Also perhaps there's a better way to do this, as I haven't used zig before, so would be happy to drop the .o's if possible.

I put in the readme how I extracted those files and from where, with the sha256 hashes. Before this gets merged someone else should show that this is consistent with what they get and there are seeing the same hashes.

One other thing which I'm not sure if you folks have discussed before is that if we hypothetically add a bunch of versions of musl libraries (1.2.1 through 1.2.5 currently), and eventually also the libc.a (which is about 10M), this will get quite large to store in the repo, so it might be worth discussing if eventually some of this stuff should be in a separate repo and cloned by the developer scripts or whatever.

@fidgetingbits fidgetingbits marked this pull request as draft May 24, 2024 09:13
@CptGibbon CptGibbon self-requested a review May 24, 2024 16:14
@CptGibbon
Copy link
Collaborator

One other thing which I'm not sure if you folks have discussed before is that if we hypothetically add a bunch of versions of musl libraries (1.2.1 through 1.2.5 currently), and eventually also the libc.a (which is about 10M), this will get quite large to store in the repo, so it might be worth discussing if eventually some of this stuff should be in a separate repo and cloned by the developer scripts or whatever.

This was briefly discussed with regard to glibc blobs, but I don't think a consensus was reached.
I believe one option was to (as you suggest) host a separate repo under the pwndbg organization that held binary blobs used for development and only clone them into development environments, perhaps using e.g. setup-dev.sh

I defer to @disconnect3d since it's their train set.

@CptGibbon
Copy link
Collaborator

CptGibbon commented May 28, 2024

This looks promising 👌
Before I clone this PR, it looks like the tests are failing due to:

mheapinfo_output = gdb.execute("mheapinfo", to_string=True)
E       gdb.error: Error occurred in Python: module 'pwndbg.gdblib.symbol' has no attribute 'value'

I take it this code is what relies on #2185?

As an aside, 1400+ lines in one PR is a lot for a reviewer, 1400+ lines in 2 commits is a pretty wild workflow!
Making smaller changes can benefit both you and reviewers, it could be worth your time to check out anything Dave Farley has to say on continuous integration or perhaps a recent edition of Martin Fowler's book Refactoring.

@CptGibbon
Copy link
Collaborator

Also I believe the terms of the MIT license require the following line be added to pwndbg's LICENSE.md
"Copyright (c) 2021 xf1les"

@fidgetingbits
Copy link
Contributor Author

fidgetingbits commented May 29, 2024

This looks promising 👌 Before I clone this PR, it looks like the tests are failing due to:

mheapinfo_output = gdb.execute("mheapinfo", to_string=True)
E       gdb.error: Error occurred in Python: module 'pwndbg.gdblib.symbol' has no attribute 'value'

I take it this code is what relies on #2185?

Correct, I did mention that (and another required PR) at the beginning of the PR summary.

As an aside, 1400+ lines in one PR is a lot for a reviewer, 1400+ lines in 2 commits is a pretty wild workflow! Making smaller changes can benefit both you and reviewers, it could be worth your time to check out anything Dave Farley has to say on continuous integration or perhaps a recent edition of Martin Fowler's book Refactoring.

Thanks I'll take a look. I debated breaking this up into smaller PRs (also why I sent a bunch before this PR), and I'm open to it here if it's preferred. The main reasons I didn't is it makes it easier to test. I could break it up into one PR per command and maybe try to break out some parts of the backend.

The real development was not just 2 commits of course, but also not clean enough of a workflow to port here to be reviewed in logical chunks. That said, I can also try to create a new series of smaller logical commits for this PR to ease the burden of review.

@fidgetingbits
Copy link
Contributor Author

I updated the license and added type annotations as I realized I missed those and there was a recent PR adding them everywhere else.

@CptGibbon
Copy link
Collaborator

The main reasons I didn't is it makes it easier to test.

Trust me, Dave Farley is your man 👍

I could break it up into one PR per command and maybe try to break out some parts of the backend.

I think since much of this is copy pasted from https://github.com/xf1les/muslheap there's no need to break it down further. I can take the time to review this as-is 👍

@CptGibbon
Copy link
Collaborator

@gsingh93 how come these latest tests fail but Unit tests still gets a green tick?

@gsingh93
Copy link
Member

#2199 should fix the unit test issue

@@ -2,6 +2,8 @@ The MIT License (MIT)

Copyright (c) 2023 Dominik 'Disconnect3d' Czarnota

Copyright (c) 2021 xf1les
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is the right way to do this, doesn't this give them copyright over the entire project? Is there a way we're supposed to specify exactly what the copyright is over?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya wasn't entirely clear to me, I just went off https://law.stackexchange.com/questions/16847/how-should-multiple-copyright-holders-be-credited-using-the-mit-license which says if you derive from someones code and aren't working with them directly you use the way I did it.

Possibly we could do something like:

The musl heap analysis functionality is derived from muslheap:

Copyright (c) 2021 xf1les

...

which is what it suggests in https://opensource.stackexchange.com/questions/10506/how-to-sub-license-a-modified-mit-licensed-project-as-a-copyright.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm not sure there's specific guidance on this wrt the MIT license, I've seen a few different ways of doing it.

e.g. Microsoft just concatenate every license into one file: https://github.com/Microsoft/vscode/blob/main/ThirdPartyNotices.txt

I've seen advice stating just add the names (and sometimes dates) to the copyright holder section.
Sometimes (as in the Microsoft example) there is more detail as to what exactly was copied.

For a deeper dive into the MIT license:
https://writing.kemitchell.com/2016/09/21/MIT-License-Line-by-Line.html

@CptGibbon
Copy link
Collaborator

One solution to the licensing issue could be to include the MIT license with the original author's name in-line within the mallocng.py file.

Since that file is the only one that contains code written by the original author (correct me if I'm wrong) I don't think anyone would quibble that it wasn't being licensed properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants