Support parsing existing jemalloc profile /proc/id/maps section #7

Open
Rjected opened this issue Apr 8, 2024 · 4 comments
@Rjected

Rjected commented Apr 8, 2024

Right now parse_jeheap uses MAPPINGS to populate mapping information:

    if let Some(mappings) = MAPPINGS.as_ref() {
        for mapping in mappings {
            profile.push_mapping(mapping.clone());
        }
    }

When using parse_jeheap on an existing file, this still collects mappings for the running process:

    /// Mappings of the processes' executable and shared libraries.
    #[cfg(target_os = "linux")]
    pub static MAPPINGS: Lazy<Option<Vec<Mapping>>> = Lazy::new(|| {
        /// Build a list of mappings for the passed shared objects.
        fn build_mappings(objects: &[SharedObject]) -> Vec<Mapping> {
            let mut mappings = Vec::new();
            for object in objects {
                for segment in &object.loaded_segments {
                    // I have observed that `memory_offset` can be negative on some very old
                    // versions of Linux (e.g. CentOS 7), so use wrapping add here.
                    let memory_start = object.base_address.wrapping_add(segment.memory_offset);
                    mappings.push(Mapping {
                        memory_start,
                        memory_end: memory_start + segment.memory_size,
                        memory_offset: segment.memory_offset,
                        file_offset: segment.file_offset,
                        pathname: object.path_name.clone(),
                        build_id: object.build_id.clone(),
                    });
                }
            }
            mappings
        }

        // SAFETY: We are on Linux, and this is the only place in the program this
        // function is called.
        match unsafe { crate::linux::collect_shared_objects() } {
            Ok(objects) => Some(build_mappings(&objects)),
            Err(err) => {
                error!("build ID fetching failed: {err}");
                None
            }
        }
    });

    #[cfg(not(target_os = "linux"))]
    pub static MAPPINGS: Lazy<Option<Vec<Mapping>>> = Lazy::new(|| {
        error!("build ID fetching is only supported on Linux");
        None
    });

    /// Information about a shared object loaded into the current process.
    #[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
    pub struct SharedObject {
        /// The address at which the object is loaded.
        pub base_address: usize,
        /// The path of that file the object was loaded from.
        pub path_name: PathBuf,
        /// The build ID of the object, if found.
        pub build_id: Option<BuildId>,
        /// Loaded segments of the object.
        pub loaded_segments: Vec<LoadedSegment>,
    }

    /// Build ID of a shared object.
    #[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
    pub struct BuildId(Vec<u8>);

    impl fmt::Display for BuildId {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            for byte in &self.0 {
                write!(f, "{byte:02x}")?;
            }
            Ok(())
        }
    }

    /// A segment of a shared object that's loaded into memory.
    #[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
    pub struct LoadedSegment {
        /// Offset of the segment in the source file.
        pub file_offset: u64,
        /// Offset to the `SharedObject`'s `base_address`.
        pub memory_offset: usize,
        /// Size of the segment in memory.
        pub memory_size: usize,
    }

    /// Collects information about all shared objects loaded into the current
    /// process, including the main program binary as well as all dynamically loaded
    /// libraries. Intended to be useful for profilers, who can use this information
    /// to symbolize stack traces offline.
    ///
    /// Uses `dl_iterate_phdr` to walk all shared objects and extract the wanted
    /// information from their program headers.
    ///
    /// SAFETY: This function is written in a hilariously unsafe way: it involves
    /// following pointers to random parts of memory, and then assuming that
    /// particular structures can be found there. However, it was written by
    /// carefully reading `man dl_iterate_phdr` and `man elf`, and is thus intended
    /// to be relatively safe for callers to use. Assuming I haven't written any
    /// bugs (and that the documentation is correct), the only known safety
    /// requirements are:
    ///
    /// (1) It must not be called multiple times concurrently, as `dl_iterate_phdr`
    ///     is not documented as being thread-safe.
    /// (2) The running binary must be in ELF format and running on Linux.
    pub unsafe fn collect_shared_objects() -> Result<Vec<SharedObject>, anyhow::Error> {
        let mut state = CallbackState {
            result: Ok(Vec::new()),
        };
        let state_ptr = std::ptr::addr_of_mut!(state).cast();

        // SAFETY: `dl_iterate_phdr` has no documented restrictions on when
        // it can be called.
        unsafe { dl_iterate_phdr(Some(iterate_cb), state_ptr) };

        state.result
    }

When profiling is enabled and disabled from within the running process, this makes sense, because the heap file describes that same process.
However, for an existing jemalloc profile, this is counter-intuitive, because the returned mappings are not actually generated from the heap file.

For example, if this method is run on an existing heap file, the returned mappings describe whichever process is doing the parsing, so they can differ from run to run and do not match what exists in the file.

It would be great to support parsing the /proc/id/maps output from the .heap file.
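
To make the request concrete, here is a minimal sketch of pulling that section out of a heap file's text and parsing its lines. It assumes the maps dump follows a `MAPPED_LIBRARIES:` marker line in the .heap file and that each line uses the usual `start-end perms offset dev inode [pathname]` layout; `MapsEntry`, `maps_section`, `parse_maps_line`, and `parse_maps` are hypothetical names for illustration, not part of the library.

```rust
/// One line of a `/proc/<pid>/maps`-style listing.
/// Hypothetical type for this sketch, not the library's `Mapping`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MapsEntry {
    pub memory_start: u64,
    pub memory_end: u64,
    pub perms: String,
    pub file_offset: u64,
    pub pathname: Option<String>,
}

/// Marker assumed to precede the maps dump inside a jemalloc `.heap` file.
const MAPPED_LIBRARIES: &str = "MAPPED_LIBRARIES:";

/// Returns the text following the marker, or `None` if the marker is absent.
pub fn maps_section(heap_file: &str) -> Option<&str> {
    let idx = heap_file.find(MAPPED_LIBRARIES)?;
    let rest = &heap_file[idx + MAPPED_LIBRARIES.len()..];
    Some(rest.trim_start_matches(|c| c == '\r' || c == '\n'))
}

/// Parses `start-end perms offset dev inode [pathname]`.
/// Returns `None` for lines that do not fit that shape.
pub fn parse_maps_line(line: &str) -> Option<MapsEntry> {
    let mut fields = line.split_whitespace();
    let (start, end) = fields.next()?.split_once('-')?;
    let perms = fields.next()?.to_string();
    let file_offset = u64::from_str_radix(fields.next()?, 16).ok()?;
    let _dev = fields.next()?;
    let _inode = fields.next()?;
    // Whatever remains is the pathname. Rejoining with single spaces means
    // runs of whitespace inside a path are collapsed in this sketch.
    let rest: Vec<&str> = fields.collect();
    let pathname = if rest.is_empty() {
        None
    } else {
        Some(rest.join(" "))
    };
    Some(MapsEntry {
        memory_start: u64::from_str_radix(start, 16).ok()?,
        memory_end: u64::from_str_radix(end, 16).ok()?,
        perms,
        file_offset,
        pathname,
    })
}

/// Parses every well-formed maps line found after the marker.
/// Malformed lines are skipped rather than reported.
pub fn parse_maps(heap_file: &str) -> Vec<MapsEntry> {
    match maps_section(heap_file) {
        Some(section) => section.lines().filter_map(parse_maps_line).collect(),
        None => Vec::new(),
    }
}
```

Entries parsed this way would be the same for every run over the same file, unlike mappings collected from the running process. They carry no build ID, which would still have to come from the binaries themselves.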

@brancz
Member

brancz commented Apr 9, 2024

Can you explain the use case a bit? Are you looking to extract the profile from outside of the process? What would you expect the API to look like?

cc @umanwizard

umanwizard self-assigned this Apr 9, 2024
@umanwizard
Collaborator

If I understand correctly, @Rjected wants to use the library to parse unrelated jeheap files from other processes and transform them to pprof, which requires a way to provide the mappings. I think we can support this; I'll look into it later this week (Friday).
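
One possible shape for "a way to provide the mappings" is sketched below. Everything here is hypothetical and simplified for illustration: `MappingSource` and `populate_mappings` do not exist in the library, and `Mapping` and `Profile` are cut-down stand-ins for the real types. The idea is only that the caller chooses between the current process's mappings and a caller-supplied list.

```rust
/// Simplified stand-in for the library's `Mapping` (assumption for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Mapping {
    pub memory_start: usize,
    pub memory_end: usize,
    pub pathname: String,
}

/// Simplified stand-in for the profile being built (assumption for this sketch).
#[derive(Debug, Default)]
pub struct Profile {
    pub mappings: Vec<Mapping>,
}

impl Profile {
    pub fn push_mapping(&mut self, mapping: Mapping) {
        self.mappings.push(mapping);
    }
}

/// Hypothetical selector for where mapping information comes from.
pub enum MappingSource<'a> {
    /// Use the mappings of the process doing the parsing (today's behavior).
    CurrentProcess,
    /// Use mappings supplied by the caller, e.g. parsed from the heap file.
    Provided(&'a [Mapping]),
}

/// Copies the chosen mappings into the profile. `current` stands in for the
/// value of the global `MAPPINGS`; `None` means it could not be collected.
pub fn populate_mappings<'a>(
    profile: &mut Profile,
    source: MappingSource<'a>,
    current: Option<&'a [Mapping]>,
) {
    let chosen = match source {
        MappingSource::CurrentProcess => current.unwrap_or(&[]),
        MappingSource::Provided(mappings) => mappings,
    };
    for mapping in chosen {
        profile.push_mapping(mapping.clone());
    }
}
```

With a selector like this, existing in-process callers would keep the current behavior, while offline callers would pass mappings that belong to the profiled process.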

@Rjected
Author

Rjected commented Apr 9, 2024

> If I understand correctly, @Rjected wants to use the library to parse unrelated jeheap files from other processes and transform them to pprof, which requires a way to provide the mappings. I think we can support this; I'll look into it later this week (Friday).

Yep, that's exactly it! Although I'm attempting to analyze them in the Firefox Profiler instead of pprof, like this:
https://share.firefox.dev/3PIeyE6

@Rjected
Author

Rjected commented May 20, 2024

I did some thinking on what would be required to support this (Linux only, for now):

  • Parsing the proc maps section from a Lines struct. This could be done with the rsprocmaps crate.
  • Parsing the ELF note header to populate the debug ID. I'm not sure how comfortable I am parsing ELF headers by hand, so I would use a crate for this.
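
For reference on the second point, the note records themselves are simple once the raw bytes of a note section or segment are in hand. The sketch below walks such bytes and returns the GNU build ID, assuming a little-endian ELF file with 4-byte note alignment and assuming the caller has already located the note bytes (which is the part a crate would handle). `gnu_build_id` and `to_hex` are hypothetical helper names.

```rust
/// Note type used for the GNU build ID.
const NT_GNU_BUILD_ID: u32 = 3;

/// Reads a little-endian `u32` at `at`, or `None` if out of bounds.
fn read_u32_le(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Rounds `n` up to the next multiple of 4 (assumed note alignment).
fn align4(n: usize) -> Option<usize> {
    n.checked_add(3).map(|v| v & !3)
}

/// Walks ELF note records laid out as
/// `namesz | descsz | type | name (padded) | desc (padded)`
/// and returns the descriptor of the first GNU build-ID note.
/// Returns `None` if no such note exists or the data is truncated.
pub fn gnu_build_id(notes: &[u8]) -> Option<Vec<u8>> {
    let mut pos = 0usize;
    while pos < notes.len() {
        let namesz = read_u32_le(notes, pos)? as usize;
        let descsz = read_u32_le(notes, pos + 4)? as usize;
        let kind = read_u32_le(notes, pos + 8)?;
        let name_start = pos + 12;
        let desc_start = name_start.checked_add(align4(namesz)?)?;
        let desc_end = desc_start.checked_add(descsz)?;
        let name = notes.get(name_start..name_start.checked_add(namesz)?)?;
        let desc = notes.get(desc_start..desc_end)?;
        if kind == NT_GNU_BUILD_ID && name == b"GNU\0" {
            return Some(desc.to_vec());
        }
        // Skip to the next record, honoring descriptor padding.
        pos = desc_start.checked_add(align4(descsz)?)?;
    }
    None
}

/// Formats bytes as lowercase hex, two digits per byte.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

This covers only the note walk. Finding the note bytes inside the file, handling big-endian targets, and turning a build ID into whatever debug ID format a given viewer expects are all left out, which is a fair argument for using a crate as suggested.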
