Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0023 Improve fingerprinting for performance issues #24

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

smeubank
Copy link
Member

@smeubank smeubank commented Oct 4, 2022

Request for ideas around improving fingerprinting to enable scaling out to more performance issue types

Rendered RFC

@smeubank smeubank linked an issue Oct 4, 2022 that may be closed by this pull request
…rove-fingerprinting-for-performance-issues.md
@smeubank smeubank changed the title Create 0023-improve-fingerprinting-for-performance-issues 0023 Improve fingerprinting for performance issues Oct 4, 2022

# Options Considered

1. Client side/SDK Includes application file name in span/transaction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is inspired from opentelemetry source code attributes, but we have to be careful about the runtime costs here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure where the original inspiration came from, pull this from a notion doc with performance team, will try to get them involved here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also inspired by our PHP SDK which surprisingly send code line numbers with the spans. We're worried about runtime costs, too, but it's worth looking into since it would dramatically improve fingerprinting quality

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtime costs here

I'd have to check the impact again, but I think the way we do stacktraces (serializing etc) in some of our sdks adds to the runtime cost instead of just fetching frames in the callstack. Something to keep in mind if we're not displaying a stacktrace is perhaps there are some savings we can do solely for the sake of fingerprinting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Cocoa, serializing the stacktrace for a single thread would be too much overhead for multiple spans during a transaction. We could do that for selected spans, and maybe we don't need the full stacktrace, but only the top n frames. If we only want to use this for fingerprinting and don't want to display the stacktrace, we could focus on only sending the bare minimum for symbolication to reduce the payload size.


1. Client side/SDK Includes application file name in span/transaction
2. Can we fetch more info on Sentry server side from profiling if available
3. Could SDKs detect something and create a unique identifier artificially to empower fingerprinting?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we can do is inject more thing during build time. For browser SDKs we have bundler plugins, in android we have the gradle plug-in, and in web SDKs we run in CI to create a release.

we can dynamically inject code based on parsed modules to attach metadata to events. If it’s expensive to generate this per event, we can send it as part of the release, and then resolve it server-side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Django (or other frameworks) have any useful runtime state that we could inject into the spans? e.g., when we were talking about detector evidence, there was an idea of parsing out the table names from the span description, and looking up the names of the corresponding Django models. Any relationship (even tenuous) between the span description and the code is helpful!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so crazy idea. We could potentially walk the AST of any codebase and upload that information during CI, since Sentry often runs in CI to do things like upload release artifacts (debug symbols, sourcemaps) and create a release.

This means we can actually do this evidence mapping in Relay itself, since it can just use the release information to make better decisions about spans. For example, in Django we could upload all the models, views, controllers - and then map those automatically to certain spans/transactions based on their name. We could even upload line/col numbers during this process, and then resolve it like how we do stacktraces in the product.

@mitsuhiko this comes back to our discussions about uploading transaction name info during build time.

Also @benvinegar @HazAT comes back to our general discussions about build time insights, uploading information at build that helps enhance our production data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During CI / buildtime is a good idea 👍 we just have to make sure the venn diagram of overlapping performance / sdk / and people willing to setup CI tools from us makes this worth it as a happy path.

This means we can actually do this evidence mapping in Relay itself

I think we'd have to be careful with performance costs since we'll potentially want to run this on processed events and pulling data into Relay would be expensive. Potentially we could do something like the projectoption cache sync, but we'd want fairly responsive cache updates upon pushing so we catch performance issues properly upon deploy instead of having a lag and making the start time of the issue not line up. Additionally, we would need to figure out if a non-AST evidence issue and an AST evidence issue could be merged so as not to have 2 different issues in the time between a cache update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must ensure that those artificial fingerprints work across releases and multiple SDK installations. This could be a challenge.

# Options Considered

1. Client side/SDK Includes application file name in span/transaction
2. Can we fetch more info on Sentry server side from profiling if available

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this for a bit and the main thing right now is that we don't have a simple way / any infrastructure built that makes it possible to leverage profiling data from other products. A big consideration going forward but for now it makes it unfeasible to rely on profiling for any of the work for next quarter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it will be nice to leverage profiling in the future but I agree it's probably premature to plan for it in fingerprinting if we don't have the infrastructure setup yet.

Comment on lines +49 to +50
* Can we specify why fingerprinting is required?
* what the threshold would be for uniqueness to enable fingerprinting for performance issues?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fingerprinting is required for grouping problematic transactions into issues. The goal of grouping is to create exactly one issue for exactly one performance problem in the code. Fingerprints are the mechanism that accomplishes this for both error events and transaction events. Each event is "hashed" into a string fingerprint. Events with the same fingerprint are grouped and become an Issue!

Fingerprinting transaction events is hard because we don't have stack traces. We can't determine the code location precisely, so have to infer it using spans. When fingerprinting is too strict, we end up creating lots of issues even though the transactions come from one piece of code. This causes notification spam. When fingerprinting is too loose we end up creating just one group for multiple unrelated problems. This causes issues to become unresolved when a user fixes just one of the underlying problems.

That's the overall threshold/heuristic we're striving towards: to create exactly one issue for one piece of code that causes the performance problems 🙏🏻

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fingerprinting transaction events is hard because we don't have stack traces. We can't determine the code location precisely

We could add the stacktrace to spans for some performance issues, such as file I/O on the main thread. As we have to detect that on the client side, we can attach the stacktrace of the calling thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fingerprinting transaction events is hard because we don't have stack traces. We can't determine the code location precisely

We could add the stacktrace to spans for some performance issues, such as file I/O on the main thread. As we have to detect that on the client side, we can attach the stacktrace of the calling thread.

1. Client side/SDK Includes application file name in span/transaction
2. Can we fetch more info on Sentry server side from profiling if available
3. Could SDKs detect something and create a unique identifier artificially to empower fingerprinting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion option 4: Add the stacktrace of the calling thread when the performance issue relies on SDK side detection, such as file I/O on the main thread.

Comment on lines +49 to +50
* Can we specify why fingerprinting is required?
* what the threshold would be for uniqueness to enable fingerprinting for performance issues?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fingerprinting transaction events is hard because we don't have stack traces. We can't determine the code location precisely

We could add the stacktrace to spans for some performance issues, such as file I/O on the main thread. As we have to detect that on the client side, we can attach the stacktrace of the calling thread.

Comment on lines +49 to +50
* Can we specify why fingerprinting is required?
* what the threshold would be for uniqueness to enable fingerprinting for performance issues?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fingerprinting transaction events is hard because we don't have stack traces. We can't determine the code location precisely

We could add the stacktrace to spans for some performance issues, such as file I/O on the main thread. As we have to detect that on the client side, we can attach the stacktrace of the calling thread.


# Options Considered

1. Client side/SDK Includes application file name in span/transaction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Cocoa, serializing the stacktrace for a single thread would be too much overhead for multiple spans during a transaction. We could do that for selected spans, and maybe we don't need the full stacktrace, but only the top n frames. If we only want to use this for fingerprinting and don't want to display the stacktrace, we could focus on only sending the bare minimum for symbolication to reduce the payload size.


1. Client side/SDK Includes application file name in span/transaction
2. Can we fetch more info on Sentry server side from profiling if available
3. Could SDKs detect something and create a unique identifier artificially to empower fingerprinting?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must ensure that those artificial fingerprints work across releases and multiple SDK installations. This could be a challenge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve fingerprinting for performance issues
6 participants