Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(query-source): Fix query source relative filepath #2717

Merged
merged 9 commits into from
Feb 21, 2024

Conversation

gggritso
Copy link
Member

@gggritso gggritso commented Feb 7, 2024

tl;dr

When generating the filename attribute for stack trace frames, the SDK uses the filename_for_module function. When generating the code.filepath attribute for query spans, the SDK does not use that function. Because of this inconsistency, code mappings that work with stack frames sometimes don't work with queries that come from the same files.

This PR makes sure that query sources use filename_for_module, so the paths are consistent.

filename_for_module

The filename_for_module function, roughly, does this:

  • accepts a module (e.g., "sentry.api.serializers.base") and a full file path (e.g., "/opt/software/sentry/src/sentry/api/serializers/base.py")
  • finds the "base" module (in this case, "sentry")
  • looks up the base "sentry" module in sys.modules
  • fetches the path of the base module (in this case "/opt/software/sentry/src/sentry/__init__.py"), and strips the last two segments (in this case, `"/opt/software/sentry/src/")
  • removes that path from the filename (in this case, it would return "sentry/api/serializers/base.py")

This has a similar effect to stripping the project_root but even better.

Testing filename_for_module

filename_for_module only starts to differ from regular behaviour if a module is manually added to sys.path. For example, the Sentry codebase likes to import Sentry modules like from sentry.x import y, rather than relative paths. This is possible because sentry is added to the module path. In these cases, Python will report the frame's module (frame.f_globals.get("__name__")) as sentry.x since that's how it was imported.

In order to test this behaviour, I added a small helper module to each relevant integration, and added it to sys.path manually, so it can be referenced without using relative paths.

N.B. Python 2 has different behaviour! Even for modules that are added to sys.path and imported by name, it still reports the frame's module by it's full path, so filename_for_module can't remove the extra prefixed path segments. IMO this is fine since it degrades gracefully. The module stripping behaviour is more of an enhancement, since it's possible to work around it with code mappings.

OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Without this line, a lot of the forked tests don't run on macOS at all. Apparently this is a safe thing to ignore most of the time, but you tell me!

Product Context

We're seeing an issue in Sentry where query source file paths aren't matched to GitHub URLs because the path prefixes are different from the prefixes generated for stack traces for the same files.

e.g., in getsentry/sentry good path is like sentry/api/serializers/base.py, but query sources for that file have the file path sentry/src/sentry/api/serializers/base.py: we're adding an extra sentry/src prefix to the path. The existing GitHub file path mappings don't work, and there's no "Open in GitHub" link as a result.

It is possible to set up more code mappings to resolve this, but I think the inconsistency is worth fixing.

@sentrivana
Copy link
Contributor

Heyy @gggritso 👋🏻 The change and the reasoning make sense to me.

As for testing this: we usually try to make sure each change is covered by a test. In this case there are already testcases for query source in the three different integrations where we support it, so ideally this should just be a matter of taking an existing testcase, copying it and modifying it slightly to trigger behavior similar to what you're seeing in Sentry.

I think these specific test cases could be repurposed easily:

You'll probably need to add some custom project_root to the sentry_init and then check that data.get(SPANDATA.CODE_FILEPATH) looks the way it should.

As for Python 2 support, at this stage we still support it 100%, so ideally after this change you'd get the same filepath regardless of Python version. Verifying this locally is a bit tricky because you need some pyenv-like multiple py version setup, but the CI will run all tests suites in Python 2 as well, so as long as you write the new tests cases and they pass the CI, that should be enough to verify that it works in py2 too.

Use `filename_for_module` is possible. This will create the same source
paths as our stack trace handling.
These are mostly benign and show up on recent macOS versions. Turning
them off so tests can succeed.
- add a simple helper module to each integration test suite
- load the module directly into `sys.path` so it can be referenced
- check that the module name and path are correct
@gggritso gggritso force-pushed the ggg/fix-query-source-base-path branch from 27640ef to d0125d6 Compare February 14, 2024 17:00
@gggritso
Copy link
Member Author

@sentrivana thanks for the hints! I added a unit test for each of the three integrations you mentioned. It was a little complicated, so I updated the PR description to explain everything that's going on (it wasn't a project_root thing, it was more of a module import thing).

As for Python 2, I'm not sure I can get it to work perfectly. More in the PR, but the behaviour there is different enough that it doesn't really work. You tell me, but since this is an improvement to a feature (Python 2 still has a good graceful fallback!) and I'm not breaking anything existing, it might be worth just merging.

P.S. Do you know how long Sentry will keep supporting Python 2?

@gggritso gggritso marked this pull request as ready for review February 14, 2024 18:02
Copy link
Contributor

@sentrivana sentrivana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left two nits on the asyncpg tests but they apply to all three test suites.

tests/integrations/asyncpg/__init__.py Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd just call the directory helpers since it's already in an asyncpg dir/namespace.

It'd also be cool to have a comment either here or where the function is used explaining why this exists and why it needs to be in a separate directory.

@sentrivana
Copy link
Contributor

Regarding 2.7 support -- we're actually working on a new major for the SDK where we drop 2.7 support. So coming soon! The target date is end of February.

@gggritso
Copy link
Member Author

@sentrivana I added some comments to explain why the sys.path editing is there, let me know if you want me to add more, plus a docstring to the tests

As for the dir/namespace, I'm a little stuck there. I'd like to rename sqlalchemy_helpers to helpers, but if the helper directories in asyncpg, sqlalchemy and django are all called helpers there will be a conflict at import time, no?

I thought about it for a while, and tried to re-do the tests so they only edit sys.path for the one test, but I couldn't figure out how to do it for Django. There, the app is created as a fixture, so the routes are only imported once. I need a way to change the module search path, import a different URL config, create an app with this different config, run tests, and then clean up the module search path. I couldn't figure out a clean way of doing it. What do you think?

@sentrivana
Copy link
Contributor

@gggritso I see -- I think in that case it's ok to leave the *_helpers dirs as is. There's probably a way around it but I also don't have any ideas atm. Feel free to merge :shipit: (You might need to click the Update branch button before you can do that though.)

@sentrivana sentrivana enabled auto-merge (squash) February 21, 2024 10:28
@sentrivana sentrivana merged commit 2eeb8c5 into master Feb 21, 2024
122 of 123 checks passed
@sentrivana sentrivana deleted the ggg/fix-query-source-base-path branch February 21, 2024 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants