Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeating query in django view that failed due to OperationalError (timeout) #2114

Open
Pokapiec opened this issue Sep 2, 2024 · 5 comments
Labels
agent-python community Issues opened by the community triage Issues awaiting triage

Comments

@Pokapiec
Copy link

Pokapiec commented Sep 2, 2024

Describe the bug: When in django view some expensive query gets executed and it exceeds statement_timeout, OperationalError occurs and it makes repeatedly the same query X times (where X is probably the amount of stack traces pointing to code parts attached to QuerySet django class). Basically from what i've seen in the elasticapm codebase, elasticapm.utils.encoding.transform function executes repr internal python func on QuerySet class and under the hood, it refetches the whole query.

In our project we use forked version of elastic-apm library with small change where we add info about current tenant to each transaction in celery.

To Reproduce

  1. Write django view where you execute some ORM SELECT query
  2. Set statement_timeout session variable to some low value like 1ns to simulate OperationalError (SET LOCAL statement_timeout='1ns')
  3. Log all sql statements executed in view in django server logs (see repeating statement)

Environment (please complete the following information)

  • OS: Debian 12.6
  • Python version: 3.12.4
  • Framework and version: Django 4.2.15
  • APM Server version: 8.8.1
  • Agent version: 6.13.1

Additional context

Add any other context about the problem here.

  • Agent config options
Click to expand
ELASTIC_APM = {
    "ENABLED": USE_ELASTIC_APM,
    "SERVICE_NAME": "tenants",
    "SECRET_TOKEN": config("ELASTIC_APM_SECRET_TOKEN", cast=str),
    "SERVER_URL": config("ELASTIC_APM_SERVER_URL", cast=str),
    "ENVIRONMENT": config("ELASTIC_APM_ENVIRONMENT", cast=str, default="production"),
    "TRANSACTION_SAMPLE_RATE": config("ELASTIC_APM_TRANSACTION_SAMPLE_RATE", cast=float, default="0.01"),
    "PROCESSORS": (
        "core.elasticapm.sanitize_email",
        "core.elasticapm.sanitize_cookie",
        "core.elasticapm.add_transaction_sampling",
        "core.elasticapm.drop_unsampled_spans",
        "core.elasticapm.add_transaction_url_name",
        "core.elasticapm.add_sql_context",
    ),
    "DEBUG": config("ELASTIC_APM_DEBUG", cast=bool, default=False),
    "CAPTURE_HEADERS": False,  # No need for HTTP headers
    "LOG_LEVEL": "off",  # We caputre logs not via APM but via filebeat
    "AUTO_LOG_STACKS": False,  # Exceptions stack traces are kept in sentry
}
- `pyproject.toml`:
Click to expand
[tool.poetry.dependencies]
python = "~3.12"

aioapns = "3.2"
beautifulsoup4 = "~4.12"
bleach = "~6.1"
boto3 = "~1.34"
celery = "5.4.0"
chardet = "~5.2"
dateparser = "~1.2"
Django = "4.2.15"
djangorestframework = "~3.15"
django-cacheops = "7.0.2"
django-countries = "~7.4"
django-cors-headers = "~4.4.0"
django-filter = "~23.1"
django-fsm = "~2.8"
django-ipware = "~5.0"
django-mptt = "~0.14"
django-ordered-model = "~3.7"
django-recaptcha = "~3.0"
django-storages = "~1.13"
dnspython = "~2.6"
drf-yasg = "~1.21"
easy-thumbnails = "~2.8"
factory-boy = "~3.3"
fcm-django = "~2.1.0"
geoip2 = "~4.1.0"
google-api-python-client = "~1.12.8"
hashids = "~1.3.1"
html2text = "~2020.1.16"
IMAPClient = "~2.3"
lupa = "2.1"
messente-api = "1.5.0"
msal = "~1.25"
msgpack = "~1.0.0"
oauth2client = "~4.1.3"
phonenumberslite = "~8.12.3"
psycopg2-binary = "~2.9.5"
python-decouple = "~3.4"
python-facebook-api = "0.18.0"
python-json-logger = "2.0.2"
python-magic = "~0.4"
python-slugify = "~4.0.1"
pytz = "~2021.3"
pywebpush = "~1.11.0"
redis = "4.5.4"
sentry-sdk = "1.45.0"
ShopifyAPI = "9.0.0"
smsapi-client = "2.9.6"
tqdm = "~4.66"
twilio = "9.2.3"
ujson = "~5.8"
user-agents = "~2.2.0"
google-auth = "2.29.0"
# used by third-party apps
Brotli = "~1.0.9"
clicksend-client = "^5.0.72"
html5lib = "~1.1"
lxml = "~5.1.0"
Pillow = "~10.3"
celery-redbeat = "2.2.0"
ipdb = "0.13.4"
ipython = "~8.18"
libmagic = "~1.0"
ndg-httpsclient = "~0.5.1"
uWSGI = "2.0.26"

[tool.poetry.group.internal.dependencies]
# This is just version of elastic-apm=6.13.1 but with minor patch
elastic-apm = { git = "https://poetry:[email protected]/developers/elasticapm_tenants.git", tag = "v6.13.1-our" }

[tool.poetry.dev-dependencies]
argh = "0.31.2"
django-debug-toolbar = "*"
django-extensions = "~3.2.0"
pyroscope-io = "~0.8"
tblib = "*"
watchdog = "2.1.9"
black = "22.12.0"
coverage = "~7.0.0"
dotmap = "~1.3.8"
fakeredis = "~2.22"
flake8 = "6.0.0"
safety = "~3.2"
selenium = "~3.141.0"
sphinx = "6.2.1"
recommonmark = "0.7.1"
nbsphinx = "0.8.2"
sphinx-rtd-theme = "0.5.1"
sphinx-autobuild = "2021.3.14"
@github-actions github-actions bot added agent-python community Issues opened by the community triage Issues awaiting triage labels Sep 2, 2024
@xrmx
Copy link
Member

xrmx commented Sep 3, 2024

Thanks for reporting. I think this is expected given how the Django QuerySet works, they are lazy and evaluated only when used. Said that some special casing for the case may be possible if the QuerySet has not been already evaluated and there is a way to know it is not already cached?

@Pokapiec
Copy link
Author

Pokapiec commented Sep 3, 2024

If the query to database was too heavy to be evaluated there is no way it was cached. You can only cache QuerySet that is computable (it's execution time fits in statement timeout).

@xrmx
Copy link
Member

xrmx commented Sep 3, 2024

If the query to database was too heavy to be evaluated there is no way it was cached. You can only cache QuerySet that is computable (it's execution time fits in statement timeout).

Exactly, so if we don't find it cached it means we should not evaluate it.

@Pokapiec
Copy link
Author

Pokapiec commented Sep 4, 2024

Are you saying that there is some kind of safe-guard against evaluating non-cached QuerySets ? because i could not find such condition in library source code. This util function elasticapm.utils.encoding.transform does not consider if object is of type QuerySet and therefore it tries evaluating QuerySet (at this line https://github.com/elastic/apm-agent-python/blob/main/elasticapm/utils/encoding.py#L149) repeatedly for every stack trace regardless of the cache in it.

If it gives more context, the process of repeating SQL queries happens when got_request_exception django signal is being sent.

@xrmx
Copy link
Member

xrmx commented Sep 4, 2024

Are you saying that there is some kind of safe-guard against evaluating non-cached QuerySets ?

No, I'm saying that it may be possible to add

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-python community Issues opened by the community triage Issues awaiting triage
Projects
None yet
Development

No branches or pull requests

2 participants