Postgres floating-point exception but health check was ok #2958

max-mycarly · 2024-04-15T11:55:29Z

Self-Hosted Version

24.3.0

CPU Architecture

x86_64

Docker Version

25.0.3

Docker Compose Version

2.25.0

Steps to Reproduce

Try to load any admin page, for example
https://sentry.domain.com/organizations/[ORGA]/projects/
and you receive an HTTP 500.

But when you call
https://sentry.domain.com/_health/
You still get an HTTP Code 200 and the message: ok

Expected Result

When an error causes every HTTP request to fail with HTTP 500, the health endpoint should report that failure as well.
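A health check that catches this failure has to touch the database the same way the web workers do. The traceback below shows the error surfacing while Django initializes a fresh connection (init_connection_state, then ensure_timezone). A check could therefore open a new connection, run the exact statement from the postgres logs, and return a non-200 status when it fails. This is a minimal sketch, not Sentry's actual /_health/ code. It assumes a hypothetical `connect` callable that opens a new DB-API 2.0 connection. The names `check_database` and `health_response`, and the choice of 503, are illustrative.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical: any zero-argument callable that opens a NEW DB-API 2.0
# connection, e.g. functools.partial(psycopg2.connect, dsn).
ConnectFn = Callable[[], Any]

# The statement that failed in the postgres logs of this report; Django's
# PostgreSQL backend sends it while initializing each new connection.
PROBE_SQL = "SELECT set_config('TimeZone', 'UTC', false)"


def check_database(connect: ConnectFn) -> Optional[str]:
    """Return None if Postgres answers the probe, else a short error string."""
    try:
        conn = connect()
    except Exception as exc:
        return f"connect failed: {type(exc).__name__}: {exc}"
    try:
        cur = conn.cursor()
        cur.execute(PROBE_SQL)
        cur.fetchone()
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    finally:
        # Always release the probe connection, healthy or not.
        conn.close()


def health_response(connect: ConnectFn) -> Tuple[int, str]:
    """Map the database check onto an HTTP status a monitor can alert on."""
    problem = check_database(connect)
    if problem is None:
        return 200, "ok"
    return 503, f"postgres: {problem}"
```

Opening a new connection on each probe costs a little, but reusing a pooled connection could hide failures that only occur during connection setup, as happened here.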

Actual Result

We ran into a strange error with Sentry.
The PostgreSQL database started answering every SELECT set_config query with an error.
The web, cron and worker services all show the same errors caused by Postgres.
All API endpoints, the admin interface, etc. returned server errors (HTTP 500), but /_health/ kept returning HTTP 200 with ok.
The outage lasted 5 hours because our monitoring considered the service still alive.

All services were running.
Restarting the instance and all Sentry services fixed the problem.
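Until /_health/ covers the database, one workaround is to have the external monitor probe a URL that actually exercises Postgres and treat any 5xx or connection failure as down. The following minimal sketch uses only the Python standard library. The probed URL (for example https://sentry.domain.com/api/0/organizations/) and the optional API token sent as a Bearer header are assumptions, not something this report specifies. An unauthenticated request may be rejected with 401 before any database query runs, so a token-authenticated request is likely the more reliable signal.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe(url: str, token: Optional[str] = None, timeout: float = 10.0) -> Tuple[bool, Optional[int]]:
    """Return (healthy, status_code); status_code is None if no HTTP response arrived."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status < 500, resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx; only server errors count as "down" here.
        return exc.code < 500, exc.code
    except OSError:
        # URLError subclasses OSError: DNS failure, refused connection, TLS error, timeout.
        return False, None
```

Using the same request a real user would make means the monitor fails exactly when users see HTTP 500, without depending on what the health endpoint happens to check.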

docker compose logs postgres:

postgres-1  | 2024-04-13 11:14:11.218 UTC [763405] ERROR:  floating-point exception
postgres-1  | 2024-04-13 11:14:11.218 UTC [763405] DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
postgres-1  | 2024-04-13 11:14:11.218 UTC [763405] STATEMENT:  SELECT set_config('TimeZone', 'UTC', false)
postgres-1  | 2024-04-13 11:14:11.382 UTC [763406] ERROR:  floating-point exception
postgres-1  | 2024-04-13 11:14:11.382 UTC [763406] DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
postgres-1  | 2024-04-13 11:14:11.382 UTC [763406] STATEMENT:  SELECT set_config('TimeZone', 'UTC', false)
postgres-1  | 2024-04-13 11:14:11.415 UTC [763407] ERROR:  floating-point exception
postgres-1  | 2024-04-13 11:14:11.415 UTC [763407] DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
postgres-1  | 2024-04-13 11:14:11.415 UTC [763407] STATEMENT:  SELECT set_config('TimeZone', 'UTC', false)

docker compose logs web:

web-1  | psycopg2.errors.FloatingPointException: floating-point exception
web-1  | DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
web-1  | 
web-1  | 
web-1  | The above exception was the direct cause of the following exception:
web-1  | 
web-1  | Traceback (most recent call last):
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/api/base.py", line 306, in handle_exception
web-1  |     response = super().handle_exception(exc)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 469, in handle_exception
web-1  |     self.raise_uncaught_exception(exc)
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
web-1  |     raise exc
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/api/base.py", line 411, in dispatch
web-1  |     self.initial(request, *args, **kwargs)
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/../sentry_sdk/integrations/django/__init__.py", line 312, in sentry_patched_drf_initial
web-1  |     return old_drf_initial(self, request, *args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 414, in initial
web-1  |     self.perform_authentication(request)
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 324, in perform_authentication
web-1  |     request.user
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/request.py", line 227, in user
web-1  |     self._authenticate()
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/request.py", line 380, in _authenticate
web-1  |     user_auth_tuple = authenticator.authenticate(self)
web-1  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/api/authentication.py", line 197, in authenticate
web-1  |     return self.authenticate_credentials(relay_id, relay_sig, request)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/api/authentication.py", line 203, in authenticate_credentials
web-1  |     relay, static = relay_from_id(request, relay_id)
web-1  |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/api/authentication.py", line 128, in relay_from_id
web-1  |     relay = Relay.objects.get(relay_id=relay_id)
web-1  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
web-1  |     return getattr(self.get_queryset(), name)(*args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 645, in get
web-1  |     num = len(clone)
web-1  |           ^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 382, in __len__
web-1  |     self._fetch_all()
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 1928, in _fetch_all
web-1  |     self._result_cache = list(self._iterable_class(self))
web-1  |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 91, in __iter__
web-1  |     results = compiler.execute_sql(
web-1  |               ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
web-1  |     cursor = self.connection.cursor()
web-1  |              ^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
web-1  |     return func(*args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 316, in cursor
web-1  |     return self._cursor()
web-1  |            ^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/db/postgres/decorators.py", line 40, in inner
web-1  |     return func(self, *args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/db/postgres/base.py", line 107, in _cursor
web-1  |     return super()._cursor()
web-1  |            ^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 292, in _cursor
web-1  |     self.ensure_connection()
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
web-1  |     return func(*args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 274, in ensure_connection
web-1  |     with self.wrap_database_errors:
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
web-1  |     raise dj_exc_value.with_traceback(traceback) from exc_value
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 275, in ensure_connection
web-1  |     self.connect()
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/../sentry_sdk/integrations/django/__init__.py", line 677, in connect
web-1  |     return real_connect(self)
web-1  |            ^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
web-1  |     return func(*args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 258, in connect
web-1  |     self.init_connection_state()
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 314, in init_connection_state
web-1  |     commit_tz = self.ensure_timezone()
web-1  |                 ^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 296, in ensure_timezone
web-1  |     cursor.execute(self.ops.set_time_zone_sql(), [timezone_name])
web-1  | django.db.utils.DataError: floating-point exception
web-1  | DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
web-1  | 
web-1  | 13:03:00 [INFO] sentry.access.api: api.access (method='POST' view='sentry.api.endpoints.relay.project_configs.RelayProjectConfigsEndpoint' response=500 user_id='None' is_app='None' token_type='None' is_frontend_request='False' organization_id='None' auth_id='None' path='/api/0/relays/projectconfigs/' caller_ip='172.18.0.44' user_agent='None' rate_limited='False' rate_limit_category='None' request_duration_seconds=0.04009842872619629 rate_limit_type='DNE' concurrent_limit='None' concurrent_requests='None' reset_time='None' group='None' limit='None' remaining='None')
web-1  | 13:03:00 [ERROR] django.request: Internal Server Error: /api/0/relays/projectconfigs/ (status_code=500 request=<WSGIRequest: POST '/api/0/relays/projectconfigs/?version=3'>)
web-1  | Traceback (most recent call last):
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 275, in ensure_connection
web-1  |     self.connect()
web-1  |   File "/usr/local/lib/python3.11/site-packages/sentry/../sentry_sdk/integrations/django/__init__.py", line 677, in connect
web-1  |     return real_connect(self)
web-1  |            ^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
web-1  |     return func(*args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 258, in connect
web-1  |     self.init_connection_state()
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 314, in init_connection_state
web-1  |     commit_tz = self.ensure_timezone()
web-1  |                 ^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 296, in ensure_timezone
web-1  |     cursor.execute(self.ops.set_time_zone_sql(), [timezone_name])
web-1  | psycopg2.errors.FloatingPointException: floating-point exception
web-1  | DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
web-1  |

Event ID

No response


hubertdeng123 · 2024-04-15T23:15:10Z

Thanks for reporting here. I do not think you should rely on the /_health/ endpoint in Sentry as a source of truth. I just took a look, and it seems to be pretty out of date. I'll backlog this item to improve the endpoint to cover more of the main components of self-hosted.
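Covering more components could mean running one check per dependency and failing the endpoint if any of them fails. This is a sketch under assumptions, not a planned design. Each check is a hypothetical callable that raises on failure, all checks run concurrently against one shared deadline, and a check that does not finish in time is reported rather than left to hang the endpoint. The component names used when calling it are only examples.

```python
import json
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, Tuple

# Hypothetical check type: returns normally when healthy, raises otherwise.
Check = Callable[[], None]


def run_checks(checks: Dict[str, Check], timeout: float = 2.0) -> Tuple[int, str]:
    """Run all checks concurrently; return (HTTP status, JSON body)."""
    problems: Dict[str, str] = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    try:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        # One shared deadline for all checks, not one timeout per check.
        done, _ = wait(futures.values(), timeout=timeout)
        for name, fut in futures.items():
            if fut not in done:
                problems[name] = f"timed out after {timeout}s"
            elif fut.exception() is not None:
                exc = fut.exception()
                problems[name] = f"{type(exc).__name__}: {exc}"
    finally:
        # Do not wait for hung checks; their threads finish in the background.
        pool.shutdown(wait=False)
    if problems:
        body = {"status": "unhealthy", "problems": problems}
        return 503, json.dumps(body, sort_keys=True)
    return 200, json.dumps({"status": "ok"})
```

In a Django view, the returned pair could feed HttpResponse(body, status=status, content_type="application/json"), so a monitor sees 503 whenever any dependency check fails.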

getsantry bot added the Waiting for: Product Owner label Apr 15, 2024

getsantry bot removed the Waiting for: Product Owner label Apr 15, 2024
