-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a mechanism to retry event processing without logging an error #488
Comments
I was able to look into this today, it seems like CloudEvent handlers also support a (little documented) callback which can accept an (error, response) callback; but unfortunately the response body can't be used in an express friendly way to set the code without sending a body since FF also checks for integers (which would be interpreted as the HTTP response code) and casts them to strings first: functions-framework-nodejs/src/invoker.ts Line 54 in 978054b
Agree that a custom, sentinel error is probably cleanest here that'll allow extension without breaking backward compatibility. |
Although Cloud Functions do allow a retry mechanism when processing events, this mechanism is currently not very flexible. Indeed, in order for the processing to be retried, the function must throw an error.
Because the functions framework does not know the nature of the thrown error (Is it an uncaught error in a dependency? Is it an error thrown on purpose?), it will always log it as such along with its stack trace. While this makes sense in the general case, it also has effects on other Google services:
logging.googleapis.com/log_entry_count
metric under theERROR
severity. This metric could be used in dashboard and alert policies. If the error is thrown intentionally only to retry processing, this adds noise to the metric and can possibly trigger false alerts.There are many valid reasons why one would want to retry processing without explicitly logging an error. Actually, I would argue that the main use case detailed in the documentation (transient errors) should probably not log an error. If an error is transient (e.g. a timeout or a network error when calling an API) and is expected to occur from time to time, there's not much value in counting it as a generic error in Error Reporting and Cloud Logging / Monitoring. In fact, those transient errors can probably be reported more precisely using Cloud Logging and Monitoring directly (not relying on the
functions-framework
wrapper), along with other statistics about the API calls (e.g. latency, endpoint name, etc).Conversely, I'd argue that we want an error to be logged and picked us as such by downstream GCP services when it is unexpected. Moreover if the error is unexpected, it should probably not be retried because it could be a bug in the code (as the GCP documentation indeed calls out).
✨ Ideal behaviour
Summing up the previous points, the ideal behaviour for me would be to:
204
). This corresponds to unforeseen errors that might be bugs.500
.)This would reduce the risk of setting up a Cloud Function with the
retry
flag and ending up with endless retries. The solutions provided by the documentation involve custom logic and are in my opinion prone to errors and/or suboptimal.🛠️ Current workaround
My current workaround is actually a wrapper around all my Cloud Functions which:
functions-frameworks
, which thinks the processing ended successfully. Processing is not retried. Logged errors do appear in Error Reporting / Cloud Monitoring error metrics, as they need to be reviewed and possibly fixed by developers.RetryableError
) and rethrows them. Only anINFO
-level log is written because theRetryableError
marks an explicit need in user code to retry processing. Because a rejection is passed to thefunctions-framework
, the error is logged and processing is retried.Obvisouly, this workaround does not fully eliminate the problem because retryable errors are still logged as errors by the
functions-framework
. However it does help to:💡 Breaking change-free suggestion
The ideal behaviour described above would result in a huge breaking change and the idea would probably be quickly dismissed. However I think there is a smaller, non-breaking, change that can bring more flexibility to developers: simply provide an explicit value or error that can be returned/thrown by the function, which does not involve logging an error but still causes the
functions-framework
to return a500
status code. This would be a small change to sendResponse, checking thatresult
orerr
is of a specific type (or whatever check you feel most comfortable with).This solution does solve the main problem which is that processing cannot currently be retried without logging an error. Because uncaught errors would still be retried (and also logged as errors), this does not alleviate the risk of an event being endlessly retried in case of a bug in the function's code. However this is clearly documented and can be solved in user code (e.g. like it is in my current workaround).
The text was updated successfully, but these errors were encountered: