-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(localenv): add performance metrics #2999
Conversation
- the underlying metrics for the new promql queries/grafana visualizations do not exist yet and should be added in later commit
- liquidity resolver, op lifecycle, op service create
…rformance-metrics
✅ Deploy Preview for brilliant-pasca-3e80ec ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments
try { | ||
await lifecycle.handleSending(deps, payment) | ||
stopTimer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use a finally
here
if (!event || !isOutgoingPaymentEvent(event)) { | ||
stopTimer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something we don't necessary have to tackle in this PR, but what can we do when it comes to stopping timers during a happy path vs an error? is there something we can add to make that distinction in the metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we could have a success: boolean
attribute on the underlying histogram. I was inspired by a pattern that was shown to me during the WW, and it had that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used currently, but it should be possible now. I updated the closure returned from startTimer
to accept additionalAttributes
which will be merged on the ones initially provided to startTimer
. Usage example added to tests (and tests added for startTimer
in general - noticed we didnt have any).
if (isFundingError(paymentOrErr)) { | ||
stopTimer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a finally would be nice here to avoid calling stopTimer()
multiple times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. put everything in a new try and removed the stopTimer
s. catch just re-throws and finally calls stopTimer
. This is a good pattern for this sort of thing... ensures we dont miss a place to stop the timer.
@@ -79,10 +91,15 @@ async function handlePaymentLifecycle( | |||
return | |||
} | |||
|
|||
const stopTimer = deps.telemetry.startTimer('handle_sending_ms', { | |||
callName: 'handleSending' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think about prefixing callName
with service name always?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
messaged you on this with more details. seems like a good idea but if we want to make it a standard prefix I think maybe it needs to be more abstract than service name since not everything is called from outside the service method or is necessarily scoped to a service (gql resolvers,accountService name doesnt properly signify tb/psql, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to prefix with a pattern like [AbstractScopeName]:literalFunctionName
.
for example:
WebhookService:getResourceById
Resolver:depositOutgoingPaymentLiquidity
Not every case is as clean as service calls where we have an obvious and unambiguous scope so I'm keeping it a little more abstract to ensure we can standardize.
callName: 'rates', | ||
description: 'Time to get rates' | ||
} | ||
) | ||
const rates = await deps.ratesService | ||
.rates(options.walletAddress.asset.code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe best to just convert to a normal try-catch call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
converted from chaining to try/catch/finally
Co-authored-by: Max Kurapov <[email protected]>
"Unique identifier of the grant." | ||
id: ID! | ||
): Grant! | ||
grant("Unique identifier of the grant." id: ID!): Grant! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some formatting issue made it into main somehow. had to format and generate gql types to make CI happy here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you merge in main this should go away
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments
"Unique identifier of the grant." | ||
id: ID! | ||
): Grant! | ||
grant("Unique identifier of the grant." id: ID!): Grant! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you merge in main this should go away
{ | ||
callName: 'RateService:rates', | ||
description: 'Time to get rates' | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be directly in the rates service instead? Or instead, called something like PaymentMethod:ILP:getQuote:getRates
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm fair point.
We have external timers that wrap function calls from the outside like this, and internal timers that start at the beginning of a function and stop at the end. I think in this case we want the external timer because the goal is to measure getQuote
and it's parts (thats how this metric is graphed). In other words, we want to measure this rates service call, not all of them.
I take your point on the name. I was thinking about it in the context of the graph, where the PaymentMethod:ILP:getQuote
would be redundant. However, as soon as we want to time this same method somewhere else the name breaks down, so yeah... looks like that convention (scoped by where its called) is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, we want to measure this rates service call, not all of them.
Yes, makes sense. We do have the histogram name ilp_get_quote_rate_time_ms
, so that is easy to narrow down and trace in the code in any case.
For the callName, just as long as we can get group metrics from a certain part of the codebase like PaymentMethod:ILP:*
that's good enough.
Changes proposed in this pull request
create-outgoing-payment.js
k6 script.Partial view of local dashboard (best observed locally while running performance test):
Context
fixes #2983
currently set to merge into bc/2980/refactor-telemetry-optionality because it depends on those changes.
Checklist
fixes #number
user-docs
label (if necessary)