Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(localenv): add performance metrics #2999

Merged
merged 36 commits into from
Nov 7, 2024
Merged

Conversation

BlairCurrey
Copy link
Contributor

@BlairCurrey BlairCurrey commented Sep 23, 2024

Changes proposed in this pull request

  • added timer to metric service
  • used new timer to power several new graphs tracking durations for most of the main paths in the create-outgoing-payment.js k6 script.

Partial view of local dashboard (best observed locally while running performance test):

image

Context

fixes #2983

currently set to merge into bc/2980/refactor-telemetry-optionality because it depends on those changes.

Checklist

  • Related issues linked using fixes #number
  • Tests added/updated
  • Make sure that all checks pass
  • Bruno collection updated (if necessary)
  • Documentation issue created with user-docs label (if necessary)
  • OpenAPI specs updated (if necessary)

@BlairCurrey BlairCurrey changed the title Bc/2983/performance metrics feat(localenv): add performance metrics Sep 23, 2024
@github-actions github-actions bot added pkg: backend Changes in the backend package. type: source Changes business logic labels Sep 23, 2024
@BlairCurrey BlairCurrey marked this pull request as ready for review September 24, 2024 17:31
@github-actions github-actions bot added the type: tests Testing related label Sep 24, 2024
Base automatically changed from bc/2980/refactor-telemetry-optionality to main October 1, 2024 14:12
Copy link

netlify bot commented Oct 1, 2024

Deploy Preview for brilliant-pasca-3e80ec ready!

Name Link
🔨 Latest commit 7f07761
🔍 Latest deploy log https://app.netlify.com/sites/brilliant-pasca-3e80ec/deploys/672a47c18959590008365a60
😎 Deploy Preview https://deploy-preview-2999--brilliant-pasca-3e80ec.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Contributor

@mkurapov mkurapov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

try {
await lifecycle.handleSending(deps, payment)
stopTimer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use a finally here

if (!event || !isOutgoingPaymentEvent(event)) {
stopTimer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something we don't necessary have to tackle in this PR, but what can we do when it comes to stopping timers during a happy path vs an error? is there something we can add to make that distinction in the metrics?

Copy link
Contributor Author

@BlairCurrey BlairCurrey Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we could have a success: boolean attribute on the underlying histogram. I was inspired by a pattern that was shown to me during the WW, and it had that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used currently, but it should be possible now. I updated the closure returned from startTimer to accept additionalAttributes which will be merged on the ones initially provided to startTimer. Usage example added to tests (and tests added for startTimer in general - noticed we didnt have any).

if (isFundingError(paymentOrErr)) {
stopTimer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a finally would be nice here to avoid calling stopTimer() multiple times

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. put everything in a new try and removed the stopTimers. catch just re-throws and finally calls stopTimer. This is a good pattern for this sort of thing... ensures we dont miss a place to stop the timer.

@@ -79,10 +91,15 @@ async function handlePaymentLifecycle(
return
}

const stopTimer = deps.telemetry.startTimer('handle_sending_ms', {
callName: 'handleSending'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about prefixing callName with service name always?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

messaged you on this with more details. seems like a good idea but if we want to make it a standard prefix I think maybe it needs to be more abstract than service name since not everything is called from outside the service method or is necessarily scoped to a service (gql resolvers,accountService name doesnt properly signify tb/psql, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to prefix with a pattern like [AbstractScopeName]:literalFunctionName.

for example:

  • WebhookService:getResourceById
  • Resolver:depositOutgoingPaymentLiquidity

Not every case is as clean as service calls where we have an obvious and unambiguous scope so I'm keeping it a little more abstract to ensure we can standardize.

packages/backend/src/open_payments/quote/service.ts Outdated Show resolved Hide resolved
callName: 'rates',
description: 'Time to get rates'
}
)
const rates = await deps.ratesService
.rates(options.walletAddress.asset.code)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe best to just convert to a normal try-catch call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converted from chaining to try/catch/finally

@github-actions github-actions bot added the pkg: auth Changes in the GNAP auth package. label Oct 15, 2024
"Unique identifier of the grant."
id: ID!
): Grant!
grant("Unique identifier of the grant." id: ID!): Grant!
Copy link
Contributor Author

@BlairCurrey BlairCurrey Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some formatting issue made it into main somehow. had to format and generate gql types to make CI happy here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you merge in main this should go away

Copy link
Contributor

@mkurapov mkurapov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

"Unique identifier of the grant."
id: ID!
): Grant!
grant("Unique identifier of the grant." id: ID!): Grant!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you merge in main this should go away

Comment on lines +48 to +51
{
callName: 'RateService:rates',
description: 'Time to get rates'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be directly in the rates service instead? Or instead, called something like PaymentMethod:ILP:getQuote:getRates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm fair point.

We have external timers that wrap function calls from the outside like this, and internal timers that start at the beginning of a function and stop at the end. I think in this case we want the external timer because the goal is to measure getQuote and it's parts (thats how this metric is graphed). In other words, we want to measure this rates service call, not all of them.

I take your point on the name. I was thinking about it in the context of the graph, where the PaymentMethod:ILP:getQuote would be redundant. However, as soon as we want to time this same method somewhere else the name breaks down, so yeah... looks like that convention (scoped by where its called) is good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, we want to measure this rates service call, not all of them.

Yes, makes sense. We do have the histogram name ilp_get_quote_rate_time_ms, so that is easy to narrow down and trace in the code in any case.

For the callName, just as long as we can get group metrics from a certain part of the codebase like PaymentMethod:ILP:* that's good enough.

@github-actions github-actions bot removed the pkg: auth Changes in the GNAP auth package. label Nov 5, 2024
@BlairCurrey BlairCurrey merged commit aa77aea into main Nov 7, 2024
30 of 42 checks passed
@BlairCurrey BlairCurrey deleted the bc/2983/performance-metrics branch November 7, 2024 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: backend Changes in the backend package. type: source Changes business logic type: tests Testing related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add additional performance metrics for local playground
2 participants