Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

telemetry(amazonq): measure e2e latency for amazon q requests #5566

Merged
merged 2 commits into from
Sep 25, 2024

Conversation

jpinkney-aws
Copy link
Contributor

@jpinkney-aws jpinkney-aws commented Sep 10, 2024

Problem:

  • We can't measure e2e how long a chat request takes

Solution:

  • Plumb the trace id through to the webview -> vscode -> webview. This allows us to know exactly what is happening to a message and how long everything took to get there
  • chat e2e latency can be linked with the other telemetry events via the traceId
Previous PR description

Problem

This is just a draft of what we could potentially do to measure e2e latency in amazon q chat message requests. Feel free to leave comments on the PR

Main code level problems:

  • It's hard to connect a tab id to a conversation id. Tab ID's exist pretty much to send/receive messages but they don't really live along side conversation id's. Conversation id's are exclusively related to the session. The only place they meetup is in their chat session storages
  • Do we want to track individual events like webviewToDispatcher as metrics?
    • Though there wouldn't really be a way to "listen" on these events except for if we manually allow-listed metrics since theres no reason to attach a tabID to that metric
  • How do we listen to events emitted from telemetry so that we can find them later?
    • One solution is proposed in this PR where we listen to events being emitted and then check if we want to track them
  • Could we connect these with a parent/child mechanism like we want to introduce eventually?
    • We kind of do this because the "listener" event is the parent and the things it finds are the children. The problem is we can really have only one "listener" for these Q events, since we don't know what tabs are related to what conversation ID's until much later (after the first request). This is why we build up the map of conversation metrics and tab metrics and join them together before we emit

Solution

This PR does a couple things:

  • Moves up the chat session storage for codewhisperer chat. This allows us to eventually connect conversation ids to tabs
  • Introduces a chat storage facades that allows us to call getConversationId on any tab type. This is only implemented for cwc right now
  • Introduces a listener pattern that allows telemetry events to "listen" to metrics that are being emitted
  • emits a sample events that can you can trace from "pressing enter" to "receives message in ui"

Sample event when sending a request in amazon q chat:

2024-09-11 10:11:05.712 [debug] telemetry: chat_roundTrip {
  Metadata: {
    duration: '14585',
    name: 'onChatPrompt',
    child_metrics: 'initialRequestTime,webviewToDispatcher,dispatcherToFeature,amazonq_enterFocusConversation,amazonq_startConversation,amazonq_addMessage,finalRequestTime',
    child_metric_durations: '{"webviewToDispatcher":1,"dispatcherToFeature":3,"amazonq_enterFocusConversation":1542,"amazonq_startConversation":2,"amazonq_addMessage":13003,"finalRequestTime":34}',
    result: 'Succeeded',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}

You can also get events from the right click context menu items as well (explain, refactor, etc)

2024-09-11 10:11:34.384 [debug] telemetry: chat_roundTrip {
  Metadata: {
    duration: '12885',
    name: 'aws.amazonq.explainCode',
    child_metrics: 'initialRequestTime,webviewToDispatcher,amazonq_enterFocusConversation,amazonq_addMessage,finalRequestTime',
    child_metric_durations: '{"webviewToDispatcher":17,"amazonq_enterFocusConversation":1263,"amazonq_addMessage":11572,"finalRequestTime":33}',
    result: 'Succeeded',
    awsAccount: 'not-set',
    awsRegion: 'us-east-1'
  },
  Value: 1,
  Unit: 'None',
  Passive: false
}

License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link

This pull request modifies files in src/ but no tests were added/updated. Confirm whether tests should be added or ensure the PR description explains why tests are not required.


// A common interface over existing chat session storages
// TODO align all the chatsessionstorages so we don't have to do this :/
export class ChatSessionStorageFacade {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

packages/core/src/amazonq/util/storageFacade.ts Outdated Show resolved Hide resolved
packages/core/src/amazonq/util/telemetryHelper.ts Outdated Show resolved Hide resolved
packages/core/src/amazonq/util/telemetryHelper.ts Outdated Show resolved Hide resolved
packages/core/src/amazonq/util/telemetryHelper.ts Outdated Show resolved Hide resolved
packages/core/src/amazonq/util/telemetryHelper.ts Outdated Show resolved Hide resolved
packages/core/src/shared/telemetry/spans.ts Outdated Show resolved Hide resolved
@jpinkney-aws
Copy link
Contributor Author

I think another way we could possible go is to just instrument everything via a traceId which is more traditional for distributed tracing (rather than connecting tabIds to conversation Ids at the end of a flow like we do here). I think this might be more scalable for codewhisperer inline as well.

on chat prompt we could create a trace id, plumb that through the entire system and store various different timings in the RecordMap.

This would remove:

  • listening on telemetry events
  • remove the chatStorageFacade

we would keep:

  • record map
  • start listening/stop listening

end result:

  • we still get the chat message duration (but we don't necessarily correlate telemetry events with chat_roundTrip (though we totally can))
  • less complicated to understand/the flow is easily observable e2e
  • predictable metrics appearing in the end result

@jpinkney-aws jpinkney-aws marked this pull request as ready for review September 20, 2024 14:41
@jpinkney-aws jpinkney-aws requested review from a team as code owners September 20, 2024 14:41
@jpinkney-aws jpinkney-aws changed the title draft: measure e2e latency for amazon q requests telemetry(amazonq): measure e2e latency for amazon q requests Sep 23, 2024
Problem:
- We can't measure e2e how long a chat request takes

Solution:
- Plumb the trace id through to the webview -> vscode -> webview. This allows us to know exactly what is happening to a message and how long everything took to get there
- chat e2e latency can be linked with the other telemetry events via the traceId
Copy link

This pull request modifies code in src/ but no tests were added/updated. Confirm whether tests should be added or ensure the PR description explains why tests are not required.

@jpinkney-aws jpinkney-aws merged commit 359b1d8 into master Sep 25, 2024
14 of 16 checks passed
@jpinkney-aws jpinkney-aws deleted the jpinkney-aws/e2e-latency branch September 25, 2024 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants