Support SDK metrics for go v2 AWS SDK #1744

Open
2 tasks
DanielBauman88 opened this issue Jun 29, 2022 · 3 comments
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue queued This issues is on the AWS team's backlog

Comments

@DanielBauman88

Describe the feature

The Java feature is documented here.
The functionality is described in this section.

The request is to support the same in the Go SDK, so that it is trivial to get latency, error, and retry metrics for the calls a customer application makes to its AWS dependencies.

Use Case

I want operational metrics (latency, errors, number of calls) for all my dependencies so that I can monitor the performance of my service, dig into problems, and investigate the impact of outages.

Proposed Solution

Implement this functionality as a simple option at SDK client creation in the Go SDK v2.

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

AWS Go SDK V2 Module Versions Used

This is applicable to all SDKs

Go version used

This should be applicable to all Go versions

@DanielBauman88 DanielBauman88 added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jun 29, 2022
@vudh1 vudh1 removed the needs-triage This issue or PR still needs to be triaged. label Jul 19, 2022
@jeichenhofer

jeichenhofer commented Oct 26, 2022

I'm also looking to integrate metrics with the aws-sdk-go-v2 libraries, and I don't want to reinvent the wheel. Hopefully this will become an officially supported feature, but I need a solution in the meantime. Specifically, I want to record a tuple of service name, operation name, AWS region, latency, retry count, and response code for every request sent to AWS. I can envision doing this with the "middleware" API, but the only docs I can find (https://aws.github.io/aws-sdk-go-v2/docs/middleware/) don't do a great job of explaining what information about the request is available. For example, would we need to record the "sent time" in the Initialize step and then check it in the Deserialize step, or is latency already a populated metadata value?

While we wait for a response from the development team about incorporating this as an SDK feature, is there any guidance on implementing something ourselves?

@jeichenhofer

Here's what I could come up with by stepping through the middleware stack code. It seems to work as intended, but I'd be curious to hear from people more familiar with the API.

Of course, this would need to be incorporated into an existing metrics system, replacing the ReportMetrics function with something that feeds into monitoring systems or log files. If that function could return an error, I'd have to think a bit more about how to handle it.

Also, because this middleware is placed "after" all of the other deserializers, it is executed once per attempt, retries included. That's why I left the "retry on access denied" code in there: to test what happens when a retried operation fails. The output measures the latency of each individual attempt (by default, three attempts in total). I thought replacing smithymiddleware.After with smithymiddleware.Before would measure the combined latency of the three round trips, but that was not the case. Since I want per-retry behavior, I didn't investigate further.

Here is the working code to test this out. Just replace the AKID and SKEY constants with the credentials of an IAM user that has no access, and you'll see metrics printed for three requests, each with a 403 response code.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	sdkmiddleware "github.com/aws/aws-sdk-go-v2/aws/middleware"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	smithymiddleware "github.com/aws/smithy-go/middleware"
	"github.com/aws/smithy-go/transport/http" // smithy-go's http package, not net/http
)

const (
	AKID = "akid_here"
	SKEY = "secret_access_key_here"
	SESH = ""
)

type RequestMetricTuple struct {
	ServiceName   string
	OperationName string
	Region        string
	LatencyMS     int64
	ResponseCode  int
}

func ReportMetrics(metrics *RequestMetricTuple) {
	fmt.Printf("metrics: %+v\n", metrics)
}

func reportMetricsMiddleware() smithymiddleware.DeserializeMiddleware {
	reportRequestMetrics := smithymiddleware.DeserializeMiddlewareFunc("ReportRequestMetrics", func(
		ctx context.Context, in smithymiddleware.DeserializeInput, next smithymiddleware.DeserializeHandler,
	) (
		out smithymiddleware.DeserializeOutput, metadata smithymiddleware.Metadata, err error,
	) {
		// Start the clock just before handing off to the rest of the stack.
		requestMadeTime := time.Now()
		out, metadata, err = next.HandleDeserialize(ctx, in)
		if err != nil {
			// No metric is reported when the downstream handler itself fails.
			return out, metadata, err
		}

		// -1 means the raw response was not a smithy HTTP response.
		responseStatusCode := -1
		if resp, ok := out.RawResponse.(*http.Response); ok {
			responseStatusCode = resp.StatusCode
		}

		latency := time.Since(requestMadeTime)
		metrics := RequestMetricTuple{
			ServiceName:   sdkmiddleware.GetServiceID(ctx),
			OperationName: sdkmiddleware.GetOperationName(ctx),
			Region:        sdkmiddleware.GetRegion(ctx),
			LatencyMS:     latency.Milliseconds(),
			ResponseCode:  responseStatusCode,
		}
		ReportMetrics(&metrics)

		return out, metadata, nil
	})

	return reportRequestMetrics
}

func getDefaultConfig(ctx context.Context) (*aws.Config, error) {
	cfg, err := config.LoadDefaultConfig(
		ctx,
		config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(AKID, SKEY, SESH)),
		config.WithRetryer(
			func() aws.Retryer {
				return retry.AddWithErrorCodes(retry.NewStandard(), "AccessDenied")
			},
		),
	)
	if err != nil {
		return nil, err
	}

	cfg.APIOptions = append(cfg.APIOptions, func(stack *smithymiddleware.Stack) error {
		return stack.Deserialize.Add(reportMetricsMiddleware(), smithymiddleware.After)
	})

	return &cfg, nil
}

func doStuff(ctx context.Context, client *s3.Client) {
	listBucketResults, err := client.ListBuckets(ctx, &s3.ListBucketsInput{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("num_buckets: %d\n", len(listBucketResults.Buckets))
}

func main() {
	ctx := context.Background()
	cfg, err := getDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}

	client := s3.NewFromConfig(*cfg)

	for {
		doStuff(ctx, client)
		time.Sleep(time.Second * 2)
	}
}
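The ReportMetrics function above only prints each tuple. One way to feed a real metrics system without letting a reporting failure affect the request path is to hand metrics to a background goroutine through a buffered channel. The following is a minimal standard-library sketch under that assumption; `Metric`, `AsyncReporter`, and the `sink` callback are hypothetical names introduced here, and the sink stands in for whatever monitoring client or log writer is actually used.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Metric mirrors the fields of RequestMetricTuple from the code above.
type Metric struct {
	ServiceName, OperationName, Region string
	LatencyMS                          int64
	ResponseCode                       int
}

// AsyncReporter buffers metrics and passes them to a sink on a
// background goroutine, so Report never blocks and never returns an error.
type AsyncReporter struct {
	ch      chan Metric
	wg      sync.WaitGroup
	dropped int64
}

func NewAsyncReporter(buffer int, sink func(Metric) error) *AsyncReporter {
	r := &AsyncReporter{ch: make(chan Metric, buffer)}
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		for m := range r.ch {
			if err := sink(m); err != nil {
				// A sink failure is logged and otherwise ignored.
				fmt.Println("metrics sink error:", err)
			}
		}
	}()
	return r
}

// Report enqueues m, or counts it as dropped if the buffer is full.
// It must not be called after Close.
func (r *AsyncReporter) Report(m Metric) {
	select {
	case r.ch <- m:
	default:
		atomic.AddInt64(&r.dropped, 1)
	}
}

// Dropped returns how many metrics were discarded because the buffer was full.
func (r *AsyncReporter) Dropped() int64 { return atomic.LoadInt64(&r.dropped) }

// Close flushes buffered metrics and stops the background goroutine.
func (r *AsyncReporter) Close() {
	close(r.ch)
	r.wg.Wait()
}

func main() {
	r := NewAsyncReporter(16, func(m Metric) error {
		fmt.Printf("metrics: %+v\n", m)
		return nil
	})
	r.Report(Metric{ServiceName: "S3", OperationName: "ListBuckets",
		Region: "us-east-1", LatencyMS: 42, ResponseCode: 403})
	r.Close()
}
```

Dropping on a full buffer trades completeness for a request path that cannot stall on the metrics backend; a system that must not lose data would need a different policy.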

@RanVaknin RanVaknin added p2 This is a standard priority issue l Effort estimation: large labels Nov 14, 2022
@lucix-aws
Contributor

lucix-aws commented Nov 28, 2023

related: #1142

We intend to implement this in terms of aws/smithy-go#470. The internal spec for this component of the smithy client reference architecture is being finalized.

Please upvote this issue if this functionality is important to you as an SDK user.

@RanVaknin RanVaknin added the queued This issues is on the AWS team's backlog label Feb 15, 2024
@lucix-aws lucix-aws removed the l Effort estimation: large label May 24, 2024
@lucix-aws lucix-aws self-assigned this Jun 10, 2024
5 participants