Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws_s3_bucket should use HeadBucket instead of GetBucketLocation #1586

Closed
e-gineer opened this issue Feb 14, 2023 · 13 comments · Fixed by #2082
Closed

aws_s3_bucket should use HeadBucket instead of GetBucketLocation #1586

e-gineer opened this issue Feb 14, 2023 · 13 comments · Fixed by #2082
Assignees
Labels
enhancement New feature or request

Comments

@e-gineer
Copy link
Contributor

From https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketLocation.html:

For requests made using AWS Signature Version 4 (SigV4), we recommend that you use HeadBucket to return the bucket Region instead of GetBucketLocation.

location, err := svc.GetBucketLocation(ctx, params)
if err != nil {
plugin.Logger(ctx).Error("aws_s3_bucket.getBucketLocation", "bucket_name", *bucket.Name, "clientRegion", clientRegion, "api_error", err)
return nil, err
}

@e-gineer e-gineer added the enhancement New feature or request label Feb 14, 2023
@github-actions
Copy link

'This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.'

@github-actions github-actions bot added the stale No recent activity has been detected on this issue/PR and it will be closed label Apr 16, 2023
@misraved misraved removed the stale No recent activity has been detected on this issue/PR and it will be closed label Apr 17, 2023
@github-actions
Copy link

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale No recent activity has been detected on this issue/PR and it will be closed label Jun 16, 2023
@misraved misraved removed the stale No recent activity has been detected on this issue/PR and it will be closed label Jun 19, 2023
@github-actions
Copy link

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale No recent activity has been detected on this issue/PR and it will be closed label Aug 18, 2023
@cbruno10 cbruno10 removed the stale No recent activity has been detected on this issue/PR and it will be closed label Aug 21, 2023
@graza-io
Copy link
Contributor

Took a quick look at using HeadBucket operation - however I kept receiving a 301 error.

operation error S3: HeadBucket, https response error StatusCode: 301, RequestID: 6ZC4YNZX6Z7MDFR8, HostID: eI5x2FvRBbAQreiEqD2lpJlZvJh00lOWHPtcVRw9Q24esThCm8UxrzoPorEo5eIk8HK+ce0D5yU=, api error MovedPermanently: Moved Permanently

@pdecat
Copy link
Contributor

pdecat commented Sep 25, 2023

Can't you see what the location header was set to in the 301 response?

Edit: too bad AWS do not return the standard Location header here.

Copy link

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale No recent activity has been detected on this issue/PR and it will be closed label Nov 24, 2023
@ParthaI ParthaI removed the stale No recent activity has been detected on this issue/PR and it will be closed label Nov 24, 2023
@Veetaha
Copy link

Veetaha commented Dec 21, 2023

Using HeadBucket instead makes sense, because GetBucketLocation can return a special string EU. See it specified in the API docs of this method:
image

There is no such region as "EU", so steampipe AWS plugin right now has this crutch:

// Buckets in eu-west-1 created through the AWS CLI or other API driven methods can return a location of "EU",
// so we need to convert back
if location.LocationConstraint == "EU" {
return &s3.GetBucketLocationOutput{
LocationConstraint: "eu-west-1",
}, nil
}
return location, nil

I'm not sure how correct this is. I couldn't find any docs that actually say that EU == eu-west-1, but it looks like it practically is?

@pdecat
Copy link
Contributor

pdecat commented Dec 21, 2023

I'm not sure how correct this is. I couldn't find any docs that actually say that EU == eu-west-1, but it looks like it practically is?

It sort of does EverythingMe/ncdu-s3#4 🙂

@e-gineer e-gineer assigned pdecat and unassigned ParthaI Feb 19, 2024
@e-gineer
Copy link
Contributor Author

FYI - work done here may also help with #1713

@pdecat
Copy link
Contributor

pdecat commented Feb 20, 2024

Hi, FYI I'm currently working on implementing this, and investigating the best options.

Took a quick look at using HeadBucket operation - however I kept receiving a 301 error.

Indeed, the 301 happens when invoking HeadBucket on an endpoint region that does not match the bucket region, the that endpoint does not include the standard HTTP Location header.

When using the aws s3api head-bucket command, the AWS CLI extracts the bucket region from the x-amz-bucket-region header (documented here) and retries the request on that region:

AWS_DEFAULT_REGION=eu-west-1 aws --debug s3api head-bucket --bucket commoncrawl |& grep -E "urllib3.connectionpool|x-amz-bucket"
2024-02-20 13:24:16,989 - MainThread - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): commoncrawl.s3.eu-west-1.amazonaws.com:443
2024-02-20 13:24:17,116 - MainThread - urllib3.connectionpool - DEBUG - https://commoncrawl.s3.eu-west-1.amazonaws.com:443 "HEAD / HTTP/1.1" 301 0
2024-02-20 13:24:17,116 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-bucket-region': 'us-east-1', 'x-amz-request-id': '4AD4FCR5BJ7N91FD', 'x-amz-id-2': 'wmDu9nX5k6L818sp+cEYM31xBFKlhrotqRnbhwqhCqR2YWqRb3ghB/IbWeI4cwBamCt16/+0Y1U=', 'Content-Type': 'application/xml', 'Date': 'Tue, 20 Feb 2024 12:24:16 GMT', 'Server': 'AmazonS3'}
2024-02-20 13:24:17,121 - MainThread - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): commoncrawl.s3.us-east-1.amazonaws.com:443
2024-02-20 13:24:17,556 - MainThread - urllib3.connectionpool - DEBUG - https://commoncrawl.s3.us-east-1.amazonaws.com:443 "HEAD / HTTP/1.1" 200 0
2024-02-20 13:24:17,557 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-id-2': 'T8ZAgIx9faNGXQPOE+XMdhWt4ZWa6tSNZyCETNBbfHVJhBUK1buLQSh0+8+49xkolCdct52VT74=', 'x-amz-request-id': '4AD5FT0YS5TV2HER', 'Date': 'Tue, 20 Feb 2024 12:24:18 GMT', 'x-amz-bucket-region': 'us-east-1', 'x-amz-access-point-alias': 'false', 'Content-Type': 'application/xml', 'Server': 'AmazonS3'}

With the AWS SDK for Go v2, it is currently not possible to retry that call as the SDK does not expose the header in case of 301 HTTP responses. And adding that feature has been rejected by the development team.

As we only want the region information, there's no need to retry that call, making simple unauthenticated requests is enough to get the desired header:

# curl -HEAD -i https://commoncrawl.s3.amazonaws.com
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-1
x-amz-request-id: ****
x-amz-id-2: ****
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 20 Feb 2024 12:33:07 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>****</RequestId><HostId>****</HostId></Error>[

Even with path style usage:

# curl -HEAD -i https://s3.amazonaws.com/commoncrawl
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-1
x-amz-request-id: ****
x-amz-id-2: ****
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 20 Feb 2024 12:46:36 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>****</RequestId><HostId>****</HostId></Error>

See aws/aws-sdk-go#356 (comment)
and aws/aws-sdk-go#720 (comment)

Right now, I believe the way to go is to perform such plain HTTP HEAD requests to get the header value.

@Veetaha
Copy link

Veetaha commented Feb 20, 2024

making simple unauthenticated requests is enough to get the desired header

That is surprising... Also, I'm not sure how much quota AWS provides for the rate of unauthenticated requests. It's probably lower? And the question is wether the response still contains the bucket region header if a RateExceeded or Throttling error is returned

@pdecat
Copy link
Contributor

pdecat commented Feb 20, 2024

There's caching in the steampipe query data connection manager to avoid requesting the region to often for a single bucket. I've not faced throttling on that HEAD method yet, will perform more load testing ASAP to see how it behaves.

FWIW, I'm almost done before submitting my PR.

@pdecat
Copy link
Contributor

pdecat commented Feb 20, 2024

Submitted #2082, PTAL 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
7 participants