Network errors impacting mason downloads #516

Open
springmeyer opened this issue Nov 20, 2017 · 16 comments

Comments

@springmeyer
Contributor

I feel like I've been seeing an increased number of network failures when fetching binaries from S3 over the last month or more. This ticket is for tracking them, so we can start assembling a fuller picture of the failures and see if there is a pattern.

@springmeyer
Contributor Author

Failed to download https://mason-binaries.s3.amazonaws.com/osx-x86_64/android-ndk/arm-9-r13b.tar.gz (returncode: 56) on OS X travis build: https://travis-ci.org/mapbox/mason/jobs/304807664#L1357

@springmeyer
Contributor Author

/cc @mapsam who mentioned seeing multiple/repeated clang++ download failures. @mapsam was this on OS X or within docker?

@mapsam
Member

mapsam commented Nov 20, 2017

@springmeyer I was on OSX and saw hangs with clang++ when using the following curl command:

curl -sSfL https://s3.amazonaws.com/mason-binaries/osx-x86_64/clang++/5.0.0.tar.gz | tar --gunzip --extract --strip-components=1

The connection to the file is made relatively quickly, but the 30MB download takes much longer than other 30MB files.

@springmeyer
Contributor Author

@mapsam, okay thanks for the details. After 1727795 mason will now output the exact returncode on error. This is what is producing the:

(returncode: 56)

above, in the error I saw on @artemp's commit where the Android NDK failed to download. Let's keep an eye on whether we always see 56 (CURLE_RECV_ERROR) or whether we see other errors reported by curl.
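To make the numeric return codes easier to read when tracking failures, here is a minimal sketch of a lookup helper. The function name `curl_exit_name` is hypothetical and not part of mason. The names come from libcurl's error code list, only a few codes are covered, and the "killed by signal" branch relies on the usual shell convention that a status above 128 means the process died from signal (status - 128).

```shell
# Hypothetical helper (not part of mason): translate a curl exit status
# into the symbolic name from libcurl's error list. Only a few codes
# are covered here; everything else is reported as unknown.
curl_exit_name() {
  case "$1" in
    0)  echo "CURLE_OK" ;;
    6)  echo "CURLE_COULDNT_RESOLVE_HOST" ;;
    7)  echo "CURLE_COULDNT_CONNECT" ;;
    18) echo "CURLE_PARTIAL_FILE" ;;
    22) echo "CURLE_HTTP_RETURNED_ERROR" ;;
    28) echo "CURLE_OPERATION_TIMEDOUT" ;;
    35) echo "CURLE_SSL_CONNECT_ERROR" ;;
    52) echo "CURLE_GOT_NOTHING" ;;
    56) echo "CURLE_RECV_ERROR" ;;
    *)
      # Shell convention: status > 128 means "terminated by signal N - 128".
      # The redirect hides the error for non-numeric input.
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $(( $1 - 128 ))"
      else
        echo "unknown ($1)"
      fi
      ;;
  esac
}
```

For example, `curl_exit_name 56` prints `CURLE_RECV_ERROR`.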

@springmeyer
Contributor Author

Not an s3 issue, but noting nonetheless that I also just hit this on an OS X travis job:

$ ./mason build ${MASON_NAME} ${MASON_VERSION}
Cloning into '/Users/travis/build/mapbox/mason/mason_packages/.build/mapnik-vf02a25901'...
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

Which looks like a git clone failing in curl, also with 56 as error.

@springmeyer
Contributor Author

Now seeing:

* Downloading binary package https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz
Failed to download https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz (returncode: 35)

https://travis-ci.org/mapbox/mason/jobs/305567441#L495

@springmeyer
Contributor Author

ugh, also just hit:

oci runtime error: exec failed: container_linux.go:265: starting container process caused "could not create session key: disk quota exceeded"

https://travis-ci.org/mapbox/mason/jobs/305567199

@springmeyer
Contributor Author

hrm:

./scripts/clang-format.sh
Downloading https://s3.amazonaws.com/mason-binaries/linux-x86_64/clang++/5.0.0.tar.gz
curl: (22) The requested URL returned error: 429 Too Many Requests
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
make: *** [format] Error 2

https://travis-ci.org/mapbox/node-cpp-skel/jobs/305550318#L552
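In the log above, curl's failure is followed by gzip and tar errors because the download is piped straight into the extractor. One way to keep the real cause visible is to download to a temporary file and extract only if the download succeeded. This is a sketch under assumptions, not what `clang-format.sh` actually does: the function names and the `DOWNLOADER` override hook are hypothetical.

```shell
# Hypothetical default downloader: plain curl, failing on HTTP errors.
curl_download() {
  curl -sSfL "$1" -o "$2"
}

# Hypothetical helper: fetch <url> into a temp file and extract it into
# <dest> only when the download command exits with status 0.
# DOWNLOADER is an assumed hook that lets callers swap in another command.
fetch_then_extract() {
  local url="$1"
  local dest="$2"
  local tmp
  local status=0
  tmp="$(mktemp)" || return 1

  "${DOWNLOADER:-curl_download}" "$url" "$tmp" || status=$?
  if [ "$status" -ne 0 ]; then
    # Report the downloader's own exit status instead of a tar/gzip error.
    echo "Failed to download $url (returncode: $status)" >&2
    rm -f "$tmp"
    return "$status"
  fi

  # Extract only after a complete, successful download.
  mkdir -p "$dest" && tar -xzf "$tmp" -C "$dest" --strip-components=1 || status=$?
  rm -f "$tmp"
  return "$status"
}
```

With this shape, a 429 or a connection reset surfaces as a single download error with curl's return code, rather than as "unexpected end of file" from gzip.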

@springmeyer
Contributor Author

/cc @rclark, whom I spoke with about this a few weeks ago. @rclark - S3 downloads from the mason bucket appear to be degrading, and the problem is worsening. Any ideas of things to test or try to get to the bottom of why this is happening?

@rclark
Contributor

rclark commented Nov 22, 2017

Do you have any way to observe the S3 connections or S3 errors more directly? All the error codes you've got here appear to be from downstream applications that are perhaps reacting to S3 networking failures. But even the 429 isn't an S3 response code -- they give you a 503 if they want you to SlowDown.

@springmeyer
Contributor Author

We are using curl on the command line to download the binary .tar.gz files from s3:

mason/mason.sh

Lines 533 to 544 in 2602c30

mason_step "Downloading binary package ${FULL_URL}"
local CURL_RESULT=0
local HTTP_RETURN=0
HTTP_RETURN=$(curl -w "%{http_code}" --retry 3 ${MASON_CURL_ARGS} -f -L ${FULL_URL} -o "${MASON_BINARIES_PATH}.tmp") || CURL_RESULT=$?
if [[ ${CURL_RESULT} != 0 ]]; then
    if [[ ${HTTP_RETURN} == "403" ]]; then
        mason_step "Binary not available for ${FULL_URL}"
    else
        mason_error "Failed to download ${FULL_URL} (returncode: ${CURL_RESULT})"
        exit $CURL_RESULT
    fi
else

In 1727795 I modified things to actually try to print the http error code.

> But even the 429 isn't an S3 response code -- they give you a 503 if they want you to SlowDown.

That one (The requested URL returned error: 429 Too Many Requests) struck me as well - that looks to be coming from the curl code itself rather than the bash output logic I added.
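The download code shown in this comment relies on curl's own `--retry`, which only applies to errors that curl classifies as transient. If retrying on any non-zero exit is wanted, an outer loop is one option. The following is a minimal sketch under that assumption: `retry_with_backoff` is a hypothetical helper, not something in mason, and the attempt count and delay are illustrative.

```shell
# Hypothetical helper (not part of mason): run a command up to <attempts>
# times, doubling the pause (in whole seconds) between tries.
# Usage: retry_with_backoff <attempts> <initial_delay_seconds> <command...>
# Returns 0 on the first success, otherwise the last attempt's exit status.
retry_with_backoff() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local n=1
  local status
  while :; do
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      return "$status"
    fi
    echo "attempt $n failed (returncode: $status), retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    n=$(( n + 1 ))
  done
}
```

A call might look like `retry_with_backoff 3 2 curl -f -L "${FULL_URL}" -o "${MASON_BINARIES_PATH}.tmp"`. Logging each attempt's return code also gives this ticket more data on which curl errors actually occur.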

@rclark
Contributor

rclark commented Nov 22, 2017

I think I'd have to take it to AWS support. You might try to check for x-amz headers in the HTTP response to see if S3 is trying to tell you anything there.
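A sketch of how the x-amz headers could be captured, assuming curl's `-D` (dump headers to a file) is used alongside the normal download. The function names are hypothetical, and whether S3 returns anything useful in these headers for the failing requests is not known here.

```shell
# Print only the x-amz-* lines from a saved header dump, with the
# trailing carriage returns from the HTTP response stripped.
amz_headers() {
  grep -i '^x-amz-' "$1" | tr -d '\r'
}

# Hypothetical wrapper: download <url> to <out>, save the response headers
# to a temp file, and print any x-amz-* headers to stderr afterwards.
# Returns curl's exit status.
download_with_amz_headers() {
  local url="$1"
  local out="$2"
  local hdrs
  local status=0
  hdrs="$(mktemp)" || return 1
  curl -sS -f -L -D "$hdrs" "$url" -o "$out" || status=$?
  amz_headers "$hdrs" >&2
  rm -f "$hdrs"
  return "$status"
}
```

Headers such as `x-amz-request-id` and `x-amz-id-2` are the identifiers AWS support typically asks for when investigating a specific request.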

@springmeyer
Contributor Author

Thanks @rclark - signing off for the holiday now. I will add -v to dump the headers next time I see persistent errors.

@springmeyer
Contributor Author

Another one, which looks related only to the Travis network, since the upstream is not AWS. I probably won't post more of this kind, to avoid adding noise to this ticket, but I'm posting this one since I haven't seen it before:

* Downloading http://nongnu.askapache.com/freetype/freetype-2.5.5.tar.bz2...
curl: (56) Recv failure: Connection reset by peer
Failed to download http://nongnu.askapache.com/freetype/freetype-2.5.5.tar.bz2 (returncode: 56)

https://travis-ci.org/mapbox/mason/jobs/308033127#L1784

@springmeyer
Contributor Author

CMake Error at cmake/mason.cmake:103 (message):
  [Mason] Failed to download
  https://mason-binaries.s3.amazonaws.com/headers/rapidjson/1.1.0.tar.gz:
  curl: (35) gnutls_handshake() failed: Error in the pull function.

https://circleci.com/gh/mapbox/mapbox-gl-native/88893?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

@sssoleileraaa
Contributor

sssoleileraaa commented Apr 24, 2018

Error message in Travis when trying to download recently published LLVM 6.0.0 binaries:

Failed to download https://mason-binaries.s3.amazonaws.com/linux-x86_64/android-ndk/arm-14-r16b.tar.gz (returncode: 141)

Note: (returncode: 141)
