This repository has been archived by the owner on Jun 12, 2018. It is now read-only.
Haven't had time to look into this, but I'm logging it here for future reference. In the last week we have had about a dozen cases where we found the strata binary hung for over 24 h (a normal run takes under 5 min on average). In the example below the hang occurred during garbage collection, but I have also seen it happen with backups.
lsof shows one established connection to AWS S3:
$ sudo lsof -p 24099
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF     NODE NAME
strata  24099 root  cwd    DIR    202,1     4096    16386 /root
strata  24099 root  rtd    DIR    202,1     4096        2 /
strata  24099 root  txt    REG    202,1  7663776    34085 /usr/bin/strata
strata  24099 root  mem    REG    202,1  1807032   395277 /lib/x86_64-linux-gnu/libc-2.15.so
strata  24099 root  mem    REG    202,1   135366   395280 /lib/x86_64-linux-gnu/libpthread-2.15.so
strata  24099 root  mem    REG    202,1   149280   395283 /lib/x86_64-linux-gnu/ld-2.15.so
strata  24099 root   0r   FIFO      0,8      0t0 17782110 pipe
strata  24099 root   1w   FIFO      0,8      0t0 17794429 pipe
strata  24099 root   2w   FIFO      0,8      0t0 17794429 pipe
strata  24099 root   3r    CHR      1,9      0t0     3081 /dev/urandom
strata  24099 root   4u    CHR      1,3      0t0     3076 /dev/null
strata  24099 root   5u   IPv4 17821048      0t0      TCP ip-10-252-0-135.ec2.internal:51681->s3-1.amazonaws.com:https (ESTABLISHED)
strata  24099 root   6u   0000      0,9        0    11491 anon_inode
strata  24099 root   9w   FIFO      0,8      0t0 17780115 pipe
strata  24099 root  11r    REG    202,1        0     6639 /tmp/mtools_backup.lock
strata  24099 root  63w   FIFO      0,8      0t0 17780115 pipe
netstat confirms this:
$ netstat -ant | grep 51681
tcp        0      0 10.252.0.135:51681      52.216.192.3:443        ESTABLISHED
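netstat still reporting ESTABLISHED is roughly what you'd expect if nothing is probing the socket. Assuming the connection is idle from the kernel's point of view (no unacknowledged data in flight, which fits the empty tcpdump capture), Linux only notices a silent peer if keepalive (SO_KEEPALIVE) is enabled on that socket, and even then only after the keepalive timers run out. A minimal bash sketch that computes that worst case from the system-wide sysctls; `keepalive_detect_seconds` is a hypothetical helper, and per-socket options can override these defaults:

```shell
#!/usr/bin/env bash
# Sketch: estimate how long Linux keepalive takes to give up on an idle,
# dead peer. This only applies if the application enabled SO_KEEPALIVE on
# the socket; without it the kernel never probes an idle connection.

# keepalive_detect_seconds TIME INTVL PROBES (hypothetical helper)
# Idle time before the first probe, plus one interval per unanswered probe.
keepalive_detect_seconds() {
  echo $(( $1 + $2 * $3 ))
}

# Read the system-wide defaults; per-socket TCP_KEEPIDLE/TCP_KEEPINTVL/
# TCP_KEEPCNT settings, if the application sets them, take precedence.
sysctl_dir=/proc/sys/net/ipv4
if [ -r "$sysctl_dir/tcp_keepalive_time" ]; then
  t=$(cat "$sysctl_dir/tcp_keepalive_time")
  i=$(cat "$sysctl_dir/tcp_keepalive_intvl")
  p=$(cat "$sysctl_dir/tcp_keepalive_probes")
  echo "keepalive would give up after ~$(keepalive_detect_seconds "$t" "$i" "$p")s"
fi
```

With the Linux defaults (7200 s, 75 s, 9 probes) this comes to 7875 s, about 2 h 11 min, far short of the 24 h hangs above. That is consistent with keepalive not being enabled on this socket, though it doesn't prove it; `ss -tno` shows a `timer:(keepalive,...)` field on sockets where the keepalive timer is armed, which would settle it.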
But tcpdump doesn't show any outgoing traffic from this port:
$ sudo tcpdump -i eth0 -n tcp src port 51681
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
The connection is clearly dead, but neither the process nor the kernel has noticed. In any case, the workaround is to kill the process with SIGTERM (kill -15) and let the next backup run as scheduled; any missed files will be picked up on that run.
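Until the root cause is found, the kill -15 workaround could be automated. A minimal sketch, assuming GNU coreutils `timeout` is available on the host; `run_with_watchdog` is a hypothetical helper, not part of strata:

```shell
#!/usr/bin/env bash
# Sketch: run a job under GNU coreutils `timeout`, which sends SIGTERM
# (the same signal as kill -15) to the command once the limit expires.

# run_with_watchdog LIMIT CMD [ARGS...]  (hypothetical helper)
# LIMIT uses timeout's duration syntax: 30 (seconds), 30m, 2h, ...
run_with_watchdog() {
  local limit=$1
  shift
  local status=0
  # "|| status=$?" captures the exit code without tripping `set -e`.
  timeout --signal=TERM "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # 124 is timeout's exit status when the time limit was reached.
    echo "watchdog: '$*' exceeded $limit and was sent SIGTERM" >&2
  fi
  return "$status"
}
```

For example, the scheduled job could call `run_with_watchdog 2h /usr/bin/strata ...`, keeping its existing arguments in place of `...`. The 2 h limit is an arbitrary choice, generous next to the usual < 5 min run time; a run that hits it exits with status 124, and the next scheduled run picks up the missed files.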