How does the example cron based backup job work? #20
Comments
You're right that manually calling strata will run in the foreground and not return control to the bash console until it is finished. The concern is that for a given replica ID, only one write-capable strata operation should run at once. Suppose you put 3 sequential strata calls in a bash script, and then execute that script once every two hours. What if the script takes longer than 2 hours to finish? Then your first invocation would still be running when your second invocation starts, and you would risk running two write-capable strata operations at once. This is the problem that I tried to address in the example with start-stop-daemon. But you might be right about start-stop-daemon's behavior; I'm not very familiar with it, and we do not actually use it to run strata.
At Parse we actually ended up using a Python wrapper to chain the commands together, with a file lock to protect against multiple backups running at once. Here is the code for reference; maybe it can help you.

def _do_strata_backup(replicaset, hostname, bucket_name="mybucket",
                      region="us-east-1", bucket_prefix="mongo-rocks"):
    # get flock
    if not helper.flock(BACKUP_FLOCK_PATH):
        error("Failed to obtain lock on %s. Is there another backup running?"
              % BACKUP_FLOCK_PATH)
        return 1
    # kick off strata
    # backup
    strata_cmd = "/usr/bin/strata " \
                 "--bucket=" + bucket_name + " " \
                 "--region=" + region + " " \
                 "--bucket-prefix=" + bucket_prefix + " " \
                 "backup " \
                 "--replica-id=" + replicaset + "_" + hostname
    return_code = helper.run_shell_cmd(strata_cmd)
    # only do metadata cleanup if the backup succeeded
    if return_code == 0:
        delete_cmd = "/usr/bin/strata " \
                     "--bucket=" + bucket_name + " " \
                     "--region=" + region + " " \
                     "--bucket-prefix=" + bucket_prefix + " " \
                     "delete " \
                     "--replica-id=" + replicaset + "_" + hostname + " " \
                     "--age=" + ROCKS_BACKUP_RETENTION
        delete_code = helper.run_shell_cmd(delete_cmd)
        # only run gc if metadata delete succeeded
        if delete_code == 0:
            gc_cmd = "/usr/bin/strata " \
                     "--bucket=" + bucket_name + " " \
                     "--region=" + region + " " \
                     "--bucket-prefix=" + bucket_prefix + " " \
                     "gc " \
                     "--replica-id=" + replicaset + "_" + hostname + " "
            gc_code = helper.run_shell_cmd(gc_cmd)
            if gc_code != 0:
                warn("Got error code %d when running gc" % gc_code)
        else:
            warn("Got error code %d when running metadata cleanup" %
                 delete_code)
    # clear flock
    if not helper.unflock(BACKUP_FLOCK_PATH):
        warn("Failed to cleanly release lock on %s" % BACKUP_FLOCK_PATH)
    return return_code

run_shell_cmd is just a wrapper around subprocess.Popen:

def run_shell_cmd(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT):
    args = shlex.split(command)
    proc = subprocess.Popen(args, stdout=stdout, stderr=stderr)
    # read until EOF so that output written just before exit is not dropped
    for cmd_out in proc.stdout:
        info(cmd_out.rstrip())
    # wait for the process to exit so that returncode is set
    proc.wait()
    return proc.returncode

and flock:

def flock(path):
    try:
        # f is not closed here, so the lock stays held until it is
        # released or the process exits
        f = os.open(path, os.O_CREAT)
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        return False
    return True
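Since helper.unflock is not shown above, here is a minimal sketch of one way the lock and its release could be kept together in a single place. It assumes a Unix-like host (for fcntl). The name backup_lock and the example lock path are hypothetical and are not part of the original helper module.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def backup_lock(path):
    """Yield True if an exclusive, non-blocking flock on `path` was obtained.

    Hypothetical helper for illustration; not the original
    helper.flock / helper.unflock pair.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            locked = True
        except OSError:
            # Another open file description already holds the lock.
            locked = False
        try:
            yield locked
        finally:
            if locked:
                fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


# Example use (the path is an assumption):
#
# with backup_lock("/var/run/strata-backup.lock") as locked:
#     if locked:
#         ...  # run backup, delete, gc
```

Because the lock is tied to the open file descriptor, the kernel also drops it if the process dies, so a crashed backup should not leave a stale lock behind.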
@AGFeldman @tredman glad to have both of you back! :)
Hey @AGFeldman @tredman Thanks for the quick response.
Yes, I see now why start-stop-daemon was used. Unfortunately it is not available for Red Hat 6 / Amazon Linux. I did find some start-stop-daemon RPMs people have built for Red Hat. But they're a black box and I can't be completely sure what they do. I have a picture in my head of me living under a bridge and having to explain to my children how I lost my job because I downloaded something off the internet and put it on a production database server. The at command seems to have a queue feature. But when I tested it out, it didn't behave anything like a queue. False advertising, hours wasted.
Thanks for supplying me with your Python code. I don't want to sound ungrateful, but I probably won't use it because it is yet another language for me to learn. At the moment I think the best way might be to write a bash script which creates a PID file, and cron that script at regular intervals. If the process is still running from the last scheduled run, the current one will simply exit, similar to how the start-stop-daemon based script was intended to work. Thanks again for openly sharing all your wisdom. Without people like you guys, there probably would be no internet and people like me probably wouldn't have jobs. Thanks.
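The PID-file idea described above can be sketched roughly as follows. It is written in Python only to match the code earlier in this thread; a bash script would follow the same steps. All function names here are hypothetical, and this is an illustration under those assumptions rather than a tested production script.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def acquire_pid_file(path):
    """Return True if this run may proceed; write our PID to `path` if so."""
    try:
        with open(path) as f:
            old_pid = int(f.read().strip())
        if pid_is_running(old_pid):
            return False
    except (FileNotFoundError, ValueError):
        pass  # no PID file, or its contents are not a PID: treat as stale
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    return True


def release_pid_file(path):
    """Remove the PID file at the end of a run."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

Note that the check and the write are two separate steps, so two runs starting at almost the same moment could both proceed, and a recycled PID can make a stale file look live. With cron intervals measured in hours this is unlikely to matter, but it is the main reason a kernel-level lock such as flock tends to be more robust.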
there there...
Hey Guys
Neat project. So many awe-inspiring open source projects on the internet.
I have one question on the example cron based backup job detailed here:
https://github.com/facebookgo/rocks-strata/blob/master/examples/backup/run.sh
How would this backup job work? I'm not a Unix expert, but according to the start-stop-daemon documentation I found, it will prevent a call from starting a new process if a process with the same name is already running. Is that right?
The way I understand it, because it's running the process as a daemon, wouldn't the call to "backup" return immediately and the run.sh script would attempt to run the 2nd line "delete" straight afterwards while "backup" is still running?
So does that mean if "backup" takes more than a few milliseconds to run, "delete" would never actually get run?
I am only questioning the use of start-stop-daemon because I need to find an alternative: it isn't available in Amazon Linux. To achieve the purpose of avoiding multiple concurrent writes to the same file in S3, wouldn't it be fine to do the 3 strata calls sequentially, one after the other, in run.sh? When I tested the strata calls manually, each one appeared to run in the foreground and didn't return control to the bash console until it was finished. Would this work?
Thanks