Question: How to calculate a value in before-run which is available afterwards for the restic commands #299

Open
darkmattercoder opened this issue Jan 15, 2024 · 2 comments
@darkmattercoder

darkmattercoder commented Jan 15, 2024

I am having a hard time getting my head around how the environment is made available during resticprofile runs, and how I can manipulate it to inject values into variables that need to be present during restic execution (or at parsing time).

Ultimately, the problem to solve is this: I want a check profile that runs read-data-subset: n/m, where m can be a static value but n should count upwards based on the day of the year and reset to 1 after it reaches m.

I first built a complicated setup with multiple profiles that use {{ (.Now.AddDate 0 0 -(m-1)).YearDay }}, scheduled with multiple OnCalendar directives (which kind of worked). But that requires a lot of code duplication, which, given my current knowledge, I am not able to deduplicate using the template language. It ends up needing, for example, at least 7 profile entries if m happens to be 50. Dealing with leap years is also really complicated and makes everything even messier.

I can easily calculate the value of n externally using a script or binary. I only need that value to be available in the read-data-subset: setting.
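For illustration, the day-of-year variant of that external calculation could look like the sketch below. The helper name portion_for_day is made up for this example and is not part of restic or resticprofile. Note that a day-of-year count restarts on January 1, so the cycle is cut short at the year boundary unless the number of days in the year happens to be a multiple of m.

```shell
# Map a day of the year (1..366) to a portion number n in the range 1..m.
# portion_for_day is a hypothetical helper, not a restic/resticprofile command.
# The body runs in a subshell ( ... ) so its variables do not leak.
portion_for_day() (
    day_of_year=$1
    m=$2
    # "date +%j" pads with zeros (e.g. "016"); strip them so that shell
    # arithmetic does not interpret the number as octal.
    while [ "${day_of_year#0}" != "$day_of_year" ]; do
        day_of_year=${day_of_year#0}
    done
    echo $(( (day_of_year - 1) % m + 1 ))
)

# Today's portion for m=50
portion_for_day "$(date +%j)" 50
```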

I am not even able to simulate it by hardcoding an env: variable. On the actual check command line, it is always empty.

I am a bit lost at this point. Any ideas how to proceed?

P.S.:

This is the complicated approach I described above, which kind of works:

read-data-daily-1:
  inherit: dummy
  run-before: "curl -m 10 --retry 5 {{ $healthchecks_ping_url }}/slug/start?create=1"
  run-finally: "if [ -z $ERROR_EXIT_CODE ];then ERROR_EXIT_CODE=\"0\";fi;curl -m 10 --retry 5 \"{{ $healthchecks_ping_url }}/slug/$ERROR_EXIT_CODE\""
  check:
    read-data-subset: {{ .Now.YearDay }}/50
    # first 50 days of the year
    schedule: 
      - '*-1-* 02:00:00'
      - '*-2-1..19 02:00:00'
    schedule-permission: user
    schedule-priority: background
    schedule-lock-mode: default
    schedule-lock-wait: 1h

And this was an approach where I experimented with a hardcoded env: config, which did not expand the variable on the actual check execution:

read-data-daily:
  inherit: read-data-daily-1
  env:
    EXAMPLE_PORTION: "100"
  run-before:
    # Is correctly expanded at runtime
    - "echo \"Portion: $EXAMPLE_PORTION\""
  check:
    # is empty
    read-data-subset: ${EXAMPLE_PORTION}/255
    schedule: 
      - 'daily' 
@darkmattercoder
Author

After more research, I doubt that values in the configuration can be calculated dynamically at all, other than through the built-in mechanisms that the templates provide.

So I came up with a workaround. I use run-before to collect the parameters for read-data-subset and execute the restic command manually there. That way, scheduling and using resticprofile as the wrapper still work. However, the check command without arguments is still executed after the run-before section has finished. Since that plain check is far less time-consuming, the advantages outweigh the disadvantages, in my opinion. To speed it up even more, I explicitly use with-cache: true for it.

In case anyone else is interested, here is how I implemented the workaround:

# Profile to perform a full read-data consistency check on <repo>. Data is read in subsets. We split the data into m subsets so that subset n of m is read each day.
# That means that after m days we have visited most of the repo (of course, data that was added in the meantime might only be checked in the next cycle of m days).
# Since there seems to be no way to natively use a dynamic value for the read-data-subset parameter as of now (2024-01-16), we have to use a hack where we run the entire restic
# command manually in `run-before` with dynamically calculated values. Resticprofile will then additionally call a regular check afterwards. This is unavoidable when we want to use
# the resticprofile API as is. We can try to keep the execution time of that quick check as low as possible by using the `--with-cache` option for it. We should switch to a more
# native approach as soon as a way to add dynamic values for the `read-data-subset` parameter becomes available. See https://github.com/creativeprojects/resticprofile/issues/299
read-data-daily:
  inherit: dummy
  env:
    MAX_DATA_PORTIONS: 30
    BIN_DIR: {{ .CurrentDir }}/../../../bin
    RESTIC_HOST_DIR: {{ .CurrentDir }}/../../../hosts/<host>
  run-before: 
    - curl -m 10 --retry 5 {{ $healthchecks_ping_url }}/slug/start?create=1
    - echo "Starting manual restic check";
      echo "";
      source $RESTIC_HOST_DIR/resticenv.sh;
      PORTION_OF_THE_DAY=$($BIN_DIR/calculate-portion.sh $MAX_DATA_PORTIONS);
      restic check --read-data-subset=$PORTION_OF_THE_DAY/$MAX_DATA_PORTIONS;
      unset RESTIC_REPOSITORY;
      unset RESTIC_PASSWORD;
      echo "proceeding with resticprofile check command";
      echo ""
  run-finally: "if [ -z $ERROR_EXIT_CODE ];then ERROR_EXIT_CODE=\"0\";fi;curl -m 10 --retry 5 \"{{ $healthchecks_ping_url }}/slug/$ERROR_EXIT_CODE\""
  check:
    with-cache: true
    schedule: '*-*-* 01:00:00'
    schedule-permission: user
    schedule-priority: background
    schedule-lock-mode: default
    schedule-lock-wait: 1h

And here is the script calculate-portion.sh, which calculates n for a given m:

#!/bin/bash

print_usage_and_exit(){
    echo "Usage: $0 <m> [<o>] [<YYYY-MM-DD>]"
    echo "This script calculates a specific value <n> on each different day it is called."
    echo "<n> will always be a positive integer in the range [1,<m>]."
    echo "<m> is passed as a mandatory argument. It has to be a positive integer."
    echo "<o> is an optional, non-positional argument: an integer offset in days that is added to the day the calculation is done for."
    echo "<YYYY-MM-DD> is an optional, non-positional date argument for which <n> should be calculated. If not given, the script assumes the current date for the calculation."
    echo "For each call of the script on subsequent <YYYY-MM-DD>, as long as the offset and the value of <m> are not changed, the resulting value of <n> is"
    echo "incremented by one, so that <n> cycles from 1 to <m> on calls on subsequent days. It starts over with 1 the day after the day on which it equals <m>."
    exit $1
}

# echo "Your input for <o> was \"$1\". <o> must be an integral value!"
validate_offset(){
	if ! [[ $1 =~ ^-?[0-9]+$ ]];then
		return 1
	fi
	return 0
}

validate_date(){
	if [[ ! "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
		return 1
	fi
	if ! date -d "$1" >/dev/null 2>&1; then
			return 1
	fi
	return 0
}

# Check if at least one argument <m> is provided
if [ $# -lt 1 ]; then
	echo "At least one argument for <m> has to be given"
	echo ""
    print_usage_and_exit 1
fi

# Check if at most 3 arguments are provided
if [[ $# -gt 3 ]]; then
	echo "Too many arguments given. Expected 3 at most, got $#"
	echo ""
	print_usage_and_exit 1
fi

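# Extract the upper limit "m" from the arguments and test it for being a positive integer.
# Zero and leading zeros are rejected: 0 would cause a division by zero below, and bash arithmetic reads e.g. 030 as octal.
m=$1
if ! [[ $m =~ ^[1-9][0-9]*$ ]]; then
	echo "Your input for <m> was \"$m\". <m> must be a positive integral value without leading zeros!"
	echo ""
	print_usage_and_exit 1
fi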

# Defaults
offset_days=0
input_date=$(date +%Y-%m-%d)

# If a second argument is provided, check whether it is an offset or a date
if [ $# -ge 2 ]; then
	if validate_offset "$2"; then
		offset_days=$2
	elif validate_date "$2"; then
		input_date=$2
	else
		echo "Could neither get a valid offset value nor a valid date from argument \"$2\". Offsets have to be integers and dates have to be passed in the form of \"YYYY-MM-DD\""
		echo ""
		print_usage_and_exit 1
	fi
fi

# If a third argument is provided, check whether it is an offset or a date
if [ $# -eq 3 ]; then
	if ! validate_offset "$3" && ! validate_date "$3"; then
		echo "Could neither get a valid offset value nor a valid date from argument \"$3\". Offsets have to be integers and dates have to be passed in the form of \"YYYY-MM-DD\""
		echo ""
		print_usage_and_exit 1
	fi
	if validate_offset "$3" && validate_date "$2"; then
		offset_days=$3
	elif validate_date "$3" && validate_offset "$2"; then
		input_date=$3
	else
		echo "It looks like you either provided two date arguments or two offset arguments. Please provide either type once. Offsets have to be integers and dates have to be passed in the form of \"YYYY-MM-DD\""
		echo ""
		print_usage_and_exit 1
	fi
fi

# Calculate days since the Unix epoch. Both timestamps are taken in UTC (-u) so that
# daylight saving time in the local time zone cannot shift the day count by one.
input_timestamp=$(date -u -d "$input_date" +%s)
seconds_since_epoch=$(( input_timestamp - $(date -u -d "1970-01-01" +%s) ))
days_since_epoch=$(( seconds_since_epoch / 86400 ))
days_since_epoch_with_offset=$(( days_since_epoch + offset_days ))

# Only positive results are allowed for the calculated days since the epoch when the offset is applied
if [[ $days_since_epoch_with_offset -lt 0 ]]; then
	echo "Invalid date or offset given. Use dates after and including 1970-01-01 + days offset value: $offset_days"
	print_usage_and_exit 1
fi

# Calculate the cyclic value between 1 and m
cyclic_value=$(( (1 + (days_since_epoch_with_offset % m)) ))

#echo "Days since the epoch for $input_date and an offset of $offset_days days: $days_since_epoch_with_offset"
#echo "Cyclic value (1 to $m) since the Unix epoch for $input_date: $cyclic_value"
echo "$cyclic_value"
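
Stripped of argument validation, the core of the script is a single modulo over the number of days since the Unix epoch. The condensed sketch below shows just that step. It assumes GNU date (for the -d option), takes its arguments in a fixed order, and does no input checking. The function name cyclic_portion is made up for this example. Counting days since the epoch instead of days of the year means the cycle continues across year boundaries and leap days without a jump.

```shell
# Condensed sketch of the calculation (assumes GNU date for "-d").
# Usage: cyclic_portion <m> [<YYYY-MM-DD>] [<offset-in-days>]
# cyclic_portion is a hypothetical name; no input validation is done here.
cyclic_portion() (
    m=$1
    day=${2:-$(date +%Y-%m-%d)}   # default: today's local calendar date
    offset=${3:-0}
    # Interpret the calendar date as midnight UTC so that every day is
    # exactly 86400 seconds long, regardless of daylight saving time.
    seconds=$(date -u -d "$day" +%s) || exit 1
    echo $(( (seconds / 86400 + offset) % m + 1 ))
)

# n for today with m=30
cyclic_portion 30
```

With m=30, the value runs 29, 30, 1 on 2024-01-16, 2024-01-17 and 2024-01-18.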

@creativeprojects
Owner

I do like this idea of using an environment variable inside any configuration value:

read-data-daily:
  inherit: read-data-daily-1
  env:
    EXAMPLE_PORTION: "100"
  run-before:
    # Is correctly expanded at runtime
    - "echo \"Portion: $EXAMPLE_PORTION\""
  check:
    # introducing new feature to make it work
    read-data-subset: ${EXAMPLE_PORTION}/255
    schedule: 
      - 'daily' 

That would remove a lot of the pain of using template variables, which can only be resolved while the configuration is compiled anyway.
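
To make the idea concrete, here is a rough shell sketch of that kind of late substitution: every ${NAME} reference in a configuration value is replaced with the current value of the variable just before the command is built. This is only an illustration of the concept, not how resticprofile implements (or would implement) it. The function name expand_env_refs is made up, and unset variables simply expand to an empty string.

```shell
# Replace each ${NAME} reference in a string with the value of NAME.
# Hypothetical illustration only, not resticprofile code.
expand_env_refs() (
    rest=$1
    out=
    open='${'
    close='}'
    while :; do
        case $rest in
            *"$open"*"$close"*)
                out=$out${rest%%"$open"*}   # keep the text before "${"
                rest=${rest#*"$open"}       # drop everything through "${"
                name=${rest%%"$close"*}     # variable name up to the next "}"
                rest=${rest#*"$close"}      # continue after that "}"
                case $name in
                    ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
                        # Not a valid variable name: keep the text as it was.
                        out=$out$open$name$close ;;
                    *)
                        # The name was validated above, so eval is safe here.
                        eval "value=\${$name-}"
                        out=$out$value ;;
                esac ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$out$rest"
)

EXAMPLE_PORTION=100
expand_env_refs '${EXAMPLE_PORTION}/255'
```

The last line prints 100/255, which is the value the check section in the example above would receive.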

Now we need to extract the environment variables that were set during a command.
I had an idea about that:

What if we wrap up any command inside a tiny shell script, like:

[command goes here, any `run-before`, `run-after` or calling `restic`]

# grab the exit code
exitCode=$?

# save environment variables
env > output_env.tmp

# return the exit code from the command
exit $exitCode

This way we can save the environment variables that have been set by the command.
It should be possible to use set for cmd.exe and Get-ChildItem ENV: in PowerShell.
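
A runnable version of that wrapper idea, for a POSIX shell only, might look like the sketch below. The function name run_and_capture_env and the way the environment file is passed in are assumptions made for this example, not existing resticprofile behaviour. Two limitations are worth noting: env only lists exported variables, and a command that calls exit itself skips the capture step.

```shell
# Run a command line in a child shell, dump that shell's environment to a
# file afterwards, and return the command's exit code.
# run_and_capture_env is a hypothetical helper, not part of resticprofile.
run_and_capture_env() (
    wrapped_command=$1
    env_file=$2
    # The child script is: the command, then the three wrapper steps.
    # "$1" inside the child script refers to the env file path.
    sh -c "$wrapped_command
exit_code=\$?
env > \"\$1\"
exit \$exit_code" sh "$env_file"
)

# Usage: the variable must be exported to show up in the captured environment.
env_file=$(mktemp)
run_and_capture_env 'export PORTION_OF_THE_DAY=12' "$env_file"
grep '^PORTION_OF_THE_DAY=' "$env_file" | cut -d= -f2-
rm -f "$env_file"
```

The caller can then read the file line by line and merge the NAME=value pairs into the environment used for the next command. Values that contain newlines would need a more robust format than plain env output.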

What do you think, @jkellerer?

@creativeprojects creativeprojects added this to the v0.27.0 milestone Apr 2, 2024