TD time step parameter #87

Open
baedan opened this issue Jun 19, 2022 · 0 comments

baedan commented Jun 19, 2022

Currently, the step parameter of multi-step TD is off by one (see JuliaReinforcementLearning/ReinforcementLearning.jl#648).

function run_once(n, α)
    env = StateTransformedEnv(
        RandomWalk1D(N=NS, actions=ACTIONS),
        state_mapping=GroupMapping(n=NS)
    )
    agent = Agent(
        policy=VBasedPolicy(
            learner=TDLearner(
                approximator=TabularVApproximator(;
                    n_state=n_groups+2,
                    opt=Descent(α)
                ),
                method=:SRS,
                n=n
            ),
            mapping=(env, V) -> rand(action_space(env))
        ),
        trajectory=VectorSARTTrajectory()
    )
    hook = RecordRMS()
    run(agent, env, StopAfterEpisode(10), hook)
    mean(hook.rms)
end

In the example above, n is intended to be the number of time steps. However, in TDLearner it currently corresponds to the number of time steps plus one. run_once(1, α) is therefore not TD(0), which has a time step parameter of 1, but rather a 2-step TD method. Depending on how the upstream issue is resolved, an update might be needed here.
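
To make the off-by-one concrete, here is a minimal sketch of an n-step TD target in plain Julia. It is not the TDLearner implementation: the function name n_step_target, its arguments, and the sample numbers are all hypothetical, chosen only to show how the target changes when n is read as "steps plus one".

```julia
# Hypothetical helper, not part of ReinforcementLearning.jl.
# rewards[k] is the reward received on the transition out of the k-th state;
# values[k] is the current value estimate of the k-th state
# (values has one more entry than rewards: the terminal state, with value 0).
function n_step_target(rewards, values, t, n; γ=1.0)
    T = length(rewards)              # number of transitions in the episode
    stop = min(t + n - 1, T)         # index of the last reward to include
    G = sum(γ^(k - t) * rewards[k] for k in t:stop)
    # Bootstrap from the state reached after n steps, unless it is terminal.
    if t + n <= T
        G += γ^n * values[t + n]
    end
    return G
end

# Made-up data for a 3-transition episode.
rewards = [0.0, 0.0, 1.0]
values  = [0.1, 0.2, 0.3, 0.0]

one_step = n_step_target(rewards, values, 1, 1)  # TD(0) target: r₁ + V(s₂) = 0.2
two_step = n_step_target(rewards, values, 1, 2)  # 2-step target: r₁ + r₂ + V(s₃) = 0.3
```

Under the behaviour described above, passing n=1 would produce the second target rather than the first.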
