22/03/2022
This repository is a collection of my personal solutions to the projects of the Udacity Deep Reinforcement Learning Nanodegree.
For this project, an agent was trained to navigate (and collect bananas!) in a large, square world. Its goal is to collect as many yellow bananas as possible in each episode while avoiding blue bananas.
This project was made with the Reacher environment.
In this environment, a double-jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location.
Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.
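The reward rule above can be made concrete with a small sketch. It assumes a hypothetical list of per-step flags saying whether the hand was in the goal location; the names `REWARD_PER_STEP` and `episode_return` are illustrative and are not part of the environment's API.

```python
# Hypothetical constant taken from the description: +0.1 per step in the goal.
REWARD_PER_STEP = 0.1


def episode_return(in_goal_flags):
    """Sum the per-step rewards for one episode.

    in_goal_flags: iterable of booleans, one per time step, True when the
    agent's hand is inside the goal location (an assumed representation).
    """
    steps_in_goal = sum(1 for in_goal in in_goal_flags if in_goal)
    return REWARD_PER_STEP * steps_in_goal


if __name__ == "__main__":
    # Three steps, two of them in the goal location.
    print(episode_return([True, False, True]))
```

Under this rule the return grows linearly with the number of steps spent at the target, which is why staying on the target is the whole objective.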
The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between -1 and 1.
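Since every entry of the action vector has to lie between -1 and 1, a policy that adds exploration noise usually needs to clamp its output before sending it to the environment. The following is a minimal sketch of that step in plain Python; `clip_action` and `random_action` are hypothetical helpers, not functions from the environment or from this repository.

```python
import random

STATE_SIZE = 33   # observation variables, per the description above
ACTION_SIZE = 4   # numbers per action vector, per the description above


def clip_action(action, low=-1.0, high=1.0):
    """Clamp every entry of an action vector into [low, high]."""
    if len(action) != ACTION_SIZE:
        raise ValueError(f"expected {ACTION_SIZE} entries, got {len(action)}")
    return [max(low, min(high, a)) for a in action]


def random_action(rng):
    """Draw a Gaussian action and clamp it to the valid range.

    Gaussian samples can fall outside [-1, 1], so clamping is required.
    """
    raw = [rng.gauss(0.0, 1.0) for _ in range(ACTION_SIZE)]
    return clip_action(raw)


if __name__ == "__main__":
    print(clip_action([2.0, -3.0, 0.5, -1.0]))
    print(random_action(random.Random(0)))
```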
This project was made with the Tennis environment.
In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
Because of how the reward function is designed, the environment rewards cooperation between the players.
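To illustrate the Tennis reward rules described above, here is a minimal sketch that maps events to rewards and sums them per agent. The event names and the functions `tennis_reward` and `agent_score` are assumptions made for illustration; the real environment reports rewards directly and does not expose events in this form.

```python
# Hypothetical event labels, chosen only for this illustration.
HIT_OVER_NET = "hit_over_net"
BALL_DROPPED = "ball_dropped"
OUT_OF_BOUNDS = "out_of_bounds"


def tennis_reward(event):
    """Return the reward for a single event, following the stated rules."""
    if event == HIT_OVER_NET:
        return 0.1
    if event in (BALL_DROPPED, OUT_OF_BOUNDS):
        return -0.01
    return 0.0  # any other step carries no reward


def agent_score(events):
    """Sum the rewards one agent collects over a sequence of events."""
    return sum(tennis_reward(e) for e in events)


if __name__ == "__main__":
    # A rally where the agent returns the ball twice, then drops it.
    print(agent_score([HIT_OVER_NET, HIT_OVER_NET, BALL_DROPPED]))
```

Because a long rally lets both agents keep collecting +0.1 while a fault costs only -0.01 once, both scores grow when the players keep the ball in play for each other.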
Proof of course completion can be found in the Official Certificate by Udacity.