Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
tldr: We reframe goal-conditioned RL as the problem of predicting and controlling the future state distribution of an autonomous agent. We solve this problem indirectly by training a classifier to predict whether an observation comes from the future. Importantly, an off-policy variant of our algorithm allows us to predict the future state distribution of a new policy, without collecting new experience. While conceptually similar to Q-learning, our approach provides a theoretical justification for the goal-relabeling methods employed in prior work and suggests how the goal-sampling ratio can be optimally chosen. Empirically, our method outperforms these prior methods.
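For intuition, here is a minimal sketch (written for this README, not taken from the training code in this repo) of the classification problem at the heart of C-learning: given a state, action, and candidate goal, predict whether the goal was drawn from the policy's discounted future state distribution or from randomly sampled states. The Monte Carlo flavor shown here uses plain cross-entropy; the TD variant described in the paper instead bootstraps the positive labels from the classifier at the next state. Shapes and function names are assumptions.

```python
# Illustrative sketch only (assumed shapes and names, not the repo's code):
# train C(s, a, g) to distinguish goals drawn from the future of the same
# trajectory (label 1) from goals drawn at random from the replay buffer
# (label 0).
import tensorflow as tf

def make_classifier(obs_dim, action_dim, goal_dim):
  # Outputs the logit of C(s, a, g).
  return tf.keras.Sequential([
      tf.keras.layers.Dense(256, activation='relu',
                            input_shape=(obs_dim + action_dim + goal_dim,)),
      tf.keras.layers.Dense(256, activation='relu'),
      tf.keras.layers.Dense(1),
  ])

def classifier_loss(classifier, states, actions, future_goals, random_goals):
  """Cross-entropy loss for the Monte Carlo flavor of C-learning."""
  pos_logits = classifier(tf.concat([states, actions, future_goals], axis=-1))
  neg_logits = classifier(tf.concat([states, actions, random_goals], axis=-1))
  bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
  # Positives: goals sampled from the (discounted) future of the trajectory.
  # Negatives: goals sampled from the marginal state distribution.
  return (bce(tf.ones_like(pos_logits), pos_logits) +
          bce(tf.zeros_like(neg_logits), neg_logits))
```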
If you use this code, please consider adding the corresponding citation:
@article{eysenbach2020clearning,
  title={C-Learning: Learning to Achieve Goals via Recursive Classification},
  author={Eysenbach, Benjamin and Salakhutdinov, Ruslan and Levine, Sergey},
  journal={arXiv preprint arXiv:2011.08909},
  year={2020}
}
These instructions were tested on Google Cloud Compute Engine with Ubuntu 18.04.
Copy your MuJoCo key to ~/.mujoco/mjkey.txt, then complete the steps below:
sudo apt install unzip gcc libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
sudo ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
wget https://www.roboti.us/download/mujoco200_linux.zip -P /tmp
unzip /tmp/mujoco200_linux.zip -d ~/.mujoco
mv ~/.mujoco/mujoco200_linux ~/.mujoco/mujoco200
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin" >> ~/.bashrc
wget https://repo.anaconda.com/miniconda/Miniconda2-latest-Linux-x86_64.sh
chmod +x Miniconda2-latest-Linux-x86_64.sh
./Miniconda2-latest-Linux-x86_64.sh
Restart your terminal so the changes take effect.
conda create --name c-learning python=3.6
conda activate c-learning
pip install tensorflow==2.4.0rc0
pip install tf_agents==0.6.0
pip install gym==0.13.1
pip install mujoco-py==2.0.2.10
pip install git+https://github.com/rlworkgroup/metaworld.git@33f3b90495be99f02a61da501d7d661e6bc675c5
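Optionally, you can sanity-check the installation with a short Python snippet (this check is an addition to this README, not part of the original setup); importing mujoco_py compiles its Cython bindings, which surfaces most MuJoCo installation problems early:

```python
# Optional installation check (not part of the original instructions).
import gym
import mujoco_py  # compiling the Cython bindings catches most MuJoCo issues
import tensorflow as tf
import tf_agents

print('tensorflow:', tf.__version__)
print('tf_agents:', tf_agents.__version__)
print('gym:', gym.__version__)
print('mujoco_py:', mujoco_py.__version__)
```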
The following lines replicate the C-learning experiments on the Sawyer tasks. Training proceeds at roughly 120 FPS on a 12-core CPU machine (no GPU). The experiments in Fig. 3 ran for up to 3M time steps, corresponding to about 7 hours. Please see the discussion below for an explanation of what the various command line options do.
python train_eval.py --root_dir=~/c_learning/sawyer_reach --gin_bindings='train_eval.env_name="sawyer_reach"' --gin_bindings='obs_to_goal.start_index=0' --gin_bindings='obs_to_goal.end_index=3' --gin_bindings='goal_fn.relabel_next_prob=0.5' --gin_bindings='goal_fn.relabel_future_prob=0.0'
python train_eval.py --root_dir=~/c_learning/sawyer_push --gin_bindings='train_eval.env_name="sawyer_push"' --gin_bindings='train_eval.log_subset=(3, 6)' --gin_bindings='goal_fn.relabel_next_prob=0.3' --gin_bindings='goal_fn.relabel_future_prob=0.2' --gin_bindings='SawyerPush.reset.arm_goal_type="goal"' --gin_bindings='SawyerPush.reset.fix_z=True' --gin_bindings='load_sawyer_push.random_init=True' --gin_bindings='load_sawyer_push.wide_goals=True'
python train_eval.py --root_dir=~/c_learning/sawyer_drawer --gin_bindings='train_eval.env_name="sawyer_drawer"' --gin_bindings='train_eval.log_subset=(3, None)' --gin_bindings='goal_fn.relabel_next_prob=0.3' --gin_bindings='goal_fn.relabel_future_prob=0.2' --gin_bindings='SawyerDrawer.reset.arm_goal_type="goal"'
python train_eval.py --root_dir=~/c_learning/sawyer_window --gin_bindings='train_eval.env_name="sawyer_window"' --gin_bindings='train_eval.log_subset=(3, None)' --gin_bindings='SawyerWindow.reset.arm_goal_type="goal"' --gin_bindings='goal_fn.relabel_next_prob=0.5' --gin_bindings='goal_fn.relabel_future_prob=0.0'
Explanation of the command line arguments:
- `train_eval.env_name`: Selects which environment to use.
- `obs_to_goal.start_index`, `obs_to_goal.end_index`: Select a subset of the observation to use for learning the classifier and policy. This option modifies C-learning to predict and control the density of only those coordinates. For example, the sawyer_reach task actually contains an object in coordinates 3 through 6, but we want to ignore the object position when learning reaching.
- `goal_fn.relabel_next_prob`, `goal_fn.relabel_future_prob`: TD C-learning says that 50% of goals should be sampled from the next state distribution, corresponding to setting `goal_fn.relabel_next_prob=0.5`. The hybrid MC + TD version of C-learning described in Appendix E changes this so that some goals are also sampled from the future state distribution (corresponding to setting `goal_fn.relabel_future_prob`). For C-learning, we assume that `goal_fn.relabel_next_prob + goal_fn.relabel_future_prob == 0.5`. To prevent unintended bugs, both of these parameters must be specified explicitly on the command line. A sketch of how these probabilities can drive goal relabeling follows this list.
- `*.reset.arm_goal_type`: For the Sawyer manipulation tasks, the goal state contains both the desired object position and the desired arm position. Use `*.reset.arm_goal_type="goal"` to indicate that the arm goal position should be the same as the object goal position. Use `*.reset.arm_goal_type="puck"` to indicate that the arm goal position should be the same as the initial object position.
- `load_sawyer_push.random_init`: Whether to randomize the initial arm position.
- `load_sawyer_push.wide_goals`: Whether to sample a wider range of goals.
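To make the relabeling options above concrete, here is a hypothetical sketch of how `relabel_next_prob` and `relabel_future_prob` could be used to pick a training goal for a single transition. The names (`relabel_goal`, `commanded_goal`, the 0.99 discount) are assumptions for illustration, not the repo's `goal_fn`; a geometric offset is one standard way to approximate sampling from the discounted future state distribution.

```python
import numpy as np

def obs_to_goal(observation, start_index=0, end_index=3):
  # Mirrors the role of obs_to_goal.start_index / end_index: keep only the
  # coordinates that define the goal (e.g., the gripper XYZ for sawyer_reach).
  return observation[start_index:end_index]

def relabel_goal(transition, trajectory, relabel_next_prob=0.3,
                 relabel_future_prob=0.2, discount=0.99):
  """Choose a training goal for one transition (hypothetical sketch)."""
  u = np.random.rand()
  t = transition['t']  # index of this transition within its trajectory
  if u < relabel_next_prob:
    # Relabel with the next state (the TD C-learning positive).
    return obs_to_goal(trajectory[min(t + 1, len(trajectory) - 1)])
  elif u < relabel_next_prob + relabel_future_prob:
    # Relabel with a future state; a geometric offset approximates the
    # discounted future state distribution (hybrid MC + TD variant).
    offset = np.random.geometric(p=1.0 - discount)
    return obs_to_goal(trajectory[min(t + offset, len(trajectory) - 1)])
  else:
    # Otherwise keep the originally commanded goal.
    return transition['commanded_goal']
```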
If you have any questions, comments, or suggestions, please reach out to Benjamin Eysenbach (eysenbach@google.com).