Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Essentially ChatGPT, but built on PaLM.
A curated list of reinforcement learning with human feedback resources (continually updated)
Open-source pre-training implementation of Google's LaMDA in PyTorch, with RLHF added, similar to ChatGPT.
The ParroT framework enhances and regulates translation abilities during chat, building on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) together with human-written translation and evaluation data.
Implementation of Reinforcement Learning from Human Feedback (RLHF)
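At the core of most RLHF implementations like those listed here is a reward model trained on pairwise human preferences with a Bradley-Terry style loss: the loss is small when the model scores the human-preferred response above the rejected one. A minimal pure-Python sketch of that loss (function name and values are illustrative, not from any listed repo):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise preference loss used to train RLHF reward
    models: -log(sigmoid(r_chosen - r_rejected)). The loss shrinks as the
    margin between the preferred and rejected responses' scores grows."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A zero margin gives loss log(2); a wrong ordering gives a larger loss
# than a correct one, pushing the reward model toward human preferences.
print(preference_loss(2.0, 0.0))
print(preference_loss(0.0, 2.0))
```

In real systems the scalar scores come from a learned network evaluated on full prompt-response pairs, and the loss is averaged over a batch of comparisons.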
Let's build better datasets, together!
Product analytics for AI Assistants
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
The Prism Alignment Project
[ NeurIPS 2023 ] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"
Aligning LLM Agents by Learning Latent Preference from User Edits
Reinforcement Learning from Human Feedback with 🤗 TRL
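Libraries like TRL wrap the policy-optimization stage of RLHF. Conceptually, the per-sample reward that PPO optimizes there is the reward-model score minus a KL-style penalty that keeps the policy close to the supervised (reference) model. A hedged, self-contained sketch of that reward shaping (the function and its arguments are hypothetical, not TRL's actual API):

```python
def shaped_reward(rm_score: float, logp_policy: float, logp_ref: float,
                  beta: float = 0.1) -> float:
    """RLHF reward shaping sketch: reward-model score minus
    beta * (log pi(y|x) - log pi_ref(y|x)), a per-sample KL-style
    penalty that discourages the policy from drifting far from the
    reference (SFT) model."""
    return rm_score - beta * (logp_policy - logp_ref)

# When the policy matches the reference, the penalty vanishes; assigning
# higher likelihood than the reference to a sampled response costs reward.
print(shaped_reward(1.0, -2.0, -2.0))
print(shaped_reward(1.0, -1.0, -2.0))
```

In practice the penalty is computed per token over generated sequences, and `beta` is a tunable coefficient trading off reward maximization against staying on-distribution.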
A curated list of reinforcement learning with human feedback resources [awesome-RLHF-Turkish] (continually updated)