Awesome talking-head

Welcome to the Awesome List for Talking Head Generation! This curated collection focuses on talking head generation, an area of computer graphics and artificial intelligence that strives to create lifelike digital recreations of human heads and faces. These talking heads can be used in a variety of applications, from realistic video content and virtual reality to advanced communication tools and beyond. The list gathers key research papers, state-of-the-art algorithms, seminal GitHub repositories, educational videos, inspiring blogs, and more. Whether you are an AI researcher, a computer graphics professional, or an AI enthusiast, this list is your one-stop destination for diving into the world of talking head generation. Happy exploring!

Table of Contents

  • GitHub projects
  • Articles & Blogs
  • Online Courses
  • Research Papers
  • Tools & Software
  • Slides & Presentations

GitHub projects

  • AudioGPT : Understanding and Generating Speech, Music, Sound, and Talking Head. 🗣️🎵
  • SadTalker : Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. 🎭🎶
  • Thin-Plate-Spline-Motion-Model : Thin-Plate Spline Motion Model for Image Animation (a minimal sketch of the underlying TPS warp follows this list). 🖼️
  • GeneFace : Official code for "GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis" (ICLR 2023). 👤💬
  • CVPR2022-DaGAN : Official code for the CVPR 2022 paper "Depth-Aware Generative Adversarial Network for Talking Head Video Generation". 👥📹
  • sd-wav2lip-uhq : Wav2Lip UHQ extension for the Automatic1111 Stable Diffusion WebUI. 👄
  • Text2Video : Code for the ICASSP 2022 paper "Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary". 🔤🎞️
  • OTAvatar : Official repository for "OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering" (CVPR 2023). 👤🎭
  • Audio2Head : Code for the IJCAI 2021 paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion". 🗣️👤
  • IP_LAP : CVPR 2023 implementation of "Identity-Preserving Talking Face Generation with Landmark and Appearance Priors". 🔥🤖
  • Wunjo AI : Synthesize & clone voices in English, Russian & Chinese, real-time speech recognition, deepfake face & lips animation, face swap with one photo, change video by text prompts, segmentation, and retouching. Open-source, local & free. 🗣️👤💬
  • LIHQ : Long-Inference, High-Quality Synthetic Speaker (AI avatar / AI presenter). 🎙️👤
  • Co-Speech-Motion-Generation : Freeform Body Motion Generation from Speech. 🗣️🚶
  • Neural Head Reenactment with Latent Pose Descriptors : The authors' implementation of the "Neural Head Reenactment with Latent Pose Descriptors" (CVPR 2020) paper. 🤖👤
  • NED : PyTorch implementation for NED (CVPR 2022). It can be used to manipulate the facial emotions of actors in videos based on emotion labels or reference styles. 😃🎭🎥
  • WACV23_TSNet : PyTorch implementation of the WACV 2023 paper "Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis". 🎬✨
  • ICCV2023-MCNET : Official code for the ICCV 2023 paper "Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation". 🎥🤖
  • Speech2Video : Code for ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses". 🗣️🎥💃
  • StyleLipSync : Official PyTorch implementation of "StyleLipSync: Style-based Personalized Lip-sync Video Generation". 💋🎥
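
Several of the repositories above animate a still portrait by warping it toward the pose of a driving frame; Thin-Plate-Spline-Motion-Model takes its name from the classic warp used for this. The snippet below is a minimal NumPy sketch of that thin-plate spline transform, not code from any of the listed projects: it fits a smooth 2D deformation mapping a few source keypoints to driving keypoints, then applies it to a dense pixel grid to get a motion field. The keypoints and image size are made up for illustration; the real models predict multiple local TPS transforms plus occlusion masks with neural networks.

```python
import numpy as np

def tps_warp_coords(src_pts, dst_pts, query):
    """Fit a thin-plate spline mapping src_pts -> dst_pts, apply it to query.

    src_pts, dst_pts: (n, 2) control points; query: (m, 2) points to warp.
    """
    def U(r2):
        # TPS radial basis U(r) = r^2 * log(r^2), with U(0) defined as 0.
        safe = np.where(r2 == 0.0, 1.0, r2)
        return np.where(r2 == 0.0, 0.0, r2 * np.log(safe))

    n = len(src_pts)
    d2 = np.sum((src_pts[:, None] - src_pts[None, :]) ** 2, axis=-1)
    K = U(d2)                                    # (n, n) radial kernel
    P = np.hstack([np.ones((n, 1)), src_pts])    # (n, 3) affine terms
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = dst_pts
    params = np.linalg.solve(L, Y)               # spline weights + affine part
    W, A = params[:n], params[n:]

    q2 = np.sum((query[:, None] - src_pts[None, :]) ** 2, axis=-1)
    return U(q2) @ W + np.hstack([np.ones((len(query), 1)), query]) @ A

# Perturb five (hypothetical) facial keypoints and compute where every
# pixel of a 100x100 image should move, i.e. a dense motion field.
src = np.array([[20, 20], [80, 20], [50, 50], [20, 80], [80, 80]], float)
dst = src + np.random.uniform(-5, 5, src.shape)
ys, xs = np.mgrid[0:100, 0:100]
grid = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(float)
flow = tps_warp_coords(src, dst, grid)           # (10000, 2) warped coordinates
```

Sampling the source image at `flow` (e.g. with `scipy.ndimage.map_coordinates`) produces the warped frame; the deep models listed above learn where the keypoints should move from audio or from a driving video.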

Articles & Blogs

  • How to Create Fake Talking Head Videos With Deep Learning (Code Tutorial): An article explaining the process of generating fake talking head videos using deep learning techniques.
  • AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head: A research paper introducing AudioGPT, a multi-modal AI system that can process complex audio information and understand and generate speech, music, sound, and talking-head content.
  • Text-based Editing of Talking-head Video: An academic publication discussing the editing of talking-head videos using text-based instructions.
  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models: A research paper presenting a system capable of learning personalized talking head models from just a few image views of a person, using adversarial training techniques.
  • DisCoHead: Audio-and-Video-Driven Talking Head Generation: A paper describing DisCoHead, a method that disentangles and controls head pose and facial expressions in talking head generation, without supervision.
  • Microsoft's 3D Photo Realistic Talking Head: A blog post showcasing Microsoft's 3D talking head technology, which combines photorealistic video with a 3D mesh model.
  • Depth-Aware Generative Adversarial Network for Talking Head Video Generation: A research paper proposing a GAN-based approach that leverages dense 3D facial geometry to generate realistic and accurate talking head videos.
  • Talking-head Generation with Rhythmic Head Motion: This article presents a method for generating realistic talking-head videos with natural head movements, addressing the challenge of generating lip-synced videos while incorporating natural head motion. The proposed approach utilizes a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module, resulting in controllable and photo-realistic talking-head videos with natural head movements.
  • Learned Spatial Representations for Few-shot Talking-Head Synthesis: This article introduces a novel approach for few-shot talking-head synthesis by factorizing the representation of a subject into its spatial and style components. The proposed method predicts a dense spatial layout for the target image and utilizes it for synthesizing the target frame, achieving improved preservation of the subject's identity in the source images.
  • Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation: This article proposes the Emotional Adaptation for audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner. The approach uses lightweight adaptations to enable precise and realistic emotion control, achieving state-of-the-art performance on widely used benchmarks.
  • High-Fidelity and Freely Controllable Talking Head Video Generation: This article addresses the challenges faced by current methods in generating high-quality and controllable talking-head videos. It introduces a novel model that leverages self-supervised learned landmarks and 3D face model-based landmarks to model the motion, along with a motion-aware multi-scale feature alignment module. The proposed method produces high-fidelity talking-head videos with free control over head pose and expression.
  • Implicit Identity Representation Conditioned Memory Compensation Network: This article proposes a global facial representation space and a novel implicit identity representation conditioned memory compensation network for high-fidelity talking head generation. The network learns a unified spatial facial meta-memory bank that compensates warped source facial features, overcoming limitations caused by complex motions in the driving video and improving generation quality.
  • Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos: This article focuses on avatar fingerprinting, which verifies the trustworthiness of rendered talking-head videos. It proposes an embedding that groups the motion signatures of one identity together, allowing identification of the individual driving the expressions in a synthetic video, regardless of the appearance used.
  • Style Transfer for 2D Talking Head Animation: This article presents a method for generating talking head animation with learnable style references. It reconstructs 2D talking head animation from a single input image and an audio stream, using facial landmark motion, style-pattern construction, and a style-aware image generator. The method outperforms recent state-of-the-art methods in generating photo-realistic, high-fidelity 2D animation.
  • One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing: This article proposes a neural talking-head video synthesis model that learns to synthesize videos from a source image containing the target person's appearance and a driving video that supplies the motion. The model achieves high visual quality and bandwidth efficiency, outperforming competing methods on benchmark datasets.
  • Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis: This article presents a one-shot talking head synthesis method with disentangled control over lip motion, eye gaze and blink, head pose, and emotional expression. It uses a progressive disentangled representation learning strategy to isolate each motion factor, allowing fine-grained control and high-quality speech and lip-motion synchronization.
  • VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing in the Wild: This article introduces VideoReTalking, a system for editing real-world talking head videos according to input audio. It disentangles the task into face video generation, audio-driven lip-sync, and face enhancement, producing a high-quality, lip-synced output video through a sequential learning-based pipeline, without requiring user intervention (a sketch of the audio preprocessing such systems share follows this list).
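
Most audio-driven systems in these two lists (the Wav2Lip variants, SadTalker, Audio2Head, and the lip-sync stage of VideoReTalking) do not consume raw waveforms: they condition the generator on a log-mel spectrogram, sliced into short windows aligned with video frames. Below is a hedged sketch of that common preprocessing step using librosa. The sample rate, FFT/window sizes, and the 16-frame window follow typical Wav2Lip-style settings but differ between projects, and "speech.wav" is a placeholder path.

```python
import librosa
import numpy as np

# 16 kHz mono waveform -> 80-bin log-mel spectrogram.
wav, sr = librosa.load("speech.wav", sr=16000, mono=True)  # placeholder file
mel = librosa.feature.melspectrogram(
    y=wav, sr=sr, n_fft=800, hop_length=200, win_length=800, n_mels=80)
log_mel = np.log(np.clip(mel, 1e-5, None))  # (80, T); hop 200 -> 80 mel frames/sec

# At 25 video fps, each generated frame is conditioned on a short sliding
# window of mel frames (16 frames is roughly 0.2 s of audio context).
fps, window = 25, 16
num_video_frames = int(log_mel.shape[1] * fps / 80)
for i in range(num_video_frames):
    start = int(i * 80 / fps)                 # mel frame aligned to video frame i
    chunk = log_mel[:, start:start + window]  # conditioning input for frame i
```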

Online Courses

Research Papers

Tools & Software

Slides & Presentations


This initial version of the Awesome List was generated with the help of the Awesome List Generator. It's an open-source Python package that uses the power of GPT models to automatically curate and generate starting points for resource lists related to a specific topic.