Awesome talking-head

Welcome to the Awesome List for Talking Head Generation! This curated collection focuses on talking head generation, an area of computer graphics and artificial intelligence that strives to create lifelike digital recreations of human heads and faces. These talking heads can be used in a variety of applications, from realistic video content and virtual reality to advanced communication tools and beyond. The list gathers key research papers, state-of-the-art algorithms, seminal GitHub repositories, educational videos, inspiring blogs, and more. Whether you are an AI researcher, a computer graphics professional, or an AI enthusiast, this list is your one-stop destination for diving into the world of talking head generation. Happy exploring!

Table of Contents

  • GitHub projects
  • Articles & Blogs
  • Online Courses
  • Research Papers
  • Tools & Software
  • Slides & Presentations

GitHub projects

  • AudioGPT : Understanding and Generating Speech, Music, Sound, and Talking Head. 🗣️🎵
  • SadTalker : Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. 🎭🎶
  • Thin-Plate-Spline-Motion-Model : Thin-Plate Spline Motion Model for Image Animation (a minimal sketch of the underlying TPS warp follows this list). 🖼️
  • GeneFace : Official code for "GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis" (ICLR 2023). 👤💬
  • CVPR2022-DaGAN : Official code for the CVPR 2022 paper "Depth-Aware Generative Adversarial Network for Talking Head Video Generation". 👥📹
  • sd-wav2lip-uhq : Wav2Lip UHQ extension for the Automatic1111 Stable Diffusion WebUI. 👄
  • Text2Video : Code for the ICASSP 2022 paper "Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary". 🔤🎞️
  • OTAvatar : Official repository for "OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering" (CVPR 2023). 👤🎭
  • Audio2Head : Code for the IJCAI 2021 paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion". 🗣️👤
  • IP_LAP : CVPR 2023 implementation of "Identity-Preserving Talking Face Generation with Landmark and Appearance Priors". 🔥🤖
  • Wunjo AI : Synthesize & clone voices in English, Russian & Chinese, real-time speech recognition, deepfake face & lips animation, face swap with one photo, change video by text prompts, segmentation, and retouching. Open-source, local & free. 🗣️👤💬
  • LIHQ : Long-Inference, High-Quality Synthetic Speaker (AI avatar / AI presenter). 🎙️👤
  • Co-Speech-Motion-Generation : Freeform Body Motion Generation from Speech. 🗣️🚶
  • Neural Head Reenactment with Latent Pose Descriptors : The authors' implementation of the "Neural Head Reenactment with Latent Pose Descriptors" (CVPR 2020) paper. 🤖👤
  • NED : PyTorch implementation for NED (CVPR 2022). It can be used to manipulate the facial emotions of actors in videos based on emotion labels or reference styles. 😃🎭🎥
  • WACV23_TSNet : PyTorch implementation of the WACV 2023 paper "Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis". 🎬✨
  • ICCV2023-MCNET : Official code for the ICCV 2023 paper "Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation". 🎥🤖
  • Speech2Video : Code for ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses". 🗣️🎥💃
  • StyleLipSync : Official PyTorch implementation of "StyleLipSync: Style-based Personalized Lip-sync Video Generation". 💋🎥
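
Several of the repositories above animate a still portrait by warping it toward the pose of a driving frame; Thin-Plate-Spline-Motion-Model takes its name from the classic warp used for this. The snippet below is a minimal NumPy sketch of that thin-plate spline transform, not code from any of the listed projects: it fits a smooth 2D deformation mapping a few source keypoints to driving keypoints, then applies it to a dense pixel grid to get a motion field. The keypoints and image size are made up for illustration; the real models predict multiple local TPS transforms plus occlusion masks with neural networks.

```python
import numpy as np

def tps_warp_coords(src_pts, dst_pts, query):
    """Fit a thin-plate spline mapping src_pts -> dst_pts, apply it to query.

    src_pts, dst_pts: (n, 2) control points; query: (m, 2) points to warp.
    """
    def U(r2):
        # TPS radial basis U(r) = r^2 * log(r^2), with U(0) defined as 0.
        safe = np.where(r2 == 0.0, 1.0, r2)
        return np.where(r2 == 0.0, 0.0, r2 * np.log(safe))

    n = len(src_pts)
    d2 = np.sum((src_pts[:, None] - src_pts[None, :]) ** 2, axis=-1)
    K = U(d2)                                    # (n, n) radial kernel
    P = np.hstack([np.ones((n, 1)), src_pts])    # (n, 3) affine terms
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = dst_pts
    params = np.linalg.solve(L, Y)               # spline weights + affine part
    W, A = params[:n], params[n:]

    q2 = np.sum((query[:, None] - src_pts[None, :]) ** 2, axis=-1)
    return U(q2) @ W + np.hstack([np.ones((len(query), 1)), query]) @ A

# Perturb five (hypothetical) facial keypoints and compute where every
# pixel of a 100x100 image should move, i.e. a dense motion field.
src = np.array([[20, 20], [80, 20], [50, 50], [20, 80], [80, 80]], float)
dst = src + np.random.uniform(-5, 5, src.shape)
ys, xs = np.mgrid[0:100, 0:100]
grid = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(float)
flow = tps_warp_coords(src, dst, grid)           # (10000, 2) warped coordinates
```

Sampling the source image at `flow` (e.g. with `scipy.ndimage.map_coordinates`) produces the warped frame; the deep models listed above learn where the keypoints should move from audio or from a driving video.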

Articles & Blogs

  • How to Create Fake Talking Head Videos With Deep Learning (Code Tutorial): An article explaining the process of generating fake talking head videos using deep learning techniques.
  • AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head: A research paper introducing AudioGPT, a multi-modal AI system that can process complex audio information and understand and generate speech, music, sound, and talking-head content.
  • Text-based Editing of Talking-head Video: An academic publication discussing the editing of talking-head videos using text-based instructions.
  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models: A research paper presenting a system capable of learning personalized talking head models from just a few image views of a person, using adversarial training techniques.
  • DisCoHead: Audio-and-Video-Driven Talking Head Generation: A paper describing DisCoHead, a method that disentangles and controls head pose and facial expressions in talking head generation, without supervision.
  • Microsoft's 3D Photo Realistic Talking Head: A blog post showcasing Microsoft's 3D talking head technology, which combines photorealistic video with a 3D mesh model.
  • Depth-Aware Generative Adversarial Network for Talking Head Video Generation: A research paper proposing a GAN-based approach that leverages dense 3D facial geometry to generate realistic and accurate talking head videos.
  • Talking-head Generation with Rhythmic Head Motion: This article presents a method for generating realistic talking-head videos with natural head movements, addressing the challenge of generating lip-synced videos while incorporating natural head motion. The proposed approach utilizes a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module, resulting in controllable and photo-realistic talking-head videos with natural head movements.
  • Learned Spatial Representations for Few-shot Talking-Head Synthesis: This article introduces a novel approach for few-shot talking-head synthesis by factorizing the representation of a subject into its spatial and style components. The proposed method predicts a dense spatial layout for the target image and utilizes it for synthesizing the target frame, achieving improved preservation of the subject's identity in the source images.
  • Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation: This article proposes the Emotional Adaptation for audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner. The approach uses lightweight adaptations to enable precise and realistic emotion control, achieving state-of-the-art performance on widely used benchmarks.
  • High-Fidelity and Freely Controllable Talking Head Video Generation: This article addresses the challenges faced by current methods in generating high-quality and controllable talking-head videos. It introduces a novel model that leverages self-supervised learned landmarks and 3D face model-based landmarks to model the motion, along with a motion-aware multi-scale feature alignment module. The proposed method produces high-fidelity talking-head videos with free control over head pose and expression.
  • Implicit Identity Representation Conditioned Memory Compensation Network: This article proposes a global facial representation space and a novel implicit identity representation conditioned memory compensation network for high-fidelity talking head generation. The network learns a unified spatial facial meta-memory bank that compensates warped source facial features, overcoming limitations caused by complex motions in the driving video and improving generation quality.
  • Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos: This article focuses on avatar fingerprinting, which verifies the trustworthiness of rendered talking-head videos. It proposes an embedding that groups the motion signatures of one identity together, allowing identification of the individual driving the expressions in a synthetic video, regardless of the appearance used.
  • Style Transfer for 2D Talking Head Animation: This article presents a method for generating talking head animation with learnable style references. It reconstructs 2D talking head animation from a single input image and an audio stream, using facial landmark motion, style-pattern construction, and a style-aware image generator. The method outperforms recent state-of-the-art methods in generating photo-realistic, high-fidelity 2D animation.
  • One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing: This article proposes a neural talking-head video synthesis model that learns to synthesize videos from a source image containing the target person's appearance and a driving video that supplies the motion. The model achieves high visual quality and bandwidth efficiency, outperforming competing methods on benchmark datasets.
  • Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis: This article presents a one-shot talking head synthesis method with disentangled control over lip motion, eye gaze and blink, head pose, and emotional expression. It uses a progressive disentangled representation learning strategy to isolate each motion factor, allowing fine-grained control and high-quality speech and lip-motion synchronization.
  • VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing in the Wild: This article introduces VideoReTalking, a system for editing real-world talking head videos according to input audio. It disentangles the task into face video generation, audio-driven lip-sync, and face enhancement, producing a high-quality, lip-synced output video through a sequential learning-based pipeline, without requiring user intervention (a sketch of the audio preprocessing such systems share follows this list).
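
Most audio-driven systems in these two lists (the Wav2Lip variants, SadTalker, Audio2Head, and the lip-sync stage of VideoReTalking) do not consume raw waveforms: they condition the generator on a log-mel spectrogram, sliced into short windows aligned with video frames. Below is a hedged sketch of that common preprocessing step using librosa. The sample rate, FFT/window sizes, and the 16-frame window follow typical Wav2Lip-style settings but differ between projects, and "speech.wav" is a placeholder path.

```python
import librosa
import numpy as np

# 16 kHz mono waveform -> 80-bin log-mel spectrogram.
wav, sr = librosa.load("speech.wav", sr=16000, mono=True)  # placeholder file
mel = librosa.feature.melspectrogram(
    y=wav, sr=sr, n_fft=800, hop_length=200, win_length=800, n_mels=80)
log_mel = np.log(np.clip(mel, 1e-5, None))  # (80, T); hop 200 -> 80 mel frames/sec

# At 25 video fps, each generated frame is conditioned on a short sliding
# window of mel frames (16 frames is roughly 0.2 s of audio context).
fps, window = 25, 16
num_video_frames = int(log_mel.shape[1] * fps / 80)
for i in range(num_video_frames):
    start = int(i * 80 / fps)                 # mel frame aligned to video frame i
    chunk = log_mel[:, start:start + window]  # conditioning input for frame i
```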

Online Courses

Research Papers

Tools & Software

Slides & Presentations


This initial version of the Awesome List was generated with the help of the Awesome List Generator. It's an open-source Python package that uses the power of GPT models to automatically curate and generate starting points for resource lists related to a specific topic.