
Is there ANY WAY to make this FAST, FASTER? And to use GPU, PyTorch, batch size, etc.? #202

Open
AIhasArrived opened this issue Jan 29, 2024 · 6 comments

Comments

@AIhasArrived

Thanks.
I really need to make it faster please.

@mitalpatel-investing

I second this question. I have a video 30 seconds long and 4K in resolution, and it took 5 hours. I have a decent computer, so I'm not sure what else can be done to optimize it. I should also add that the resulting video was in 4K, but the subject in the video was pixelated and not actually high quality.

@johndpope

throw the code in chatgpt - ask it.

@mitalpatel-investing

mitalpatel-investing commented Feb 23, 2024

throw the code in chatgpt - ask it.

Unfortunately ChatGPT is useless here for understanding what exactly is going on; no amount of dumping the code into it helps either.

@johndpope

johndpope commented Feb 23, 2024

start with the HTML version of the paper
https://ar5iv.labs.arxiv.org/html/2211.14758
feed it into ChatGPT to provide context (just copy and paste 100% of the code / headers / footers / guff)

https://chat.openai.com/share/ffc38f7d-38d9-42f8-acb8-e1dd9cc94b90

I then provide the inference.py code, stating: (I will provide the code and ask a question in the next prompt)


Identifying the slowest areas and suggesting improvements with newer technologies requires analyzing the script's key components:

Face Detection and Enhancement:

Slow Area: Enhancing faces using GPEN and GFPGAN can be slow, especially if processing high-resolution images or videos with many frames.
Improvements: Utilizing more efficient face enhancement models that are optimized for speed without significantly compromising quality. Employing models that are specifically designed for faster inference (e.g., lightweight GANs) could help. Additionally, optimizing the model inference on the GPU (if not already done) can significantly reduce processing time.
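A minimal sketch of that idea with PyTorch's mixed-precision autocast; the enhancer module below is a stand-in placeholder, not the repo's actual GPEN/GFPGAN wrapper:

```python
# Sketch only: a stand-in module shows the pattern of moving enhancement
# inference to GPU and running it under fp16 autocast.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

enhancer = nn.Sequential(              # placeholder for a GPEN/GFPGAN-style model
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
).to(device).eval()

faces = torch.rand(8, 3, 512, 512, device=device)   # a batch of face crops

with torch.no_grad(), torch.autocast(device_type=device,
                                     dtype=torch.float16,
                                     enabled=(device == "cuda")):
    restored = enhancer(faces)          # fp16 on GPU, plain fp32 on CPU
print(restored.shape)
```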
3DMM Extraction and Face Alignment:

Slow Area: 3D face reconstruction and alignment can be computationally expensive, especially if done frame-by-frame for a video.
Improvements: Implementing more efficient algorithms for 3D face reconstruction that require fewer computational resources. Machine learning models that have been optimized for speed and lower resource consumption could be used. Also, processing frames in batches rather than individually can improve speed.
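A rough illustration of batching, assuming a generic coefficient-regression network (the model here is a placeholder, not the repo's 3DMM extractor):

```python
# Sketch: feed aligned face crops in batches instead of one frame at a time.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

recon_net = nn.Sequential(             # placeholder network
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, 257),     # 257 ~ a typical 3DMM coefficient count
).to(device).eval()

frames = torch.rand(300, 3, 224, 224)  # e.g. 300 aligned face crops
batch_size = 32

coeffs = []
with torch.no_grad():
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i + batch_size].to(device)
        coeffs.append(recon_net(batch).cpu())
coeffs = torch.cat(coeffs)             # (300, 257) from ~10 passes instead of 300
```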
Expression Control with GANimation:

Slow Area: Manipulating facial expressions for each frame can be time-consuming.
Improvements: Using newer, more efficient GAN models that are optimized for faster inference. Applying expression edits on keyframes and interpolating for intermediate frames could also be a strategy to reduce computation.
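A small sketch of the keyframe-plus-interpolation idea, using made-up array shapes for the expression codes:

```python
# Sketch: compute expression coefficients only on keyframes and linearly
# interpolate the frames in between.
import numpy as np

num_frames = 120
key_idx = np.arange(0, num_frames, 10)            # every 10th frame is a keyframe
key_exp = np.random.rand(len(key_idx), 64)        # stand-in 64-dim expression codes

all_idx = np.arange(num_frames)
all_exp = np.stack([
    np.interp(all_idx, key_idx, key_exp[:, d])    # per-dimension linear interpolation
    for d in range(key_exp.shape[1])
], axis=1)                                        # (120, 64) without 120 GAN passes
```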
Audio Processing for Lip-Sync:

Slow Area: Converting audio into a suitable format and processing it for lip-sync can be slow, particularly for longer audio clips.
Improvements: Using more efficient audio processing libraries or optimizing the existing pipeline can help. Also, analyzing audio in chunks in parallel (if not already being done) might speed up the process.
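One way that could look, with a placeholder feature-extraction function standing in for the real mel-spectrogram step:

```python
# Sketch: split the audio signal into chunks and process them in parallel.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def extract_features(chunk: np.ndarray) -> np.ndarray:
    # placeholder for real mel-spectrogram extraction
    return np.abs(np.fft.rfft(chunk.reshape(-1, 800), axis=1))

if __name__ == "__main__":
    sr = 16000
    audio = np.random.randn(sr * 30).astype(np.float32)   # 30 s of dummy audio
    chunks = np.array_split(audio, 8)                      # 8 roughly equal chunks

    with ProcessPoolExecutor() as pool:
        features = list(pool.map(extract_features, chunks))  # chunks run in parallel
```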
Video Frame Processing:

Slow Area: The process of editing video frames based on audio input and enhancing features for each frame is resource-intensive.
Improvements: Implementing parallel processing techniques where frames are processed simultaneously rather than sequentially. Also, using faster algorithms or more efficient neural network models for video processing tasks can be beneficial.
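A hedged sketch of per-frame parallelism; the "enhance" step is a placeholder, but OpenCV releases the GIL for most operations, so a thread pool does give real speedups for this kind of image work:

```python
# Sketch: process frames concurrently instead of in a strict sequential loop.
import cv2
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def enhance(frame: np.ndarray) -> np.ndarray:
    # placeholder for per-frame enhancement / compositing
    return cv2.GaussianBlur(frame, (5, 5), 0)

frames = [np.random.randint(0, 255, (720, 1280, 3), np.uint8) for _ in range(64)]

with ThreadPoolExecutor(max_workers=8) as pool:
    processed = list(pool.map(enhance, frames))   # result order is preserved
```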
Use of FFmpeg:

Slow Area: Subprocess calls to external tools like FFmpeg can introduce overhead.
Improvements: Minimizing the use of subprocess calls and instead using integrated libraries that can perform the same tasks internally within the Python environment.
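For the frame I/O part specifically, a sketch of keeping reads/writes inside Python with OpenCV; note that cv2.VideoWriter does not handle audio, so a final ffmpeg mux may still be needed:

```python
# Sketch: write frames directly with OpenCV instead of shelling out to ffmpeg
# for every intermediate step.
import cv2
import numpy as np

fps, size = 25, (1280, 720)
writer = cv2.VideoWriter("out.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"),
                         fps, size)

for _ in range(fps * 2):                       # two seconds of dummy frames
    frame = np.random.randint(0, 255, (size[1], size[0], 3), np.uint8)
    writer.write(frame)                        # no subprocess call per step

writer.release()
```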
Hardware Acceleration:

General Improvement: Leveraging GPU acceleration for both model inference and video/audio processing tasks can significantly reduce processing time. Ensuring that the script is optimized to make full use of available GPU resources can yield substantial speed improvements.
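A quick, generic check that inference is really landing on the GPU; a common cause of multi-hour runs is a model silently falling back to CPU (the Linear layer below is just a stand-in):

```python
# Sketch: confirm CUDA is visible and that the model's weights live on the GPU.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))

model = torch.nn.Linear(10, 10)                  # stand-in for any pipeline model
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
print("Model on:", next(model.parameters()).device)
```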
Model Optimization:

General Improvement: Employing model quantization and pruning techniques to reduce the size of the neural network models without significantly affecting their performance. Smaller models typically infer faster.
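For the quantization point, a minimal example with PyTorch's built-in dynamic quantization; it targets CPU inference and Linear/LSTM layers, so it is not a drop-in win for convolutional GANs, but it shows the mechanism:

```python
# Sketch: int8 dynamic quantization of Linear layers on a toy model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 64)).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.rand(1, 512)
print(quantized(x).shape)       # same outputs, smaller weights, faster CPU matmuls
```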
In summary, the key to improving the performance of such a script lies in optimizing the models for speed, employing efficient algorithms, parallel processing, and making full use of available hardware acceleration.

asking it to optimize the code yields a few pointers:

[Screenshot of ChatGPT's suggested code optimizations, 23 Feb 2024]

UPDATE:

just searched through GitHub and found this:
@AIhasArrived https://github.com/Zz-ww/SadTalker-Video-Lip-Sync

@cparello

I second this question. I have a video 30 seconds long and 4K in resolution, and it took 5 hours. I have a decent computer, so I'm not sure what else can be done to optimize it. I should also add that the resulting video was in 4K, but the subject in the video was pixelated and not actually high quality.

I have an i9 and a nice 4080; it's about 20 minutes, I think. But to get good output you have to edit the code to use the 1024 or 2048 model.

@savleharshad

savleharshad commented Feb 29, 2024

can we use ONNX or TensorRT to improve the time?
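That is usually possible in principle; a hedged sketch of the first step, exporting one of the PyTorch models to ONNX (the model here is a placeholder, and the exported file could then be run with ONNX Runtime or converted to a TensorRT engine):

```python
# Sketch: export a PyTorch module to ONNX with torch.onnx.export.
import torch
import torch.nn as nn

model = nn.Sequential(                      # placeholder for one of the pipeline models
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 3, 3, padding=1),
).eval()

dummy = torch.rand(1, 3, 256, 256)
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```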
