From d983be8fee5d377cfd4b09b54927dc2296183f38 Mon Sep 17 00:00:00 2001
From: Jack-ZC8 <73177056+Jack-ZC8@users.noreply.github.com>
Date: Mon, 3 Jun 2024 16:53:40 +0800
Subject: [PATCH] Update index.html

---
 index.html | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/index.html b/index.html
index f49515c..7a4633c 100644
--- a/index.html
+++ b/index.html
@@ -100,24 +100,14 @@

 Paper
-
-
-
-
-
-
- arXiv
-
-
+
- Code
+ Github
@@ -142,6 +132,16 @@

 Demo
+
+
+
+
+

🤖

+
+ Benchmarks +
+
@@ -155,7 +155,7 @@

Abstract

-
+

 Publishing open-source academic video recordings is an emergent and prevalent approach to sharing knowledge
 online. Such videos carry rich multimodal information including speech, the facial and body movements
@@ -344,15 +344,15 @@

Conclusion

-
+

- We release the Multimodal, Multigenre, and Multipurpose Audio-Visual Dataset with Academic Lectures
- (🎓M3AV) covering a range of academic fields. This dataset contains manually annotated
+ We release the Multimodal, Multigenre, and Multipurpose Audio-Visual Dataset with Academic Lectures
+ (🎓M3AV) covering a range of academic fields. This dataset contains manually annotated
  speech transcriptions, slide text, and additional extracted papers, providing a basis for evaluating AI
  models for recognizing multimodal content and understanding academic knowledge. We detail the creation
  pipeline and conduct various analyses of the dataset. Furthermore, we build benchmarks and
- conduct experiments around the dataset. We find there is still large room for existing models to
- improve perceptions and understanding of academic lecture videos.
+ conduct experiments around the dataset. We find there is still large room for existing models to
+ improve perceptions and understanding of academic lecture videos.
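
The HTML markup of the edited link buttons does not survive in the hunks above; only their labels ("arXiv" removed, "Code" renamed to "Github", "Benchmarks" added). As a rough sketch only — the class names, icon markup, and href values below are assumptions in the style of common project-page templates, not the actual contents of index.html — the renamed and newly added buttons would look roughly like this:

```html
<!-- Hypothetical sketch; class names, icons, and hrefs are assumed, not taken from index.html. -->
<span class="link-block">
  <a href="#" class="external-link button is-normal is-rounded is-dark">
    <span class="icon"><i class="fab fa-github"></i></span>
    <span>Github</span> <!-- label changed from "Code" to "Github" in this patch -->
  </a>
</span>
<span class="link-block">
  <a href="#" class="external-link button is-normal is-rounded is-dark">
    <span>🤖</span>
    <span>Benchmarks</span> <!-- new button added in this patch -->
  </a>
</span>
```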