diff --git a/index.html b/index.html
index f49515c..7a4633c 100644
--- a/index.html
+++ b/index.html
@@ -100,24 +100,14 @@

Paper - - - - - - - arXiv - - - + - Code + Github
@@ -142,6 +132,16 @@

Demo + + + + +

🤖

+
+ Benchmarks +
+
@@ -155,7 +155,7 @@

Abstract

-
+

Publishing open-source academic video recordings is an emergent and prevalent approach to sharing knowledge online. Such videos carry rich multimodal information including speech, the facial and body movements
@@ -344,15 +344,15 @@

Conclusion

-
+

- We release the Multimodal, Multigenre, and Multipurpose Audio-Visual Dataset with Academic Lectures
- (🎓M3AV) covering a range of academic fields. This dataset contains manually annotated
+ We release the Multimodal, Multigenre, and Multipurpose Audio-Visual Dataset with Academic Lectures
+ (🎓M3AV) covering a range of academic fields. This dataset contains manually annotated
  speech transcriptions, slide text, and additional extracted papers, providing a basis for evaluating AI models for recognizing multimodal content and understanding academic knowledge. We detail the creation pipeline and conduct various analyses of the dataset. Furthermore, we build benchmarks and
- conduct experiments around the dataset. We find there is still large room for existing models to
- improve perceptions and understanding of academic lecture videos.
+ conduct experiments around the dataset. We find there is still large room for existing models to
+ improve perceptions and understanding of academic lecture videos.