Project: TextSLAM: Visual SLAM with Semantic Planar Text Features
Authors: Boying Li, Danping Zou, Yuan Huang, Xinghan Niu, Ling Pei and Wenxian Yu.
[Project] · [Paper] · [Code] · [Extra Evaluation Tool]
This repository contains TextSLAM-Dataset, a text-oriented semantic dataset.
TextSLAM-Dataset is a Robust and Expansive Text-oriented Semantic Dataset covering various real-world scenarios, both indoor and outdoor, accompanied by comprehensive ground truth:
- This is the First Text-oriented dataset for SLAM methods.
- Covers diverse Indoor and Outdoor scenes, including Rich Scene Texts in various sizes, fonts, languages, and backgrounds.
- Covers complex real-world environments with Rich Semantic Objects and multiple challenges, such as complex occlusion, glass reflection, dynamic pedestrians, and illumination changes.
- Provides high-precision Pose and Mapping Ground Truth.
- Provides Image Retrieval Ground Truth for the Day-Night sequence, serving as a valuable resource for Visual Localization tasks.
Dataset Overview:
- Comprises 36 sequences covering a mix of indoor and outdoor scenes.
- Data were collected with the Intel RealSense D455 depth camera.
- Provides text extraction results for each sequence to enable fair comparisons. The text detection and recognition results used in the paper are from AttentionOCR; more advanced text extractors can be integrated if available.
- Refer to our paper for the performance of state-of-the-art SLAM algorithms on this dataset.
Our accompanying videos are available on YouTube (click the images below to open) and Bilibili (1-outdoor, 2-night, 3-rapid).
Please consider citing the following papers in your publications if this project helps your work.
@article{li2023textslam,
title={TextSLAM: Visual SLAM with Semantic Planar Text Features},
author={Li, Boying and Zou, Danping and Huang, Yuan and Niu, Xinghan and Pei, Ling and Yu, Wenxian},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
year={2023}
}
@inproceedings{li2020textslam,
title={TextSLAM: Visual SLAM with Planar Text Features},
author={Li, Boying and Zou, Danping and Sartori, Daniele and Pei, Ling and Yu, Wenxian},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2020}
}
We group the sequences by their collection scenes. In the Download Table, the 'All' link downloads all the data of a single sequence; individual item download links ('Images', 'Texts', 'Ground Truth', 'Image List') are provided in the following columns.
➡️ Indoor sequences (10 sequences): BaiduYun Link, Google Link
Refer to yaml/GeneralMotion.yaml in the TextSLAM algorithm and Table-2 in our paper.
➡️ Indoor sequences for loop test (8 sequences): BaiduYun Link, Google Link
Refer to yaml/AIndoorLoop.yaml in the TextSLAM algorithm and Table-4 in our paper.
➡️ Large Indoor sequences for loop test (9 sequences): BaiduYun Link, Google Link
Refer to yaml/LIndoorLoop.yaml in the TextSLAM algorithm and Table-5 in our paper.
➡️ Day Sequences (8 sequences): BaiduYun Link, Google Link
Refer to yaml/Outdoor.yaml in the TextSLAM algorithm and Table-6 in our paper.
➡️ Night Sequences (1 sequence): BaiduYun Link, Google Link
Refer to Figure-23 in our paper.
The structure of each sequence is as follows:
<Sequence name>
│
├── <images>
│       ├── [timestamp].png
│       └── .......
├── <text>
│       ├── [timestamp]_dete.txt    // detection result for [timestamp].png. Each line: u1,v1,u2,v2,u3,v3,u4,v4
│       ├── [timestamp]_mean.txt    // recognition result for [timestamp].png. Each line: meaning, confidence
│       └── .......
├── <Exper.txt>                 // image list for this sequence
├── <gt.txt>                    // Each line (TUM format): timestamp tx ty tz qx qy qz qw
└── <Intrinsic Parameters>      // First line: fx, fy, cx, cy; Second line: k1, k2, p1, p2, k3
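A minimal Python loading sketch for <gt.txt> and the intrinsic-parameter file (ours, not part of the official tooling; the function names and the comma/whitespace delimiting are assumptions based on the listing above):

```python
# Minimal loading sketch (ours, not official tooling); assumes the line
# layouts listed above and tolerates comma- or space-separated values.
import numpy as np

def load_gt_tum(gt_path):
    """gt.txt: each line is 'timestamp tx ty tz qx qy qz qw' (TUM format)."""
    poses = []
    with open(gt_path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue
            vals = [float(x) for x in line.replace(",", " ").split()]
            timestamp, t, q = vals[0], np.array(vals[1:4]), np.array(vals[4:8])
            poses.append((timestamp, t, q))          # q ordered as (qx, qy, qz, qw)
    return poses

def load_intrinsics(intrinsics_path):
    """Intrinsics file: line 1 = fx, fy, cx, cy; line 2 = k1, k2, p1, p2, k3."""
    with open(intrinsics_path) as f:
        fx, fy, cx, cy = [float(x) for x in f.readline().replace(",", " ").split()]
        dist = np.array([float(x) for x in f.readline().replace(",", " ").split()])
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    return K, dist   # dist = (k1, k2, p1, p2, k3), i.e. OpenCV's ordering
```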
For the night sequence:
<match_gt.txt>      // Each line: [night_image_name].png [matched day_image_name].png
Specifically, [matched day_image_name].png is from Seq_02 in the 4_Outdoor sequences.
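Similarly, a hedged sketch for reading the per-image text files and match_gt.txt (helper names and delimiter handling are again our assumptions; we also assume the i-th line of [timestamp]_dete.txt and the i-th line of [timestamp]_mean.txt describe the same text instance):

```python
# Minimal sketch (ours) for the per-image text files and match_gt.txt,
# following the line formats listed above.
def load_text_detections(dete_path):
    """[timestamp]_dete.txt: one detected text box per line, u1,v1,...,u4,v4."""
    boxes = []
    with open(dete_path) as f:
        for line in f:
            if not line.strip():
                continue
            v = [float(x) for x in line.replace(",", " ").split()]
            boxes.append([(v[i], v[i + 1]) for i in range(0, 8, 2)])  # 4 corners (u, v)
    return boxes

def load_text_meanings(mean_path):
    """[timestamp]_mean.txt: one line per box, 'meaning, confidence'."""
    entries = []
    with open(mean_path) as f:
        for line in f:
            if not line.strip():
                continue
            meaning, conf = line.rsplit(",", 1)      # the meaning itself may contain commas
            entries.append((meaning.strip(), float(conf)))
    return entries

def load_night_day_matches(match_path):
    """match_gt.txt: '[night_image_name].png [matched day_image_name].png' per line."""
    with open(match_path) as f:
        return [tuple(line.split()) for line in f if line.strip()]
```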
TextSLAM-Dataset is licensed under a CC BY-NC-SA 4.0 License and is released for non-commercial research purposes only.