Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories
Hikaru Asano1 Ryo Yonetani2 Taiki Sekii2 Hiroki Ouchi2,3
1The University of Tokyo 2CyberAgent 3Nara Institute of Science and Technology
INLG 2024
Text2Traj2Text is a learning-by-synthesis framework designed to generate natural language captions that describe the contextual backgrounds of shoppers' trajectory data in retail environments. The framework comprises two primary components:
- Text2Traj: Generates customer behavior descriptions and corresponding trajectory data.
- Traj2Text: Trains a model to convert trajectory data into natural language captions.
We have verified reproducibility under the following environment (a quick version check is shown after the list):
- Operating System: Ubuntu (≥22.04) or macOS
- Docker: Version 24.0.7
- Docker Compose: Version 2.23.1
- CUDA 11.8.0: For GPU support
- Python 3.9+
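As a quick sanity check before building the container, the standard version commands of these tools (not specific to this repository) can confirm that the prerequisites are in place:
docker --version
docker compose version
nvidia-smi  # confirms the GPU and CUDA driver are visible (GPU setups only)
python3 --version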
We recommend using Docker to manage dependencies. Follow the steps below to set up the environment.
If you haven't installed Docker yet, please follow the Docker Installation Guide.
Execute the following command to build and run the Docker container:
docker compose up -d
This command initializes a containerized environment with all necessary dependencies.
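To confirm that the container is up and that Python is available inside it, standard Docker commands such as the following should work (the container name text2traj2text is the same one used in the commands below):
docker compose ps
docker exec text2traj2text python3 --version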
Before training, preprocess the data. Place the raw training data in the data directory and run:
bash scripts/preprocess.sh
To train and evaluate the Traj2Text model, execute the following command:
docker exec text2traj2text python3 scripts/train.py
By default, this command trains the model using the t5-small architecture with 8 paraphrased datasets.
To fully reproduce our experiments, you will need:
- Access to the Azure OpenAI API (for Text2Traj dataset generation)
- A GPU (for training models like t5-base and evaluating with LLaMA)
You can customize the training process by specifying parameters. For example, to train the model using t5-base with 0 paraphrased data points:
docker exec text2traj2text python3 scripts/train.py train.model_name=t5-base dataset.num_paraphrase=0
API keys are required to generate the Text2Traj dataset and to evaluate models with LLaMA and the OpenAI API.
To generate the dataset or run evaluations, follow these steps:
- Create a .env file in the root directory of the project.
- Add the following content to the .env file (an example with placeholder values is shown after the notes below):
AZURE_OPENAI_VERSION=
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
HUGGINGFACE_ACCESS_TOKEN=
- The Azure OpenAI API key is required for Text2Traj dataset generation and evaluation with ChatGPT.
- The Hugging Face access token is required for evaluation with LLaMA-2-7b.
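For illustration only, a filled-in .env might look like the following. All values are placeholders (the API version string in particular depends on your Azure OpenAI deployment) and must be replaced with your own credentials:
AZURE_OPENAI_VERSION=2024-02-01
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_API_KEY=<your Azure OpenAI API key>
HUGGINGFACE_ACCESS_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx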
To generate the Text2Traj dataset, follow these steps:
- Generate user captions:
docker exec text2traj2text python3 scripts/text2traj/generate_user_captions.py
- Generate purchase lists:
docker exec text2traj2text python3 scripts/text2traj/generate_purchase_list.py
- Generate paraphrases:
docker exec text2traj2text python3 scripts/text2traj/generate_paraphrasing.py
- Generate trajectories:
docker exec text2traj2text python3 scripts/text2traj/generate_trajectory.py
To run all the above steps sequentially, use:
bash scripts/generate_user_activity.sh
You can modify parameters such as num_generations and model_name directly in the script. For example, to generate 1000 data points using gpt-4o with a temperature of 0.7:
docker exec text2traj2text python3 scripts/text2traj/generate_user_captions.py num_generations=1000 model_name=gpt-4o temperature=0.7
Generated datasets are stored in the data/raw_data/<project_name> directory. To specify a different project name:
docker exec text2traj2text python3 scripts/text2traj/generate_user_captions.py project_name=your_project_name
After dataset generation, preprocess it before training:
bash scripts/preprocess.sh your_project_name
To evaluate using GPT series models (e.g., GPT-4, GPT-3.5-turbo):
docker exec text2traj2text python3 scripts/eval_chatgpt.py
To evaluate using open-source language models (e.g., LLaMA-2-7b):
docker exec text2traj2text python3 scripts/eval_llm.py
If you find our work useful in your research, please consider citing:
@inproceedings{asano2024text2traj2text,
title={{Text2Traj2Text}: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories},
author={Hikaru Asano and Ryo Yonetani and Taiki Sekii and Hiroki Ouchi},
  booktitle={Proceedings of the 17th International Natural Language Generation Conference},
year={2024},
pages={289--302},
}