Code-Mixed-Dialog

This repository contains the dataset used in our paper : Codeswitched Sentence Creation using Dependency Parsing.

The sentiment data for Hindi and Marathi are provided in the respective directories. We use the code provided in Sub-word-LSTM for training and testing.

Method Overview

Given a sentence we extract independent phrases using Stanford Parser. The independent phrases are translated using Google's On-device NMT and transliterated using Indic trans. The original phrases are replaced by these phrases in such a way that the CMI of the resulting codemixed sentence is maximum.

Dataset generation

We use the test set of Sentiment 140 as the original dataset for generating the codemixed dataset.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Code-Mixed-Dialog

Method Overview

Dataset generation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Code-Mixed-Dialog

Method Overview

Dataset generation