TopicDics

TopicDisc is an unsupervised framework for jointly modeling topic content and discourse behavior in microblog conversations.

More details can be referred to:

Jichuan Zeng, Jing Li, Yulan He, Cuiyun Gao, Michael Lyu, Irwin King. What You Say and How You Say it: Joint Modeling of Topics andDiscourse in Microblog Conversations. TACL 2019.

Input Data

TREC-11 and TWT-16 Dataset are in data/twitter-conv/. We remove the text of each tweet according to Twitter's policy. One needs to download the tweets with the given tweet ID by Twitter API. All the Tweets text should be put in text_lst field of twitter_m.json accordingly.

For example:

"meta_lst": [
        {
            "id": "816147858076241920"
        },
        {
            "id": "816175837737123841"
        },
        {
            "id": "816190727323459584"
        },
        {
            "id": "816192396945924096"
        }
            ],
"text_lst": [
        "<hash> When you realize TransCult thinks Radical Feminism is more \"degenerate\" than the KKK &amp; NeoNazis.\u2026 <url>", 
        "<men> Kind of interesting trans cult is using KKK/Nazi rhetoric now -- worried that their bullying is only getting them so far? \ud83d\ude02", 
        "<men> <men> LOL. That's so funny.  This must really tap into some primitive fear. Witches zooming off to cavort with satan\ud83d\ude02", 
        "<men> <men> &lt;cackles&gt; \ud83d\ude02"
]

Usage

After download all the tweets, save the data file as 'twitter.json'.

You can run the main code as:

$ python topic_disc.py

You can visualize the topic and discourse distribution among words by enabling --output_vis True.

$ python topic_disc.py --output_vis True
$ cd vis/
$ python vis_atten.py   # you should put the gen_sample.txt to this folder.

The visualization looks like this:

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data/twitter-conv		data/twitter-conv
dataset		dataset
models		models
vis		vis
LICENSE		LICENSE
README.md		README.md
case_disc.png		case_disc.png
criterions.py		criterions.py
engine.py		engine.py
gen_utils.py		gen_utils.py
topic_disc.py		topic_disc.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TopicDics

Input Data

Usage

About

Releases

Packages

Languages

License

zengjichuan/Topic_Disc

Folders and files

Latest commit

History

Repository files navigation

TopicDics

Input Data

Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages