Rappers

Automatic rap lyric generation tool

Requirements

# pip install beautifulsoup
python getlyrics.py -v > output.tsv

Extract lyrics archive, then run the following command to obtain a file data/juman_input.txt:

python preprocess.py -crawl data/lyrics_shonan_s27_raw.tsv

juman < data/juman_input.txt > data/juman_out.txt

python preprocess.py -juman data/juman_out.txt

The preprocessing step is finished. You will have three files in the /data folder:

string_corpus.txt as a string corpus file for LSTM training (one sentence per line), each song is separated from the previous one by one line
hiragana_corpus.txt as a hiragana corpus file for FFNN training (one sentence per line), each song is separated from the previous one by one line
daihyou_vocab.p file as a vocabulary file (keys correspond to surface forms, values to 代表表記) - this is used to lookup the embeddings during the LSTM training

inv train model

inv test model

run the command below at the directory chainer_model

python train_lstm_lm.py (--gpu 0)

You should use gpu to train (this code is very slow on cpu)

python generate_seq.py --model trained_model -O output_file N 10000

Make term-rhyme table using data/string_corpus.txt and data/hiragana_corpus.txt

python features/make_term_vowel_table.py -v --unknown-terms <path-to-unknown-terms:optional> > <path-to-output-table>

data/term_vowel_table.csv: term to vowel table (each row has term,vowels)
data/unknown_terms.txt: terms that did not have hiragana form in data/hiragana_corpus.txt. Currently they are filtered out from the table above

python NextLine.py -f data/sample_nextline_prediction_candidates.txt

After the processing, you will have the result test_lyrics.txt.

Note: You may need to comment out the lines below in NextLine.py

if __name__ == "__main__":
    ...
            temp.pop(0)
            temp.pop(-1)

Name		Name	Last commit message	Last commit date
Latest commit History 154 Commits
NNLM		NNLM
api		api
chainer_model		chainer_model
data		data
preprocess		preprocess
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE.md		LICENSE.md
NextLine.py		NextLine.py
README.md		README.md
feature_extract.py		feature_extract.py
feature_extract.sh		feature_extract.sh
getlyrics.py		getlyrics.py
make_features.py		make_features.py
preprocess.py		preprocess.py
rhyme.py		rhyme.py
utils.py		utils.py