amanjain1397/naamkaran


naamkaran

Pseudo-random word generation using a Markov process.

A Python implementation of foswig.js. It lets you create pseudo-random words based on arbitrary dictionaries using Markov chains.
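To make the idea concrete, here is a minimal sketch of a character-level Markov chain of a given order. It is not the code in main.py: the names build_chain and generate_word and the padding symbols "^" and "$" are assumptions made for illustration, and the sketch assumes those two symbols never occur in the corpus.

```python
import random
from collections import defaultdict

START = "^"  # hypothetical padding symbol marking the start of a word
END = "$"    # hypothetical symbol marking the end of a word


def build_chain(words, order=2):
    """Map each `order`-character context to the characters observed after it."""
    chain = defaultdict(list)
    for word in words:
        padded = START * order + word + END
        for i in range(len(padded) - order):
            context = padded[i:i + order]
            chain[context].append(padded[i + order])
    return chain


def generate_word(chain, order=2, rng=random):
    """Walk the chain from the start context until the end symbol is drawn."""
    context = START * order
    letters = []
    while True:
        nxt = rng.choice(chain[context])
        if nxt == END:
            return "".join(letters)
        letters.append(nxt)
        # Slide the context window forward by one character.
        context = (context + nxt)[-order:]


if __name__ == "__main__":
    corpus = ["the roses", "the clash", "the cult", "the kinks"]
    chain = build_chain(corpus, order=2)
    print([generate_word(chain, order=2) for _ in range(5)])
```

A higher order keeps generated words closer to the corpus, while a lower order produces more novel and less word-like output.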

Getting Started

  • Clone this repo:
git clone https://github.com/amanjain1397/naamkaran.git
cd naamkaran
  • Usage
python main.py --h
usage: main.py [-h] --dataroot DATAROOT [--order ORDER]
               [--minLength MINLENGTH] [--maxLength MAXLENGTH]
               [--allowDuplicates ALLOWDUPLICATES] [--maxAttempts MAXATTEMPTS]
               [--noOfWords NOOFWORDS] [--toLower TOLOWER]
               [--removeSpecial REMOVESPECIAL]
  --maxLength MAXLENGTH
                        The maximum length of the generated word (default: -1)
  --allowDuplicates ALLOWDUPLICATES
                        Whether generated words are allowed to be the same as
                        words in the input dictionary (default: False)
  --maxAttempts MAXATTEMPTS
                        The maximum number of attempts to generate a word
                        before failing and throwing an error (default: 100)
  --noOfWords NOOFWORDS
                        Number of words to be generated (default: 50)
  --toLower TOLOWER     lower case the vocabulary (default: False)
  --removeSpecial REMOVESPECIAL
                        retains only alphabetic chars (default: False)
  --exceptionSymbols EXCEPTIONSYMBOLS
                        retain the string of the symbols, e.g. "@$#" (default: '')
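The length, duplicate, and attempt options above can be read as a rejection loop around the generator. The sketch below is one plausible interpretation, not the code in main.py. It assumes that a maxLength of -1 means "no upper limit" and that minLength defaults to 0, since the help text does not state either. The function name generate_with_constraints and its make_word argument are hypothetical.

```python
def generate_with_constraints(make_word, dictionary, min_length=0, max_length=-1,
                              allow_duplicates=False, max_attempts=100):
    """Call make_word() until a candidate satisfies every constraint.

    make_word is any zero-argument callable returning a candidate string.
    """
    known = set(dictionary)
    for _ in range(max_attempts):
        word = make_word()
        if len(word) < min_length:
            continue  # too short
        if max_length > 0 and len(word) > max_length:
            continue  # too long (assumes -1 disables this check)
        if not allow_duplicates and word in known:
            continue  # identical to a word in the input dictionary
        return word
    raise RuntimeError(
        f"no acceptable word generated in {max_attempts} attempts"
    )


if __name__ == "__main__":
    candidates = iter(["a", "averyveryverylongword", "cat", "dog"])
    word = generate_with_constraints(
        lambda: next(candidates), dictionary=["cat"],
        min_length=2, max_length=12,
    )
    print(word)
```

Tight constraints combined with a small corpus make rejections more likely, which is why a cap on attempts is useful.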

The --dataroot argument is the path to the corpus text file. A sample corpus file can be found at ./sample.txt.
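As a rough illustration of how a corpus file might be loaded and cleaned, the sketch below applies the toLower, removeSpecial, and exceptionSymbols options. It assumes the corpus holds one entry per line, which the README does not state, and the helper names clean_word and load_corpus are hypothetical rather than taken from main.py.

```python
def clean_word(word, to_lower=False, remove_special=False, exception_symbols=""):
    """Apply the optional normalisation steps to a single corpus entry."""
    if to_lower:
        word = word.lower()
    if remove_special:
        # Keep alphabetic characters plus any explicitly allowed symbols.
        word = "".join(
            ch for ch in word if ch.isalpha() or ch in exception_symbols
        )
    return word


def load_corpus(path, **options):
    """Read one entry per line (an assumption) and drop entries that end up empty."""
    with open(path, encoding="utf-8") as handle:
        words = [clean_word(line.strip(), **options) for line in handle]
    return [w for w in words if w]


if __name__ == "__main__":
    print(clean_word("The B-52's", to_lower=True, remove_special=True,
                     exception_symbols=" "))
```

Passing a space as an exception symbol keeps multi-word entries, such as band names, intact.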

Working Example

Corpus text files can be found in ./data/. This example uses the music band names corpus, stored at ./data/band.names.txt.

python main.py --dataroot ./data/band.names.txt --order 4 --minLength 2 --maxLength 12 --toLower 1 --removeSpecial 1 --exceptionSymbols " "
['jackstrees', 'dr johanson', 'cyndicate', 'billy full', 'dave club', 'supertramps', 'kittyhawks', 'the roses', 'sunnymen', 'pete tombs', 'publime', 'tim bucking', 'bobby voices', 'palacement', 'rocked', 'diamones', 'lively darin', 'deeelies', 'the miller', 'the flag', 'spacements', 'soundgren', 'the matthews', 'the clark', 'joe who', 'bachard', 'peter four', 'the club', 'televators', ' manic young', 'dick drake', 'steppelin', 'judas prings', 'graham parks', 'gerry mann', 'bobby voices', 'peter cult', 'the marship', 'creeders', 'jon spectors', 'marians', 'bachard', 'the cochran', 'gene vega', 'the league', 'sonny puppy', 'diamondays', 'rocked', 'happenwolf', 'warrett']

A mixed corpus consisting of words from different sources can also be used.

More Info

Learn about Tries from here.
