Added new force list to README
tfcbertaglia committed Jan 14, 2018
1 parent c2fa0c9 commit 7e93325
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -70,6 +70,7 @@ oq, o que
kk
etc
```
On lines containing a comma, the text after the comma is used as the forced correction for the word before it. On all other lines, the word is simply forced through the normaliser's regular correction process.
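
As an illustration of the two line formats, here is a minimal sketch of how such a list could be parsed. It is not Enelvo's actual code: the function name `parse_force_list` and the use of `None` to mean "correct regularly" are assumptions made for this example.

```python
def parse_force_list(lines):
    """Map each listed word to its forced correction.

    Hypothetical helper, not part of Enelvo. A value of None means the
    word has no explicit correction and should be corrected regularly.
    """
    forced = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first comma only; the text after it is the forced correction.
        word, sep, correction = line.partition(",")
        forced[word.strip()] = correction.strip() if sep else None
    return forced


entries = parse_force_list(["oq, o que", "kk"])
```

Under these assumptions, `oq` maps to the forced correction `o que`, while `kk` maps to `None` and would be left to the normaliser.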

### Changing the Tokeniser
By default, the tokeniser used in Enelvo replaces some entities with pre-defined tags: Twitter usernames become ``USERNAME``, numbers (including dates, phone numbers, etc.) become ``NUMBER``, URLs become ``URL``, Twitter hashtags become ``HASHTAG``, emojis become ``EMOJI``, and so on.
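
The kind of replacement described above can be sketched with a few regular expressions. This is only a rough approximation and not Enelvo's tokeniser: the patterns, the `PATTERNS` table, and the function name `replace_entities` are all assumptions, and emoji handling is left out.

```python
import re

# Hypothetical patterns, applied in order. URLs come first so that digits
# inside them are not tagged as numbers afterwards.
PATTERNS = [
    (re.compile(r"https?://\S+"), "URL"),
    (re.compile(r"@\w+"), "USERNAME"),
    (re.compile(r"#\w+"), "HASHTAG"),
    # Digit groups joined by common separators, e.g. 14/01/2018 or 10:30.
    (re.compile(r"\d+(?:[./:-]\d+)*"), "NUMBER"),
]


def replace_entities(text):
    """Replace each matched entity in text with its tag."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text


tagged = replace_entities("@maria veja https://example.com/p/1 #top 14/01/2018")
```

Real tweets contain many cases these patterns miss (e-mail addresses and phone numbers with spaces, for instance), so treat this purely as an illustration of the tagging idea.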