Get words from text by grouping together characters from Unicode letter groups (Lu, Ll, Lt, Lm, Lo).
npm install uwords
var uwords = require('uwords');
var words = uwords('Привет - Hi');
// words is now [ 'Привет', 'Hi' ]
Returns the words in the given text by scanning it and grouping together consecutive characters that belong to one of the Unicode Letter categories: Lu, Ll, Lt, Lm, Lo.
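To illustrate the grouping rule, here is a minimal sketch that approximates it with a native Unicode property escape. It is not the uwords implementation: `extractWords` is a hypothetical name, and the sketch assumes a runtime that supports `\p{L}` with the `u` flag (for example Node.js 10 or later).

```javascript
// Sketch only: approximates the described behavior with a native regex.
// `extractWords` is a hypothetical name, not part of the uwords API.
function extractWords(text) {
  // \p{L} matches any code point in the Letter categories Lu, Ll, Lt, Lm, Lo.
  var matches = text.match(/\p{L}+/gu);
  // String.prototype.match returns null when nothing matches.
  return matches === null ? [] : matches;
}

console.log(extractWords('Привет - Hi'));
// prints [ 'Привет', 'Hi' ]
```

Anything outside the Letter categories (spaces, punctuation, digits) acts as a separator in this sketch, so `'abc123def'` yields two words.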
Run the following command to benchmark the library on your system:
grunt benchmark
Sample output:
mb:uwords alex$ grunt benchmark
Running "benchmark" task
size=10000000 words=2000000 time=1056ms
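A measurement of this shape could be set up roughly as follows. This is a sketch under assumptions, not the repository's grunt task: the synthetic input (four-letter words separated by spaces), the helper names `makeText` and `runBenchmark`, and the regex-based `extractWords` stand-in are all hypothetical.

```javascript
// Hypothetical stand-in for the function under test; not the uwords code.
function extractWords(text) {
  return text.match(/\p{L}+/gu) || [];
}

// Assumed input shape: `count` four-letter words, each followed by a space.
function makeText(count) {
  return 'word '.repeat(count);
}

// Times one extraction pass and reports the same fields as the sample output.
function runBenchmark(count) {
  var text = makeText(count);
  var start = Date.now();
  var words = extractWords(text);
  var elapsed = Date.now() - start;
  return { size: text.length, words: words.length, time: elapsed };
}

var result = runBenchmark(200);
console.log('size=' + result.size + ' words=' + result.words + ' time=' + result.time + 'ms');
```

Timings from a single pass vary with machine and runtime, so repeat the run several times before comparing numbers.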
Words can also be extracted from text with the extended regular expression library XRegExp. However, it is about 1500x slower.
Run the following command to compare the libraries on your system:
grunt compare-uwords-xregexp
Sample output:
mb:uwords alex$ grunt compare-uwords-xregexp --stack
Running "compare-uwords-xregexp" task
library=uwords size=1000 words=200 time=2ms
library=xregexp size=1000 words=200 time=3384ms
You need to install XRegExp to run this comparison.