Standard handler for Numbers #334

Open
aaronchantrill opened this issue Apr 4, 2021 · 2 comments

@aaronchantrill
Contributor

Detailed Description

We would like to provide a special system keyword for numbers, as opposed to plugin-author-defined keywords like {ColorKeyword} or {DayKeyword}, because recognizing and parsing a number is both more complex and extremely common. I propose using either square brackets ("[NUMBER]") or colons ("{:NUMBER:}") to distinguish system keywords from plugin keywords. Eventually I would like to have system keywords for Number, Date, and Time, and I'm sure others will arise as we work on them.

Context

Right now, it would be difficult for an author to simply ask for a number in a template. For instance, "WHAT IS {NumberKeyword} PLUS {NumberKeyword}" would require listing every possible number in the template itself:
NumberKeyword: [ONE, TWO, THREE, ...]
This would be incredibly time consuming, and once the template is parsed into its expanded form, it would grow by as many lines as the numbers you added. In addition, there are numerous ways to say each number: one person might say 'ONE NINE SIX FIVE', another 'ONE THOUSAND NINE HUNDRED SIXTY FIVE', another 'NINETEEN SIXTY FIVE', etc. This quickly becomes overwhelming.

Possible Implementation

There are rules for how numbers can be constructed. You might say, for instance, "ONE HUNDRED THOUSAND", but you wouldn't say just "THOUSAND". Since most language models are based on trigrams, I should be able to generate a set of trigrams for spoken numbers (ONE, ONE HUNDRED, ONE HUNDRED THOUSAND, SEVENTEEN OH ONE, SEVENTEEN THOUSAND AND, etc.) and then insert into the basic template only a list of the words that may appear first or last in a number. This should allow the language model to insert the full trigram model in its place.
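A minimal sketch of the trigram idea, assuming we start from a hand-written list of sample number phrases (the `SAMPLE_PHRASES` list, the `<s>`/`</s>` boundary markers, and the function name are illustrative, not part of any existing code):

```python
from collections import Counter

# Hypothetical sample of spoken-number phrases; a real list would be generated.
SAMPLE_PHRASES = [
    "ONE",
    "ONE HUNDRED",
    "ONE HUNDRED THOUSAND",
    "SEVENTEEN OH ONE",
]


def trigrams(phrase):
    """Return the word trigrams of a phrase, padded with boundary markers."""
    words = ["<s>"] + phrase.split() + ["</s>"]
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


# Count every trigram seen across the sample phrases.
counts = Counter(t for p in SAMPLE_PHRASES for t in trigrams(p))

# Words that may open or close a number; only these would go into the template.
first_words = sorted({p.split()[0] for p in SAMPLE_PHRASES})
last_words = sorted({p.split()[-1] for p in SAMPLE_PHRASES})
```

Only `first_words` and `last_words` would need to appear in the basic template; the counted trigrams would be merged into the language model separately.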

@aaronchantrill
Contributor Author

I've been working on this somewhat at Numbers.

The following are basic number words:
'ZERO',
'ONE',
'TWO',
'THREE',
'FOUR',
'FIVE',
'SIX',
'SEVEN',
'EIGHT',
'NINE',
'TEN',
'ELEVEN',
'TWELVE',
'THIRTEEN',
'FOURTEEN',
'FIFTEEN',
'SIXTEEN',
'SEVENTEEN',
'EIGHTEEN',
'NINETEEN',
'TWENTY',
'THIRTY',
'FORTY',
'FIFTY',
'SIXTY',
'SEVENTY',
'EIGHTY',
'NINETY',
'HUNDRED',
'THOUSAND',
'MILLION',
'BILLION',
'QUADRILLION'

'A' can be considered a number word if directly followed by a number (other than ONE):
'A HUNDRED'

'OH' can be considered a number word if it occurs next to another number:
NINE OH TWO ONE OH

'AND' can be considered a number word if it is both preceded and followed by a number word:
TWO AND TWENTY
TWO AND A HALF
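
The context rules above could be sketched roughly like this. The function names are hypothetical, and the check only looks at the basic number words, so it does not cover cases like 'TWO AND A HALF' (HALF is not in the basic list) or two adjacent OHs:

```python
# Basic number words taken from the list in this comment.
NUMBER_WORDS = set("""
ZERO ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN ELEVEN TWELVE
THIRTEEN FOURTEEN FIFTEEN SIXTEEN SEVENTEEN EIGHTEEN NINETEEN TWENTY
THIRTY FORTY FIFTY SIXTY SEVENTY EIGHTY NINETY HUNDRED THOUSAND
MILLION BILLION QUADRILLION
""".split())


def is_number_word(words, i):
    """Decide whether words[i] counts as a number word in its context."""
    word = words[i]
    if word in NUMBER_WORDS:
        return True
    prev_is = i > 0 and words[i - 1] in NUMBER_WORDS
    next_is = i + 1 < len(words) and words[i + 1] in NUMBER_WORDS
    if word == "A":  # 'A HUNDRED', but not 'A ONE'
        return next_is and words[i + 1] != "ONE"
    if word == "OH":  # 'NINE OH TWO ONE OH'
        return prev_is or next_is
    if word == "AND":  # 'TWO AND TWENTY'
        return prev_is and next_is
    return False


def number_spans(text):
    """Return each run of consecutive number words as one string."""
    words = text.upper().split()
    spans, current = [], []
    for i, word in enumerate(words):
        if is_number_word(words, i):
            current.append(word)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

For example, `number_spans("A HUNDRED AND TWENTY DOGS AND A CAT")` keeps the first 'A' and 'AND' inside the number but leaves the later ones alone.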

I think what we need to do is create a special placeholder for numbers, then scan the transcription for number words and replace them with the placeholder before passing the transcription to the intent parser. After the intent parser does its work, the numbers are placed back into a NUMBERS match group, which lists all the numbers.
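
Roughly, the replace-then-restore step might look like the sketch below. The placeholder token, the abbreviated word list, and both function names are assumptions made for illustration; the real intent parser is not called here.

```python
import re

# Abbreviated word list for the example; the full list would be used in practice.
_WORDS = "ONE|TWO|THREE|TEN|TWENTY|HUNDRED"
# One or more number words separated by whitespace.
NUMBER_RUN = re.compile(r"\b(?:%s)(?:\s+(?:%s))*\b" % (_WORDS, _WORDS))
PLACEHOLDER = "NUMBERPLACEHOLDER"  # hypothetical token


def extract_numbers(transcription):
    """Swap each run of number words for the placeholder and collect the runs."""
    numbers = []

    def _swap(match):
        numbers.append(match.group(0))
        return PLACEHOLDER

    return NUMBER_RUN.sub(_swap, transcription), numbers


def attach_numbers(intent_matches, numbers):
    """Add the collected numbers to the parser's matches as a NUMBERS group."""
    matches = dict(intent_matches)
    matches["NUMBERS"] = numbers
    return matches
```

The first return value of `extract_numbers` is what would be handed to the intent parser; `attach_numbers` would run on whatever matches the parser returns.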

I'm not sure how to handle number associations. For example:
Count to 100 from 1 -- ONE, TWO, THREE, ... , ONE HUNDRED
Count from 100 to 1 -- ONE HUNDRED, NINETY NINE, NINETY EIGHT, ... ONE
Count from 1 to 100 -- ONE, TWO, THREE, ... , ONE HUNDRED
Count to 1 from 100 -- ONE HUNDRED, NINETY NINE, NINETY EIGHT, ... ONE

In this case the order of the numbers and the order of the prepositions both matter and can change the meaning of the command. Unfortunately, I think the plugin author would have to analyze the exact phrase to determine the beginning and ending points of the count.
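
As a sketch of what that analysis could look like on the plugin side, assuming the spoken numbers have already been converted to digits (the function name is hypothetical):

```python
import re


def count_sequence(phrase):
    """Work out the counting sequence from the FROM/TO prepositions."""
    start = re.search(r"\bfrom\s+(\d+)", phrase, re.IGNORECASE)
    end = re.search(r"\bto\s+(\d+)", phrase, re.IGNORECASE)
    if not (start and end):
        raise ValueError("phrase needs both a 'from' and a 'to' number")
    first, last = int(start.group(1)), int(end.group(1))
    # Count down when the starting number is larger than the ending number.
    step = 1 if last >= first else -1
    return list(range(first, last + step, step))
```

Here the preposition in front of each number decides its role, so "Count to 5 from 1" and "Count from 1 to 5" give the same sequence regardless of word order.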

aaronchantrill added the Hacktoberfest, Status: Available, Type: Enhancement, Priority: Low, and Status: In Progress labels and removed the Hacktoberfest and Status: Available labels on Sep 20, 2021
aaronchantrill self-assigned this on Sep 24, 2021
@aaronchantrill
Contributor Author

I'm planning to do slot types now. If you want a numeric value, you would create a template like
"Count from {from:number} to {to:number}"
When we pass the templates to the intent parser, we will replace each typed slot such as {from:number} with just a generic number placeholder, so the intent parser will see "Count from" and "to" each followed by the same placeholder.

Then a pre-parser will use regex to look for any groups of number words in the utterance, put them into the matches for the variant, and replace their original locations with that same placeholder. The utterance we pass to the intent parser will therefore be in the same placeholder form, which should match the correct template.

Once the template is matched, the slot names ("from" and "to") will be looked up and paired with the extracted numbers.
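
A rough sketch of the whole slot-type flow follows. The placeholder token, the abbreviated word list, and all function names are assumptions, and a plain string comparison stands in for the real intent parser:

```python
import re

SLOT = re.compile(r"\{(\w+):number\}")
PLACEHOLDER = "__NUMBER__"  # hypothetical placeholder token
_WORDS = "ONE|TWO|THREE|TEN|TWENTY|HUNDRED"  # abbreviated for the example
NUMBER_RUN = re.compile(
    r"\b(?:%s)(?:\s+(?:%s))*\b" % (_WORDS, _WORDS), re.IGNORECASE
)


def compile_template(template):
    """Replace typed slots with the placeholder; return the slot names in order."""
    names = SLOT.findall(template)
    return SLOT.sub(PLACEHOLDER, template), names


def preparse(utterance):
    """Replace runs of number words with the placeholder; return the runs."""
    numbers = NUMBER_RUN.findall(utterance)
    return NUMBER_RUN.sub(PLACEHOLDER, utterance), numbers


def match_template(template, utterance):
    """Pair slot names with numbers if the placeholder forms agree."""
    pattern, names = compile_template(template)
    text, numbers = preparse(utterance)
    # Stand-in for the intent parser: exact, case-insensitive comparison.
    if pattern.upper() != text.upper():
        return None
    return dict(zip(names, numbers))
```

With this, `match_template("Count from {from:number} to {to:number}", "COUNT FROM ONE HUNDRED TO ONE")` pairs "from" with "ONE HUNDRED" and "to" with "ONE". The pairing is purely positional, so it relies on the numbers appearing in the same order as the slots in the matched template.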

It will be the responsibility of the template author to check that the numbers returned are in the correct range.
