Meeting Agenda

2024-11-12

Go over the Apple PR
Discuss license

2024-10-29

Status of Apple code contribution - one more approval
The presentation went well

2024-10-15

Status of Apple code contribution
Finalizing UTW presentation
Gaming localization use case

2024-10-01

Status of Apple code contribution
Please add your name, position, etc. to the contributor slide (slide 16)

2024-09-17

Discuss state of Apple code contribution
Discuss UTW participation (talk was accepted for a 40-minute session)

2024-09-02

Canceled due to Labor Day in the USA; no agenda

2024-08-20

Status of Apple code contribution
UTW presentation (discuss abstract, what we want to cover)

2024-08-06

Short agenda after the break

Status of Apple code contribution
UTW presentation (does anybody want to co-present?)

Nebojša submitted a short abstract to the Unicode organizers:

"Noun inflection is an unsolved problem in message formatting/UI and affects 1.7B users from Slavic, Arabic, Hebrew, Indic and other languages. Most companies deploy UI workarounds that don't sound native or lose personalization available to English users.

I would like to evangelize the new Unicode WG effort and attract contributors, both engineers and linguists to help us scale to as many languages as possible."

2024-07-23 CANCELED

Many OOOs, no agenda, see email

2024-07-09 CANCELED

Many OOOs, no agenda, see email

2024-06-25

Covering Apple contribution (from George's email). These are the main parts of the wrapper.

Here are previous presentations that involve this wrapper code.

If Kyle joins, we can discuss:

Dictionary & Rules & ML approach
Check if there's a way to attract NLP students to help scale
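One way to read "Dictionary & Rules & ML approach" is as a layered fallback: consult a curated dictionary first, fall back to hand-written rules, and only then ask a statistical model. The sketch below assumes that reading; the `LayeredInflector` class, the `(ending, case, replacement)` rule format, the case labels and the `model` hook are all hypothetical, and the Serbian entries are illustrative only.

```python
from typing import Callable, Optional

# Hypothetical suffix-rule format: (lemma ending, target case, replacement ending).
Rule = tuple[str, str, str]


class LayeredInflector:
    """Hypothetical inflector: dictionary first, then suffix rules, then an optional model."""

    def __init__(
        self,
        dictionary: dict[tuple[str, str], str],
        rules: list[Rule],
        model: Optional[Callable[[str, str], Optional[str]]] = None,
    ):
        self.dictionary = dictionary  # (lemma, case) -> listed (e.g. irregular) form
        self.rules = rules            # tried in order; the first match wins
        self.model = model            # stand-in for an ML component, e.g. an LSTM

    def inflect(self, lemma: str, case: str) -> Optional[str]:
        # 1. A dictionary entry always wins, so irregular forms are never overridden.
        if (lemma, case) in self.dictionary:
            return self.dictionary[(lemma, case)]
        # 2. Otherwise apply the first rule whose case and ending match.
        for ending, rule_case, replacement in self.rules:
            if rule_case == case and lemma.endswith(ending):
                return lemma[: len(lemma) - len(ending)] + replacement
        # 3. Finally defer to the model, if one is configured.
        if self.model is not None:
            return self.model(lemma, case)
        return None


inflector = LayeredInflector(
    dictionary={("čovek", "nom.pl"): "ljudi"},     # irregular plural
    rules=[("a", "gen", "e"), ("a", "acc", "u")],  # a-stem nouns, singular
)
print(inflector.inflect("žena", "gen"))      # žene
print(inflector.inflect("čovek", "nom.pl"))  # ljudi
```

Keeping the dictionary ahead of the rules means a curated exception always wins, and the model only sees words that neither of the first two layers covers.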

2024-06-11 (CANCELED - too many OOOs)

Covering Apple contribution (from George's email). These are the main parts of the wrapper.

Here are previous presentations that involve this wrapper code.

If Kyle joins, we can discuss:

Dictionary & Rules & ML approach
Check if there's a way to attract NLP students to help scale

2024-05-28

George sent an email about Apple inflection code open-sourcing
Further discussion about FST & ML (LSTMs)
- There is an open-source library for FST training
- LSTM approach with >90% accuracy, code & video/paper
Potential contributors from academia (no solid news here)
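The FST side of the discussion can be illustrated without any library: a finite-state transducer reads the input one symbol at a time and emits output along each transition it takes. The sketch below is a toy, nondeterministic FST in plain Python, not the API of OpenFst, Pynini or the training library mentioned above; the `FST` class, the `final_a_rule` helper and the chosen endings are hypothetical. It rewrites a word-final "-a" as a case ending (Serbian žena → žene), and the last call shows how one input can yield several outputs.

```python
# Sentinel for a wildcard arc: it matches any input symbol and copies it to the output.
ANY = object()


class FST:
    """Toy nondeterministic finite-state transducer over single characters."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = set(finals)
        # Each arc is (source state, input symbol or ANY, output string, target state).
        self.arcs = arcs

    def transduce(self, word):
        """Return every output the transducer can produce for `word`, sorted."""
        configs = {(self.start, "")}  # all reachable (state, output so far) pairs
        for ch in word:
            step = set()
            for state, out in configs:
                for src, sym, emit, dst in self.arcs:
                    if src != state:
                        continue
                    if sym is ANY:
                        step.add((dst, out + ch))    # copy the symbol
                    elif sym == ch:
                        step.add((dst, out + emit))  # rewrite the symbol
            configs = step
        return sorted(out for state, out in configs if state in self.finals)


def final_a_rule(*endings):
    """Hypothetical rule: rewrite a word-final 'a' as one of `endings`."""
    arcs = [(0, ANY, "", 0)]  # copy any symbol and stay in state 0
    # "Guess" that this 'a' is the last symbol; state 1 is final and has no arcs,
    # so the guess survives only if the input really ends here.
    arcs += [(0, "a", ending, 1) for ending in endings]
    return FST(start=0, finals={1}, arcs=arcs)


genitive = final_a_rule("e")
print(genitive.transduce("žena"))                # ['žene']
print(genitive.transduce("kafana"))              # ['kafane'] (inner 'a's are copied)
print(genitive.transduce("grad"))                # [] (no final 'a': rejected)
print(final_a_rule("e", "i").transduce("žena"))  # ['žene', 'ženi']
```

Real toolkits compile such rules into optimized (often weighted) automata; tracking the set of live configurations here is just the simplest correct way to run a nondeterministic machine.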

2024-05-14

Getting month data from Wikidata (thanks, Denny)
- Lexemes https://w.wiki/A4ya
- Forms https://w.wiki/A4yg
- Labels https://w.wiki/A4yq
Serbian rules PR to showcase more complex rules
Rule generation using examples
Multiple results from the API: some words can inflect in several ways depending on context (this can be modeled with weighted FSTs), but higher-level logic needs to decide which form to use
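A minimal sketch of that higher-level decision, assuming the inflection component returns every candidate form with a weight where lower means more likely (the tropical-semiring convention that OpenFst uses by default) plus the grammatical features of each reading. The `Candidate` type, the `choose_form` helper, the feature names and all weights are hypothetical; the caller supplies the features the sentence slot requires, and the lowest-weight compatible candidate wins.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    form: str
    weight: float                      # lower = more likely (tropical-semiring convention)
    features: frozenset = frozenset()  # e.g. {"gen", "sg"}


def choose_form(candidates: Iterable[Candidate], required: set) -> Optional[str]:
    """Hypothetical higher-level logic: keep candidates that carry every required
    feature, then return the form with the lowest weight (None if nothing fits)."""
    compatible = [c for c in candidates if required <= c.features]
    if not compatible:
        return None
    return min(compatible, key=lambda c: c.weight).form


# Made-up weights for a few forms of Serbian "žena" (illustration only).
candidates = [
    Candidate("žene", 0.5, frozenset({"gen", "sg"})),
    Candidate("žene", 1.2, frozenset({"nom", "pl"})),
    Candidate("ženi", 0.7, frozenset({"dat", "sg"})),
    Candidate("ženi", 0.9, frozenset({"loc", "sg"})),
]
print(choose_form(candidates, set()))    # žene (best overall)
print(choose_form(candidates, {"dat"}))  # ženi (the context asks for dative)
print(choose_form(candidates, {"ins"}))  # None (no instrumental form listed)
```

Separating "generate all candidates" from "pick one using context" keeps the FST purely morphological, while the message-formatting layer owns the decision.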

2024-04-30

Go over PRs
Some projects/questions:
- Expand the lexicon - form1: attr1, attr2; form2: attr1, attr3;...
- Investigate pulling data from Wikidata (with a script?)
- Use FST model to work with dates in English (CLDR lexicon/dates)
- Add a more complex example using Pynini (Serbian/Russian?)
- An interesting quote from the FST book

"In our opinion, finite-state methods still play a central role in speech and language technologies and are not going away any time soon. At Google, the OpenFst and OpenGrm libraries remain absolutely essential for latency-sensitive applications like voice search, automated captions in YouTube, and the Google Assistant. Many Google engineers and linguists working on speech and language processing specialize in WFST algorithms or grammar development.

While we cannot speak to practices elsewhere in the tech industry, Pusateri et al. (2017) reports that Apple’s Siri assistant uses finite-state grammars, hybridized with a neural network for inverse normalization, i.e., to convert ASR transcripts to a human-readable form. The powerful Kaldi speech recognition toolkit, widely used by academic researchers, uses a WFST decoder, implemented with OpenFst.

Other technologies, including modern neural networks, have begun to encroach on the state of the art for speech technologies, and may ultimately render WFSTs obsolete, but such technologies still struggle to compete on latency, particularly for embedded platforms (e.g., mobile devices) lacking the specialized hardware needed to support large neural networks."
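The lexicon layout proposed in the list above ("form1: attr1, attr2; form2: attr1, attr3;...") is straightforward to load and query. A minimal parser, assuming forms and attributes are separated exactly by `:`, `,` and `;` and never contain those characters; the function names and the sample entry are hypothetical. A Wikidata import script could emit lines in the same format, since Wikidata lexeme forms carry grammatical features, but that mapping is not shown here.

```python
def parse_lexicon_entry(line: str) -> list[tuple[str, frozenset[str]]]:
    """Parse 'form1: attr1, attr2; form2: attr1, attr3;' into (form, attributes) pairs."""
    entries = []
    for chunk in line.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate the trailing ';' and empty chunks
        form, sep, attrs = chunk.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {chunk!r}")
        attributes = frozenset(a.strip() for a in attrs.split(",") if a.strip())
        entries.append((form.strip(), attributes))
    return entries


def lookup(entries, *wanted: str) -> list[str]:
    """Return every form whose attributes include all of `wanted`, in lexicon order."""
    want = set(wanted)
    return [form for form, attrs in entries if want <= attrs]


def format_lexicon_entry(entries) -> str:
    """Inverse of parse_lexicon_entry; attributes are sorted for stable output."""
    return "; ".join(f"{form}: {', '.join(sorted(attrs))}" for form, attrs in entries) + ";"


entry = parse_lexicon_entry("žena: nom, sg; žene: gen, sg; žene: nom, pl; ženu: acc, sg;")
print(lookup(entry, "gen"))             # ['žene']
print(lookup(entry, "nom"))             # ['žena', 'žene']
print(format_lexicon_entry(entry[:2]))  # žena: nom, sg; žene: gen, sg;
```

Storing attributes as sets makes "give me the form with these features" a subset test, and the same form can appear several times with different attribute sets (as "žene" does).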

2024-04-16

Go over PRs
Go over next steps, e.g. how to do inflection.

2024-04-02

Introductions
Go over the discussion

2024-03-19

Denny presents Wikidata
Review “Issues” and prioritize them

2024-03-07

Introduce members
Discuss operations, e.g. meeting cadence/duration
Discuss goals and non-goals
Go over issues
Discuss repository structure
