Utterance stacker: Quick improvements #1431

kayaulai · 2023-06-13T15:34:13Z

Currently, lines with no actual words are all put in one gigantic Utterance. I suggest simply removing this Utterance.
Very large gapUnits, say above 5, should be disallowed. Currently, the Utterance stacker produces some ridiculous gapUnits, which are difficult to detect without looking at the gapUnits themselves (a naive annotator might just assume those lines were assigned to different utterances).

JWD: See detailed comments below.

johnwdubois · 2023-06-15T22:43:24Z

My suggestion: (see #1446 )

Keep the current utterance concatenation rule (which concatenates successive units by the same speaker into one utterance), with the following exceptions:

Follow the concatenation rule only as long as all units in a sequence are verbal (unitType = verbal), NOT when they are nonverbal (see below).
Concatenate only while gapUnits < 6; otherwise, start a new utterance.

Classify units as {verbal, laugh, pause, vocalism, annotation, other}.

If a unit contains at least one word (kind = word), then unitType = verbal
Else, if it contains a laugh, then unitType = laugh
Else, if it contains a pause or in-breath (or both), then unitType = pause
Else, if it contains a vocalism, then unitType = vocalism
Else, if it contains ONLY annotation (e.g. transcriber's comments, glosses, etc.), then unitType = annotation
Else, unitType = other
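
The classification cascade above can be sketched as a small Python function. This is only an illustration under assumptions: the kind labels (`"word"`, `"laugh"`, `"pause"`, `"inbreath"`, `"vocalism"`, `"annotation"`) and the function name `classify_unit` are hypothetical and may not match the actual field values; only the priority order comes from the rules above.

```python
from typing import Iterable

# Priority order follows the rules above. The kind labels are assumed
# names for illustration, not confirmed field values.
PRIORITY = [
    ("verbal", {"word"}),
    ("laugh", {"laugh"}),
    ("pause", {"pause", "inbreath"}),
    ("vocalism", {"vocalism"}),
]


def classify_unit(kinds: Iterable[str]) -> str:
    """Return the unitType for a unit, given the kinds of its tokens."""
    present = set(kinds)
    # The first matching rule wins, so one word makes the unit verbal
    # even if the unit also contains a laugh or a pause.
    for unit_type, triggers in PRIORITY:
        if present & triggers:
            return unit_type
    # "ONLY annotation": nothing else may be present in the unit.
    if present and present <= {"annotation"}:
        return "annotation"
    return "other"
```

Because the first matching rule wins, a unit with a word and a laugh comes out as verbal, and an empty unit falls through to other.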

Assign utteranceType based on the unitType:

if all units in an utterance are verbal (unitType = verbal), then utteranceType = verbal

If a unit is nonverbal (unitType != verbal), then

if the next unit by the same participant has the same unitType, and gapUnits = 0, then extend the utterance to include it, and assign utteranceType to be the same as its component unitType value(s)
if the next unit has a different unitType, end the utterance, and assign utteranceType to be the same as its component unitType value(s)
(see Utterance stacker algorithm #1446 )
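
A minimal Python sketch of the stacking rules proposed in this comment, for discussion only. It assumes that each unit already carries a unitType, and that gapUnits counts the gap to the previous unit by the same participant; the names `Unit`, `Utterance`, and `stack_utterances` are hypothetical and not taken from the existing stacker.

```python
from dataclasses import dataclass, field

MAX_VERBAL_GAP = 6  # from the proposal: concatenate only while gapUnits < 6


@dataclass
class Unit:
    participant: str
    unit_type: str  # verbal, laugh, pause, vocalism, annotation, or other
    gap_units: int  # assumed: gap to the previous unit by this participant


@dataclass
class Utterance:
    participant: str
    utterance_type: str
    units: list = field(default_factory=list)


def stack_utterances(units):
    """Group units into utterances, one open utterance per participant."""
    utterances = []
    open_by_participant = {}
    for unit in units:
        current = open_by_participant.get(unit.participant)
        extend = False
        if current is not None and current.utterance_type == unit.unit_type:
            if unit.unit_type == "verbal":
                # Verbal units concatenate across small gaps.
                extend = unit.gap_units < MAX_VERBAL_GAP
            else:
                # Nonverbal units concatenate only when directly adjacent.
                extend = unit.gap_units == 0
        if extend:
            current.units.append(unit)
        else:
            # A type change or too large a gap starts a new utterance,
            # whose utteranceType is the unitType of its units.
            current = Utterance(unit.participant, unit.unit_type, [unit])
            utterances.append(current)
            open_by_participant[unit.participant] = current
    return utterances
```

In this sketch every utterance is homogeneous in unitType, so utteranceType is simply copied from the first unit. Lines with no words never pile up in one gigantic utterance, because a nonverbal utterance is extended only when gapUnits = 0.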

kayaulai · 2023-06-20T23:38:07Z

I'm uncertain about using kind = word, because I fear that will make the stacker too SBC-specific.

johnwdubois · 2023-07-02T09:21:27Z

Point taken. Still, reference to "kind = word" is just one way to describe the algorithm/pseudocode.
The same effect can be gotten by writing a little routine that does the same thing (presumably with a higher error rate, but all you really need is to recognize one word per IU to get the main benefit).
(see #1446 )
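
One possible shape for such a "little routine", sketched in Python. It rests on a purely hypothetical convention: a whitespace-separated token counts as a word if it begins with a letter, so tokens that start with symbols such as `@`, `(`, or `.` are treated as non-words. Real transcription conventions will differ, and the function name `has_word` is made up for illustration.

```python
import re

# Hypothetical heuristic: a token is a word if it starts with a letter.
# [^\W\d_] matches any Unicode letter (a word character that is neither
# a digit nor an underscore).
STARTS_WITH_LETTER = re.compile(r"[^\W\d_]")


def has_word(unit_text: str) -> bool:
    """Return True if at least one token in the unit looks like a word."""
    return any(STARTS_WITH_LETTER.match(token) for token in unit_text.split())
```

Under this assumption, `has_word("@@@ (H)")` is false and `has_word("... well ,")` is true. Words that begin with a symbol would be missed, which is the kind of higher error rate mentioned above.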

kayaulai added the enhancement label Jun 13, 2023

kayaulai self-assigned this Jun 13, 2023

johnwdubois added this to To do in Core via automation Jun 13, 2023

johnwdubois added the polish label and removed the enhancement label Jun 13, 2023

johnwdubois moved this from To do to In progress in Core Jun 13, 2023

johnwdubois added the enhancement label and removed the polish label Jun 15, 2023

johnwdubois moved this from In progress to To do in Core Jun 19, 2023

johnwdubois mentioned this issue Apr 18, 2024

Utterance stacker algorithm #1446

Open

johnwdubois changed the title ~~Quick improvements to Utterance stacker~~ Utterance stacker: Quick improvements Apr 18, 2024

johnwdubois mentioned this issue Apr 18, 2024

Utterance (a.k.a. prosodic sentence) as unit or stack #214

Open
