The utterance stacker produces some bad utterances. In some cases they are way too long (see #1431).
Proposal
Keep the current utterance concatenation rule (which concatenates successive units by the same speaker into one utterance), with the following exceptions:
Follow the concatenation rule only as long as all units in the sequence are verbal (unitType = verbal), NOT when they are non-verbal (see below).
Concatenate only while gapUnits < 6; otherwise, start a new utterance.
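The two exceptions above can be sketched in Python as follows. This is a minimal, hypothetical illustration: `Unit`, `stack_verbal`, and the field names are invented here, unitType is assumed to be already computed for each unit, gapUnits is assumed to be a precomputed integer on each unit giving its gap from the previous unit, and only adjacent units in transcript order are considered for merging. At this stage every non-verbal unit simply becomes its own utterance; grouping of non-verbal runs is handled by the utteranceType rules.

```python
from dataclasses import dataclass
from typing import List

GAP_LIMIT = 6  # the gapUnits < 6 threshold from the proposal


@dataclass
class Unit:
    speaker: str    # participant who produced the unit
    unit_type: str  # assumed precomputed: "verbal", "laugh", "pause", ...
    gap_units: int  # hypothetical: gap between this unit and the previous one


def stack_verbal(units: List[Unit]) -> List[List[Unit]]:
    """Group adjacent units into utterances (lists of units)."""
    utterances: List[List[Unit]] = []
    for unit in units:
        if utterances:
            prev = utterances[-1][-1]
            # Extend only verbal-after-verbal by the same speaker within the gap limit.
            can_extend = (
                unit.speaker == prev.speaker
                and unit.unit_type == "verbal"
                and prev.unit_type == "verbal"
                and unit.gap_units < GAP_LIMIT
            )
            if can_extend:
                utterances[-1].append(unit)
                continue
        # Otherwise start a new utterance.
        utterances.append([unit])
    return utterances
```

With this rule, a long stretch of speech broken by a gap of 6 or more, or by any laugh or pause, is split instead of being stacked into one oversized utterance.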
Classify units as {verbal, laugh, pause, vocalism, annotation, other}.
If a unit contains at least one word (kind = word), then unitType = verbal
Else, if it contains a laugh, then unitType = laugh
Else, if it contains a pause or in-breath (or both), then unitType = pause
Else, if it contains a vocalism, then unitType = vocalism
Else, if it contains ONLY annotation (e.g. transcriber's comments, glosses, etc.), then unitType = annotation
Else, unitType = other
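The classification cascade above is an ordered series of checks, where the first match wins. A minimal sketch, assuming each unit is represented by the set of `kind` values of its elements; the function name `classify_unit` and the kind labels other than `word` (e.g. `"inbreath"`, `"annotation"`) are hypothetical placeholders for whatever the corpus actually uses.

```python
from typing import Iterable


def classify_unit(kinds: Iterable[str]) -> str:
    """Return the unitType for a unit, given the kinds of its elements."""
    kinds = set(kinds)
    if "word" in kinds:                   # at least one word
        return "verbal"
    if "laugh" in kinds:
        return "laugh"
    if kinds & {"pause", "inbreath"}:     # pause, in-breath, or both
        return "pause"
    if "vocalism" in kinds:
        return "vocalism"
    if kinds == {"annotation"}:           # ONLY annotation
        return "annotation"
    return "other"
```

Because the checks run in order, a unit containing both a word and a laugh is classified as verbal, and a unit mixing annotation with some unlisted kind falls through to other.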
Assign utteranceType based on the unitType:
If all units in an utterance are verbal (unitType = verbal), then utteranceType = verbal.
If a unit is non-verbal (unitType != verbal), then:
If the next unit by the same participant has the same unitType and gapUnits = 0, extend the utterance to include it, and set utteranceType to the unitType shared by its component units.
If the next unit has a different unitType, end the utterance, and set utteranceType to the unitType shared by its component units.
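These rules can be sketched as a single pass over the units. This is a hypothetical illustration: `Unit`, `Utterance`, `build_utterances`, and the field names are invented here, unit types are assumed to be already classified, gapUnits is assumed to be a precomputed integer per unit, and "the next unit by the same participant" is read as the adjacent unit in transcript order. Verbal units reuse the gapUnits < 6 limit from the concatenation exceptions; non-verbal units merge only when gapUnits = 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    speaker: str
    unit_type: str  # assumed precomputed unitType
    gap_units: int  # hypothetical: gap from the previous unit


@dataclass
class Utterance:
    speaker: str
    utterance_type: str  # always equals the unitType of its components
    units: List[Unit]


def build_utterances(units: List[Unit], gap_limit: int = 6) -> List[Utterance]:
    utterances: List[Utterance] = []
    for unit in units:
        cur = utterances[-1] if utterances else None
        same_run = (
            cur is not None
            and cur.speaker == unit.speaker
            and cur.utterance_type == unit.unit_type
        )
        if same_run:
            if unit.unit_type == "verbal":
                extend = unit.gap_units < gap_limit  # verbal: gap below limit
            else:
                extend = unit.gap_units == 0         # non-verbal: no gap at all
            if extend:
                cur.units.append(unit)
                continue
        # Different speaker, different type, or gap too large: new utterance.
        utterances.append(Utterance(unit.speaker, unit.unit_type, [unit]))
    return utterances
```

Since an utterance only ever absorbs units of its own type, its utteranceType is always the unitType shared by its components, so runs of laughter or pauses cannot inflate a verbal utterance.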
If the corpus transcription data lacks annotation for one or more of the features referenced above, create an automatic classification algorithm that achieves the same effect. For example, replace:
kind=word => an algorithm that tests for the presence/absence of alphabetic characters (or another strategy, depending on the language)
kind=laugh => an algorithm that tests for the presence of the @ sign, or of another user-specified laughter symbol
etc.
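A minimal sketch of such fallback tests for a single token, assuming tokens are plain strings. `guess_kind` is a hypothetical helper; it uses Python's Unicode-aware `str.isalpha()` as the "alphabetic characters" test, which is only one possible strategy and may not suit every language or transcription convention.

```python
def guess_kind(token: str, laugh_symbol: str = "@") -> str:
    """Guess an element's kind when the corpus has no kind annotation."""
    # kind=word: the token contains at least one alphabetic character
    # (str.isalpha is Unicode-aware, so non-Latin scripts also count).
    if any(ch.isalpha() for ch in token):
        return "word"
    # kind=laugh: no letters, but the user-specified laughter symbol is present.
    if laugh_symbol in token:
        return "laugh"
    return "other"
```

Checking for letters first mirrors the unit-level precedence of verbal over laugh: a token such as `ye@s` is treated as a word, while `@@` is treated as laughter.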