Marks particle as error instead of the preceding Err/Orth of the same mwe #45

Open
duomdaamaendra opened this issue Jan 28, 2022 · 10 comments


duomdaamaendra commented Jan 28, 2022

[Screenshot, 2022-01-28 at 04:17:25]

(↑ is divvun/divvun-gramcheck-web#18 , ↓ is this issue)

duomdaamaendra commented

[Screenshot, 2022-01-28 at 04:25:50]

This example is from an erroneous word (correct: "-diehtagis"), but the marked part is correct: it is the non-enclitic particle "gis".


snomos commented Jan 28, 2022

In both cases I need the original text to be able to reproduce and debug. The paragraph containing the problem should be enough, maybe even just the sentence.

duomdaamaendra commented

Jos dal vel Sámis leat sullasaš dilit go davviriikkain muđuid, de fuobmá árvvoštallamiin goit ovtta erenoamáš ášši mii earuha sámi árvvoštallamiid omd. dáža árvvoštallamiin. Girječálli birra, ja su ođđa girji ovddeš bargguiguin veardádallon, gávnnat hárve sámi árvvoštallamiin. Čiekŋaleabbo dieđu go ahte gos čálli lea riegádan ja gos ássá, gávnnat hárve. Oalle dábálaš lea dákkár diehtu lohkkái: «Mus eai leat obanassii sánitge rámidit nn čehppodaga, dajan dušše ahte áŋgirit ja čeahpit gultturbargi ii gávnna ohcaminge.» (Samefolket 1/89, s. 92). Fuobmá maiddái dán čállosa ovdamearkka vuosttas siiddus: «- rohkkes Láhpoluobbala gollenieida …» Čállái báhcá goit rápmi, jos dal ii čiekŋalit ággaduvvon.

duomdaamaendra commented

Jos ohcala siva dasa ahte čálli birra leat uhcán dieđut, de vástádus dáidá leat nu álki go ahte nie šaddá lunddolaččat servodagasgos [116] buohkat dovddadit. Árvvoštalli duhtá dasto daid dábálaš dieđuide mat juo buohkain leat čálli birra. Dás han maiddái lea sáhka dušše ođđa girjjiin, ja otnáš čálliin. Ii árvvoštalli arvva lebbet dieđuid čálli eallimis omd., jos dal vel oaivvildeš ahte leat leamaš váikkuheaddji áššit čálli loahpalaš bargui, go dát sáhttet leat oalle persovnnalaččat ja dan sivas eai gula almmolašvuhtii. Sáhttá gal maiddái árvvoštalli leat oaivvildeamen ahte eat dárbbaš dárkilieabbo dieđuid go mat mis juo buohkain leat. Dán dili bahá ja buriid beliid garvván dán oktavuođas. Muhto dattege, jos vuos dieavaslaččat áigut árvvoštallat sámi girjjiid, de fertet árvvoštaladettiin maiddái ohcalit ja čállit biográfalaš dieđuid, muhto dieđusge dakkár dieđuid mat leat relevánta. Geaid luhtte čálli lea ijastallan maŋemuš jagiid, ja galle luovosmáná sus leat, eai leat eanemus relevánta dieđut. Dattege sáhttet leat čálli birrasis dakkár váikkuheaddji elementtat maid birra sáhtášii leat dehálaš diehtit. Ahte čálli eallimis sáhttet leat váikkuheaddji olbmot, dáhpáhusat ja fearánat mat leat váikkuhan su go girjji čálii, leat girjjálašvuođadiehttagis dohkkehan dutkanveara áššin. Historjjábiográfalaš árvvoštallama vuogis dát lea guovddáš ášši, earret dieđusge teaksta dahje girji maid čálli almmuha.

snomos changed the title from "msword marks part of correct word as error" to "MS Word marks part of correct word as error" on Jan 28, 2022

duomdaamaendra commented

The same happens in Google Docs.


snomos commented Jan 28, 2022

The first case, gavnnat / nn, seems to be the same as this bug. That is, this bug is not restricted to GDocs.


snomos commented Jan 28, 2022

The second example, the gis bug, is most likely on our end, but needs further investigation.

duomdaamaendra commented

@lynnda-hill

lynnda-hill commented

[Screenshot, 2022-01-28 at 04:25:50]

This example is from an erroneous word (correct: "-diehtagis"), but the marked part is correct: it is the non-enclitic particle "gis".

"<girjjálašvuođadiehtta>"
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> #21->21
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Ex/Attr Der/vuota N Cmp/SgGen Cmp <W:0.0> #21->21
        "girjjálaš" Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> #21->21
        "girjjálašvuohta" N Sem/Txt Cmp/SgGen Cmp <W:0.0> #21->21
"<gis>"
        "gis" Pcle <W:0.0> @pcle MAP:22087:r16 &typo #22->22 ADD:10066:Err/Orth-any
                "diehtit" V TV Ind Prs Sg3 Err/Orth <W:0.0> SUBSTITUTE:4876 #22->22
typo
        "gis" Pcle <W:0.0> @pcle MAP:22087:r16 &typo &SUGGEST #22->22 ADD:10066:Err/Orth-any COPY:10075:Err/Orth-any
                "diehtit" V Ind Prs Sg3 <W:0.0> SUBSTITUTE:4876 #22->22
diehtit+V+Ind+Prs+Sg3#gis+Pcle ?
:
This seems to be the old particle problem again. We should really do something about it.

unhammer changed the title from "MS Word marks part of correct word as error" to "Marks particle as error instead of the preceding Err/Orth of the same mwe" on Mar 10, 2022

unhammer commented Mar 10, 2022

This is what's going on:

$ echo 'girjjálašvuođadiehttagis' | modes/trace-smegramrelease3-cg.mode 
"<girjjálašvuođadiehttagis>"
        "gis" Pcle <W:0.0> "<gis>" <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
                        "girji" Ex/N Sem/Txt Der/lasj Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> "<girjjálašvuođadiehtta>" <LastCohort> <firstCohort>
        "gis" Pcle <W:0.0> "<gis>" <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
                        "girji" Ex/N Sem/Txt Der/lasj Ex/A Ex/Attr Der/vuota N Cmp/SgGen Cmp <W:0.0> "<girjjálašvuođadiehtta>" <LastCohort> <firstCohort>
        "gis" Pcle <W:0.0> "<gis>" <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
                        "girjjálaš" Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> "<girjjálašvuođadiehtta>" <LastCohort> <firstCohort>
        "gis" Pcle <W:0.0> "<gis>" <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
                        "girjjálašvuohta" N Sem/Txt Cmp/SgGen Cmp <W:0.0> "<girjjálašvuođadiehtta>" <LastCohort> <firstCohort>


$ echo 'girjjálašvuođadiehttagis' | modes/trace-smegramrelease4-mwe-split.mode
"<girjjálašvuođadiehtta>"
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Ex/Attr Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girjjálaš" Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girjjálašvuohta" N Sem/Txt Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
"<gis>"
        "gis" Pcle <W:0.0> <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876


$ echo 'girjjálašvuođadiehttagis' | modes/trace-smegramrelease.mode 
"<girjjálašvuođadiehtta>"
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Ex/Attr Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girjjálaš" Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girjjálašvuohta" N Sem/Txt Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
"<gis>"
        "gis" Pcle <W:0.0> <LastCohort> <firstCohort> @PCLE MAP:22090:r16 &typo ADD:10126:Err/Orth-any
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
typo
        "gis" Pcle <W:0.0> <LastCohort> <firstCohort> @PCLE MAP:22090:r16 &typo &SUGGEST ADD:10126:Err/Orth-any COPY:10135:Err/Orth-any
                "diehtit" V <EX-Nom-Ani> Ind Prs Sg3 <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
diehtit+V+Ind+Prs+Sg3#gis+Pcle  ?

Much simplified, we have the following from the analyser:

"<abc>"
	"c" Pcle "<c>"
		"b" V Err/Orth
			"a" N "<ab>"

which cg-mwesplit turns into

"<ab>"
	"a" N
"<c>"
	"c" Pcle
		"b" V Err/Orth

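As a rough illustration of that split (a simplified model, not the actual cg-mwesplit code): assume the reading chain is listed from the outermost reading inwards, and that a reading carrying a wordform tag such as "<c>" opens a new cohort, while an untagged subreading stays with the reading above it. The name `mwe_split` and the tuple layout are made up for this sketch.

```python
from typing import List, Optional, Tuple

Reading = Tuple[str, List[str]]                 # (lemma, tags)
Tagged = Tuple[str, List[str], Optional[str]]   # (lemma, tags, wordform tag or None)


def mwe_split(chain: List[Tagged]) -> List[Tuple[Optional[str], List[Reading]]]:
    """Split one nested reading chain into cohorts (hypothetical model).

    `chain` is ordered from the outermost reading to the deepest subreading.
    Assumption: a reading with a wordform tag opens a new cohort; an
    untagged subreading stays in the cohort opened above it.
    """
    cohorts: List[Tuple[Optional[str], List[Reading]]] = []
    for lemma, tags, wordform in chain:
        if wordform is not None or not cohorts:
            cohorts.append((wordform, []))
        cohorts[-1][1].append((lemma, tags))
    # The deepest subreading is the leftmost part of the surface form,
    # so reverse to get the cohorts in text order.
    return cohorts[::-1]


# The simplified "<abc>" analysis from the comment.
simplified = [
    ("c", ["Pcle"], "<c>"),
    ("b", ["V", "Err/Orth"], None),
    ("a", ["N"], "<ab>"),
]
for wordform, readings in mwe_split(simplified):
    print(wordform, readings)
```

Under this model the Err/Orth subreading "b" has no wordform of its own, so it lands in the "<c>" cohort together with the particle, which matches the marking seen in the screenshots.
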
Now the generator gets sent

b+V#c+Pcle

which doesn't give any results. If we could send just b+V, we would get the correct form for that part, but then we'd need an input mark between "<a>" and "<b>", so that we would get

"<abc>"
	"c" Pcle "<c>"
		"b" V Err/Orth "<b>"
			"a" N "<a>"

or in the original example:

"<girjjálašvuođadiehttagis>"
	"gis" Pcle "<gis>"
		"diehtit" V TV Ind Prs Sg3 Err/Orth "<diehtta>"
			"girjjálašvuohta" N Cmp/SgGen Cmp "<girjjálašvuođa>"

which cg-mwesplit would turn into

"<girjjálašvuođa>"
	"girjjálašvuohta" N Cmp/SgGen Cmp
"<diehtta>"
	"diehtit" V TV Ind Prs Sg3 Err/Orth
"<gis>"
	"gis" Pcle

At least that's one possibility – I have no idea how hard that would be to do on the lexicon side.
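
To make the difference concrete, here is a small sketch comparing the current and the proposed analysis for the simplified "<abc>" case. It is only a model under stated assumptions: a wordform tag on a reading opens a new cohort, the generator string joins the readings of a cohort deepest-first with "#", and tags starting with "Err/" are dropped before generation (inferred from the b+V example above). The helper names `mwe_split`, `generator_input` and `error_cohorts` are hypothetical and not part of any Divvun tool.

```python
def mwe_split(chain):
    """chain: (lemma, tags, wordform-or-None) tuples, outermost reading first.

    Assumed rule: a reading with a wordform tag opens a new cohort,
    an untagged subreading stays with the reading above it.
    """
    cohorts = []
    for lemma, tags, wordform in chain:
        if wordform is not None or not cohorts:
            cohorts.append((wordform, []))
        cohorts[-1][1].append((lemma, tags))
    return cohorts[::-1]  # deepest subreading comes first in the text


def generator_input(readings, drop_prefixes=("Err/",)):
    """Build a generator string such as 'b+V#c+Pcle' from one cohort."""
    parts = []
    for lemma, tags in reversed(readings):  # deepest subreading first
        kept = [t for t in tags if not t.startswith(drop_prefixes)]
        parts.append("+".join([lemma] + kept))
    return "#".join(parts)


def error_cohorts(chain):
    """Return (wordform, generator string) for each cohort with an Err/ tag."""
    result = []
    for wordform, readings in mwe_split(chain):
        if any(t.startswith("Err/") for _, tags in readings for t in tags):
            result.append((wordform, generator_input(readings)))
    return result


current = [("c", ["Pcle"], "<c>"), ("b", ["V", "Err/Orth"], None), ("a", ["N"], "<ab>")]
proposed = [("c", ["Pcle"], "<c>"), ("b", ["V", "Err/Orth"], "<b>"), ("a", ["N"], "<a>")]

print(error_cohorts(current))   # error attached to "<c>", generator gets b+V#c+Pcle
print(error_cohorts(proposed))  # error attached to "<b>", generator gets b+V
```

With the extra wordform tags, the error is reported on the cohort that actually contains the misspelling, and the generator only has to produce the verb form. Whether the lexicon can supply those tags is the open question raised above.
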
