Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple <anyElement> results in invalid RELAX NG schema #631

Closed
sydb opened this issue Oct 5, 2023 · 4 comments
Closed

multiple <anyElement> results in invalid RELAX NG schema #631

sydb opened this issue Oct 5, 2023 · 4 comments
Assignees
Labels

Comments

@sydb
Copy link
Member

sydb commented Oct 5, 2023

Another bug that has probably been around for years if not decades and never been noticed. If you have more than 1 <anyElement> in a content model the resulting RELAX NG is invalid.

I will post an ODD file (bundled with the output of teitorelaxng (soon to be teitorng) run on that ODD, as Github makes me ZIP it, anyway) that demonstrates the problem on this ticket shortly.

I think the problem is entirely contained in odd2relax.xsl. It generates an <rng:define> for each <anyElement> that occurs in the compiled ODD input. Then, in the cleanup phase (in "pass3"), I think it tries to delete some of the <rng:define>s that have just been created — presumably all but one. But in this case it deletes them all.

This is some very convoluted, uncommented code. I am still trying to wrap my head around the "max( ancestor::rng:div )" part. An <rng:div> has no effect on validation, so why is it important to delete only the <rng:define> that is most deeply nested in <rng:div>s, and how does that ensure there is only 1 resulting <rng:define>?

In any case, this problem is sort of blocking progress on #627, not because it is interfering with writing the code for fixing #627, but because it makes it a pain in the neck to test said code.

Several possible solutions jump to mind, but until I understand at least what the current Stylesheets are doing, better still why, I doubt I could really endorse any of them, let alone one over the others:

  1. alter "pass3" to delete all <rng:define>s for the given pattern, then put one back
  2. change code so that only 1 <rng:define> per <anyElement>-inside-given-<elementSpec>-or-<macroSpec> is generated in the first place. (This should not be hard to do — change //tei:anyElement to //*[tei:anyElement] in both places, and change template "anyElement" to match. But who knows what else that might break? Might be a problem because different occurrences of <anyElement> in the same content model might have different @require or @except, no?)
  3. change code so that a separate <rng:define> with a different pattern name is generated for each and every occurrence of <anyElement>. (This should not be hard to do — change definition of $id in "anyElement" template from concat('anyElement-',$spec/@ident) to concat('anyElement-', generate-id() ). But changing the references to each such pattern might be hard; probably not, but I am nto sure. And in any case, who knows what else that might break?)

OK, having written that out long hand I take back my previous statement about not knowing which is best. Seems to me at the moment that (3) is probably the only viable solution. Otherwise how would different @require and @except values in the same content model work?

BTW, I may well continue work on #627 (since it is causing problems for the WWP) first, even with convoluted testing procedures, before paying more attention to this. Thus I have not assigned this issue to a milestone.

@sydb
Copy link
Member Author

sydb commented Oct 5, 2023

@martindholmes
Copy link
Contributor

@sydb How does the ATOP PLODD processing do on this? Same problem?

@sydb
Copy link
Member Author

sydb commented Oct 5, 2023

Did not even look. But no reason to think it has same problem, as code base is entirely different.
I have implemented a fix (using (3)), but have to adjust testing to match.
I am still working on Test/; I suspect the change to Test2/ will be easy, just a tweak to cleanForDiff.xsl.

@sydb sydb self-assigned this Oct 6, 2023
@sydb
Copy link
Member Author

sydb commented Oct 7, 2023

I have submitted a PR that I believe fixes this bug. It is like solution (3) proposed above, but I used concat('anyElement_',$spec/@ident,'_', [COUNT] ) as the @name of each pattern, where [COUNT] is the sequential number of this particular <anyElement> within the input.

I found this very hard to do (it took the better part of 2 days) in large part because I took some less-than-brilliant paths and detours, but also in large part because it turns out that code that generates <rng:ref>s that refer to these patterns occurs not only in odds/odd2relax.xsl, but also in odds/teiodds.xsl. Sigh.

sydb added a commit that referenced this issue Oct 8, 2023
… untested because of #631. This comit is *not* ready for prime time.
raffazizzi added a commit that referenced this issue Nov 9, 2023
Separate pattern for every instance of <anyElement> — fixes #631 (I hope)
@HelenaSabel HelenaSabel added this to the Release 7.56.0 milestone Nov 10, 2023
@raffazizzi raffazizzi added this to the Release 7.56.0 milestone Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants