What are the n attribute values for ? #46
Comments
The "n" marks the hierarchy of the disjunctive cantillation marks in the verse. Its use is demonstrated in our Verse demo: OshbVerse Demo. We have a new version of the demo in the main repository; we are just waiting for some technical details to be worked out so that we can deploy it. |
That might make sense to you, but what the n values actually mean is no clearer from your answer or from the demo. The demo didn't seem to be obviously related to the n values for each word. For newcomers to the repository, I think you need a much more detailed description of what these mean and how they were actually generated. |
Thinking further, it might seem worthwhile to consider if the SWORD API might be enhanced to make use of the n attribute. Much work for just one module? That's as may be. One thing is clear. We would need to understand its function first. |
There is a much fuller description of the "n" attributes and the hierarchy of the verse they represent, under structure. The identification of the cantillation marks, in the popups of the demo, is another example of the usage of the "n". SWORD does have a UTF8Cantillation option already, so it may be useful in that context. |
The n attribute is already assigned in SWORD for enumerating words. This would mean that any module using the n attribute for a different function could not at the same time support the enumerated words feature, i.e. while OSHB uses n for accent structure, it could not also be enhanced to enumerate words. |
The SWORD UTF8Cantillation feature merely strips the cantillation points from the Hebrew text. |
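For readers unfamiliar with what "stripping the cantillation points" amounts to, here is a minimal Python sketch. It is not SWORD's actual filter; it simply assumes that the cantillation marks are the code points U+0591 through U+05AF of the Unicode Hebrew block, and the function name and sample string are hypothetical.

```python
# Assumption: cantillation marks are the code points U+0591..U+05AF.
# This is an illustration only, not the SWORD implementation.
CANTILLATION_FIRST = "\u0591"
CANTILLATION_LAST = "\u05AF"


def strip_cantillation(text: str) -> str:
    """Drop cantillation marks; keep consonants and vowel points."""
    return "".join(
        ch for ch in text
        if not (CANTILLATION_FIRST <= ch <= CANTILLATION_LAST)
    )


# Hypothetical sample: bet + qamats (U+05B8) + munah (U+05A3).
sample = "\u05D1\u05B8\u05A3"
print([f"U+{ord(ch):04X}" for ch in strip_cantillation(sample)])
```

Note that a filter like this only removes characters; it makes no use of any hierarchy between the marks.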
A lot of TEI vocabularies use the Is there a better way to encode cantillation hierarchy in OSIS? |
"Is there a better way to encode cantillation hierarchy in OSIS?" In theory, cantillation marks and vowel accents in Biblical Hebrew ought to be the sole domain of Unicode Normalisation rather than something to be implemented in XML. The issue was that "Unicode Normalisation breaks Biblical Hebrew", as Peter Kirk described in detail in the SBL Hebrew Font Manual. His proposal was to define a custom normalisation for Biblical Hebrew: one which does not change the order of Hebrew diacritics, provided they were keyed in the same order as in the earliest digitisation of the Hebrew Bible, made before Unicode was developed (i.e. the Michigan-Claremont encoding of the Westminster Electronic Hebrew Bible by Alan Groves). NB: BabelPad (a Unicode text editor for Windows developed by Andrew West) supports such a custom normalisation of Hebrew. This feature was added at my suggestion about seven years ago. This reply may seem tangential to the context of XML schema and encoding cantillation by means of attributes, but it does address the more fundamental underlying issue (assuming I've understood the question correctly). |
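The reordering described above can be observed with Python's standard unicodedata module. This is only a small sketch: the sample sequence (lamed, patah, hiriq) is chosen for illustration, and the helper name is hypothetical. Standard normalisation sorts adjacent combining marks by their canonical combining class, so the keyed order of the two vowel points is not preserved.

```python
import unicodedata


def code_points(text: str) -> list:
    """Render a string as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Illustrative sequence: lamed (U+05DC) + patah (U+05B7) + hiriq (U+05B4).
keyed = "\u05DC\u05B7\u05B4"
normalized = unicodedata.normalize("NFC", keyed)

# Show each mark's canonical combining class, which drives the reordering.
for ch in keyed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

print("keyed:     ", code_points(keyed))
print("normalized:", code_points(normalized))
```

Because hiriq has a lower combining class than patah, the normalised string places hiriq first, which is the opposite of the keyed order.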
There just might be a better way to do that now, at least according to the Unicode Consortium. I have just started looking into this, though, and I may not understand this correctly. According to this FAQ, a former problem was fixed:
It looks like they are using CGJs for this purpose at Tanach.us. According to their Change Log:
I can use ugrep to find these characters:
% ugrep '[\x{034F}]' *
Daniel.xml: <w>יְרוּשָׁלַ֖͏ִם</w>
Daniel.xml: <w>קֽ͏ָדָמַ֖י</w>
Daniel.xml: <q>עֽ͏ָלִּ֔ין</q>
Daniel.xml: <w>וֽ͏ַאֲחַשְׁדַּרְפְּנַיָּא֙</w>
Daniel.xml: <w>קֽ͏ָדָמַ֔יהּ</w>
Daniel.xml: <w>קֽ͏ָדָמ֣וֹהִי</w>
Daniel.xml: <w>עֽ͏ַד־</w>
Daniel.xml: <w>וְנֽ͏ֶחֱלֵ֙יתִי֙</w>
!!! SNIP !!!
If you take one of those strings and put it into Tim Whitlock's Unicode inspector, you can confirm that it is there, e.g. But the CGJs are not present in OSHB files. One way of checking this would be to use the output of I have not yet done this; it's on my to-do list. But does anyone have a list of issues with cantillation that are not correctly handled by this? Or does anyone have time to take a good look at whether this fixed the problem? |
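The same kind of check can be sketched in Python for anyone without ugrep. The snippet below is an illustration under assumptions: the XML sample is made up (it is not taken from the Tanach.us or OSHB files), and the function names are hypothetical. It lists element text containing U+034F, prints the code points much as a Unicode inspector would, and shows that NFC leaves the mark order alone when a CGJ sits between the two vowel points.

```python
import unicodedata
import xml.etree.ElementTree as ET

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

# Made-up sample: the first word has a CGJ between patah and hiriq.
SAMPLE = "<v><w>\u05DC\u05B7\u034F\u05B4\u05DD</w><w>\u05D0\u05B8</w></v>"


def words_with_cgj(xml_text: str) -> list:
    """Return the text of every element whose text contains a CGJ."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if el.text and CGJ in el.text]


def inspect(word: str) -> list:
    """List (code point, character name) pairs for a word."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "?")) for ch in word]


for word in words_with_cgj(SAMPLE):
    for label, name in inspect(word):
        print(label, name)
    # CGJ has combining class 0, so it blocks canonical reordering.
    print("unchanged by NFC:", unicodedata.normalize("NFC", word) == word)
```

To run this against real files, the sample string would be replaced by the contents of each XML file.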
Question: Is there a way I can use the |
Thanks for the clarification. And sorry for the bunny trail. If I want to raise the CGJ issue, should I start a new issue for that and copy the text over? David's point about the use of
And as I pointed out, that's a normal TEI usage. Is there a better way to do cantillation hierarchy? |
The SWORD usage is not a standard, just a choice. As I understand it, the point is to show the correspondence of a translation to the original text word order. If that is the case, it would be useless for the original language text itself. |
But TEI is a standard, OSIS is a standard. I think this is relevant: The Not a showstopper. If you use |
The OSIS manual says:
From the TEI specification for @n you quoted earlier, the attribute is flexible and not pinned to any one specific interpretation. |
Jonathan - I've read the thread and remain puzzled over the use of an encoded cantillation hierarchy using @n. Even if you intend to query "cantillation (word separator) cantillation (word separator) cantillation" (reading right to left), so that you can search for a pattern of cantillation marks occurring on separate words, I'm not sure what an @n attribute would add. The hierarchy between marks isn't consistently applied, but unless you are working with textual witnesses that's unlikely to be an issue. What am I missing about your question? Do you have an example of the result you are seeking? (Noting that you could have an operator that allows multiple words to occur between cantillation marks, assuming you want to study larger cantillation structures in the text. I've seen that done in some research in England.) |
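The kind of query described here, a sequence of cantillation marks on separate words with an optional allowance for intervening words, can be sketched without any @n attribute. The following Python is a rough illustration only: it assumes cantillation marks are the code points U+0591 through U+05AF, the function names and the max_gap parameter are hypothetical, and the sample "words" are placeholders rather than real text.

```python
def accents(word: str) -> set:
    """Collect the cantillation marks (assumed U+0591..U+05AF) on a word."""
    return {ch for ch in word if "\u0591" <= ch <= "\u05AF"}


def find_pattern(words: list, pattern: list, max_gap: int = 0) -> list:
    """Return start indexes where the marks in `pattern` occur on
    successive words, allowing up to `max_gap` words between matches."""
    if not pattern:
        return []
    marks = [accents(w) for w in words]

    def match_from(i: int, k: int) -> bool:
        if pattern[k] not in marks[i]:
            return False
        if k == len(pattern) - 1:
            return True
        last = min(i + 1 + max_gap, len(words) - 1)
        return any(match_from(j, k + 1) for j in range(i + 1, last + 1))

    return [i for i in range(len(words)) if match_from(i, 0)]


TIPEHA, MUNAH, ETNAHTA = "\u0596", "\u05A3", "\u0591"
# Placeholder words in logical (reading) order: one consonant plus one accent.
words = ["\u05D0" + TIPEHA, "\u05D1" + MUNAH, "\u05D2" + ETNAHTA]

print(find_pattern(words, [TIPEHA, ETNAHTA]))             # adjacent words only
print(find_pattern(words, [TIPEHA, ETNAHTA], max_gap=1))  # one word may intervene
```

A search like this treats the marks as a flat sequence; whether the hierarchy encoded in n adds anything beyond that is the open question in this comment.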
The w elements in OSIS XML files contain an n attribute with various values.
The n attribute does not seem to be documented anywhere, or, if it is mentioned, it's not easy to find.
What do these values signify in this context?
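As a starting point for exploring the question, the distinct n values on w elements can be tallied with a few lines of Python. This is a sketch under assumptions: the sample document, its n values, and its word text are invented placeholders, not data from the OSHB files, and the function name is hypothetical. Only the OSIS namespace URI is taken as given.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OSIS_NS = "{http://www.bibletechnologies.net/2003/OSIS/namespace}"

# Invented sample; the n values and word text are placeholders.
SAMPLE = (
    '<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace">'
    '<verse osisID="Gen.1.1">'
    '<w n="1.0">a</w><w>b</w><w n="1.0">c</w><w n="0">d</w>'
    "</verse></osis>"
)


def n_values(xml_text: str) -> Counter:
    """Count how often each n value occurs on w elements."""
    root = ET.fromstring(xml_text)
    return Counter(
        w.get("n") for w in root.iter(OSIS_NS + "w") if w.get("n") is not None
    )


for value, count in sorted(n_values(SAMPLE).items()):
    print(value, count)
```

Running the same function over a real book file would show which values actually occur, which may help when comparing them against the documentation.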