Svarlang 01 #184
Let's see if we can enforce a single standard UTF-8 suffix style, which then allows us to check the conversions.
That is correct, although SvarLANG's cats files have an extra provision for flagging "dirty" strings by prefixing them with a '?' mark:
I am not sure if this is relevant to what you do, just pointing it out.
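As a rough illustration of the "dirty" flag, here is a minimal Python sketch. It assumes each catalog entry is a single line of the form `key:text` (as in kitten-style catalogs) and that a dirty entry simply carries a leading `?`. The function name `parse_entry` and the sample lines are hypothetical and are not taken from the SvarLANG sources.

```python
def parse_entry(line):
    """Split one catalog line into (key, text, dirty).

    Assumed layout (not confirmed by the SvarLANG docs quoted here):
    an optional '?' prefix, then 'key:text'.
    """
    line = line.rstrip("\r\n")
    dirty = line.startswith("?")
    if dirty:
        line = line[1:]  # drop the flag, keep the rest of the entry
    key, sep, text = line.partition(":")  # split on the first ':' only
    if not sep:
        raise ValueError(f"not a catalog entry: {line!r}")
    return key, text, dirty


# Hypothetical sample lines: one clean entry, one flagged as dirty.
for raw in ["1.2:Create partition", "?1.3:Delete partition"]:
    print(parse_entry(raw))
```

A validator could use the third element of the tuple to report strings that still need a translator's attention.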
It is the same with cats/kitten: at compile time some strings end up embedded within the executable. SvarLANG is no different on this point, only the way it is achieved is different because the strings are all kept in a separate obj file that is linked to the application instead of being spread out in the application's source code.
SvarLANG is able to load an LNG file that contains all the strings, and then the application is free to choose whatever language it wants. The main differences with cats/kitten are that there is only one (LNG) file that contains all languages, instead of having each language in a separate (*.EN, *.DE, *.PL, etc) file, and that said LNG file is not a text file, but a binary file that is assembled by a specialized tool (TLUMACZ.EXE). To simulate a behavior similar to how FreeDOS apps use cats/kitten, one would have to initialize SvarLANG as follows:
More information about the SvarLANG library can be found here: http://svn.svardos.org/filedetails.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.txt
Hello Andrew, for FDISK it is handled like this: the translations for FDISK are stored in a single file called FDISK.LNG. This binary file is created in a two-step process while building FDISK.
The file format of the language files is described here: http://svn.svardos.org/filedetails.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.txt. It is a binary format, which for each language contains a binary search table for the string ids with offsets into the per-language string table. At program start, the data area of the executable containing the default translation gets overwritten by the data for the selected language stored in the .LNG file. The selected language is determined by the LANG environment variable.
Regarding extensibility: The .LNG file can be extended by another language after FDISK has been compiled, but there is a constraint: The added language may not be more than 5% larger than the largest translation in existence when FDISK was compiled, because the space reserved for the translation area in the data segment of the executable has a fixed size. It is probably also not a good idea to mix versions of the .LNG file and the executable, but beyond a translation failing to load, doing so should not cause any harm.
Hi @mateuszviste and @boeckmann, thanks for your thorough replies, I get it now. So I guess the process to be followed comes down to how the freedos packager and the application builders want to establish the workflow. Currently I can imagine two workflows as Svarlang is flexible enough to cope with either scenario.
Workflow 1 has the advantage that application and language files are always in sync and translation changes are not size constrained. @shidel How do you see svarlang fitting into your current workflow? The CI is currently only used as a validator for kitten / help files to help prevent any non-conformant translations creeping in, and its conversion output is unused. With this PR it would work to do the validation for Svarlang translations.
Hi,
On Jan 2, 2024, at 7:54 AM, Andrew Bird ***@***.***> wrote:
Hi @mateuszviste <https://github.com/mateuszviste> and @boeckmann <https://github.com/boeckmann>, thanks for your thorough replies, I get it now. So I guess the process to be followed comes down to how the freedos packager and the application builders want to establish the workflow. Currently I can imagine two workflows as Svarlang is flexible enough to cope with either scenario.
1. Application builder driven.
   1. Translators create / modify translations in fd-nls repo.
   2. Application builder imports latest translations from fd-nls.
   3. Application builder compiles application.
   4. Application builder passes application including .lng file to FreeDOS packager.
   5. FreeDOS packager publishes application in the usual manner.
2. FreeDOS packager driven.
   1. Translators create / modify translations in fd-nls repo.
   2. FreeDOS packager uses latest available application binary.
   3. FreeDOS packager takes latest translations from fd-nls.
   4. FreeDOS packager uses UTFTOCP and TLUMACZ tools to create a new .lng file (but has to be mindful of 5% growth)
   5. FreeDOS packager creates new package using existing application binary and new .lng file.
Workflow 1 has the advantage that application and language files are always in sync and translation changes are not size constrained.
Workflow 2 has the advantage that translations can be updated without requiring an application rebuild.
Looking at your workflows, 2.4 is not practical for several reasons.
First, the “be mindful of 5% growth”. Let’s say a program is not updated for years on end. That is a very frequent occurrence. During the period of development idleness, the translations can be updated multiple times. Let’s say the largest translation gets 3 such revisions. The first grows it by 4%, which might be OK. The second shrinks it by 1%, which is fine. And the final update grows it by 3%. No single update increased the size by more than 5%. However, the total increase of all updates to that language would be 6%. It can be even worse if a different translation grows larger than the originally largest translation. Sure, there are ways to handle the problem. But each comes with its own complications. It is also something that only applies to programs with that 5% post-compile limitation.
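The arithmetic behind this objection can be checked with a short Python sketch. It uses the three revisions from the example above and the 5% limit mentioned earlier in the thread; the function name `cumulative_growth` is made up for illustration. The changes are compounded (each one applies to the size left by the previous one), so the result lands slightly above the simple sum of 6%.

```python
def cumulative_growth(changes_percent):
    """Return the total growth in percent after successive relative changes."""
    factor = 1.0
    for change in changes_percent:
        factor *= 1.0 + change / 100.0  # each change applies to the current size
    return (factor - 1.0) * 100.0


LIMIT_PERCENT = 5.0            # headroom reserved at compile time, per this thread
revisions = [4.0, -1.0, 3.0]   # the three updates from the example above

total = cumulative_growth(revisions)
exceeds = total > LIMIT_PERCENT

# Every single revision is below the limit, yet the accumulated growth is not.
print(f"total growth: {total:.2f}% (limit {LIMIT_PERCENT}%), exceeds limit: {exceeds}")
```

A packager following workflow 2 would therefore have to compare each new translation against the size at compile time, not against the previous revision.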
Second, translation updates need to be provided in CodePage format to ensure they are displayed properly. Having a specific person (namely me) convert translations from UTF-8 to CodePage introduces the possibility of incorrect character set conversion. There are translations in numerous languages and I don’t know all of their CodePage values. Plus, I don’t think UTFTOCP supports all of the different languages. Therefore, any translators should supply the CodePage version and if possible a UTF-8 version of their translations.
I have no idea what TLUMACZ is or does.
As for item 2.3, this used to be handled by the RBE (release build environment) during creation of an operating system release. But, that was very time consuming. So, that burden was off-loaded to the package staging process.
All OS packages are staged in projects on GitLab. When the program has an update, it is reorganized (if necessary) and pushed to GitLab. This is also what is done regarding its translations. I will pull the latest FD-NLS project. Then for projects that have updated translations, I run “fdvcs.sh -nls” which imports (most of) the current NLS CodePage translations (ignoring UTF-8 files) into the project. There are some packages that must be handled manually. I then push the updated package to GitLab for inclusion in the next release/build.
@shidel <https://github.com/shidel> How do you see svarlang fitting into your current workflow?
I don’t know what you are asking me.
So, I’ll answer based on software development.
All the different programs I write and maintain for FreeDOS do not use Kitten or SvarLang. Except for QCRT based programs, all the others use my V8 Power Tools based format for language translation. Those are extremely flexible and support things like descriptive text keys, no line length restrictions and no fixed restrictions to translation file size.
QCRT based programs are different. PGME (running on the QCRT framework) is a complex object oriented behemoth. Like all QCRT derived works, it does not use DOS CodePages to handle text. Instead it loads custom fonts for a language translation file. The actual translation file resembles an INI file with some differences. The individual translations in the file are namespace based text chains for the individual objects that override the built-in text for the item. Things like “DIALOG.OPTIONS.LANGUAGE.CAPTION=Language settings”.
As for the V8 based translations, they are similar to kitten. In fact when creating a program, you could do the translation using the kitten and I think the SvarLang format as well. But generally, I use the V8's support for string keys instead (like “FORMAT.HARDDISK=Format drive (Y/N)?”). I think it makes it much easier to understand how any given text is used. Along with the embedded comments in the translation file, I think it is generally easier for translators.
… The CI is currently only used as a validator for kitten / help files to help prevent any non conformant translations creeping in, and its conversion output is unused. With this PR it would work to do the validation for Svarlang translations.
I would prefer maintaining the translation files in the FDISK GitHub repository, like it is done now. That makes sure I do not miss incoming translations and can provide a new binary, if necessary. I also update the FreeDOS FDISK package repository located at GitLab containing the binary and .LNG file whenever necessary. For the sake of completeness I can provide changed translation files to the FD-NLS repository, if requested. But if I understand Jerome correctly he already does this via script.
@shidel wrote:
It's a tool that creates a single binary .lng file from multiple files that are codepage encoded. Svarlang programs only read this single file for translations as they don't use the codepage text files.
Sorry I was unclear, but it's fine anyway because you mostly answered my question in the passage above it, where you describe your packaging workflow. I can see now that my workflow 2 is probably not going to work for you, even without the difficulties around language growth.
@boeckmann wrote:
I hadn't actually considered that application developers might want to do this. That being the case, I don't think there's any value in my PR here, so I think I'll drop it.
I did a little work on adding svarlang type nls. It builds upon another PR I have open for CI changes, so as yet it's not ready to merge. The svarlang commits are the last three.
Essentially the svarlang files are cats format, like kitten, but in utf-8 encoding rather than codepages. With this in mind the CI validator does a temporary conversion to the necessary codepage in order to check that only utf-8 chars that can be converted into the target codepage have been used in the source file.
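The check described above can be sketched in a few lines of Python. This is not the CI validator from the PR, only an illustration of the idea using Python's built-in `cp850` and `cp852` codecs. The helper name `unconvertible_chars`, the sample strings, and the pairing of a language with a particular codepage are all assumptions made for the example.

```python
def unconvertible_chars(text, codepage):
    """Return the sorted characters of text that cannot be encoded in codepage."""
    bad = set()
    for ch in text:
        try:
            ch.encode(codepage)      # temporary conversion, result is discarded
        except UnicodeEncodeError:
            bad.add(ch)              # character has no slot in the target codepage
    return sorted(bad)


# Hypothetical UTF-8 source strings for two translations.
sample_de = "Möchten Sie fortfahren (J/N)?"
sample_pl = "Zażółć gęślą jaźń"

print("de -> cp850:", unconvertible_chars(sample_de, "cp850"))
print("pl -> cp850:", unconvertible_chars(sample_pl, "cp850"))
print("pl -> cp852:", unconvertible_chars(sample_pl, "cp852"))
```

A translation passes the check when the returned list is empty for its target codepage; a non-empty list names exactly the characters a translator would need to replace.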
I don't fully understand the process that svarlang uses to select the required language, but at least some of it is being done at application build time. Whether there's any facility for the user to change languages at runtime, which would require that additional language files are shipped (as is done with kitten), is unclear to me. So for now I see fd-nls as a place to create and modify translation files for svarlang based applications, and the packager would then retrieve them and add them into the build. This may be incorrect, so please correct me @boeckmann, @mateuszviste, @shidel.
Refers to #172