Svarlang 01 #184

andrewbird · 2024-01-01T16:28:53Z

I did a little work on adding svarlang type nls. It builds upon another PR I have open for CI changes, so as yet it's not ready to merge. The svarlang commits are the last three.

Essentially the svarlang files are cats format, like kitten, but in utf-8 encoding rather than codepages. With this in mind the CI validator does a temporary conversion to the necessary codepage in order to check that only utf-8 chars that can be converted into the target codepage have been used in the source file.
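
The convertibility check described above can be sketched in C roughly as follows. This is only an illustration of the idea, not the script the CI actually runs: the function names are made up, and the `cp_extra` table is a deliberately tiny, hypothetical stand-in for the full table of the target codepage.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny subset of a codepage: the non-ASCII
 * code points we pretend the target codepage can represent. A real
 * check would use the complete table of the target codepage. */
static const uint32_t cp_extra[] = { 0x00E4, 0x00F6, 0x00FC, 0x00DF, 0x20AC };

static int cp_has(uint32_t cp) {
  size_t i;
  if (cp < 0x80) return 1; /* assumption: ASCII maps 1:1 in the target codepage */
  for (i = 0; i < sizeof(cp_extra) / sizeof(cp_extra[0]); i++) {
    if (cp_extra[i] == cp) return 1;
  }
  return 0;
}

/* Decodes one UTF-8 sequence at s, stores the code point in *cp and
 * returns its length in bytes, or 0 if the sequence is malformed.
 * Simplified: overlong encodings are not rejected. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp) {
  size_t len, i;
  if (s[0] < 0x80) { *cp = s[0]; return 1; }
  if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
  else return 0;
  for (i = 1; i < len; i++) {
    /* a NUL terminator also fails this test, so we never read past it */
    if ((s[i] & 0xC0) != 0x80) return 0;
    *cp = (*cp << 6) | (s[i] & 0x3F);
  }
  return len;
}

/* Returns 1 if every character of the NUL-terminated UTF-8 string can
 * be converted to the (hypothetical) target codepage, 0 otherwise. */
int utf8_fits_codepage(const char *text) {
  const unsigned char *s = (const unsigned char *)text;
  while (*s != 0) {
    uint32_t cp;
    size_t len = utf8_decode(s, &cp);
    if (len == 0 || !cp_has(cp)) return 0;
    s += len;
  }
  return 1;
}
```

A validator built this way would call `utf8_fits_codepage()` on each translated string and fail the build on the first string that returns 0.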

I don't fully understand the process that svarlang uses to select the required language, but at least some of it is being done at application build time. Whether there's any facility for the user to change languages at runtime and so require that additional language files are shipped (like is done with kitten) is unclear to me. So for now I see fd-nls as a place to create and modify translation files for svarlang-based applications, and the packager would then retrieve them and add them into the build. This may be incorrect, so please correct me @boeckmann, @mateuszviste, @shidel

Refers to #172

Let's see if we can enforce a single standard UTF-8 suffix style, which then allows us to check the conversions.

mateuszviste · 2024-01-01T19:50:00Z

Essentially the svarlang files are cats format, like kitten, but in utf-8 encoding rather than codepages.

That is correct, albeit SvarLang's cats files have an extra provision for flagging "dirty" strings by prefixing them with a '?' mark:

?1.1:Hello, World!

I am not sure if this is relevant to what you do, just pointing it out.
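
For tooling that has to cope with the '?' prefix, a parser could look something like the sketch below. It assumes a line has the shape `section.id:text` as in the example above; the struct and function names are hypothetical, and comment lines or other parts of the cats format are not handled.

```c
#include <stdio.h>

struct cats_line {
  int dirty;        /* 1 if the line was prefixed with '?' */
  int section;      /* number before the dot */
  int id;           /* number after the dot */
  const char *text; /* points into the input line, just after the ':' */
};

/* Parses a line such as "1.1:Hello" or "?1.1:Hello".
 * Returns 1 on success, 0 if the line does not have that shape. */
int cats_parse_line(const char *line, struct cats_line *out) {
  int consumed = 0;
  out->dirty = 0;
  if (line[0] == '?') {
    out->dirty = 1;
    line++;
  }
  /* %n records how many characters were read up to and including ':';
   * it stays 0 if the ':' is missing. */
  if (sscanf(line, "%d.%d:%n", &out->section, &out->id, &consumed) != 2) return 0;
  if (consumed == 0) return 0;
  out->text = line + consumed;
  return 1;
}
```

A validator could then treat `dirty` lines as warnings rather than errors, or skip them, depending on what the project wants.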

I don't fully understand the process that svarlang uses to select the required language, but at least some of it is being done at application build time.

It is the same with cats/kitten: at compile time some strings end up embedded within the executable. SvarLANG is no different on this point, only the way it is achieved is different because the strings are all kept in a separate obj file that is linked to the application instead of being spread out in the application's source code.

Whether there's any facility for the user to change languages at runtime and so require that additional language files are shipped (like is done with kitten) is unclear to me.

SvarLANG is able to load an LNG file that contains all the strings, and then the application is free to choose whatever language it wants. The main differences with cats/kitten are that there is only one (LNG) file that contains all languages, instead of having each language in a separate (*.EN, *.DE, *.PL, etc) file, and that said LNG file is not a text file, but a binary file that is assembled by a specialized tool (TLUMACZ.EXE).

To simulate a behavior similar to how FreeDOS apps use cats/kitten, one would have to initialize SvarLANG as follows:
svarlang_autoload_pathlist("myprog", getenv("NLSPATH"), getenv("LANG"));

More information about the SvarLANG library can be found here:

http://svn.svardos.org/filedetails.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.txt
http://svn.svardos.org/filedetails.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.h

boeckmann · 2024-01-01T19:50:15Z

Hello Andrew, for FDISK it is handled like this: the translations for FDISK are stored in a single file called FDISK.LNG. This binary file is created in a two-step process while building FDISK.

1. The UTF-8 encoded input files for the different translations are converted individually by the tool UTFTOCP to the correct codepage as specified by the makefile. The language -> codepage mapping is done manually, but I think it could generally be automated by matching filename patterns, for example DE.TXT -> codepage 858 by taking the DE into account.
2. All converted language files are processed in one go by the program TLUMACZ, creating the FDISK.LNG file and a C file named DEFLANG.C containing the default translation (usually English). DEFLANG.C has to be linked into the binary.
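
The filename-based mapping suggested above might be automated along these lines. This is a sketch only: just the DE -> 858 row comes from this discussion, the other table rows are illustrative assumptions, and `codepage_for_file` is a hypothetical helper, not part of UTFTOCP or the FDISK makefile.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct lang_cp {
  const char *lang; /* two-letter language code, upper case */
  int codepage;
};

static const struct lang_cp lang_table[] = {
  { "DE", 858 }, /* from the discussion */
  { "EN", 437 }, /* assumption, for illustration only */
  { "FR", 858 }  /* assumption, for illustration only */
};

/* Returns the codepage for a file name like "DE.TXT" or "de.txt",
 * or -1 if the name has no two-letter prefix or the language is unknown. */
int codepage_for_file(const char *filename) {
  char lang[3];
  size_t i;
  if (strlen(filename) < 3 || filename[2] != '.') return -1;
  lang[0] = (char)toupper((unsigned char)filename[0]);
  lang[1] = (char)toupper((unsigned char)filename[1]);
  lang[2] = '\0';
  for (i = 0; i < sizeof(lang_table) / sizeof(lang_table[0]); i++) {
    if (strcmp(lang_table[i].lang, lang) == 0) return lang_table[i].codepage;
  }
  return -1;
}
```

Returning -1 for unknown languages keeps the manual mapping as the fallback instead of silently guessing a codepage.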

The file format of the language files is described here: http://svn.svardos.org/filedetails.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.txt. It is a binary format, which for each language contains a binary search table for the string ids with offsets into the per-language string table. At program start, the data area of the executable containing the default translation gets overwritten by the data for the selected language stored in the .LNG file. The selected language is determined by the LANG environment variable.
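
To illustrate the lookup idea (a sorted id table with offsets into a string blob), here is a minimal sketch. The struct layout, field widths and function name are assumptions made for the example; the actual on-disk layout is the one documented in svarlang.txt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory index entry: entries are sorted by id. */
struct str_index {
  uint16_t id;     /* string id */
  uint16_t offset; /* offset of the string inside the string table */
};

/* Binary search for id in the sorted index. Returns a pointer to the
 * NUL-terminated string inside strings, or NULL if the id is absent. */
const char *lookup_string(const struct str_index *index, size_t count,
                          const char *strings, uint16_t id) {
  size_t lo = 0, hi = count;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (index[mid].id == id) return strings + index[mid].offset;
    if (index[mid].id < id) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return NULL;
}
```

Because only the index and the string blob change between languages, swapping the language at program start amounts to overwriting those two areas with the data read from the .LNG file.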

boeckmann · 2024-01-01T20:04:15Z

Regarding extensibility: The .LNG file can be extended by another language after FDISK has been compiled, but there is a constraint: The added language may not be more than 5% larger than the largest translation in existence when FDISK was compiled, because the space reserved for the translation area in the data segment of the executable has a fixed size.
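
As a rough illustration of that constraint, the check could be written like this. The function names are hypothetical and the rounding (integer division, rounded down) is an assumption; the exact way the reserved size is computed is not stated here.

```c
/* Size reserved for the translation area: the largest translation at
 * build time plus 5%. Rounding down is an assumption of this sketch. */
unsigned long reserved_size(unsigned long largest_at_build) {
  return largest_at_build + largest_at_build / 20;
}

/* Returns 1 if a translation of new_size bytes still fits into the
 * area reserved when the executable was built, 0 otherwise. */
int translation_fits(unsigned long new_size, unsigned long largest_at_build) {
  return new_size <= reserved_size(largest_at_build);
}
```

For example, if the largest translation was 4000 bytes at build time, a language added later could be at most 4200 bytes under these assumptions.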

It is probably also not a good idea to mix versions of the .LNG file and the executables, but apart from a translation failing to load, it should not do any harm.

andrewbird · 2024-01-02T12:54:14Z

Hi @mateuszviste and @boeckmann, thanks for your thorough replies, I get it now. So I guess the process to be followed comes down to how the freedos packager and the application builders want to establish the workflow. Currently I can imagine two workflows as Svarlang is flexible enough to cope with either scenario.

Workflow 1: Application builder driven.
1. Translators create / modify translations in fd-nls repo.
2. Application builder imports latest translations from fd-nls.
3. Application builder compiles application.
4. Application builder passes application including .lng file to FreeDOS packager.
5. FreeDOS packager publishes application in the usual manner.

Workflow 2: FreeDOS packager driven.
1. Translators create / modify translations in fd-nls repo.
2. FreeDOS packager uses latest available application binary.
3. FreeDOS packager takes latest translations from fd-nls.
4. FreeDOS packager uses UTFTOCP and TLUMACZ tools to create a new .lng file (but has to be mindful of 5% growth).
5. FreeDOS packager creates new package using existing application binary and new .lng file.

Workflow 1 has the advantage that application and language files are always in sync and translation changes are not size constrained.
Workflow 2 has the advantage that translations can be updated without requiring an application rebuild.

@shidel How do you see svarlang fitting into your current workflow?

The CI is currently only used as a validator for kitten / help files to help prevent any non-conformant translations creeping in, and its conversion output is unused. With this PR it would also validate Svarlang translations.

shidel · 2024-01-02T14:52:43Z

Hi,

Looking at the workflows you describe, item 2.4 is not practical for several reasons.

First, the “be mindful of 5% growth” constraint. Let’s say a program is not updated for years on end. That is a very frequent occurrence. During the period of development idleness, the translations can be updated multiple times. Let’s say the largest translation gets three such revisions. The first grows it by 4%, which might be OK. The second shrinks it by 1%, which is fine. And the final update grows it by 3%. No single update increased the size by more than 5%. However, the total increase across all updates to that language would be about 6%. It can be even worse if a different translation grows larger than the originally largest translation. Sure, there are ways to handle the problem. But each comes with its own complications. It is also something that only applies to programs with that 5% post-compile limitation.

Second, translation updates need to be provided in CodePage format to ensure they are displayed properly. Having a specific person (namely me) convert translations from UTF-8 to CodePage introduces the possibility of incorrect character set conversion. There are translations in numerous languages and I don’t know all of their CodePage values. Plus, I don’t think UTFTOCP supports all of the different languages. Therefore, translators should supply the CodePage version and, if possible, a UTF-8 version of their translations. I have no idea what TLUMACZ is or does.

As for item 2.3, this used to be handled by the RBE (release build environment) during creation of an operating system release. But that was very time consuming. So, that burden was off-loaded to the package staging process. All OS packages are staged in projects on GitLab. When a program has an update, it is reorganized (if necessary) and pushed to GitLab. This is also what is done regarding its translations. I will pull the latest FD-NLS project. Then for projects that have updated translations, I run “fdvcs.sh -nls”, which imports (most of) the current NLS CodePage translations (ignoring UTF-8 files) into the project. There are some packages that must be handled manually. I then push the updated package to GitLab for inclusion in the next release/build.
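
The cumulative-growth scenario above (+4%, then -1%, then +3%) can be checked numerically with a small sketch like the following. The function name is made up for illustration, and the 1000-byte starting size in the note below is an arbitrary example value.

```c
#include <stddef.h>

/* Applies successive percentage changes to a starting size and
 * returns the resulting size. percent[i] is e.g. 4.0 for +4%. */
double apply_changes(double size, const double *percent, size_t n) {
  size_t i;
  for (i = 0; i < n; i++) {
    size *= 1.0 + percent[i] / 100.0;
  }
  return size;
}
```

Starting from 1000 bytes, the three revisions give 1000 x 1.04 x 0.99 x 1.03, which is roughly 1060.5 bytes: about 6% above the build-time size, even though no single step exceeded 5%.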

@shidel How do you see svarlang fitting into your current workflow?

I don’t know what you are asking me. So, I’ll answer based on software development. None of the programs I write and maintain for FreeDOS use Kitten or SvarLang. Except for QCRT based programs, they all use my V8 Power Tools based format for language translation. That format is extremely flexible and supports things like descriptive text keys, no line length restrictions and no fixed restrictions on translation file size.

QCRT based programs are different. PGME (running on the QCRT framework) is a complex object oriented behemoth. Like all QCRT derived works, it does not use DOS CodePages to handle text. Instead it loads custom fonts for a language translation file. The actual translation file resembles an INI file with some differences. The individual translations in the file are namespace based text chains for the individual objects that override the built-in text for the item. Things like “DIALOG.OPTIONS.LANGUAGE.CAPTION=Language settings”.

As for the V8 based translations, they are similar to kitten. In fact when creating a program, you could do the translation using the kitten and, I think, the SvarLang format as well. But generally, I use V8's support for string keys instead (like “FORMAT.HARDDISK=Format drive (Y/N)?”). I think it makes it much easier to understand how any given text is used. Along with the embedded comments in the translation file, I think it is generally easier for translators.

boeckmann · 2024-01-02T15:35:23Z

I would prefer maintaining the translation files in the FDISK GitHub repository, like it is done now. That makes sure I do not miss incoming translations and can provide a new binary, if necessary. I also update the FreeDOS FDISK package repository located at GitLab containing the binary and .LNG file whenever necessary. For the sake of completeness I can provide changed translation files to the FD-NLS repository, if requested. But if I understand Jerome correctly, he already does this via script.

andrewbird · 2024-01-02T16:58:49Z

@shidel wrote:

I have no idea what TLUMACZ is or does.

It's a tool that creates a single binary .lng file from multiple files that are codepage encoded. Svarlang programs only read this single file for translations as they don't use the codepage text files.

How do you see svarlang fitting into your current workflow?

I don’t know what you are asking me.

Sorry I was unclear, but it's fine anyway because your description of your packaging workflow mostly answered my question. I can see now that my workflow 2 is probably not going to work for you, even without the difficulties around language growth.

@boeckmann wrote:

I would prefer maintaining the translation files in the FDISK Github repository, like it is done now.

I hadn't actually considered that application developers might want to do this. That being the case, I don't think there's any value in my PR here, so I think I'll drop it.

andrewbird added 9 commits December 31, 2023 10:13

kittenc: Fixup filename case

fdbb6b4

kittenc: Rename UTF-8 files to std form

9b37dc9

kittenc: Remove duplicate files

edd6726

kittenc: Add kittenc to the test

0f87321

CI: Check for frequent non standard UTF-8 suffix

ddd4594

Let's see if we can enforce a single standard UTF-8 suffix style, which then allows us to check the conversions.

CI: Rename spurious UTF-8 suffixes to std form

4f08995

CI: Split Help from Kitten generation

5eb6c15

fdisk: Import Svarlang UTF-8 source files

b651ec5

CI: Add Svarlang UTF-8 > codepage conversion check

f5f4174

andrewbird force-pushed the svardos-01 branch from 12412f4 to f5f4174 Compare January 1, 2024 16:36

andrewbird closed this Jan 2, 2024
