Svarlang 01 #184

Closed
wants to merge 9 commits into from

andrewbird
Contributor

@andrewbird andrewbird commented Jan 1, 2024

I did a little work on adding svarlang-type NLS support. It builds upon another PR of mine for CI changes that is still open, so it's not ready to merge yet. The svarlang commits are the last three.

Essentially, svarlang files use the cats format, like kitten, but are encoded in UTF-8 rather than in a DOS codepage. With this in mind, the CI validator temporarily converts each file to the target codepage, to check that the source file uses only UTF-8 characters that can be represented in that codepage.
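A minimal C sketch of that representability check, assuming the validator decodes UTF-8 itself and looks up every non-ASCII code point in a Unicode-to-codepage table. The table is a hypothetical, deliberately truncated excerpt of CP437 (a handful of German/French letters only), and the function names are illustrative; this is not the fd-nls CI code, which may simply rely on an existing converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, truncated Unicode -> CP437 excerpt; a real validator
 * needs all 128 upper-half entries of the target codepage. */
struct cp_entry { uint32_t ucs; unsigned char byte; };

static const struct cp_entry cp437_excerpt[] = {
    { 0x00C4, 0x8E }, { 0x00D6, 0x99 }, { 0x00DC, 0x9A }, /* A/O/U umlaut */
    { 0x00E4, 0x84 }, { 0x00F6, 0x94 }, { 0x00FC, 0x81 }, /* a/o/u umlaut */
    { 0x00DF, 0xE1 }, { 0x00E9, 0x82 },                   /* sharp s, e acute */
};

/* Decode one UTF-8 sequence (1 to 3 bytes) into *cp. Returns the number of
 * bytes consumed, or 0 for malformed input or 4-byte sequences (which no DOS
 * codepage can represent). Overlong forms are not rejected in this sketch. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
              | (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Return 1 if every character of the NUL-terminated UTF-8 string exists in
 * the (excerpted) target codepage, 0 otherwise. */
static int utf8_fits_codepage(const char *text)
{
    const unsigned char *s = (const unsigned char *)text;
    while (*s) {
        uint32_t cp;
        size_t n = utf8_decode(s, &cp);
        if (n == 0) return 0;
        if (cp >= 0x80) { /* ASCII is assumed identical in every codepage */
            size_t i;
            int found = 0;
            for (i = 0; i < sizeof cp437_excerpt / sizeof cp437_excerpt[0]; i++)
                if (cp437_excerpt[i].ucs == cp) { found = 1; break; }
            if (!found) return 0;
        }
        s += n;
    }
    return 1;
}
```

With this excerpt, "Grüße" passes, while a string containing the euro sign (U+20AC, absent from CP437) is rejected.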

I don't fully understand the process svarlang uses to select the required language, but at least some of it happens at application build time. It is unclear to me whether the user can change languages at runtime, which would require shipping additional language files (as is done with kitten). So for now I see fd-nls as a place to create and modify translation files for svarlang-based applications; the packager would then retrieve them and add them to the build. This may be incorrect, so please correct me @boeckmann, @mateuszviste, @shidel

Refers to #172

@mateuszviste

Essentially the svarlang files are cats format, like kitten, but in utf-8 encoding rather than codepages.

That is correct, although SvarLANG's cats files have an extra provision for flagging "dirty" strings by prefixing them with a '?' mark:

?1.1:Hello, World!

I am not sure if this is relevant to what you do, just pointing it out.
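For a tool that reads these files (a CI validator, for instance), the '?' marker only needs to be stripped and remembered before the usual `section.message:text` split. A minimal C sketch, assuming one entry per line; the struct and function names are hypothetical and not part of SvarLANG or TLUMACZ:

```c
#include <ctype.h>
#include <stdlib.h>

/* One parsed translation line (hypothetical layout). */
struct cats_line {
    int dirty;          /* 1 if the line was prefixed with '?' */
    unsigned section;   /* first number in "1.1" */
    unsigned msg;       /* second number in "1.1" */
    const char *text;   /* points into the input, just after ':' */
};

/* Parse "[?]SECTION.MSG:text". Returns 1 on success, 0 if malformed. */
static int parse_cats_line(const char *line, struct cats_line *out)
{
    char *end;
    out->dirty = (line[0] == '?');
    if (out->dirty) line++;
    if (!isdigit((unsigned char)*line)) return 0;
    out->section = (unsigned)strtoul(line, &end, 10);
    if (*end != '.') return 0;
    line = end + 1;
    if (!isdigit((unsigned char)*line)) return 0;
    out->msg = (unsigned)strtoul(line, &end, 10);
    if (*end != ':') return 0;
    out->text = end + 1;
    return 1;
}
```

Parsing "?1.1:Hello, World!" yields dirty = 1, section = 1, msg = 1 and the text "Hello, World!".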

I don't fully understand the process that svarlang uses to select the required language, but at least some of it is being done at application build time.

It is the same with cats/kitten: at compile time some strings end up embedded within the executable. SvarLANG is no different on this point; only the mechanism differs, because the strings are all kept in a separate OBJ file that is linked into the application instead of being spread out in the application's source code.

Whether there's any facility for the user to change languages at runtime and so require that additional language files are shipped (like is done with kitten) is unclear to me.

SvarLANG is able to load an LNG file that contains all the strings, and the application is then free to choose whatever language it wants. The main differences from cats/kitten are that a single LNG file contains all languages, instead of each language living in a separate file (*.EN, *.DE, *.PL, etc.), and that the LNG file is not a text file but a binary file assembled by a specialized tool (TLUMACZ.EXE).

To mimic the way FreeDOS apps use cats/kitten, one would initialize SvarLANG as follows:
svarlang_autoload_pathlist("myprog", getenv("NLSPATH"), getenv("LANG"));
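The call above hands SvarLANG the NLSPATH list and the LANG value. The sketch below is not SvarLANG's implementation (its real search order and naming rules may differ); it is a hypothetical illustration of turning a ';'-separated NLSPATH into candidate paths, assuming the language file is named `<PROGNAME>.LNG`:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the next "<dir>\<PROGNAME>.LNG" candidate from
 * a ';'-separated path list into out and advance *cursor past that entry.
 * Returns 1 if a candidate was written, 0 when the list is exhausted. */
static int next_lng_candidate(const char **cursor, const char *progname,
                              char *out, size_t outsz)
{
    while (*cursor && **cursor) {
        const char *dir = *cursor;
        const char *sep = strchr(dir, ';');
        size_t len = sep ? (size_t)(sep - dir) : strlen(dir);
        *cursor = sep ? sep + 1 : NULL;
        if (len == 0) continue;                 /* skip empty entries like ";;" */
        {
            /* avoid a doubled backslash when the directory already ends in one */
            const char *slash = (dir[len - 1] == '\\') ? "" : "\\";
            int n = snprintf(out, outsz, "%.*s%s%s.LNG", (int)len, dir, slash, progname);
            if (n > 0 && (size_t)n < outsz) return 1; /* skip entries that don't fit */
        }
    }
    return 0;
}
```

A caller would seed the cursor with getenv("NLSPATH") (a NULL result simply yields no candidates) and stop at the first candidate that opens successfully.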

More information about the SvarLANG library can be found here:

http://svn.svardos.org/filedetails.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.txt
http://svn.svardos.org/filedetails.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.h

@boeckmann

boeckmann commented Jan 1, 2024

Hello Andrew, for FDISK it is handled like this: the translations for FDISK are stored in a single file called FDISK.LNG. This binary file is created in a two-step process while building FDISK.

  1. The UTF-8 encoded input files for the different translations are converted individually by the tool UTFTOCP to the correct codepage, as specified in the makefile. The language -> codepage mapping is currently done manually, but I think it could generally be automated by matching filename patterns, for example DE.TXT -> codepage 858 based on the DE prefix.
  2. All converted language files are processed in one go by the program TLUMACZ, creating the FDISK.LNG file and a C file named DEFLANG.C containing the default translation (usually English). DEFLANG.C has to be linked into the binary.
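The filename-pattern automation suggested in step 1 could look roughly like this C sketch. Only the DE -> 858 pair comes from the discussion; the EN -> 437 entry and all names are hypothetical, and a real build would more likely do this in the makefile or a script:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical mapping table: DE -> 858 is from the thread, EN -> 437 is an
 * illustrative assumption. */
static const struct { const char *lang; unsigned cp; } lang_cp_map[] = {
    { "DE", 858 },
    { "EN", 437 },
};

/* Derive the codepage from a translation filename such as "DE.TXT" by
 * matching its two-letter prefix (case-insensitively). Returns 0 if unknown. */
static unsigned codepage_for_file(const char *filename)
{
    const char *base = strrchr(filename, '/');
    const char *bs = strrchr(filename, '\\');
    size_t i;
    if (bs && (!base || bs > base)) base = bs;  /* use the last path separator */
    base = base ? base + 1 : filename;
    for (i = 0; i < sizeof lang_cp_map / sizeof lang_cp_map[0]; i++) {
        const char *l = lang_cp_map[i].lang;
        if (toupper((unsigned char)base[0]) == l[0] &&
            toupper((unsigned char)base[1]) == l[1] &&
            base[2] == '.')
            return lang_cp_map[i].cp;
    }
    return 0;
}
```

Returning 0 for an unknown prefix lets the build fail loudly instead of silently guessing a codepage.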

The file format of the language files is described here: http://svn.svardos.org/filedetails.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.txt. It is a binary format that, for each language, contains a binary search table of the string ids with offsets into the per-language string table. At program start, the data area of the executable containing the default translation gets overwritten by the data for the selected language stored in the .LNG file. The selected language is determined by the LANG environment variable.
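The per-language table described above maps naturally onto a binary search. A minimal C sketch, assuming a hypothetical in-memory layout (not the on-disk format documented in svarlang.txt) and, also as an assumption, ids built from the cats numbers as `(section << 8) | message`:

```c
#include <stddef.h>

/* Hypothetical index entry: ids sorted ascending, each with an offset into
 * a block of NUL-terminated strings for one language. */
struct lang_index { unsigned short id; unsigned short offset; };

/* Binary search for id; returns the string or NULL if the id is missing. */
static const char *lookup_string(const struct lang_index *idx, size_t count,
                                 const char *strings, unsigned short id)
{
    size_t lo = 0, hi = count;            /* half-open search range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].id == id) return strings + idx[mid].offset;
        if (idx[mid].id < id) lo = mid + 1;
        else hi = mid;
    }
    return NULL;
}
```

Under the assumed encoding, cats entry 1.2 would have id 0x0102.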

@boeckmann

Regarding extensibility: the .LNG file can be extended with another language after FDISK has been compiled, but there is a constraint: the added language must not be more than 5% larger than the largest translation in existence when FDISK was compiled, because the space reserved for the translation area in the data segment of the executable has a fixed size.

It is probably also not a good idea to mix versions of the .LNG file and the executable, but apart from a translation failing to load, this should not do any harm.
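As a worked example of the 5% rule: if the largest translation at build time needs 4000 bytes, the reserved area is 4000 + 200 = 4200 bytes, so a language added later may need at most 4200 bytes. A small C sketch of that check; rounding the 5% margin up is an assumption, since the actual build may round differently:

```c
#include <stddef.h>

/* Size of the fixed translation area: the largest translation known at
 * build time plus 5%, rounded up (the rounding is an assumption). */
static size_t reserved_area(size_t largest_at_build)
{
    return largest_at_build + (largest_at_build + 19) / 20;
}

/* Returns 1 if a language added after compilation still fits the area. */
static int new_language_fits(size_t largest_at_build, size_t new_lang_size)
{
    return new_lang_size <= reserved_area(largest_at_build);
}
```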

@andrewbird
Contributor Author

Hi @mateuszviste and @boeckmann, thanks for your thorough replies; I get it now. So I guess the process comes down to how the FreeDOS packager and the application builders want to set up the workflow. I can imagine two workflows, as Svarlang is flexible enough to cope with either scenario.

  1. Application builder driven.

    1. Translators create / modify translations in fd-nls repo.
    2. Application builder imports latest translations from fd-nls.
    3. Application builder compiles application.
    4. Application builder passes application including .lng file to FreeDOS packager.
    5. FreeDOS packager publishes application in the usual manner.
  2. FreeDOS packager driven.

    1. Translators create / modify translations in fd-nls repo.
    2. FreeDOS packager uses latest available application binary.
    3. FreeDOS packager takes latest translations from fd-nls.
    4. FreeDOS packager uses UTFTOCP and TLUMACZ tools to create a new .lng file (but has to be mindful of the 5% growth limit)
    5. FreeDOS packager creates new package using existing application binary and new .lng file.

Workflow 1 has the advantage that application and language files are always in sync and translation changes are not size constrained.
Workflow 2 has the advantage that translations can be updated without requiring an application rebuild.

@shidel How do you see svarlang fitting into your current workflow?

The CI is currently used only as a validator for kitten / help files, to help prevent non-conformant translations from creeping in; its conversion output is unused. With this PR it would also validate Svarlang translations.

@shidel
Owner

shidel commented Jan 2, 2024 via email

@boeckmann

I would prefer maintaining the translation files in the FDISK GitHub repository, as is done now. That ensures I do not miss incoming translations and can provide a new binary if necessary. I also update the FreeDOS FDISK package repository on GitLab, which contains the binary and the .LNG file, whenever necessary. For the sake of completeness, I can provide changed translation files to the FD-NLS repository if requested. But if I understand Jerome correctly, he already does this via a script.

@andrewbird
Contributor Author

@shidel wrote:

I have no idea what TLUMACZ is or does.

It's a tool that creates a single binary .lng file from multiple codepage-encoded files. Svarlang programs read only this single file for translations; they don't use the codepage text files.
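To illustrate the idea only (this is not TLUMACZ's code or its output format), the core of such a tool could sort each language's strings by id and pack them into one block with an offset table, the kind of structure boeckmann described for the per-language binary search table. All names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed entry and index entry. */
struct entry { unsigned short id; const char *text; };
struct lang_index { unsigned short id; unsigned short offset; };

static int cmp_entry(const void *a, const void *b)
{
    unsigned short x = ((const struct entry *)a)->id;
    unsigned short y = ((const struct entry *)b)->id;
    return (x > y) - (x < y);
}

/* Sort entries by id and pack their texts into buf as NUL-terminated
 * strings, filling idx with (id, offset) pairs. Returns the bytes used in
 * buf; 0 means nothing was packed or buf was too small. */
static size_t pack_language(struct entry *e, size_t n, struct lang_index *idx,
                            char *buf, size_t bufsz)
{
    size_t used = 0, i;
    qsort(e, n, sizeof *e, cmp_entry);
    for (i = 0; i < n; i++) {
        size_t len = strlen(e[i].text) + 1;  /* include the terminating NUL */
        if (used + len > bufsz || used > 0xFFFF) return 0; /* offset must fit 16 bits */
        idx[i].id = e[i].id;
        idx[i].offset = (unsigned short)used;
        memcpy(buf + used, e[i].text, len);
        used += len;
    }
    return used;
}
```

Sorting at build time is what lets the program use a plain binary search at run time.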

How do you see svarlang fitting into your current workflow?

I don’t know what you are asking me.

Sorry I was unclear, but it's fine anyway, because you mostly answered my question just above it, where you described your packaging workflow. I can see now that my workflow 2 is probably not going to work for you, even without the difficulties around language growth.

@boeckmann wrote:

I would prefer maintaining the translation files in the FDISK Github repository, like it is done now.

I hadn't actually considered that application developers might want to do this. That being the case, I don't think there's any value in my PR here, so I think I'll drop it.

@andrewbird andrewbird closed this Jan 2, 2024
4 participants