janbabak/The-world-factbook-data-representation

The world factbook data representation

XML XSLT XSLFO DTD RelaxNG JavaScript HTML5 CSS3

Task

  • Choose 4 states from The world factbook and represent them in XML format.
  • Develop a DTD and RelaxNG schema to validate the structure of the XML documents.
  • Utilize XSLT to generate HTML outputs, including one page for each state and an index that features links to all pages.
  • Create a navigation menu for every subpage with CSS.
  • Use XSL-FO to generate a PDF output, featuring a single PDF for each state and an additional PDF containing all states. Include headers, footers, and page numbers.
  • Add pictures of maps and flags to both the HTML and PDF output.

Project structure

src
├── author
├── DTDConcatenation
├── fop
├── generatedPdf
├── generatedWeb
│ ├── css
│ ├── html
│ └── javaScript
├── generatedXML
├── htmlParser
├── htmlSourceData
├── images
│ ├── flags
│ └── maps
├── saxon
├── scripts
├── validators
├── xslfo
└── xslt

The following folders are included in this project:

  • The author folder holds information about the author in XML format.
  • The DTDConcatenation folder contains a DTD definition that combines all generated XML files into one.
  • The fop folder contains Apache FOP 2.6, which converts FO files into PDF.
  • The generatedPdf folder holds the generated PDF files, which the scripts can create.
  • The generatedWeb folder, which the scripts can also generate, includes an html folder with all generated HTML files for the web.
  • The generatedXML folder contains the XML files representing states, which the scripts can generate.
  • The htmlParser folder has a Node.js program that parses the HTML source data and creates an XML representation of it.
  • The htmlSourceData folder stores HTML pages downloaded from The World Factbook. These are the source files for the HTML parser.
  • The images folder contains images used for both the web and PDF output.
  • The saxon folder contains the jar files for Saxon-HE (Home Edition) 10.3.
  • The scripts folder includes bash scripts for generating and validating files.
  • The validators folder has the DTD and RelaxNG validation schemas.
  • The xslfo folder contains the FO files used to produce the PDFs; the scripts can generate them.
  • The xslt folder contains XSLT stylesheets for transforming XML files into HTML and FO files.

Software Requirements

The following are the software requirements for running the scripts:

  • Linux: All scripts run on bash.
  • Node.js: The JavaScript runtime used by the HTML parser.
  • npm: The package manager that ships with Node.js.
  • xmllint: Used for DTD and RelaxNG validation and for concatenating the XML files. It is preinstalled on many Linux distributions.
  • trang: Used for generating an rng file from the rnc file.
  • Java: Used by Saxon (XSLT processor).

To run all the steps at once, follow these instructions:

  • Use a bash script to generate and validate files.
  • Run all commands from the project root directory.
  • Make all scripts executable by running
    chmod +x src/scripts/*
  • Run the "run all" script.
    src/scripts/runAll.sh
  • If any XML file is invalid, rerun the script. It will recreate the files and delete the old invalid files.
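
The contents of runAll.sh are not reproduced in this README. As a rough sketch of the idea only, the steps could be chained as below. The function name run_steps and the STEPS variable are made up for illustration; the script names and their order are taken from the sections that follow.

```shell
# Hypothetical sketch; the real runAll.sh may be organised differently.
# Step order follows the sections of this README.
STEPS="generateXML.sh validateDTD.sh validateRelaxNG.sh concatenateStates.sh generateWeb.sh generatePdf.sh"

# run_steps <scripts-dir>: run every step in order, stop at the first failure.
run_steps() {
    local dir="$1" step
    for step in $STEPS; do
        echo "==> $step"
        if ! bash "$dir/$step"; then
            echo "Step $step failed; fix the problem and rerun." >&2
            return 1
        fi
    done
}
```

From the project root this would be called as run_steps src/scripts. Stopping at the first failing step matters because each later step reads the output of the earlier ones.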

Running steps separately

Please follow these steps in the given order as they rely on each other.

Generating XML Files

The HTML files downloaded from The World Factbook can be converted to XML files using the htmlParser program. This program is written in JavaScript and runs on Node.js. Its HTML source files are located in src/htmlSourceData. If the src/generatedXML folder does not exist, the program creates it; if it already exists, the folder is deleted and recreated. This step requires Node.js and the npm package manager. Run all commands from the project root directory.

You can generate the XML files automatically using the bash script or manually. To generate XML files automatically:

  • Make the script executable by running
    chmod +x src/scripts/generateXML.sh
  • Run the script by running
    src/scripts/generateXML.sh

To generate XML files manually:

  • Change to the parser folder by running
    cd src/htmlParser
  • Install the dependencies defined in package.json by running
    npm install
  • Start the program by running
    node main.js

Validating XML files

DTD Validation

To validate your XML files using DTD schema, follow the steps below:

  • The DTD schema can be found in src/validators/stateType.dtd.
  • Before proceeding, make sure to generate the XML files that you want to validate from the previous step.
  • To perform the validation process, you will need to have the software xmllint installed.
  • All commands should be run from the project root directory.
  • For automatic validation of all states using a bash script, follow these steps:
    • Make the script executable by running
      chmod +x src/scripts/validateDTD.sh
    • Run the script using
      src/scripts/validateDTD.sh
  • For manual validation of individual states, use the following command and replace <xml file> with the file that you wish to validate:
    xmllint --noout --dtdvalid src/validators/stateType.dtd src/generatedXML/<xml file>
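
The manual command checks one file at a time. A minimal sketch of looping it over every generated file is shown below. This is not the content of validateDTD.sh; the function name validate_dtd_all and the XMLLINT override variable are assumptions introduced for the example.

```shell
# XMLLINT is a hypothetical override hook; by default the xmllint on PATH is used.
XMLLINT="${XMLLINT:-xmllint}"

# validate_dtd_all <dtd> <xml-dir>: validate each XML file against the DTD,
# report the result per file, and return non-zero if any file is invalid.
validate_dtd_all() {
    local dtd="$1" dir="$2" file failed=0
    for file in "$dir"/*.xml; do
        [ -e "$file" ] || continue      # the directory holds no XML files
        if "$XMLLINT" --noout --dtdvalid "$dtd" "$file"; then
            echo "valid:   $file"
        else
            echo "INVALID: $file" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

From the project root it would be called as validate_dtd_all src/validators/stateType.dtd src/generatedXML. The loop keeps going after a failure so that every invalid file is listed in one run.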

RelaxNG Validation

To perform RelaxNG validation, follow these steps:

  • The RelaxNG grammar can be found in src/validators/stateType.rnc.
  • Before validating, you must generate the XML files in the previous step.
  • This step requires the software trang and xmllint.
  • Make sure to run all commands from the project root directory.
  • To automatically validate all states using a bash script, make the script executable by running
    chmod +x src/scripts/validateRelaxNG.sh
    and then run the script using
    src/scripts/validateRelaxNG.sh
  • To manually validate individual states, first generate the rng file by running
    trang src/validators/stateType.rnc src/validators/stateType.rng
    Then validate a state by running the following command, replacing <xml file> with the file you want to validate:
    xmllint --noout --relaxng src/validators/stateType.rng src/generatedXML/<xml file>
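
Since xmllint reads the rng file rather than the rnc file, the conversion has to be repeated whenever the compact schema changes. The sketch below regenerates the rng file only when it is missing or older than the rnc file. The function names ensure_rng and validate_relaxng and the TRANG and XMLLINT override variables are assumptions for this example, not part of the project's scripts.

```shell
# Hypothetical override hooks; by default the tools are looked up on PATH.
TRANG="${TRANG:-trang}"
XMLLINT="${XMLLINT:-xmllint}"

# ensure_rng <rnc-file>: regenerate the .rng next to the .rnc when it is
# missing or older than the .rnc, then print the path of the .rng file.
ensure_rng() {
    local rnc="$1" rng="${1%.rnc}.rng"
    if [ ! -e "$rng" ] || [ "$rnc" -nt "$rng" ]; then
        # send any trang messages to stderr so stdout carries only the path
        "$TRANG" "$rnc" "$rng" >&2 || return 1
    fi
    echo "$rng"
}

# validate_relaxng <rnc-file> <xml-file>: validate one file against the schema.
validate_relaxng() {
    local rng
    rng="$(ensure_rng "$1")" || return 1
    "$XMLLINT" --noout --relaxng "$rng" "$2"
}
```

A call from the project root would look like validate_relaxng src/validators/stateType.rnc src/generatedXML/France.xml, where France.xml stands for any generated state file.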

Concatenating Generated Files into One

To concatenate generated files into one, follow these steps:

  • The DTD definition of the concatenated file can be found in src/DTDConcatenation/concatenateXML.xml.
  • The resulting file will be saved in src/generatedXML/concatenated.xml.
  • Before starting, ensure that you have installed xmllint software.
  • All commands should be run from the project root directory.
  • Automatic concatenation using a bash script
    • First, make the script executable by running
      chmod +x src/scripts/concatenateStates.sh
    • Then, run the script using
      src/scripts/concatenateStates.sh
  • Manual concatenation
    • Run the following command:
      xmllint --noent src/DTDConcatenation/concatenateXML.xml > src/generatedXML/concatenated.xml
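
The --noent option tells xmllint to replace entity references with their content, which suggests that the concatenation file declares each state file as an external entity. The self-contained miniature below illustrates that technique under this assumption. The element names, the wrapper file name, and the function demo_concatenation are invented for the demo and do not mirror the project's actual files.

```shell
# Hypothetical miniature of entity-based concatenation; not the project's files.
XMLLINT="${XMLLINT:-xmllint}"

# demo_concatenation <dir>: write two tiny state files plus a wrapper that
# references them as external entities, then print the expanded document.
demo_concatenation() {
    local dir="$1"
    printf '<state><name>France</name></state>\n'  > "$dir/France.xml"
    printf '<state><name>Germany</name></state>\n' > "$dir/Germany.xml"
    cat > "$dir/wrapper.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE states [
  <!ENTITY france  SYSTEM "France.xml">
  <!ENTITY germany SYSTEM "Germany.xml">
]>
<states>
  &france;
  &germany;
</states>
EOF
    # --noent substitutes each entity reference with the referenced file's content
    "$XMLLINT" --noent "$dir/wrapper.xml"
}
```

Running demo_concatenation "$(mktemp -d)" prints a single document in which both state elements appear inside the states root element. The entity paths are resolved relative to the wrapper file, so the wrapper and the state files can live in different folders as long as the SYSTEM paths account for it.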

Generating HTML Files

  • HTML files are generated by applying the XSLT stylesheets src/xslt/htmlHomePageStyle.xsl and src/xslt/htmlStateStyle.xsl to the XML files in src/generatedXML.
  • Once generated, the files will be available in src/generatedWeb/html.
  • This process requires Java, and all commands must be run from the project root directory.

Automatic generation using bash script

  • Make the script executable by running
    chmod +x src/scripts/generateWeb.sh
  • Run the script by running
    src/scripts/generateWeb.sh

Manual generation

  • Create a directory for output by running

    mkdir -p src/generatedWeb/html > /dev/null 2>&1
  • Generate the index by running

    java -jar src/saxon/saxon-he-10.3.jar src/generatedXML/concatenated.xml src/xslt/htmlHomePageStyle.xsl > src/generatedWeb/html/index.html
  • Generate the states by running the following command. Remember to replace <state> with the name of the state (for example, France or Germany).

    java -jar src/saxon/saxon-he-10.3.jar src/generatedXML/<state>.xml src/xslt/htmlStateStyle.xsl > src/generatedWeb/html/<state>.html
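
Instead of repeating the last command by hand for each state, the state names can be derived from the file names in src/generatedXML. The following is a sketch under that assumption and is not the content of generateWeb.sh; the function names list_states and generate_state_pages are hypothetical.

```shell
# list_states <xml-dir>: print one state name per line, derived from the
# file names; concatenated.xml is skipped because it is not a single state.
list_states() {
    local dir="$1" file name
    for file in "$dir"/*.xml; do
        [ -e "$file" ] || continue      # the directory holds no XML files
        name="$(basename "$file" .xml)"
        [ "$name" = "concatenated" ] || echo "$name"
    done
}

# generate_state_pages: run the Saxon command shown above once per state.
# Must be called from the project root directory.
generate_state_pages() {
    local state
    mkdir -p src/generatedWeb/html
    list_states src/generatedXML | while read -r state; do
        # stdin is redirected so that java cannot swallow the list of states
        java -jar src/saxon/saxon-he-10.3.jar \
            "src/generatedXML/$state.xml" src/xslt/htmlStateStyle.xsl \
            > "src/generatedWeb/html/$state.html" < /dev/null
    done
}
```

Calling generate_state_pages after the index has been generated produces one HTML page per state.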

Generating PDF files

  • PDFs are generated from the src/xslfo/*.fo files.
  • The src/xslfo/*.fo files are generated from the XML files in src/generatedXML using the src/xslt/pdf*.xsl stylesheets.
  • The generated PDF files are placed in the src/generatedPdf folder.
  • This step requires Apache FOP, which is included in the src/fop directory.
  • All commands should be run from the project root directory.
  • Automatic generation using a bash script
    • Make the script executable by running
      chmod +x src/scripts/generatePdf.sh
    • Run the script by running
      src/scripts/generatePdf.sh
  • Manual generation
    • Create the output directories by running
      mkdir -p src/generatedPdf > /dev/null 2>&1
      mkdir -p src/xslfo > /dev/null 2>&1
    • Generate the FO file for all states (Germany, UK, Switzerland, France) by running
      java -jar src/saxon/saxon-he-10.3.jar src/generatedXML/concatenated.xml src/xslt/pdfAllStatesStyle.xsl > src/xslfo/allStates.fo
    • Generate the FO file for a single state by running the following command. Replace <state> with the actual name of the state.
      java -jar src/saxon/saxon-he-10.3.jar src/generatedXML/<state>.xml src/xslt/pdfStateStyle.xsl > src/xslfo/<state>.fo
    • Generate the PDF file for all states by running
      src/fop/fop/fop src/xslfo/allStates.fo src/generatedPdf/allStates.pdf
    • Generate the PDF file for a single state by running the following command. Replace <state> with the actual name of the state.
      src/fop/fop/fop src/xslfo/<state>.fo src/generatedPdf/<state>.pdf
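
The manual steps form a two-stage pipeline: Saxon turns XML into an FO file, and FOP turns the FO file into a PDF. The sketch below wraps both stages in one function. It is not the content of generatePdf.sh; the function names run and build_pdf and the DRY_RUN variable are assumptions for this example. It uses Saxon's named options -s:, -xsl: and -o: in place of the positional arguments and output redirection shown above.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_pdf <source-xml> <stylesheet> <output-name>: XML -> FO (Saxon) -> PDF (FOP).
# Must be called from the project root directory.
build_pdf() {
    local xml="$1" xsl="$2" out="$3"
    run mkdir -p src/xslfo src/generatedPdf || return 1
    run java -jar src/saxon/saxon-he-10.3.jar \
        "-s:$xml" "-xsl:$xsl" "-o:src/xslfo/$out.fo" || return 1
    run src/fop/fop/fop "src/xslfo/$out.fo" "src/generatedPdf/$out.pdf"
}
```

For example, build_pdf src/generatedXML/concatenated.xml src/xslt/pdfAllStatesStyle.xsl allStates builds the combined PDF. Setting DRY_RUN=1 beforehand prints the three commands without running them, which is a cheap way to check the paths.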

Sources

http://saxon.sourceforge.net/
https://relaxng.org/jclark/trang.html
https://relaxng.org/compact-tutorial-20030326.html
https://www.w3schools.com/xml/default.asp
https://www.w3schools.com/xml/xpath_intro.asp
https://www.w3schools.com/xml/xsl_intro.asp
https://www.w3schools.com/xml/xml_dtd_intro.asp
http://zvon.org/comp/r/tut-XSLT_1.html
https://www.youtube.com/watch?v=W--Yhp0m35A
https://www.youtube.com/watch?v=D2YzF4hm9NM
https://undraw.co/illustrations
https://fonts.google.com/
https://fontawesome.com/account
https://w3schools.sinsixx.com/xslfo/xslfo_lists.asp.htm
https://xmlgraphics.apache.org/fop/
https://www.kosek.cz/xml/schema/rng.html

About

Representation of data about 4 countries in various formats.
