This project provides scripts for converting a MadCap Flare project into a Sphinx project. It consists of the following components:
- A Python script, `clean_html.py`, that processes Flare's HTML files as follows:
  - Removes non-breaking spaces from the HTML files, because reStructuredText (RST) does not handle them well.
  - Changes the endings of internal links from ".htm" to ".html". Flare's HTML files use the former ending, whereas Sphinx generates files with ".html" endings, so changing the endings keeps internal links from breaking.
- A batch file, `convert.bat`, that runs `clean_html.py` followed by a Pandoc command on every HTML file in the Flare project's "Contents" folder. The Pandoc command converts the processed HTML files into `.rst` files.
- A Python script, `parse_toc.py`, that reads the Flare project's table of contents (TOC) file, then appends the appropriate `toctree` directives to the RST files that have child topics.
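The two clean-up steps described for `clean_html.py` can be sketched as follows. This is only an illustration, not the script's actual code: the real script uses Beautiful Soup, while this sketch uses regular expressions from the standard library so that it stays self-contained. The function name `clean_html` and the choice to replace non-breaking spaces with ordinary spaces (rather than delete them) are assumptions.

```python
import re

# Non-breaking spaces can appear as the raw character or as an HTML entity.
NBSP_PATTERN = re.compile("\u00a0|&nbsp;|&#160;")
# An href value ending in ".htm", optionally followed by a "#fragment".
HREF_PATTERN = re.compile(r'href="([^"#]+?)\.htm(#[^"]*)?"')


def clean_html(text):
    """Replace non-breaking spaces and rewrite internal .htm links to .html."""
    # Assumption: swap each non-breaking space for a regular space.
    text = NBSP_PATTERN.sub(" ", text)

    def fix_link(match):
        target, fragment = match.group(1), match.group(2) or ""
        if target.startswith(("http://", "https://")):
            return match.group(0)  # leave external links untouched
        return 'href="{}.html{}"'.format(target, fragment)

    return HREF_PATTERN.sub(fix_link, text)
```

Links that already end in ".html" do not match the pattern and are left alone, so running the function twice on the same file should be harmless.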
The Python scripts use the standard Python library and the HTML parsing library Beautiful Soup. If you use pip as your Python package manager, you can install the latest version of Beautiful Soup with the command `pip install beautifulsoup4`. The scripts were tested on Python 3. They should also run on Python 2, since I don't think I used anything specific to Python 3.
You need to install Pandoc in order to run `convert.bat`. We use a batch script because Flare runs only on Windows; however, the commands in the script are simple, so converting the batch file into a bash script should be straightforward.
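For readers who are not on Windows, the Pandoc step could also be driven from Python instead of a batch file. The sketch below is an assumption about how that might look, not a port of `convert.bat`: the function names are hypothetical, it covers only the Pandoc conversion (not the `clean_html.py` pass), and it assumes `pandoc` is on your PATH and that the topic files end in ".htm".

```python
import subprocess
from pathlib import Path


def build_pandoc_command(html_path):
    """Return the Pandoc command that converts one HTML file to RST.

    The .rst file is written next to the HTML file, in the same subfolder.
    """
    html_path = Path(html_path)
    rst_path = html_path.with_suffix(".rst")
    return ["pandoc", "-f", "html", "-t", "rst", str(html_path), "-o", str(rst_path)]


def convert_all(contents_dir="Contents"):
    """Run Pandoc on every .htm file under contents_dir (hypothetical helper)."""
    for html_path in sorted(Path(contents_dir).rglob("*.htm")):
        # check=True stops at the first file Pandoc fails to convert.
        subprocess.run(build_pandoc_command(html_path), check=True)
```

Calling `convert_all()` from the project folder would then process the whole "Contents" tree in one go.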
In `parse_toc.py`, the `extractSubMenus()` function assumes Windows file paths, so it uses the `\` separator. If you are on a *nix system, you will need to edit the lines that use this separator.
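One way to avoid hard-coding the separator is to let `pathlib` interpret the Windows-style path and emit forward slashes, which work on *nix and are also what Sphinx expects in `toctree` entries. This is a sketch of the idea, not a patch for `extractSubMenus()`; the helper name `normalize_link` and the sample path are made up for illustration.

```python
from pathlib import PureWindowsPath


def normalize_link(link):
    """Convert a Windows-style relative path to one that uses forward slashes."""
    # PureWindowsPath accepts both "\" and "/" as separators, so paths that
    # already use forward slashes pass through unchanged.
    return PureWindowsPath(link).as_posix()


print(normalize_link(r"Contents\Guide\Intro.htm"))  # Contents/Guide/Intro.htm
```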
The scripts assume the following folder structure for your Sphinx project:
ProjectFolder
|   index.rst (top level of TOC; will be generated here by default)
|   convert.bat
|   clean_html.py
|   parse_toc.py
|   .fltoc file (Flare table of contents)
|---Contents folder (copied from Flare project)
|       Flare HTML and generated RST files
In short:
- The project folder should contain all the conversion scripts.
- You should copy your Flare project's `Contents` folder into the project folder.
- The scripts generate each `.rst` file in the same subfolder as its corresponding HTML file. The only exception is `index.rst`, the file containing the top level of the TOC, which is stored directly in the project folder.
Flare projects allow parent nodes in the TOC that have no content of their own but contain child topics when expanded. `parse_toc.py` currently does not have a foolproof way of handling such nodes. For parent nodes with content, it relies on the `Link` attribute in the `.fltoc` file. Contentless parent nodes do not have this attribute, so `parse_toc.py` relies on the `Title` attribute for these nodes and writes the corresponding `.rst` files directly under the "Contents" folder. However, for reasons I haven't had time to understand yet, some contentless nodes fail to be replicated correctly this way.
At present, `parse_toc.py` also prints the titles of contentless parent nodes to the console, so that you have a list of them, can check which ones the script did not convert, and can convert those manually. If your project has no contentless parent nodes (i.e. if all `TocEntry` elements in your `.fltoc` file have the `Link` attribute), no manual conversion is needed.
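If you want to check for contentless parent nodes before running the conversion, a few lines of standard-library XML parsing are enough. The sketch below is not taken from `parse_toc.py`; the function name is hypothetical, the sample TOC is invented, and it assumes only what is described above: `TocEntry` elements with `Title` and (optionally) `Link` attributes.

```python
import xml.etree.ElementTree as ET


def contentless_titles(fltoc_text):
    """Return the titles of all TocEntry elements that lack a Link attribute."""
    root = ET.fromstring(fltoc_text)
    return [
        entry.get("Title", "(untitled)")
        for entry in root.iter("TocEntry")  # walks nested entries too
        if entry.get("Link") is None
    ]


# Invented sample: "Guide" is a parent node with no content of its own.
SAMPLE = """
<CatapultToc>
  <TocEntry Title="Guide">
    <TocEntry Title="Intro" Link="/Content/Intro.htm" />
  </TocEntry>
  <TocEntry Title="FAQ" Link="/Content/FAQ.htm" />
</CatapultToc>
"""

print(contentless_titles(SAMPLE))  # ['Guide']
```

An empty list means every entry has a `Link` attribute, in which case no manual conversion should be needed.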
The default script inserts a `titlesonly` option under the `toctree` directives it creates. This imitates Flare's TOC structure, since Flare does not include topic subheaders in its TOCs. To remove this option or to add more options (`hidden`, captions, etc.), edit the first part of the `extractPaths()` function.
The script also inserts the `toctree` directive at the end of the file. If you want the local TOCs to be displayed elsewhere in the document, you will have to edit `extractPaths()`. Note that you cannot put the `toctree` directive before the first header in a document, because doing so disrupts Sphinx's interpretation of the TOC structure.
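To make the shape of the generated directive concrete, here is a small sketch that builds the text of a `toctree` directive with an optional `titlesonly` option. It is an illustration under assumptions, not the code inside `extractPaths()`: the function name `build_toctree`, its parameters, and the example document names are all hypothetical.

```python
def build_toctree(children, titles_only=True):
    """Return RST text for a toctree directive listing the given documents."""
    lines = ["", ".. toctree::"]
    if titles_only:
        # Show only document titles, not the subheaders inside each topic.
        lines.append("   :titlesonly:")
    lines.append("")  # blank line between the options and the entries
    lines.extend("   " + child for child in children)
    lines.append("")
    return "\n".join(lines)


print(build_toctree(["Guide/Intro", "Guide/Setup"]))
```

The returned text starts with a blank line so that it can be appended to the end of an existing `.rst` file without running into the last paragraph.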
- Create the folder for your Sphinx project and put the scripts from this Git repository in it. You can simply clone the repository and use its folder as the Sphinx project folder.
- Copy the `Contents` folder of your Flare project into your Sphinx project folder. Make sure to copy the `Contents` folder that is in the same folder as the `.flprj` file. Do not copy the `Contents` folder that is under the `Output` folder, as the latter contains extra styling elements.
- Copy the desired Flare table of contents into your Sphinx project folder. This should be an `.fltoc` file within the `Project\TOCs` subfolder of your Flare project folder.
- In Windows PowerShell or the command terminal, run `convert.bat`.
- In Windows PowerShell or the command terminal, run the TOC parsing script by typing `python parse_toc.py yourFlareTOCFile`. Replace `yourFlareTOCFile` with the name of the `.fltoc` file that contains your TOC structure.
- You should now have an `index.rst` file that contains a `toctree` directive with the top-level topics in your TOC. Your `Contents` folder will now contain an RST file for each HTML file. RST files for topics with children will have the appropriate `toctree` directive at the end of each file.
- You can test the new TOC by running the `sphinx-quickstart` command in the Sphinx project folder and following the instructions for setting up a quick test build.