SGML2TeX is a prototype program to convert the SGML tags in a document to control sequences using the conventions of TeX. This document reflects v0.95β of the program.
The program is currently written in PCL, a language developed explicitly for the Intel 80*86 chips, because of the speed of execution possible and the availability of a royalty-free run-time execute module (included with the distribution of this program). The program is therefore restricted in its current state to execution on MS-DOS or in an MS-DOS task window of DesqView, DesqView/X or MS-Windows. A future version will be written in a more portable language system, probably CWEB.
SGML is the Standard Generalised Markup Language (ISO 8879), the international standard for text markup. An `SGML document' in the terms of the SGML2TeX program means an SGML `document instance': that is, the user text of an SGML document proper, without any DTD (Document Type Definition) or other markup declarations. SGML2TeX does not perform any parsing or validating of the SGML document, as public-domain parsers are freely available for this purpose. It is therefore your responsibility to ensure that only a valid conformant orthogonal SGML document instance without minimisation is processed by this program, with the syntax inherent in the SGML Reference Concrete Syntax. No responsibility can be taken for the results of processing any other form of SGML text, but suggestions for improvement are always welcomed. (In plain language this means the program will process a normalised SGML file with no DTD or SGML declaration attached, but nothing else.)
TeX here means the typesetting system devised by Donald Knuth, together with add-on variants such as LaTeX. The code output by the SGML2TeX program includes empty definitions for all elements, attributes and entities encountered, in a dummy style file. It is your responsibility to implement the style file in order to achieve the desired result. A configuration file option is available for the predefinition of known tags. The TeX system is available for almost all platforms in a choice of public-domain/shareware/freeware or commercial implementations: contact the TeX Users Group, Box 21041, Santa Barbara, CA 93121-1041, USA for further details (phone: [+1] (805) 963 1338; fax: [+1] (805) 963 8358; email tug@tug.org).
This is beta-release software. The program appears to perform as indicated below, but you are asked to report any bugs encountered to the author. The program is copyright of the author but unrestricted redistribution is permitted provided no modifications are made.
The program is distributed as a .zip file, so it must be copied to the root directory of your disk and unwrapped with the command
pkunzip -d -o sgml2tex
This creates a directory called \pcl, containing the program and the PCL runtime module, and a batch file sgml2tex.bat in the root directory of the disk where the unzipping took place. This batch file can be moved to wherever in the path you keep your batch utilities.
The sgml2tex.htm file (this documentation) and its conversions are unwrapped into the \sgml directory. It is suggested that this directory is used for testing. Details of the HTML format (an application of SGML) are available online.
The TeX eplain macros are used in the default translation, and a copy of eplain.tex is included in the .zip file: this is unwrapped into the \emtex\texinput directory, so if you are not using emTeX, you should move this to wherever you keep your TeX macro files.
The batch file performs all necessary path-setting for execution, and resets the path afterwards, so no modification to config.sys or autoexec.bat is necessary.
A preprocessed copy of this document is included as file sgml2tex.ps for printing on PostScript printers.
If the sgml2tex.bat file is used, the command to run the program is
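sgml2tex [/option [filename] ... ] sgmlfile [texfile [stylefile]]
where sgmlfile is the SGML document instance to be converted; texfile and stylefile, if given, name the TeX output file and the style file.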
If the batch file is not used, the command is
pcl run sgml2tex
with options and arguments as before. In this case it is your responsibility to ensure that the \pcl directory is accessible to the DOS path.
During processing, a percentage bar indicator shows how much of the file has been processed. Counters are displayed for lines, characters and words processed. Execution can be interrupted at any stage with Ctrl-Break; the command quit can then be used to leave the program. After execution, control is returned to the DOS prompt.
SGML elements in the document are converted to a TeX-compatible form. The program represents the SGML of the source document by converting entity references of the form &name; to the form \name{} and similarly identifying them in the style file.
For example, the SGML fragment
Goethe's use of storm imagery can be summarised in the last
lines of berstend reißt / Der Boden
unter meinen Füßen auf.
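As an aside, the entity rule (a reference of the form &name; becomes \name{}) can be illustrated with a short Python sketch. This is not the program's PCL source: the function name convert_entities, the assumption that entity names consist of letters only, and the sample string are all made up for illustration.

```python
import re

# Assumption: an entity name is made of ASCII letters only, so that it is
# also usable as a TeX control sequence name.
ENTITY = re.compile(r"&([A-Za-z]+);")

def convert_entities(text):
    """Rewrite each &name; as \\name{} and return (new_text, names_seen)."""
    names = []

    def replace(match):
        name = match.group(1)
        if name not in names:
            names.append(name)  # remember each entity once, in order of first use
        return "\\" + name + "{}"

    return ENTITY.sub(replace, text), names

# Hypothetical input: the closing words of the fragment, written with entities.
converted, seen = convert_entities("unter meinen F&uuml;&szlig;en auf.")
print(converted)
print(seen)
```

The list of names collected alongside the converted text is the kind of information a style file needs, since each \name{} must be defined there before the TeX output can be processed.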
The style file output with the example above would contain the following entries:
The default configuration file for a given SGML document is taken from the file type of the SGML document file, but with its own file type of .cfg unless the /d option specifies otherwise (if this is not used, the default file sgml2tex.cfg is used, if present).
The configuration file can establish predefined equivalences for element names, attribute names and character entities, so avoiding the need to hand-edit a style file, and allowing further files using the same configuration file to be processed with reference to an existing style file.
The following statements can be put in the configuration file (a worked example is at the end of this document). The delimiter between tokens is one or more spaces. For this reason, space characters are not currently permitted within the TeX-strings.
element name TeX-start-string TeX-end-string
attribute name TeX-pre-string TeX-post-string
entity name TeX-string
special name keyword
style name
map char string [string]
Here is an example, suitable for the Goethe text quoted earlier:
A sample configuration file and style file, html.cfg and html.sty, are provided. The program can be tested with this present file, sgml2tex.htm, in order to convert, process and print the documentation.
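For readers who want to check a configuration file before running the program, the statement syntax given earlier (a keyword followed by its arguments, all separated by one or more spaces) can be sketched in Python. The argument counts are read off the statement list in this document; the function name parse_config, the treatment of blank lines, and the sample statements are assumptions for illustration and do not come from the program itself.

```python
# Allowed number of tokens after each keyword: (minimum, maximum).
# Counts follow the statement list above, e.g. "map char string [string]".
ARITY = {
    "element": (3, 3),    # name TeX-start-string TeX-end-string
    "attribute": (3, 3),  # name TeX-pre-string TeX-post-string
    "entity": (2, 2),     # name TeX-string
    "special": (2, 2),    # name keyword
    "style": (1, 1),      # name
    "map": (2, 3),        # char string [string]
}

def parse_config(text):
    """Return a list of (keyword, [arguments]); raise ValueError on a bad line."""
    statements = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Tokens are separated by one or more spaces; split() also accepts
        # tabs, which is an assumption beyond what the documentation states.
        tokens = line.split()
        if not tokens:
            continue  # assumption: blank lines are ignored
        keyword, args = tokens[0], tokens[1:]
        if keyword not in ARITY:
            raise ValueError(f"line {number}: unknown statement {keyword!r}")
        low, high = ARITY[keyword]
        if not low <= len(args) <= high:
            raise ValueError(f"line {number}: {keyword} takes {low} to {high} arguments")
        statements.append((keyword, args))
    return statements

# Hypothetical statements, not taken from the distributed html.cfg.
sample = """\
entity uuml \\"u
entity szlig \\ss{}
element quote `` ''
"""
for keyword, args in parse_config(sample):
    print(keyword, args)
```

Because tokens are split on spaces, a TeX-string containing a space would be read as two arguments and rejected by the count check, which matches the restriction noted for the configuration file.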