This is the document which describes what HTML is and how it works. If you're reading this through a WorldWideWeb client program (a `browser'), then you already know how to get at it: if you're reading it on paper, then you should go to the end of this file, where there are some pointers about how to access this stuff online.
If you want further reading on paper, you may want to get something like The WorldWideWeb Handbook (ITCP, ISBN 1-85032-205-8), which describes the whole Web in more detail than is possible here, and includes a lot of illustrations and instructions on the finer points of making and managing Web files.
HTML stands for Hypertext Markup Language, and it's a way of describing the hypertext documents used in the WorldWideWeb. Hypertext is a concept which allows one piece of text to cross-reference another, either in the same document or in other documents, especially on a computer or network (in the case of the Web, on the Internet). HTML is an application of SGML (Standard Generalized Markup Language), the international standard (ISO 8879) for describing text markup systems. HTML v2.0 is the official IETF version, codified as RFC 1866.
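As a rough sketch of what such a cross-reference looks like, the fragment below uses the HTML 2.0 anchor element to point once at another place in the same document and once at a different document. The name `section2' and the address www.example.org are hypothetical, chosen only for illustration.

```html
<!-- A named place in this document that other text can point at -->
<h2><a name="section2">Adding more</a></h2>

<!-- A cross-reference to that place in the same document -->
<p>Lists are covered in <a href="#section2">the second part</a>.</p>

<!-- A cross-reference to another document (hypothetical address) -->
<p>See also <a href="http://www.example.org/other.html">another document</a>.</p>
```

The reader's browser decides how the linked words are shown (underlined, highlighted, numbered); the markup only says that they are a link and where it leads.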
This document refers to several versions of HTML as well as the standard, notably the former (but still useful) HTML 3.0 draft published in March 1995, and the revised (but over-simplified) experimental HTML 3.2. The difference is that HTML 3.0 was a draft, but enough of it remains stable and supported in browsers for it to be usable, whereas HTML 3.2 introduced support for new presentations but removed a lot of useful information.
Since the publication of HTML3, a large number of additions have been proposed. Some of these were in the form of carefully thought-out extensions, many of which were adopted after discussion and refinement; others were hasty or ill-considered `enhancements', some of which either broke the language (SGML) or could otherwise seriously mislead users. Items of both kinds which are still under discussion are marked in this document with a `' symbol.
Contrary to what you may read in some magazine or newspaper articles (and even, alas, some books) written by people who should know better, HTML is not a wordprocessor or desktop publishing system like Microsoft Word, WordPerfect, QuarkXPress, TeX, or FrameMaker. Although any good browser can give an HTML file an appearance approaching that of a wordprocessed document, that appearance is only an interpretation of the way you describe things in the file itself.
The key word is `describe': HTML lets you describe the meaning of the words you write (`this is a heading'; `this is an item in a list'), and it leaves the business of appearances largely to the reader's browser. An extensive style-sheet mechanism in HTML3 and 3.2 lets you suggest how things ought to look, but the final choice is the reader's.
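A minimal sketch of what this means in practice: the fragment below says `this is a heading' and `this is an item in a list' using HTML 2.0 elements. The wording of the heading and of the items is invented for illustration.

```html
<!-- The text content is illustrative only -->
<h1>Getting started</h1>
<ul>
<li>this is an item in a list</li>
<li>this is another item</li>
</ul>
```

Nothing here says how big the heading is or what kind of bullet the list uses: those choices belong to the browser.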
Trying to reverse-engineer the Web into some kind of remote-control DTP is a pointless and non-productive task, and is ultimately doomed to failure: there are other, much more viable, proprietary, non-Web solutions such as Macromedia's Director, Adobe's Acrobat and Microsoft's PowerPoint for doing that.
This document is divided into three parts (like Cæsar's Gaul):
Beginner's introduction: simple documents, headings, paragraphs;
Adding more: lists, links, tables and forms;
Advanced features: mathematics, frames, scripts, style files, etc.
You should probably have a computer available if you want to experiment while you learn. If you just want help on a specific topic, see the table of contents where there are links to the relevant parts.
If you are going to be creating a lot of HTML documents, I recommend strongly that you get an SGML-conformant editor for your computer. There are lots available now, and some of them are free, and they will help ensure your files are constructed correctly. Without one, you may have a lot of extra typing to do, as HTML can be rather verbose, and you run the risk of confusing both your browser and your readers with ambiguous or misleading files.
Popular conformant editors used for HTML include:
Emacs using psgml-mode. Emacs is a text editor, originally for UNIX but now also available for PCs running DOS, Microsoft Windows, or DESQview/X, for Apple Macs, and for VMS. It is freely distributable under the terms of the GNU General Public License. Copies can be downloaded from many archives by anonymous ftp.
Author/Editor from SoftQuad (Toronto, phone +1 416 239 4801). This is a commercial SGML editor which runs on UNIX/X, PC/MS-Windows and Apple Macintosh.
Apple Macs also have several quite good but non-conformant editors, such as Adobe's PageMill or the BBEdit extension macros, which you can use if you need to, but you should be aware that they offer no guarantee that the files you create will be valid.
On any platform you should be careful if you are using non-conformant software to edit someone else's conformant files: they were presumably made that way in order to preserve their information over time, so you need to try hard not to destroy that conformance.
There is now plenty of SGML software, and there are also several conversion programs, notably systems like OmniMark, which filter text from other packages.
The Standard Generalized Markup Language is a computer language for writing descriptions of the structure of documents. You can use it either to ensure compliance with a standard (for example, it could be used in an office to guarantee that forms had all the important parts filled in), or to describe how something looks or behaves (for example, it could be used to describe the typography of 16th century books).
The HyperText Markup Language (HTML) for the Web is written in SGML. It describes one possible structure for common office-style documents such as are widely used for documentation or information purposes: title, headings, paragraphs, sections, lists, items in lists, quotations, citations, emphasis, typed examples, figures, illustrations, tables, forms, mathematics, and so on.
It is not a wordprocessor or desktop publishing system. Many people, especially in the journalism and marketing fields, have regrettably been misled into thinking that HTML is some kind of remote-control, networked DTP system. It's not: it's a tool for constructing files of information for the Web, and it works by describing (mainly) the structure of your document, rather than its final appearance. The ultimate display of a document is largely up to your readers' browsers and the facilities they have available.
Having said that, there is a lot you can do with HTML3 to produce an attractive display, but you do it by letting the markup work for you, rather than struggling against it (`Use the Source, Luke!'). This document explains how.
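To give a feel for structural markup before going further, here is a sketch of a complete but tiny document using only HTML 2.0 elements: a title, headings, paragraphs, emphasis, a citation, a quotation and a typed example. All the text content is invented for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<!-- The title identifies the document; it is not part of the displayed text -->
<title>A sample document</title>
</head>
<body>
<h1>A sample document</h1>
<p>This paragraph contains <em>emphasis</em> and a citation of
<cite>The WorldWideWeb Handbook</cite>.</p>
<blockquote>
<p>This is a quotation.</p>
</blockquote>
<h2>A subsection</h2>
<p>A typed example looks like <kbd>this</kbd>.</p>
</body>
</html>
```

Every element name says what the enclosed text is, not how it should look; a graphical browser and a character-mode browser will render the same file quite differently.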
Here's how to get started with online access to the Web. You need:
A computer (terminal, PC, Mac, workstation) with a connection to the Internet, which can be done in one of three ways:
You use a terminal program like ProComm, Windows Terminal, xterm, MacTerminal or ZTerm to connect to a computer elsewhere (called your `host') which has direct Internet access. Your computer is connected to this host either
over a modem and phone line to a public or private Internet Service Provider (ISP) for a monthly or per-usage fee; or
over local office cabling to a computer run by your organisation (your organisation may also let you connect over the phone).
This gives you `character-mode' access only (you use typed commands or single keystrokes and a 24-line by 80-character screen or window; graphics and mouse access are not possible) but it's cheap, usually fast (although that depends on your modem) and you don't need to install any software of your own apart from your terminal program.
You have all the Internet services accessible direct from your windowing desktop (MS-Windows, Mac, X Windows etc) at a reasonably high speed, either
over a plain telephone dial-up line (as above) but using SLIP or PPP software provided by your ISP, which makes it work like a permanent Internet connection; or
using a network card in your workstation and local (in-house) networking cables to your computer center, where a data switch gives you Internet access; or
over a private leased data circuit from your local Telecom, which gives you direct permanent access through an ISP (but can cost a packet in many countries).
This way you can run native window-based applications with full graphical support. SLIP or PPP runs slower than a full connection using a network card, but avoids the expense of a direct line, although it is of course only active while your phone call remains connected.
Several ways exist of faking SLIP over a dial-up phone line:
Cyberspace Development Inc's Internet Adapter (TIA): a program which you run immediately on login to what would otherwise be a regular remote terminal connection, and which then takes over the connection and lets you run SLIP software on your PC/Mac by simulating the SLIP packets to and from the host;
SlipKnot, a Web browser which uses Lynx or WWW on your host to retrieve files, and Xmodem to download them to your PC: it does all this silently and displays a graphical interface in MS-Windows.
These are both slower than a real SLIP connection, but are cheaper (ISPs often charge a lot more for SLIP than for plain terminal access because it uses more resources). There are also similar programs from other companies doing the same task (see the Usenet newsgroup alt.dcom.slip-emulators for more information).
A copy of a suitable Web client (browser) program. The most popular are listed below; others are listed in the archives at the World Wide Web Consortium (W3C).
Arena, available free from the WWW development team at W3C, is the HTML 3.0 development anchorpoint (X Windows only at the moment);
NCSA Mosaic (X Windows, PC/MS-Windows and Mac) is available free from the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign;
Netscape (X Windows, Mac, MS-Windows) is free for personal use from Netscape Communications Corp;
Lynx works with VT-100 terminals (eg PCs, Macs or terminals connected to remote Unix or VMS computers) and comes free from the University of Kansas Computer Centre;
www (the original line-mode version) works on most anything, even printing terminals like Teletypes™, and comes free of charge from CERN in Geneva;
DOSLynx is a version of Lynx for MS-DOS done by a separate team, but available free from the University of Kansas Computer Centre.
A suitable operating system and (if you want a graphical browser) a graphical user interface (GUI) such as the Mac, Microsoft Windows (3.11 or 95), X Windows, DECwindows etc;
`Helper' applications if you want them (get these from the same places as the browser software, usually - many of them are now supplied already built into browser software):
a graphical display program for GIF and JPEG graphics files;
a sound player program for AU or WAV audio files;
a video player program for MPEG movie files.
A little patience :-)
If you are in an organisation with a computer center or a person responsible for networking, ask them first whether any of the above programs is already installed. If so, you're in luck; if not, you'll have to retrieve one of them by anonymous ftp and install it on your computer. A little knowledge of computers and networking is required, but not much: most of the programs are available precompiled for popular operating systems, so you only need to know how to download the files and unpack them into the right place on your computer.
In extreme circumstances you can retrieve individual documents by electronic mail by using the WebMail server. To do so, you need to know the URL (Uniform Resource Locator) of the file you want. This is a kind of extended file specification, in the format:
for example (this file):
When you know the URL you want, send a one-line mail message to firstname.lastname@example.org saying
where you replace the url with the full URL of the document, for example:
Don't include any other text in the message. The file will be retrieved and sent back to you in HTML format and as a UUencoded representation of on-screen formatting. You can omit the UUencoded part by using the keyword GET instead of