PreviousSection [ Webmaster , 2 - Guidelines for delivering pages ] SectionNext

Section 2.3 - HTML standard RGD


1 Summary

Note (1999-11-11): PNG is now preferred to GIF, because of the patent problems surrounding GIF (the Unisys patent on the LZW compression that GIF uses).

HTML (HyperText Markup Language) is the document markup language used on the World Wide Web (WWW). In September 1995, an Internet Draft [1] described an HTML standard (version 2.0) that appeared to be mature and stable. The Abstract of this standard describes HTML clearly:

The Hypertext Markup Language (HTML) is a simple markup language used to create hypertext documents that are platform independent. HTML documents are SGML documents with generic Semantics that are appropriate for representing information from a wide range of domains. HTML markup can represent hypertext news, mail, documentation, and hypermedia; menus of options; database query results; simple structured documents with in-lined graphics; and hypertext views of existing bodies of information.

HTML has been in use by the World Wide Web (WWW) global information initiative since 1990. This specification roughly corresponds to the capabilities of HTML in common use prior to June 1994. HTML is an application of ISO Standard 8879:1986 Information Processing Text and Office Systems; Standard Generalized Markup Language (SGML).

The `text/html' Internet Media Type (RFC 1590) and MIME Content Type (RFC 1521) is defined by this specification.

RGD HTML documents shall largely follow this standard, but certain so-called Netscape extensions [4] are also allowed, all of which can be found in the HTML version 3.2 draft [2] ("Wilbur").
Consult the WWW server of the W3 consortium for the latest version of the HTML specifications.
By the way: HTML is not as difficult as it may sound from the above description ;-).

Netscape is a so-called WWW browser: a WWW client application with which WWW documents, downloaded from a so-called WWW server, can be viewed. Netscape is the most popular browser: according to various estimates, some 75% of Internet surfers use Netscape.
A careful choice of allowed extensions has been made in order to be able to create well-designed documents on the one hand, and to stick to the standard as much as possible on the other. The latter is necessary, of course, to keep the documents easy to maintain and to ensure that browsers other than Netscape can read them as well.
The following facts are worth mentioning as well:

On this basis, a carefully considered RGD HTML standard has been formulated.

2 Introduction

HTML pages developed at the RGD shall conform to the 2.0 standard [1], with a number of exceptions. These exceptions and their rationale are described here.

All current WWW browsers support HTML 2.0, including Netscape 1.1 (and up). Netscape will be the WWW browser for the RGD. HTML 2.0, however, lacks a number of presentation markup elements, which are present in version 3.2 [2] and are very important for the proper design of WWW pages.
A number of elements from 3.0, but certainly not all, were already supported by Netscape (although sometimes differently). Furthermore, Netscape has a number of HTML extensions that other browsers, such as the popular Mosaic or the character-only Lynx, will ignore if you are lucky, but that can also lead to an unexpected and undesirable presentation of a document, or even a crash.

As said, HTML is still evolving, much to the frustration of browser developers such as Netscape (which, by the way, initiated a number of the changes), who implement markup elements and then have to change or delete them again when a new version of the specification comes out. This also poses a dilemma for the RGD: on the one hand we want WWW documents that follow "standard" HTML (maintainable pages, a broad audience supported); on the other hand we want them to be attractive and pleasing through good design (user friendliness, ensuring people come back). Reference [3] provides good background information on this "HTML-standard dilemma".

Because the 2.0 draft provides clear guidelines on how 2.0 browsers should handle unknown markup elements and attributes (e.g. 3.2 or Netscape specific ones), a clear, although provisional, RGD HTML standard can be formulated. This standard is described in the next paragraph in the form of a checklist.

Chapter 17 of the very good and complete Internet Handboek [5] by Jeroen Vanheste (in Dutch) deals extensively with "correct HTML". The Checklist (2.3.3) refers to a number of so-called HTML validators.

3 Checklist

In this paragraph we give a complete checklist of the items to which the HTML pages on the RGD WWW server should conform. It is divided into a number of subparagraphs. The subparagraph "Content" is important for the webeditor; the others are intended for the webdesigner.

3.1 Content

The person formally responsible for the content of the server (webeditor, webeditor@rgd.nl) should pay attention to the following issues. The HTML documents contain:

  1. A reference to the copyright and legal disclaimer page.
  2. Correct linguistic usage and, in general, no content that is offensive to (groups of) individuals.
  3. Adequate, complete and correct information.
  4. No confidential information (embargo).

3.2 Keywords

In choosing suitable keywords, the analogy with the publication of an article in a scientific journal is appropriate, where the Keywords are usually listed under the Summary.
The user should consult the library (library@rgd.nl) and include keywords in the HTML document. The library maintains a thesaurus of keywords.

  1. Keywords are put, e.g., under a title or summary ("Keywords: ...") and separated by commas.
  2. Number of keywords: 2-5.
  3. Keywords may consist of several words, e.g. "quaternary geology".
  4. Keywords should be in lower case, unless a keyword denotes a term where the case distinction is crucial.
  5. Not too general: "geology" is too general (except on the Home page of the WWW server!), but "quaternary geology" is not.
  6. Use Dutch keywords if the document is located in the /dutch tree, or English ones if it is located in the /english tree.
  7. Verify which keywords are already used in comparable documents, thus ensuring overall consistency. Always consult the library.
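The mechanical part of the keyword rules above (the "Keywords:" prefix, 2-5 entries, lower case, not too general) can be checked before the library is consulted. The Python sketch below is hypothetical: the function check_keywords and the TOO_GENERAL set are assumptions, not an RGD tool, and the choice of terms remains the library's job.

```python
# Hypothetical check of the keyword rules in 3.2; not an RGD tool.
TOO_GENERAL = {"geology"}  # assumption: the library's thesaurus is authoritative


def check_keywords(line: str, home_page: bool = False) -> list[str]:
    """Return the problems found in a 'Keywords: ...' line (empty if none)."""
    prefix = "Keywords:"
    if not line.startswith(prefix):
        return [f"line does not start with {prefix!r}"]
    # Keywords are separated by commas; blank entries are ignored.
    keywords = [k.strip() for k in line[len(prefix):].split(",") if k.strip()]
    problems = []
    if not 2 <= len(keywords) <= 5:
        problems.append(f"{len(keywords)} keywords, expected 2-5")
    for k in keywords:
        if k != k.lower():
            problems.append(f"{k!r} is not lower case")
        # "geology" alone is only acceptable on the server's Home page.
        if k in TOO_GENERAL and not home_page:
            problems.append(f"{k!r} is too general")
    return problems
```

For example, check_keywords("Keywords: quaternary geology, coastal evolution") returns an empty list, while "Keywords: geology" yields two problems (too few keywords, too general).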

3.3 General

Some browsers (Netscape, Mosaic and Lynx) and tools (weblint) are mentioned here: see the section on Tools (3.4) in the next chapter.

  1. Use the common template: see the Example pages (2.3.4).
  2. Creating directories in the document tree should be checked first with the webmaster.
  3. The use of symbolic links, which are used e.g. to get short URL's, should be checked first with the webmaster as well.
  4. Conformance to the HTML 2.0 standard [1]. The following extensions and situations are permitted:
    1. Text wrapping/flowing of <IMG>'s with ALIGN=LEFT or RIGHT.
    2. Netscape extensions, such as foreground and background colors and bitmaps, are permitted if they:
      • do not lead to strange presentation behaviour in Mosaic and Lynx
      • have no equivalent in HTML (irrespective of the version)
      • are mentioned in HTML 3.2 [2] (perhaps only functionally) and can therefore be safely regarded as stable.
      This means that the following are not permitted:
      • <CENTER>
        Use the "proper" HTML equivalent instead, the ALIGN=CENTER attribute, which the other browsers support as well.
      • <TABLE>
        They are not necessary for the toolbar and can always be implemented with a <PRE> section.
  5. The HTML syntax is checked on a routine basis with weblint, which can also be done via the Web, thanks to the very user-friendly weblint frontend of James Carpenter.
    A final check is always done with the HTML Validation Service from HALSoft (in Europe, use the closer HTML Validation Service mirror in Austria), or the excellent Kinder, Gentler HTML Validator from Gerald Oskoboiny.
    Generally, the following elements will give errors: <BODY> with attributes, CENTER, LEFT and RIGHT alignments and <IMG> with BORDER=0.
  6. The presentation must always be verified with these Reference browsers: Note that character based presentation is supported explicitly (Lynx)!
    The presentation will be verified from time to time with these Optional browsers: It is a goal that the documents are at least "showable" with these browsers, but it is not obligatory.
  7. Hyperlinks (<A>) are checked manually, by loading the document and clicking on all links.
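Item 4 rules out <CENTER> and <TABLE>. As a minimal sketch, assuming Python's standard html.parser module, a page can be scanned for these elements before it goes through weblint and the validators; FORBIDDEN and find_forbidden are illustrative names, and this does not replace either tool.

```python
from html.parser import HTMLParser

# Elements ruled out in 3.3, item 4, with the suggested alternative.
FORBIDDEN = {
    "center": "use ALIGN=CENTER on <P> or <Hn> instead",
    "table": "use a <PRE> section instead",
}


class ForbiddenTagFinder(HTMLParser):
    """Collect (line, TAG, advice) for every forbidden start tag."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case, whatever the source case.
        if tag in FORBIDDEN:
            line, _column = self.getpos()
            self.findings.append((line, tag.upper(), FORBIDDEN[tag]))


def find_forbidden(html_text):
    finder = ForbiddenTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings
```

Running it on a page with a <CENTER> on line 2 and a <TABLE> on line 3 reports both, with the line of each start tag.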

3.4 Graphical issues

The Design principles (section 3.3) for the RGD WWW site have led to the following graphical characteristics of the HTML pages. See also section 3.3 for information on how tools were used to implement and test the following:

  1. Images should always be in GIF format, because it is universally supported (this also applies to clickable maps), although in-line JPEG images are supported by more and more browsers. In extreme cases photographic material and the like can be in JPEG (only as an in-line image, not in clickable maps).
  2. A screen resolution of 800 x 600 with 256 colors (8-bit) is assumed, which is reasonable considering the graphical capabilities of present PC technology (SVGA) and workstations. Presentation on 1024 x 768 screens (with possibly more colors) is also verified.
    On a monochrome screen or VGA 16 color, a user with a graphical browser should set "image loading" on "off" (hence text only, note that Lynx is explicitly supported!).
  3. An image may not have more than 32 colors (216 colors is the exceptional absolute maximum) and its colors must come from the Netscape 216-color palette.
    At most 10, but preferably <= 5 (or none!) images per HTML page.
  4. Combine images where possible but do not use clickable maps (Lynx). If clickables are necessary, give a text alternative with complete table of contents.
  5. Images must be small: at most 70 x 70. Never put pictures directly in a document, but use a so-called thumbnail: a "stamp sized" version of the picture, e.g. 50 x 50, that shows the real picture when clicked.
  6. The maximum download time of a page is 25 seconds, calculated as follows. Example: an HTML file of 3200 bytes with 3 GIF files of 1200, 1400 and 2000 bytes, respectively.
    Calculation: 2 + 2 + 2 + 4 + 3 x 1 = 13 KB, which is < 25, so OK!
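The text gives the worked example but not the rule itself. Reading the numbers, each file appears to be rounded up to whole kilobytes of 1024 bytes (1200, 1400 and 2000 bytes give 2 each, 3200 bytes gives 4), each of the three images adds 1, and one kilobyte is counted as roughly one second on a 14K4 modem. The Python sketch below encodes that inferred rule; page_cost is a hypothetical name and the formula is an assumption, not the handbook's stated one.

```python
import math

# Inferred from the worked example, not a stated formula:
# every file is rounded up to whole kilobytes (1024 bytes), each image
# adds 1 KB of overhead, and 1 KB is taken to cost about one second.
MAX_SECONDS = 25
IMAGE_OVERHEAD_KB = 1


def page_cost(html_bytes, image_bytes):
    """Estimated download time (seconds) of a page and its in-line images."""
    files = [html_bytes, *image_bytes]
    rounded = sum(math.ceil(size / 1024) for size in files)
    return rounded + IMAGE_OVERHEAD_KB * len(image_bytes)


cost = page_cost(3200, [1200, 1400, 2000])
print(cost, "ok" if cost < MAX_SECONDS else "too slow")
```

For the example above this prints 13 ok, matching the hand calculation.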

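The Netscape 216-color palette mentioned in item 3 is the 6 x 6 x 6 color cube in which every red, green and blue component is one of 00, 33, 66, 99, CC or FF (hex). The Python sketch below assumes the distinct colors of an image are already available as (r, g, b) triples, for example read from a GIF color table by some other tool; check_image_colors is a hypothetical helper.

```python
# The six levels per component that make up the 216-color palette.
SAFE_LEVELS = (0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF)
NETSCAPE_PALETTE = [(r, g, b) for r in SAFE_LEVELS
                    for g in SAFE_LEVELS for b in SAFE_LEVELS]


def in_netscape_palette(rgb):
    """True if every component of an (r, g, b) triple is a palette level."""
    return all(component in SAFE_LEVELS for component in rgb)


def check_image_colors(colors, limit=32):
    """Check the colors an image uses against the rules in 3.4, item 3.

    `colors` is assumed to be an iterable of (r, g, b) triples; `limit`
    is 32 normally and 216 only as the exceptional maximum.
    """
    distinct = set(colors)
    problems = []
    if len(distinct) > limit:
        problems.append(f"{len(distinct)} colors, more than {limit}")
    outside = [c for c in distinct if not in_netscape_palette(c)]
    if outside:
        problems.append(f"{len(outside)} colors outside the 216-color palette")
    return problems
```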
3.5 HTML files and directories

HTML files are ASCII files. Images are usually in GIF format and are binary.

  1. Naming of HTML and gif files and of directories.
    The following characteristics are meant to give short, meaningful and descriptive file names that are not error prone:
    1. fully lower-case
    2. no underscores ("_"), but always dashes ("-") if necessary (use them sparingly)
    3. informative
    4. not too long
    5. not too short or cryptic, so e.g. no (DOS) 8.3 limitation
    6. extension is ".html" (HTML) and ".gif" (gif)
  2. One file name
    All pages, except those in /people, have one name. The Dutch version is in the /dutch part of the document tree, the English version in the /english subtree.
  3. https://maryniak.home.xs4all.nl/images
    All images and pictures, such as GIF files, are located in directory https://maryniak.home.xs4all.nl/images.
  4. /people
    This is the location of the personal Home pages of RGD employees. The following applies to these pages:
    1. The name is always: welcome.html. This is also the default name on the server, therefore the URL of the directory where the page is located, will suffice.
    2. The home pages are located in the directory /people/e-mail-address-of-person/. The "e-mail-address-of-person" conforms to the e-mail address scheme of the RGD.
      Thus, the URL "http://www.rgd.nl/e.maryniak/" is the home page of "e.maryniak@rgd.nl". It is not necessary to use the full name (http://www.rgd.nl/e.maryniak/welcome.html).
    3. Home pages of programs and projects will be set up the same, except that they will appear in two branches (/dutch and /english).
  5. /wwwforms
    Fill-in forms (<FORM>'s), which are sent by e-mail to an RGD address, are all located in this directory. A common setup applies to this directory and the documents therein.
    As an example we take the feedback form (feedback.html):
    1. The form itself is located in (dutch and english version, respectively):
        /wwwforms/dutch/feedback.html
        /wwwforms/english/feedback.html
      Thus, these are the files that have the <FORM>.
      Note that the bilingualism is repeated within /wwwforms, just as in the document root.
    2. The <FORM> in feedback.html has the following form (dutch and english, respectively):
        <FORM METHOD="POST"
         ACTION="/cgi-bin/feedback?/wwwforms/dutch/reply/feedback.html">
        <FORM METHOD="POST"
         ACTION="/cgi-bin/feedback?/wwwforms/english/reply/feedback.html">
      Note that the cgi-bin program and the reply (answer) document have the same name as the form itself.
    3. The reply document is located in the subdirectory reply and contains the following string to indicate the person to whom the filled-in form should be sent:
        <!-- X-RGD-EmailFormTo : webmaster@rgd.nl -->
      In this example it is the webmaster@rgd.nl, which is also the default if either the reply document or this string is absent.
    4. To have an error message document shown when some of the fields are not filled in, add the following hidden fields (Dutch and English, respectively):
        <INPUT TYPE=HIDDEN NAME="errorcheck" VALUE="1">
        <INPUT TYPE=HIDDEN NAME="errorform"
         VALUE="https://maryniak.home.xs4all.nl/wwwforms/dutch/error/feedback.html">
        <INPUT TYPE=HIDDEN NAME="errorcheck" VALUE="1">
        <INPUT TYPE=HIDDEN NAME="errorform"
         VALUE="https://maryniak.home.xs4all.nl/wwwforms/english/error/feedback.html">
      The document with the error message is located in the subdirectory error and again has the same name as the form itself. If the VALUE of errorcheck is not 1, there is no check whether all fields are filled in; the same holds if the error document does not exist. The two hidden fields can be placed directly below the <FORM>.
    5. With some WWW server/client combinations there is a problem with receiving the last byte of the last field of a <FORM>. This dummy hidden field fixes that problem. It should be placed at the end of the <FORM>, thus, just above the </FORM>:
        <INPUT TYPE=HIDDEN NAME="dummy" VALUE="dummy">
      Thanks to Leo Willems (leo@tunix.kun.nl) for this hint!
  6. Document structure.
    1. HTML pages must preferably be <= 5 screen pages long (based on 800 x 600) and absolutely <= 10.
    2. Big documents, such as books, reports and catalogues, follow the three-layered decomposition, as described in the article of Steven Pemberton How Do You Make an Electronic Journal Readable? [6].
  7. Layout ASCII.
    1. The length of lines in the HTML ASCII source should be <= 80 characters. This improves readability when viewing the document source and the maintainability of the pages.
    2. Use indentation where possible and put starting and closing tags as much as possible on single lines. Exception: images, see HTML syntax (below).
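The file naming rules in item 1 can be expressed as a regular expression. In the Python sketch below, the pattern and the length bounds MIN_STEM and MAX_STEM are assumptions ("not too long" and "not too short" are left to judgement in the text), check_file_name is a hypothetical name, and only file names are checked, not directory names.

```python
import re

# Lower-case letters and digits, optionally joined by single dashes
# (no underscores), followed by a .html or .gif extension.
NAME_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*\.(html|gif)")
MIN_STEM, MAX_STEM = 3, 30  # hypothetical limits for "not too short/long"


def check_file_name(name):
    """Return the naming problems of an HTML or GIF file name."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("use lower case, dashes (not underscores), .html or .gif")
    stem = name.rsplit(".", 1)[0]
    if not MIN_STEM <= len(stem) <= MAX_STEM:
        problems.append(f"name stem of {len(stem)} characters")
    return problems
```

With these assumptions, "feedback.html" and "marine-geology.gif" pass, while "Feedback_Form.HTM" fails on case, underscore and extension.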

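Because of the naming convention for fill-in forms in /wwwforms, the ACTION path and the error document follow from just the form name and the language. The Python sketch below writes out the opening and closing of such a form; form_header and FORM_FOOTER are hypothetical names, the errorform VALUE is written as a server-relative path (the /wwwforms examples use a full URL), and nothing is assumed about how the cgi-bin program itself works.

```python
LANGUAGES = ("dutch", "english")


def form_header(name, lang, base=""):
    """Opening <FORM> and error-check fields for form `name` in `lang`.

    `base` is an optional server prefix for the errorform URL (assumption).
    """
    if lang not in LANGUAGES:
        raise ValueError(f"lang must be one of {LANGUAGES}")
    return "\n".join([
        '<FORM METHOD="POST"',
        # cgi-bin program and reply document share the form's name.
        f' ACTION="/cgi-bin/{name}?/wwwforms/{lang}/reply/{name}.html">',
        '<INPUT TYPE=HIDDEN NAME="errorcheck" VALUE="1">',
        '<INPUT TYPE=HIDDEN NAME="errorform"',
        f' VALUE="{base}/wwwforms/{lang}/error/{name}.html">',
    ])


# The dummy field goes just above </FORM> (the last-byte workaround).
FORM_FOOTER = '<INPUT TYPE=HIDDEN NAME="dummy" VALUE="dummy">\n</FORM>'

print(form_header("feedback", "english"))
```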
3.6 HTML syntax

In this part of the checklist we will describe the hairy details of which HTML syntax constructs are allowed and which are not.

  1. Syntax in general.
    1. Documents must make use of the SGML characteristics of HTML with regard to structural and logical markup. By this we mean that apart from the obligatory tags, such as <HTML>, <HEAD>, <BODY> and <TITLE>, the following tags must also be used if applicable:
      1. <H1> .. <Hn> and <P> for documents with chapter and section structure: therefore, graphical tricks, such as <FONT SIZE=> and images to indicate (titles of) sections, must not be used. This facilitates later incorporation of the logical elements of text documents in relational databases, and proper indexing. Note that SGML (and therefore HTML) is to documents what SQL is to relational databases, so HTML markup should be used appropriately.
      2. <ADDRESS> for addresses and other contact information
      3. <OL>, <UL> and <DL> for ordered lists, (bulleted) enumerations and descriptive lists, respectively. Use them appropriately! Do not use more than one (1) <DD> per <DT> in a <DL>.
      4. <PRE>, <CODE>, <SAMP>, <VAR> and <KBD> for listings of code and descriptions of input.
    2. Do not skip header levels, e.g. do not go from <H1> to <H3>.
    3. Use logical markup <EM>, <STRONG> and <CODE> instead of physical markup <I>, <B> and <TT>, unless typographical markup is necessary.
    4. Markup elements (tags) and attributes of tags must ALWAYS BE UPPERCASE. This is especially important for CGI programs: a METHOD=POST in lower case will not be understood by the server. Furthermore, this improves readability and maintainability, because the separation of text and markup is clear. Apart from that, a number of server scripts are being developed (keyword indexers, amongst others) that depend on HTML tags and attributes being uppercase.
    5. Do not use empty or "weird" tags, such as: <>, <!> and </>.
  2. Meta information.
    1. Authors (X-RGD-CreatedBy and X-RGD-LastModBy) are mentioned with their full name and e-mail address (in parentheses).
    2. Dates (X-RGD-CreatedOn and X-RGD-LastModOn) are in ISO 8601:1988 format (yyyy-mm-dd). Times, when mentioned, also conform to ISO 8601:1988 format (hh:mm:ss), with hours (hh) ranging from 0 to 24. Example: 20:42:05.
    3. The <TITLE> and X-RGD-Contents must be identical. A <TITLE> must have a descriptive and context independent content, e.g.: Introduction research program Marine Geology and not just Introduction.
  3. Images (<IMG>)
    1. Use the ALT attribute for the benefit of Lynx users and for those with "image loading" set to "off" in a graphical browser. If an ALT text is not necessary, still make the [IMAGE] placeholder disappear with ALT=" "; note that the space between the quotes is important!
    2. A row of buttons is indicated with ALT="{button text}", thus "{}" for a button.
    3. Always make SRC the first attribute, thus: <IMG SRC="..." ALT="...">, followed by the other attributes, if present.
    4. If an image and the adjacent text have the same anchor, they should be put in one anchor. The image then gets ALT=" ", because the anchored text already describes the link.
    5. The use of (too big) images should be avoided as much as possible because of long download times for clients. Keep in mind that a lot of users still have a 14K4 modem and many universities (especially those across the ocean) will effectively operate at the same speed when the Internet is busy.
    6. If the <BODY> section has a BACKGROUND attribute, then a BGCOLOR attribute, coded black (000000), should be added. This forces a graphic flush; otherwise, some PCs with Netscape will show white patches on a dark background. For an example, see the start page in the section on Example pages (2.3.4).
    7. There should be no carriage return (or other whitespace) in <IMG>'s that are anchored (<A>) and are supposed to appear adjacent. Otherwise, a small hyperlinked space will appear to the right of the image, which does not look nice. Usually, this means that there should be no space or carriage return after a closing tag </A>.
  4. Special characters.
    1. A special character always starts with '&' and ends with ';'. Thus '&lt;' for the '<' character, that would otherwise have a special meaning in terms of HTML. This often happens when displaying source listings (<PRE> and <CODE>), but in normal text sometimes as well.
      The text between '&' and ';' must be lower case, thus do not use '&GT;'. The only exception is an upper-case letter that is part of the name itself, such as '&Eacute;' for 'É'.
    2. Always use symbolic references (so called "mnemonics") for special characters, as listed in appendix B of [1]. Here is a list of often used special characters and their mnemonic:
      Special character          Mnemonic code
              &                  &amp;    
              <                  &lt;     
              >                  &gt;     
       no-break space            &nbsp;
              ©                  &copy;   
              ®                  &reg;    
              à                  &agrave; 
              á                  &aacute; 
              ä                  &auml;   
              è                  &egrave; 
              É                  &Eacute; 
              é                  &eacute; 
              ë                  &euml;   
              ï                  &iuml;   
              ö                  &ouml;   
              ü                  &uuml;   
      
    3. Do not use characters above code 127 in HTML source (with the ALT key). Use a mnemonic for special characters, as listed in the table above. Also, do not use the TAB character and control characters, such as ^F.
  5. Specific tags.
    1. <CENTER> was Netscape specific and is permitted in 3.2, but is not necessary: headers (<H1> etc.) and paragraphs (<P>) all have an ALIGN=CENTER attribute that is understood by all browsers and is better.
    2. Paragraphs in the <BODY> must always start with <P>. A <P> indicates the beginning of a paragraph, it is not a paragraph separator or "strong" newline!
    3. <BR> serves as a newline within a paragraph <P> and must not be used to artificially create vertical space. If really necessary, use an empty <PRE>-block, or a paragraph with just a non-breaking space (&nbsp;).
    4. <PRE>-formatted text.
      Do not use structural tags, such as <P> and <H3>, in preformatted text, because this can lead to strange effects. Only hyperlinks (<A>) and special characters may appear in preformatted text.
      Never just put a block of ASCII text in a <PRE>-block! Always replace '&', '<', '>' etc. by their mnemonics.
    5. <!-- Comment -->.
      Each comment, whether or not it appears directly below another one, must always be indicated separately between <!-- and -->, on each line, which is in contrast to the convention used in many free format programming languages.
    6. Do not forget to put the "URL" occurring in a hyperlink (<A>) between quotes. Especially the closing quote is often forgotten, which can lead to strange browser behaviour. Thus: <A HREF="URL">.
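Several of the syntax rules above (uppercase tags and attributes, SRC as the first attribute of <IMG>, an ALT on every image) can be checked mechanically. This Python sketch uses only the standard html.parser module and is a hypothetical supplement to weblint, not a replacement: it inspects start tags only, because the parser does not expose the original spelling of end tags, and it does not recognise attributes written without a value.

```python
import re
from html.parser import HTMLParser

TAG_NAME = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")
QUOTED = re.compile("\"[^\"]*\"|'[^']*'")
ATTR_NAME = re.compile(r"\s([A-Za-z][\w-]*)")


class RgdLint(HTMLParser):
    """Report (line, message) for some 3.6 rules; start tags only."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def report(self, message):
        line, _column = self.getpos()
        self.problems.append((line, message))

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # the tag as written in the source
        name = TAG_NAME.match(raw).group(1)
        if name != name.upper():
            self.report(f"tag <{name}> is not upper case")
        # Blank out quoted values so words inside them are not taken for names.
        for attr in ATTR_NAME.findall(QUOTED.sub('""', raw)):
            if attr != attr.upper():
                self.report(f"attribute {attr} of <{name}> is not upper case")
        if tag == "img":
            names = [a for a, _value in attrs]  # lower-cased by HTMLParser
            if not names or names[0] != "src":
                self.report("IMG: SRC should be the first attribute")
            if "alt" not in names:
                self.report("IMG without ALT")


def lint(source):
    checker = RgdLint()
    checker.feed(source)
    checker.close()
    return checker.problems
```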

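Converting text to mnemonics (item 4 of the syntax rules) by hand is error prone, especially for source listings in <PRE>. A minimal Python sketch, assuming the standard html.entities module: its codepoint2name table uses the HTML 4 entity names, which include the names in the table of special characters, so the sketch restricts itself to ISO Latin-1 (codes up to 255). Check unusual characters against appendix B of [1] anyway; to_mnemonics is an illustrative name.

```python
from html.entities import codepoint2name

BASIC = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def to_mnemonics(text):
    """Escape &, < and >, and write non-ASCII characters as mnemonics."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in BASIC:
            out.append(BASIC[ch])
        elif code > 127:
            name = codepoint2name.get(code)
            # Only ISO Latin-1 names are accepted (assumption, see above).
            if name is None or code > 255:
                raise ValueError(f"no Latin-1 mnemonic for {ch!r}")
            out.append(f"&{name};")
        elif (code < 32 and ch not in "\r\n") or code == 127:
            # TAB and other control characters are not allowed in the source.
            raise ValueError(f"control character {code} not allowed")
        else:
            out.append(ch)
    return "".join(out)
```

For example, to_mnemonics("a < b & c") gives "a &lt; b &amp; c" and to_mnemonics("café") gives "caf&eacute;".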
3.7 HTML style

Reading documents from a screen and the nature of HTML (hypertext) demand a special approach in "HTML-ification" of documents. Steven Pemberton's article [6] is a good starting point for determining the optimal "human computer interface" for documents.

Furthermore, one should pay attention to:

  1. Avoid saying "Click on this or that".
    This is browser specific: with a command-line browser, such as Lynx, it is not possible to click on hypertext links.
  2. Avoid saying "Click here" or "Select here".
    This is browser specific and furthermore, in an index, based on <A>-anchored text, the term "here" is meaningless!
  3. Avoid saying "Back to the Home Page".
    If the reader has gone directly to the page, the word Back is not appropriate.
  4. Avoid saying "Home Page".
    Try to give the reader some context to where they are in cyberspace. E.g. "RGD Home Page".
Please note that this list is not complete.
You are referred to good books about HTML, such as those of Laura Lemay [7].
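Some of these style points concern the anchored text itself and can be flagged automatically. The Python sketch below collects the text of each <A HREF> and applies simplified versions of the rules above; the phrase tests in advice_for are assumptions and catch only the obvious cases, and "click" in the surrounding prose is not checked.

```python
from html.parser import HTMLParser


def advice_for(text):
    """Return advice for normalised, lower-case link text, or None."""
    if "click" in text:
        return "browser specific: Lynx users cannot click"
    if text == "here" or text.endswith(" here"):
        return "'here' is meaningless in an index of link texts"
    if text.startswith("back to"):
        return "'back' is wrong for readers who arrived directly"
    if text == "home page":
        return "give context, e.g. 'RGD Home Page'"
    return None


class LinkTextChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_link = False
        self.parts = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.in_link, self.parts = True, []

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            text = " ".join("".join(self.parts).lower().split())
            advice = advice_for(text)
            if advice:
                self.problems.append((text, advice))


def check_links(source):
    checker = LinkTextChecker()
    checker.feed(source)
    checker.close()
    return checker.problems
```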

4 Example pages

Apart from the top home page and the Dutch and English "sub" home page, there are two document types on the RGD WWW server:

  1. Navigation documents, e.g. the info page.
  2. Actual documents, e.g. this RGD webmaster handbook or the report CCCEE - Climate Change and Coastal Evolution in Europe
The "actual" documents follow the three-layer model as described in the article of Steven Pemberton, How Do You Make an Electronic Journal Readable? [6]. The framework of these documents is generated automatically by pemberton.pl, a quickly hacked-up Perl script written by Eric Maryniak. For smaller documents the three layers have been combined in one HTML file.

Use the "View source" facility to see what the source of these HTML files looks like!

5 Literature

[1] Hypertext Markup Language - 2.0.
T. Berners-Lee & D. Connolly.
MIT/W3C. September 22, 1995.
URL: http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_toc.html
[2] HyperText Markup Language Specification Version 3.2 (W3C, draft)
D. Raggett.
W3C. September 9, 1996.
URL: http://www.w3.org/pub/WWW/MarkUp/Wilbur/
[3] Appendix B: Netscape/HTML 3.0. [from an HTML guide]
Case Western Reserve University.
URL: http://www.cwru.edu/help/introHTML/AppB.html
[4] Netscape's extensions to HTML 2.0 and Netscape's commitment to open standards
Netscape Communications
URL: http://home.netscape.com/assist/net_sites/html_extensions.html and
URL: http://home.netscape.com/newsref/std/standards_qa.html
[5] Het Internet Handboek (Dutch)
Jeroen Vanheste
Addison-Wesley. 1995.
URL: http://www.tunix.kun.nl/handboek.html
[6] How Do You Make an Electronic Journal Readable?
Steven Pemberton
In: Publishing on the World Wide Web. Conference book. Autumn 1995.
In: SIGCHI Bulletin (online from January 1996)
[7] Various books on HTML from Laura Lemay
Laura Lemay
URL: Laura's Web Zone.
