1 Summary
Note (1999-11-11): PNG is preferred nowadays, because of patent problems with GIF (Unisys).
HTML (HyperText Markup Language) is the document markup language used on the World Wide Web (WWW). In September 1995 an Internet Draft [1] described an HTML standard (version 2.0) that appeared to be mature and stable. RGD HTML documents shall largely follow this standard, but may also use certain so-called Netscape extensions [4], all of which can be found in the HTML version 3.2 draft [2] ("Wilbur"). The abstract of the 2.0 standard describes HTML clearly:
The Hypertext Markup Language (HTML) is a simple markup language used to create hypertext documents that are platform independent. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML markup can represent hypertext news, mail, documentation, and hypermedia; menus of options; database query results; simple structured documents with in-lined graphics; and hypertext views of existing bodies of information.
HTML has been in use by the World Wide Web (WWW) global information initiative since 1990. This specification roughly corresponds to the capabilities of HTML in common use prior to June 1994. HTML is an application of ISO Standard 8879:1986 Information Processing Text and Office Systems; Standard Generalized Markup Language (SGML).
The `text/html' Internet Media Type (RFC 1590) and MIME Content Type (RFC 1521) is defined by this specification.
Netscape is a so-called WWW browser: a browser is a WWW client application with which WWW documents, downloaded from a so-called WWW server, can be viewed. Netscape is the most popular browser: according to various estimates, some 75% of Internet users use Netscape.
A careful choice of a number of allowed extensions has been made in order to be able to create well-designed documents on the one hand, and to stick to the standard as much as possible on the other. The latter is necessary, of course, to be able to maintain the documents easily and to ensure that browsers other than Netscape can read the documents as well.
The following facts are worth mentioning as well:
Based on the above, a careful RGD HTML standard has been formulated.
HTML pages developed at the RGD shall conform partly to the 2.0 standard [1]. Exceptions and their rationale are described here.
All current WWW browsers support HTML 2.0, including Netscape 1.1 (and up).
Netscape will be the WWW browser for the RGD.
HTML 2.0, however, lacks a number of presentation markup elements that are present in version 3.2 [2] and are very important for the proper design of WWW pages.
A number of elements from 3.0, but certainly not all, were already supported by Netscape (although sometimes differently). Furthermore, Netscape has a number of HTML extensions which, if you are lucky, are ignored by other browsers, such as the popular Mosaic or the character-only Lynx, but which can also lead to unexpected and undesirable presentation of a document, or even a crash.
As said, HTML is still evolving, much to the frustration of browser developers such as Netscape (initiator, by the way, of a number of changes), who implement markup elements and have to change or delete them again when a new version of the specification comes out. This also poses a dilemma for the RGD: on the one hand we want WWW documents that follow "standard" HTML (maintainability of pages, support for a broad audience); on the other hand we want them to be attractive and pleasing through good design (user friendliness, ensuring that people come back). Reference [3] provides good background information on this "HTML-standard dilemma".
Because the 2.0 draft provides clear guidelines on how 2.0 browsers should handle unknown markup elements and attributes (e.g. 3.2 or Netscape specific), a clear, although provisional, RGD HTML standard can be formulated. This standard is described in the next section in the form of a checklist.
Chapter 17 of the very good and complete Internet Handboek [5] by Jeroen Vanheste (in Dutch) deals extensively with "correct HTML". The Checklist (2.3.3) refers to a number of so-called HTML validators.
In this section we give a complete checklist of the items to which the HTML pages on the RGD WWW server should conform. They are divided into a number of subsections. The subsection "Content" is important for the webeditor; the others are intended for the webdesigner.
3.1 Content
The person formally responsible for the content of the server (webeditor, webeditor@rgd.nl) should pay attention to the following issues. The HTML documents contain:
3.2 Keywords
In choosing suitable keywords, the analogy with the publication of an article in a scientific journal is appropriate, where the Keywords are usually listed under the Summary. The user should consult the library (library@rgd.nl) and include keywords in the HTML document. The library maintains a thesaurus of keywords.
/dutch tree, or English if located in the /english tree.
3.3 General
Some browsers (Netscape, Mosaic and Lynx) and tools (weblint) are mentioned here: see the section on Tools (3.4) in the next chapter.
<IMG>'s with ALIGN=LEFT or RIGHT.
<CENTER> ALIGN=CENTER attribute.
<TABLE> <PRE> section.
<BODY> with attributes, CENTER, LEFT and RIGHT alignments and <IMG> with BORDER=0.
<A>) are checked manually, by loading the document and clicking on all links.
3.4 Graphical issues
The Design principles (section 3.3) for the RGD WWW site have led to the following graphical characteristics of the HTML pages. See also section 3.3 for information on how tools were used to implement and test the following:
3.5 HTML files and directories
HTML files are ASCII files. Images are usually in GIF format and are binary.
.html" (HTML) and ".gif" (GIF)
/people, have one name. The Dutch version is in the /dutch part of the document tree, the English version in the /english subtree.
https://maryniak.home.xs4all.nl/images.
/people
welcome.html. This is also the default name on the server, therefore the URL of the directory where the page is located will suffice.
/people/e-mail-address-of-person/.
The "e-mail-address-of-person" conforms to the e-mail address scheme of the RGD.
"http://www.rgd.nl/e.maryniak/" is the home page of "e.maryniak@rgd.nl". It is not necessary to use the full name (http://www.rgd.nl/e.maryniak/welcome.html).
/dutch and /english).
/wwwforms
<FORM>'s), which are sent by e-mail to an RGD address, are all located in this directory. A common setup applies to this directory and the documents therein.
feedback.html):
/wwwforms/dutch/feedback.html
/wwwforms/english/feedback.html
<FORM>.
/wwwforms, just as in the document root.
<FORM> in feedback.html has the following form (Dutch and English, respectively):
<FORM METHOD="POST"
ACTION="/cgi-bin/feedback?/wwwforms/dutch/reply/feedback.html">
<FORM METHOD="POST"
ACTION="/cgi-bin/feedback?/wwwforms/english/reply/feedback.html">
reply and contains the following string to indicate the person to whom the filled-in form should be sent:
<!-- X-RGD-EmailFormTo : webmaster@rgd.nl -->
webmaster@rgd.nl, which is also the default if either the reply document or this string is absent.
<INPUT TYPE=HIDDEN NAME="errorcheck" VALUE="1">
<INPUT TYPE=HIDDEN NAME="errorform"
  VALUE="https://maryniak.home.xs4all.nl/wwwforms/dutch/error/feedback.html">
<INPUT TYPE=HIDDEN NAME="errorcheck" VALUE="1">
<INPUT TYPE=HIDDEN NAME="errorform"
  VALUE="https://maryniak.home.xs4all.nl/wwwforms/english/error/feedback.html">
error and again has the same name as the form itself. If the VALUE is not 1, there will be no check whether all fields are filled in. This is also the case if the error document does not exist. The two hidden fields could be placed directly below the <FORM>.
<FORM>. This dummy hidden field fixes that problem. It should be placed at the end of the <FORM>, thus, just above the </FORM>: <INPUT TYPE=HIDDEN NAME="dummy" VALUE="dummy">
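The naming convention described above places the reply and error documents in "reply" and "error" subdirectories next to the form, under the same file name as the form itself. A minimal sketch of that mapping follows; the function name `companion_document` is hypothetical and not part of the RGD server scripts.

```python
import posixpath


def companion_document(form_path, kind):
    """Return the path of the 'reply' or 'error' document for a form.

    Assumption: the companion document lives in a subdirectory named
    after `kind`, next to the form, and carries the form's file name.
    """
    if kind not in ("reply", "error"):
        raise ValueError("kind must be 'reply' or 'error'")
    directory, name = posixpath.split(form_path)
    return posixpath.join(directory, kind, name)


print(companion_document("/wwwforms/dutch/feedback.html", "reply"))
print(companion_document("/wwwforms/english/feedback.html", "error"))
```

The first call yields the path used in the ACTION of the Dutch form, the second the path part of the English "errorform" value.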
3.6 HTML syntax
In this part of the checklist we will describe the hairy details of which HTML syntax constructs are allowed and which are not.
<HTML>, <HEAD>, <BODY> and <TITLE>;
the following tags must also be used if applicable:
<H1> .. <Hn> and <P> for documents with chapter and section structure: therefore, graphical tricks, such as <FONT SIZE=> and images to indicate (titles of) sections, must not be used. This is to facilitate later incorporation of the logical elements of text documents in relational databases and for proper indexing. Note that SGML (and therefore HTML) is for documents what SQL is for relational databases, thus HTML markup should be used appropriately.
<ADDRESS> for addresses and other contact information
<OL>, <UL> and <DL> for ordered lists, (bulleted) enumerations and descriptive lists, respectively. Use them adequately! Do not use more than one (1) <DD> in a <DT> of a <DL>.
<PRE>, <CODE>, <SAMP>, <VAR> and <KBD> for listings of code and descriptions of input.
<H1> to <H3>.
<EM>, <STRONG> and <CODE> instead of physical markup <I>, <B> and <TT> unless typographical markup is necessary.
ALWAYS BE UPPERCASE. This is especially important for CGI programs, where a METHOD=POST, when put in lower-case, will not be understood by the server. Furthermore, this improves readability and maintainability, because the separation of text and markup is clear. Apart from that, there are a number of server scripts being developed (keyword indexers, amongst others) that depend on the fact that the HTML tags and attributes are uppercase.
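The uppercase rule above lends itself to a simple automated check. The sketch below is an illustration only, not one of the RGD server scripts: the function name `lowercase_tags` is hypothetical, and it inspects tag names only, not attribute names.

```python
import re

# Matches the name of an opening or closing tag; comments (<!-- ... -->)
# are skipped because '<' must be followed by a letter or by '/'.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def lowercase_tags(html):
    """Return every tag name in `html` that is not fully uppercase."""
    return [name for name in TAG_NAME.findall(html) if name != name.upper()]


print(lowercase_tags('<P>ok</P><a HREF="x">y</a>'))
```

An empty result means that all tag names found are uppercase; a fuller check would also have to look at attribute names such as METHOD.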
<>, <!> and </>.
X-RGD-CreatedBy and X-RGD-LastModBy) are mentioned with their full name and e-mail address (in parentheses).
X-RGD-CreatedOn and X-RGD-LastModOn) are in ISO 8601:1988 format (yyyy-mm-dd). Times, when mentioned, will also conform to ISO 8601:1988 format (hh:mm:ss), with hours (hh) ranging from 0 to 24. Example: 20:42:05.
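The date and time formats above can be produced with standard formatting codes. The following is a minimal sketch; the helper names `iso_date` and `iso_time` are hypothetical. Note that Python's `datetime` only yields hours from 00 to 23, so the value 24 allowed above never appears in its output.

```python
from datetime import datetime


def iso_date(moment):
    """Format a datetime as yyyy-mm-dd."""
    return moment.strftime("%Y-%m-%d")


def iso_time(moment):
    """Format a datetime as hh:mm:ss using a 24-hour clock."""
    return moment.strftime("%H:%M:%S")


stamp = datetime(1999, 11, 11, 20, 42, 5)
print(iso_date(stamp), iso_time(stamp))
```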
<TITLE> and X-RGD-Contents must be identical. A <TITLE> must have a descriptive and context-independent content, e.g.: "Introduction research program Marine Geology" and not just "Introduction".
<IMG>)
ALT attribute for the benefit of Lynx users and for those with "image loading" set to "off" in a graphical browser. If an ALT is not necessary, then still make the [IMAGE] disappear with ALT=" "; note that the space between the quotes is important!
ALT="{button text}", thus "{}" for a button.
SRC, thus: <IMG SRC="..." ALT="..."> to be followed by other attributes if present.
ALT=" ", because none is needed for the image, given the presence of anchored text.
<BODY> section has a BACKGROUND attribute, then a BGCOLOR attribute, coded black (000000), should be added. This will force a graphic flush. Otherwise, some PCs with Netscape will show white patches on a dark background. For an example, see the start page in the section on Example pages (2.3.4).
<IMG>'s that are anchored (<A>) and are supposed to appear adjacent. Otherwise, a small hyperlinked space will appear at the right of the image, which does not look nice. Usually, this will mean that there should be no space or carriage return after a closing tag </A>.
&' and ends with ';'. Thus '&lt;' for the '<' character, that would otherwise have a special meaning in terms of HTML. This often happens when displaying source listings (<PRE> and <CODE>), but in normal text sometimes as well.
&' and ';' must always be lower-case, thus do not use '&GT;'.
Special character    Mnemonic code
&                    &amp;
<                    &lt;
>                    &gt;
no-break space       &nbsp;
©                    &copy;
®                    &reg;
à                    &agrave;
á                    &aacute;
ä                    &auml;
è                    &egrave;
É                    &Eacute;
é                    &eacute;
ë                    &euml;
ï                    &iuml;
ö                    &ouml;
ü                    &uuml;
ALT key). Use a mnemonic for special characters, as listed in the table above. Also, do not use the TAB character and control characters, such as ^F.
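Replacing special characters by their mnemonics, as required above, can be automated. The sketch below uses the entity table from Python's standard library; the function name `to_mnemonics` is hypothetical. Two assumptions apply: the library table covers the HTML 4 entity set, which is larger than what HTML 2.0 or 3.2 browsers know, and the function is meant for plain text, not markup, because it also rewrites the double quote as `&quot;`.

```python
from html.entities import codepoint2name


def to_mnemonics(text):
    """Replace each character that has a named entity by '&name;'."""
    pieces = []
    for char in text:
        name = codepoint2name.get(ord(char))
        # Characters without a named entity are copied unchanged.
        pieces.append("&" + name + ";" if name else char)
    return "".join(pieces)


print(to_mnemonics("café <b>"))
```

Because each character is converted exactly once, an ampersand in the input becomes `&amp;` and is not escaped a second time.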
<CENTER> was Netscape specific and is permitted in 3.2, but is not necessary: headers (<H1> etc.) and paragraphs (<P>) all have an ALIGN=CENTER attribute that is understood by all browsers and is better.
<BODY> must always start with <P>. A <P> indicates the beginning of a paragraph; it is not a paragraph separator or "strong" newline!
<BR> serves as a newline within a paragraph <P> and must not be used to artificially create vertical space. If really necessary, use an empty <PRE>-block, or a paragraph with just a non-breaking space.
<PRE>-formatted text.
<P> and <H3>, in preformatted text, because this can lead to strange effects. Only hyperlinks (<A>) and special characters may appear in preformatted text.
<PRE>-block! Always make special characters of '&', '<', '>' etc.
<!-- Comment -->.
<!-- and -->, on each line, which is in contrast to the convention used in many free format programming languages.
<A>), between quotes. Especially the closing quote is often forgotten and can lead to strange browser behaviour. Thus: <A HREF="URL">.
3.7 HTML style
Reading documents from a screen and the nature of HTML (hypertext) demand a special approach in "HTML-ification" of documents. Steven Pemberton's article [6] is a good starting point for determining the optimal "human computer interface" for documents.
Furthermore, one should pay attention to:
<A>-anchored text, the term "here" is meaningless!
Apart from the top home page and the Dutch and English "sub" home pages, there are two document types on the RGD WWW server:
Use the "View source" facility to see what the source of these HTML files looks like!