Understanding Elements and Tags

Web documents are created using something called HTML, which, as you may have heard, stands for "HyperText Markup Language." The term markup derives from a technological era of publishing that predates the web: originally it referred to actual marks made by an editor on a paper manuscript that guided the printer or typesetter (notice that editor and printer here refer to people: that's how old this is) regarding how to lay out the document. As electronic publishing developed, "markup" came to refer to particular commands embedded into electronic text that likewise were intended to guide the layout of the document, rather than becoming part of it. HTML is one of many such systems of commands, or languages, that have been developed over the years. HTML, in particular, is intended for describing hypertext documents.
[An additional note for the curious: Unlike most traditional languages for computer programming, which are procedural, HTML is completely declarative (or descriptive). (It isn't a programming language, per se, at all.) If you want to think of this in relation to more familiar languages (like English), what this means is that there are no verbs. Everything is a (possibly qualified) noun phrase that defines what everything is, rather than what to do.]
So what does HTML look like? Consider the code below; it is sufficient to be a complete web document. Bear in mind that in an actual document, both the text and the markup are simply ordinary characters; what distinguishes markup from text, as far as the browser is concerned, is which characters they are. In simplest terms, the markup is anything enclosed in angle brackets (also known as the less-than and greater-than signs, "<" and ">"). [There is one special case, also shown below, that we will discuss in a moment.]
HTML Code:

```html
<HTML>
<HEAD>
<!-- This is an example -->
<TITLE>Example HTML</TITLE>
</HEAD>
<BODY>
<H1 ALIGN="center">Sample Page</H1>
<P>This <I>sample page</I> shows what a hypothetical &quot;simple&quot;
HTML document might look like.</P>
<P>Author: Who Knows?<BR>
Last Modified: Too Long Ago...</P>
</BODY>
</HTML>
```

Resulting Page:

    Sample Page

    This sample page shows what a hypothetical "simple" HTML document might look like.

    Author: Who Knows?
    Last Modified: Too Long Ago...
If you compare the text in the code (the characters that are not part of the markup) to the text that actually appears in the result, it's pretty easy to see where almost everything came from. The markup, on the other hand, doesn't directly appear in the result at all. Instead, the markup describes the text, either by identifying what it is (the role or function it has within the document, such as indicating that a particular bit of text forms a paragraph) or suggesting how it should be presented (such as indicating that the heading should be centered). This pretty much explains how HTML works, too. The browser receives a complete document, and it interprets the markup in order to decide what to do with the text.
When you use a language, you usually don't need to think about it much. But to describe a language to someone else, you need special terminology in order to name the parts of things and explain the rules for how to put them together. To describe English, you need to talk about nouns and verbs and prepositional phrases and dependent clauses, for example. After you have learned the rules (grammar), the terms don't matter much: you may not remember (or care) whether "which" is a relative pronoun or a demonstrative adjective, but things like that are useful when you are learning why "A good was done by this each boy," is an improperly-formed sentence. HTML is the same way. So, while we are talking about HTML, it will be handy to use the proper terminology.
A tag is a basic building block in HTML. It consists of an open angle bracket (less-than) character, followed by one or more other characters, ending finally with a close angle bracket (greater-than) character. Characters inside a tag are part of the markup. Characters between tags (not inside any tag) are part of the text.
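As a small illustration of that distinction, the fragment below reuses a phrase from the sample page. It is a sketch only; the wording of the sentence is adapted for the example.

```html
<!-- The two tags here are the I start tag and the I end tag. -->
<!-- Every character outside the angle brackets is text. -->
This <I>sample page</I> contains exactly two tags.
```

A browser would display the sentence with "sample page" in italics, and neither tag would appear on screen.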
In English, the fundamental unit of structure is the sentence. In HTML, the fundamental unit of structure is something known as an element. An element is an identifiable part of a document that has some specific function. For example, the paragraph element identifies a portion of text that is set off from other text as a complete, self-sufficient unit.
Most elements are containers: they enclose the portion of the document that they describe (not unlike parentheses). These contents can include not only text, but also other elements (like nested parentheses (like this)). The element is considered to include the thing contained (or, to put it another way, the contents are part of the element, just as a parenthetical remark is more than just the parentheses at either end).
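The nesting described above might look like the following sketch, in which one container sits inside another. The sentence is made up for illustration; P and I are the paragraph and italic elements used in the sample page.

```html
<!-- The I element is nested inside the P element, -->
<!-- much like parentheses within parentheses. -->
<P>This paragraph contains <I>an italic phrase</I> as part of its contents.</P>
```

Here the I element, including its contents, is itself part of the contents of the P element.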
An element begins with a tag, which is often nothing more than a keyword enclosed in angle brackets. This keyword is the name of the element. The element ends with another tag, which repeats this name, but with a slash in front of the keyword. Thus the element is enclosed in a pair of tags. The first tag is called the start tag, the final tag is the end tag, and the element itself comprises both the start and end tags and everything in between. An example of this is shown below.
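A minimal sketch of such an element follows. The sentence inside it is invented for illustration; only the tag syntax is taken from the description above.

```html
<!-- Start tag, then the contents, then the end tag with its slash. -->
<P>This sentence is the content of the element.</P>
```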
This example shows the paragraph element. The element name, P, appears in both the start and end tags. [Note: There are cases where the end tag of one element is implied by the start tag of a related element that follows it; in such cases, the end tag of the preceding element may be omitted. The P element, when followed immediately by another P element, happens to be one such case. However, it is not wrong to include the end tag, even when it is not required, and doing so is probably a good practice for the beginner, since it will help prevent mistakes.]
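The note about implied end tags can be sketched as follows, assuming the optional-end-tag rule for P as defined in HTML 4. The paragraph text is made up for the example.

```html
<!-- End tag omitted: the second P start tag implies -->
<!-- the end of the first paragraph. -->
<P>First paragraph.
<P>Second paragraph.</P>

<!-- The same two paragraphs with every end tag written out. -->
<P>First paragraph.</P>
<P>Second paragraph.</P>
```

Both forms describe the same two paragraphs; the second is simply more explicit.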
Some elements are objects all by themselves, and cannot be containers for other things. These are referred to as empty elements. Notice that this term is not used for a container element which simply happens to be empty (i.e., the end tag immediately follows the start tag with nothing in between), or for one where the end tag is optional, but only for an element which can never be a container. Such elements have only a start tag; no end tag is ever used. These are relatively uncommon (as stated above, most elements are container elements), and it is not difficult to remember which few elements are empty. An example of an empty element is the <BR> in the first example at the top of the page. This is the break element, and it is used to cause a line break in the layout.
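The difference between an empty element and a container that merely has no contents might be sketched like this. The first fragment comes from the sample page; HR (horizontal rule) is not mentioned in the text and is included only as a second standard HTML empty element.

```html
<!-- BR is an empty element: a start tag only, never an end tag. -->
<P>Author: Who Knows?<BR>
Last Modified: Too Long Ago...</P>

<!-- HR is another empty element. -->
<HR>

<!-- A container with nothing inside it. It is still a container, -->
<!-- so it is not an "empty element" in the sense used here. -->
<P></P>
```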
Sometimes the markup for an element needs to include more information than just the element's name. There may be several such pieces of information: each is referred to as an attribute. If you think of the element name as a noun, then attributes serve a function analogous to adjectives, adverbs, and prepositional phrases. In other words, they serve as modifiers that elaborate upon, refine, specify, or limit the meaning of the element.
An example of an attribute occurs in the HTML code sample given at the top of this page, in the heading element, which begins with the start tag <H1 ALIGN="center">.
The 'ALIGN="center"' part is an attribute which modifies the H1 element. The keyword, ALIGN, is known as the attribute name. The portion in quotes, "center", is known as the attribute value. (While most attributes look something like this, with a keyword, an equals sign, and a value, some attributes only have a name. This type is less common.) The attribute value can be a number, a keyword from a limited set of possible choices, or it may be a more general character string enclosed within double quotes. Values that are numbers or special keywords do not have to be enclosed within quotes, but it is always legal to do so, and it saves you from having to remember which values need quotes and which ones don't.
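The kinds of attribute value described above can be sketched as follows. Only the H1 line comes from the sample page; HR with SIZE and NOSHADE, and A with HREF, are standard HTML 4 names chosen here for illustration, and the URL is a placeholder.

```html
<!-- Keyword value: quotes are optional, so these two mean the same thing. -->
<H1 ALIGN=center>Sample Page</H1>
<H1 ALIGN="center">Sample Page</H1>

<!-- Numeric value, without and with quotes. -->
<HR SIZE=5>
<HR SIZE="5">

<!-- General character string: the quotes are needed here. -->
<A HREF="http://www.example.com/">An example link</A>

<!-- An attribute that has only a name. -->
<HR NOSHADE>
```

Quoting every value, as in the second line of each pair, is the habit the text recommends.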
Attributes are always placed in the start tag (never the end tag), following the element name. One element can be modified by several different attributes. The attributes are separated from the element name and from each other by spaces, like words in a sentence. Different elements have different sets of attributes which can be used. For some elements, there are attributes which must be included every time the element is used. Such attributes are said to be required. All of the other attributes which can be used with that element are considered optional. For many element types (such as, for example, the paragraph or P element introduced earlier), all of the possible attributes are optional.
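A sketch of one element carrying several attributes follows. IMG is not discussed in the text; it is used here as an assumption-labeled example because, in HTML 4, its SRC and ALT attributes are required while WIDTH and HEIGHT are optional. The file name and sizes are hypothetical.

```html
<!-- Four attributes, separated from the name and each other by spaces. -->
<!-- SRC and ALT are required for IMG; WIDTH and HEIGHT are optional. -->
<IMG SRC="photo.gif" ALT="A photo of the author" WIDTH="120" HEIGHT="90">

<!-- Attributes go in the start tag only; the end tag is just the name. -->
<H1 ALIGN="center">Sample Page</H1>
```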
Like many other computer languages, HTML provides a way to insert comments into a document. These are generally intended as notes from an author to himself, or to other HTML authors who might examine or further modify the HTML code. Comments can be used to keep track of the modification history of a document, or to explain the intended function of codes whose purpose may not be obvious. Comments look like the following:
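The first comment below is the one from the sample page at the top; the second is a made-up note of the kind an author might leave for later readers of the code.

```html
<!-- This is an example -->

<!-- The heading below is centered on purpose; please leave it that way. -->
<H1 ALIGN="center">Sample Page</H1>
```

Neither comment appears anywhere in the displayed page.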
Notice that the beginning and ending sequences are not symmetrical. The double hyphens ("--") are part of the comment markers; they are required.
The final bit of HTML syntax you need to know concerns what to do with special characters. Since an HTML file is just text, characters other than the ones you would find on a keyboard need some special way of being denoted. Also, characters that would ordinarily be taken to have a special meaning, like the angle bracket characters, must also have a way of being inserted when you want to use them as just normal characters.
HTML provides for this with something called a character entity. All character entities begin with an ampersand character ("&") and end with a semicolon (";"). In between, an abbreviated name indicates which character should be inserted. There are a number of these character entities, but there are only four that are used often enough to be worth memorizing:
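Assuming the four are the entities for the characters that have special meaning in markup, plus the double quote (an assumption about which four are meant, although all four are standard HTML entities), they can be sketched as:

```html
<!-- &lt;    less-than sign (open angle bracket)     -->
<!-- &gt;    greater-than sign (close angle bracket) -->
<!-- &amp;   ampersand                               -->
<!-- &quot;  double quotation mark                   -->

<P>The &lt;P&gt; tag starts a paragraph, and &quot;AT&amp;T&quot; is
written with an ampersand entity.</P>
```

A browser would display that paragraph as: The <P> tag starts a paragraph, and "AT&T" is written with an ampersand entity.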
(Character entities are the only markup that doesn't begin and end with angle brackets. Also, unlike other markup, since they result in a character being inserted into the document, they are considered in that sense to be part of the text, which other markup is not.)