by Joe Gillespie — Sep 1, 2003


Acronyms. Love them or hate them, the Web design business is awash with them, and very often they are used without anyone even knowing what they stand for: DTD, XML, XHTML, SHTML and many more. Although these abbreviations are marginally easier to remember than the words they represent, I still find it difficult to type SMTP instead of STMP without saying 'Simple Mail Transfer Protocol' out loud and punching the keyboard at the start of each word.

And is that DTD or DDT?

The fact is that many creative people suffer from dyslexia to a greater or lesser degree, and getting the order of characters wrong in acronyms is one of the first symptoms. The strange thing is that I don't show any symptoms of dyslexia when I write with a pen; it's only when I type at a keyboard that I get letter orders reversed in some words.

Never mind. This month we are going to look at one of those acronyms – XHTML. When you understand the concept that the abbreviation refers to, it is easier to remember. If you know what HTML is, and you should if you are reading this editorial, it's just a matter of sticking an 'X' in front and knowing what the extra character represents.

Still confused? Read on for four easy steps to XHTML.

Four months ago, I changed the WPDFD editorial page from a tables-based layout to one using just CSS. I haven't had any reports of problems with the page so last month I took the next step. I changed over from HTML 4.01 to XHTML 1.0 without saying anything, to see if anyone noticed. Nobody complained. Excellent!

Now the question is, if there's little difference between HTML and XHTML, why bother? To answer that we have to look at how HTML has evolved over the past ten years or so. I touched on the subject in last month's article when I discussed the concept of 'documents' and how Netscape and Microsoft each added their own 'enhancements' to HTML to try to make their browser more attractive to surfers than the other's.

Very soon, some 'features' that worked in one browser didn't work in the other, and designers had to resort to the dubious practice of 'browser sniffing' to deliver pages that would work in either. HTML was diverging along two separate paths and would have continued to do so if the W3C hadn't stepped in and laid down some standards. The idea was that browser developers would conform to these 'official' standards and all browsers would behave in the same predictable way for any given page markup.

One of the main goals was to separate the basic structure of a page from its presentation. The structure is the steelwork and the presentation, the cladding. Although some presentational elements such as <font face> still remain in HTML for backward compatibility, they are going to disappear eventually and their use is now discouraged. It is much better to use Cascading Style Sheets for Web typography. CSS offers far more control than HTML was ever intended to provide, leaving HTML to do the job it was meant to do.

HTML is now at version 4.01 although some people still use the older 3.2 so that their pages validate with legacy markup.

HTML 4.01 comes in three flavors - Transitional, Strict and Frameset. Transitional allows more 'old habits' and is more forgiving of browser quirks than Strict. If I try to use the dreadful Netscape-only <blink> tag, I have to go way back to an HTML 2.0 DocType to get it to validate.

Frameset is not as popular these days as it was a few years back. Frames have fallen out of favour because of their many drawbacks – such as the problems they cause with bookmarking and with indexing by search engines.

With version 4.01, HTML has gone just about as far as it can go and now, XHTML 1.0 carries on where it left off.

The X before HTML means 'eXtensible' but also alludes to its XML parentage. I know that all these acronyms can get confusing, and you can find the official explanations of XHTML and XML on the W3C site if you want to wade through them. The more relevant questions are probably, "What's in it for me?" and "What do I have to do that I didn't do before?" Those are the questions I’m going to cover here.

What's in it for me?

In the short term, probably not a lot, but the Web is moving very quickly and you always have to think ahead. Even if using XHTML only helps to tighten up your markup, getting rid of ambiguities and sloppy coding, then it's already improving the robustness of your pages across browsers.

Older browsers have become big and bloated, partly because they have to be very forgiving of poor markup and pages that were built some time ago containing deprecated elements. More modern browsers are smaller and faster but they do this at the expense of not being so forgiving of invalid code. The use of valid XHTML helps to guarantee that any pages you build now will work well into the foreseeable future in all current and future browsers.

And the 'extensible' bit? Well, HTML evolved much in the way a farmhouse sprawls out over the years. You start with a basic structure, then a few years later add on an extra living room, and a few years after that stick a new bedroom on top of it. Later still, the kitchen is extended and a few outhouses added, then another bedroom... After a number of years, the original house is unrecognisable and, with all the bits stuck on as afterthoughts, looks horrible and doesn't work as efficiently as a house designed with those extra facilities from the outset.

Contrast that with a modern 'organic' home that is designed to grow with the family. It will have been designed as a large house even though only part of it is actually built at the outset. When it comes time to extend it, all the utilities will be in the right place and new rooms can be slotted in with minimal compromise to the original design.

So, 'extensible' means that it is designed from the outset to be added to – giving flexibility without compromise. Although it might take a little extra effort to build your Web pages with XHTML at first, it is much more 'futureproof' than HTML.

What do I have to do that I didn't do before?

Migrating your current HTML 4.01 markup to XHTML 1.0 is surprisingly easy. Last month, I converted this page (which is quite long) to XHTML in a couple of minutes using BBEdit's 'Search and Replace', although doing it completely manually would be no big deal.

The first thing to do is to replace the HTML 4.01 DOCTYPE with the corresponding XHTML 1.0 equivalent - Transitional, Strict or Frameset.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
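As a rough sketch of where that declaration sits, a bare-bones XHTML 1.0 Transitional page might look something like the one below. The xmlns attribute on the html element comes from the W3C's XHTML 1.0 specification and is not something covered above. The charset, title and paragraph text are just placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- The xmlns attribute names the XHTML namespace (per the W3C spec) -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Placeholder charset and title: substitute your own -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Example page</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```

For the Strict or Frameset flavors, only the two quoted strings in the DOCTYPE change. The rest of the skeleton stays the same.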

Having done that, BBEdit's syntax checker immediately tells me what is not valid XHTML 1.0 markup. Other editors have similar facilities.

Where HTML 4.01 doesn't give a hoot whether tags are upper case, lower case or an arbitrary mixture of both, XHTML requires them to be lower case only, thereby forcing consistency. As I already use lower case tags in my HTML markup, that's not a problem.
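To illustrate, here is a made-up table fragment written both ways. The content is hypothetical. Note that, as far as the XHTML 1.0 specification goes, the lower case rule applies to attribute names as well as element names.

```html
<!-- HTML 4.01: any mixture of case is tolerated -->
<TABLE><TR><TD Align="left">Cell</TD></TR></TABLE>

<!-- XHTML 1.0: element and attribute names in lower case -->
<table><tr><td align="left">Cell</td></tr></table>
```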

The next thing is to do with closing tags. Most elements in HTML have an opening tag <head> and a closing tag </head> but some closing tags are optional – and there are a few exceptions.

In HTML, a closing paragraph tag </p> is sometimes omitted and most current browsers understand what to do. To get rid of any ambiguities, XHTML 1.0 likes to see things kept tidy and a terminating </p> tag is no longer optional. It's mandatory.
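A small before-and-after sketch, using placeholder text, shows the difference:

```html
<!-- HTML 4.01: the closing </p> tags may be left out -->
<p>First paragraph.
<p>Second paragraph.

<!-- XHTML 1.0: every paragraph is explicitly closed -->
<p>First paragraph.</p>
<p>Second paragraph.</p>
```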

Now, consider the line break <br>. It doesn't really need a closing tag as it doesn't wrap around anything. It has a position, but it doesn't start and end. It just is! XHTML requires every tag to have a corresponding terminator and the only way to satisfy this seemingly paradoxical situation is to start and end the <br> tag at the same time – thus <br />. That's a space followed by a forward slash. Most browsers will accept <br/> without the space, but it's best to include it for maximum predictability and consistency.
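In practice, a line break inside a paragraph might be written like this in each version. The address text is purely illustrative.

```html
<!-- HTML 4.01: the line break has no terminator -->
<p>12 Example Street<br>
Anytown</p>

<!-- XHTML 1.0: the tag opens and closes itself, with a space before the slash -->
<p>12 Example Street<br />
Anytown</p>
```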

Other examples of HTML elements that don't close in HTML are <img>, <link> and <meta>. Like the <br />, they can be terminated in XHTML with a space and a forward slash.

<img src="../images/logo.gif" alt="Company Logo" width="252" height="125" border="0" align="bottom" />

<link rel="StyleSheet" href="../styles.css" type="text/css" title="Style Sheet" media="screen" />

There's another potential ambiguity that XHTML doesn't allow - all attribute values have to be inside quotes. Things like width=252 height=125 need to be width="252" height="125". Non-numerical attributes such as <td align="left" valign="middle"> have to have their values within quotes too.
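Putting the quoting rule together with the self-closing rule, a hypothetical image tag might change as sketched here. The file name and alt text are invented for the example.

```html
<!-- HTML 4.01: simple values can get away without quotes -->
<img src=logo.gif alt=Logo width=252 height=125>

<!-- XHTML 1.0: every attribute value is quoted and the tag is terminated -->
<img src="logo.gif" alt="Logo" width="252" height="125" />
```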

Well, there are four simple steps to XHTML. If you aren't using it already, now is a good time to start. The simple fact that you are validating your markup will make you less likely to get unpleasant surprises in 'other' browsers, but it doesn't mean that you shouldn't still check your pages in as many browsers as you can. There are still plenty of other things that can go wrong!
