XHTML documents vs. XHTML syntax in HTML documents

Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 63868

I noticed that throughout your discussions you have emphasized the use of XHTML over HTML.

I know that many people like and use XHTML syntax, but in pages that are DOCTYPEd as HTML 4, because of layout limitations they encountered when trying to use pages DOCTYPEd as actual XHTML.

Can I assume that almost any type of layout that people would like to accomplish is covered using actual XHTML pages and the patterns in your book?

Are there layout configurations that are just not possible when using actual XHTML?

Any other down-sides to declaring pages as XHTML, or is it all thumbs up?

Mike Bowers
Ranch Hand

Joined: Oct 08, 2007
Posts: 42
Should I use XHTML documents or XHTML syntax in HTML documents?

This is a great question. I discuss this important topic on pages 38-40 of my book, Pro CSS and HTML Design Patterns.
After several years of pondering this issue and trying many different approaches, I have developed a pragmatic opinion.

I code everything in XHTML and I validate it as XHTML. I do this because XHTML is future compatible and allows XML tools, such as XSLT processors, to manipulate documents. Most importantly, though, XHTML guarantees a browser will interpret the hierarchical structure of my code exactly as I intended. This is essential for CSS because CSS selectors choose XHTML elements based on their location in the document hierarchy. The problem is that HTML allows you to take shortcuts, such as using <p> without </p>, and these shortcuts cause different browsers to interpret the hierarchy of your code differently. (You can read about this in more detail and see examples of this in my book.) This problem is eliminated when you use valid XHTML, and that eliminates a lot of troubleshooting in your CSS code!
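To make the <p> shortcut concrete, here is a minimal sketch using Python's standard html.parser module. It is only a tokenizer, not a browser, and the names TagEvents and tag_events are made up for this illustration. It reports the tags that are actually present in the markup, which shows where the closing of each <p> is left for the browser to infer.

```python
from html.parser import HTMLParser


class TagEvents(HTMLParser):
    """Records every start and end tag that literally appears in the markup."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the list of (kind, tag) events found in the markup."""
    parser = TagEvents()
    parser.feed(markup)
    parser.close()
    return parser.events


# HTML shortcut: neither <p> is closed, so the nesting must be inferred.
html_shortcut = "<div><p>one<p>two</div>"
# XHTML style: every element is closed, so the hierarchy is explicit.
xhtml_explicit = "<div><p>one</p><p>two</p></div>"

print(tag_events(html_shortcut))
print(tag_events(xhtml_explicit))
```

In the first list no ("end", "p") event ever appears, so whatever consumes the markup has to decide where each paragraph stops. In the second list every element is closed explicitly and there is nothing left to guess.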

I strongly recommend that everyone who uses CSS use XHTML. There are no limitations to coding in XHTML -- other than the inconvenience of using a more exacting syntax.

So once you code your pages in XHTML, what MIME content type should you use?

The HTTP protocol requires a web server to specify what type of document it is delivering to the browser. This is the MIME content type. The idea is that a browser will know the type of document so it can display it properly. HTML documents have a MIME type of "text/html".

The W3C allows XHTML documents to be served with four different MIME types: "text/html", "application/xhtml+xml", "application/xml", and "text/xml". This has created no end of confusion. How do we know which MIME type is better?

If you want your web pages to work in all the major browsers, there is only one answer: "text/html". This is the only answer because all major browsers have problems displaying XHTML documents when the MIME type is anything other than "text/html". On the other hand, all major browsers have no problems displaying XHTML documents when the MIME type is "text/html" -- as long as you put an extra space between the element name and the closing slash in singleton elements like <br />.
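As a rough illustration of the extra-space rule, the following Python sketch rewrites <br/> as <br />. The helper name add_space_before_slash is invented here, and the regular expression is deliberately naive: it would also touch a "/>" that appears inside text, comments or CDATA, so treat it as a sketch rather than a safe converter.

```python
import re

# Matches "/>" only when the character before it is not whitespace.
_SELF_CLOSE = re.compile(r"(?<!\s)/>")


def add_space_before_slash(markup):
    """Insert a space before the closing slash of singleton elements."""
    return _SELF_CLOSE.sub(" />", markup)


print(add_space_before_slash('<p>line one<br/>line two<img src="a.png" alt=""/></p>'))
```

Tags that already have the space, such as <br />, are left unchanged.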

In other words, I use XHTML with a MIME type of "text/html". Sadly, any other choice simply doesn't work in the real world.
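For anyone who wants to see where the MIME type is actually set, here is a small sketch built on Python's standard http.server module. The page content, the handler name XhtmlAsHtmlHandler and the helper make_server are all assumptions made for this example; a real site would set the same header in its web server or framework configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny XHTML-syntax page (illustrative content only).
XHTML_PAGE = (
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b"<head><title>Demo</title></head>"
    b"<body><p>Hello<br />world</p></body></html>"
)

# The MIME content type the browser will see for this document.
CONTENT_TYPE = "text/html; charset=utf-8"


class XhtmlAsHtmlHandler(BaseHTTPRequestHandler):
    """Serves the XHTML-syntax page with a MIME type of text/html."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", CONTENT_TYPE)
        self.send_header("Content-Length", str(len(XHTML_PAGE)))
        self.end_headers()
        self.wfile.write(XHTML_PAGE)

    def log_message(self, format, *args):
        # Keep the example quiet; the default prints each request to stderr.
        pass


def make_server(port=0):
    """Create a server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), XhtmlAsHtmlHandler)
```

Calling make_server(8000).serve_forever() and opening http://127.0.0.1:8000/ would show the page; changing CONTENT_TYPE to "application/xhtml+xml" is the switch that hands the same bytes to the browser's XML parser instead.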

For example, if you use one of the XML MIME types for your XHTML document, Internet Explorer (versions 6 and 7) has major problems displaying it. The page renders two to three times slower than HTML; IE ignores interactive elements like buttons; and IE displays all elements inline. The result is a real mess! Since IE still owns 85% of the market, I prefer to deliver my web documents as "text/html" so they are rendered correctly.

Even Firefox has problems rendering documents with an XHTML MIME content type. For example, when it gets an XHTML MIME type from the server, it downloads the entire XHTML document before it begins rendering so it can validate the document first. This is a performance problem because the user is used to seeing a page load incrementally. Even more importantly, if the document has even the smallest error, Firefox won't render it! This requires all web pages to be 100% valid XHTML, and it requires us never to make a mistake and never to forget to validate no matter how hectic things get. All it takes is one tiny typo and Firefox won't display the web page -- unless it is delivered with a MIME type of "text/html".
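The all-or-nothing behavior of an XML parser is easy to reproduce. The sketch below uses Python's standard xml.etree.ElementTree module; the function name well_formedness_error is made up for the example. Note that it only checks XML well-formedness (matching tags, quoting and so on), not validity against a DTD, and it will also reject named entities such as &nbsp; because it does not load a DTD.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xhtml_text):
    """Return None if the text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


good = "<html><body><p>Hello<br /></p></body></html>"
# One tiny typo: <br> is never closed.
bad = "<html><body><p>Hello<br></p></body></html>"

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

The first call returns None, while the second returns an error message for the whole document even though only one tag is wrong. Running a check like this before publishing is one way to catch the typo before a strict parser does.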

My book, Pro CSS and HTML Design Patterns, contains much more information and examples about XHTML and MIME content types. If you are the kind of developer who cares about these issues, you'll love my book. I wrote this book for developers because all other books on CSS were written for designers and didn't have the in-depth coverage needed by developers.

To get a feel for the breadth of my book and its quality, you can examine hundreds of examples from my book at
[ October 13, 2007: Message edited by: Mike Bowers ]
Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 63868

Thanks Mike, you addressed the MIME type of "text/html" well, but what of the actual DOCTYPE declaration on the page?

What's the best cross-browser setting between XHTML, HTML 4 Strict, HTML 4 Transitional, or what have you?
Mike Bowers
Ranch Hand

Joined: Oct 08, 2007
Posts: 42
What doctype should I use for XHTML documents?

The short answer is that I use the following doctype in all my web pages:

I also recommend using an HTML and CSS validator that ignores the doctype in my HTML documents and lets me choose what I want to validate. That way, I can use this doctype to trigger the correct browser rendering while including non-standard elements and attributes in my markup so I can take advantage of browser innovations in both HTML markup and CSS properties.

Here is the long answer...

This doctype specifies the type of document as "html" with a version of "XHTML 1.0 Transitional". It also specifies the location of the Document Type Definition (DTD) file that should be used for validating the document. In this doctype the location is the W3C website at "".
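To show the parts just described, here is a small Python sketch that splits a doctype into its pieces. It assumes the doctype meant here is the standard W3C declaration for XHTML 1.0 Transitional, which matches the description above; the constant name, the regular expression and the function parse_doctype are invented for this illustration and only handle the PUBLIC form with both identifiers.

```python
import re

# Assumption: the standard W3C declaration for XHTML 1.0 Transitional.
XHTML_1_0_TRANSITIONAL = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)

DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+PUBLIC\s+'
    r'"(?P<public_id>[^"]*)"\s+"(?P<system_id>[^"]*)"\s*>',
    re.IGNORECASE,
)


def parse_doctype(declaration):
    """Split a PUBLIC doctype into root name, public id and DTD location."""
    match = DOCTYPE_RE.match(declaration.strip())
    if match is None:
        raise ValueError("not a PUBLIC doctype with a DTD location")
    return match.groupdict()


parts = parse_doctype(XHTML_1_0_TRANSITIONAL)
print(parts["root"])       # the type of document
print(parts["public_id"])  # carries the version name
print(parts["system_id"])  # location of the DTD file
```

The root name is the "html" type, the public identifier carries the "XHTML 1.0 Transitional" version, and the system identifier is the DTD location a validator would fetch.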

Doctype is related to the MIME content type, but has a completely different purpose. The MIME content type alone defines the type of document. It is set by the web server using the HTTP protocol when a document is downloaded. In spite of its name, a doctype has nothing to do with identifying the type of a document.

A browser uses the MIME content type to determine what "driver" it will use to parse and display the document. In other words, if you use a MIME content type of "text/html", a browser will load its HTML parser and rendering engine. If you use a MIME content type of "application/xhtml+xml", it will load its XML parser and rendering engine. Put simply, a browser will not even try to parse and render a document until it knows what type of document it is, and that information comes exclusively from the MIME type.
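The "driver" selection can be pictured as a simple lookup on the Content-Type header. This Python sketch is a toy model of that decision, not how any particular browser is implemented; the function name choose_parser and its return values are assumptions for the example.

```python
# The XML MIME types under which XHTML may be served.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def choose_parser(content_type):
    """Pick a parser from the Content-Type header alone, ignoring the doctype."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in XML_TYPES:
        return "xml"
    return "unknown"


print(choose_parser("text/html; charset=utf-8"))
print(choose_parser("application/xhtml+xml"))
```

Notice that the document body never enters the function: the same XHTML bytes go to the HTML parser or the XML parser depending only on what the server says they are.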

So what is a doctype and what is it used for?

A doctype is a declaration placed at the very top of an HTML or XHTML document, before the opening html tag. The doctype specifies the version of an HTML document and what Document Type Definition (DTD) should be used to validate the document.

The doctype comes from the SGML roots of the HTML and XHTML specifications. SGML is the language used to define HTML, and XML (on which XHTML is built) is a simplified subset of SGML. SGML is a powerful and complex language for defining markup languages.

SGML parsers look for a doctype declaration inside a document so they can load a DTD file to validate and parse the document. When an SGML parser encounters a doctype, it can use the DTD to determine the rules it should use for parsing and validating the document.

In the doctype I listed above, an SGML parser would go to and retrieve the DTD file and use it to parse and validate the document. You can specify any DTD file, and it can be located anywhere as long as it can be downloaded. The DTD can be on your local hard drive, on your own website, on someone else's website, etc.

Validator programs typically read the doctype, retrieve the DTD and use the DTD to validate your document. The W3C has a free validator at It reads the doctype of your document and validates it based on that doctype.

You can even download the DTD files supplied by the W3C and modify them to create your own rules for validating HTML and XHTML documents. In your doctypes, you can specify your own custom file as the DTD that should be used for validation. This is a perfectly appropriate technique. This is the original purpose of the doctype. If you are a programmer, DTDs are not hard to understand. Any good book on XML will show you how to create a DTD. All you need is a validator program, such as XMLSpy. You can specify any DTD in your doctype and XMLSpy will retrieve it from the location specified in the doctype and validate your document against it.

For example, there are many HTML and XHTML elements and attributes that the W3C has deprecated. Most of these are deprecated for good reasons and I don't recommend using them. But a few elements and attributes are fully supported by browsers and should not have been deprecated or were never included in the W3C specifications for political reasons. There is nothing wrong with including these elements and attributes in your documents. By using custom DTDs, you can use these elements and attributes in your documents and still validate your documents. Of course, be sure to test non-standard elements and attributes in all browsers to be sure they work the way you want.

In other words, a document is valid when it validates against a DTD -- any DTD specified by the doctype. It doesn't have to validate against the W3C's DTDs. This is a very important point that the W3C doesn't advertise, because they want theirs to be the only standard!

Don't get me wrong. I like the W3C and I like standards, but the W3C is not the only standard. HTML and XML are based on DTDs that can be modified beyond the W3C standard. The Internet succeeded because of the right balance of basic, open standards, and the freedom for anyone to extend these standards.

In fact the Internet is not as exciting as it used to be because the W3C is not in the business of innovating. Its business is standardizing, which by its nature opposes change. Browser vendors did the innovating and the W3C followed behind trying to set a standard upon which they all could agree -- the process was political and was full of compromises. Now that Microsoft has conquered the browser market and killed Netscape, innovation is stagnant. Mozilla is not a standard setter, but a standard follower. Once we realize that standards are only a starting point, we can start genuinely innovating again.

What makes things complicated with the doctype is that web browsers are not SGML parsers. They don't follow SGML rules. They don't look up the DTD file at the location specified in the doctype. They don't use the DTD to validate documents. Instead, they use their own rules for parsing HTML -- and each browser vendor has different rules. Until CSS entered the picture, browsers completely ignored the doctype!

Unfortunately, during the early days of the Internet, browser vendors rushed out new versions faster than the W3C could define specifications. Different browsers implemented early draft versions of CSS specifications differently. Once CSS became standardized, browser vendors needed a way to know which rules to use when rendering a document: the old HTML rules, the new CSS rules, or rules that were backward compatible to some proprietary vendor standard.

The browser vendors decided to use the doctype for this purpose. This is called doctype sniffing. This was a mistake. It is a kludge of the worst kind, but we are stuck with it.
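As a very rough picture of doctype sniffing, the Python sketch below models only the first check: whether the document starts with a doctype at all. When it does not, browsers fall back to their backward-compatible rendering (commonly called quirks mode). When it does, the outcome depends on each browser's own table of signatures, which this toy does not attempt to reproduce. The function names and return strings are assumptions for the example.

```python
def starts_with_doctype(document_text):
    """True if the first non-whitespace content is a doctype declaration."""
    return document_text.lstrip().lower().startswith("<!doctype")


def sniff(document_text):
    """Toy model of the first sniffing step; real browsers go much further."""
    if not starts_with_doctype(document_text):
        # No doctype at all: old, backward-compatible rendering rules.
        return "quirks"
    # A doctype is present: the mode depends on the browser's signature table.
    return "signature-dependent"


print(sniff("<html><body>old page</body></html>"))
print(sniff("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html></html>"))
```

The point of the sketch is that nothing here validates or even reads the DTD; the doctype is used purely as a switch.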

There are dozens of different doctype signatures and all the different browser vendors use different signatures to trigger different CSS behaviors. You can read about these signatures at

After years of research and experimentation, I have found only one doctype that reliably triggers compatible behavior among all the browsers. It is the doctype I listed previously. All the design patterns in my book use this doctype.

If you want to use HTML instead of XHTML (which I don't recommend), you can use either of the following HTML doctypes:

Further, if you plan on using elements and attributes that are not supported by XHTML 1.0 Transitional, you can use an HTML validator that ignores the doctype in your documents and lets you choose what you want to validate. This way, you can choose the doctype that triggers the right CSS behavior in browsers and allows you to include non-standard elements and attributes in your markup. This will free you to take advantage of browser innovations in both HTML markup and CSS properties.

Lastly, XML has an untapped power for document markup: it allows you to use your own custom elements and attributes to extend the semantic and structural meaning of your markup. The real power of CSS is that it allows you to style your custom markup as you please. This approach doesn't yet work consistently in all web browsers, but I can't wait until it does!

Chapter 2 of my book, Pro CSS and HTML Design Patterns, on pages 38-40 also discusses doctypes in a simpler and more practical manner. You can also see an example at
[ October 13, 2007: Message edited by: Mike Bowers ]