Here's a real-world example of how you can use XSL to convert an XML-based resumé into various data formats to suit your needs.
Developers are easily some of the most restless workers in the country, hopping from one job to the next on average once every 18 months. One downside to job hunting is that you can spend a significant chunk of time updating and revising your resumé. This can present a dilemma if you change jobs often or if you are a consultant with multiple clients. You want your resumé to look flashy and to stand out from the crowd, and to be accessible online if at all possible. A prospective employer wants to get a sample of what you can do, but that same employer's HR department wants something as manageable as possible: a simple formatted page, or better yet, a page in straight text. To make this process even more of a nightmare, you want to be able to update your resumé with as little hassle as possible, and perhaps even maintain three or four different resumés that you can submit in different circumstances (I usually have one for my programming clients, a second to show off my writing skills, a third to showcase my graphics abilities, and then a general-purpose one when I just want to give an overview of what I can do).
This situation is one for which Extensible Markup Language (XML) was practically tailor-made. A resumé is a prime example of a case where you want to be able to separate your data (jobs, skills, publications, and so forth) from the layout of that data: the classic dichotomy between the data layer and the presentation layer. HTML is not much help here because changing the data also entails modifying the markup around it, but the newest technologies, XML and XSL, give you data and presentation in two easily manageable documents.
XSL, or Extensible Stylesheet Language, is often compared to CSS (Cascading Style Sheets) as a way of applying specific formats to XML tags. However, this comparison is actually a little misleading. CSS reads each XML element as it is scanned in the document and applies styles in that order. In other words, CSS doesn't change the structure of the XML; it only changes the visual appearance of each node. If you put your name at the bottom of the XML document, CSS will place your name at the bottom of the document unless you explicitly position it elsewhere with position:absolute. Furthermore, CSS will treat each tag of a given type in exactly the same manner; there's no mechanism for doing things like placing a rule above the first paragraph in a set of paragraphs without explicitly giving that paragraph a different class.
XSL, on the other hand, is a transformational language. It can take an XML document (or a rigorously valid HTML document) and convert it to another XML document, an HTML document, a printable HTML document, a standard ASCII text file, a proprietary text format, or conceivably even a binary representation. Given that a significant proportion of all computer programs out there exist for the sole purpose of transforming one set of data into a different set of data, the potential for XSL is in some respects even broader than the already burgeoning interest in XML.
From the Web developer's standpoint, you can achieve the greatest flexibility using a combination of all three technologies: XML contains the data; CSS, in the form of either external style sheets or inline style attributes, handles the presentation; and XSL modifies the structure of the document. By separating out the pieces in this fashion, you get the added benefit of being able to modify the data, specify alternative presentation layers, and control which content gets delivered where, each independently of the others.
Structure Your Data
A particular pet peeve of mine is that so many XML samples are so shallow as to provide no real context about how useful they are for handling "rich" data. A resumé is not a trivial document and is a perfect example of where XML and XSL can come in handy. Typically, it encapsulates information from a number of different "objects": addresses, employers, schools, skills, and so forth. The trick to constructing a good XML file that describes this information is to work from the general to the specific. A <resume> object would contain an <address>, which would in turn contain a <street>, a <city>, and so forth. Likewise, the resumé would also contain <group>s of <skill>s, <employer>s, <publication>s, and so forth, each of which may in turn contain additional information. So a basic resumé structure might look something like this:
Kurt Alan Cagle
209 Hamilton Avenue
Palo Alto
California
94301
USA
(555) 555-5555
(555) 555-5555
(555) 555-5555
I went to school.
University of Illinois
Champaign
IL
Physics
Bachelor of Science
Minors in Mathematics and Astronomy
These are skill areas, with descriptions.
Most programmers pick up numerous languages over the course of their career,
depending upon the needs involved, and I'm no different in that regard.
I would describe myself as an advanced interpreted language developer,
in that I have specialized in interpreted or scripting languages
over the years rather than compiled languages such as C++.
My first experience with Visual Basic was when it was a "toy" language with version 1.0.
I've worked with most versions of the language since then,
typically while they were in beta development,
and have written a book on Visual Basic Internet Database development
for Coriolis and articles for the Visual Basic Programmer's Journal
and Web Builder Magazine.
I've worked with Director nearly as long as I've worked with VB, have written two books
on programming in Lingo, Director's scripting language, and was Contributing Editor and
Technical (Managing) editor of the Macromedia Users Journal.
Employers and contract positions I have held.
President and Chief Bottle Washer
Olympia
WA
This example falls into the category of real-world XML. The excerpt is a fairly small subsection of the whole resumé, although most of the major elements (that is, the tag names) are here. This raises an interesting point about XML data structures: while some are fairly simple (two or three levels deep with a handful of tags), XML's power comes from its ability to create hierarchical data structures of some complexity, such as the many elements that make up a typical resumé. This data is useful, but handing an XML structure off to a prospective employer in its present form is probably not going to get you the job. This is where the extensible nature of XML comes into play.
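To make that hierarchy concrete, here is a minimal skeleton of the nesting, working from the general to the specific. The element names come from the discussion above; the title attribute on <group> and the way employers are grouped are assumptions for illustration, and the comments stand in for content that is left out.

```xml
<?xml version="1.0"?>
<resume>
  <name>Kurt Alan Cagle</name>
  <address>
    <!-- street, city, state, zipcode, country, and phone elements -->
  </address>
  <!-- "title" is a hypothetical attribute name used only in this sketch -->
  <group title="Skills">
    <skill><!-- description of one skill area --></skill>
  </group>
  <group title="Employers">
    <employer><!-- position, city, and state --></employer>
  </group>
</resume>
```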
CSS Styling of Your XML Documents
If, by some chance, you know that you'll be able to specify to your prospective employer that they should look at your resumé in Internet Explorer 5.0 or in the beta Netscape Mozilla 5.0 browser, you can make use of one of the coolest new features of this generation: CSS styling of XML documents. To do this effectively, you need to create a separate CSS style sheet that can be referenced by the XML document. Such a style sheet should look familiar if you've created style sheets for HTML. The one difference with an XML file is that every element in the XML file will need to have some basic definition associated with it, as a tag in an XML document has nothing telling the browser how to display it.
If you're familiar with CSS, most of the properties should likewise be comfortable to you, but in addition to such stalwart styles as color, font-weight, and font-family, you will also need to specify a display property for each tag. This property determines how the element flows in the page. A display value of block indicates that the element should be contained within its own bounding rectangle. A <P> tag is a good example of a block element, as is the staple of DHTML programming, the <DIV> tag.
Other HTML elements are contained as part of the flow, such as the <B> or <I> tags, or the DHTML <SPAN>. These elements are described with the inline value for display. Inline elements don't support some capabilities (you can't put a border around an inline object, for instance), but for the most part inline and block elements support the same set of CSS attributes.
Finally, in order to take an item out of the flow entirely, you'd use display:none. Setting display to none removes the element from the rendering stream entirely: it doesn't appear, the space that it would occupy if rendered is reclaimed, and several critical events are not fired on the element. With XML, if a style is not supplied in the style sheet for a given tag, then that tag is rendered as none; in other words, it is not rendered at all. This guarantees that you don't have to handle those tags in an XML document that you're not interested in, and provides you with a certain (limited) level of ordering control. A basic CSS type style sheet for the resumé might have this structure:
File Resume.css
name {display:block;font-size:24pt;font-family:Arial,sans-serif;}
street {display:block;font-size:11pt;font-family:Times,serif;}
city {display:inline;font-size:11pt;font-family:Times,serif;}
state {display:inline;font-size:11pt;font-family:Times,serif;}
zipcode {display:inline;font-size:11pt;font-family:Times,serif;clear:both;}
...
In order to use this style sheet, the XML document also needs to reference it in a processing instruction (more widely known as a PI). This PI should appear after the XML version declaration of the XML file (resume.xml) as follows:
...
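As a minimal sketch, the top of resume.xml might read as follows, assuming the style sheet is saved as Resume.css in the same directory as the XML file (the relative path is an assumption):

```xml
<?xml version="1.0"?>
<!-- Attach the CSS style sheet; href is relative to this document -->
<?xml-stylesheet type="text/css" href="Resume.css"?>
<resume>
  <!-- resume content goes here -->
</resume>
```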
Note that a processing instruction uses the notation <? and ?> to designate that it is not to be treated as a standard XML element. The name shown here, xml-stylesheet, represents one area where Microsoft's attempt at being out of the gate early with XML technology may have backfired. The World Wide Web Consortium (W3C), the standards organization that ratifies Web nomenclature, reserved the colon for indicating a namespace, and the style sheet PI was accordingly settled with a hyphen, as <?xml-stylesheet ...?>. That makes the earlier colon spelling, <?xml:stylesheet ...?>, the obsolete one. For compatibility purposes, it's likely that the older notation will continue to be accepted by some browsers for a while, but be aware that it is deprecated.
Unfortunately, if you look at the previous XML structure, you'll notice that the CSS model fails pretty quickly here. CSS2 has support for generating content both before and after a given element, but Microsoft does not currently support that particular facet of CSS. As a result, when this document does get rendered in your browser, the skill nodes will be unable to render properly, since critical pieces of information about the skill are given as attributes rather than text in the node.
You could argue that this problem is a flaw in the design of the XML document (and you'd be pretty close to the mark) but the flaw actually runs a little deeper. CSS works well in dealing with XML that is presented in an irregular manner (such as is typical of most Web pages) because such behavior emulates how Cascading Style Sheets deal with normal HTML. Regular data, on the other hand, presents a few problems: CSS can't filter, can't re-order data, can't add text or subordinate HTML structures. To a certain extent this problem can be ameliorated through the use of DHTML behaviors, but such behaviors are both proprietary solutions at this stage (Microsoft HTC behaviors have been submitted for consideration to the W3C, but they're a long way from being ratified) and expensive in terms of memory if you need a lot of them (as you would for a resumé).
Output Your XML
Extensible Stylesheet Language (XSL) gives you a complementary solution to the problem of formatting XML. Unlike CSS, which applies stylistic information to each XML node as that node is encountered in the stream, XSL effectively replaces one stream of information with another. Note the generic quality of this statement: XSL can transform XML into a different arrangement of XML, HTML, XSL, text, or conceivably even into SQL. Unlike other transformational languages, XSL has the benefit of being written in XML itself, which means that the same parser that can handle manipulating XML data can also reference, retrieve, and manipulate the XSL.
XSL itself consists of a series of templates that can be used to match some aspect of an XML document (usually, but not always, one or more nodes in the document). These templates apply patterns to the input XML stream that transform it to an output stream, which in this case will contain HTML code. Since XSL contains a number of tools for making conditional comparisons, sorting, and performing group operations, the output no longer needs to be tied to the order in which the elements appear in the original document. For example, consider the XSL code that handles creation of a "name" header in the output:
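A minimal sketch of such a name template, written against the IE5 dialect of XSL, might read as follows. The layout and the comment are assumptions; the match on name, the <h1> tags, and the bare <xsl:value-of /> are the pieces the article describes.

```xml
<!-- Called once for each <name> element found in the source XML -->
<xsl:template match="name">
  <h1><xsl:value-of /></h1>
</xsl:template>
```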
This simple XSL template will get called anytime the XSL processor finds a <name> tag in the XML (which in this case will only be once). When a match occurs, the XSL parser will take the text of the tag (the value-of part of the <xsl:value-of /> tag) and place it in between two <h1> tags in the output stream. In other words, for the tag <name>Kurt Cagle</name>, the output will be <h1>Kurt Cagle</h1>, or a first level header tag with the person's name. Of course, you can do the same thing with CSS, without the rather arcane conventions that XSL brings to the table. A more illustrative example is the address block, which contains more complex formatting needs. The XML block itself looks like this:
209 Hamilton Avenue
Palo Alto
California
94301
USA
(555) 555-5555
(555) 555-5555
(555) 555-5555
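Marked up, the address block might look like the following sketch. The element and attribute names (stateCode, locale, and the name attribute on phone) are the ones used elsewhere in this article; the three phone labels and the locale value are placeholders chosen for illustration.

```xml
<address locale="USA">
  <street>209 Hamilton Avenue</street>
  <city>Palo Alto</city>
  <!-- The postal abbreviation travels as an attribute of the state -->
  <state stateCode="CA">California</state>
  <zipcode>94301</zipcode>
  <country>USA</country>
  <!-- The name labels below are placeholders, not values from the original -->
  <phone name="Home">(555) 555-5555</phone>
  <phone name="Work">(555) 555-5555</phone>
  <phone name="Fax">(555) 555-5555</phone>
</address>
```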
This XML block provides a good sample test for XSL, since the output has several requirements. Firstly, city, state and zip code all need to be on the same line, formatted correctly. Also, labels need to appear for the three phone numbers and the state code needs to appear here, rather than the state name. Finally, if the resumé is accessed from outside the U.S., the USA tag will need to appear, but otherwise it won't. None of these capabilities are possible within CSS1, although limited labels are possible with CSS2. However, pulling off all of these requirements in XSL is fairly easy:
,
:
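A minimal sketch of an address template that meets these requirements is shown here. It assumes the IE5 dialect of XSL, including the test attribute on xsl:if and the != comparison. The condition is written so that the country is emitted only when it differs from the address's locale, which is the requirement stated above. Exact spacing between the pieces of the city line depends on how the processor treats white space.

```xml
<xsl:template match="address">
  <xsl:value-of select="street"/><br/>
  <!-- City, state code, and zip code share one line -->
  <xsl:value-of select="city"/>,
  <xsl:value-of select="state/@stateCode"/>
  <xsl:value-of select="zipcode"/><br/>
  <!-- Emit the country only when it differs from the locale -->
  <xsl:if test=".[@locale != country]">
    <xsl:value-of select="country"/><br/>
  </xsl:if>
  <!-- Hand each phone number to the phone template -->
  <xsl:apply-templates select="phone"/>
</xsl:template>
```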
The syntax can be a little overwhelming at first, admittedly. The xsl: prefix that you see at the beginning of many of the tags indicates that the tag is part of the XSL namespace. Given the proper declaration in the header, tags that start with xsl: have predefined functionality that indicates to the browser how they should be interpreted. In order to enable this functionality, you need to make sure that the whole XSL document is enclosed in an <xsl:stylesheet> element, which opens at the beginning of the document:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
The xmlns:xsl attribute signals to the XSL parser that this document is an XSL style sheet by tying the xsl: prefix to the given URL. The URL acts as a unique name for the namespace rather than as the address of a file and, at least in the case of IE5, is never actually retrieved. The use of namespaces merits its own article, and I won't get into much more detail about them here.
The street and city insertions are pretty straightforward, but state requires some more explanation. Here you require the postal abbreviation for the state in question, not the name of the state itself. However, your XML document may be one of many resumés, and having the state name could be useful for other applications. So in this case, the state postal code is given as an attribute called stateCode. XSL uses the '@' character to indicate that a given expression refers to an attribute rather than an element (a tag name).
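As a sketch of how the pieces fit together, a complete style sheet skeleton might look like this. The root template and the resume template are assumptions about how the individual templates would be tied together; only the namespace declaration is taken directly from the line above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

  <!-- Entry point: match the document root and emit the HTML shell -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="resume"/>
      </body>
    </html>
  </xsl:template>

  <!-- Hand each piece of the resume to its own template -->
  <xsl:template match="resume">
    <xsl:apply-templates select="name"/>
    <xsl:apply-templates select="address"/>
  </xsl:template>

  <!-- The name, address, and phone templates go here -->

</xsl:stylesheet>
```

To attach a style sheet like this in IE5, the XML document would reference it with a processing instruction of type text/xsl rather than text/css.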
All XSL tags work within a given context. A context is typically the current node making the request (in this case, the node matched in the xsl:template), and can be thought of as the node from which all other queries are made. For the address node, the expression <xsl:value-of select="state/@stateCode"/> means "look for the state node, which is a child of the current address node, then retrieve the attribute stateCode from that state node." Effectively, this code will return the string "CA", which can then be inserted into the stream. Of all the code contained in the template, the only line of real complexity is:
<xsl:if test=".[@locale != ./country]"><xsl:value-of select="country"/></xsl:if><br/>
In this case, the template checks to see if the address's locale attribute differs from the country name. If the two are the same, then the country name shouldn't be placed in the stream. The locale normally starts off matching the country, but it is possible from the code to change this value in the XML (in a process that's outside the scope of this article).
The XSL makes use of two new concepts here: the use of the dot "." to indicate the current context explicitly, and the use of a filter. Filters are part of what gives XSL some of its horsepower. You can use data from some other point in the XML structure to determine which set of nodes to process. In this case, the filter says that when the locale attribute of the address node ("[@locale") differs from the text in the address's country node ("!= ./country]"), the if condition is satisfied and everything within the xsl:if subtree gets evaluated.
The final line in the address template, <xsl:apply-templates select="phone" />, seems pretty innocuous, but actually lies at the heart of XSL. The apply-templates command instructs the parser to select all elements in the XML document that match the given condition, then apply appropriate templates to them. In this case, the match is simple: find all child nodes (remember that this applies to the current context, address) that have the tag name of "phone", then apply the phone template if it exists. As it so happens, I do have a phone template match in the XSL document with this syntax:
:
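A minimal sketch of such a phone template follows, assuming the IE5 dialect of XSL. The bold label built from the name attribute and the bare value-of come from the description in this article; the trailing <br/> is an assumption added so that each number lands on its own line.

```xml
<xsl:template match="phone">
  <!-- Label: the phone's name attribute, in bold, followed by a colon -->
  <b><xsl:value-of select="@name"/>:</b>
  <!-- Text of the current phone node -->
  <xsl:value-of /><br/>
</xsl:template>
```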
This code sets the current node (temporarily) to the phone node. The first value-of selects the attribute "name" and puts the attribute's value into the stream as a bold label. The second value-of returns the text of the phone node. This second expression is actually something of a shorthand for the complete expression <xsl:value-of select="./value()"/>: take the current context, then retrieve the text (the value() here) from the node's contents.
Why XSL Isn't Commonplace
To try to recap all of XSL in a single article would be pretty difficult, especially since much of it isn't immediately obvious. Part of this complexity comes from the basic nature of XSL: it is not related syntactically to such languages as Visual Basic, C++, or Java, but instead exists primarily for the act of transformation. As such, many developers who have looked at XSL work with it for about two weeks without much success, until suddenly the perspective shifts and it becomes extremely understandable, to the extent that you wonder why XSL isn't commonplace.
You can see these principles and many more at work in my sample resumé (yes, it is mine, and is way too verbose, but it illustrates the programming nicely). I've prepared three different versions of the XML. One has an "application" style sheet applied to it, showing it as an interactive piece with highlights and rollovers in place (note, this one requires Internet Explorer 5.0, although I'm working on a similar piece for non-IE5 browsers). The second shows the same exact data, but this time it's formatted for output to a printer. Then I include the XML source for this same data.
Take a look at the sample resumé included with this article, as well as the sample XSL file that transforms the data. The code extends well beyond the focus of the article, demonstrating how XML and XSL can be combined with JavaScript to create complex applications. You may also want to take a look at several sites devoted to XML, including:
Finally, keep at it. XML alone is an interesting data format, but in combination with XSL and XQL (XML Query Language), it offers a mechanism to change the way that we all work with data. By building a bridge of commonality, in expression if not necessarily in agreement on standards, the X*L family makes it possible to effectively communicate both data and structure regardless of platform, point of origin, or means of transmission, not to mention that it's pretty cool technology.
Kurt Cagle is a writer and programmer living in Olympia, Washington, where he gets a wonderful view of the mountains and the Puget Sound in one swell foop. He is working on a book on Internet Explorer 5.0 and XML programming for 29th Street Press, and discovering first hand the joys of working with XML. You can reach him at cagle@olywa.net.