Ask the DHTML Pro 10-Minute Solutions

Transform Your Data With XSL
By Kurt Cagle

Here's a real world example of how you can use XSL to convert an XML-based resumé into various data formats to suit your needs.

Developers are easily some of the most restless workers in the country, hopping from one job to the next on average once every 18 months. One downside to job hunting is that you can spend a significant chunk of time updating and revising your resumé. This can present a dilemma if you job search a lot or if you are a consultant with multiple clients. You want your resumé to look flashy, to stand out from the crowd—accessible online if at all possible. A prospective employer wants to get a sample of what you can do, but that client's same HR department wants something as manageable as possible—a simple formatted page, or better yet, a page in straight text. To make this process even more of a nightmare, you want to be able to update your resumé with as little hassle as possible, and perhaps even maintain three or four different resumés that you can submit in different circumstances (I usually have one for my programming clients, a second to show off my writing skills, a third to showcase my graphics abilities, and then a general purpose one when I just want to give an overview of what I can do).

This situation is one for which Extensible Markup Language (XML) was practically tailor made. A resumé is a prime example of a case where you want to be able to separate your data (jobs, skills, publications, and so forth) from the layout of those skills—the classic dichotomy between the data layer and the presentation layer. HTML is not much help here because changing the data also entails modifying the markup around it, but the newest technologies, XML and XSL, give you data and presentation in two easily manageable documents.

XSL, or Extensible Stylesheet Language, is often compared to CSS (Cascading Style Sheets) as a way of applying specific formats to XML tags. However, this comparison is a little misleading. CSS reads each XML element as it is scanned in the document and applies styles in that order. In other words, CSS doesn't change the structure of the XML; it only changes the visual appearance of each node. If you put your name at the bottom of the XML document, CSS will place your name at the bottom of the document unless you explicitly position it elsewhere with position:absolute. Furthermore, CSS will treat each tag of a given type in exactly the same manner: there's no mechanism for doing things like placing a rule above the first paragraph in a set of paragraphs without explicitly giving that paragraph its own class.

XSL, on the other hand, is a transformational language. It can take an XML document (or a rigorously valid HTML document) and convert it to another XML document, an HTML document, a printable HTML document, a standard ASCII text file, a proprietary text format, or conceivably even a binary representation. Given that a significant proportion of all computer programs out there exist for the sole purpose of transforming one set of data into a different set of data, the potential for XSL is in some respects even broader than the already burgeoning interest in XML.

From the Web developer's standpoint, you can achieve the greatest flexibility using a combination of all three technologies: XML contains the data; CSS, either in the form of external style sheets or inline style attributes, handles the presentation; and XSL modifies the structure of the document. By separating out the pieces in this fashion, you get the added benefit of being able to modify the data, specify alternative presentation layers, and control which content gets delivered where, all independently of one another.

Structure Your Data

A particular pet peeve of mine is that so many XML samples are so shallow as to provide no real context about how useful they are for handling "rich" data. A resumé is not a trivial document and is a perfect example of where XML and XSL can come in handy. Typically, it encapsulates information from a number of different "objects": addresses, employers, schools, skills, and so forth. The trick to constructing a good XML file that describes this information is to work from the general to the specific. A <resume> object would contain an <address>, which would in turn contain a <street>, a <city>, and so forth. Likewise, the resumé would also contain <group>s of <skill>s, <employer>s, <publication>s, and so forth, each of which may in turn contain additional information. So a basic resumé structure might look something like this:

<resume>
  <name>Kurt Alan Cagle</name>
  <category name="Curriculum Vitae">
    <group name="Contact Information">
      <address name="Home Address" id="homeAddress">
        <street name="Street">209 Hamilton Avenue</street>
        <city name="City">Palo Alto</city>
        <state name="State" stateCode="CA">California</state>
        <zipcode name="Zip Code">94301</zipcode>
        <country name="Country">USA</country>
        <phone name="Home Phone" id="homePhone">(555) 555-5555</phone>
        <phone name="Cell Phone" id="cellPhone">(555) 555-5555</phone>
        <phone name="Fax" id="fax">(555) 555-5555</phone>
      </address>
    </group>
    <group name="Education">
      <description>I went to school.</description>
      <institution name="College" graduated="yes">
        <school>University of Illinois</school>
        <location>
          <city>Champaign</city>
          <state>IL</state>
        </location>
        <major>Physics</major>
        <dateStart month="September" year="1981"/>
        <dateEnd month="June" year="1985"/>
        <degree>Bachelor of Science</degree>
        <comments>Minors in Mathematics and Astronomy</comments>
      </institution>
    </group>
    <group name="Skills">
      <description>These are skill areas, with descriptions.</description>
      <group name="Programming Languages">
        <description>
          Most programmers pick up numerous languages over the course of
          their career, depending upon the needs involved, and I'm no
          different in that regard. I would describe myself as an advanced
          interpreted language developer, in that I have specialized in
          interpreted or scripting languages over the years rather than
          compiled languages such as C++.
        </description>
        <skill name="Visual Basic" years="7">
          My first experience with Visual Basic was when it was a "toy"
          language with version 1.0. I've worked with most versions of the
          language since then, typically while they were in beta
          development, and have written a book on Visual Basic Internet
          Database development for Coriolis and articles for the Visual
          Basic Programmer's Journal and Web Builder Magazine.
        </skill>
        <skill name="Macromedia Shockwave" years="7">
          I've worked with Director nearly as long as I've worked with VB,
          have written two books on programming in Lingo, Director's
          scripting language, and was Contributing Editor and Technical
          (Managing) editor of the Macromedia Users Journal.
        </skill>
      </group>
    </group>
    <group name="Employers">
      <description>Employers and contract positions I have held.</description>
      <employer name="Cagle Communications">
        <title>President and Chief Bottle Washer</title>
        <location>
          <city>Olympia</city>
          <state>WA</state>
        </location>
      </employer>
    </group>
    <!-- remaining sections of the resumé omitted from this excerpt -->
  </category>
</resume>

This example falls into the category of real world XML. The excerpt is a fairly small subsection of the whole resumé, although most of the major elements (for example, tag names) are here. This brings up an interesting point about XML data structures: while some are fairly simple (two or three levels deep with a handful of tags), XML's power comes from its ability to create hierarchical data structures of some complexity, such as the many elements that make up a typical resumé. This data is useful, but handing an XML structure off to a prospective employer in its present form is probably not going to get you the job. This is where the extensible nature of XML comes into play.
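To get a feel for the general-to-specific nesting, here is a minimal sketch in Python (not part of the original samples) that walks a trimmed-down copy of the resumé and prints an outline of every element carrying a name attribute. The abbreviated XML and the outline() helper are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed-down, illustrative copy of the resume structure shown above.
RESUME_XML = """
<resume>
  <name>Kurt Alan Cagle</name>
  <category name="Curriculum Vitae">
    <group name="Skills">
      <group name="Programming Languages">
        <skill name="Visual Basic" years="7">Worked with most versions.</skill>
        <skill name="Macromedia Shockwave" years="7">Director and Lingo.</skill>
      </group>
    </group>
    <group name="Employers">
      <employer name="Cagle Communications">
        <title>President and Chief Bottle Washer</title>
      </employer>
    </group>
  </category>
</resume>
"""


def outline(node, depth=0):
    """Return one indented line per element that has a name attribute."""
    lines = []
    label = node.get("name")
    if label is not None:
        lines.append("  " * depth + f"{node.tag}: {label}")
        depth += 1  # children of a labeled node are indented one level deeper
    for child in node:
        lines.extend(outline(child, depth))
    return lines


root = ET.fromstring(RESUME_XML)
resume_outline = outline(root)
print("\n".join(resume_outline))
```

The outline mirrors the hierarchy directly: categories contain groups, groups contain further groups or leaf items such as skills and employers.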

CSS Styling of Your XML Documents

If, by some chance, you know that you'll be able to specify to your prospective employer that they should look at your resumé in Internet Explorer 5.0 or in the beta Netscape Mozilla 5.0 browser, you can make use of one of the coolest new features of this generation: CSS styling of XML documents. To do this effectively, you need to create a separate CSS style sheet that can be referenced by the XML document. Such a style sheet should look familiar if you've created style sheets for HTML. The one difference with an XML file is that every element in the XML file will need to have some basic definition associated with it, as a tag in an XML document has nothing telling the browser how to display it.

If you're familiar with CSS, most of the properties should likewise be comfortable to you, but in addition to such stalwart styles as color, font-weight, and font-family, you will also need to indicate for the tag its display attribute. This attribute determines how the element flows in the page. A display value of block indicates that the element should be contained within its own bounding rectangle. A <P> tag is a good example of a block element, as is the staple of DHTML programming, the <DIV> tag.

Other HTML elements are contained as part of the flow, such as the <B> or <I> tags, or the DHTML <SPAN>. These elements are described with the inline value for display. Inline elements don't support some capabilities—you can't put a border around an inline object, for instance—but for the most part inline and block elements support the same set of CSS attributes.

Finally, in order to take an item out of the flow entirely, you'd use display:none. Setting display to none removes the element from the rendering stream entirely: it doesn't appear, the space that it would occupy if rendered is reclaimed, and several critical events are not fired on the element. Note that CSS does not hide unstyled tags for you. The initial value of display is inline, so a tag with no rule in the style sheet still has its text run into the flow. To leave out those tags in an XML document that you're not interested in, give them display:none explicitly, which also provides you with a certain (limited) level of control over what appears. A basic CSS type style sheet for the resumé might have this structure:

File Resume.css
name    {display:block;  font-size:24pt; font-family:Arial,sans-serif;}
street  {display:block;  font-size:11pt; font-family:Times,serif;}
city    {display:inline; font-size:11pt; font-family:Times,serif;}
state   {display:inline; font-size:11pt; font-family:Times,serif;}
zipcode {display:inline; font-size:11pt; font-family:Times,serif; clear:both;}
...

In order to use this style sheet, the XML document also needs to declare it in a processing instruction (more widely known as a PI). This PI should appear after the XML version declaration of the XML file (resume.xml) as follows:

<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="resume.css" ?>
<resume>
  ...
</resume>

Note that a processing instruction uses the notation <? and ?> to designate that it is not to be treated as a standard XML element. The name shown here, xml-stylesheet, has a bit of history behind it. Early drafts, and early browser betas that were out the gate ahead of the standards, spelled the PI with a colon, as <?xml:stylesheet ..?>. When the World Wide Web Consortium (W3C), the standards organization that ratifies Web nomenclature, reserved the colon for indicating a namespace, the PI name shifted to the hyphenated form, which is the one the W3C now recommends. For compatibility purposes, it's likely that the older colon notation will continue to be accepted by some browsers for a while, but be aware that it is the obsolete one.
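If you want to confirm which style sheet a document declares, any XML parser that exposes processing instructions will do. The following is a small sketch using Python's standard xml.dom.minidom module (my choice for illustration, not something the browser itself uses); the one-line sample document is an assumption modeled on the declaration above.

```python
from xml.dom import Node, minidom

# Illustrative document: the XML declaration, the style sheet PI, then the root.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="resume.css"?>'
    "<resume><name>Kurt Alan Cagle</name></resume>"
)

dom = minidom.parseString(DOC)

# PIs that appear before the root element are children of the document node.
pis = [n for n in dom.childNodes if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE]
stylesheet_pi = pis[0]

print(stylesheet_pi.target)  # the PI name, i.e. the part right after "<?"
print(stylesheet_pi.data)    # everything else, as one unparsed string
```

Notice that the PI's type and href are not real attributes: the parser hands them back as a single string, and it is up to the application to interpret them.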

Unfortunately, if you look at the previous XML structure, you'll notice that the CSS model fails pretty quickly here. CSS2 has support for generating content both before and after a given element (the :before and :after pseudo-elements, whose content can include attribute values), but Microsoft does not currently support that particular facet of CSS. As a result, when this document does get rendered in your browser, the skill nodes will be unable to render properly, since critical pieces of information about the skill are given as attributes rather than text in the node.

You could argue that this problem is a flaw in the design of the XML document (and you'd be pretty close to the mark) but the flaw actually runs a little deeper. CSS works well in dealing with XML that is presented in an irregular manner (such as is typical of most Web pages) because such behavior emulates how Cascading Style Sheets deal with normal HTML. Regular data, on the other hand, presents a few problems: CSS can't filter, can't re-order data, can't add text or subordinate HTML structures. To a certain extent this problem can be ameliorated through the use of DHTML behaviors, but such behaviors are both proprietary solutions at this stage (Microsoft HTC behaviors have been submitted for consideration to the W3C, but they're a long way from being ratified) and expensive in terms of memory if you need a lot of them (as you would for a resumé).
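To make the limitation concrete, here is a sketch in Python of the kind of restructuring a style sheet alone cannot express: pulling values out of attributes, filtering nodes, and emitting them in a different order. The skill entries and their years values are hypothetical sample data, and the five-year cutoff is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical skills group; names and years are sample data only.
SKILLS_XML = """
<group name="Programming Languages">
  <skill name="JavaScript" years="3"/>
  <skill name="Visual Basic" years="7"/>
  <skill name="Lingo" years="5"/>
</group>
"""

group = ET.fromstring(SKILLS_XML)

# Filter: keep only skills with at least five years of experience.
experienced = [s for s in group.findall("skill") if int(s.get("years")) >= 5]

# Re-order: most experience first, regardless of document order.
experienced.sort(key=lambda s: int(s.get("years")), reverse=True)

# Add text and subordinate HTML built from attribute values.
items = [f"<li>{s.get('name')} ({s.get('years')} years)</li>" for s in experienced]
skills_html = "<ul>" + "".join(items) + "</ul>"
print(skills_html)
```

Each of the three steps (filter, re-order, generate markup from attributes) corresponds to something CSS was described above as unable to do.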

Output Your XML

Extensible Stylesheet Language (XSL) gives you a complementary solution to the problem of formatting XML. Unlike CSS, which applies stylistic information to each XML node as that node is encountered in the stream, XSL effectively replaces one stream of information with another. Note the generic quality of this statement—XSL can transform XML into a different arrangement of XML, HTML, XSL, text, or conceivably even into SQL. Unlike other transformational languages, XSL has the benefit of being written in XML itself, which means that the same parser that can handle manipulating XML data can also reference, retrieve, and manipulate the XSL.

XSL itself consists of a series of templates that can be used to match some aspect of an XML document—usually, but not always, one or more nodes in the document. These templates apply patterns to the input XML stream that transform it to an output stream, which in this case will contain HTML code. Since XSL contains a number of tools for making conditional comparisons, sorting, and performing group operations, the output no longer needs to be tied to the order in which the elements appear in the original document. For example, consider the XSL code that handles creation of a "name" header in the output:

<xsl:template match="name">
  <h1><xsl:value-of/></h1>
</xsl:template>

This simple XSL template will get called anytime the XSL processor finds a <name> tag in the XML (which in this case will only be once). When a match occurs, the processor will take the text of the tag (the value-of part of <xsl:value-of/>) and place it between two <h1> tags in the output stream. In other words, for the tag <name>Kurt Cagle</name>, the output will be <h1>Kurt Cagle</h1>, or a first level header tag with the person's name. Of course, you can do the same thing with CSS, without the rather arcane conventions that XSL brings to the table. A more illustrative example is the address block, which has more complex formatting needs. The XML block itself looks like this:
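If the template idea is new to you, it may help to see the same match-and-emit step spelled out procedurally. This is only a rough Python analogy, not how an XSL processor is implemented; the TEMPLATES table and the transform() function are names invented for the sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def name_template(node):
    # Roughly what <h1><xsl:value-of/></h1> does: wrap the node's text in <h1>.
    return f"<h1>{escape(node.text or '')}</h1>"


# Maps a tag name (the "match" pattern) to the function that renders it.
TEMPLATES = {"name": name_template}


def transform(root):
    """Visit every element and emit output for those that have a template."""
    out = []
    for node in root.iter():
        template = TEMPLATES.get(node.tag)
        if template is not None:
            out.append(template(node))
    return "".join(out)


root = ET.fromstring("<resume><name>Kurt Cagle</name></resume>")
html = transform(root)
print(html)
```

The point of the analogy is the lookup: the processor, not your code, decides when a template fires, based on what it finds in the input.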

<address name="Home Address" id="homeAddress" locale="USA">
  <street name="Street">209 Hamilton Avenue</street>
  <city name="City">Palo Alto</city>
  <state name="State" stateCode="CA">California</state>
  <zipcode name="Zip Code">94301</zipcode>
  <country name="Country">USA</country>
  <phone name="Home Phone" id="homePhone">(555) 555-5555</phone>
  <phone name="Cell Phone" id="cellPhone">(555) 555-5555</phone>
  <phone name="Fax" id="fax">(555) 555-5555</phone>
</address>

This XML block provides a good sample test for XSL, since the output has several requirements. Firstly, city, state and zip code all need to be on the same line, formatted correctly. Also, labels need to appear for the three phone numbers and the state code needs to appear here, rather than the state name. Finally, if the resumé is accessed from outside the U.S., the USA tag will need to appear, but otherwise it won't. None of these capabilities are possible within CSS1, although limited labels are possible with CSS2. However, pulling off all of these requirements in XSL is fairly easy:

<xsl:template match="address">
  <xsl:value-of select="street"/><br/>
  <xsl:value-of select="city"/>, <xsl:value-of select="state/@stateCode"/> <xsl:value-of select="zipcode"/>
  <xsl:if match=".[@locale != ./country]">
    <xsl:value-of select="country"/>
  </xsl:if><br/>
  <xsl:apply-templates select="phone"/>
</xsl:template>

<xsl:template match="phone">
  <b><xsl:value-of select="@name"/>:</b> <xsl:value-of/><br/>
</xsl:template>

The syntax can be a little overwhelming at first, admittedly. The xsl: prefix that you see at the beginning of many of the tags indicates that the tag is part of the XSL namespace. Along with the proper declaration in the header, tags that start with xsl: have predefined functionality that indicates to the browser how they should be interpreted. In order to enable this functionality, you need to make sure that the XSL document is enclosed in an <xsl:stylesheet> element, which opens at the beginning of the document:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

The xmlns:xsl attribute signals to the XSL parser that this document is an XSL style sheet, and that specific information concerning this style sheet is located at the given URL. The URL is typically a DTD of some sort and, at least in the case of IE5, is never explicitly referenced (the download is made only if the resources involved aren't available locally). The use of namespaces merits its own article, and I won't get into much more detail about them here.

The street and city insertions are pretty straightforward, but state requires some more explanation. Here you require the postal abbreviation for the state in question, not the name of the state itself. However, your XML document may be one of many resumés, and having the state name could be useful for other applications. So in this case, the state postal code is given as an attribute called stateCode. XSL uses the '@' character to indicate that a given expression refers to an attribute rather than an element (a tag name).

All XSL tags work within a given context. A context is typically the current node making the request (in this case, the node matched in the xsl:template), and can be thought of as the node from which all other queries are made. For the address node, the expression <xsl:value-of select="state/@stateCode"/> means "look for the state node, which is a child of the current address node, then retrieve the attribute stateCode from that state node." Effectively, this code will return the string "CA", which can then be inserted into the stream. Of all the code contained in the template, the only line of real complexity is:

<xsl:if match=".[@locale != ./country]"><xsl:value-of select="country"/></xsl:if><br/>

In this case, the template checks whether the address's locale attribute differs from the country name. If the two are the same, the country name isn't placed in the stream. The locale normally starts off matching the country, but it is possible from code to change this value in the XML (in a process that's outside the scope of this article).

The XSL makes use of two new concepts here: the dot "." to indicate the current context explicitly, and a filter. Filters are part of what gives XSL some of its horsepower. You can use data from some other point in the XML structure to determine which set of nodes to process. In this case, the filter says that when the locale attribute of the address node (the "[@locale" part) differs from the text in the address's country node (the "./country]" part), the if condition is satisfied and everything within the xsl:if subtree gets evaluated.
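For readers more at home in a procedural language, the address logic can be sketched in Python as follows. This is an analogy rather than a translation of the XSL engine; format_address() is a name made up for the example, and it assumes the requirement stated earlier that the country appears only when the address's locale differs from its country.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """
<address name="Home Address" id="homeAddress" locale="USA">
  <street name="Street">209 Hamilton Avenue</street>
  <city name="City">Palo Alto</city>
  <state name="State" stateCode="CA">California</state>
  <zipcode name="Zip Code">94301</zipcode>
  <country name="Country">USA</country>
</address>
"""


def format_address(address):
    """Render an <address> element; `address` plays the role of the context."""
    street = address.findtext("street")
    city = address.findtext("city")
    # Equivalent of state/@stateCode: find the child, then read its attribute.
    state_code = address.find("state").get("stateCode")
    zipcode = address.findtext("zipcode")
    country = address.findtext("country")

    line = f"{city}, {state_code} {zipcode}"
    # Equivalent of the filter: emit the country only when locale differs.
    if address.get("locale") != country:
        line += f" {country}"
    return f"{street}<br/>{line}<br/>"


home = ET.fromstring(ADDRESS_XML)
domestic_html = format_address(home)

home.set("locale", "Canada")  # simulate a reader outside the U.S.
abroad_html = format_address(home)

print(domestic_html)
print(abroad_html)
```

Every lookup is made relative to the address element passed in, which is exactly the role the context node plays in the template.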

The final line in the address template, <xsl:apply-templates select="phone"/>, seems pretty innocuous, but actually lies at the heart of XSL. The apply-templates command instructs the parser to select all elements that match the given condition, then apply the appropriate templates to them. In this case, the match is simple: find all child nodes of the current context (address) that have the tag name "phone", then apply the phone template if it exists. As it so happens, I do have a phone template match in the XSL document with this syntax:

<xsl:template match="phone">
  <b><xsl:value-of select="@name"/>:</b> <xsl:value-of/><br/>
</xsl:template>

This code sets the current node (temporarily) to the phone node. The first value-of selects the attribute "name" and puts the attribute's value into the stream as a bold label. The second value-of returns the text of the phone node. This second expression is actually something of a shorthand for the complete expression <xsl:value-of select="./value()"/>: take the current context, then retrieve the text (the value() here) from the node's contents.
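The hand-off from apply-templates to the phone template can be pictured as a small dispatch loop. The sketch below is a Python analogy with invented names (TEMPLATES, apply_templates); as a simplification, it emits nothing when no template is registered for the selected tag.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ADDRESS_XML = """
<address name="Home Address">
  <phone name="Home Phone" id="homePhone">(555) 555-5555</phone>
  <phone name="Cell Phone" id="cellPhone">(555) 555-5555</phone>
  <phone name="Fax" id="fax">(555) 555-5555</phone>
</address>
"""


def phone_template(phone):
    # Inside the template, the phone node is the current context.
    label = escape(phone.get("name", ""))  # like select="@name"
    number = escape(phone.text or "")      # like a bare value-of
    return f"<b>{label}:</b> {number}<br/>"


TEMPLATES = {"phone": phone_template}


def apply_templates(context, select):
    """Run the template for `select` over each matching child of `context`."""
    template = TEMPLATES.get(select)
    if template is None:
        return ""  # simplification: no template registered, no output
    return "".join(template(child) for child in context.findall(select))


address = ET.fromstring(ADDRESS_XML)
phones_html = apply_templates(address, "phone")
print(phones_html)
```

The address code never formats a phone number itself; it only says which children to hand off, and the phone template takes it from there, once per matching node.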

Why XSL Isn't Commonplace

To try to recap all of XSL in a single article would be pretty difficult, especially since much of it isn't immediately obvious. Part of this complexity comes from the basic nature of XSL—it is not related syntactically to such languages as Visual Basic, C++, or Java, but instead exists primarily for the act of transformation. As such, many developers who have looked at XSL work with it for about two weeks without much success, until suddenly the perspective shifts and it becomes extremely understandable, to the extent that you wonder why XSL isn't commonplace.

You can see these principles and many more at work in my sample resumé (yes, it is mine, and is way too verbose, but it illustrates the programming nicely). I've prepared three different versions of the XML. One has an "application" style sheet applied to it, showing it as an interactive piece with highlights and rollovers in place (note, this one requires Internet Explorer 5.0, although I'm working on a similar piece for non-IE5 browsers). The second shows the same exact data, but this time it's formatted for output to a printer. Then I include the XML source for this same data.

XML Document | XSL Document
Resumé Application (IE5 Only) | XSL for Resumé Application
Printed Resumé | XSL for Printed Resumé
Raw XML | Not Applicable

Take a look at the sample resumé included with this article, as well as the sample XSL file that transforms the data. The code extends well beyond the focus of the article, demonstrating how XML and XSL can be combined with JavaScript to create complex applications. You may also want to take a look at several sites devoted to XML, including:

Finally, keep at it. XML alone is an interesting data format, but in combination with XSL and XQL (XML Query Language), it offers a mechanism to change the way that we all work with data. By building a bridge of commonality, in expression if not necessarily in agreement on standards, the X*L family makes it possible to effectively communicate both data and structure regardless of platform, point of origin, or means of transmission—not to mention that it's pretty cool technology.

Kurt Cagle is a writer and programmer living in Olympia, Washington, where he gets a wonderful view of the mountains and the Puget Sound in one swell foop. He is working on a book on Internet Explorer 5.0 and XML programming for 29th Street Press, and discovering first hand the joys of working with XML. You can reach him at cagle@olywa.net.

 


Copyright 2004 Jupitermedia Corporation All Rights Reserved.