SD Times: Software Development

INTEGRATION WATCH: The Other Side of XML
By Andrew Binstock

April 15, 2004 — One consequence of the popularity of Web services and service-oriented architecture (SOA) is that XML has become a wildly popular data format. Today, it seems, if you’re writing new apps that don’t store data in XML, you’re so far out of the technology mainstream that you might as well be using packed decimals from COBOL. This pervasive adoption of XML, I believe, credits it with too much capability and woefully underestimates how difficult it is to use.

Let’s start at the beginning: XML is easy to understand. However, as with most things that are easy to understand, the simplicity masks a progressive complexity, which appears when you want to do useful things. This complexity is delivered in the form of extensions that fill out the capabilities missing in the original simple specification.

Such extensions cannot resolve all limitations, and so incongruities persist in XML. For example, while XML is good at representing structured data—data in which all fields are cohesive, discrete entities that can be placed in a hierarchy—it does a terrible job with other kinds of data. This is because XML cannot represent data that requires tags to overlap.

Suppose you have a paragraph that you want to mark up using XML, with the first two-thirds in bold and the last two-thirds in italic (meaning that the middle third will be both italic and bold). Because XML requires elements to nest properly, so that a tag opened inside another element must be closed before that element closes, you cannot frame the bold portion with one pair of tags and the italic portion with another: The two pairs would overlap. The data is unstructured and, as a result, XML cannot express it without kludge workarounds. Using native formats, of course, the solution is trivial: Turn italics on until you see a marker that says turn it off. Done.
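A minimal sketch of the overlap problem, using Python's standard xml.etree.ElementTree parser. The sample strings and the is_well_formed helper are made up for illustration; the second string shows one common kludge, which is to split the italic run so that every element nests.

```python
import xml.etree.ElementTree as ET

# Overlapping markup: bold covers "one two", italic covers "two three".
overlapping = "<p><b>one <i>two</b> three</i></p>"

# Workaround: split the italic run in two so every element nests properly.
nested = "<p><b>one <i>two</i></b><i> three</i></p>"


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```

The workaround parses, but the single italic span from the original text is now two elements that an application has to stitch back together.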

This limitation is masked by the fact that most business data is structured. However, the following problem cannot be easily hidden: XML is difficult to process well. Anyone who has used DOM or SAX knows this. To access XML through these standard interfaces, you must employ a data model that is far different from those used in reading and writing simple binary data. In fact, parsing can become incredibly complex, even for simple tasks. If you go one step further and attempt to validate a document against a schema or DTD, you have moved completely into the province of specialists. As a result, a lot of XML is never validated; it is checked only for well-formed syntax.
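To give a feel for the two data models, here is a small sketch that sums one field from a made-up order document, first through DOM and then through SAX, using Python's standard xml.dom.minidom and xml.sax modules. The document, element names and function names are assumptions chosen for illustration.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document.
ORDER_XML = (
    "<order>"
    "<item><sku>A-100</sku><qty>2</qty></item>"
    "<item><sku>B-200</sku><qty>5</qty></item>"
    "</order>"
)


def total_qty_dom(text):
    """DOM: build the whole tree in memory, then walk it."""
    doc = xml.dom.minidom.parseString(text)
    total = 0
    for node in doc.getElementsByTagName("qty"):
        # The value is not on the element itself; it sits in a child text node.
        total += int(node.firstChild.data)
    return total


class QtyHandler(xml.sax.ContentHandler):
    """SAX: react to parser events and keep track of state by hand."""

    def __init__(self):
        super().__init__()
        self.in_qty = False
        self.buffer = []
        self.total = 0

    def startElement(self, name, attrs):
        if name == "qty":
            self.in_qty = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_qty:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "qty":
            self.total += int("".join(self.buffer))
            self.in_qty = False


def total_qty_sax(text):
    handler = QtyHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.total


print(total_qty_dom(ORDER_XML))  # 7
print(total_qty_sax(ORDER_XML))  # 7
```

Neither version validates anything: both accept any well-formed input, which is the gap between syntax checking and verification described above.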

In addition, XML is not terribly robust. A single corrupted character can make an entire document unreadable to a parser. And, under certain circumstances, finding the error can prove to be tremendously frustrating.
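The fragility is easy to reproduce. In this sketch, which uses Python's standard xml.etree.ElementTree and a made-up customer record, a single damaged character in one field prevents the parser from returning any field at all, including the intact ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record.
good = "<customer><id>42</id><name>Ada</name><city>Paris</city></customer>"

# Simulate one corrupted character: the "<" that opens </name> becomes "#".
bad = good.replace("</name>", "#/name>", 1)


def read_id(text):
    """Return the text of <id>, or None if the document cannot be parsed."""
    try:
        return ET.fromstring(text).findtext("id")
    except ET.ParseError:
        return None


print(read_id(good))  # 42
print(read_id(bad))   # None, although the <id> field itself is undamaged
```

The parser only notices the damage when it reaches the closing customer tag, well after the corrupted character, which is part of what makes such errors frustrating to locate.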

Finally, there is the data itself. Native formats using data record layouts have much to recommend them: They don’t need to be parsed, and the data does not need to be converted. In fact, if you can access the same data either in a native binary format or in XML, the native format will almost always be easier, faster and more reliable. It is more reliable because errors within the data fields themselves do not affect your ability to read other fields.
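For comparison, here is a rough sketch of a fixed record layout read with Python's standard struct module. The layout (a 4-byte id, an 8-byte name and a 4-byte quantity) and the helper names are invented for this example. A corrupted byte inside the name field damages only that field.

```python
import struct

# Hypothetical layout: little-endian, no padding.
#   I  = 4-byte unsigned id
#   8s = 8-byte name, null-padded
#   I  = 4-byte unsigned quantity
RECORD = struct.Struct("<I8sI")


def pack_record(rec_id, name, qty):
    return RECORD.pack(rec_id, name.encode("ascii"), qty)


def unpack_record(raw):
    rec_id, name, qty = RECORD.unpack(raw)
    return rec_id, name.rstrip(b"\x00"), qty


raw = pack_record(42, "Ada", 7)

# Corrupt one byte inside the name field (the name occupies offsets 4 to 11).
damaged = bytearray(raw)
damaged[5] = 0xFF

print(unpack_record(raw))             # (42, b'Ada', 7)
print(unpack_record(bytes(damaged)))  # (42, b'A\xffa', 7)
```

The id and quantity are read from fixed offsets, so they survive damage elsewhere in the record. The trade-off is that both sides must agree on the layout in advance, since nothing in the bytes describes it.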

The beauty and strength of XML are not inherent. Where XML shines is in the metadata that can be used in transformation engines. XSLT, XSL-FO and similar standards are making it possible to easily change the nature of documents automatically, and that is a compelling advantage.

This is the key point: It’s the fact that XML is agreed to and widely adopted by other tools that makes it powerful, not the notation itself. For this reason, XML is appealing in Web services (where interchange of disparate data is crucial) and of little interest to EDI users (who already have set up their interchange protocols).

As more XML databases become available, and SOA is deployed around the enterprise, there is likely to be a facile and uncontroversial decision made in many IT departments: Store all data in XML. As I have tried to show, this decision should not be automatic, but rather the product of careful thought.

Data that will be actively interchanged between systems and that lacks real-time processing requirements is a good candidate for XML; likewise data that benefits from transformations facilitated by XSLT. But even in these cases, the use of traditional data formats might still be preferable. In the event XML interfaces are later needed, native data can be exported in XML via simple-to-write conversion routines. The judicious use of XML will benefit most sites. The indiscriminate use, however, will prove cumbersome and costly.
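As an illustration of such a conversion routine, the sketch below exports fixed-layout binary records to XML using Python's standard struct and xml.etree.ElementTree modules. The record layout, field names and element names are all assumptions made for this example.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical layout: 4-byte id, 8-byte null-padded name, 4-byte quantity.
RECORD = struct.Struct("<I8sI")
FIELDS = ("id", "name", "qty")


def records_to_xml(blob):
    """Convert a run of fixed-size binary records into an XML string."""
    root = ET.Element("records")
    for values in RECORD.iter_unpack(blob):
        rec = ET.SubElement(root, "record")
        for field, value in zip(FIELDS, values):
            if isinstance(value, bytes):
                # Strip the null padding and decode the text field.
                value = value.rstrip(b"\x00").decode("ascii")
            ET.SubElement(rec, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


blob = RECORD.pack(1, b"Ada", 7) + RECORD.pack(2, b"Grace", 3)
print(records_to_xml(blob))
```

The data stays in its native form for everyday processing, and the XML view is generated only when an interchange partner asks for it.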






Andrew Binstock is the principal analyst at Pacific Data Works LLC.



Copyright © 1999-2004 BZ Media, LLC, all rights reserved.
Phone: 516-922-2101 • E-mail: info@bzmedia.com