ITworld.com
XHTML - a recommendation and a warning (but mostly a warning)
E-BUSINESS IN THE ENTERPRISE --- 12/09/2003

Sean McGrath

One of the most dangerous non-sequiturs in this business concerns the difference between storing something in digital form and processing something in digital form.

It is tempting, but fallacious, to conclude that computerized storage of information carries with it the promise of clever computerized processing of that information.

Multimedia is the most glaring example of this phenomenon. For example, you show the MPEG video of your kid's soccer game to your friend. She asks if the computer can find all the places where your kid appears and then create a separate video from the snippets. The request sounds reasonable. After all, all the information is in the computer, right?

The problem, of course, is that the information about where your kid appears in the video really cannot be *seen* by the computer. Computers cannot see. Only certain biological life forms can see. The rest of the world - from paramecia to PCs - is blind to the concept of sight - if you see what I mean.

Of much more relevance to e-business is the inability of computers to 'see' documents. For example, you show the draft of your first novel to your friend. She asks if the computer can highlight all the places where there is a change of scene so that she can hop from scene to scene without missing any. Again, a reasonable-sounding request at first blush. Again, not something that the computer can do because, although it has full visibility of every word of your novel, it has no understanding of what the novel says. It does not know what a scene is. Only Homo sapiens can read text. To the rest of the world - from Neanderthals to nano-machines - text is just a bunch of pseudo-random numbers.

The time-honored way to get around these problems with computers is to rigorously and tediously spell out exactly what each and every bit of information represents. This is a date. This is a ten-digit product code. And so on. The quintessential expression of this idea in e-business computing is the concept of a database.
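
As a minimal sketch of what that spelling-out buys you, consider the Python fragment below. The `OrderRecord` type, its field names and the sample values are hypothetical, invented purely for illustration. The point is only that once a field has been declared to be a date or a ten-digit product code, selecting on it is a mechanical comparison rather than an act of reading.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OrderRecord:
    """Hypothetical record: every field is declared, so nothing has to be guessed."""

    order_date: date    # "This is a date."
    product_code: str   # "This is a ten-digit product code."

    def __post_init__(self) -> None:
        # Enforce the declared meaning of the field at the point of entry.
        code = self.product_code
        if not (len(code) == 10 and code.isascii() and code.isdigit()):
            raise ValueError("product_code must be exactly ten digits")


# Made-up sample data.
records = [
    OrderRecord(date(2003, 11, 3), "0000012345"),
    OrderRecord(date(2003, 12, 1), "0000067890"),
    OrderRecord(date(2003, 12, 9), "0000012345"),
]

# Because the fields are labeled, the computer can select on them directly.
december_codes = [r.product_code for r in records if r.order_date.month == 12]
print(december_codes)
```

Nothing here requires the computer to understand anything. It compares values in slots whose meaning a person has already spelled out.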

There was a time when the word database served to cleanly split the world of digital information into two. Stuff in database form was amenable to clever data processing and stuff outside of a database was not.

In recent times, the distinction between so-called structured and unstructured forms of digital information has continued to blur. The dam began to burst in the e-business world with Lotus Notes - a tool that was a traditional database and a traditional document system rolled into one. With Lotus Notes a single 'record' could contain a mixture of highly structured fields - dates, product codes and so on - alongside unstructured items known as 'Rich Text Fields'. These rich text fields looked and felt like documents. You could create headings, tables, bold, italic and so on.

The introduction of Rich Text in e-business applications was both a blessing and a curse. There are times when the ability to just pour words into a word processor-style interface is exactly what you need to complement your structured fields. However, the fact that you can type *anything* into one of these fields makes them very alluring. Unless the dangers of free-flowing text are properly mitigated, you can end up with a situation where most, if not all, of your information creeps into these rich text fields. The result? The computer becomes, slowly but surely, blind to the presence of structured information. Computers cannot see into rich text fields to ferret out the telephone numbers and product codes mixed in with the beautiful headings and table cells and font changes.

Shortly after the Web grabbed the world and shook it gently by the throat, the concept of a 'Rich Text Field' became embodied in the now household term 'HTML'. Up and down the e-business landscape you will see applications that allow structured information to be complemented with chunks of HTML for free-flowing document content.

Exactly the same risks exist with HTML as existed with the Lotus Notes concept of a Rich Text Field. Used injudiciously, a structured database application can degrade into a mush of unstructured data in which true business information decomposes into the lush undergrowth of table cells and fonts and level 1 headings.

Today, however, another dangerous non-sequitur is doing the rounds. It goes like this - if you store your information in XML, all will be well. In the case of free-flowing, document-style information in structured database applications, this equates to the idea that if you store your document stuff in XHTML all will be hunky-dory.

Not so. Yes, XHTML is machine processable in the sense that all applications can unambiguously figure out where paragraphs start and end, where font changes occur and so on. This is good and I recommend that XHTML be used instead of HTML wherever possible. However, it does not follow that computers can dig into XHTML and ferret out those telephone numbers or product codes you know are in there. Computers are as blind to them as they would be looking into a Lotus Notes rich text field.
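
A small sketch makes the distinction concrete. It uses Python's standard `xml.etree.ElementTree` parser on a made-up XHTML fragment. The fragment, the numbers in it and the ten-digit pattern are all assumptions for illustration only. The parser finds every paragraph and bold run without ambiguity, but nothing in the markup says which run of digits is a product code and which is a telephone number, so the best a program can do is guess.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical rich-text fragment of the kind stored alongside structured fields.
fragment = """<div xmlns="http://www.w3.org/1999/xhtml">
  <h1>Customer call</h1>
  <p>Spoke to the buyer about item <b>0000012345</b>.</p>
  <p>Call back on 5551234567 before Friday.</p>
</div>"""

root = ET.fromstring(fragment)

# Structure the markup declares: unambiguous for any XML parser.
paragraphs = root.findall(f".//{XHTML_NS}p")
bold_runs = [b.text for b in root.iter(f"{XHTML_NS}b")]
print(len(paragraphs), bold_runs)

# Meaning the markup does not declare: a pattern match is only a guess.
text = "".join(root.itertext())
candidates = re.findall(r"\b\d{10}\b", text)
print(candidates)
```

In this made-up fragment the pattern matches both ten-digit runs, the product code and the telephone number alike. The parser knows exactly where each paragraph begins and ends, yet it has no way of telling the two numbers apart.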

XHTML is an incremental improvement over HTML in terms of the ability of machines to process it, but it will not magically turn unstructured information into structured information. Doing that typically requires the ability to both read and see. Computers are not going to perform those feats for you any time soon.

 

Sean McGrath is CTO of Propylon. He is an internationally acknowledged authority on XML and related standards. He served as an invited expert to the W3C's Expert Group that defined XML in 1998. He is the author of three books on markup languages published by Prentice Hall. Visit his site at: http://seanmcgrath.blogspot.com.



Copyright © 2003 Accela Communications, Inc. All rights reserved