One of the most dangerous non sequiturs in this business has to do with
the difference between storing something in digital form and processing
something in digital form.
It is tempting, but fallacious, to conclude that computerized storage of
information carries with it the promise of clever computerized
processing of that information.
Multimedia is the most glaring example of this phenomenon. For example,
you show the MPEG video of your kid's soccer game to your friend. She
asks if the computer can find all the places where your kid appears and
then create a separate video from the snippets. The request sounds
reasonable. After all, all the information is in the computer, right?
The problem, of course, is that the information about where your kid
appears in the video really cannot be *seen* by the computer. Computers
cannot see. Only certain biological life forms can see. The rest of
the world - from paramecia to PCs - is blind to the concept of sight,
if you see what I mean.
Of much more relevance to e-business is the inability of computers to
'see' documents. For example, you show the draft of your first novel to
your friend. She asks if the computer can highlight all the areas where
there is a change of scene so that she can hop from scene to scene
without missing any. Again, a reasonable-sounding request at first
blush. Again, not something that the computer can do because, although it
has full visibility of every word of your novel, it has no understanding
of what the novel says. It does not know what a scene is. Only
Homo sapiens can read text. To the rest of the world - from Neanderthals
to nano-machines - text is just a bunch of pseudo-random numbers.
The time-honored way to get around these problems with computers is to
rigorously and tediously spell out exactly what each and every bit of
information represents. This is a date, this is a ten-digit product
code, and so on. The quintessential expression of this idea in e-business
computing is the concept of a database.
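To make "spelling it out" concrete, here is a minimal sketch in Python rather than in any particular database product. The record type `OrderLine`, its field names and the sample values are all invented for illustration; the only point is that once each field declares what it holds, the machine can check it and do something clever with it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OrderLine:
    """Hypothetical record: every field says what kind of thing it holds."""
    order_date: date    # "This is a date."
    product_code: str   # "This is a ten-digit product code."

    def __post_init__(self) -> None:
        # Reject anything that is not exactly ten ASCII digits.
        code = self.product_code
        if not (len(code) == 10 and code.isascii() and code.isdigit()):
            raise ValueError(f"not a ten-digit product code: {code!r}")


def ordered_on_or_after(lines: list[OrderLine], cutoff: date) -> list[OrderLine]:
    """Date comparison works because the field is a date, not loose text."""
    return [line for line in lines if line.order_date >= cutoff]


# Made-up sample data, purely for illustration.
LINES = [
    OrderLine(date(2001, 3, 5), "0123456789"),
    OrderLine(date(2001, 7, 19), "9876543210"),
]

recent = ordered_on_or_after(LINES, date(2001, 6, 1))
print([line.product_code for line in recent])
```

Nothing here is clever. The processing is possible only because somebody did the tedious work of declaring the fields first.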
There was a time when the word database served to cleanly split the
world of digital information into two. Stuff in database form was
amenable to clever data processing and stuff outside of a database was
not.
In recent times, the distinction between so-called structured and
unstructured forms of digital information has continued to blur. The
dam began to burst in the e-business world with Lotus Notes - a tool
which was simultaneously a traditional database and a traditional
document system rolled into one. With Lotus Notes a single 'record'
could contain a mixture of highly structured fields - dates, product
codes and so on - alongside unstructured items known as 'Rich Text
Fields'. These rich text fields looked and felt like documents. You
could create headings, tables, bold, italic and so on.
The introduction of Rich Text in e-business applications was both a
blessing and a curse. There are times when the ability to just pour
words into a word processor-style interface is exactly what you need to
complement your structured fields. However, the fact that you can type
*anything* into one of these fields makes them very alluring. Unless the
dangers of free-flowing text are properly mitigated, you can end
up with a situation where most, if not all, of your information creeps
into these rich text fields. The result? The computer becomes, slowly but
surely, blind to the presence of structured information. Computers
cannot see into rich text fields to ferret out the telephone numbers and
product codes mixed in with the beautiful headings and table cells and
font changes.
Shortly after the Web grabbed the world and shook it gently by the
throat, the concept of a 'Rich Text Field' became embodied in the now
household term 'HTML'. Up and down the e-business landscape you will see
applications that allow structured information to be complemented with
chunks of HTML for free-flowing document content.
Exactly the same risks exist with HTML as existed with the Lotus Notes
concept of a Rich Text Field. Used injudiciously, a structured database
application can degrade into a mush of unstructured data in which true
business information decomposes into the lush undergrowth of table cells
and fonts and level 1 headings.
Today, however, another dangerous non sequitur is doing the rounds. It
goes like this: if you store your information in XML, all will be well.
In the case of free-flowing, document-style information in structured
database applications, this equates to the idea that if you store your
document stuff in XHTML, all will be hunky-dory.
Not so. Yes, XHTML is machine-processable in the sense that all
applications can unambiguously figure out where paragraphs start and
end, where font changes occur and so on. This is good and I recommend
that XHTML be used instead of HTML wherever possible. However, it does
not follow that computers can dig into XHTML and ferret out those
telephone numbers or product codes you know are in there. Computers are
as blind to them as they would be looking into a Lotus Notes rich text
field.
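A small sketch of the difference, using Python's standard `xml.etree.ElementTree` parser. The XHTML fragment, the telephone number and the product code are made up, and the `contact`, `phone` and `product-code` element names in the second fragment are hypothetical, not part of any real vocabulary. The parser finds every paragraph without difficulty, yet nothing in the XHTML markup says that one run of digits is a telephone number and another is a product code.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up XHTML fragment: well-formed, but the markup is about presentation.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>Order notes</h1>
    <p>Call the supplier on <b>555-0100</b> about part 0123456789.</p>
    <p>Delivery is expected next week.</p>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)

# The structure XHTML does describe is easy to get at.
paragraphs = root.findall(f".//{{{XHTML_NS}}}p")
paragraph_texts = ["".join(p.itertext()) for p in paragraphs]

# The only element names on offer are presentational ones.
element_names = sorted({el.tag.split("}")[1] for el in root.iter()})

# Hypothetical markup in which somebody has said what each value is.
STRUCTURED = """<contact>
  <phone>555-0100</phone>
  <product-code>0123456789</product-code>
</contact>"""

contact = ET.fromstring(STRUCTURED)
phone = contact.findtext("phone")

print(len(paragraphs), element_names, phone)
```

In the first fragment the number is just characters inside a bold element. In the second it can be found only because a person labelled it. The XML syntax did none of that work.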
XHTML is an incremental improvement over HTML in terms of the ability of
machines to process it, but it will not magically turn unstructured
information into structured information. Doing that typically requires
the ability to both read and see. Computers are not going to perform
those feats for you any time soon.