Thursday, October 2, 2014

Some Myths About XML and RDF

It doesn't make sense to compare XML and RDF any more than it makes sense to compare poetry and paper, or a tax bill and a bottle of ink. But people do it.

XML here is like blank paper, a medium. It doesn’t give meaning to the poem, and you could write the poem on something else, but that doesn’t diminish the usefulness and value of paper. Which brings me to the first myth, one I have heard often:

1. XML has no semantics, therefore XML cannot express RDF

This makes no sense if you think of XML as the paper, or as the electrical cable carrying a telephone conversation. Er, OK, if you’re young and use these new-fangled portable telephones, the radio waves carrying a telephone conversation.

But if it did make sense, your RDF would lose all its meaning when you serialized your RDF graph as RDF/XML. When you send that XML to someone else and they reconstitute an RDF graph, they say there’s meaning in that RDF graph. How was the meaning transmitted if XML cannot store meaning? I call this the XML Phlogiston argument.
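The round trip can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `http://example.org/` URIs, the `ex` vocabulary and the helper names `to_rdfxml` and `from_rdfxml` are all made up here, and the reader handles just the tiny subset of RDF/XML that the writer produces (URI subjects and URI objects). It is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"  # hypothetical vocabulary, for illustration only


def to_rdfxml(triples):
    """Write (subject URI, property local name, object URI) triples as RDF/XML text."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for subject, prop, obj in sorted(triples):
        desc = ET.SubElement(
            root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject}
        )
        ET.SubElement(desc, f"{{{EX_NS}}}{prop}", {f"{{{RDF_NS}}}resource": obj})
    return ET.tostring(root, encoding="unicode")


def from_rdfxml(text):
    """Rebuild the set of triples from the subset of RDF/XML written above."""
    triples = set()
    for desc in ET.fromstring(text):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            local_name = prop.tag.split("}", 1)[1]  # strip the {namespace} part
            triples.add((subject, local_name, prop.get(f"{{{RDF_NS}}}resource")))
    return triples


graph = {
    ("http://example.org/car1", "colour", "http://example.org/green"),
    ("http://example.org/car1", "ownedBy", "http://example.org/alice"),
}
restored = from_rdfxml(to_rdfxml(graph))
```

Here `restored` compares equal to `graph`: whatever the triples meant before they were written down as XML, they mean afterwards.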

2. XML is about trees and cannot represent graphs.

One RDF advocate accused me of not knowing what a graph was when I said this was not the case, but the existence of RDF/XML, and for that matter GraphML, proves otherwise. XML can store graphs just fine.
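As a small demonstration, here is a sketch in Python that reads a graph with a cycle (so definitely not a tree) out of an XML document. The element and attribute names (`graph`, `node`, `edge`, `from`, `to`) are invented for this example; GraphML and RDF/XML have their own vocabularies, but the idea of pointing between elements by identifier is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: nodes carry ids, edges refer to nodes by id.
DOC = """
<graph>
  <node id="a"/>
  <node id="b"/>
  <node id="c"/>
  <edge from="a" to="b"/>
  <edge from="b" to="c"/>
  <edge from="c" to="a"/>
</graph>
"""


def load_graph(text):
    """Return an adjacency list {node id: [target ids]} built from the XML."""
    root = ET.fromstring(text)
    adjacency = {node.get("id"): [] for node in root.iter("node")}
    for edge in root.iter("edge"):
        adjacency[edge.get("from")].append(edge.get("to"))
    return adjacency


def has_cycle(adjacency):
    """Depth-first search: reaching a node already on the current path means a cycle."""
    on_path, finished = set(), set()

    def visit(node):
        if node in finished:
            return False
        if node in on_path:
            return True
        on_path.add(node)
        found = any(visit(target) for target in adjacency[node])
        on_path.discard(node)
        finished.add(node)
        return found

    return any(visit(node) for node in adjacency)


graph = load_graph(DOC)
cyclic = has_cycle(graph)
```

The XML document itself is a tree of elements, but the thing it describes is a directed graph in which a leads to b, b to c, and c back to a.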

3. RDF lets you concatenate graphs to combine them but XML cannot do this.

This is a more subtle myth. First, you can’t concatenate two RDF/XML documents as-is, because the result would not be well-formed XML. But you can use a simple XQuery expression such as
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">{ doc('a.xml')/*/*, doc('b.xml')/*/* }</rdf:RDF>
to return a single combined document.
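The same idea can be sketched outside XQuery. The Python below assumes each input is RDF/XML text whose root element is `rdf:RDF`; the function name `merge_rdfxml` and the two sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def merge_rdfxml(*documents):
    """Copy the children of each document's root element under one new rdf:RDF root."""
    ET.register_namespace("rdf", RDF_NS)
    merged = ET.Element(f"{{{RDF_NS}}}RDF")
    for text in documents:
        merged.extend(list(ET.fromstring(text)))
    return merged


# Two made-up documents using a hypothetical example.org vocabulary.
DOC_A = (
    f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="http://example.org/terms#">'
    '<rdf:Description rdf:about="http://example.org/car1">'
    "<ex:colour>green</ex:colour>"
    "</rdf:Description></rdf:RDF>"
)
DOC_B = (
    f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="http://example.org/terms#">'
    '<rdf:Description rdf:about="http://example.org/car2">'
    "<ex:kind>sports-car</ex:kind>"
    "</rdf:Description></rdf:RDF>"
)

merged = merge_rdfxml(DOC_A, DOC_B)
subjects = [desc.get(f"{{{RDF_NS}}}about") for desc in merged]
```

This is deliberately naive. Among other things it does not rename blank-node identifiers (`rdf:nodeID`) that happen to collide between the two inputs, and it ignores `xml:base`, so treat it as a sketch of the concatenation step rather than a complete merge.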

The hard part is called schema mapping or ontology matching. If one document uses vehicle, automotive, four-wheeled, short, sports-car, and another uses car, green, you can’t assume all green cars are sports cars, and no automatic combination of the graphs is possible, regardless of whether you use N3 or Turtle or SPARQL or strawberry jam. That’s as true in JSON and XML as in RDF.

4. XML Failed

I’ve heard this one even from colleagues, some of whom believed the goal of XML was to replace HTML. That wasn’t our goal at all, and XML is doing very nicely in its original intended use of technical documentation, thank you. So this isn’t really about RDF and XML except that I sometimes hear it in that context.

Are there other RDF and XML myths?

[update: yes, no. 2 should have read, and now does read, graphs, thank you to those who pointed it out.]

2 comments:

thomasbeale said...

Just a typo check: I think your point 2 is meant to be 'XML is about trees and cannot represent graphs'?

James-C said...

Oh, I've heard all sorts of myths about XML (and often a particular vocabulary, TEI). I've been thinking for a while of doing a post like this, so I might steal some of these ideas.

My favourite myth is "You can't do stand-off or out-of-line markup in XML". This comes from the idea that all XML markup is 'embedded markup', which in some digital humanities circles (only very ignorant ones) has become a pejorative term. 'Embedded markup' is seen as evil because of course *everyone* who uses XML can't possibly deal with overlapping hierarchies.

1) You can do stand-off/out-of-line markup in XML (in many different ways). My favourite is @xml:id-based URI pointing: markup in a different file points, using a URI, to a range of @xml:id values to indicate a nodeset or a new out-of-line element.

2) Almost everyone I encounter has no problem choosing one hierarchy, usually the intellectual structure of a document over its physical structure, and recording the other with milestone elements for things like page breaks. Switching between hierarchies is a straightforward and solved problem with a simple XSLT stylesheet. Never mind all the much more geeky ways to deal with this 'problem'. People tell me XML is fundamentally broken because of overlap, and although this is an interesting problem for markup geeks, it is not really a problem for most applications.


-James