W3C XML Schema validation with Qt

Posted by tokoe on February 3, 2009 · 8 comments

As one can see here, my contributions to KDE stopped from October 2008 to January 2009… that’s not because I joined another open source project or because my natural instincts forced me to hibernate. No, there are two other reasons:

  • I started an internship at Trolltech aka Qt Software as part of my studies
  • The internship was located in Oslo, so there was a lot for me to explore in the evenings and during the weekends ;)

Today I want to talk about the project I worked on here at Qt Software during the last 4 months.
Qt provides really nice support for handling XML documents, with the DOM and SAX APIs in the QtXml module and
QXmlStreamReader in the core module as an alternative approach to parsing XML. Although there are still many hand-written
implementations that load XML documents from the local hard disk or via the network and iterate over the
nodes to extract single parts of the XML tree, QtXmlPatterns provides a performant implementation
of the XQuery and XPath specifications, which allows you to do all the loading and extracting of XML data with only
5 lines of code. In Qt 4.5 the XML support has been extended by an XSLT implementation that allows you to easily
convert documents from one XML dialect into another or to generate source code from an XML description, like the Akonadi
project does for its database access classes.

However, all these technologies expect the XML input documents to be well-formed and valid. In this case well-formed
means that the document conforms to the XML specification (correct tag syntax, properly nested tags etc.). Valid means that
the elements have only the attributes that are allowed by the XML dialect the document is an instance of and that
the elements appear in the correct order. While well-formedness is enforced by all basic XML parsers (in our case
QXmlStreamReader), validity depends on the validation language that one wants to use. The most common languages are

  • W3C XML Schema
  • RelaxNG
  • Schematron

W3C XML Schema is, as the name suggests, defined by the W3C and referenced by nearly every other XML technology specification released by them,
so it is well accepted and used by many software systems out there.
RelaxNG takes a similar approach to validating XML documents as XML Schema, but it has a much simpler syntax.
Schematron follows a rule-based concept of validation. Instead of describing the complete structure of a valid document,
it defines assertions that must be true, otherwise the document is invalid.

So we can see that it is worth having an implementation of W3C XML Schema to complete the QtXmlPatterns module. Unfortunately,
Qt 4.4 and 4.5 are missing support for that… and that is the point where we finally come back to my internship project ;)

The goal of the project was to evaluate the existing XML schema validation C/C++ implementations (API-wise) and come
up with a nice API (and of course implementation) based on Qt and integrated into QtXmlPatterns.

To make a long story short… yes, we have a working implementation now!

And now the longer version of that story:
The implementation passes around 99% of the tests of the W3C XML Schema Test Suite. That looks pretty good at first glance, but I have to admit that I disabled some of the tests, for example all the tests that are marked as invalid in the bug tracking system and all tests that currently do not pass because of memory/processor resource limitations…
So I guess the real number of passed tests is around 98%, still acceptable IMHO ;)

So how can a developer make use of the schema validation? In the current version, we only support checking
whether a schema document is valid and whether an instance document is valid according to a given schema. The validation
is not integrated into QXmlStreamReader yet, so the developer has to do the check manually, like in the
following code:

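    #include <QtCore/QCoreApplication>
    #include <QtCore/QDebug>
    #include <QtCore/QUrl>
    #include <QtXmlPatterns/QXmlSchema>
    #include <QtXmlPatterns/QXmlSchemaValidator>

    int main( int argc, char **argv )
    {
        QCoreApplication app( argc, argv );

        // Load and compile the schema document.
        QXmlSchema schema;
        schema.load( QUrl("file:///home/user/myschema.xsd") );

        if ( schema.isValid() ) {
            // Check the instance document against the compiled schema.
            QXmlSchemaValidator validator( schema );
            if ( validator.validate( QUrl("file:///home/user/instance.xml") ) ) {
                qDebug() << "instance is valid";
            } else {
                qDebug() << "instance is invalid";
            }
        } else {
            qDebug() << "schema is invalid";
        }

        return 0;
    }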

Of course it is also possible to retrieve the error message that explains why the schema or instance is invalid;
information about that can be found in the API documentation.

For those of you who prefer fancy, colored screenshots, here come some pictures of the example application:

Screenshot: valid document instance

Screenshot: invalid document instance

When choosing the invalid instance document, the application points out the invalid XML construct inside the
document.

We plan to provide the schema validation support as a separate branch in the Qt Labs git
repository. Unfortunately you have to check out and compile all of Qt, as the schema support also patches
components outside QtXmlPatterns (namely QRegExp), but as soon as we merge it upstream, that shouldn't be a
problem any longer.

So will schema validation support make it into Qt 4.6? Maybe... I hope so, although there is still some stuff
to do: ironing out all the rough edges (usability-wise) and integrating it cleanly into the rest of Qt. However,
my time here in Oslo ends soon and I have to go back to university. But I'm quite sure that Frans (my technical
mentor and the other 50% of the XML team ;) ) will continue to keep track of it and further help to make it really rock!

8 comments

1 Chris G February 3, 2009 at 3:37 pm
 

Looks great – anxiously looking forward to the release!

2 Scorp1us February 3, 2009 at 4:56 pm
 

What you should do is start it as a QtSolution like Kinetic is and merge that in at the next release.

Also, please note that Qt lacks the ability to query & modify an XML document. I refer to MS’s XMLPathNavigator classes.

I should be able to say:
QList nl = doc.toElement().query("Element[@attrib='4']");

And get a node list of matching nodes whose attrib is 4.
Then I should be able to take those nodes, and say
nl.first().setAttribute("attrib", "5");

This currently is not possible. Also, Qt needs to be able to transparently convert to/from QDomNodeList and QList.

3 Nils February 3, 2009 at 6:31 pm
 

Hi!

That’s a great extension for Qt’s XML support.

- Is it also possible to validate parts of an XML document?
- Is it possible to query the validator about things like “list all elements that may be inserted at this position in the document”?

Thanks!

Nils

4 Tobias Koenig February 4, 2009 at 10:12 am
 

@Scorp1us The first part of the feature you mentioned (querying parts of an XML document) is already provided by the QXmlQuery class,
which allows you to select nodes by XQuery or XPath expressions. The update functionality is still missing but already on our agenda.

@Nils The internal API allows validating parts of an XML document, but to make it public API we would have to make the schema structure
classes public as well, which requires many API reviews that could not be done in the current time frame. Querying for possible elements at
a given position is the same problem… only possible with private API at the moment.

5 Will Stephenson February 4, 2009 at 2:51 pm
 

Now you can write a krazy (http://englishbreakfastnetwork.org/krazy/) check to validate all of the KConfigXT .kcfg files in KDE’s svn. The schema is there already.

6 Dominik February 4, 2009 at 3:22 pm
 

Do you think with this it’s possible to implement code completion for XML based on XMLSchema and/or RelaxNG? That would be a useful plugin for Kate… (right now Kate only supports metaDTD)

7 Tobias Koenig February 4, 2009 at 4:00 pm
 

@Dominik Unfortunately it is the same as with the wishes from Nils: without a public API for the data types and the schema structure, that is not possible.

8 Anonymous February 4, 2009 at 11:08 pm
 

I’m rather new to the XML validation stuff. What about the classical DTD (Document Type Definition) format? Can XML documents be validated with that already? (I noticed Kate fails to do noteworthy code highlighting for DTD files as well.)
