Qt 4.6 with XML Schema support

Posted by tokoe on July 31, 2009 · 20 comments

It is now official: last week my XML Schema branch on Gitorious was merged into Qt master (the upcoming Qt 4.6) by the brave Trolls Peter and Thiago.
Since February, when the internship ended and I had to go back to university, most of the changes to the branch have been code cleanups, API improvements and additional documentation. So there are no new features so far, but there is a long TODO list on my desk that will get some attention as soon as time permits ;)

So what is all this XML Schema thingy about? Why should I use it in my applications? As you can see in the CWE/SANS Top 25 Most Dangerous Programming Errors, first place in the category Insecure Interaction Between Components is held by Improper Input Validation. Hands up: who of us checks all input data that the user enters into our application? While it is fairly easy to check input typed into a QLineEdit, QTextEdit or QSpinBox with a QValidator or some other hand-written code, validating input data that comes from files or over the network is more complicated. This kind of data has a specific format (e.g. a CSV file, an ODF document or an image) that has to be parsed and interpreted to make sure that no bogus data is pushed into the application. For these common formats, however, libraries exist that do the checks and validation for us and return an error in case of a violation.

Over the last few years the XML format has become more and more popular, and the possibility of defining your own XML dialects (e.g. DocBook, MathML, QtUI) has led to broad adoption in the software world. If you want to parse an XML document you normally use an XML parser like QXmlStreamReader or QDomDocument, which does all the low-level parsing for you and reports an error if the input document is not well-formed as defined in the XML standard. Well-formed basically means that tag names, attribute names and attribute values contain only allowed characters and that tags are nested correctly inside each other. However, well-formedness doesn’t say anything about which tags are allowed inside the document (e.g. <html>, <resource> etc.) or which tags may contain which other tags or attributes. So the classic XML parsers can’t help you make sure that your input data is correct, and you will find yourself writing code like the following

    // Reject any element that is not the expected <resource> tag.
    if (element.tagName() != QLatin1String("resource"))
        return error;

when parsing XML documents in your application.

This problem is as old as XML itself, and many attempts have been made to provide an easy way of validating XML documents against specific constraints. The document type definition (DTD) is an integral part of the XML standard, but its ability to describe the structure of an XML document is quite limited. Other attempts are XML Schema and Relax NG: the former is the official validation language of the W3C, the latter the result of a counter-movement to XML Schema, which many day-to-day XML users see as too complex and difficult to learn.

Regardless of their complexity, all XML validation languages have one thing in common: they describe, in an abstract way, what a valid XML document should look like. In other words, they define a grammar for a language. This grammar (a DTD or XML Schema file) is then used to validate an XML instance document and decide whether it is correct or not.
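
To illustrate the idea of a grammar as data, here is a minimal sketch in plain C++. It is not how XML Schema or Qt works internally: the Element struct, the Grammar type and the example rules are all made up for this illustration, and the "grammar" only records which child elements each element may contain. Real XML Schema additionally covers attributes, data types, ordering and occurrence counts.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A deliberately tiny stand-in for an already parsed XML element.
struct Element {
    std::string name;
    std::vector<Element> children;
};

// Hypothetical grammar: for each element name, the set of child
// element names it is allowed to contain.
using Grammar = std::map<std::string, std::set<std::string>>;

// Returns true if every element is declared in the grammar and
// contains only children that the grammar allows for it.
bool isValid(const Element &element, const Grammar &grammar)
{
    const auto rule = grammar.find(element.name);
    if (rule == grammar.end())
        return false; // element is not declared at all

    for (const Element &child : element.children) {
        if (rule->second.count(child.name) == 0)
            return false; // child is not allowed inside this element
        if (!isValid(child, grammar))
            return false; // problem further down the tree
    }
    return true;
}

// Example rules: <resources> may contain <resource>,
// <resource> may contain <name>, and <name> has no child elements.
Grammar exampleGrammar()
{
    return {
        {"resources", {"resource"}},
        {"resource", {"name"}},
        {"name", {}},
    };
}
```

The point of the sketch is that the checking code never mentions a concrete tag name. Changing the format means changing the grammar, not the validator.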

This functionality is now available through the new QXmlSchema and QXmlSchemaValidator classes in Qt 4.6. So instead of checking individual tag names manually, just create an XML Schema definition of the format you want to parse, load this definition into a QXmlSchema object and then validate your input data against it with QXmlSchemaValidator. The validator will tell you whether the input data is valid and can be processed further, or whether processing should stop because the data is bogus. That takes the burden of error and validity checking away from you and reduces the code size as well… however, you still have to iterate over the individual XML elements and parse the data into a C++ object representation to work with it. Can’t we simplify that somehow?

To cite a famous quote: “Yes, we can!” ;)

The grammar of the XML document that is used for validation contains all information we need to generate C++ code with the following properties:

  1. C++ classes for every type defined in the XML schema
  2. C++ code for parsing (and validating) an XML document and filling the C++ objects
  3. C++ code for writing out C++ objects to XML documents
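
To make the three points above a bit more concrete, here is a hand-written sketch of roughly what such generated code could look like for a made-up schema type "resource" with a <name> and a <size> child element. Everything in it (the Resource struct, toXml, fromXml and the childText helper) is hypothetical, since no such generator exists yet. It uses naive string searching instead of a real XML parser such as QXmlStreamReader, purely to keep the example self-contained.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Point 1: a class for the (hypothetical) schema type "resource",
// which has two child elements: <name> (string) and <size> (integer).
struct Resource {
    std::string name;
    int size;
};

// Point 3: write a Resource out as XML. Escaping of special
// characters is omitted to keep the sketch short.
std::string toXml(const Resource &resource)
{
    return "<resource><name>" + resource.name + "</name><size>"
           + std::to_string(resource.size) + "</size></resource>";
}

// Toy helper: copies the text between <tag> and </tag> into 'text'.
// Returns false if either tag is missing. Real generated code would
// sit on top of a proper XML parser instead.
bool childText(const std::string &xml, const std::string &tag, std::string *text)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos)
        return false;
    const std::string::size_type begin = start + open.size();
    const std::string::size_type end = xml.find(close, begin);
    if (end == std::string::npos)
        return false;
    *text = xml.substr(begin, end - begin);
    return true;
}

// Point 2: parse (and validate) a document and fill the C++ object.
// Returns false if an expected element is missing or <size> is not
// an integer; 'resource' is only modified on success.
bool fromXml(const std::string &xml, Resource *resource)
{
    assert(resource != nullptr);
    std::string body, name, size;
    if (!childText(xml, "resource", &body)
        || !childText(body, "name", &name)
        || !childText(body, "size", &size))
        return false;

    int parsedSize = 0;
    try {
        parsedSize = std::stoi(size);
    } catch (const std::logic_error &) {
        return false; // <size> does not hold an integer
    }
    resource->name = name;
    resource->size = parsedSize;
    return true;
}
```

Application code would then only deal with Resource objects and never touch tag names directly, which is the whole appeal of generating this layer from the schema.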

So instead of defining your C++ data objects first and trying to fill them with XML data in a second step, you could write an XML Schema definition first, which describes your XML input data, and then have the matching C++ data objects and the parsing and serializing code generated automatically. That sounds really great (how much time have you wasted parsing XML documents into C++ structures manually?) and indeed it is! ;) ‘So where is the code?’ you might ask… Unfortunately no code has been written yet that could do that, but with the XML Schema definition available as an internal object representation, the first major step is already done. Creating C++ code from it is just a question of diligence and time. Let’s see what the future will bring!



20 comments

1 AlanB July 31, 2009 at 2:05 pm
 

Great! Obviously, this is a very nice addition :) Thanks for your work.

2 Michael Leupold July 31, 2009 at 4:10 pm
 

Awesome. I was really looking forward to that. And I’m looking forward even more to being able to use the XML Schema parser on its own at some point. That would be a rocking basis for XMLSchema=>Classes or even WSDL=>Classes. :-)

3 Steve Sperandeo July 31, 2009 at 4:15 pm
 

This is more than just great. This is a very underrated feature that will eventually bring Qt into a whole new realm: communication. And for anyone that hasn’t been with us for the last 15 years, communication is the real foundation of the internet. I’ve been waiting for this feature for a really long time.

Excellent work. :-)

I think the next step in this evolution is a feature to easily create XML REST and RPC HTTP services. Such a feature would allow Qt to start to become a powerhouse in the distributed computing, SaaS, and other server-side markets.

Now, with XML Schema, this is much more feasible.

Thank you.

4 Il Rugginoso July 31, 2009 at 6:24 pm
 

Great post and excellent work!

5 Diederik van der Boor July 31, 2009 at 7:02 pm
 

Nice work! :-)

This is an area where .Net is really powerful at (generating C# classes from an xsd, or wsdl). Seeing similar support in Qt is good news for C++ and Java programmers :)

6 Nicolas B July 31, 2009 at 9:22 pm
 

I am really impressed by all the work that has been going into XML support at Qt level these last releases. Congrats !
Regarding the generation of C++ classes from XML schemas I came across codesynthesis.com which seemed pretty cool (though I did not really try it in the end). Would certainly be worth a look…

7 Tom July 31, 2009 at 9:23 pm
 

Great news!
Is it planned to support RELAX NG or Schematron in the future too?

8 Tobias Koenig July 31, 2009 at 10:40 pm
 

@Tom Yes, we have planned to implement support for further XML validation languages, however I guess the first language would be DTD as it is quite simple to transform a DTD into XML Schema and reuse the available XML Schema validation engine. Schematron could be implemented by using XSLT and the official Schematron stylesheets, however the XSLT implementation in Qt is still lacking the include functionality the stylesheets depend on… For Relax NG a new parser and transformer to XML Schema would be needed as well.

@all Thanks for the nice and supportive comments! :)

9 Dominik H August 1, 2009 at 4:44 pm
 

We once had a request for Kate’s XML completion plugin. It completes attributes and values based on a Meta DTD file. Normal DTD, XML Schema and friends are not supported.

Would it be possible to add XML completion support based on a given XML Schema? That requires some way to query the XML Schema for valid attributes and values.

Dominik

10 Boris August 2, 2009 at 8:29 pm
 

Interesting. I wonder how complete and robust is this XML Schema support? Have you tried to run it against the XML Schema Test Suite as well as on some large, real world schemas? I am asking because libxml2 folks also tried to add XML Schema support and it ended up being an incomplete, abandoned, and pretty much unusable implementation.

As for generating C++ classes from XML Schema, CodeSynthesis XSD (http://www.codesynthesis.com/products/xsd/) does this for standard C++ (full disclosure: I am working on the compiler). I can say that the task is quite a bit more complex than you make it sound, especially when you try to compile real-world schemas. I have seen schemas that have hundreds of files and result in thousands of generated C++ classes. But I am sure you will find all this out for yourself if you decide to take on the project ;-) . I actually considered implementing a Qt-specific mapping on a couple of occasions. Now that there is XML Schema support in Qt, this becomes much more realistic, provided, of course, the implementation is of good quality.

Boris

11 Tobias Koenig August 3, 2009 at 9:59 am
 

@Dominik The type information is still not part of the public API, so it is currently not possible to write such a plugin for autocompletion. For later releases we plan to provide a public API as well, but that needs a lot of API review first.

@Boris We used the W3C XML Schema Test Suite as unit tests for the XML validator and it currently passes around 98-99% of the tests, so I’d call the validator quite mature and more functional than the validator included in libxml2. Known issues are large minOccurs/maxOccurs values (the same issue as in Xerces), handling of very large or very small float/double values, smaller problems with identity constraints (because of the missing PSVI) and, since we use QRegExp for regular expressions, not all ‘features’ of XML Schema RegExp are supported. I know that code generation from XML Schema is not as easy as it sounds (I have worked on such a project already), but perfect integration with Qt is worth more than 100% perfect code generation. So we would start with an implementation for simple and moderately complex schemas and get better from release to release ;) Although generating XML parsing classes for web services is a long-term goal, my personal goal is to provide an easy way to automatically generate code for all those nasty little use cases where you have to parse an XML configuration file or some output from another application or from a REST service, where the associated schemas are simple but writing the code manually is boring and a waste of time.

12 Nils August 3, 2009 at 10:52 am
 

I would like to second Dominik’s wish. Any XML editor application based on Qt could make use of that. How many people have to raise their hands so that we might see that in 4.6 ;-) Nevertheless, seeing schema support coming to Qt is a great thing!

13 Scorp1us August 3, 2009 at 7:37 pm
 

Generating C++ classes is so yesterday. What we really need is a QMetaObject system that will create a proxy class on the fly which can be mapped to-from dynamically. Then for SOAP services, we can use signal/slots to communicate with remote servers, all bound and discovered at runtime.

This way you can have a SOAP or QService explorer and offer an in-application designer to connect Qt application services to other services. Otherwise, you have to recompile.

14 parallaxe August 4, 2009 at 4:56 pm
 

Great news, I have been waiting for this for as long as I’ve been working with XML in Qt :)

Scorp1us: I haven’t worked with SOAP, so I don’t know the normal use cases for it.
But for most use cases besides SOAP, I think generating classes will be fine and there’s no need for a proxy class.
Also, this way it does not have too much overhead.
So generating classes won’t be the perfect solution for SOAP, but it will be for many other things.

15 Stefan August 5, 2009 at 8:53 am
 

Great news. Good explaining example as well. I will use it soon :D

Best regards of one of your fellow students.

See you

16 Scorp1us August 5, 2009 at 2:42 pm
 

Well, SOAP services are described by a WSDL file (Web Services Description Language, which is XML) that enumerates the functions, their return types and their parameter types. In much the same way you would generate a CPP file from XML, you would set up a QObject-derived class and expose the web service as slots. Then an application can bind to this local object and connect or call its slots, which generates the SOAP request and blocks until the return value arrives. In this way you can interact with a web service as if it were any QObject-derived class.

While you could digest a WSDL file and create CPP classes that need compiling, my point was that it would be cool to do discovery at runtime and have Qt make a class on the fly for you. The compiled CPP files approach will work, but will be slightly less awesome.

17 DaveL August 6, 2009 at 4:47 am
 

Have there been any thoughts about using the XML Schema information (when present) to speed up the XQuery / XPath processing? I figure there’s got to be some pretty cool work in that direction.

18 PiotrDobrogost August 6, 2009 at 9:06 am
 

Job well done. Thanks for your work and your post.

The solution Scorp1us describes sounds very interesting and I think it should be taken into consideration before deciding what the model of web services support in Qt should look like.

I also second DaveL’s question, which is very interesting.

19 DaveL August 7, 2009 at 2:11 am
 

Another random thought.

uic converts the contents of schema compliant XML document to C++ code, yeah?

If there’s any common code that is used to generate C++ it’d be nice to see that reused and even nicer if it was made a first class citizen – I like the idea of being able to hook up an XML document (or a QLALR parsed file) to something that assists with code generation. I’m a little biased here, I’ve been wondering about the benefits of some clever and judicious QScript to C++ compilation… :)

Of course, you might end up on a road that ends with code generating plugins for qmake, which would then have to manage the running order of the various generators. That might complicate things more than Qt would like, I’m not sure. It’s nice to daydream about though :)

20 Tobias Koenig August 7, 2009 at 6:05 pm
 

@Scorp1us Using QMetaObject to have dynamic methods sounds like an interesting idea, however for web services methods are one part, the other is types. And dynamic types can only be created with some nested QVariant/QVariantMap constructs, which are limited compared to generated code. I also don’t see the advantage of ‘no recompile needed’: if the WSDL changes, you have to adapt your client application anyway as types or method signatures have changed, so recompiling the generated stubs together with the business logic code shouldn’t be an issue. And as already said, the short-term goal would be to parse XML files and have _native_ Qt/C++ code to work on the parsed data.

@DaveL Integration between XPath/XQuery and XML Schema is indeed a nice feature, however it needs larger rewrites and refactoring in both the XML Schema and the XQuery engines. The XML Schema code was developed nearly 2 years or more after the XQuery code, so the latter was not designed with possible XML Schema support in mind.

Comments on this entry are closed.
