QWebElement sees the light, do I hear a booyakasha!?

Posted by Tor Arne Vestbø on April 7, 2009 · 7 comments

One of the main missing parts of QtWebKit so far has been a proper way to inspect and manipulate the document structure. In JavaScript this is provided by the Document Object Model (DOM) bindings, which give you methods like getElementById(), createElement(), and insertBefore(). These methods were also accessible in Qt 4.4 and 4.5 through QWebFrame::evaluateJavaScript(), but that was hardly an optimal way of working with the document.

A proper API for this was deferred in the earlier versions of QtWebKit because we wanted to provide an API that was not only powerful, but also easy to use. That ruled out QDom, or a similarly exhaustive API, as customer feedback has shown that writing code at that level can be both tedious and error-prone.

Consider the following snippet:

QDomElement docElem = doc.documentElement();

QDomElement root = docElem.firstChildElement("database");
QDomElement entry = root.firstChildElement("entry");
for (; !entry.isNull(); entry = entry.nextSiblingElement("entry")) {
	QDomElement data = entry.firstChildElement("data");
	for (; !data.isNull(); data = data.nextSiblingElement("data")) {
		QDomAttr attribute = data.attributeNode("format");
		if (!attribute.isNull() && attribute.value() == "human") {
			cout << "Data:" << data.text() << endl;
		}
	}
}

Traversing the DOM like this quickly becomes hairy. The same pattern can be found in JavaScript code: using getElementById(), traversing children, looking for an element of a certain type, etc.

Another typical use case is manipulation, where you would do something like:

QDomElement foo = doc.elementById("foo");
QDomElement elem = doc.createElement("img");
elem.setAttribute("src", "myimage.png");
foo.parentNode().insertAfter(elem, foo);

Remember to keep your tongue straight when building that insertAfter() statement!

Some of you are probably jumping in your chairs now, screaming “this is not how you would do it in JavaScript!” And you’re right: the JavaScript world has found a way to shield itself from the above annoyances with wrapper libraries like jQuery and Prototype.

Taking inspiration from these libraries we did a one-day prototyping/hacking session back in November, and the results were promising. We also asked customers what they really meant when they were asking for a DOM API, and it turned out that the goal was to inspect and manipulate the DOM, but not necessarily through a one-to-one mapping of the API provided by the DOM specification. Last week we managed to reserve some cycles to continue the work, and we’re now ready for some feedback.

So how does the new shiny QtWebKit DOM API look? There are two classes: QWebElement and QWebElementSelection. The former wraps a DOM Element, and has methods for manipulation and traversal. The latter is just a list of QWebElements, which can be iterated and extended. The frosting lies in the way elements are selected: using CSS3 selector syntax (similar to jQuery). The implementation of the selectors is already part of WebKit, so we’re basically building on a tried and tested code base.

Using the two snippets above as examples, they would become:

QWebElement document = mainFrame.documentElement();
foreach (QWebElement humanData, document.findAll("database entry data[format='human']")) {
	cout << "Data:" << humanData.attribute("format") << endl;
}

and

document.findFirst("#foo").insertAfter("<img src='myimage.png'/>");

That’s a lot easier on my eyes at least.

The initial implementation of the new API has already landed in trunk, so feel free to try it out. You can find the API documentation here, but please note that this is a work in progress, so the API may change.

Oh, and here’s a screenshot of the QtLauncher highlighting all links:

[Screenshot: qtwebkit-dom-api.png]

Q: Is the DOM API targeted for Qt 4.6?
A: Yes

Happy hacking!

7 comments

1 Stephan Sokolow April 7, 2009 at 5:41 pm
 

As soon as PyQt 4.6 is out, I’m going to code until I drop. :)

2 Scorp1us April 7, 2009 at 6:05 pm
 

This is good, but how about us folk who write back-end software and don’t want to have to ship webkit?
I write a lot of software that manipulates XML docs (not HTML webpages) and I wouldn’t want to ship all of webkit. In fact, this should be usable with the XQuery/XPath stuff.

Really all I need is something like the Microsoft classes – which is what you’re essentially recreating here.
XmlDocument^ doc = gcnew XmlDocument;
doc->LoadXml(m_XMLin);
m_provideMsgNav = doc->CreateNavigator();
m_provideNSMgr = gcnew XmlNamespaceManager( m_provideMsgNav->NameTable);
m_provideMsgNav->Select(String::Format(path, docID), m_provideNSMgr);

3 koraf April 7, 2009 at 6:09 pm
 

Finally! This is really great news. When I have some spare time, I’ll build a snapshot and play around with it.

Keep on rocking dudes.

4 Travis April 8, 2009 at 9:32 am
 

Will it also be possible to register ‘Listeners’ which are called when the DOM changes and which element was modified/added/removed?

5 Tor Arne Vestbø April 8, 2009 at 10:31 am
 

@Scorp1us:

We’re researching the idea of integrating the API with other parts of Qt, such as QtXml and QtXmlPatterns.

@Travis:

That’s the plan, yes :)

6 Anon April 12, 2009 at 7:33 pm
 

Hi!

“That’s the plan, yes”

Further to this, would it be possible then to intercept elements that would be added to the DOM in the process of downloading & parsing the html page, so that they can be modified before becoming part of the DOM (or prevented from being added altogether, as a kind of more flexible “adblock”-type feature)? Currently, techniques for “customising” a downloaded webpage seem to perform required modifications only at set intervals during the downloading process, or only when the original page has been downloaded fully. It would be great to be able to re-sculpt elements the instant before or after they are added to the page proper!

Hope it’s clear what I mean ;)

7 ariya April 17, 2009 at 2:21 pm
 

@Anon: QWebElement is not designed for that purpose. An alternative way to “spy” on the passed HTML/content is via a custom network access manager. Then you can reject unwanted resources before they are delivered.

Comments on this entry are closed.
