Today’s topic is the raster engine, Qt’s software rasterizer. It’s the reference implementation and the only paint engine that implements every feature combination that QPainter offers.
History
The story of Qt’s software engine started around December 2004, if my memory serves me. My colleague Trond and I had been working for a while on the new painting architecture for Qt 4, codenamed “Arthur”. Trond had been working on the X11 and OpenGL 1.x engines, and I was focusing on the combined Win32 GDI/GDI+ engine along with QPainter and the surrounding APIs. We had introduced a few new features, such as antialiasing, alpha transparency for QColor, full world transformation support and linear gradients. Few of these new features were supported by GDI, so using any of them meant switching to GDI+. At the time GDI+ was insanely slow, at least on all the machines we had in the Oslo office back then. Even enabling GDI’s advanced graphics mode to do transformations was not very fast.
Then we came upon a toolkit called Anti-Grain Geometry (AGG), which did everything in software, in plain C++, and we were amazed at what it could do. Our immediate reaction was to curl up on the floor in agony, thinking that we had been going about this all wrong. Using the native APIs was not helping us at all. In fact, it was preventing us from getting the feature set we wanted with acceptable performance. Once we had settled down again, our first idea was to implement a custom AGG paint engine that would delegate all drawing into the AGG pipeline. But alas, AGG’s template-heavy API combined with the extremely generic QPainter API bloated into a pipeline that didn’t perform nearly as well as the demos we had seen.
So we took our Christmas vacation and started over in January 2005. I was still quite depressed that our new features didn’t perform and that we were limited to a minimal subset of the native APIs. I went to Matthias and Lars and asked for three weeks to hack together a software-only paint engine as a proof of concept. I got an “OK” and spent the following weeks implementing software pixmap transformation, bilinear filtering and clipping support, all in the crudest possible way. Three weeks later I had a running software paint engine and quite proudly announced that I was “just about done”. I’ve reconstructed an image of how I remember it:

The system clipping was all over the place and bitmap patterns were broken. Perhaps worst of all, all text was rendered using QPainterPaths and all drawing was antialiased. Despite not looking 100% right, the various features performed pretty well. It was agreed that this was a good start but needed more work, and so began the sprint for the Qt 4.0 beta a few months later.
The initial version released with Qt 4.0 worked quite well in terms of features, but in hindsight its performance was far from what our users demanded from Qt. As a result, we received a lot of criticism during the first year of Qt 4.0. Since then we’ve done a lot, and I mean a LOT. My gut feeling is that it is now the best-performing engine for average Qt usage, so I think we made a good choice back then in dropping GDI and GDI+. And, as I outlined in my previous post, we are toying with the idea of making raster the default on all desktop systems for the sake of speed and consistency.
Overall structure
All drawing in the engine is decomposed into spans: horizontal runs of pixels on a scanline, each with a coverage value. Together, the spans form the “mask” of a shape, and each pixel inside the mask is filled using a span function.
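The span idea can be sketched in a minimal, self-contained form. The `Span` struct, the `SpanFunc` signature and `solidSpans` are hypothetical names chosen for illustration. They loosely follow the description above and are not copied from Qt’s internals. Colors are 0xAARRGGBB premultiplied values.

```cpp
#include <cstdint>

// One horizontal run of pixels on a scanline with a coverage value
// (0 = not covered, 255 = fully covered). Hypothetical layout.
struct Span {
    int x, len, y;
    uint8_t coverage;
};

// A span function consumes a batch of spans plus opaque per-brush data.
using SpanFunc = void (*)(int count, const Span *spans, void *userData);

// Exact rounding of a * b / 255 for a, b in [0, 255].
inline uint32_t mul255(uint32_t a, uint32_t b) {
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

// Scale all four channels of a premultiplied ARGB32 pixel by a / 255.
inline uint32_t byteMul(uint32_t p, uint32_t a) {
    return (mul255(p >> 24, a) << 24) | (mul255((p >> 16) & 0xff, a) << 16) |
           (mul255((p >> 8) & 0xff, a) << 8) | mul255(p & 0xff, a);
}

// Source-over for premultiplied pixels: dst' = src + dst * (1 - srcAlpha).
inline uint32_t srcOver(uint32_t src, uint32_t dst) {
    return src + byteMul(dst, 255 - (src >> 24));
}

// Per-brush span data for a solid fill: where to draw and with what color.
struct SolidData {
    uint32_t *pixels;  // destination, width * height premultiplied pixels
    int width;
    uint32_t color;    // premultiplied ARGB32
};

// Solid-fill span function: scale the color by coverage, then blend.
void solidSpans(int count, const Span *spans, void *userData) {
    auto *d = static_cast<SolidData *>(userData);
    for (int i = 0; i < count; ++i) {
        const Span &s = spans[i];
        const uint32_t src = byteMul(d->color, s.coverage);
        uint32_t *out = d->pixels + s.y * d->width + s.x;
        for (int j = 0; j < s.len; ++j)
            out[j] = srcOver(src, out[j]);
    }
}
```

Every drawing call then reduces to "produce spans, hand them to a SpanFunc with the right data". The only thing a brush contributes is which function and which data get passed.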

The image highlights one scanline of a polygon filled with a linear gradient. There are 4 spans: one that fades in the opacity of the polygon, one fully covered span across the interior, and two that fade the opacity out again. For each pixel in the polygon, the gradient function is called and the resulting pixel is written to the destination. The pixel is alpha blended if the coverage value is less than full opacity or if the pixel from the gradient function contains alpha.
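The per-pixel work described above can be sketched as a span function for a horizontal linear gradient. This is an illustrative model, not Qt’s gradient code. The gradient runs from `c0` at `x0` to `c1` at `x1` (hypothetical fields) and is evaluated once per pixel. The result is then either written directly or blended, following the "only blend when needed" rule from the paragraph.

```cpp
#include <algorithm>
#include <cstdint>

struct Span { int x, len, y; uint8_t coverage; };

// Exact rounding of a * b / 255 for a, b in [0, 255].
inline uint32_t mul255(uint32_t a, uint32_t b) {
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}
inline uint32_t byteMul(uint32_t p, uint32_t a) {
    return (mul255(p >> 24, a) << 24) | (mul255((p >> 16) & 0xff, a) << 16) |
           (mul255((p >> 8) & 0xff, a) << 8) | mul255(p & 0xff, a);
}
// Interpolate two premultiplied pixels: c0 * (1 - t) + c1 * t, t in [0, 255].
inline uint32_t lerp(uint32_t c0, uint32_t c1, uint32_t t) {
    return byteMul(c0, 255 - t) + byteMul(c1, t);
}

// Hypothetical span data for a horizontal linear gradient.
struct LinearGradientData {
    uint32_t *pixels; int width;   // destination
    int x0, x1;                    // gradient start/end along x (x1 > x0)
    uint32_t c0, c1;               // premultiplied end colors
};

void linearGradientSpans(int count, const Span *spans, void *userData) {
    auto *g = static_cast<LinearGradientData *>(userData);
    for (int i = 0; i < count; ++i) {
        const Span &s = spans[i];
        uint32_t *out = g->pixels + s.y * g->width + s.x;
        for (int j = 0; j < s.len; ++j) {
            // The "gradient function": position along the gradient, padded.
            int t = (s.x + j - g->x0) * 255 / (g->x1 - g->x0);
            t = std::min(std::max(t, 0), 255);
            const uint32_t src = lerp(g->c0, g->c1, uint32_t(t));
            if (s.coverage == 255 && (src >> 24) == 255) {
                out[j] = src;  // opaque pixel, full coverage: plain store
            } else {
                const uint32_t c = byteMul(src, s.coverage);
                out[j] = c + byteMul(out[j], 255 - (c >> 24));  // source-over
            }
        }
    }
}
```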
Clipping uses the same mechanism. The clipping span function takes the incoming spans, intersects them with the set of spans that defines the clip, and passes the result on to the actual fill span function.
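Clipping as a chained span function can be sketched as follows. The names are hypothetical. The intersection is a deliberately naive loop over all clip spans; a real implementation would exploit the fact that both span lists are sorted by scanline. Coverage values are multiplied, so a partially covered clip edge attenuates the fill.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Span { int x, len, y; uint8_t coverage; };
using SpanFunc = void (*)(int count, const Span *spans, void *userData);

// Exact rounding of a * b / 255 for a, b in [0, 255].
inline uint32_t mul255(uint32_t a, uint32_t b) {
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

// Span data for the clip stage: the clip region plus the real fill function.
struct ClipData {
    std::vector<Span> clip;   // spans describing the clip region
    SpanFunc next;            // the actual fill span function
    void *nextData;           // its span data
};

void clipSpans(int count, const Span *spans, void *userData) {
    auto *c = static_cast<ClipData *>(userData);
    std::vector<Span> out;
    for (int i = 0; i < count; ++i) {
        const Span &s = spans[i];
        for (const Span &k : c->clip) {
            if (k.y != s.y) continue;
            const int x0 = std::max(s.x, k.x);
            const int x1 = std::min(s.x + s.len, k.x + k.len);
            if (x1 > x0)
                out.push_back({x0, x1 - x0, s.y, uint8_t(mul255(s.coverage, k.coverage))});
        }
    }
    if (!out.empty())
        c->next(int(out.size()), out.data(), c->nextData);
}

// Example downstream span function that just records what it receives.
void recordSpans(int count, const Span *spans, void *userData) {
    auto *v = static_cast<std::vector<Span> *>(userData);
    v->insert(v->end(), spans, spans + count);
}
```

Because the clip stage has the same signature as any fill, it can be slotted in front of any brush without the brush knowing about it.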

All operations follow this pattern. When a drawRect call comes in, we generate a list of spans, one per scanline, and set up a span function according to the current brush. A pixmap is similar: we create a list of spans and use a pixmap span function. A polygon is passed to a scan converter, which produces a span list, and so on. We have two scan converters, one for antialiased and one for aliased drawing. The antialiased one is pretty much a fork of FreeType’s grayraster.c with some minor tweaks; I think we needed to add support for odd-even fills, for instance. Text is also converted into spans.
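For the aliased case, a polygon scan converter can be sketched as below. This is a textbook algorithm that samples at pixel centers and applies the odd-even fill rule. It is written for illustration and is not a copy of either of Qt’s scan converters. It does no antialiasing and no clamping to the destination width. Multiple contours are accepted, so a shape with a hole produces two spans on the scanlines that cross the hole.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Point { double x, y; };
struct Span { int x, len, y; uint8_t coverage; };

// Aliased scan conversion with the odd-even fill rule. Pixel (px, py) is
// inside if its center (px + 0.5, py + 0.5) lies inside the shape.
std::vector<Span> scanConvertOddEven(const std::vector<std::vector<Point>> &contours,
                                     int height) {
    std::vector<Span> spans;
    std::vector<double> xs;  // edge crossings on the current scanline
    for (int y = 0; y < height; ++y) {
        const double cy = y + 0.5;
        xs.clear();
        for (const auto &poly : contours) {
            const std::size_t n = poly.size();
            for (std::size_t i = 0; i < n; ++i) {
                Point a = poly[i], b = poly[(i + 1) % n];
                if (a.y == b.y) continue;              // horizontal edges never cross
                if (a.y > b.y) std::swap(a, b);
                if (cy < a.y || cy >= b.y) continue;   // half-open [top, bottom)
                xs.push_back(a.x + (cy - a.y) * (b.x - a.x) / (b.y - a.y));
            }
        }
        std::sort(xs.begin(), xs.end());
        // Odd-even: pixels between crossings 0-1, 2-3, ... are inside.
        for (std::size_t i = 0; i + 1 < xs.size(); i += 2) {
            const int x0 = int(std::ceil(xs[i] - 0.5));
            const int x1 = int(std::ceil(xs[i + 1] - 0.5));  // exclusive
            if (x1 > x0) spans.push_back({x0, x1 - x0, y, 255});
        }
    }
    return spans;
}
```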
Lines, Polylines and Path Strokes
These primitives are passed to a separate processor called a stroker. The stroker creates a new fillable path that visually matches the shape the outline represents; there is a public API for this too, QPainterPathStroker. This fillable shape is then passed to one of the scan converters, which in turn converts it into spans. Dashed outlines go through the same process, but the resulting fillable shape is a path with a potentially very large number of subpaths. Naturally, such a path is costly to scan convert, which is part of the reason we explicitly do not put dashed lines on the list of high-performance features. In fact, line dashing is in many cases one of the slowest operations available in the raster engine, so use it with extreme caution.
A hacky alternative that performs much better is to set up a 2×2 black/white or black/transparent pixmap brush and draw the stroke using a pen with that brush. It takes a bit more setup, but if that’s what it takes to get it running fast, then that’s what it takes.
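In Qt terms, the hack means constructing a QBrush from a 2×2 QPixmap and passing it to the QPen constructor that takes a brush. A sketch of what a tiled pattern fill does at the span level shows why this is cheap. The stroke stays one solid shape for the scan converter, and the pattern costs only a modulo lookup per pixel. The names below are hypothetical. The sketch assumes an opaque pattern, full coverage and non-negative coordinates.

```cpp
#include <cstdint>

struct Span { int x, len, y; uint8_t coverage; };

// Hypothetical span data for a tiled pattern brush.
struct TiledData {
    uint32_t *pixels; int width;              // destination
    const uint32_t *tile; int tileW, tileH;   // repeating pattern
};

// Pattern fill: each destination pixel picks its color from the tile by
// wrapping its coordinates, so no extra geometry is ever generated.
void tiledSpans(int count, const Span *spans, void *userData) {
    auto *d = static_cast<TiledData *>(userData);
    for (int i = 0; i < count; ++i) {
        const Span &s = spans[i];
        const uint32_t *row = d->tile + (s.y % d->tileH) * d->tileW;
        uint32_t *out = d->pixels + s.y * d->width + s.x;
        for (int j = 0; j < s.len; ++j)
            out[j] = row[(s.x + j) % d->tileW];
    }
}
```

With a 2×2 black/white checker tile, any stroke painted this way comes out as alternating pixels, which reads much like a dotted line.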
State changes
Any setBrush, setTransform or other state change on QPainter results in a different set of span functions being set up. Each brush has a special span function associated with it, and we also pass per-brush span data. You could also say "fill type", since at this level pens are essentially just fills too. For solid color fills the span data contains the color. For transformed pixmap drawing it contains the inverse matrix, a source pixel pointer, bytes per line and other required information. For clips it contains the span function to call after the spans have been clipped. The thing to notice about state changes is that these structures need to be updated each time you switch from one brush or transformation to another. Up to Qt 4.4 this was in many cases a noticeable performance problem, showing up at 10-15% in profilers when rendering graphics view scenes. Since 4.5 the impact has been minimal.
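Transformed pixmap span data shows why a setTransform is more than a pointer swap. The span function walks destination pixels and maps each one back into the source, so the state change has to compute the inverse matrix up front. The sketch below is hypothetical and uses nearest-neighbour sampling with no filtering, coverage or alpha. The matrix fields follow QTransform’s affine convention (x' = m11·x + m21·y + dx, y' = m12·x + m22·y + dy), but none of this is Qt’s code.

```cpp
#include <cmath>
#include <cstdint>

struct Span { int x, len, y; uint8_t coverage; };

// Affine part of a 2D transform: x' = m11*x + m21*y + dx, y' = m12*x + m22*y + dy.
struct Affine { double m11, m12, m21, m22, dx, dy; };

// Invert an affine transform; returns false if it is singular.
bool invert(const Affine &t, Affine &r) {
    const double det = t.m11 * t.m22 - t.m12 * t.m21;
    if (det == 0.0) return false;
    r.m11 = t.m22 / det;
    r.m12 = -t.m12 / det;
    r.m21 = -t.m21 / det;
    r.m22 = t.m11 / det;
    r.dx = (t.m21 * t.dy - t.m22 * t.dx) / det;
    r.dy = (t.m12 * t.dx - t.m11 * t.dy) / det;
    return true;
}

// Hypothetical span data for transformed pixmap drawing.
struct TransformedPixmapData {
    uint32_t *pixels; int width;                      // destination
    const uint32_t *src; int srcW, srcH, srcStride;   // stride in pixels
    Affine inv;                                       // device -> source mapping
};

// The "state change": must run again whenever the transform changes.
bool setupTransformedPixmap(TransformedPixmapData &d, const Affine &worldTransform) {
    return invert(worldTransform, d.inv);
}

void transformedPixmapSpans(int count, const Span *spans, void *userData) {
    auto *d = static_cast<TransformedPixmapData *>(userData);
    const Affine &m = d->inv;
    for (int i = 0; i < count; ++i) {
        const Span &s = spans[i];
        // Map the center of the first pixel, then step incrementally along x.
        const double cx = s.x + 0.5, cy = s.y + 0.5;
        double sx = m.m11 * cx + m.m21 * cy + m.dx;
        double sy = m.m12 * cx + m.m22 * cy + m.dy;
        uint32_t *out = d->pixels + s.y * d->width + s.x;
        for (int j = 0; j < s.len; ++j, sx += m.m11, sy += m.m12) {
            const int px = int(std::floor(sx)), py = int(std::floor(sy));
            if (px >= 0 && px < d->srcW && py >= 0 && py < d->srcH)
                out[j] = d->src[py * d->srcStride + px];
        }
    }
}
```

Alternating between two transforms in a loop forces the setup to run on every switch. The per-pixel loop, by contrast, only ever adds two constants.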
Well, perhaps not minimal compared to drawing a 2-pixel-long line, but minimal compared to filling a 64×64 rectangle. The point is that although the raster engine probably handles state changes better than any of our other engines, there are use cases where state changes still show up, so they should still be minimized.
Span functions
The task of the span functions is to generate pixels and combine them with the destination according to the current state of the painter. The raster engine supports rendering to all of our image formats except 8-bit indexed, but internally it does all rendering in ARGB32_Premultiplied. Premultiplied alpha has the benefit that we don’t have to multiply the alpha into the color channels during blending, and it saves us a division. The reason for doing all rendering in one format is that the alternative simply doesn’t scale. Just think of the number of composition modes, multiplied by the number of formats a source image can have, multiplied by the number of formats the destination can have. To support all combinations we use a generic approach, which does the following for each span:
- Get the source pixels, e.g. from a gradient, pixmap, image or solid color, and convert them to ARGB32_Premultiplied.
- Get the destination pixels and convert them to ARGB32_Premultiplied.
- Blend the source into the destination using the current composition mode.
- Convert the result to destination format and write it back.
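The four steps can be sketched as a generic span blender. Everything here is hypothetical and simplified. The source is a solid premultiplied color and the destination is RGB16 (5-6-5). Source-over is the only composition mode provided. Pixels are processed in fixed-size chunks so the temporary buffers stay small.

```cpp
#include <algorithm>
#include <cstdint>

// Exact rounding of a * b / 255 for a, b in [0, 255].
inline uint32_t mul255(uint32_t a, uint32_t b) {
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}
inline uint32_t byteMul(uint32_t p, uint32_t a) {
    return (mul255(p >> 24, a) << 24) | (mul255((p >> 16) & 0xff, a) << 16) |
           (mul255((p >> 8) & 0xff, a) << 8) | mul255(p & 0xff, a);
}

// RGB16 (5-6-5) -> opaque ARGB32_Premultiplied, replicating high bits.
inline uint32_t rgb16ToArgb32(uint16_t p) {
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}
// ARGB32 -> RGB16. RGB16 has no alpha, so this assumes an opaque result.
inline uint16_t argb32ToRgb16(uint32_t p) {
    return uint16_t(((p >> 8) & 0xf800) | ((p >> 5) & 0x07e0) | ((p >> 3) & 0x001f));
}

// Composition function: blend len source pixels into dst, scaled by coverage.
using CompFunc = void (*)(uint32_t *dst, const uint32_t *src, int len, uint32_t coverage);

void compSourceOver(uint32_t *dst, const uint32_t *src, int len, uint32_t coverage) {
    for (int i = 0; i < len; ++i) {
        const uint32_t s = byteMul(src[i], coverage);
        dst[i] = s + byteMul(dst[i], 255 - (s >> 24));
    }
}

// Generic path for one span: fetch, convert, blend, convert back, store.
void blendSpanGeneric(uint16_t *dstLine, int x, int len, uint8_t coverage,
                      uint32_t solidColor, CompFunc comp) {
    constexpr int kBufferSize = 64;
    uint32_t srcBuf[kBufferSize], dstBuf[kBufferSize];
    while (len > 0) {
        const int n = std::min(len, kBufferSize);
        // 1. Source pixels as ARGB32_Premultiplied (a solid color: just fill).
        std::fill(srcBuf, srcBuf + n, solidColor);
        // 2. Destination pixels converted to ARGB32_Premultiplied.
        for (int i = 0; i < n; ++i) dstBuf[i] = rgb16ToArgb32(dstLine[x + i]);
        // 3. Blend using the current composition mode.
        comp(dstBuf, srcBuf, n, coverage);
        // 4. Convert back to the destination format and store.
        for (int i = 0; i < n; ++i) dstLine[x + i] = argb32ToRgb16(dstBuf[i]);
        x += n;
        len -= n;
    }
}
```

Adding a new destination format only needs a new pair of conversion functions. Adding a composition mode only needs a new CompFunc. That is the scaling argument in code form, paid for with two conversions per pixel.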
This may seem like a lot of work, but luckily the story doesn’t end there.
Special casing and Optimizations
As I outlined in the QPainter documentation patch I added recently, which was the start of this blog series, it’s all about defining which scenarios we want to be fast and which scenarios just need to work. In the years since the initial release of the raster engine in the summer of 2005, we’ve added tons of special cases for the functions we see being called most often and with the most impact.
QPainter::fillRect and through QPainter::drawRect. In 4.4 both of these implied a state change. Actually, fillRect implied two state changes because it set the brush to what was passed to fillRect and then set it back to what the painter state was. In 4.5, as part of this Falcon project, we introduced a new internal QPaintEngine subclass which supports a state-less fillRect with a color. This matches how applications normally use the painter anyway.
- ARGB32_Premultiplied on ARGB32_Premultiplied
- ARGB32_Premultiplied on RGB32
- ARGB32_Premultiplied on RGB16
- ARGB8565_Premultiplied on RGB16
- RGB32 on RGB32
- RGB16 on RGB16
I think that was all of them.
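The special-casing amounts to a dispatch table. For the listed source/destination format pairs, a dedicated loop replaces the generic convert-blend-convert path. The sketch below is hypothetical and fills in only three of the listed pairs. A null entry means "fall back to the generic path". Two of the fast paths show a typical win: an opaque source over an opaque destination of the same format degenerates into a plain copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum Format { ARGB32_Premultiplied, RGB32, RGB16, ARGB8565_Premultiplied, FormatCount };

// A fast-path blend: len pixels of src onto dst, both in their native formats.
using BlendFunc = void (*)(void *dst, const void *src, int len);

// Opaque on opaque: source-over is just a copy.
void blendRgb32OnRgb32(void *dst, const void *src, int len) {
    std::memcpy(dst, src, std::size_t(len) * sizeof(uint32_t));
}
void blendRgb16OnRgb16(void *dst, const void *src, int len) {
    std::memcpy(dst, src, std::size_t(len) * sizeof(uint16_t));
}
// Premultiplied source-over with no format conversion at all.
void blendArgb32pmOnArgb32pm(void *dst, const void *src, int len) {
    auto *d = static_cast<uint32_t *>(dst);
    auto *s = static_cast<const uint32_t *>(src);
    for (int i = 0; i < len; ++i) {
        const uint32_t ia = 255 - (s[i] >> 24);
        uint32_t r = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            const uint32_t t = ((d[i] >> shift) & 0xff) * ia + 128;
            r |= ((t + (t >> 8)) >> 8) << shift;   // rounded channel * ia / 255
        }
        d[i] = s[i] + r;
    }
}

struct FastPathTable {
    BlendFunc f[FormatCount][FormatCount] = {};   // null = use generic path
    FastPathTable() {
        f[ARGB32_Premultiplied][ARGB32_Premultiplied] = blendArgb32pmOnArgb32pm;
        f[RGB32][RGB32] = blendRgb32OnRgb32;
        f[RGB16][RGB16] = blendRgb16OnRgb16;
    }
};

BlendFunc fastPath(Format src, Format dst) {
    static const FastPathTable table;   // built once, on first use
    return table.f[src][dst];
}
```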
That is a lot of detail, but it gives an idea of what to consider when writing code for this engine. If all you draw is 1024×1024 pixmaps, none of this matters, because all the time is spent in the span function that does pixmap blending anyway. But the second you have more content, such as several smaller lines and polygons, these things become critical for good performance.
The overall performance of the engine, when used as outlined above, can be thought of as:
Overhead + O(pixelsTouched * memoryAndBusCapacity)
There is nothing scientific about that formula, but when you’re hitting the optimal path, all time should be spent in one of the many for loops inside qdrawhelper_xxx.cpp or, even better, qblendfunctions.cpp. These loops spend all their time on per-pixel processing. If these functions could be made faster by implementing the algorithms slightly differently, then great. But say your profiler shows all the time being spent in, for instance, qt_blend_argb32_on_argb32. That means you told us to blend alpha pixmaps together, we’re doing it as fast as we can, and there is zero loss between your application and the actual processing. If all time is spent processing pixels, that is a good thing. The overhead here is the time spent in state changes, function calls and similar.
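The rough formula can be turned into a toy model that shows why overhead dominates small primitives and pixel work dominates large ones. Here the formula’s second term is read as pixels times a per-pixel cost bounded by memory and bus throughput. That gives a cost per call of `overhead + pixels * costPerPixel`. The function name and any numbers plugged into it are assumptions for illustration, not measurements from the charts.

```cpp
// Toy cost model: seconds per call = overhead + pixels * costPerPixel.
// Returns calls per second for a w x h primitive.
double opsPerSecond(double overhead, double costPerPixel, int w, int h) {
    return 1.0 / (overhead + costPerPixel * double(w) * double(h));
}
```

Assume, for example, 100 ns of overhead and 1 ns per pixel. Doubling a 4×4 rectangle in each direction then loses less than half the throughput, while doubling a 512×512 one drops it to almost exactly a quarter. Adding a few hundred nanoseconds of state-change overhead per call cuts 4×4 throughput several times over but barely moves 128×128.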
Some numbers
I got some feedback on one of the previous posts that a few bar charts would be nice, so here are some numbers on the kind of throughput the raster paint engine can achieve. I’ve timed it on both my Windows desktop machine and my N900 for comparison. The operations range from several million per second to only a few hundred, so the scale is logarithmic; keep that in mind as you look at them.

As you can see, the fill rate is more or less tied to the number of pixels involved. Some operations take a little longer; drawPixmap with scaling, for instance, is somewhat slower than drawPixmap without. Still, the rough formula I gave above holds quite often: double the size of the primitive in each direction and you get one quarter of the performance. It was also not my intention to trick you by using different numbers for drawPixmap; that’s just how the test was set up.
If you compare the three 4×4 rectangle drawing versions, you see that they differ when the rectangles are small. drawRect without a brush change is fastest at around 7.4 Mops/sec, followed by fillRect at ~6.1 Mops/sec and then drawRect with a brush change at 1.8 Mops/sec. At 128×128 there is only a small difference between them, which is what I was getting at with the state changes above. You can do state changes, and if you’re drawing semi-large areas it doesn’t matter. But if you’re plotting pixels, drawing loads of small lines here and there, or doing particle effects with 8×8 pixmaps, you want to do that in a tight loop with nothing else happening.
You can also see that non-smooth scaling holds its own against non-scaled pixmap drawing.
Finally, if you compare the N900 to the Windows desktop machine, you see that although the desktop processor is only about 4 times faster, the N900 is often around 10 times slower. Why? Because the CPU isn’t the only limitation; bus and memory capacity are also limiting factors. To be honest, it’s not a fair comparison…
I hope you enjoyed this post and more will come in 2010.
19 comments
Great post again
You say: “Only the windows version of the raster engine supports drawing glyphs at rotated angles using the fast paths, so beware of that.”
Well, in Qt 4.4 (or 4.5?) under OSX, drawing text at 90° was in fact extremely slow (but nice). I measured again today with Qt 4.6: it is many times faster than before, almost “normal”. But the text does not look very good anymore (worse than anti aliasing).
I know you plan to make the raster engine the default for OSX in Qt 4.7. I hesitate to use the raster engine today with OSX and Qt 4.6. Any advice on pros and cons?
Philippe: By “not good anymore”, do you mean that the glyphs are only gray antialiased and become somewhat blurry when the font is small? This is because transformed text hits the “slow” path, which converts it to a QPainterPath that is then filled.
The text drawing in the raster engine was changed in 4.5 for all platforms to use this “glyph cache” method, meaning we extract the natively rendered glyph image only once. Special support is required in the font engines to provide glyphs rasterized with transformations and only the windows font engine has this capability currently. For raster to be default on Mac we should and probably will add similar functionality to the Cocoa font engine.
Very interesting post indeed, thanks Gunnar.
I’ve been using the “-graphicssystem raster” *buildtime* option since it became available, so all of my Qt runs in that mode. Yes, I know it isn’t recommended yet to use it as the default, but I have nothing but good experience with both performance and compatibility. Only Kolourpaint likes to be started with “-graphicssystem native”, and it’s a trollsend that you can change that via a simple command line argument.
What I would like to read/learn about is the path the pixels take after they are rendered to a raster memory buffer. With raster being a pure software method, you cannot render directly to the screen’s DRAM (or do you, in case of UMA?), so you need to copy them to the screen. How are pixels blitted to the screen, and how many copies/blits/conversions are required depending on the graphics driver/window manager/compositing mode? On my system, this seems to be the bottleneck.
As of 4.6.0, only the native graphics engine on Mac OS X 10.6 produces good quality text. Text from the raster engine is much too thin and light. (The opengl produces good quality text, but the widgets actually seem to be missing except for their text.)
Yes, “too thin and light” is how I would describe vertical text in “raster” mode under OSX. This can be seen with the Qt demo “Main window”, where you can dock and undock widgets. Rebuild this example with QApplication::setGraphicsSystem(QLatin1String("raster")) added at the start.
Then run it and select the menu “Dock Widgets > Red > Vertical title bar”.
The widget will first turn white (a bug in raster mode only), but anyway, drag and toggle this widget between floating and docked states. In the docked state, the font is too light and too thin. In floating mode, the caption bar draws the text correctly (I guess OSX does the painting there and raster is not used).
This example is not too shocking, I admit, but I have seen worse cases with other font sizes.
It would be perfect if you prepared a small PDF booklet collecting all the blog posts.
Somehow this could extend your Quarterly series.
@gunnar: blurry? No. “Thin and light” is also how I would describe vertical text in OSX with the raster engine.
Example: take the “Main Window” Qt demo (the one where you can dock colored widgets). Build and run it with the raster engine. Select the menu “Dock widgets > Red > Vertical title bar”.
First you will get a blank widget (a bug in raster mode, by the way), but anyway, drag the widget to make it float, then dock it. The vertical text in the caption bar looks different in the docked state (raster) and in floating mode (Mac painting, I guess). This example is not dramatic, but I have seen bigger differences with other fonts in other contexts.
This is very interesting stuff. What level of AA do you use? 8×8?
Philippe: I’ve seen this effect in the past, but I thought all those “thin” fonts were ironed out. Do you have a non-apple screen by any chance? I’ll have a look at reproducing this once I get back from vacation.
Brandon: The antialiasing in the raster engine is not the conventional GL-style YxZ multisampling. It uses the FreeType gray rasterizer, which accumulates coverage per scanline for each primitive and produces 256 levels of antialiasing from that.
I like Qt, it’s good and well designed, but for my work I needed a library that is really fast, and there Qt disappointed me.
I was especially interested in improving text rendering time. With the library I am using, it takes 0.5 ms to render antialiased text of a reasonable size.
I was shocked to see the same text take 40 ms in Qt, which is nearly 80 times slower.
Jude: that sounds like a lot. What is your usecase?
I’m glad you mentioned AGG.
Qt, relying on libraries and NIH syndrome… a long long story apparently
@gunnar: Yes, I only have (good) non-apple screens. Do you mean Qt behaves differently if an Apple screen is detected?
Again, I’m speaking of 90° rotated fonts only.
I wanted to clarify that my comment about text looking much too thin and light using -graphicssystem native applies to normal horizontal text, such as in the orderform demo:
http://img34.imageshack.us/img34/8476/orderform.png
mathpup/Philippe: The bug in the image looks very much like a bug we fixed a month or two back. It was visible only on non-Apple screens as Mac OS X can decide to render glyphs differently for some screens. I’ll check why this problem has resurfaced after new years.
I think I solved the problem for my particular situation. I apparently had AppleFontSmoothing (in the defaults database) set to 4, which affected Qt’s text rendering but not the Apple native text rendering.
BTW: Apple changed the way “font smoothing” is set up in the transition from Leopard to Snow Leopard. It’s now an off/on setting: off sets AppleFontSmoothing to 0 in the defaults database, whereas enabling font smoothing actually deletes AppleFontSmoothing from the database. (Leopard permitted setting a value from 0 to 3. Snow Leopard still understands this range of values, but only on/off can be selected in System Preferences.)
One more thing: the letter spacing seems to be a little off in some situations, regardless of the graphics system. In my previous screenshot, the word “receive” is particularly bad. (Ignore the now-solved darkness of text problem.)
The text rendered by Qt is on the left. The text captured from TextEdit.app is on the right.
http://img254.imageshack.us/img254/5472/qtlefttexteditright.png
Will you be adding support for drawing on 8bit indexed images?
steveg: We will not add support for rendering into 8-bit indexed images. The output would be dithered and slow, and the color table would be fixed ahead of time, so the colors could end up all over the place. We have thought about supporting grayscale-only or alpha-only 8-bit, but it’s not on our current roadmap.