Using hardware acceleration for graphics

Posted by TomCooksey on March 13, 2009 · 16 comments

I am one of our QWS developers. QWS, the Qt Window System, is the heart of Qt for Embedded Linux, formerly Qtopia Core, formerly Qt/Embedded. :-) What’s great about working on embedded is that you have a view of the system as a whole – the complete stack. So it’s my job to have a pretty good idea of how the QPainter commands you write in a widget’s paint event end up as voltage levels rapidly alternating between +3.3 V and 0 V on the wires going to your LCD. While QWS handles all the usual window system tasks such as keyboard focus and mouse events, the biggest component is probably graphics. The window system is inherently part of the graphics stack and I actually spend a lot of my time working with the graphics team.

Over the last year our clear focus has been on performance. This performance push has been in all areas, on all platforms and architectures. When it comes to graphics, there’s a very broad range of hardware Qt can expect to see – from a simple MMU-less ARM with only a framebuffer all the way to scary gamer PCs with thousands of graphics cards & neon lights installed. Our lives are made complicated because, if we’re not careful, we can end up writing code which runs faster on that little ARM than it does on the gamer PC. This post hopes to explain why this is so.

Let’s begin by categorising the range of hardware available:

First, we need to differentiate Unified Memory Architecture (UMA) devices from those with dedicated graphics memory. Generally, high-end hardware will have dedicated graphics memory whereas low-end devices will just use system memory (sometimes reserving a memory region, sometimes not). This is pretty straightforward on PCs – you can almost tell from a PC’s price tag whether it has dedicated graphics memory. Sadly, in the world of embedded devices, this is not the case: high-end devices often have UMA and low-end devices (especially set-top boxes) have dedicated graphics memory.

The next differentiation is the graphics operations supported by the hardware. Generally they are wide ranging but can be loosely categorised as:

1) No acceleration (framebuffer only)
2) Blitter & alpha-blending hardware
3) Path based 2D vector graphics
4) Fixed-function 3D
5) Programmable 3D

Hardware with no acceleration whatsoever, or with just a simple video overlay, is the most common type we see in embedded devices. This will always be the case until someone figures out how to design and manufacture silicon for free. Blitter and alpha-blending hardware is almost non-existent on desktops these days, but it does still seem to be around in the current generation of embedded hardware. Path-based 2D vector graphics hardware is pretty new and looks ready to replace blitter-only style hardware. NOTE: This does not refer to hardware which can draw a 1-pixel wide, non-anti-aliased, non-dashed, solid-colour line without clipping. Fixed-function 3D tends to be the older generation of desktop graphics processors. Generally, fixed function has pretty much been replaced with programmable 3D. This is even the case on mobile hardware.

So, there are five categories of operations and two types of memory architecture, leading to ten different overall types of graphics hardware. I’ve collected an example of each, just so you know we don’t make this stuff up. :-)

Type             UMA                 Non-UMA
None             Marvell PXA270      Various*
Blitter          NXP PNX8935**       Fujitsu Lime MB86276***
2D vector        Freescale i.MX35
Fixed-3D         Freescale i.MX31    nVidia GeForce 2
Programmable-3D  TI OMAP3530         AMD Radeon HD 4600

* Various: Some devices use dedicated framebuffer memory to reduce load on the system memory bus
** NXP PNX8935: http://www.nxp.com/applications/set_top_box/ip_stb/stb225/
*** Fujitsu Lime MB86276: http://www.fujitsu.com/downloads/MICRO/fma/pdf/MB86276.pdf

The next question then becomes: how can Qt off-load graphics operations to these different types of hardware? Well, this is done through Qt’s QPaintEngine API. The idea is that Qt applications (& Qt itself of course) always use QPainter, which in turn uses one of the paint engines. To take advantage of graphics acceleration, we write a new paint engine (like the OpenGL ES 2.0 engine we’ve added in 4.5.0). The advantage is that existing applications can benefit from new rendering back ends and new applications can still work on older or less advanced hardware (albeit with lower performance). There seems to be a misconception in the community that Qt is out-of-date because it has no OpenGL scene graph API. While that statement is technically correct, Qt does have the QGraphicsView scene graph API, which uses QPainter. Because it uses QPainter, if OpenGL (for example) is available, it can be used as the rendering back end.
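To make the layering concrete, here is a minimal sketch in plain C++ of the idea described above. It does not use Qt: `PaintEngine`, `RasterEngine`, `GlEngine`, `Painter` and `paintWidget` are hypothetical stand-ins invented for illustration, and the real QPaintEngine interface is considerably richer. The only point is that the drawing code is written once against the painter and is unaware of which back end executes it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a rendering back end. This is NOT Qt's real
// QPaintEngine interface, just the same shape in miniature.
class PaintEngine {
public:
    virtual ~PaintEngine() = default;
    virtual void drawRect(int x, int y, int w, int h) = 0;
    const std::vector<std::string>& log() const { return log_; }

protected:
    // Record what the back end was asked to do, tagged with its name.
    void record(const std::string& backend, int x, int y, int w, int h) {
        log_.push_back(backend + ": rect " + std::to_string(x) + "," +
                       std::to_string(y) + " " + std::to_string(w) + "x" +
                       std::to_string(h));
    }

private:
    std::vector<std::string> log_;
};

// Software back end (assumed name).
class RasterEngine : public PaintEngine {
public:
    void drawRect(int x, int y, int w, int h) override {
        record("raster", x, y, w, h);
    }
};

// Accelerated back end (assumed name).
class GlEngine : public PaintEngine {
public:
    void drawRect(int x, int y, int w, int h) override {
        record("gl", x, y, w, h);
    }
};

// Hypothetical stand-in for QPainter: application code only talks to this.
class Painter {
public:
    explicit Painter(PaintEngine& engine) : engine_(engine) {}

    void drawRect(int x, int y, int w, int h) {
        assert(w >= 0 && h >= 0);  // reject negative sizes in this sketch
        engine_.drawRect(x, y, w, h);
    }

private:
    PaintEngine& engine_;
};

// Application drawing code, written once and unaware of the back end.
void paintWidget(Painter& painter) {
    painter.drawRect(0, 0, 100, 50);
    painter.drawRect(10, 10, 20, 20);
}
```

Constructing a `Painter` over a `RasterEngine` or over a `GlEngine` and passing it to `paintWidget` issues the same two rectangles to either back end, which is the property that lets existing applications pick up a new engine without changes.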

So, now that’s cleared up, what QPaintEngines are there and do we have all the hardware acceleration types covered by them?

Well, for devices with no acceleration, Qt will use its raster paint engine. The raster engine has seen some very impressive optimizations in Qt 4.5, as Gunnar has previously blogged about. For higher-end graphics hardware, there’s usually a nice high-level API which is powerful enough to express all of QPainter, i.e. OpenGL & OpenVG. The trouble we’ve recently hit is the hardware in between, i.e. those with blitters but not much else. Such hardware is not powerful enough to express the whole of QPainter, so we must fall back to the raster paint engine for unsupported operations. The raster paint engine needs a pointer to the memory it renders to (and reads from). On UMA systems, this is not a problem as the buffer is obviously in system memory (that’s all there is!). It’s on systems with dedicated graphics memory where the fun begins…
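The fall-back behaviour can be sketched roughly as follows, again in plain C++ rather than Qt. The `Op` enum, `RenderStats` and `BlitterEngine` are hypothetical names, and which operations a given blitter accelerates is an assumption passed in by the caller. The sketch only counts where each operation would end up: on the hardware, or in a software fall-back.

```cpp
#include <cassert>
#include <set>
#include <utility>

// A few painter operations, named for illustration only.
enum class Op { DrawPixmap, FillRect, DrawPath, DrawText };

struct RenderStats {
    int hardwareOps = 0;      // operations the blitter handled itself
    int rasterFallbacks = 0;  // operations handed to the software rasteriser
};

// Hypothetical engine for blitter-only hardware: anything outside the
// accelerated set is counted as a raster fall-back.
class BlitterEngine {
public:
    explicit BlitterEngine(std::set<Op> accelerated)
        : accelerated_(std::move(accelerated)) {
        assert(!accelerated_.empty());  // a blitter must accelerate something
    }

    void execute(Op op) {
        if (accelerated_.count(op) > 0)
            ++stats_.hardwareOps;
        else
            ++stats_.rasterFallbacks;
    }

    const RenderStats& stats() const { return stats_; }

private:
    std::set<Op> accelerated_;
    RenderStats stats_;
};
```

With only `Op::DrawPixmap` in the accelerated set, a frame made of one pixmap draw, one path and one text run would record one hardware operation and two fall-backs. Each of those fall-backs needs a pointer to the target buffer.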

First, on many systems, you simply cannot map graphics memory into your process’s address space – the architecture simply has no way to do it. On such systems, the buffer must be copied to system memory, rendered to with the raster engine, then copied back. If this happens _every_ time you switch between a fall-back and the hardware, it’s going to be _slow_!!
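A rough way to see why the switching matters is to count buffer copies for a sequence of operations. The sketch below is an illustrative model, not a measurement: `Target` and `countBufferCopies` are hypothetical names, and it assumes the buffer starts in graphics memory, has to end up there again, and is copied in full on every switch.

```cpp
#include <cassert>
#include <vector>

// Where a single painter operation is executed.
enum class Target { Hardware, SoftwareFallback };

// Count whole-buffer copies for a sequence of operations on a system where
// graphics memory cannot be mapped into the process. Assumptions of this
// sketch: the buffer starts in graphics memory and must end up there again.
int countBufferCopies(const std::vector<Target>& ops) {
    int copies = 0;
    bool inSystemMemory = false;
    for (Target target : ops) {
        const bool needsSystemMemory = (target == Target::SoftwareFallback);
        if (needsSystemMemory != inSystemMemory) {
            ++copies;  // download before a fall-back, upload before hardware
            inSystemMemory = needsSystemMemory;
        }
    }
    if (inSystemMemory)
        ++copies;  // final upload so the result is back in graphics memory
    assert(copies % 2 == 0);  // every download is paired with an upload
    return copies;
}
```

Under these assumptions, alternating hardware, fall-back, hardware, fall-back, hardware costs four full copies, while grouping the same two fall-backs together (hardware, hardware, fall-back, fall-back, hardware) costs two.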

On some systems (particularly PowerPC for some reason?), the graphics controller sits on the SoC’s external bus and can be addressed directly by an application. All that needs to happen is for the kernel to configure the process’s page table to point to the graphics controller’s memory range. It’s then up to the graphics controller to access data in its dedicated memory on behalf of the host CPU. Although this kind of set-up does allow the raster paint engine to get a pointer to graphics memory, all accesses go over this external bus – which is usually slow. On PC/x86 architecture, things get even more complicated: the kernel has to fiddle with lots more hardware, such as cache controllers, PCI bus controllers, IOMMUs, etc. However, in all cases, even if you’re lucky enough to get a pointer to graphics memory, all accesses must go over a slower external bus.

So now we know what’s going on, what conclusion can we draw? Well, reading & writing to external graphics memory is slow. If you’re on non-UMA, don’t have OpenGL or OpenVG available, but do want to use your blitter, then you’d better make sure you’re mostly using QPainter::drawPixmap(). NOTE: Graphics view’s cache modes can help you out a lot there – see Andreas’ previous posts! ;-) Otherwise falling back to the raster engine is going to be slow. Fortunately, this type of hardware is (finally) on its way out.

NOTE: I should also mention that there’s a similar issue with X11. There’s no API to get a pointer to an X pixmap, and X11 does not provide enough API to implement the whole of QPainter. While the X11 paint engine does not inherit from the raster paint engine, it does make use of software fall-backs which involve copying the pixmap, executing the fall-back and then uploading the result. It’s for that reason that we’ve added the raster graphics system, which uses system memory (via the MIT-SHM extension), in Qt 4.5. On desktop, this is a fairly temporary measure until our OpenGL 2.0 engine & graphics system is in a fit state to take over all of Qt’s rendering. No promises, but we hope that can happen for Qt 4.6. For X11 on low-end embedded devices (like the N810), MIT-SHM provides a pretty decent long-term solution.

So, when we look to the future of Qt’s graphics architecture and the required paint engines, I think we’re well on the way to having all the bases covered:

Type             UMA              Non-UMA
None             Raster           Raster*
Blitter          DirectFB         DirectFB**
2D vector        OpenVG***        OpenVG***
Fixed-3D         OpenGL (ES) 1.x  OpenGL (ES) 1.x
Programmable-3D  OpenGL (ES) 2.x  OpenGL (ES) 2.x****

* When using raster on non-UMA, rendering is actually done in system memory first, then flushed to VRAM
** This is the one which is going to be slow when doing anything other than QPainter::drawPixmap()
*** It shouldn’t be a big surprise we’re researching an OpenVG paint engine!
**** Qt 4.5 contains a new paint engine for OpenGL ES 2.x which we’re now making work on desktop OpenGL 2.0
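
The table above can also be read as a simple lookup. The following plain C++ sketch encodes it directly; `Acceleration`, `Memory`, `EngineChoice` and `chooseEngine` are hypothetical names and nothing here is Qt API. The `slowFallback` flag marks the one combination that footnote ** warns about.

```cpp
#include <cassert>
#include <string>

// The five acceleration categories and two memory architectures from the post.
enum class Acceleration { None, Blitter, Vector2D, Fixed3D, Programmable3D };
enum class Memory { Unified, Dedicated };

struct EngineChoice {
    std::string engine;  // paint engine named in the table
    bool slowFallback;   // true where raster fall-backs need buffer copies
};

// Direct encoding of the table: the engine depends only on the acceleration
// category, and the slow case is a blitter with dedicated graphics memory.
EngineChoice chooseEngine(Acceleration accel, Memory memory) {
    const bool dedicated = (memory == Memory::Dedicated);
    switch (accel) {
    case Acceleration::None:           return {"Raster", false};
    case Acceleration::Blitter:        return {"DirectFB", dedicated};
    case Acceleration::Vector2D:       return {"OpenVG", false};
    case Acceleration::Fixed3D:        return {"OpenGL (ES) 1.x", false};
    case Acceleration::Programmable3D: return {"OpenGL (ES) 2.x", false};
    }
    assert(false && "unknown acceleration category");
    return {"Raster", false};  // software rendering as the safe default
}
```

For example, `chooseEngine(Acceleration::Blitter, Memory::Dedicated)` yields DirectFB with `slowFallback` set, while the same blitter on a UMA device yields DirectFB without it.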

I just want to finish by asking you to take another look at the above table. Do you notice anything interesting? All of the graphics systems (apart from DirectFB) are cross platform which means, when we make something faster in one engine, all platforms will benefit.



16 comments

1 Andy March 13, 2009 at 6:49 pm
 

Tom:

Just wanted to thank you for the great explanation! Really clear and to the point.

2 Anon March 13, 2009 at 7:46 pm
 

is there any high-level info (abstracted block diagram or similar) about that new “our OpenGL 2.0 engine & graphics system” to see where it is designed for?

also would love to see a roadmap on supported future Linux graphics stack (DRI2, Gallium3D, OpenCL and friends), to see how one could stuff Qt on top of such system (and Qt Software’s policy on the subject).

and is the current QWS already embedded to the Qt (like the “Qt Everywhere” suggests) or is it on the todos for the Qt-4.6+?

and finally… there is not too much info about porting any of the Qt’s graphics systems to a custom hardware, would it be too much hassle to write up such wiki?

ps. Qt-4.5 is simply amazing, thank you to all Qt devs for a very nice toolkit (actually it’s much more than just a “toolkit” per se nowadays). :-)

3 Elvis Stansvik March 14, 2009 at 1:02 pm
 

Great post! You’re heroes for taking care of all these nitty gritty details. With the high-level nature of Qt development, it’s sometimes easy to forget all the hard work down in the stack that makes it possible. Keep up the good work!

4 miniak March 14, 2009 at 2:48 pm
 

What about using the new Direct 2D API in Windows 7?

5 AndrejT March 14, 2009 at 7:48 pm
 

Excellent overview of the graphics system in Qt. I’m also interested in what is being done regarding the use of new technologies esp. Gallium3D and OpenCL. For example would it be possible to make a special Gallium state tracker for Qt Painter to plug it directly into Gallium. Or is it just not worth it or possible and it is better to focus on OpenGL (BTW, how’s with support for OpenGL 3?). Anyways thanks for the article and can’t wait to see more like this one.

6 Andreas March 15, 2009 at 10:47 pm
 

Reportedly the problem with OpenGL painting (on the desktop) right now is that it is not pixel-perfect, and I have seen some artifacts myself. Lines that should form a rectangle don’t meet at the corners and such. How are you going to fix this? And, well, where exactly is the problem? Are there simply bugs to fix or do you need workarounds for some weaknesses in OpenGL concerning pixel-perfect painting?

7 TomCooksey March 16, 2009 at 8:30 am
 

DRI2, Gallium3D, KMS, etc. are all below the OpenGL interface on X11, so Qt doesn’t really need to worry about it. QWS on the other hand can take advantage of these new (uber cool!) technologies. We’ll just have to see what happens… ;-)

Direct2D is a nice looking competitor for QPainter. However what we care about is being able to tell the graphics hardware what to do in an efficient way. OpenGL (ES) 2.0 provides a good way for us to do that using shader programs which map nicely to the way modern graphics cards work. My guess is that Direct2D is implemented in a similar way to Qt’s OpenGL paint engine, in which case we might as well use OpenGL. Same kind of argument for Direct3D too. Ultimately it’s just an API to tell graphics hardware what to do. The only thing we have considered is a Gallium3D state tracker. However we don’t think it will give us much over OpenGL 2.0/GLSL.

OpenCL is an interesting technology – do we try to support it through QtConcurrent? Probably not, but we don’t know yet. One thing it could be useful for is calculating the triangulation of QPainterPaths for use with our own high quality anti-aliasing.

Pixel perfection is one of the things we have struggled most with on the GL paint engine. That’s the reason why we consider it experimental for every-day use. You will see a lot of effort going into the new OpenGL 2.0 engine during 4.6 to get it pixel perfect. Once it is, we can completely replace both X11 & raster (MITSHM) engines on almost all desktop and many high end embedded platforms.

8 Marco March 16, 2009 at 7:06 pm
 

Why not implement the Raster Engine in OpenCL? So you have full control over the rastering and no aliasing artifacts.

9 Anon March 17, 2009 at 4:19 am
 

as “The Zack” does already have his OpenCL state tracker (with in-flight LLVM) to the Gallium3D already on its way it would make sense to have a conventional/legacy Mesa OpenGL on top of X11, but also a backend to use OpenCL directly would be an interesting way to render Qt (and by the looks of things more future proof, at least on the Linux platform).

but the item about QWS’s future is still somewhat unanswered, how about systems without X11 and/or OpenCL? no framebuffer-only devices in the future, all have GPUs? an off-screen renderer a-la Mesa? is such backend present in the standard Qt-4.5? or will there be?

10 TomCooksey March 17, 2009 at 8:14 am
 

We’re not ruling out OpenCL, it’s just not on our roadmap. Our belief is that OpenGL 2.0 is a good enough API to program modern graphics hardware and make it render QPainter. You’re right, we may be able to program the same hardware using OpenCL – but what would be the point if it gives similar performance to the GL engine?

As for QWS, there are always going to be devices where that $1 extra for GPU silicon will make the difference between profit and loss. Qt for Embedded Linux also has a new PowerVR screen driver in 4.5.

11 mad wax March 17, 2009 at 5:57 pm
 

I read a lot about OpenGL on Linux. I know, TomCooksey, your area is QWS so some of this I would expect, but what about Windows? Is Qt only going to support hardware accel. through OpenGL or is a true D3D/DirectX backend going to be written as well?

Qt’s current rendering speed is a real killer in my eyes and allowing the GPU(s) to handle the work will make Qt apps fly! I must say 4.5 has done a lot to speed things up and my hat goes off to Qt for it.

12 Jeremy March 17, 2009 at 10:31 pm
 

Thanks for an enlightening article about the different types of hardware out there. Also, it’s nice to be able to put a name on at least one of the people behind QWS!

I submitted a patch on the issue tracker some time ago to add 2 bit per pixel support to QWS, any news on that?

13 TomCooksey March 18, 2009 at 1:43 pm
 

mad wax> Please read the bottom of the blog post: “All of the graphics systems (apart from DirectFB) are cross platform” :-) Windows has had OpenGL support since Windows 95.

As I hinted in other comments, Direct3D/Gallium3D/OpenCL/Direct2D/OpenGL are just different APIs for programming the *same* graphics hardware. Of these different APIs, Qt would like to get as close to the hardware as it can. Gallium3D would be the closest we would dare to get in this sense, but it also supports the least number of platforms. We believe OpenGL (ES) 2.0 is a good compromise as GLSL lets us get pretty close to the hardware and it is also supported on a wide variety of platforms.

Jeremy> No idea, sorry. But thanks for the patch!

14 AlexBravo March 19, 2009 at 6:58 pm
 

Tom, great job with implementing that OpenGL ES 2.0 paint engine.
Does all of the code live inside of qt-embedded-linux-opensource-src-4.5.0\src\opengl\gl2paintengineex\?
Are there instructions somewhere on how to make it work with QWS?
Keep up the great job and let’s hope that we see more OpenGL 2.0 hardware in the embedded world!

15 perezmeyer March 22, 2009 at 1:28 am
 

It is really great to be able to read these wonderful posts. Things explained in detail, in a clear way. Keep rocking!

16 Joannah March 24, 2009 at 8:29 am
 

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don’t know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Joannah

http://2gbmemory.net

Comments on this entry are closed.
