This is Part 3 of the translation of the performance chapter of the Windows Phone developer book I co-authored with many of my Hungarian peers. Other parts of this series can be found here.
The last time we saw our drunk Vuvrian hero, he was buzzed by a couple of procedurally animated Tie Fighters, but something just wasn’t right: the animations were choppy. So, let’s turn on the GPU in the application!
Clicking the Enable GPU Acceleration button changes the situation dramatically. The Redraw Regions stop their crazy flashing, indicating that no more rasterizing is happening. The phone animates 3 fighters smoothly, and it can handle 10 fighters without dropping the frame rate below 20. The emulator handles ten fighters without any noticeable slowdown, but it runs on a desktop GPU that is designed to handle Call of Duty MW3, so it is not a fair fight…
What is happening when the Enable GPU Acceleration button is clicked?
private void enableGPU_Click(object sender, RoutedEventArgs e)
{
    // Apply the currently selected cache mode to every fighter.
    foreach (var fighter in Ties)
        fighter.CacheMode = GetTIECacheMode();
}

private CacheMode GetTIECacheMode()
{
    bool enabled = enableGPU.IsChecked.Value;
    return enabled ? new BitmapCache() : null; // null switches caching off
}
We change the CacheMode property of the TIEFighter controls depending on whether GPU acceleration has been turned on or off. When off, it is set to null, while turning the GPU on creates a BitmapCache object for every control. The name ‘BitmapCache’ describes precisely what GPU acceleration means: the pixels of the rasterized UIElement are cached in the GPU as a texture. The GPU can manipulate this texture: move it, change its opacity, rotate it, resize it, perform a perspective distortion on it, blend it with other textures or do a rectangular clipping. If the content of such a cached element changes (for example, its color changes), the Silverlight runtime automatically refreshes the texture in the GPU’s memory.
All of this means that some animations perform better than others. The CPU still has to do the non-rectangular clippings, the color animations, the layout changes, gradients, etc., because each of these changes the pixels of the element and forces it to be rasterized again. If you use these kinds of animations, apply them to small elements, so that the CPU doesn’t have to calculate too many pixels and the FPS values can stay high.
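To make the distinction concrete, here is a minimal sketch. The CacheSketch class and its method names are hypothetical and not part of the sample project. The first method only touches properties that the GPU can apply to the cached texture, while the second one changes the pixels themselves and therefore should cause the texture to be rasterized again on the CPU.

```csharp
using System.Windows;
using System.Windows.Media;
using System.Windows.Shapes;

// Hypothetical helper for illustration only; not part of the original sample.
public static class CacheSketch
{
    // GPU-friendly: the element is rasterized once, then only moved,
    // rotated and faded as a texture.
    public static void MoveOnGpu(UIElement element, double x, double y, double angle)
    {
        element.CacheMode = new BitmapCache();
        element.RenderTransform = new CompositeTransform
        {
            TranslateX = x,
            TranslateY = y,
            Rotation = angle
        };
        element.Opacity = 0.8;
    }

    // CPU-bound: a new fill changes the pixels of the shape, so the cached
    // texture has to be refreshed.
    public static void Recolor(Shape shape, Color color)
    {
        shape.Fill = new SolidColorBrush(color);
    }
}
```

Calling Recolor on a small shape is cheap; calling it on every frame for a large element is the kind of animation the paragraph above warns about.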
There is one more option that helps in understanding the GPU’s work:
If you turn on Cache Visualization, you get this result:
Cache Visualization makes every GPU texture semi-transparent and tints it blue. This allows us to see which textures are composited by the GPU.
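The diagnostic aids used in this article can also be switched on from code. The following is a minimal sketch using the standard Silverlight host settings; the GpuDiagnostics class name is hypothetical, and the sample project may well toggle these options from its own UI instead.

```csharp
using System.Windows;

// Hypothetical helper for illustration; call it once the Application exists,
// for example from the App constructor.
public static class GpuDiagnostics
{
    public static void Enable()
    {
        var settings = Application.Current.Host.Settings;

        // Shows the frame rate counters along the edge of the screen.
        settings.EnableFrameRateCounter = true;

        // Flashes the regions that are being redrawn (rasterized) on the CPU.
        settings.EnableRedrawRegions = true;

        // Tints the surfaces that are cached as GPU textures.
        settings.EnableCacheVisualization = true;
    }
}
```

These settings are meant for debugging, so it is a good idea to turn them off again before shipping.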
Now is a good time to look at the Frame Rate Counters of the above screenshot:
UI and Composition Thread FPS: these are much higher than what you can expect on a first generation phone, since the screenshot was taken in the Emulator.
Texture memory used: 8 MB, which is fine.
Number of surfaces: The GPU works with 12 textures. 10 of them belong to the TIE Fighters (one for each), one to the background, and one to the controls. Note that while we have not specified a BitmapCache value for the background or the controls, they are also cached on the GPU, because there are textures in front of them and behind them. These are called implicit surfaces, and they are indicated by the counter showing the value 002.
Fill Rate: This is where the problem is, and the phone was kind enough to color it red to attract attention. The Fill Rate value of 11.6 means that in every frame, the GPU is drawing 11.6 screens’ worth of pixels. This is a very high number! On first generation phones, the frame rate begins to degrade at 2.5, and the animation becomes painfully jerky above 3.5. The high fill rate is caused by the many (and increasingly large) TIE fighters, although in our case it helps a lot that they fly off the screen. Second generation phones have a much higher Fill Rate tolerance (my Lumia 800 can handle 16 fighters without any frame rate drop, and only drops below 20 FPS at 46 fighters). However, we will have to keep optimizing for first generation devices for a long time, so we cannot afford to be lazy.
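The Fill Rate figure is easy to estimate by hand. The sketch below is a simplified model, not the way the runtime actually measures it: it assumes the 480×800 screen of Windows Phone 7 devices and simply divides the visible pixels of all surfaces by the pixels of one screen. The FillRateSketch class is hypothetical.

```csharp
using System;

// Hypothetical helper for illustration; a simplified model of the counter.
public static class FillRateSketch
{
    // WVGA resolution of Windows Phone 7 devices.
    public const int ScreenWidth = 480;
    public const int ScreenHeight = 800;

    // Fill rate = pixels drawn per frame / pixels of one full screen.
    // Each argument is the visible area of one surface, in pixels.
    public static double FillRate(params double[] visibleSurfaceAreas)
    {
        double total = 0;
        foreach (double area in visibleSurfaceAreas)
            total += area;
        return total / (ScreenWidth * ScreenHeight);
    }
}
```

One screen has 480 × 800 = 384,000 pixels, so the measured value of 11.6 corresponds to roughly 4.45 million pixels per frame. As a made-up example, a full-screen background, a full-screen controls layer and ten fighters with 300×300 visible pixels each give (2 × 384,000 + 10 × 90,000) / 384,000 ≈ 4.3, which is already above the 3.5 threshold mentioned for first generation phones.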
StoryBoard animations and the GPU
So far, we have been working with procedural animations. It is much easier to create a similar animation in Blend, using StoryBoards. You can find the result in the StoryBoardAnim.xaml file, where some added Easings make the movement of the fighters much more natural. To run this version, again change the DefaultTask in WMAppManifest.xml:
<DefaultTask Name ="_default" NavigationPage="StoryBoardAnim.xaml"/>
Launching the application now gives us a perfectly smooth animation, even with five TIE Fighters! We can easily verify that the GPU is used by turning on Cache Visualization: every fighter has its own GPU surface. Such smoothness is only possible by taking advantage of the GPU, which we haven’t turned on anywhere. It is the Silverlight runtime that turns on GPU acceleration automatically whenever we use a StoryBoard animation that only does things the GPU can handle (see above). Quite convenient!
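The sample defines its animations in XAML with Blend, but the same kind of animation can be built in code. The following is a minimal sketch, not the content of StoryBoardAnim.xaml: the StoryboardSketch class, the coordinates, the duration and the choice of easing are all assumptions made for illustration. It animates only a TranslateTransform, which is one of the operations the GPU can handle.

```csharp
using System;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Animation;

// Hypothetical helper for illustration; values are made up.
public static class StoryboardSketch
{
    public static Storyboard CreateFlyBy(UIElement fighter)
    {
        // Move the element through a RenderTransform, not through layout.
        var move = new TranslateTransform();
        fighter.RenderTransform = move;

        var animation = new DoubleAnimation
        {
            From = -200,
            To = 600,
            Duration = new Duration(TimeSpan.FromSeconds(3)),
            EasingFunction = new QuadraticEase { EasingMode = EasingMode.EaseInOut },
            RepeatBehavior = RepeatBehavior.Forever
        };

        // Animate the X property of the transform itself.
        Storyboard.SetTarget(animation, move);
        Storyboard.SetTargetProperty(animation, new PropertyPath("X"));

        var storyboard = new Storyboard();
        storyboard.Children.Add(animation);
        return storyboard;
    }
}
```

Calling StoryboardSketch.CreateFlyBy(fighter).Begin() starts the animation. Note that no CacheMode is set anywhere in this sketch; per the paragraph above, the runtime takes care of that.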
UI, Compositor and Input threads
The StoryBoardAnim.xaml page does not contain any stars. However, there is an “Add 6000 stars” button on the screen. If you press this, the 6000 stars are shown a few seconds later. Examining the code, you can see that the event handler for this button simply calls the CreateStarField method with an argument of 6000. There are no background threads or any other black magic around. Based on what I wrote in Part 2 of this series, we know that this must occupy the UI thread. And yes, the UI thread is completely busy: the button stays “pressed” long after your finger has been lifted off the screen (it is the UI thread that would redraw the button in a “non-pressed” state).
At the same time, the animation has not slowed down a bit during the drawing of the stars! The reason for this is that the animations are running on a separate Compositor thread (and are drawn by the GPU). This holds as long as the animations we are using do not require rasterization and do not change the layout in a way that forces the visual tree to be recalculated. Movement, scaling and rotation are implemented through a RenderTransform, which is applied after the layout pass and does not affect it.
Windows Phone Mango introduced a new thread to handle Input. This change, along with some other fine-tuning, solves most of the responsiveness issues of the first release of Windows Phone 7. The most painful issue happened when scrolling through lists: the animation of the list was smooth, but if you touched a moving list, it only reacted after a considerable delay (often as much as a second). The reason for this was that the UI thread (which handled input before Mango) was busy rasterizing the new list items, and couldn’t handle the touch event in time. Needless to say, this resulted in a very uncomfortable user experience. Fortunately, with Mango’s separate Input thread, this problem is now history.
In the next part of this blog series, I will give some general advice on handling poor responsiveness and on list performance. We will discuss the importance of subjective performance and related techniques. Finally, we will talk about memory optimization, and some tools that can help pinpoint problem areas in our apps. Tune in next time!
Mar 22 2012, 02:53 PM