Kinect is fun. Kinect is new. Kinect is awesome. However, doing anything with Kinect right now requires developer work, and even with the available building blocks to rely on, it takes a developer a long time to get up to speed. This means that interaction designers cannot do Kinect development without a top-notch developer, and developers are needed even to fine-tune gesture settings (tolerance, timers, etc.) and visualizations. In essence, designers cannot “play” with their design; they cannot just turn a knob and see the end result.
Luckily, there are quite a lot of great examples in XBOX game menus that provide great interaction models. If the Kinect WPF Toolkit (KWT) allows creating similar experiences, without deep knowledge of Kinect or even coding, it has reached its goal. At this stage, we are only dealing with single-user scenarios.
Below you can read the first draft of the first draft of the main idea.
Note: The Kinect WPF Toolkit borrows code from Joshua Blake’s amazing InfoStrat Motion Framework, but does not build on it as of now. I am especially grateful for Joshua’s encouragement, and for the huge amount of time his image processing and DirectCanvas code has saved me so far.
Vision
The goal of the Toolkit is to give Interaction Designers the means to create Kinect-driven applications (not games!) with the same ease with which they create mouse- or touch-driven UI today in Expression Blend.
Key components
There are two key component types in KWT: recognition components and visualizers. Recognition components deal with detecting users, hands, gestures, poses and so on. Visualizer components help visualize what the camera sees or what the recognizers have detected.
Recognition components
User recognition – events when a user enters / leaves the scene, number of users. Events for PrimeSense calibration. There is always only one “Active” User.
Pose recognition – XBOX equivalent: the 45 degree “Kinect Menu” gesture.
The app should be notified if your body or parts of it enter or leave a specified position, or stay there for a certain time. The app should also be able to visualize the time spent in the pose, and the time left until the pose is acted upon. Ideally, the poses are defined by example: just performing the pose in front of Kinect, or choosing a frame from a recording. The designer should also be able to set a “tolerance” value.
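To make the "tolerance" and "time in pose" ideas concrete, here is a minimal, language-neutral sketch in Python. It is not KWT code: the `PoseDetector` name, its fields and the joint dictionary format are all assumptions made for illustration, and joint positions are assumed to be expressed relative to the torso so the pose does not depend on where the user stands.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoseDetector:
    """Hypothetical pose detector; names and units are illustrative only."""
    target: dict          # joint name -> (x, y, z) captured "by example"
    tolerance: float      # max allowed distance per joint, in coordinate units
    hold_seconds: float   # how long the pose must be held before it fires
    _entered_at: Optional[float] = None

    def matches(self, joints):
        # Every joint in the target pose must be within tolerance.
        return all(
            name in joints and math.dist(joints[name], pos) <= self.tolerance
            for name, pos in self.target.items()
        )

    def update(self, joints, now):
        """Return (progress in 0..1, fired) for the current frame."""
        if not self.matches(joints):
            self._entered_at = None      # leaving the pose resets the timer
            return 0.0, False
        if self._entered_at is None:
            self._entered_at = now       # first frame inside the pose
        progress = min((now - self._entered_at) / self.hold_seconds, 1.0)
        return progress, progress >= 1.0
```

The `progress` value is what a timeout visualizer could bind to, and `1 - progress` is the time left before the pose is acted upon.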
“Pointer” – XBOX equivalent: the Kinect experience, menu navigation in Kinectimals and Kinect Adventures
One of the user’s body parts (mostly, but not necessarily, a hand) acts as the pointer on screen. When it is over interactive elements, they react, like a “mouseover” effect. Keeping the hand over such an element for a given time will activate it (click).
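The hover-then-activate behavior can be sketched as a small state machine. This is an illustrative Python sketch under my own assumptions, not the toolkit's implementation: `DwellButton`, its pixel-based `bounds` and the 1.5 second default are hypothetical.

```python
class DwellButton:
    """Hypothetical hover-to-click element.

    bounds is (left, top, right, bottom) in screen pixels.
    """

    def __init__(self, bounds, dwell_seconds=1.5):
        self.bounds = bounds
        self.dwell_seconds = dwell_seconds
        self.hover_started = None
        self.activated = False  # prevents repeated clicks during one hover

    def contains(self, x, y):
        left, top, right, bottom = self.bounds
        return left <= x <= right and top <= y <= bottom

    def update(self, x, y, now):
        """Return 'idle', 'hover' or 'click' for this frame."""
        if not self.contains(x, y):
            # Pointer left the element: reset so the next hover starts fresh.
            self.hover_started = None
            self.activated = False
            return "idle"
        if self.hover_started is None:
            self.hover_started = now
        if not self.activated and now - self.hover_started >= self.dwell_seconds:
            self.activated = True
            return "click"
        return "hover"
```

A click is reported only once per hover; the pointer has to leave the element and come back to activate it again.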
Two-hand gestures – Equivalent: multitouch pinch zoom and rotate on Surface
Both of the user’s hands are tracked. When they are at a certain distance from the shoulder, they are considered to be “touching” the screen.
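As a rough illustration, the "touching" test and the resulting pinch and rotate values could look like the Python sketch below. The function names are hypothetical, and two things are assumed: the distance from the shoulder is measured along the depth axis, and z decreases toward the sensor.

```python
import math


def is_touching(hand, shoulder, reach_threshold):
    """True when the hand is pushed far enough forward of its shoulder.

    Assumes points are (x, y, z) with z decreasing toward the sensor.
    """
    return (shoulder[2] - hand[2]) >= reach_threshold


def pinch_state(left_hand, right_hand):
    """Return (distance, angle) between the hands in the x/y plane."""
    dx = right_hand[0] - left_hand[0]
    dy = right_hand[1] - left_hand[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


def zoom_and_rotation(start, current):
    """Scale factor and rotation in radians between two pinch states."""
    (d0, a0), (d1, a1) = start, current
    scale = d1 / d0 if d0 else 1.0
    # Normalize the angle difference into [-pi, pi) to avoid wrap-around jumps.
    rotation = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
    return scale, rotation
```

The idea is to capture `pinch_state` when both hands start "touching" and compare it with the state on each later frame, much like a multitouch manipulation.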
Gesture detector – detecting gestures such as Push, Swipe, Steady, Wave, Circle, etc. (NITE) with any body part. Thresholds should be configurable (relative to person’s size)
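One way to make thresholds relative to a person's size is to measure travel in "shoulder widths" instead of meters. The Python sketch below shows this for a simple swipe. It is only an illustration and does not reflect how NITE detects gestures; `body_scale`, `detect_swipe` and the default values are assumptions.

```python
import math


def body_scale(left_shoulder, right_shoulder):
    """Shoulder width, used here as a rough per-person size unit."""
    return math.dist(left_shoulder, right_shoulder)


def detect_swipe(samples, scale, min_travel=1.5, max_seconds=0.6):
    """Detect a horizontal swipe of one body part.

    samples: list of (time, x), oldest first.
    min_travel is expressed in shoulder widths (illustrative default).
    Returns 'right' for increasing x, 'left' for decreasing x, or None.
    Whether increasing x means right on screen depends on mirroring.
    """
    if len(samples) < 2:
        return None
    t_end, x_end = samples[-1]
    for t, x in reversed(samples[:-1]):
        if t_end - t > max_seconds:
            break                      # too old to count as one fast motion
        travel = (x_end - x) / scale
        if travel >= min_travel:
            return "right"
        if travel <= -min_travel:
            return "left"
    return None
```

With this scaling, a child and an adult trigger the same gesture with proportionally sized movements.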
3D joystick
Body parts (such as a hand, head or the center of mass) can be used as a simple 3D joystick. The zero point can be calibrated using a calibration gesture. The designer should be able to set dead zones, and a mapping function. The latter is needed for things like height mapping for center of mass – standing on tiptoe lifts COM a lot less than crouching.
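The dead zone and the mapping function can be sketched for a single axis as follows. This is a hypothetical Python illustration, not toolkit code; the function names and the example ranges (about 0.05 m up on tiptoe, about 0.40 m down when crouching) are made-up numbers chosen only to show the asymmetry.

```python
def apply_dead_zone(value, dead_zone):
    """Return 0 inside the dead zone; rescale the rest to stay within [-1, 1]."""
    magnitude = abs(value)
    if magnitude <= dead_zone:
        return 0.0
    scaled = (min(magnitude, 1.0) - dead_zone) / (1.0 - dead_zone)
    return scaled if value > 0 else -scaled


def asymmetric_map(offset, up_range, down_range):
    """Normalize an offset from the calibrated zero point.

    Uses a different range per direction, so a small upward movement can
    produce the same output as a large downward one.
    """
    if offset >= 0:
        return min(offset / up_range, 1.0)
    return max(offset / down_range, -1.0)


def joystick_axis(position, zero, up_range, down_range, dead_zone=0.1):
    """Map a raw coordinate to a joystick value in [-1, 1]."""
    normalized = asymmetric_map(position - zero, up_range, down_range)
    return apply_dead_zone(normalized, dead_zone)
```

For example, with `zero=1.0`, `up_range=0.05` and `down_range=0.40`, standing on tiptoe and crouching fully would both reach the end of the axis, while small wobbles around the zero point are ignored.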
Visualization components
Timeout
Certain gestures and poses have an associated timeout. These are usually displayed as a circular progress bar, so it would be great to have this in the library.
Body part(s)
Body part (mostly hand) visualization is important so that the user knows what he/she is pointing at when using it as a pointer.
Body parts can be pictured either via simple graphics (see skeleton visualization), or using the image from the depth camera, processed with a threshold and other bitmap filters. It would also be interesting to generate WPF Paths based on the outline of the body part. These Paths could be filled with a standard brush, used as opacity masks for the RGB / depth cameras or even used as objects in a physics simulation.
Entire Body
The body itself can be shown using the RGB camera, the depth camera, the skeleton, or a 3D avatar based on the skeleton. Various bitmap filters can be applied to achieve the desired effect. It would also be interesting to generate Paths based on the outline of the user. These Paths could be filled with a standard brush or even used as objects in a physics simulation.
RGB Camera
The output from the RGB camera.
Depth Camera
The output from the depth camera. Customizations: general fill color, banding effect for Z coordinate. Fill color for active user, other users.
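To show how these customizations might combine, here is a per-pixel coloring sketch in Python. Everything in it is an assumption for illustration: the `depth_to_color` name, depth in millimeters, a user id of 0 meaning "no user", a depth of 0 meaning "no reading", and the default colors and band size.

```python
def depth_to_color(depth_mm, user_id, active_user_id, band_mm=100,
                   active_color=(0, 160, 255),
                   other_color=(128, 128, 128),
                   fill_color=(40, 40, 40)):
    """Return an (r, g, b) tuple for one depth pixel."""
    if depth_mm <= 0:                  # assumed convention: no reading
        return (0, 0, 0)
    if user_id == 0:                   # assumed convention: background pixel
        base = fill_color
    elif user_id == active_user_id:
        base = active_color
    else:
        base = other_color
    # Banding: brightness ramps from 50% to 100% within each band_mm slice,
    # which makes changes along the Z coordinate visible as stripes.
    phase = (depth_mm % band_mm) / band_mm
    factor = 0.5 + 0.5 * phase
    return tuple(int(c * factor) for c in base)
```

A real implementation would process the whole frame at once for performance; the per-pixel function only shows the mapping.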
Skeleton
The skeleton will use templating to allow styling of the different body parts and bones. The SkeletonVisualizer can also be used to show the “hand” cursor (just delete the other elements from the template). It works with 2D coordinates. Open question: could the RGB / depth output also be used as a “texture” for the skeleton? That could have interesting use cases, although performance may be the limiting factor.
Composite visualizations
The above visualizations can be combined to achieve an even more exciting effect. For example, the depth sensor output can be used to mask the RGB camera to only show the parts where the user is. The “cursor” can be shown along with the depth / RGB camera, or on top of a 3D avatar as in Kinect Adventures. For this to work, all visualizations have to use the same 3D – 2D transformations. In other words, the left hand should be at the same pixel position on the depth, rgb bitmap and the skeleton visualizer.
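The two ideas in this section, one shared 3D-to-2D transformation and masking the RGB image with the depth sensor's user information, can be sketched in Python as below. This is only an illustration under simplifying assumptions: a pinhole camera model, a made-up focal length of 525 pixels, a 640x480 frame, and RGB and depth images that are already aligned pixel for pixel. The function names are hypothetical.

```python
def project(point, focal_px=525.0, width=640, height=480):
    """Project a camera-space point (x, y, z with z > 0) to pixel coordinates.

    If every visualizer calls the same function, a joint lands on the same
    pixel in the depth image, the RGB image and the skeleton overlay.
    """
    x, y, z = point
    u = width / 2 + focal_px * x / z
    v = height / 2 - focal_px * y / z   # y points up in 3D, down on screen
    return u, v


def mask_rgb(rgb_rows, user_rows, active_user_id, background=(0, 0, 0)):
    """Keep RGB pixels that belong to the active user; blank out the rest.

    rgb_rows holds rows of (r, g, b) tuples, user_rows holds rows of user ids
    with the same dimensions.
    """
    return [
        [px if uid == active_user_id else background
         for px, uid in zip(rgb_row, user_row)]
        for rgb_row, user_row in zip(rgb_rows, user_rows)
    ]
```

On real hardware the two cameras are physically offset, so the images would first need to be registered to each other before a mask like this lines up.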
Interactive elements
It would be useful to get the “Tilt effect” in the library, as seen in the XBOX dashboard. Maybe a TiltableButton control with depth layers?
Integration with WPF
There are two kinds of events to deal with: global, control-independent events, such as poses, and control-related ones, such as hover, activate via holding, etc. Control-specific events should be translated to mouse / touch events whenever possible to help with development, testing and code reuse.
Current Status
After days of research and experimenting, the Kinect WPF Toolkit has less than a day of actual code writing behind it. Still, the main principles seem to be valid. Currently the main engine and the DepthCamera are implemented (although the DepthCamera does not accept any parameters apart from turning it on or off), and the SkeletonVisualizer is already able to track the head. The SkeletonVisualizer is a fully templatable control that follows the Parts and States model for WPF controls and can be easily customized in Blend, as described in the vision above.
Here is what the current state looks like:

All this is achieved with this code (no codebehind, no initialization, just what you see here):
<Window
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:KinectWPFToolkit="clr-namespace:KinectWPFToolkit;assembly=KinectWPFToolkit"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:KinectWPFToolkit_Visualizers="clr-namespace:KinectWPFToolkit.Visualizers;assembly=KinectWPFToolkit"
    mc:Ignorable="d"
    x:Class="KinectWPFToolkitDemo.MainWindow"
    Title="MainWindow" Height="350" Width="525">
    <!-- The view model is the source for the Engine and DepthCamera bindings below -->
    <Window.DataContext>
        <KinectWPFToolkit:KinectViewModel/>
    </Window.DataContext>
    <Grid>
        <TextBlock Text="{Binding Engine.Status}" Margin="45,39,0,0" VerticalAlignment="Top"
                   Height="45" FontSize="24" HorizontalAlignment="Left" Width="190"/>
        <!-- Depth image with the skeleton overlay drawn on top of it -->
        <Grid Margin="45,88,35,34">
            <Image Source="{Binding DepthCamera.Image}" d:LayoutOverrides="VerticalAlignment"/>
            <KinectWPFToolkit_Visualizers:SkeletonVisualizer
                Template="{DynamicResource SkeletonVisualizerControlTemplate1}"/>
        </Grid>
        <CheckBox Content="Enable Depth Camera" HorizontalAlignment="Right" Height="22" Margin="0,39,8,0"
                  VerticalAlignment="Top" Width="145" IsChecked="{Binding DepthCamera.IsEnabled}"/>
    </Grid>
</Window>
And here is the template that specifies what the Skeleton looks like (only head for now):
<ControlTemplate x:Key="SkeletonVisualizerControlTemplate1"
                 TargetType="{x:Type KinectWPFToolkit_Visualizers:SkeletonVisualizer}">
    <!-- The Viewbox scales the fixed-size canvas uniformly to the available space -->
    <Viewbox Height="Auto" Stretch="Uniform" Width="Auto">
        <Canvas Background="#6B000000" Opacity="1" Height="480" Width="640">
            <!-- Named template part for the head -->
            <TextBlock x:Name="Head" HorizontalAlignment="Stretch"
                       Height="43" Margin="-12,-21,0,0" TextWrapping="Wrap"
                       Text="HEAD" VerticalAlignment="Stretch" Width="24"
                       Foreground="White" FontSize="16" Canvas.Left="169" Canvas.Top="25"/>
        </Canvas>
    </Viewbox>
</ControlTemplate>
So, that’s it for now. What do you think?
Posted Mar 25 2011, 01:03 AM by vbandi