6 Degrees of Freedom Gaming in Android with Project Tango – Google I/O 2016


Hello everybody
and good morning. Thank you guys for getting
up and joining me here today. I know it was a
late night for me. My name’s Eitan, and I lead
the developer engineering team on Project Tango. And today, I want to
walk you through some of the capabilities
of the device. But more specifically,
I want to focus on how you might build
an augmented reality experience with Project Tango. So I know we’ve been
relatively popular here at I/O. How many of you
have seen a Project Tango demo while you’ve been here? Yeah, did you like it? Yeah? All right, great. So that’s a lot of you,
but not all of you. Just a quick shout-out–
we will have the sandbox going throughout the day today. So after this talk, if you
want to go by and check it out, you should definitely get your
hands on Tango for yourselves, do a demo. All right, so for
those of you who haven’t heard of
the platform, I’m going to do a brief overview. I’ll keep it relatively brief. I really do want to spend most
of my time talking about how we actually build
applications for the platform, as well as the APIs that we
provide for augmented reality content. Clicker– awesome, right. So our brief overview–
Project Tango fundamentally is about extending
the capabilities of our mobile devices
beyond the screen. If you look at your phone today,
you interact with it mostly with your head down,
touching different buttons, maybe swiping through photos. Really, the only way that your
phone interacts with the world is through pictures and videos. And Tango seeks to
enable your phone to do much more, to
gain an understanding of the physical space
of your environment, to be able to track the
position of your device as you move through
space, and also to be able to sense the geometry
of your environment as well. When you have
these capabilities, it allows us to turn your
device into a magic window into the world. We can play games like
Jenga from Schell Games, where you place a
virtual Jenga tower on a table in front of you,
and you don’t have to clean up. We can do things like
measure your sofa. You can measure the height
from the ceiling to the floor without playing the game
where you see if your tape measure falls over. And we can also, towards the
future, navigate from point A to B in space. So imagine going into
a mall and wanting to know even where your
friend is in the mall, and being directed in an
augmented reality view to them. So that’s kind of
what we’re working on, and to enable these
kinds of applications, we provide a couple of core
technologies to developers. And you’ve probably
heard of these if you’ve been to some
of our other talks. But they are motion
tracking, depth perception, and area learning. And I’ll go through
each of them briefly over the next couple slides. So motion tracking is
really the core technology upon which Tango is built.
And motion tracking allows Tango devices to understand
their relative position and orientation to
where they started. So if I just use my
device as a prop, as I move with my Tango
device in the world, it knows that I’ve walked,
say, a meter forward and turned 90
degrees to the left. And this is the motion tracking
capability of the device. It’s using visual features
in the environment, combining that with inertial
sensors to estimate position. The second piece of technology
that we expose through our APIs is depth perception. And depth perception allows
our Tango devices to see in 3D. So on our tablet here we’ve
got a special depth camera, and it projects infrared
light out into the world. And it has a camera that
can see that infrared light. And by seeing the pattern,
it can judge, OK, my device is a meter from this surface. And so I can start
combining motion tracking with depth perception
to do things like real-time meshing
of environments. It’s very powerful. Now you have an
understanding of where a table is, where a chair
is, and where objects are in the world,
and your applications can start to take
advantage of it. And the third capability of
Tango devices is area learning. And Wim gave a talk– I
believe two days ago– on area learning. But it’s the memory
of the device. It’s the ability
of Tango devices to remember where they’ve been
before and to recognize spaces. So if you think about
coming into this room, and maybe you see the
Exit sign at the back, if the device saw
the same thing, it remembers roughly
how the Exit sign looks. It remembers how the scaffolding
here looks around it. And so when you walk back into
that space a different time, or if you have
multiple devices, they recognize where they are
relative to that landmark in the world. And with that, you can enable
multiplayer experiences, because you have a shared
reference frame for Tango devices, and you
can also enable, towards the future, things
like this indoor navigation use case, going from point A to
B, because the device knows what space it's in.
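Jumping ahead a little, here is roughly what that shared reference frame looks like at the API level. This is a sketch assuming the Tango C API and a device that has already localized against a loaded area description; we'll come back to getPoseAtTime properly later in the talk.

```cpp
#include <tango_client_api.h>

// Pose of the device relative to the learned area (the area description),
// rather than relative to wherever this particular session happened to start.
bool GetPoseInLearnedArea(TangoPoseData* pose) {
  TangoCoordinateFramePair frame_pair;
  frame_pair.base = TANGO_COORDINATE_FRAME_AREA_DESCRIPTION;
  frame_pair.target = TANGO_COORDINATE_FRAME_DEVICE;

  // 0.0 asks for the most recent pose. Every device that has localized
  // against the same area description reports poses in this one frame,
  // which is what gives you a shared reference frame for multiplayer.
  return TangoService_getPoseAtTime(0.0, frame_pair, pose) == TANGO_SUCCESS &&
         pose->status_code == TANGO_POSE_VALID;
}
```

OK. So at this point,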
we have actually gone through everything that
I normally get to talk about. I have never gotten to go deeper
on Tango technology than this. And here, we’re through it
in, like, seven-ish minutes. So I feel pretty
good, and I’m really excited about this next part. We’re going to go
deep on how you build for AR, the considerations
that you take into account, the different pieces
of Tango technology that enable you on a journey
to better and better AR, and also we’re going
to talk about some of the limitations of
augmented reality, as well, and how you can be creative
in building applications that work around them. And so for something
of this magnitude, I was thinking to
myself, you know, OK, what are we going to build? What am I going to show here? And I was browsing the
internet and looking around, and thought what really
represents the internet? What really brings it
to life in front of us? What can we do that just
represents the magnitude and scale of the occasion? And so we are going to build a
cat game in augmented reality in front of you– the most
majestic of creatures. And specifically, we’re
going to take virtual cats, and we’re going to put them
into our physical world. Also, in preparing
for this talk, I was procrastinating
a little bit. This Wikipedia page is
way better than you think, if you want to check it out. I know. It's surprising, right? All right, so back to our cats. Let's say our idea is to bring
as many cats as possible from virtual space into
our physical environment. We’ve got 3D models
of all of these cats, and we want to bring
them to life in 3D. So where do you even start? This picture is
kind of the ideal of what we’d be going for, but
it’s really hard to achieve. So we’re going to take
some steps along the way. And we’re not going to
get all the way there, but starting is hard. So first, I’m going to take
a step back a little bit. How many of you have
experience with linear algebra, 3D geometry, coordinate
frame transforms? You know, for an audience,
that’s pretty good. But I’m going to do a
little bit of a refresher, and also, just as comfort
for everyone else, I’m not going to go
too deep into that. We’re going to keep
it fairly high level. But mostly I’m going to talk
about the coordinate frames that Tango defines as
they relate to each other and as they relate to the device
as you move it through space. All right, we’re
back to our cats. So the goal, again, is
to take a virtual cat and to put it in
our physical world. So here we’ve got a living room. We’ve got our virtual
cat, and we want to place it on the ottoman. And the cat should stay
fixed in the environment as we move around it. And ideally, it
would even interact with the environment
a little bit, so maybe the cat jumps down
from the ottoman to the floor, or it jumps up on the
table– just a little bit of playfulness– so easy enough. All right, we’ve got a couple
of things going on here that we need to cover first. We've got the camera feed. So the camera feed is the 2D
projection of the 3D world onto an image plane, as
seen through the lens. And it comes in at 30 frames
a second on Tango devices. So as I move my camera
around the world, I might get a different view
of the sofa, of the fireplace, of the TV, and the table. And that 3D view is compressed
onto this 2D image plane, and it’s in color. And in the real world when
you’re walking around, you don’t have to worry about
much beyond this, right? Physics takes care of it. Physics is awesome. And the light comes
in, and you just get different perspectives,
based on how you walk around. But now we’re talking about
compositing a virtual character into this physical scene,
and it gets a little bit more complicated. The cat doesn’t actually
exist in the real world. So to understand how we
render this cat as if it’s in the real world, let’s forget
about the real world entirely for just a moment and
explore this problem. So say, instead of in the real
world, we’ve got a virtual cat, and we want to render it on
the screen of our device. We want to place it
approximately two meters in front of the device
and hold it fixed and just move around it. For those who are familiar
with building 3D games, this is very, very
similar to how a camera system in a standard
rendering engine works. We’re going to place our object
in a coordinate frame, called the world frame, of the
game, and then we’re going to render it relative
to the frame of our camera as it moves around it. And at the start, we’ll
view the cat from the left, and at the end of our
motion around the cat, we’d like to see
it from the right. And it always stays fixed. So with Tango, this is
exactly what we’ll do. It’s just that instead of
programmatically setting the transform of the camera
as it moves in the world, we’re going to get it from
the Tango device itself. Tango has a couple
of coordinate frames I’ll talk about that
are relevant for this. So the world frame for
Tango is represented as start of service,
and it’s wherever you started with the device. And the device frame is
wherever you currently are with the device. So if I start here, that
defines start of service. And if I ask for the transform,
between start of service and device, when
device is here, it will give me sort of two meters. OK, so now to show
this a little bit I’m going to give a demo of
what this looks like in action. OK. So here I’ve got a
completely virtual world, and I've got this cat. We'll call him Mittens, I guess. And as I move around
Mittens, he stays fixed in the virtual world. But he’s not tied to my
physical space in any way. I can make him
walk, but it’s just on a plane that exists in space. And he can also paw on the
camera when I get close to him. [APPLAUSE] It gets better. I promise. All right, we’ll go back. To make this a
little more concrete, I want to just show
the calls that you need to make in our APIs
to be able to do this. So Tango provides a function
called getPoseAtTime, and you can pass it the base
coordinate frame that you would like, as well as the target
coordinate frame that you would like, and it will
return you the transform at that point in time. So here you can see I’m
defining a frame pair, where I want to ask where is the
device, relative to the start of service? So my base frame is
start of service, and my target frame
is device, and I say get me the transform at time
T, where maybe time is now. And that timestamp
actually turns out to be really
important when you’re trying to do things like
compositing onto an image, because the camera takes an image at a very particular point in time. And so that's something that we'll be coming back to, to make sure that the cat is well registered with the environment when we go to composite it.
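To make that concrete, here is a minimal sketch of that call, assuming the Tango C API; the wrapper function and timestamp handling are just for illustration. Passing 0.0 asks for the most recent pose, while passing a color frame's timestamp is what keeps the cat registered to that exact image.

```cpp
#include <tango_client_api.h>

// Query the device pose relative to where the Tango service started.
// timestamp = 0.0 asks for the most recent pose; for compositing, you would
// pass the timestamp of the color image you are about to draw on top of.
bool GetDevicePose(double timestamp, TangoPoseData* pose) {
  TangoCoordinateFramePair frame_pair;
  frame_pair.base = TANGO_COORDINATE_FRAME_START_OF_SERVICE;
  frame_pair.target = TANGO_COORDINATE_FRAME_DEVICE;

  if (TangoService_getPoseAtTime(timestamp, frame_pair, pose) !=
      TANGO_SUCCESS) {
    return false;
  }
  // pose->translation (x, y, z) and pose->orientation (a quaternion) are
  // what drive the virtual camera in your rendering engine each frame.
  return pose->status_code == TANGO_POSE_VALID;
}
```

The key idea is that you always ask for a target frame relative to a base frame at a specific time. But overall, this is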
pretty simple, right? You can just at any
time ask for where the device is in the world. And boom, you know, you’ve
got a handheld AR cat viewer. All right, so it’s not exactly
what we’re looking for, though. The goal, again, is to place
the virtual cat on the ottoman in the physical world. And the first thing you
might think of doing is just taking the RGB
image, and we’re just going to slap the cat
onto the RGB image. And it’s going to be fine. We’ll just use that
as a background. But it’s not going
to look quite right. This slide demonstrates
it, but I’ll go to a demo, again, just to make it
a little more clear. All right. So now we’ve got Mittens, but
he’s not looking super awesome. He’s just floating in
space in front of me. And here, I really would
rather he be on the ground. So that’s what we’re
going to work to do. But if you were to build
an application like, say, a sun with planets
orbiting around it, this could be an
appropriate visualization. But for Mittens, it’s
really just all wrong. So we’ll go back to
our slides, and we’ll see that it is
really, really wrong. I really I hadn’t looked at that
image in a while, so I like it. All right, so what
can we do to fix this? Well, there are a couple
of things that we can try. Before we go down this
road, though, it’s worth mentioning again that
perfect augmented reality is largely an unsolved problem. So we’re going to
talk about some things to make the cat look better, but
it’s not going to be perfect. I’ve learned to
manage expectations, so you’ll see me doing
this throughout the talk. But we’re going to
get better, and we’ll talk about the limitations
in detail at the end. OK. So with that public service
announcement out of the way, how far down the road
can we go with Tango? Step one is that we can tie the
cat into the environment using Tango’s depth-sensing APIs. So I mentioned before that
the device doesn’t just understand its
position in space, it also understands the
geometry of the environment. So maybe we can use
the Tango device to actually decide
where to place the cat, to recognize surfaces
in the environments, and then we can tie it to
the floor at the correct size and scale. And Tango provides
exactly these capabilities through our support
libraries to developers. You can essentially ask for
a given pixel in an image, give me the point and normal in
the world frame of that pixel. And then we can
place the cat there. And to show this,
you can see that this is the code needed to compute
the point and normal in space. So the first thing that we do is get the relative pose
of the depth to the RGB image. And this is a little bit subtle,
so as you’re walking around, we’re taking RGB
images all the time, but we’re also firing
the depth camera. But they’re not actually
taken at the exact same time. There’s a little bit
of an offset there. So if I naively
took the point cloud at the timestamp
of the RGB camera and just selected a point,
it would be off a little bit. It wouldn’t look correct. And what I need to do
is actually transform the depth information into
the frame of the RGB camera. So here what you can see
is we’re calling a support function that says
calculate the relative pose between the color camera
and the depth camera at the last time I had a
color image and the last time I had a point cloud. And then from there, we
call fitPlaneModelNearClick, and we pass in the point cloud. We pass in the
color image, and we pass in the relative
transform between the two, which helps us do
that alignment. And then we get back a
point and a plane model for that pixel in the image. And after this call, we’ve
got a point and a normal that we can use to place the
cat on the correct surface in the environment, and put it in the real world.
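Sketched against the Tango support library, the flow described above looks roughly like this. The support library's exact signatures shifted a bit between releases, so treat the parameter lists as approximate; the wrapper function and variable names are just for illustration.

```cpp
#include <tango_client_api.h>
#include <tango_support_api.h>

// Find the surface under the user's tap. uv is the normalized screen
// position of the tap; point_out and plane_out receive the result.
bool FindSurfaceUnderTap(const TangoXYZij* point_cloud, double depth_timestamp,
                         const TangoCameraIntrinsics* color_intrinsics,
                         double color_timestamp, const float uv[2],
                         double point_out[3], double plane_out[4]) {
  // 1. Depth and color frames are captured at slightly different times, so
  //    first get the pose of the depth data relative to the color camera
  //    across those two timestamps.
  TangoPoseData color_T_depth;
  if (TangoSupport_calculateRelativePose(
          color_timestamp, TANGO_COORDINATE_FRAME_CAMERA_COLOR,
          depth_timestamp, TANGO_COORDINATE_FRAME_CAMERA_DEPTH,
          &color_T_depth) != TANGO_SUCCESS) {
    return false;
  }

  // 2. Fit a plane to the point cloud near the tapped pixel, using that
  //    relative pose to line the depth data up with the color image.
  return TangoSupport_fitPlaneModelNearClick(
             point_cloud, color_intrinsics, &color_T_depth, uv,
             point_out, plane_out) == TANGO_SUCCESS;
}
```

So I'm going to show a demo of that now. Here you'll see that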
Mittens is actually looking pretty good. He can move around the world. And every time I tap, I’m
getting that detection. And you can see that I can
make Mittens sort of interact with the surfaces and the
geometry of the environment. [APPLAUSE] OK, it still gets cooler. I like the enthusiasm, though. It's sort of building over time. So I can have Mittens
jump on this, as well. And he can jump down. And if we like, we can also
change Mittens to Rufus. All right, so let’s
go back to the slides. OK, so we’re getting better. But you’ll notice
that I didn’t have the cat go behind anything. And the reason I
didn’t do that is because we don’t have support
in that demo for occlusion. What that means is if
I had placed Mittens behind the podium and
looked at it from kind of the incorrect angle, I would
have seen through the podium, and Mittens wouldn’t have
rendered realistically. And so, what can
we do along the way to maybe do more about
understanding our scene and our geometry and
our environments, to make Mittens look even
better and even more realistic? Well, Tango can create
rough 3D reconstructions of the world environment. And it provides developers
with these meshes in real time, which is
really– it’s kind of crazy when you think about it. And perhaps we can use these
meshes to help a little bit with our occlusion problem. So now, instead of
using a single point to determine how
to render the cat, and to have Mittens jump
from the table to the chair to the floor, we can use
the full 3D structure of the environment and
aggregate many points over time. So under the hood,
when Tango is meshing, it’s actually taking many,
many different viewpoints, doing sort of that
same transform I talked about to get the point
cloud into the world frame, and binning them over time,
and creating a representation of this surface that gets
better, actually, the more that you look at a surface, because you get more points of data and a more accurate estimate of the surface.
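As a toy illustration of that binning idea (this is just a sketch of the concept, not the SDK's reconstruction code): every depth point, once transformed into the world frame, lands in a coarse voxel, and each voxel averages the observations it has collected over time.

```cpp
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

struct Vec3 { float x, y, z; };

struct Voxel {
  Vec3 sum{0.f, 0.f, 0.f};
  int count = 0;
};

using VoxelGrid = std::map<std::tuple<int, int, int>, Voxel>;

// points_in_world: depth points already transformed into the world frame,
// the same frame the device pose is reported in.
void BinPointCloud(const std::vector<Vec3>& points_in_world, float voxel_size,
                   VoxelGrid* grid) {
  for (const Vec3& p : points_in_world) {
    auto key = std::make_tuple(static_cast<int>(std::floor(p.x / voxel_size)),
                               static_cast<int>(std::floor(p.y / voxel_size)),
                               static_cast<int>(std::floor(p.z / voxel_size)));
    Voxel& v = (*grid)[key];
    v.sum.x += p.x;
    v.sum.y += p.y;
    v.sum.z += p.z;
    ++v.count;  // more observations of a surface means a better average
  }
}
```

And the real process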
is pretty complex. But it is, again, abstracted
for us by the SDK. And we’ve got a meshing library
that you can use in C++, or we’ve hooked it up directly
to Unity, for Unity developers, that gives you the mesh inside
the game engine in real time. And that makes it really easy
to use it in our depth buffer as we render.
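As a hedged sketch of one way to use that mesh for occlusion (this is not the SDK's own renderer, and the Draw* helpers are hypothetical stand-ins for your own drawing code): render the mesh into the depth buffer only, so it never shows up on screen but still hides whatever sits behind it.

```cpp
#include <GLES2/gl2.h>

// Hypothetical helpers standing in for your own drawing code.
void DrawCameraFeedQuad();
void DrawEnvironmentMesh();  // geometry supplied by the meshing library
void DrawCat();

// Per-frame draw order for mesh-based occlusion.
void RenderFrame() {
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glEnable(GL_DEPTH_TEST);

  // 1. Camera feed as the background; don't let it write depth.
  glDepthMask(GL_FALSE);
  DrawCameraFeedQuad();
  glDepthMask(GL_TRUE);

  // 2. Environment mesh: depth only, no color. It stays invisible, but
  //    anything behind it will now fail the depth test.
  glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
  DrawEnvironmentMesh();
  glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

  // 3. The cat, depth-tested against the invisible mesh, so it disappears
  //    behind real-world geometry like the podium.
  DrawCat();
}
```

So to see this, we're going to check out another demo. All right, so we'll put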
Mittens over here for now. So you can see that as I
move around the podium, I’m starting to build
up a rough mesh of it. That’s probably
good enough for now. And we’ll put Mittens
in the background. And as I move, you can
see he becomes occluded. And we think that this is
a really powerful thing for building AR applications. We can sort of bring him out. [APPLAUSE] So one thing that you’ll notice
is that the mesh isn’t perfect. And so we’ll play some tricks. Like, we actually alpha
blend on the edge of Mittens as he goes behind. And here, we’ve even shown
you his silhouette when he’s behind the
podium, to give you a sense that he’s still there. I’m going to show another
version right now that does a similar
thing, but it’s not going to show the silhouettes. So Mittens will
be fully occluded, and you’ll get to see
how that looks as well. Should’ve saved my mesh. So probably good enough. All right, so now we’ll
put Mittens behind again, and you can see as I go, he
kind of pops out over time. And there’s alpha
blending on the edges that we’re using to sort of
make him fade in and out. So it’s not perfect, but
it does give the illusion of a more real experience. All right, let’s go
back to the slides. So at this point,
we’ve actually created a pretty compelling augmented
reality application, I would say. We’ve taken Mittens. We’ve placed him in the world. He can walk on surfaces. He can jump on chairs. And now he can
actually be occluded by objects in the scene. So you’ve got a lot
of the components of making a compelling augmented
reality experience right there in front of you. But it can be a lot better. And one thing I’ll
say– have you guys noticed anything weird
about this picture up here? Does something look
fake to you, maybe– yeah, people are nodding,
audience participation. The Lamborghini’s fake. It’s fake. So the front one is
actually not a real car. And the reason that
this looks so good is that we’re doing
a lot in terms of lighting and reflection. We’ve placed it
exactly where we want. We’ve done very detailed
compositing of it. And this is kind of like
the holy grail, right? This is the holy grail of AR. And this slide is really just
to say that this is a journey. And we’re moving along this
path, moving towards things that we hope someday
can look like this. But this is not, I
guess, the expectation to have today for where we are. And as an application
developer, it’s important to remember that. So if you’re building something,
you want to ask yourself, is it appropriate for the
technology that I have at hand? And everything
from the art assets that you use to the
gameplay that you design can help to set
expectations for your users. And you can still make
really compelling experiences that use maybe a subset
of the technology as well. So you could put a
solar system in space. You could have your cat walk
on planes but not be occluded. Or you could have objects
in games that take advantage of the scene geometry. But all of them are compelling
in their own rights. And with a little
bit of creativity, you can actually
make experiences that feel really good to people,
even if you’re not leveraging the full capabilities
of AR, and even if you don’t get to the holy
grail of Lamborghini rendering. And to illustrate
this I guess I’ll provide two examples of
augmented reality applications that I really like that
don’t even use depth. So the first one I’ve been
alluding to for a while. This was done by
some of our friends at San Francisco
State University, and it allows you to explore the
solar system by placing planets in a line in your room. And I’ll show a
demo of that now. All right, so here I’m going
to tap to place the sun. And now I’m going to walk. And I guess it recommends
I go further than this, but this is about
as far as I get. And I’ll tap to place Neptune. So now as I walk
back towards the sun, I can see the scale of the
planets relative to each other. And it’s actually
correct, and I can zoom into each one of the
planets, and if I want, I can start to see
how orbits work. And as an educational
tool, this is spectacular. And from an augmented
reality perspective, it’s actually remarkably simple. I mean, this is the floating cat
that we all laughed at before. But for the appropriate
application, it’s actually quite compelling. We’ll go back to the
slides for a second. The second application
I want to talk about is a game from our
friends at Trixi Studios, and it’s called “Phantogeist.” And how many of you actually
went to our after hours event and got to play “Phantogeist?” Oh, that is way too few of you. All right, so
“Phantogeist” is a game, and the premise is that aliens
have invaded your world, and you need to zap them before
they take over the world. And it’s not good
for us at the end. But anyway, there are a bunch
of design considerations that went into this
game that I really like. So the characters are
actually semi-transparent. They feel like they're
coming out of the wall. And they’re very
clever about using the position of the device
and making assumptions about free space to
interact with the world, even though they don’t
know where surfaces are. And I’ll show a brief
demo of this, actually. So I’m going to go
to bonus content. And we’ll do a giant worm thing. All right, so you
can tell they’re already setting the
mood with the music that they have for this. And what happens is,
as I start, the device records my position in space. Can we turn down the lights? Awesome. So we’ll get a
brief text message that says something
like the aliens will get you, or clear the area. And so as I move, you’ll see
something strange happens. [ALARM] So here we’ve ripped a
hole in the actual floor. And we can see into– whoa. All right, we’re safe. So if we switch back to
the slides, real quick. I think it’s pretty cool. All right, so everything that
you saw there used only motion tracking. There was no depth. There was no occlusion. And it’s kind of shocking. I’m actually impressed with
how well they pulled it off, but they’re playing tricks. They’re making some assumptions
and building gameplay that just works for the environment. And so really this
is to say that there are ways in gameplay
to make things feel real and immersive, even
with our most basic APIs. And as you start to move down
the road, it gets better. And with all that
said, I’ve said a lot of managing
expectations type stuff, but we’re continuing to improve. We’re always working
on new things. And in fact, we’re rolling
out some drift correction code into our SDKs over the
next couple months. So Wim talked about this
before, but another problem that you have with
augmented reality is that when I place the cat
in the world– actually, let’s switch back to the
demo, just real quick. I think it’s easy to show. So you can see that the
cat is on a surface. It’s relatively robust when
I move the device around because our tracking
is relatively good. But if I’m really mean,
the cat is gone, right? It drifted off into space. So we can go back to the slides. So we’ve actually
built software that allows us to correct for that. And so when the cat goes
drifting off into space, we recover very, very quickly. It takes about a second, but
you stop, look at the world, and the cat shows up back
in the correct place. And this is really just
one of the improvements that we’re working to
roll out and that we’re working to bring to you,
our developer community. We’re working on
things like modeling the lighting in the room to give
you realistic shadows, things like improving our meshing
and doing texturing as well, to give realistic textured
meshes inside of Unity that maybe you could
use in other games and even import them into
different game worlds. We’re just kind of taking steps
along this journey to getting to that Lamborghini,
to getting to really, really good and immersive
and solid augmented reality. But we don’t have to wait. We can already do
a lot right now and build compelling
experiences today. I also want to talk a little
bit about the tradeoffs that you make as you
start to use more and more of this technology. So heat is your enemy
on a mobile device. If any of you have used
a mobile phone before, you know it gets hot. It’s hot in your pocket. You’re using Google Maps,
and it’s just burning up. And it is a very real tradeoff. So we’ve worked hard to
optimize our algorithms, to optimize our computer
vision software, and to make it run as quickly
and efficiently as possible. But the more
capabilities you use, the more heat and CPU and
compute you use, as well. So if you can make an experience
work with motion tracking, then you have a lot more that
you can throw at your game. But if you go all the
way to full meshing, maybe you should be using
some low poly models. And it’s just important to
be aware of the tradeoffs that you make when you
think about the applications that you could build. If I could sum up my talk
with one cheesy statement, I would do it with
“creativity is king.” When you’re developing
for a new platform, there are limitations. And we’re super fortunate to
have partnered with people who can be creative,
and who have worked around the
different capabilities and technologies of the device,
as well as their limitations. And we’re really looking
forward to seeing what kind of
creative applications can be enabled by what is
super powerful technology over the coming months. There’s a bunch that
you should check out at our sandbox to get a picture
of what developers are up to. If you haven’t done that,
I really encourage it. And there are a couple
of things that we’ve done that are really exciting
at the platform level, too, looking forward. So first, we believe that
Project Tango technology should become ubiquitous. All devices should
have the ability to understand where
they are in space and to understand the
geometry of their environment. And one thing that we’ve
done is we’ve worked closely with our friends
at the Android team to start moving some of
Tango’s APIs into core Android. And so in Android
N, you can actually ask for the six
degree-of-freedom pose of the device
through Android APIs. And Android N also
supports depth sensors. And over time we hope to expose
more of Tango’s capabilities in this way. And we’re at the beginning of
a very, very exciting journey on the consumer front. So we have announced
a partnership with Lenovo, where we will
be shipping Project Tango smartphones later this summer. And this is huge. This is a big deal. This has been a three
year journey for us, to take this technology
from the prototype stage all the way to something
that you can actually hold in your hands. And a lot of the applications
we’re showing here start to show the power of
these devices for consumers. So now is really the
time to get on board. The train is sort of leaving
the proverbial station. And I’m really excited to see
what we can build together. It’s a great time to get
your foot in the door. If you haven’t been thinking
about augmented reality applications, I
encourage you to do so. And it’s also a
unique opportunity to partner with Google. So we’re always looking
for interesting ideas. We like to feature our
partners at things like talks. And on occasion we
even put out RFPs for unique augmented
reality content. And this all
happens because it’s a new and budding ecosystem. It’s just an exciting time
to take the first step in this journey together. And with that, I’m
ending my talk. Hopefully this gave
you an idea of the kind of applications that are
possible with Project Tango, as well as some of
the considerations that you need to be aware
of when you’re building augmented reality content. I think it’s a
really exciting time. I’m really happy
to be here, feel fortunate to be on the team. Thank you guys so much for
your time this morning, and I hope you enjoy
the rest of I/O. [MUSIC PLAYING]
