July 13, 2016

Infrequently Asked Questions: How does augmented reality work?

By Brandon Baker
PhillyVoice Contributor

Brandon Baker/PhillyVoice

A digital Rattata takes up residence in the real world in 'Pokemon Go.'

Pokemon Go has, indisputably, won over smartphone users everywhere in the past few days. But let's be honest: We'd be none too impressed if not for the game's use of augmented reality technology, which places Pokemon in real-world settings for our Pokeball-throwing pleasure.

How, though, does that technology work?

Curious, we reached out to Philly gaming guru Frank Lee, co-founder of Drexel's Game Design Program and founding director of the Entrepreneurial Game Studio at Drexel's ExCITe Center, for an explanation.

The world is full of questions we all want answers to but are either too embarrassed, time-crunched or intimidated to actually ask. With Infrequently Asked Questions, we set out to answer those shared curiosities.

What is augmented reality, from a technology standpoint?

It’s the notion of combining the digital with the real world. Typically, the way it’s been done is through cell phones, using the camera that’s available on the phone, as you’re pointing at a scene in the real world and taking a picture or a video. What the phone will do (it’s basically a computer) is overlay digital information. One example might be, if you’ve seen the original 'Terminator,' the Terminator's eye: as he’s looking at the real world, it’s giving him a digital information overlay that’s appropriate for that location and scene. When I think of examples of augmented reality, I don’t know if you remember this application, but there's one where if you point at a sign in a foreign language, Spanish or Italian or French, it would automatically translate it in the scene – on that sign, through the camera, into English. So essentially, if I point my cell phone camera at a 'Stop' sign in France, the French 'stop' will be converted to the English 'stop.' It’s a real-time translation of information from the video scene into some digital artifact.

So when you’re turning on the augmented reality, is it only then conjuring up the digital image, or is it kind of that the image is already there and you're putting a highlighter over it?

I think it’s better to think of it as the digital image being added to the real world, with the idea that, from your perspective, it’s as if it’s the same thing. You know, you’re trying to blur the boundary between the real and digital world. Another example might be in 'The Matrix,' which I always point to ... If you understand 'The Matrix,' you see it as streaming data. What it comes down to is that you take reality and you augment it with digital information, a digital overlay. What’s critical to that, especially in the real world, is location tracking. So if I’m in front of the Eiffel Tower, the digital overlay information you get, you’ll want it to know it’s the Eiffel Tower. So if I’m pointing at it, it’s giving me information about how tall it is and all that on the screen. In order for that to happen, it relies on a few things: One, it may use GPS information, so the GPS will tell the phone where I am and what’s around me; two, the directional info of north, south, east, west. Most phones have a compass built in, to tell which direction I’m looking in. It should tell me what’s in the world. And three, it’s using the real-time video feed, basically analyzing the feed that’s coming in, in real time. It’s doing essentially a computer vision type of analysis.
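To make the GPS-plus-compass idea concrete, here is a minimal Python sketch of how an app might decide whether a landmark is in view and where to draw its label. It is an illustration only, not how any particular app works: the function names, the 60-degree field of view and the 1080-pixel screen width are all assumptions, and a real system would also use the video feed Lee describes.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Compass bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def overlay_x(heading_deg, target_bearing_deg, fov_deg=60.0, screen_width=1080):
    """Horizontal pixel position for a label, or None if the target is out of view.

    fov_deg and screen_width are assumed values, not taken from any real device.
    """
    # Signed angle between where the phone points and where the target lies,
    # wrapped into the range [-180, 180).
    offset = (target_bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    if abs(offset) > fov_deg / 2:
        return None
    return round((offset / fov_deg + 0.5) * screen_width)

# Hypothetical user position a short walk from the Eiffel Tower (48.8584, 2.2945).
to_tower = bearing_deg(48.8530, 2.2945, 48.8584, 2.2945)
label_x = overlay_x(heading_deg=10.0, target_bearing_deg=to_tower)
```

The GPS fix supplies the two coordinates, the compass supplies `heading_deg`, and the result says whether the label belongs on screen at all and, if so, roughly where.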

The GPS part of augmented reality, is that a new addition to the technology? The first time I was introduced to it was through Nintendo's 3DS, and with that, it didn’t use GPS. It used a card you laid on the table.

So that's reflecting earlier technology. It’s only using the video feed. It’s taking 30 frames per second, pictures per second, and there’s a computer – whether a 3DS or a phone – that is processing the image in real time to understand what’s there. It’s computer vision. So what those little markers on the table do is give it orientation. It knows to look for those markers, it knows what’s there, and that helps it orient itself. Right? It’s helping it to process visual information. Without those markers, it has to basically try to understand that scene purely from software. Computer vision algorithms are decent, but they’re not that great. Those markers give information. And GPS, likewise, is providing information for the computer vision software.
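As a rough illustration of why a printed marker helps, the Python sketch below assumes some detector has already found the four corners of a square marker in a frame, then reads off where the marker is, how it is rotated and how large it appears. The function name and corner ordering are hypothetical, and real systems estimate a full 3D pose rather than this flat, 2D simplification.

```python
import math

# At 30 frames per second, all processing for one frame has to fit
# into roughly 33 milliseconds.
FRAME_BUDGET_MS = 1000 / 30

def marker_pose_2d(corners):
    """Estimate center, in-plane rotation and apparent size of a square marker.

    corners: four (x, y) pixel points, assumed to be ordered
    top-left, top-right, bottom-right, bottom-left.
    """
    cx = sum(x for x, _ in corners) / 4
    cy = sum(y for _, y in corners) / 4
    (x0, y0), (x1, y1) = corners[0], corners[1]
    # Direction of the marker's top edge gives its rotation in the image.
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    # Length of the top edge shrinks as the marker gets farther away.
    side = math.hypot(x1 - x0, y1 - y0)
    return (cx, cy), angle, side

center, angle, side = marker_pose_2d([(100, 100), (200, 100), (200, 200), (100, 200)])
```

With those three numbers, software knows where to draw a digital object, which way it should face and how big it should be, without having to interpret the rest of the scene.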

Ultimately it comes down to taking video feed in real time, understanding what’s there through software and adding info. By knowing what location you’re in, the software is able to restrict what's likely there. By what direction you’re looking in, the compass direction, it knows not to worry about the other three directions.
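The narrowing-down step Lee describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: candidates are given as east/north offsets in meters from the user (a hypothetical format), and the 500-meter range and 90-degree viewing cone are made-up defaults, with the cone chosen to match the idea of ignoring the other three directions.

```python
import math

def visible_candidates(candidates, heading_deg, max_range_m=500.0, fov_deg=90.0):
    """Keep only the objects that location and compass say could be in view.

    candidates: dict mapping a name to its (east_m, north_m) offset from the user.
    """
    kept = []
    for name, (east, north) in candidates.items():
        # Location: drop anything too far away to matter.
        if math.hypot(east, north) > max_range_m:
            continue
        # Direction: bearing clockwise from north, compared with the heading.
        bearing = math.degrees(math.atan2(east, north)) % 360.0
        offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0
        if abs(offset) <= fov_deg / 2:
            kept.append(name)
    return kept

# Hypothetical nearby objects, in meters east and north of the user.
nearby = {
    "fountain": (0, 100),
    "statue": (100, 0),
    "museum": (0, -50),
    "far_tower": (0, 5000),
}
facing_north = visible_candidates(nearby, heading_deg=0)
facing_east = visible_candidates(nearby, heading_deg=90)
```

Only the short list that survives this filter needs to be matched against the video feed, which is a much easier job for the vision software than recognizing anything anywhere.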

With Pokemon Go taking off, do you see this technology evolving from here, and how it’s applied?

Yeah, the way I see it is that – it's something I’ve been interested in since way back in 2009 or 2010. I think the convergence of the technology is ripe for this type of gameplay. Meaning that, one, you have the mass adoption of really powerful computers, smartphones – they’re basically high-powered computers able to do the type of software and image analysis that you need for something like this. Two, you have this mass availability of data, especially geographic data through Google Maps and through Apple Maps and other companies, providing you location information: what’s there and so on. And you know, certainly, also, the wide availability of high-speed data networks for the cell phone. We’re not talking about 2G anymore; we have LTE, 4G and upcoming 5G. So, we have fast data on-location throughout the world, through a fast computer that's able to process information. That type of gameplay is, I think, a new type of game coming in.

But what I want to point out is it’s new and old at the same time. Ultimately, what we’re talking about are games we play in the real world. Hide and seek, tag. That type of game that is almost part of our DNA, growing up all throughout human culture. That type of game where we move around in our cities and streets, enhanced by technology. That’s why it makes me feel comfortable thinking that we’re coming up on this type of gameplay using augmented reality … that relies on the kind of very core, simple games we used to play.

Your upcoming project, 'War of the Worlds,' uses augmented reality, right?

Yeah, so 'War of the Worlds,' it has aspects of that, but it’s more site-specific theater. We’ll use the software in the cell phone more for coordination. It’s more like a hide-and-seek and tag-type of game, greatly expanded through technology. So, rather than 15 kids playing hide and seek, imagine 1,000 people playing hide and seek throughout the city. It’s not necessarily overlaying digital info onto the real world, but that type of game where you’re essentially using mixed reality, using digital relays onto the real world. There’s a game I experimented with back in 2010 or so and it also is part of a proposal we just submitted to [the National Science Foundation] for approval, but it really is a similar idea. We have the technology now to take the game out into the real world, such that we take the standard physical games we used to play as kids, capture the flag and other stuff, and potentially scale that as a worldwide game. That’s what it is.

Anything to add?

I’m really excited because, for one, ['Pokemon Go' developer] Niantic was testing the tech side of it with [their first game 'Ingress'], almost like a tech demo, but what 'Pokemon Go' has shown is that I think consumers are ready for these types of games. We’ll see many, many more of these games, and hopefully, we won’t get bombarded by crap games. But I think that type of gameplay that mixes real world and the digital world is here to stay.

Have a question you're dying to have answered? Send an email to entertainment@phillyvoice.com, and we'll find an expert who can give you the answer you're craving.
