August 11, 2015

The science of ... crowd size estimation

Estimating crowd size is complicated science. Will we really know how many prayed with the pope?

By Meeri Kim
PhillyVoice Contributor

Popedelphia. Popenado. Popecalypse.

Whatever you want to call it, Pope Francis's upcoming three-day visit to our beloved corner of the world has spurred a frenzy of preparations by city officials expecting an extra 1.5 million bodies in Philadelphia over the Sept. 26-27 weekend. Numbers like that would temporarily double the city's population.

At a press conference in June, Mayor Nutter even predicted that the papal visit “will be the largest event in the city of Philadelphia in modern history.” The pope's itinerary will include Sunday Mass held on the Ben Franklin Parkway, which is open to the public and will undoubtedly draw the biggest crowd.

But after the chaos comes and goes, how will we know how many people truly showed up? Is there a way to count that many heads at a non-ticketed event? And can 1.5 million visitors, along with the Philly natives who decide to brave the flood of tourists, even fit on the Parkway? (That figure matches one estimate of the crowd that filled the Parkway for the Live 8 concert in Philadelphia on July 2, 2005.)

Crowd size estimation isn't easy, and there's often a hidden agenda behind the number. Event organizers, of course, want to inflate the number of people present, but without the proper equipment and estimation methods, even unbiased parties can be inaccurate.

“For a single event, you'll have crowd size estimates that range from a couple thousand people to 500,000 or even a million,” said project manager Ryan Shuler, who oversees crowd counting and analysis services at Digital Design & Imaging Service Inc. (DDIS). “It's unfortunate that there's such a broad publication of the numbers, regardless of truth or fiction.”

With the Washington Monument in the background, participants in the Million Man March gather on the National Mall in Washington on Oct. 16, 1995. Estimates of the crowd size at the rally varied widely and caused controversy. Congress later ordered the U.S. Park Police to stop releasing official crowd size estimates. (Doug Mills, File / AP)

An infamous example of crowd counting gone awry is 1995's Million Man March on the National Mall in Washington, D.C., a gathering of African-American men that stretched from the U.S. Capitol to the Washington Monument. It was organized by Nation of Islam leader Minister Louis Farrakhan. Organizers estimated the size of the crowd at between 1.5 and 2 million people. The United States Park Police, however, put the number at 400,000. Farrakhan subsequently threatened to sue the National Park Service for its supposed lowballing.

Then in 1997, the Center for Remote Sensing at Boston University carefully analyzed original negatives from the National Park Service taken during the march. Its final number was 837,000 people with a 20 percent margin of error, or a possible range of 669,600 to 1,004,400 people.
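
The range follows directly from the margin of error. As a minimal sketch in Python (the function name `error_range` is made up for illustration and is not from the Boston University analysis), the arithmetic looks like this:

```python
def error_range(estimate, margin_pct):
    """Return the (low, high) bounds of an estimate given a percent margin of error."""
    low = round(estimate * (100 - margin_pct) / 100)
    high = round(estimate * (100 + margin_pct) / 100)
    return low, high


# Figures from the article: 837,000 people, 20 percent margin of error.
low, high = error_range(837_000, 20)
print(f"{low:,} to {high:,}")  # 669,600 to 1,004,400
```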

Congress later ordered the Park Police to stop releasing official crowd size estimates altogether in an attempt to avoid future controversy. That's where Falls Church, Va.-based DDIS steps in to provide unbiased, apolitical crowd counts for big events in the D.C. area. As part of a contract with CBS News, the company has analyzed events such as Glenn Beck's “Restoring Honor Rally” at the Lincoln Memorial in 2010 and the Stephen Colbert/Jon Stewart “Rally to Restore Sanity and/or Fear” near the Capitol later that same year.

The principle behind the calculations by both the Boston University group and DDIS is the same: an estimated density of people multiplied by the area they are standing in will give a rough head count. A loose crowd has about one body per 10 square feet, whereas in a dense crowd, each person has less than half that amount of space. Mosh-pit density is one person per 2.5 square feet.
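
To make the density-times-area idea concrete, here is a rough sketch in Python. The loose and mosh-pit spacings are the figures quoted above. The 4.5 square feet used for a dense crowd is an assumption standing in for "less than half" of 10, and the 100,000-square-foot plaza is hypothetical.

```python
# Square feet per person. The loose and mosh-pit values come from the article;
# 4.5 for a dense crowd is an assumed stand-in for "less than half" of 10.
SQ_FT_PER_PERSON = {"loose": 10.0, "dense": 4.5, "mosh_pit": 2.5}


def estimate_headcount(area_sq_ft, crowd_type):
    """Rough head count: occupied area divided by the space each person takes up."""
    return round(area_sq_ft / SQ_FT_PER_PERSON[crowd_type])


# Hypothetical 100,000-square-foot plaza, filled at each density.
for crowd_type in SQ_FT_PER_PERSON:
    print(crowd_type, estimate_headcount(100_000, crowd_type))
```

The same plaza holds four times as many people at mosh-pit density as it does with a loose crowd, which is why the choice of density matters as much as the measured area.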

Traditionally, says Cara Schneider of Visit Philadelphia, there's a rough formula that 100,000 people can fit in each of the 10 blocks that make up the Parkway. So if pilgrims fill up all 10 blocks, one could possibly estimate a million people total — however, she notes that doesn't account for variables like event footprint, security build-out, or use of side streets.

So perhaps even 1 million would be too high an estimate of the crowd that could fit on the Parkway, unless it sprawled out to neighboring streets. Billy Penn calculated that 1,894,860 people could fit on the Parkway and its adjoining spaces by using a density value of 2 square feet per person, which is even tighter than the mosh-pit figure of 2.5 square feet per person. That makes it a likely overestimate.
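
One way to see how sensitive that number is: back out the area it implies, then refill the same space at the roomier densities described earlier. The sketch below assumes Billy Penn's figure is simply area divided by space per person. The article does not give the area itself, so `implied_area_sq_ft` is derived here, not a reported measurement.

```python
BILLY_PENN_HEADCOUNT = 1_894_860      # figure quoted in the article
BILLY_PENN_SQ_FT_PER_PERSON = 2.0     # density value quoted in the article

# Assumption: the head count was computed as area / space per person,
# so multiplying back recovers the area that was used.
implied_area_sq_ft = BILLY_PENN_HEADCOUNT * BILLY_PENN_SQ_FT_PER_PERSON


def capacity(area_sq_ft, sq_ft_per_person):
    """Number of people that fit in an area at a given spacing."""
    return round(area_sq_ft / sq_ft_per_person)


print(f"implied area: {implied_area_sq_ft:,.0f} sq ft")
print(f"mosh-pit (2.5 sq ft each): {capacity(implied_area_sq_ft, 2.5):,}")
print(f"loose (10 sq ft each): {capacity(implied_area_sq_ft, 10.0):,}")
```

Under that assumption, the same footprint holds about 1.5 million people at mosh-pit density and fewer than 400,000 if the crowd is loose.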

The density-times-area method of crowd counting, pioneered by journalism professor Herbert Jacobs in 1967, has been updated with technology like high-resolution aerial photography and computer simulation. DDIS uses an aerostat balloon equipped with remote-controlled DSLR cameras to capture images of the crowd at different angles, altitudes, and times during the event. Shuler and his colleagues try to identify a peak moment, when the most people are present, and overlay a 3D grid with squares of a known area. Ideally, the shot would include the whole crowd and be as orthographic as possible, looking straight down on people's heads rather than from an angle. (DDIS has not yet been hired to do crowd counts during Pope Francis's visit to the East Coast.)

“When you're looking at a photograph from an orthographic perspective, the crowd is a lot easier to count because of the negative space between the people, as opposed to an oblique angle or a low altitude,” said Shuler, who is also lead photographer and image analyst at DDIS. “Because many people view it from a low, oblique angle, the crowd often appears much bigger than it actually is.”

Farouk El-Baz, director of Boston University's Center for Remote Sensing, points to a chart showing attendance density at the 1995 Million Man March as determined by his department during a news conference in Boston on Oct. 27, 1995. He estimated an attendance of 837,214 with a 20 percent margin of error. (Stephan Savoia, File / AP)

With a typical event, some areas will be packed at mosh-pit density while others remain sparse. So using the overlaid grid, they will first manually count the number of heads within certain squares to double-check their density estimates before applying those calculations to the rest of the crowd. Sometimes they also generate crowds at different densities using modeling software to get a better understanding of what a heavy versus light crowd looks like in a particular space.
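
The sample-then-extrapolate step can be sketched in a few lines of Python. This is an illustration of the general idea, not DDIS's actual workflow: the grid-square size, the hand counts, and the number of squares in each density class are all invented for the example.

```python
CELL_AREA_SQ_FT = 100.0  # hypothetical size of one grid square

# Hypothetical hand counts of heads in a few sampled squares of each type.
sampled_counts = {"packed": [38, 42, 40], "sparse": [9, 11, 10]}

# Hypothetical number of grid squares judged to belong to each type.
cells_per_type = {"packed": 200, "sparse": 500}


def average(values):
    """Plain arithmetic mean of a list of counts."""
    return sum(values) / len(values)


def observed_spacing(counts, cell_area_sq_ft=CELL_AREA_SQ_FT):
    """Square feet per person implied by the hand counts, as a sanity check."""
    return cell_area_sq_ft / average(counts)


def estimate_from_grid(sampled, cells):
    """Apply the average hand count for each type to all squares of that type."""
    return round(sum(average(sampled[kind]) * cells[kind] for kind in sampled))


print(observed_spacing(sampled_counts["packed"]))  # square feet per person
print(estimate_from_grid(sampled_counts, cells_per_type))
```

With these made-up inputs, the packed squares work out to mosh-pit spacing and the sparse ones to a loose crowd, so the hand counts line up with the rule-of-thumb densities before being applied to the rest of the grid.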

Shuler says that DDIS's crowd counts have about a 10 percent margin of error. The Boston University group used a similar tactic for its Million Man March estimates, one originally developed to count the number of dunes in the deserts of Egypt and Kuwait as well as the number of trees in the forests of California. But instead of using a fine square grid, they traced large areas of the photo with roughly similar densities of people.

The next technological step for crowd size estimation will likely be computer vision — as in, you input a photo of a crowd and a computer algorithm spits out a number. But computers still aren't as good as humans at distinguishing a person's head from, say, a large rock.

“We human beings are extremely good at this because we have a big prior set of knowledge about what people look like in any context, but that's not something that computers actually have,” said Ko Nishino, a computer scientist at Drexel University. “They are just looking at pixels.”

Nishino mentions a challenge in computer vision called the occlusion problem: the difficulty of detecting a person in a photo when only part of the head, or just the legs, is showing. Recognizing a person at low resolution is also hard, as with someone in the crowd positioned very far from the camera.

But computer vision is getting better. And even if an automated crowd-counting algorithm isn't as accurate as a team of humans, it can come up with a rough calculation almost instantly — meaning it can also update numbers in more-or-less real-time. In terms of event safety and surveillance, such algorithms would be especially valuable.

Nishino's research isn't on crowd counting, but he works on video analysis of the dynamics of crowded scenes to pick out unusual events. His algorithms can keep an eye on a train station, for example, and quickly sound an alarm for security if there is a disturbance in the typical flow of a crowd, like an accident or medical emergency.

“If you only see one sardine swimming in the ocean water, you can model how that fish looks like and moves,” he said. “But if you have a really dense school of sardines, you can't really track each individual fish — but you can model how the school itself moves by observing the global motion to predict how each of these sardines move.”
