A few years ago, I was asked to be on the now-popular Animal Planet show Finding Bigfoot. I said, "No, thanks. I don't do that sort of thing anymore," and that was that. The show is now in its fourth season, and I have to admit a mild cult fascination with it.
The show follows Matt Moneymaker who runs The Bigfoot Field Researchers Organization (BFRO), "The only scientific research organization exploring the bigfoot/sasquatch mystery." Right. He and three other "researchers" investigate alleged sightings around (mostly) North America. They generally follow the same basic formula:
Side note: I like the attempt to include a token "skeptic" on the show in the form of Ranae Holland. On the show, she is credited as a "field biologist" although, as far as I can tell, this is because she has a BS in Biology and works "in the field". Her previous work experience is mostly with fish (i.e. not Bigfoot or other large mammals). And she does not conduct rigorous science, a process that takes years to learn to do properly under the tutelage of a seasoned scientist. In every episode she is initially the voice of skepticism but always comes around to be "convinced by the evidence". Usually it's something like,
Holland: "Well, couldn't this be another animal?"
Holland: "Oh, well, I guess you're right then. Bigfoot is really the only logical explanation."
So in the course of my fascination, I came across a lovely infographic made by Joshua Stevens: ‘Squatch Watch: 92 Years of Bigfoot Sightings in the US and Canada. It's a beautiful piece of data-mining in which Stevens mined the BFRO's sighting database and plotted every reported sighting over the last 92 years onto a single map of North America. Besides being a gorgeous representation, Stevens hoped to learn something new. For example, he hypothesizes that the map of sightings closely resembles a population distribution map of the United States, though he then decides that isn't the case. He also plots the number of sightings per year, and finds sightings have increased at roughly the same rate as population. Interesting.
But what about sightings plotted over time? What would we see then? If Bigfoot is a real animal (it isn't), then perhaps we would expect to see migration patterns. If it is mass delusion or hoax, then we might expect a pattern that sweeps across the US, e.g. beginning in the Pacific Northwest and sweeping eastwards. Since the Animal Planet show is closely connected to the BFRO who maintains the sightings database in question, I predict we will see an explosion of sightings after the show becomes popular.
I decided to find out.
Click here for the result. It works best in Google Chrome, has issues in Firefox, and probably doesn't work at all in Internet Explorer. Otherwise, I would embed it directly in this page.
The page uses an HTML5 canvas element in the background, overlaid with a semi-transparent Google Map. There is an animated timeline at the bottom showing a bell curve spanning a few years. For each data point under the curve, its position is plotted on the map with intensity proportional to the value of the curve at that point. So, for example, if there was a sighting in San Francisco, CA in 1995, then a point is drawn faintly over San Francisco starting around 1992, becoming brighter as we approach 1995, then fading back out again until about 1998. The result is a twinkling light show over the US.
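The fade-in/fade-out weighting can be sketched as a Gaussian centered on the sighting's year. The page's actual curve and width aren't specified here, so the sigma of roughly 1.5 years below is an assumption chosen to match the ~3-year fade described above:

```javascript
// Hypothetical sketch of the timeline weighting: a bell curve centered on
// the sighting year. The width (sigma = 1.5 years) is an assumption, chosen
// so a point fades in about three years before its year and out three after.
function sightingAlpha(sightingYear, currentYear, sigma) {
  sigma = sigma || 1.5;
  var d = currentYear - sightingYear;
  return Math.exp(-(d * d) / (2 * sigma * sigma));
}
```

Under this sketch, a 1995 sighting viewed in 1995 gets full intensity, and by 1992 or 1998 it has nearly faded out.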
I also draw two histograms, one in the upper left and the other on the right, to show sightings by month for the year range being displayed.
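Binning sightings by month for a year range is simple; the record shape ({year, month}) below is an assumption about how the scraped data is stored:

```javascript
// Count sightings per calendar month within [startYear, endYear].
// Assumes each sighting record carries a year and a 1-based month.
function monthHistogram(sightings, startYear, endYear) {
  var bins = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
  sightings.forEach(function (s) {
    if (s.year >= startYear && s.year <= endYear) {
      bins[s.month - 1]++;
    }
  });
  return bins;
}
```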
The results are really interesting and don't show what I would expect. I make the following observations:
This project presented a number of interesting problems to be solved. The following sections discuss each one.
The first step is to obtain a usable database. The BFRO's database is not directly accessible; there is only a web interface (though it is at least very consistent). So I need a script that crawls each page (arranged by state) and pulls out the necessary details about each sighting. That isn't too hard in itself. However, in order to plot each point on a map, we need coordinates, and most BFRO data points give only a town name.
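The crawl itself is just fetching each state's listing page; the fiddly part is pulling the report links out of the HTML. A sketch with a regex — the link markup below is invented for illustration and is not the BFRO's actual markup, which would need its own pattern:

```javascript
// Extract report URLs and titles from a listing page.
// The <a href="show_report.asp?id=NNN"> format here is hypothetical.
function extractReports(html) {
  var re = /<a href="(show_report\.asp\?id=\d+)">([^<]+)<\/a>/g;
  var reports = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    reports.push({ url: m[1], title: m[2] });
  }
  return reports;
}
```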
Well, I'm already using the Google Maps API. But there are thousands of data points; I don't want to geocode each point as I draw it. It needs to be pregenerated and cached in my local database. Google Maps has a geocode service that can also be called from, say, a PHP script. However, Google caps this at something like 2,000 requests per day, and if you repeatedly go over this limit they will just ban your IP address from ever using that service again. OK, so that's probably not going to work.
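One way to stay under a cap like that: cache every result locally and stop issuing requests before the daily limit, resuming the next day. A minimal sketch, where geocodeFn stands in for whatever service actually answers (the 2,000 figure matches the cap mentioned above):

```javascript
// Wrap a geocoding function with a local cache and a daily request cap.
// `geocodeFn` is a placeholder for the real service call.
function makeCachedGeocoder(geocodeFn, dailyCap) {
  var cache = {};
  var callsToday = 0;
  return function (place) {
    if (cache.hasOwnProperty(place)) return cache[place]; // cache hit: free
    if (callsToday >= dailyCap) return null;              // over cap: try tomorrow
    callsToday++;
    cache[place] = geocodeFn(place);
    return cache[place];
  };
}
```

In practice the cache would live in the local database rather than memory, so the crawl can pick up where it left off each day.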
Once I have the data, I need to plot it onto the map. This is slightly more complicated than it sounds, since converting a lat/lng coordinate to an (x, y) pixel position requires knowledge of the projection used by the map. Google Maps uses the Mercator projection (this is not quite true, but it is close enough), so it is straightforward to convert a lat/lng coordinate to an (x, y) screen coordinate.
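The Web Mercator math is standard; a sketch (the function and constant names are mine, not part of the Maps API):

```javascript
// Web Mercator: convert lat/lng to "world" pixel coordinates at a zoom level.
// At zoom 0 the whole world fits in one 256x256 tile; each zoom level doubles it.
var TILE_SIZE = 256;

function latLngToPixel(lat, lng, zoom) {
  var scale = TILE_SIZE * Math.pow(2, zoom);
  var x = (lng + 180) / 360 * scale;
  var sinY = Math.sin(lat * Math.PI / 180);
  // Clamp to avoid infinities at the poles.
  sinY = Math.min(Math.max(sinY, -0.9999), 0.9999);
  var y = (0.5 - Math.log((1 + sinY) / (1 - sinY)) / (4 * Math.PI)) * scale;
  return { x: x, y: y };
}
```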
Side note: I'm mildly surprised the Google Maps API does not supply such a function as part of the API. But whatever.
I also need the offset that represents a corner of the screen, so I can line up my coordinates with Google Maps's. In other words, I need to know where the Google Map starts on the screen so my data points will appear in the correct place. Luckily, Google Maps has a bounds_changed event I can listen for.
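With a projection in hand, the screen position is just the point's world pixel minus the top-left corner's world pixel. A sketch with the projection passed in as a function, so the offset logic stands on its own:

```javascript
// Convert a lat/lng to canvas coordinates given the map's top-left corner
// (e.g. derived from the bounds_changed event). `project(lat, lng)` is any
// function returning world pixel {x, y} at the current zoom level.
function toScreen(project, topLeft, lat, lng) {
  var p = project(lat, lng);
  var o = project(topLeft.lat, topLeft.lng);
  return { x: p.x - o.x, y: p.y - o.y };
}
```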
So now we have the year and location of each data point, and a way to convert them to screen positions. But the HTML5 canvas element does not excel at drawing hundreds of points at once. And I'm not just drawing points: each is actually a circle with a gradient (so if several points are near each other or overlap slightly, that area becomes brighter). I also draw larger circles as the map zooms in.
Chrome is by far the fastest browser when it comes to rendering on the 2D canvas context. However, each data point essentially becomes a separate draw call, which becomes very expensive when there is a lot of data. Chrome seems to be able to batch draw calls that share a color. So, for example, code like this:
```javascript
ctx.fillStyle = 'rgb(' + r + ',' + g + ',' + b + ')';
// ...
ctx.arc(0, 0, radius, 0, 2 * Math.PI);
```

is many times faster if r, g, and b do not change between calls.
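One way to exploit that batching is to sort the points by color before drawing, so fillStyle only changes when the color actually changes. A sketch — the point shape ({x, y, color}) is my own, not the application's actual structure:

```javascript
// Draw points grouped by color so fillStyle is set once per color group
// instead of once per point.
function drawBatched(ctx, points) {
  var sorted = points.slice().sort(function (a, b) {
    return a.color < b.color ? -1 : a.color > b.color ? 1 : 0;
  });
  var current = null;
  sorted.forEach(function (p) {
    if (p.color !== current) {
      ctx.fillStyle = current = p.color; // style change only on a new color
    }
    ctx.beginPath();
    ctx.arc(p.x, p.y, 3, 0, 2 * Math.PI);
    ctx.fill();
  });
}
```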
Side note: the use of a string for fillStyle and strokeStyle is odd to me. My application needs to generate such a string hundreds or thousands of times each frame, which means the garbage collector shows up prominently in performance profiles, and there is no other way to set the value. Also, the arguments of the rgb string are in the range [0..255], but if you use rgba (which I do to control transparency), the alpha parameter is in the range [0..1], even though it gets converted to an integer in the range [0..255] before rendering. That means an alpha of 0 and a tiny value like 0.001 (far less than 1/255) are actually interpreted the same. Why r, g, and b are specified as [0..255] while alpha is [0..1] scaled to [0..255] internally is a confusing inconsistency to me.
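That collapse is easy to see by quantizing the alphas the way the canvas stores them (round-to-nearest is an assumption here; the exact rounding may vary by browser):

```javascript
// Quantize a [0..1] alpha to the [0..255] integer the canvas keeps internally.
// Rounding mode is assumed, not specified.
function quantizeAlpha(a) {
  return Math.round(a * 255);
}
```

Both 0 and 0.001 land in bucket 0, so the two alphas render identically.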
So back to rendering. In my application, all the points use a similar gradient, modulated only by an alpha channel. Gradients are created with code like this:
```javascript
var gradient = context.createRadialGradient(0, 0, 0, 0, 0, radius);
gradient.addColorStop(0.00, rgba(255, 255, 255, alpha));
gradient.addColorStop(0.10, rgba(this.color, alpha));
gradient.addColorStop(0.50, rgba(this.color, alpha * 0.1));
gradient.addColorStop(1.00, rgba(this.color, 0));
```

(where rgba is a function that generates the rgba string)
That means that for each point, I need to generate 4 temporary strings and a new gradient object. That's a lot of work for the garbage collector.
We can improve this by realizing the only thing changing here is the alpha channel, and that although alpha can be anywhere in the range [0..1], internally it can only take 256 distinct values. This means we can pregenerate all 256 possible gradients and reuse them.
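A sketch of that cache, with the gradient construction injected as a function so the caching logic stands alone (makeGradient would wrap the createRadialGradient snippet above):

```javascript
// Pregenerate one gradient per representable alpha level and reuse them,
// so the per-frame draw loop allocates no strings or gradient objects.
// `makeGradient(alpha)` stands in for the gradient-building code above.
function buildGradientCache(makeGradient) {
  var cache = new Array(256);
  for (var i = 0; i < 256; i++) {
    cache[i] = makeGradient(i / 255);
  }
  return function (alpha) {
    return cache[Math.round(alpha * 255)]; // no allocation per lookup
  };
}
```

The up-front cost is 256 gradient objects; after that, every draw with the same quantized alpha gets the exact same object back.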