Bigfoot Trends

A few years ago, I was asked to be on the now-popular Animal Planet show Finding Bigfoot. I said "No, thanks. I don't do that sort of thing anymore," and that was that. The show is now in its fourth season, and I have to admit a mild cult fascination with it.

The BFRO

The show follows Matt Moneymaker who runs The Bigfoot Field Researchers Organization (BFRO), "The only scientific research organization exploring the bigfoot/sasquatch mystery." Right. He and three other "researchers" investigate alleged sightings around (mostly) North America. They generally follow the same basic formula:

  1. Interview eyewitness(es)
  2. Recreate the scene to determine plausibility (usually with Bobo, the largest and most Bigfoot-like team member)
  3. Perform a night investigation with custom-built backpacks mounted with night vision cameras. Why they only ever search for Bigfoot in the middle of the night is a mystery to me, especially since most of the sightings are reported during daylight hours. I think it is (1) to mimic the investigation model established by myriad ghost hunting shows and (2) because any old blob could be a Bigfoot. If it were daylight, then we would know it was actually a deer or whatever. It keeps the hope alive.

Side note: I like the attempt to include a token "skeptic" on the show in the form of Ranae Holland. On the show, she is credited as a "field biologist" although, as far as I can tell, this is because she has a BS in Biology and works "in the field". Her previous work experience is mostly with fish (i.e. not Bigfoot or other large mammals). And she does not conduct rigorous science, a process that takes years to learn to do properly under the tutelage of a seasoned scientist. In every episode she is initially the voice of skepticism but always comes around to be "convinced by the evidence". Usually it's something like,

Holland: "Well, couldn't this be another animal?"
Moneymaker: "No."
Holland: "Oh, well, I guess you're right then. Bigfoot is really the only logical explanation."

So in the course of my fascination, I came across a lovely infographic made by Joshua Stevens: 'Squatch Watch: 92 Years of Bigfoot Sightings in the US and Canada. It's a beautiful piece of data-mining in which Stevens mined the BFRO's sighting database and plotted every reported sighting over the last 92 years onto a single map of North America. Besides being a gorgeous representation, Stevens hoped to learn something new. For example, he hypothesizes that the map of sightings closely resembles a population distribution map of the United States, though he then decides that isn't the case. He also plots the number of sightings per year, and finds sightings have increased at roughly the same rate as population. Interesting.

But what about sightings plotted over time? What would we see then? If Bigfoot is a real animal (it isn't), then perhaps we would expect to see migration patterns. If it is mass delusion or hoax, then we might expect a pattern that sweeps across the US, e.g. beginning in the Pacific Northwest and moving eastwards. Since the Animal Planet show is closely connected to the BFRO, which maintains the sightings database in question, I predict we will see an explosion of sightings after the show becomes popular.

I decided to find out.

Click here for the result. It works best in Google Chrome, has issues in Firefox, and probably doesn't work at all in Internet Explorer. Otherwise, I would embed it into this page.

How it's done

The page uses an HTML5 canvas element in the background, overlaid with a semi-transparent Google Map. There is an animated timeline at the bottom showing a bell curve spanning a few years. For each data point under the curve, its position is plotted on the map with intensity proportional to the value of the curve at that point. So, for example, if there was a sighting in San Francisco, CA in 1995, then a point is drawn faintly over San Francisco starting around 1992, becoming brighter as we approach 1995, then fading back out again until about 1998. The result is a twinkling light show over the US.
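
The fading is just a function of the distance between the current timeline year and a sighting's year. A minimal sketch of the idea, assuming a Gaussian-style falloff (`intensityAt` and `spread` are my own names, not from the actual page source):

```javascript
// Brightness of one sighting at the current timeline position.
// `spread` controls roughly how many years before/after the sighting
// it remains visible; both names are hypothetical.
function intensityAt(currentYear, sightingYear, spread) {
  var d = (currentYear - sightingYear) / spread;
  return Math.exp(-d * d); // 1.0 at the sighting year, fading toward 0
}
```

So a 1995 sighting viewed with `spread = 3` is at full brightness in 1995 and has faded to about 37% by 1992 or 1998.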

I also draw two histograms, one in the upper left and the other on the right, to show sightings by month for the year range being displayed.

Results

The results are really interesting and don't show what I would expect. I make the following observations:

  1. There is almost no geographic pattern. Sightings begin in the late 1950s, but are not isolated to the Pacific Northwest. They occur simultaneously all over the US.
  2. The histogram at the bottom represents the number of sightings by year. Sightings increase steadily from the 1950s through the 1970s, peaking around 1980. Then there is a mild drop-off and a steady state for a while, before sightings really begin to pick up in 1994, increasing rapidly until 2004. 1994 is when the Internet suddenly became widely available to the mainstream public, so perhaps this is no surprise. What is a surprise, however, is the steady drop-off in sightings ever since 2004. I predicted Finding Bigfoot would bring the BFRO into such public light that everyone who thought they'd seen a Bigfoot would begin logging their sightings. That doesn't seem to have happened.
  3. The monthly histogram is heavily weighted towards the summer months, with few or no sightings during winter months. That makes sense: people see Bigfoot when they are enjoying the great outdoors, not sitting around their fireplace at home. Fun thing to try: zoom into Florida, and the monthly histogram flattens out; Floridians don't really experience seasons, so this is also expected.
  4. Why so many Florida sightings? Bigfoot is associated with the Pacific Northwest, not Florida. Weird.

Details

This project presented a number of interesting problems to be solved. The following sections discuss each one.

Mining the Data

The first step is to obtain a usable database. The BFRO's database is not directly accessible; there is only a web interface (though it is at least very consistent). So I need a script that will crawl through each page (arranged by state) and pull out the necessary details about each sighting. That isn't too hard in itself. However, in order to plot each point on a map, we need coordinates, and most BFRO data points are just town names.
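
The extraction itself can be as simple as a regex pass over each report page, given how consistent the pages are. A sketch of the idea (the `<b>Label:</b> value` markup below is a hypothetical stand-in; the real BFRO pages differ):

```javascript
// Pull one labeled field out of a report page. The markup format here
// is hypothetical, not the actual BFRO page structure.
function extractField(html, label) {
  var m = html.match(new RegExp('<b>' + label + ':</b>\\s*([^<]+)'));
  return m ? m[1].trim() : null;
}

var page = '<b>Year:</b> 1995<br><b>Nearest town:</b> Willow Creek<br>';
var sighting = {
  year: extractField(page, 'Year'),
  town: extractField(page, 'Nearest town')
};
// sighting → { year: '1995', town: 'Willow Creek' }
```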

Google to the Rescue

Well, I'm already using the Google Maps API. But there are thousands of data points; I don't want to geocode each point as I draw it. It needs to be pregenerated and cached in my local database. Google Maps has a geocode service that can also be called from, say, a PHP script. However, Google caps this at something like 2,000 requests per day, and if you repeatedly go over this limit they will just ban your IP address from ever using that service again. OK, so that's probably not going to work.

However, Google does not impose a limit on geocode requests coming from a web browser. The idea, I think, is that client requests coming from your website using a Google Map are unlimited, but your web server isn't supposed to be caching this data on the backend. So it turns out that all I need to do to get around this 2,000/day limit is to make the request from a web page via JavaScript, then send the result back to my localhost server. The data flow is: my localhost server hands town names to a web page, the page geocodes them through Google's JavaScript API, and the coordinates are posted back to the server to be cached.

A convenient consequence of this workflow is that it is automagically parallelizable; just open the same page in multiple tabs.
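
In browser-side code, that loop looks roughly like this. Only `google.maps.Geocoder` and its callback signature are real API; the queue and the `save` callback (which would POST each result back to localhost) are my own sketch:

```javascript
// Geocode a queue of town names one at a time, handing each result to
// `save` (which would POST back to the localhost server). `geocoder`
// is a google.maps.Geocoder; everything else is hypothetical naming.
function geocodeNext(geocoder, towns, save) {
  if (towns.length === 0) return; // queue drained
  var town = towns.shift();
  geocoder.geocode({ address: town }, function (results, status) {
    if (status === 'OK') {
      var loc = results[0].geometry.location;
      save(town, loc.lat(), loc.lng());
    }
    geocodeNext(geocoder, towns, save); // chain to the next request
  });
}
```

Each tab running this page works through whatever slice of the queue the server hands it, which is where the free parallelism comes from.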

Plotting the Points

Once I have the data, I need to plot it onto the map. This is slightly more complicated than it sounds, since converting a lat/lng coordinate to an (x, y) pixel position requires knowing the projection used by the map. Google Maps uses the Mercator projection (this is not quite true, but it is close enough), so the conversion to an (x, y) screen coordinate is straightforward.
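
For reference, the standard Web Mercator math maps lat/lng onto a 256×256 "world coordinate" plane at zoom 0; multiply by 2^zoom to get pixels. A sketch, with a clamp to avoid infinities at the poles:

```javascript
// lat/lng to Web Mercator world coordinates (256x256 plane at zoom 0).
// Multiply both components by Math.pow(2, zoom) for pixel coordinates.
function latLngToWorld(lat, lng) {
  var siny = Math.sin(lat * Math.PI / 180);
  siny = Math.min(Math.max(siny, -0.9999), 0.9999); // clamp the poles
  return {
    x: 256 * (0.5 + lng / 360),
    y: 256 * (0.5 - Math.log((1 + siny) / (1 - siny)) / (4 * Math.PI))
  };
}
// latLngToWorld(0, 0) → { x: 128, y: 128 }, the center of the world
```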

Side note: I'm mildly surprised the Google Maps API does not supply such a function. But whatever.

I also need the offset that represents a corner of the screen, so I can line up my coordinates with Google Maps's. In other words, I need to know where the Google Map starts on the screen so my data points will appear in the correct place. Luckily, Google Maps has a bounds_changed event I can listen for.

Drawing

So now we have the year and location of each data point, and a way to convert them to screen positions. But the HTML5 canvas element does not excel at drawing hundreds of points at once. And I don't draw plain points; each one is actually a circle with a gradient (so if several points are near each other or overlap slightly, that area becomes brighter). I also draw a larger circle as the map zooms in.

Chrome is by far the fastest browser when it comes to rendering on the 2D canvas context. However, each data point essentially becomes a separate draw call, which becomes very expensive when there is a lot of data. Chrome seems to be able to batch draw calls for like colors. So, for example, code like this:

    ctx.fillStyle = 'rgb(' + r + ',' + g + ',' + b + ')';
    ...
    ctx.arc( 0, 0, radius, 0, 2 * Math.PI );

is many times faster if r, g, b do not change between calls.
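
One way to exploit that batching, when the draw order of points doesn't matter, is to sort by color so `fillStyle` is only assigned at group boundaries. A sketch of the idea (my own restructuring, not the code actually running on the page):

```javascript
// Draw points grouped by color so ctx.fillStyle changes once per color
// run instead of once per point.
function drawBatched(ctx, points) {
  points.sort(function (a, b) { return a.color < b.color ? -1 : 1; });
  var current = null;
  points.forEach(function (p) {
    if (p.color !== current) {
      current = p.color;
      ctx.fillStyle = current; // only touched when the color changes
    }
    ctx.beginPath();
    ctx.arc(p.x, p.y, p.r, 0, 2 * Math.PI);
    ctx.fill();
  });
}
```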

Side note: the use of a string for fillStyle and strokeStyle is odd to me. My application needs to generate such a string hundreds or thousands of times each frame, which means the garbage collector shows up prominently in performance profiles, and there is no other way to set the value. Also, the arguments in an rgb string are in the range [0..255], but if you use rgba (which I do, to control transparency), the alpha parameter is in the range [0..1], even though it gets converted to an integer in the range [0..255] before rendering. That means an alpha of 0 and an alpha of 0.001 (well under 1/255) are interpreted identically. Why r, g, and b are [0..255] while alpha is [0..1] scaled to 255 internally is a confusing inconsistency to me.

So back to rendering. In my application, all the points use a similar gradient, modulated only by an alpha channel. Gradients are created using code like this:

    var gradient = context.createRadialGradient(0, 0, 0, 0, 0, radius);
    gradient.addColorStop(0.00, rgba(255, 255, 255, alpha));
    gradient.addColorStop(0.10, rgba(this.color, alpha));
    gradient.addColorStop(0.50, rgba(this.color, alpha * 0.1));
    gradient.addColorStop(1.00, rgba(this.color, 0));

(where rgba is a function that generates the rgba string)

That means that for each point, I need to generate 4 temporary strings and a new gradient object. That's a lot of work for the garbage collector.

We can improve this by realizing the only thing changing here is the alpha channel, and that although alpha can be any value in [0..1], internally it is quantized to only 256 distinct levels. This means we can pregenerate all 256 possible gradients and reuse them.
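
A minimal sketch of such a cache (`GradientCache` is my own name; `makeGradient` stands in for the gradient-construction code above):

```javascript
// Cache one gradient per 8-bit alpha level. `makeGradient(alpha)` is
// whatever builds a single gradient (e.g. the createRadialGradient
// code above); at most 256 gradients are ever created.
function GradientCache(makeGradient) {
  this.make = makeGradient;
  this.cache = new Array(256);
}
GradientCache.prototype.get = function (alpha) {
  var i = Math.round(Math.min(Math.max(alpha, 0), 1) * 255);
  if (!this.cache[i]) this.cache[i] = this.make(i / 255);
  return this.cache[i];
};
```

Two alphas that quantize to the same 8-bit level (say 0.5 and 0.5001) now share one gradient object, so after the first few frames the garbage collector has nothing new to collect.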