Category Archives: Survey Results

Question of the Month and survey results

Rating System Feedback

tl;dr People mostly like the star difficulty rating system, so we’re keeping it.

You might have noticed that we’ve been experimenting with adding difficulty ratings to puzzles. I brought this notion to Puzzled Pint HQ last year because some teams in Austin kept asking which puzzles were the “easy” ones. Looking further into it, it became apparent that those teams were generally of mixed experience levels and wanted to hand the easier puzzles to the more novice solvers to attempt first.

Our First Two Trials

We tested this in Austin in October as an A/B test, adding “Easy, Medium, or Hard” to the top of each puzzle. I asked each team after the event whether they liked having the rating system. Obviously, the teams that normally asked about difficulty liked it, but nearly all teams gave really positive feedback. In fact, we had only a single negative comment, along the lines of “I was proud of myself, until I saw it was marked easy.”

In November and December, we went broader, adding the same system to every city’s copies. This went over less well: GC fielded several complaints of people hating the system and, in December, of the ratings being inaccurate. The ratings were based strictly on the “difficulty” responses provided by playtesters on their feedback forms, but there was some judgement involved in determining the cutoff values between easy, medium, and hard.

This negative feedback from GC was concerning and confusing, since the Austin test had gone so well. We didn’t know whether only the players who hated it were complaining while GC wasn’t getting the positive feedback, or whether the Austin testing was an outlier and the hate was universal.

Another Test

Because of that negative feedback, we decided to scrap the rating system for January 2017 and think about a resolution. “I felt bad because this was supposed to be easy” seemed to be a common complaint on the feedback thread, so we decided on a slightly different system: a 5-star rating instead of the English words. Hopefully, this would still convey the information, but let people judge their own abilities instead of facing the implicit judgement of not being able to get an “easy” puzzle.

Thus, February’s puzzles had this new system, but, by golly, we were going to solicit player feedback this time to make sure.  If you love charts like I do, you’ll like this next bit…

We had 434 teams do February’s set (1,440 people, not including Game Control members). The puzzles’ difficulties ranged from 2 to 4 stars, which is our goal: not too easy and not too hard. Of course, in the future, a set might legitimately contain a 1- or 5-star puzzle for various reasons.

February’s set was perhaps the most playtested in Puzzled Pint history, which ensured that the feedback on the system wasn’t tainted by inaccurate difficulty ratings. Even so, I made the call to bump the Cupid puzzle to 4 stars because its playtest responses had a large standard deviation, instead of keeping it at the strict mean, which would have been 3 stars. Going forward, we will probably formalize that by setting star difficulties at one standard deviation above the mean.
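For the curious, that rule is simple enough to sketch in a few lines of Python. This is only an illustration; the ratings list, the rounding, and the clamping are my assumptions, not HQ’s actual tooling:

```python
from statistics import mean, stdev

def star_difficulty(ratings):
    """Map playtester difficulty responses (1-5) to a star rating.

    Uses one standard deviation above the mean, so a puzzle with a
    wide spread of opinions gets bumped up rather than sitting at
    the strict mean.
    """
    m = mean(ratings)
    s = stdev(ratings) if len(ratings) > 1 else 0.0
    # Round to the nearest whole star and clamp to the 1-5 range.
    return max(1, min(5, round(m + s)))

# Example: the mean is 3.0, but the wide spread bumps it to 4 stars.
print(star_difficulty([1, 2, 3, 4, 5, 3, 3]))  # -> 4
```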

So, how were the responses?

Well, first off, we didn’t get the response rate I’d hoped for. Even if you don’t count teams that didn’t finish the puzzle set (i.e. completed fewer than 5 puzzles), here’s the response rate by city:

Still, a 59% response rate is enough to represent a good cross-section of our players, and the results are likely skewed away from beginners anyhow, because beginners are less likely to finish the set and thus never gave a response. Recall that Puzzled Pint very much targets the experience of the beginner puzzler, not the experts or even the ‘regulars’.

So, what were the survey results?

Only 4 teams in our survey reported that the difficulty ratings were harmful.  Amazing! We figured that would be higher considering the feedback on the GC thread.

90 teams did say that the ratings were not helpful, but that they didn’t mind their existence.

67 teams said that they were helpful, but not necessary.

Finally, 63 teams said that they really wanted us to keep them!

Breaking Down the Data

Those stats alone don’t tell the full story. Yes, far more teams were positive about the ratings than negative, but we wanted to know how much they helped the more novice teams.

How are we to judge which QotM responses came from beginners vs. the more experienced? Since we collect solve times, we hypothesized that we could look at those and assume that teams that took longer to finish the set were the less experienced puzzlers.

But wait!  What about team size?  Don’t smaller teams take longer?  To check, I ran those numbers and came up with this lovely chart:

Nope! Team size matters very, very little to overall solve time. There is a clear downward trend, but the standard deviation is a nearly consistent 40 minutes for each team size.

In the chart, I used larger bubbles when multiple teams had the exact same size and minutes taken. As you can see from the data points, there was huge variance in solve times, no matter the team size.
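Here’s roughly how that check can be reproduced; a minimal sketch, with made-up numbers standing in for the real results sheet:

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (team_size, minutes_to_finish) rows from the results.
results = [(2, 95), (2, 140), (3, 80), (3, 120), (4, 70), (4, 150), (5, 90)]

by_size = defaultdict(list)
for size, minutes in results:
    by_size[size].append(minutes)

for size in sorted(by_size):
    times = by_size[size]
    spread = stdev(times) if len(times) > 1 else 0.0
    print(f"team size {size}: mean {mean(times):.0f} min, "
          f"std dev {spread:.0f} min over {len(times)} teams")
```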

Therefore, I felt safe basing the analysis purely on the number of minutes taken to do the set, assuming those that took longer were the less experienced. A simple histogram sorting those times into 15-minute buckets allowed me to create this graph of the opinion results:

First of all, let us revel in the lovely emergence of the Gaussian curve in nature once again. This one has a fatter tail than a true normal, but it’s nice and smooth. Ahh.

Next, we can clearly ignore the red bars, the ‘complainers’, as they are so few.  So, let’s look only at the rest.

Both light green and orange show no clear trend, but the dark green does seem to increase as solving time increases. For a clearer picture, let’s ignore the number of teams in each category and look at the percentages within each:

Now we can see a significant trend, ignoring the outliers on either end (there’s only one team in the 20-34 bucket). Even though a fairly constant percentage of teams think the ratings aren’t helpful, the longer a team takes to solve, the more they like the rating system!
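If you’d like to reproduce the bucketing and the percentage view from the raw data, it amounts to something like this sketch (the responses here are placeholders, not the real export):

```python
from collections import Counter, defaultdict

# Hypothetical (minutes_taken, opinion) pairs from the survey.
responses = [(52, "keep"), (55, "not helpful"), (75, "helpful"),
             (78, "keep"), (91, "not helpful"), (118, "keep")]

buckets = defaultdict(Counter)
for minutes, opinion in responses:
    lo = (minutes // 15) * 15  # floor to a 15-minute bucket, e.g. 45-59
    buckets[lo][opinion] += 1

for lo in sorted(buckets):
    counts = buckets[lo]
    total = sum(counts.values())
    shares = {k: f"{100 * v / total:.0f}%" for k, v in counts.items()}
    print(f"{lo}-{lo + 14} min:", shares)
```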

Okay, so there was one lingering question in my mind. Is this city dependent? Maybe some cities just hate the ratings and others like them? Will we see significant variance among cities, or will they all be about average? Well, check this out:

Boom! We have a triangle! I’ve put the size of each city on the X axis and its average rating on the Y. As cities grow, the responses move toward the mean answer of ‘slightly yes’. Still, I’m amazed at the variance among the smaller cities (these are actually locations, not cities, but you know what I mean).

Boston, our largest city by far, is clearly supportive of the rating system: not at Victoria’s 100% support, of course, but solidly above the disdain of Tacoma’s 17 people. Luckily, this chart shows that, by combining cities, I wasn’t significantly masking strong negatives from only a few. No city really minds the system (on average), and most are well into the ‘yes’ range, including both Austin sites, which validates the earlier testing there.
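The “average rating” on the Y axis implies mapping the four answers onto a numeric scale; the scores below are my guess at a plausible mapping, not necessarily the one actually used:

```python
from collections import defaultdict

# Assumed scores; a city mean between 0 and 1 reads as "slightly yes".
SCORE = {"harmful": -1, "not helpful": 0, "helpful": 1, "keep": 2}

# Hypothetical (city, answer) pairs from the survey.
answers = [("Victoria", "keep"), ("Tacoma", "harmful"),
           ("Boston", "helpful"), ("Boston", "keep"),
           ("Boston", "not helpful")]

scores = defaultdict(list)
for city, answer in answers:
    scores[city].append(SCORE[answer])

for city, vals in sorted(scores.items()):
    print(f"{city}: {sum(vals) / len(vals):+.2f} (n={len(vals)})")
```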

Overall, folks, the rating system is here to stay. Thanks for participating, and please keep the suggestions and feedback coming so we can continue to improve.

Yours Truly,

Neal Tibrewala
Puzzled Pint HQ


Your theme suggestions

Puzzled Pint is always looking for authors, both seasoned veterans and people who want to get their first taste of puzzle design. Because we do one set of puzzles per month, our waitlist is about a year out, but that’s a good thing for all: it means we have time to work with draft puzzles, provide direct feedback, bounce the puzzles off of playtesters in the US and abroad, and route that feedback to the author as suggested revisions.

Some authors like to come up with a theme first, then see what sorts of puzzle mechanisms that theme inspires. Others like to come up with mechanisms first and then wrap them in story and theme. Both ways are equally valid. For what it’s worth, bonus puzzles are often — but not always — in the latter camp.

This month, we asked our Puzzled Pint attendees a Question of the Month: suggest themes for upcoming months. Our hope was that this could provide a source of inspiration to future authors, and we would like to share the results here. There were 374 total suggestions from 23 cities. The top suggestions (those with 3 or more votes) are:

  • Star Wars (14)
  • Harry Potter (13)
  • Disney (10)
  • board games (9)
  • geography (6)
  • video games (6)
  • Pokemon (6)
  • Alice in Wonderland (5)
  • Game of Thrones (4)
  • Doctor Who (4)
  • Star Trek (4)
  • Dr. Seuss (4)
  • Lord of the Rings (3)
  • superheroes (3)
  • space (3)
  • Carmen Sandiego (3)
  • pirates (3)
  • alcohol (3)
  • Buffy the Vampire Slayer (3)
  • Olympics (3)
  • food (3)
  • X-Files (3)
  • pizza (3)
  • Breaking Bad (3)
  • James Bond (3)
  • Shakespeare (3)

We happened to do Disney Star Wars back in 2012 when the Lucas/Disney sale was first announced, but there’s no reason why we couldn’t do Disney, Star Wars, or another Disney Star Wars. We also did board games back in May of 2011, Doctor Who a little more recently in September, and James Bond in 2013. But these are all good themes, and as long as the puzzles are unique, fun, and challenging, we’re open to revisiting past motifs.

The remaining suggestions are as follows (I’ve highlighted a few unique entries, strictly for humor value):

30 Rock, 7 Wonders, 80s, 80s action movies, 90s, a salute to gingers, AA Milne, Adventure Time, ALF, anagrams, Ancient Rome, Angry Birds, astrology, Austin, automobiles, Back to the Future, backpacking, bad Scifi movies, Barbie, Beatles songs, beer, Best of the MIT Mystery Hunt, birdwatching, Blade Runner, Bones, boy bands, branches of the Armed Forces, Britney Spears, Broadway shows, butts, Calvin and Hobbes, cats, cheese (A Brie Encounter, Cheddar Off Dead, etc.), childhood, childhood games, chocolate, circus, classic cinema, classic literature, Clue, Clue (the game), Coca-Cola, college football conferences, colors, comic books, Comics, composers of classical music, conspiracy theories, crosswords, cryptology, cuisines, cultures around the world, David Bowie + The Muppets = Labyrinth, David Bowie/labyrinth, DC, dessert, dinosaurs, Disney Princesses, dogs, donuts, Downton Abbey, Edgar Allan Poe, Egypt, emoji, escape rooms, Ex Machina, exploring, fairy tale, fairy tales, famous cathedrals, famous Chicagoans/landmarks/history, famous crossroads, Fargo, Firefly, Firefly/fireflies/“Firefly”, fish, flowers, Follow that Bird, football (soccer), Futurama, G. I. Joe, Ghostbusters, Gilmore Girls, grade school, Gravity’s Rainbow, hair metal, hair metal bands, Hamilton, He-Man, Hello Kitty/Sanrio, history, holidays, Hollywood/directing a film, hot cheese, House MD, HP Lovecraft, Hunger Games, Jeopardy, Jim Henson, John Hughes movies, Keep Austin Weird, Labyrinth (the film), Lady Gaga outfits, Larry Bird vs. Dr. J, Law & Order SVU, League of Legends, Lego, libraries, Limburger, literature, logic puzzles, Looney Toons, Mad Magazine, magic, March Madness, Mardi Gras, Marvel, Mass Effect, math, mazes, MegaMan, Mel Brooks, Michael J. Fox, Miyazaki (Totoro), moar robots, Monty Python, movies, Muppets, museums, music, music, sheet music, musicals, mythical creatures, Nickelodeon, Nintendo, Orphan Black, outer space, Parks & Rec, Party Down!, pinball, Pixar, Portal, Portlandia, Post Apocalyptia, presidents, pro wrestling, psychology, Pulp Fiction, pumpkin everything, QI, raccoons, Rambo, Red Dwarf, Rocky Horror, Roman, Roman numerals, RPGs, running, running a newspaper, science, scifi, scifi movies, Scooby Doo, seas on Earth, seas on the moon, secret agent, Seinfeld, Sesame Street, sharks, Sharktopus, Sherlock Holmes, Simpsons, Smurfs, solar system, songs by a famous band, space/planets/astronomy, spies, spies/spying, sports, Steven King, Story Lords (on YouTube, created in 1984, children’s reading educational program produced in Wisconsin), stupid laws (e.g. emergency Sasquatch ordinance), summer camp, Super Mario, Terry Pratchett works, the (fictional) Martians, The Big Bang Theory, The Birdman of Alcatraz, the elements, The Hateful Eight, the impressionists, The Legend of Zelda, The Martian, The Matrix, the movies of Gene Kelly, The Office, The Power Broker: Robert Moses and the Fall of New York, the presidents, The Smashing Pumpkins, the victorian era, The World Series, time travel, Tom Cruise, Tour de France, transportation, trash pandas, travel/flights, traveling, TV game shows, TV shows with initialisms, Twin Peaks, twitter, US presidents, varieties of tomatoes, Walking Dead, War of 1812, “We care more about good content. Anything can make a good theme if handled well.”, weather systems, Weird Al, Whedonverse, who-dunnits (Like Jan 2016), wilderness survival, Winnie the Pooh, women scientists or musicians, World of Warcraft, xkcd, Yo-yos, “don’t be such butts”, “less poetry, more math”, and “one that does not require scissors”.

A few of the suggestions are funny in context (such as the one team that suggested Angry Birds, birdwatching, Follow that Bird, The Birdman of Alcatraz, and Larry Bird vs. Dr. J, or the team that suggested The Martian as well as the (fictional) Martians in general). A few of the suggestions, such as US presidents, might not resonate in all countries. Several suggestions orbited around David Bowie, Labyrinth, the Muppets, and Jim Henson; it was tough to normalize them down to one specific word or phrase, but any of those ideas could be fun.

If you’re interested in writing a whole month or simply contributing a bonus puzzle or two, please contact us and we’ll point you to the author guidelines. And if you’d like to perform your own analysis, you can download the raw data.

Do you check standings?

Since the beginning, Puzzled Pint has logged team standings each month — a table of which teams attended, solve times, and so on. It was a fun idea in the beginning, but as Puzzled Pint matures, there has been friction about the published standings. Internally, we use them as a rough gauge of how hard we thought the month’s puzzles were vs. reality. But publishing them on the site is extra work and their value has recently been called into question.

From a player’s point of view, they are entirely artificial. “Solve time” means very little when local Game Control freely gives out hints. It’s also highly variable based on play style. Some cities eat dinner first and then jump into puzzles undistracted. Some solve puzzles more casually over food and drink. A few larger teams split the packet up, solving puzzles in parallel. Most teams focus on one puzzle at a time so that all players can savor the a-ha moments.

From Puzzled Pint Headquarters’ point of view, the public standings page is extra work and additional moving parts when producing events each month. For legacy reasons, most of the standings are manual operations, split between local GC and HQ. This was fine with one or two cities, but hasn’t scaled up well. HQ must wait until all the local city GC have entered their data, then an HQ volunteer finds time to clean up and normalize the entries for public consumption, copying between spreadsheets. We like to focus on finding authors, playtesting puzzles, helping with feedback and editing, onboarding new cities, and running events. Compared to the other responsibilities, updating standings often feels like busywork.

There’s also the philosophical angle in which Puzzled Pint is meant to be a beginner-friendly event. Some see standings as fostering a competitive environment by keeping score and highlighting the more experienced teams. Our Puzzled Pint Charter specifically states that we’re non-competitive with no prizes and no scoring. Are standings a form of score?

With that in mind, we presented teams with a related Question of the Month (QotM) in January: “After the event, do you check the standings?” with the following results:

Answer                   Team Count   Percent
Yes                         121        38.4%
No                           53        16.8%
What standings?              61        19.4%
No answer / bad answer       80        25.4%

The majority of answers in that last row were from teams that didn’t fill out the question. A few were from teams checking multiple conflicting answers; simultaneous Yes and No answers were effectively thrown out (there were only a few). Teams that checked both No and What were counted as What.
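In code, those cleanup rules amount to something like the following sketch; the set-of-checked-boxes representation is mine, not the actual tallying spreadsheet:

```python
def classify(boxes):
    """Reduce the set of checked boxes to a single category.

    `boxes` is a subset of {"yes", "no", "what"} as marked on the
    answer sheet; an empty set means the team skipped the question.
    """
    if not boxes:
        return "no answer"
    if "yes" in boxes and "no" in boxes:
        return "bad answer"       # conflicting answers are thrown out
    if "what" in boxes:
        return "what standings?"  # No + What counts as What
    return "yes" if "yes" in boxes else "no"

print(classify({"no", "what"}))  # -> "what standings?"
```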

Plotting the raw percentages gives us this chart:


We can probably lump together the No and most of the What answers. A few Whats might convert to Yeses after becoming aware that standings exist, but it’s likely only a small number of teams. The non-answers could go either way, but my experience hearing from teams, local GC, and Twitter comments is that the folks who love standings are very vocal about loving them (and about getting them posted in a timely manner), while the people who don’t care or outright don’t like them tend to be more apathetic or quiet. Given this, I could make a broad, sweeping assumption that the Yes folks all said yes and that the non-answer folks mostly belong in the No or What categories:


But do remember that this makes some sweeping assumptions that may not be entirely valid.

Looking at city-by-city data, the locations in which the Yes crew took the majority are: Austin, Chicago, Detroit, London, Phoenix, Eastside Seattle, and Washington DC.

So what does this all mean? It means we have better visibility into who likes standings. In the short term, probably not much will change. The person who has been responsible for updating standings is stepping away from them; we’re looking for a new volunteer, preferably a local GC member from a Puzzled Pint city. In the long term, we have to decide whether to spend the time and effort to better automate the process, or whether what standings have become is now counter to the tenets of Puzzled Pint.

Puzzled Pint Attendance by City

We do our best at Puzzled Pint events to take attendance. It started in Portland as a curiosity (or maybe an obsession on the part of Matt C). We knew we weren’t going to break 30-40 people in attendance back then, so tracking the seasonal change in participation was just a bit of fun.

Fast-forward to Puzzled Pint moving beyond Portland: more cities, more attendees, even some cities broken into multiple regions (Portland, Seattle, London). When Portland started routinely getting 80+ people, shopping for bars to host the event became much more constrained. And as we pull in more guest authors, they want better data on how many people their puzzles will reach. The curiosity became a necessity.

We collect the attendance data but don’t often share it in an interesting way. The raw data is always available by digging through the standings page for each city, but I thought it would be fun to share it graphically. The underlying data covers a year and three months: it starts after the explosive growth in Portland and Seattle, but early enough to capture the birth of a dozen new cities.

Below, we have global attendance — the sum of all the cities. This charts overall growth, such as onboarding new cities, as well as growth within the cities.

Global Attendance

The dark blue squiggly line shows actual attendance, smoothed into curves. The offset light-blue line shows a 2-month rolling average. This average gives a more realistic estimate that buffers over sudden spikes or dips. You can see globally the trend we always see in Portland: a bump in the summer and a dip in the winter. When it gets cold, people are less likely to leave the warmth of their homes.
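A trailing 2-month rolling average is just the mean of each month and the one before it. A minimal sketch, with made-up attendance figures:

```python
def rolling_average(values, window=2):
    """Trailing rolling average; early points use whatever history
    is available so the output stays the same length as the input."""
    out = []
    for i in range(len(values)):
        span = values[max(0, i - window + 1):i + 1]
        out.append(sum(span) / len(span))
    return out

# Hypothetical monthly global attendance figures.
monthly = [210, 195, 240, 260, 230]
print(rolling_average(monthly))  # [210.0, 202.5, 217.5, 250.0, 245.0]
```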

Just for fun, I made a stacked-bar chart, broken down by city. The colors and smaller slivers are pretty difficult to read, but you can get a feel for which cities pull in the larger numbers of people. (Spoiler alert: it’s larger urban centers and/or Puzzled Pint venues that have been around longest. Shocking, I know.)

Attendance by City

I’d considered putting the same data on a line chart, comparing city to city, but felt it was perhaps a little too disheartening to directly compare larger cities to smaller ones in such a fashion. Instead, I plotted each city individually, using the same range for the x (time) axis but varying the y (attendance) range to better fit the data for a given city.

Keep in mind that there will be some fluctuation and bounce when looking at individual cities. Sometimes a city isn’t able to report attendance stats (or even overall standings) for the month. Sometimes teams take packets but quietly leave without checking in with GC. Sometimes additional team members arrive partway through the event without being counted. Looking at the global numbers helps buffer away some of these localized glitches; looking at an individual city, especially a smaller one, may show what appear to be wild swings in attendance due to the more significant margin of error.

So: what does it all mean? It means Puzzled Pint continues to grow. If you’d like to run Puzzled Pint in your city, please contact us and we’ll walk you through what it takes. (Hint: it’s pretty easy!)


Gender Representation at Puzzled Pint

Puzzle events in Portland have, anecdotally, been a fairly even 50/50 split between men and women. Nobody had thought to tally up exact numbers, mainly because we don’t think much about it, but that has been the general consensus when discussing the gender split at Portland puzzle events.

We occasionally get out-of-town visitors — familiar with Puzzled Pint in their hometown — playing here in Portland on the second Tuesday. They’re sometimes surprised at the turnout of women at the event.

This month, we thought we’d gather some empirical data about the gender of players from all the cities out there. For the sake of simplicity, we kept the choices simple: male, female, and other/unspecified. Were I to do this again, I would separate those two slashed options; a nonbinary gender answer is much different from an “I’d rather not say” answer. (See also: Vienna’s large percentage in this category.) I apologize if anyone felt marginalized or under-represented by this grouping.

Brooklyn took the month off, and we didn’t get responses from Los Angeles or the two Seattle locations by the time this data was generated, but barring those data points, here is the global result (blue==male, pink==female, magenta==other/unspecified):

Puzzled Pint’s global gender breakdown

If you’d like to see the results for your city, as compared to the global results (or other cities), here they are. Hover your mouse to see the city name, or click for a larger version.

In case you’d like to examine the source data or the Ruby script used to generate these graphs, they’re linked below. Also keep in mind that these are self-reported numbers. A few outliers were thrown out: one team used a fractional number, and another team said on their answer sheet that they had 100 team members in the “other” category.
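If you do dig into the raw data yourself, the outlier filtering amounts to something like this sketch; the column names and the plausibility cutoff are my assumptions (the published script is Ruby, but the idea is the same):

```python
def is_plausible(row):
    """Keep only self-reported counts that are non-negative whole
    numbers and not absurdly large for a Puzzled Pint team."""
    counts = (row["male"], row["female"], row["other"])
    if any(c != int(c) or c < 0 for c in counts):
        return False               # e.g. the fractional-number team
    return sum(counts) <= 20       # e.g. the "100 in other" team

rows = [{"male": 2, "female": 3, "other": 0},
        {"male": 1.5, "female": 2, "other": 0},
        {"male": 0, "female": 0, "other": 100}]
print([is_plausible(r) for r in rows])  # [True, False, False]
```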