Rating System Feedback

tl;dr People mostly like the star difficulty rating system, so we’re keeping it.

You might have noticed that we’ve been experimenting with adding difficulty ratings to puzzles. I brought this notion to Puzzled Pint HQ last year because some teams in Austin kept asking which puzzles were the “easy” ones. Looking further into this, it became apparent that those teams were generally of mixed experience levels, and wanted the more novice solvers to attempt the easier puzzles first.

Our First Two Trials

We tested this in Austin in October as an A/B test, adding “Easy,” “Medium,” or “Hard” to the top of each puzzle. I asked each team after the event whether they liked having the rating system. Obviously, the teams that normally asked about difficulty liked it, but feedback was strongly positive across nearly all teams. In fact, we had only a single negative comment, of the nature “I was proud of myself, until I saw it was marked easy.”

In November and December, we went broader, adding the same system to every city’s copies. This went over less well, with several complaints reported by GC of people hating the system, and, in December, of the ratings being inaccurate. The ratings were based strictly on the “difficulty” responses provided by playtesters on their feedback forms, but there was some judgement involved in determining the cutoff values between easy, medium, and hard.

This negative feedback from GC was concerning and confusing, since the Austin test had gone so well.  We didn’t know if only the players that hated it were complaining and GC wasn’t getting the positive feedback, or if the testing in Austin was an outlier and the hate was universal.

Another Test

Because of that negative feedback, we decided to scrap the rating system for January 2017 and think about a resolution. “I felt bad because this was supposed to be easy” seemed to be a common complaint on the feedback thread, so we decided on a slightly different system: a 5-star rating instead of the English words. Hopefully, this would convey the same information while letting people judge their own abilities, rather than facing the implicit judgement of not being able to solve an ‘easy’ puzzle.

Thus, February’s puzzles had this new system, but, by golly, we were going to solicit player feedback this time to make sure.  If you love charts like I do, you’ll like this next bit…

We had 434 teams do February’s set (1440 people, not including Game Control members). The puzzles’ difficulties ranged from 2 to 4 stars, which matches our goal: not too easy and not too hard. Of course, in the future, it’s possible for various reasons that a puzzle set might legitimately contain a 1- or 5-star puzzle.

February’s set was perhaps the most playtested in Puzzled Pint history. This ensured that the feedback on the system wasn’t tainted by inaccurate difficulty ratings. Even so, I made the call to bump the Cupid puzzle to 4 stars because its responses had a large standard deviation, rather than keeping it at the strict mean, which would have been 3 stars. We will probably formalize that going forward, setting star difficulties at one standard deviation above the mean.
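As a minimal sketch of that “one standard deviation above the mean” rule, assuming playtester difficulty responses arrive as a simple list of 1–5 scores (the function and variable names here are mine, not Puzzled Pint’s):

```python
from statistics import mean, stdev

def star_rating(playtester_scores):
    """Rate a puzzle at one standard deviation above the mean
    playtester difficulty score, clamped to the 1-5 star scale.

    A puzzle with wide disagreement among testers gets bumped up,
    much like the Cupid puzzle described above.
    """
    rating = mean(playtester_scores) + stdev(playtester_scores)
    return max(1, min(5, round(rating)))

# Testers mostly said "3", but with some spread, so the puzzle
# lands at 4 stars rather than the strict mean of 3:
print(star_rating([2, 3, 3, 4, 3, 4, 2, 3]))  # -> 4
```

Under this rule, a puzzle with near-unanimous scores stays at its mean, while a divisive one is rated for the solvers who found it hard.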

So, how were the responses?

Well, first off, we didn’t get the response rate I’d hoped for. Even excluding teams that didn’t finish the puzzle set (i.e. completed fewer than 5 puzzles), here’s the response rate by city:

Still, a 59% response rate is enough to represent a good cross-section of our players, and the results are likely skewed away from beginners anyhow, because beginners are less likely to finish the set and thus less likely to respond. Recall that Puzzled Pint very much targets the experience of the beginner puzzler, not the experts or even the ‘regulars’.

So, what were the survey results?

Only 4 teams in our survey reported that the difficulty ratings were harmful. Amazing! We figured that number would be higher, considering the feedback on the GC thread.

90 teams did say that they were not helpful, but that they didn’t mind their existence.

67 teams said that they were helpful, but not necessary.

Finally, 63 teams said that they really wanted us to keep them!

Breaking Down the Data

Those stats alone don’t tell the full story. Yes, more people wanted us to keep the ratings than thought they were helpful, but we were interested in knowing how much they helped the more novice teams.

How were we to judge which Question of the Month (QotM) responses came from beginners versus more experienced teams? We hypothesized that, since we collect solve times, we could look at those and assume that teams that took longer to finish the set were the less experienced puzzlers.

But wait!  What about team size?  Don’t smaller teams take longer?  To check, I ran those numbers and came up with this lovely chart:

Nope! Team size matters very little to overall solve time. There is a clear downward trend, but the standard deviation is a nearly consistent 40 minutes for each team size.

In the chart, I used larger bubbles when multiple teams had the same exact size and minutes taken.  As you can see by the data points, there was a huge variance of solve times, no matter the team size.

Therefore, I felt safe basing the analysis purely on the number of minutes taken to finish the set, assuming those that took longer were the less experienced. A simple histogram sorting those times into 15-minute buckets allowed me to create this graph of the opinion results:
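The bucketing itself is straightforward; here’s a minimal sketch, assuming solve times come in as whole minutes per team (names and the sample data are mine, for illustration only):

```python
from collections import Counter

def bucket_solve_times(minutes_per_team, width=15):
    """Sort solve times into fixed-width buckets (15 minutes here),
    returning a histogram keyed by each bucket's start time."""
    counts = Counter()
    for minutes in minutes_per_team:
        # Integer division snaps e.g. 47 minutes into the 45-59 bucket.
        start = (minutes // width) * width
        counts[start] += 1
    return dict(sorted(counts.items()))

# Four hypothetical teams: two land in the 45-59 bucket.
print(bucket_solve_times([47, 50, 62, 33]))  # -> {30: 1, 45: 2, 60: 1}
```

Per-bucket percentages (as in the second chart below) then just divide each opinion category’s count by the bucket’s total.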

First of all, let us revel in the lovely emergence of the Gaussian curve again in nature.  This one has a fatter tail than true normal, but it’s nice and smooth.  Ahh.

Next, we can clearly ignore the red bars, the ‘complainers’, as they are so few.  So, let’s look only at the rest.

Both light green and orange show no clear trend, but the dark green does seem to increase as solving time increases. For a clearer picture, let’s ignore the raw number of teams in each category and look at the percentages within each:

Now we can see a significant trend, ignoring the outliers on either end (there’s only one team in the 20-34 bucket). Even though a fairly constant percentage of teams think the ratings aren’t helpful, the longer a team takes to solve, the more they like the rating system!

Okay, so there was one lingering question in my mind. Is this city-dependent? Maybe some cities just hate the ratings and others like them? Would we see a significant variance among cities, or would they all just be average? Well, check this out:

Boom!  We have a triangle!  I’ve put the size of each city on the X axis, and their average rating on the Y.  As cities grow the responses move towards the mean answer of ‘slightly yes’.  Still, I’m amazed at the variance in the smaller cities (this is actually locations, not cities, but you know what I mean).

Boston, our largest city by far, is clearly supportive of the rating system: not at Victoria’s 100% support, of course, but solidly above the disdain of Tacoma’s 17 people. Luckily, this chart shows that, by combining cities, I wasn’t significantly masking any strong negatives from only a few. No city really minds the system (on average), and most of them are well into the ‘yes’ range. Austin (both sites) is well into the yes range, which validates the earlier testing there.

Overall, folks, the rating system is here to stay. Thanks for participating, and please keep the suggestions and feedback coming so we can continue to improve in the future.

Yours Truly,

Neal Tibrewala
Puzzled Pint HQ


A Call for Puzzle Authors — Write for Puzzled Pint!

The puzzles you see every month at Puzzled Pint don’t just materialize out of the aether. They all start as rough prototypes, often just a simple draft thrown together in a Word document, with little flavor text and no graphic design. The puzzles take several trips through the feedback loop — as first Headquarters, and later playtesters, help polish the rough edges. At the end of this process, we have a month’s puzzles ready to print.

At the moment we have the rest of 2016’s puzzles scheduled. We currently have nothing on the books for 2017. There are a few folks with theme ideas, but we’re not able to put people on the calendar until the first draft of puzzles is ready. We roughly know how long it takes to go from draft puzzles to final puzzles, but “I think I maybe have this idea for a theme and this really cool coding mechanism” is a little too vague to reliably schedule.

So this is an official call! Have you thought about writing puzzles for Puzzled Pint? It’s easier than you’d expect, and this is your chance! While we’re happy to get puzzles from anyone, we would particularly like to see more:

  • authors who are women
  • authors who are people of color
  • authors outside the United States

And although collaborations are fine, we prefer if a single author is responsible for the month’s puzzles. This helps align them editorially, balances difficulty across the whole set of puzzles, and helps ensure two puzzles don’t accidentally use similar mechanisms. (Plus, the folks at PP headquarters would rather manage a single cat than a herd of cats.) If you’re interested in writing only a single puzzle then scroll down to where we talk about bonus puzzles.

The puzzle-writing process is simple. If you have a specific theme in mind, you can (optionally) ping HQ and we’ll let you know if we’ve heard of anyone else also thinking about the same theme. Write some puzzles: a location puzzle, puzzles played at the event, and (optionally, but strongly encouraged) a meta puzzle. Send those our way (with solutions). The solution part is important, especially for new puzzle authors. Puzzles in their draft stage often have unpolished edges, like leaps of logic that are obvious to the author but may need a little flavor text or examples before being visible to others. Once we have puzzles and answers, we’ll put you on the calendar and work with you to help refine the flow of the puzzles, over the course of a couple rounds of playtesting. You can find a lot more detail about the process and requirements at http://www.puzzledpint.com/info/author/.

If you’d like to get your feet wet by writing a single puzzle, as opposed to a whole month of them, we’re also looking for bonus puzzle authors. Some authors like to write a whole set of puzzles, including location, meta, and bonus. Some want to focus on just the main set, without a bonus. We find that players enjoy having a bonus puzzle available, but we cannot always offer one every month. If you’d like to submit just a single puzzle, we’d be happy to work with you on getting it ready for a bonus. (Hey! Here’s a dirty little secret: one can make an arbitrary puzzle fit just about any month’s theme by simply changing flavor text and graphic design.)

This is your call to action! Write puzzles for Puzzled Pint!

PP After Dark – Solutions and More!

Earlier this week, Puzzled Pint Portland was invited to participate in OMSI After Dark, an adults-only nighttime event at a local science museum. The theme was “forensics.” We pulled some thematic puzzles from past months to reprint, and also brought along some surplus copies we had on hand to show people what we’re all about.

There was quite a bit of interest! We started with 100 copies of Brian Enigma’s “What is Puzzled Pint?” info sheet, and ran out of those less than halfway through the evening. OMSI was kind enough to reprint those, as well as more copies of the three featured puzzles:

Brian Hahn himself was on hand to help us tell people about Puzzled Pint, as were regular GCs Matt Shields, Jen Dumont, and yours truly.

As shown in these photos we tweeted, we also pulled out Brian Hahn’s “Paddles” from October 2015 and Andrea Blumberg’s March 2016 Comic Book Mystery as examples of different presentation formats. We’re extremely grateful to all our volunteer puzzle authors who share their creativity with us!

If you were at OMSI After Dark on Wednesday and are looking for solutions to any of the above puzzles, here they are:

Your theme suggestions

Puzzled Pint is always looking for authors — both seasoned veterans and people who want to get their first taste of puzzle design. Because we do one set of puzzles per month, our waitlist is about a year out, but that’s a good thing for all. It means we have time to work with draft puzzles, provide direct feedback, bounce the puzzles off of playtesters in the US and abroad, and route that feedback to the author as suggested revisions.

Some authors like to come up with a theme first, then see what sorts of puzzle mechanisms that theme inspires. Others like to come up with mechanisms first and then wrap them in story and theme. Both ways are equally valid. For what it’s worth, bonus puzzles are often — but not always — in the latter camp.

This month we asked a Question of the Month to our Puzzled Pint attendees. We asked you to suggest themes for upcoming months. Our hope was that this could provide a source of inspiration to future authors. We would like to share the results here. There were 374 total suggestions from 23 cities. The top suggestions (with 3 or more votes) are:

  • Star Wars (14)
  • Harry Potter (13)
  • Disney (10)
  • board games (9)
  • geography (6)
  • video games (6)
  • Pokemon (6)
  • Alice in Wonderland (5)
  • Game of Thrones (4)
  • Doctor Who (4)
  • Star Trek (4)
  • Dr. Seuss (4)
  • Lord of the Rings (3)
  • superheroes (3)
  • space (3)
  • Carmen Sandiego (3)
  • pirates (3)
  • alcohol (3)
  • Buffy the Vampire Slayer (3)
  • Olympics (3)
  • food (3)
  • X-Files (3)
  • pizza (3)
  • Breaking Bad (3)
  • James Bond (3)
  • Shakespeare (3)

We happened to do Disney Star Wars back in 2012 when the Lucas/Disney sale was first announced, but there’s no reason why we couldn’t do Disney, Star Wars, or another Disney Star Wars. We also did board games back in May of 2011, Doctor Who a little more recently in September, and James Bond in 2013. But these are all good themes, and as long as the puzzles are unique, fun, and challenging, we’re open to revisiting past motifs.

The remaining suggestions are as follows (I’ve highlighted a few unique entries, strictly for humor value):

30 Rock, 7 Wonders, 80s, 80s action movies, 90s, a salute to gingers, AA Milne, Adventure Time, ALF, anagrams, Ancient Rome, Angry Birds, astrology, Austin, automobiles, Back to the Future, backpacking, bad Scifi movies, Barbie, Beatles songs, beer, Best of the MIT Mystery Hunt, birdwatching, Blade Runner, Bones, boy bands, branches of the Armed Forces, Britney Spears, Broadway shows, butts, Calvin and Hobbes, cats, cheese (A Brie Encounter, Cheddar Off Dead, etc.), childhood, childhood games, chocolate, circus, classic cinema, classic literature, Clue, Clue (the game), Coca-Cola, college football conferences, colors, comic books, Comics, composers of classical music, conspiracy theories, crosswords, cryptology, cuisines, cultures around the world, David Bowie + The Muppets = Labyrinth, David Bowie/labyrinth, DC, dessert, dinosaurs, Disney Princesses, dogs, donuts, Downton Abbey, Edgar Allan Poe, Egypt, emoji, escape rooms, Ex Machina, exploring, fairy tale, fairy tales, famous cathedrals, famous Chicagoans/landmarks/history, famous crossroads, Fargo, Firefly, Firefly/fireflies/“Firefly”, fish, flowers, Follow that Bird, football (soccer), Futurama, G. I. Joe, Ghostbusters, Gilmore Girls, grade school, Gravity’s Rainbow, hair metal, hair metal bands, Hamilton, He-Man, Hello Kitty/Sanrio, history, holidays, Hollywood/directing a film, hot cheese, House MD, HP Lovecraft, Hunger Games, Jeopardy, Jim Henson, John Hughes movies, Keep Austin Weird, Labyrinth (the film), Lady Gaga outfits, Larry Bird vs. Dr. J, Law & Order SVU, League of Legends, Lego, libraries, Limburger, literature, logic puzzles, Looney Toons, Mad Magazine, magic, March Madness, Mardi Gras, Marvel, Mass Effect, math, mazes, MegaMan, Mel Brooks, Michael J. Fox, Miyazaki (Totoro), moar robots, Monty Python, movies, Muppets, museums, music, music, sheet music, musicals, mythical creatures, Nickelodeon, Nintendo, Orphan Black, outer space, Parks & Rec, Party Down!, pinball, Pixar, Portal, Portlandia, Post Apocalyptia, presidents, pro wrestling, psychology, Pulp Fiction, pumpkin everything, QI, raccoons, Rambo, Red Dwarf, Rocky Horror, Roman, Roman numerals, RPGs, running, running a newspaper, science, scifi, scifi movies, Scooby Doo, seas on Earth, seas on the moon, secret agent, Seinfeld, Sesame Street, sharks, Sharktopus, Sherlock Holmes, Simpsons, Smurfs, solar system, songs by a famous band, space/planets/astronomy, spies, spies/spying, sports, Steven King, Story Lords (on YouTube, created in 1984, children’s reading educational program produced in Wisconsin), stupid laws (e.g. emergency Sasquatch ordinance), summer camp, Super Mario, Terry Pratchett works, the (fictional) Martians, The Big Bang Theory, The Birdman of Alcatraz, the elements, The Hateful Eight, the impressionists, The Legend of Zelda, The Martian, The Matrix, the movies of Gene Kelly, The Office, The Power Broker: Robert Moses and the Fall of New York, the presidents, The Smashing Pumpkins, the victorian era, The World Series, time travel, Tom Cruise, Tour de France, transportation, trash pandas, travel/flights, traveling, TV game shows, TV shows with initialisms, Twin Peaks, twitter, US presidents, varieties of tomatoes, Walking Dead, War of 1812, We care more about good content. Anything can make a good theme if handled well., weather systems, Weird Al, Whedonverse, who-dunnits (Like Jan 2016), wilderness survival, Winnie the Pooh, women scientists or musicians, World of Warcraft, xkcd, Yo-yos, “don’t be such butts”, “less poetry, more math”, and “one that does not require scissors”.

A few of the suggestions are funny in context (such as the one team that suggested Angry Birds, birdwatching, Follow that Bird, The Birdman of Alcatraz, and Larry Bird vs. Dr. J, or the team that suggested The Martian as well as sci-fi martians in general). A few of the suggestions might not resonate in all countries globally, such as US presidents. Several suggestions all orbited around David Bowie, Labyrinth, the Muppets, and Jim Henson. It was tough to normalize them down to one specific word or phrase, but any of those ideas could be fun.

If you’re interested in writing a whole month or simply contributing a bonus puzzle or two, please contact us and we’ll point you to the author guidelines. And if you’d like to perform your own analysis, you can download the raw data.

Do you check standings?

Since the beginning, Puzzled Pint has logged team standings each month — a table of which teams attended, solve times, and so on. It was a fun idea in the beginning, but as Puzzled Pint matures, there has been friction about the published standings. Internally, we use them as a rough gauge of how hard we thought the month’s puzzles were vs. reality. But publishing them on the site is extra work and their value has recently been called into question.

From a player’s point of view, they are entirely artificial. “Solve time” means very little when local Game Control freely gives out hints. It’s also highly variable based on play style. Some cities eat dinner first and then jump into puzzles undistracted. Some solve puzzles more casually over food and drink. A few larger teams split the packet up, solving puzzles in parallel. Most teams focus on one puzzle at a time so that all players can savor the a-ha moments.

From Puzzled Pint Headquarters’ point of view, the public standings page is extra work and additional moving parts when producing events each month. For legacy reasons, most of the standings are manual operations, split between local GC and HQ. This was fine with one or two cities, but hasn’t scaled up well. HQ must wait until all the local city GC have entered their data, then an HQ volunteer finds time to clean up and normalize the entries for public consumption, copying between spreadsheets. We like to focus on finding authors, playtesting puzzles, helping with feedback and editing, onboarding new cities, and running events. Compared to the other responsibilities, updating standings often feels like busywork.

There’s also the philosophical angle in which Puzzled Pint is meant to be a beginner-friendly event. Some see standings as fostering a competitive environment by keeping score and highlighting the more experienced teams. Our Puzzled Pint Charter specifically states that we’re non-competitive with no prizes and no scoring. Are standings a form of score?

With that in mind, we presented teams with a related Question of the Month (QotM) in January: “After the event, do you check the standings?” with the following results:

Answer                   Team Count   Percent
Yes                             121     38.4%
No                               53     16.8%
What standings?                  61     19.4%
No answer / bad answer           80     25.4%

The majority of answers in that last row were from teams not filling out the question. A few were from teams checking multiple conflicting answers, effectively throwing out simultaneous Yes and No answers (there were only a few). Teams that checked both No and What were counted as What.
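Those counting rules can be sketched as a small normalization function. This follows the rules described above; the function name and category labels are mine, and the handling of combinations the post doesn’t mention (such as Yes + What) is my own guess:

```python
def normalize_answer(checked):
    """Collapse a team's checked QotM boxes into one category.

    `checked` is a set drawn from {"yes", "no", "what"}, one entry
    per box the team marked on the answer sheet.
    """
    if not checked:
        return "no answer"           # question left blank
    if "yes" in checked and "no" in checked:
        return "bad answer"          # conflicting answers are thrown out
    if "what" in checked:
        return "what"                # No + What counts as What
    return "yes" if "yes" in checked else "no"

print(normalize_answer({"no", "what"}))  # -> what
```

Running every team’s sheet through a rule like this is what produces the four rows of the table above.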

Plotting the raw percentages gives us this chart:

image

We can probably lump together the No and most of the What answers. There’s a chance that a few Whats might convert to Yeses after becoming aware of their existence, but it’s likely only a small number of teams. The non-answers could go either way, but my experience in hearing from teams, local GC, and Twitter comments is that the folks that love standings are very vocal about loving the standings (and about getting them posted in a timely manner). The people that don’t care or outright don’t like them tend to be a little more apathetic or quiet. Given this, I could make a broad sweeping assumption that the Yes folks all said yes and that the non-answer folks are probably in the No or What category:

image-assumptions

But do remember that this makes some sweeping assumptions that may not be entirely valid.

Looking at city-by-city data, the locations in which the Yes crew took the majority are: Austin, Chicago, Detroit, London, Phoenix, Eastside Seattle, and Washington DC.

So what does this all mean? It means we have better visibility into who likes standings. In the short term, probably not much will change. The person who has been responsible for updating standings is stepping away from them — we’re looking for a new volunteer, preferably a local GC member from a Puzzled Pint city. In the long term, we have to decide whether to spend the time and effort to better automate the process, or whether what standings have become is now counter to the tenets of Puzzled Pint.