Second Wordle post

2021-12-29 09:31:06 -08:00 · 2021-12-29 09:31:06 -08:00 · e668f22072
commit e668f22072
parent 6c7118f9dd
2 changed files with 17 additions and 1 deletions
--- a/blog/content/posts/cheating-at-word-games-part-2.md
+++ b/blog/content/posts/cheating-at-word-games-part-2.md
@@ -0,0 +1,16 @@
+---
+title: "Cheating at Word Games: Part 2"
+date: 2021-12-29T08:53:32-08:00
+math: true
+---
+This is a sequel to my [previous post]({{< ref "/posts/cheating-at-word-games" >}}), where I laid out an information-theoretic approach to algorithmically solving [Wordle](https://www.powerlanguage.co.uk/wordle/) puzzles.
+<!--more-->
+In that post, I considered whether there might be a strategy that takes a broader view - rather than maximizing information at _every_ step, it might be profitable to make a less-optimal first guess if the first and second guesses, taken together, are "better". One approach that intuitively makes sense is to select a pair of 5-letter words which, combined, comprise the ten most-common letters in the corpus. The two guesses together will give a clear indication of which of those ten letters are present - and, hopefully, indicate a couple of their positions, too.
+
+These pairs were [pretty easy to generate](https://github.com/scubbo/wordle-solver/blob/2d9279f8570154269ade68d56c0eeade74b24f1b/naive_solve_starter.py) - but, far from giving a single best option, there are 447 of them! Interestingly, my previously suggested first guess - "_roate_" ("_the cumulative net earnings after taxes available to common shareholders_", apparently) - was not in there: it seems there is no word that is an anagram of the remaining 5 most-common letters, "_sincl_", to pair it with.
+
+The next step would be to find the pair whose partitioning[^1] gives the greatest information. As a heuristic before doing that computation, it would be sensible to start with the pair which contains the single best guess from the previous approach. (I _suspect_ that this is equivalent to "_the guess which is most likely to have correct letters (rather than present ones)_", since correct letters are likely to be "_rarer_" and so to provide more information - but I haven't proven that yet). This might not be the best pair overall, if the second guess is significantly worse-than-average - but it's a good starting point!
+
+That shakes out to recommending the pair of `(soare, clint)`, which has quite a pleasing poetic image of Hawkeye in flight :) Now that I have two strategies described ("_always-locally-optimal_" vs. "_guess `soare`, then `clint`, then 🤷_"), I'm looking forward to finding a way to pit them against one another in an automated implementation of the game. I _suspect_ that they'll both reliably "win" in the same number of turns, so I'll either need to score them on their information/entropy properties (and probably have to dig out a textbook to make sure I'm doing that right), or construct some larger dataset for them to compete on.
+
+[^1]: As described in the [previous post]({{< ref "/posts/cheating-at-word-games" >}}), each guess partitions the set of possible solutions into up to 243 subsets - one for each of the $3^5$ possibilities of `[first letter correct | first letter present | first letter absent] X [second letter correct | second letter present | ...`.
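
The pair-generation step described in the new post (find two 5-letter words that together use exactly the ten most-common letters) can be sketched roughly as follows. This is a minimal illustration, not the linked `naive_solve_starter.py`: the function names `top_letters` and `covering_pairs` and the toy word list are assumptions made up for this example, and letter frequency is assumed to be counted over every letter occurrence in the list.

```python
from collections import Counter
from itertools import combinations


def top_letters(words, n=10):
    """Return the n most frequent letters in the word list (hypothetical helper)."""
    counts = Counter(letter for word in words for letter in word)
    return frozenset(letter for letter, _ in counts.most_common(n))


def covering_pairs(words, n=10):
    """Return pairs of words whose letters together are exactly the top-n letters."""
    target = top_letters(words, n)
    # A usable word must consist of 5 distinct letters, all drawn from the target set.
    candidates = [w for w in words if len(set(w)) == 5 and set(w) <= target]
    return [
        (a, b)
        for a, b in combinations(candidates, 2)
        if set(a) | set(b) == target
    ]


# Toy word list (an assumption for illustration, not the real Wordle corpus).
toy_words = ["soare", "clint", "roate", "slice", "noise", "jumpy"]
pairs = covering_pairs(toy_words)
print(pairs)
```

On a real word list this brute-force pairing is quadratic in the number of candidate words, which is fine for a few thousand words; "roate" finds no partner in the toy list for the same reason given in the post, namely that no word uses exactly the letters "sincl".
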
--- a/blog/content/posts/cheating-at-word-games.md
+++ b/blog/content/posts/cheating-at-word-games.md
@@ -67,7 +67,7 @@ Intuitively, these seem reasonable. The letters R, O, A, T, E, and S are all pre

 That said, I would love to do some testing on this strategy, by setting up the iterated strategy, having it "play" the game, and recording how many attempts are required to win. It would be particularly cool to see if there are cases where the second guess of this iterated strategy is _known_ to be wrong (from the results of the first guess) because that increases the amount of information gained. If we know that the word ends in "E", there's more information gained by guessing a word that _doesn't_ end in E - but humans (probably?) intuitively try to "keep the known letters". Maybe a follow-up post!

-In particular, I've been informed (while writing this post) that a friend-of-a-friend had already figured out the optimal start-words, and they're similar-to-but-different-from mine - so, there's at least one interesting alternative perspective out there!
+~~In particular, I've been informed (while writing this post) that a friend-of-a-friend had already figured out the optimal start-words, and they're similar-to-but-different-from mine - so, there's at least one interesting alternative perspective out there!~~ (EDIT: 2021-12-29) I [implemented]({{< ref "/posts/cheating-at-word-games-part-2" >}}) an alternative approach - finding pairs of 5-letter words that together comprise the 10 most common letters in the corpus - which I've been told is pretty similar to [Bruno's](https://twitter.com/NotBrunoAgain/status/1475908992174067717) (I intentionally waited until after I'd got two implementations before I spoiled myself by reading those tweets!), though we somehow got different answers. I'm looking forward to learning what I did wrong!

 [^1]: We can figure out exactly what this is by taking a quick peek at the code - or, we could infer that it exists by noting that there are a large-but-finite number of ways of arranging 26 letters in 5 positions (a little less than 12 million), or a smaller-but-still-very-large number of actual five-letter words. The actual list of potential answers for Wordle includes 2,315 words, and there are 10,657 words that you are allowed to guess (that is - there are 8,342 words that you're allowed to guess for information, but that cannot possibly be the answer).
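
To make the "information gained by a guess" comparison above concrete, here is a small sketch of how a guess partitions the remaining candidates by feedback pattern, and how many bits that partition is worth. The names `feedback` and `expected_information` are hypothetical, the candidates are assumed to be equally likely, and repeated letters are assumed to be handled the way Wordle does (exact matches are claimed first, then leftover letters are marked present). Each of the five positions has three outcomes, so there are at most $3^5 = 243$ patterns.

```python
from collections import Counter
from math import log2


def feedback(guess, answer):
    """Return a pattern string: 'c' correct, 'p' present, 'a' absent, per position."""
    result = ["a"] * len(guess)
    remaining = Counter()
    # First pass: mark exact matches and count the answer letters left over.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "c"
        else:
            remaining[a] += 1
    # Second pass: mark misplaced letters, consuming leftovers so repeats are not over-counted.
    for i, g in enumerate(guess):
        if result[i] != "c" and remaining[g] > 0:
            result[i] = "p"
            remaining[g] -= 1
    return "".join(result)


def expected_information(guess, candidates):
    """Shannon entropy (bits) of the partition of candidates induced by guess."""
    buckets = Counter(feedback(guess, answer) for answer in candidates)
    total = len(candidates)
    return -sum((n / total) * log2(n / total) for n in buckets.values())


# Toy candidate set (an assumption for illustration).
candidates = ["stare", "store", "stone", "shine"]
print(expected_information("stare", candidates))
```

A guess that splits the candidates into many small buckets scores higher than one that leaves them lumped together, which is why a guess already known to be wrong can still be the most informative one.
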