AI Gaming

Blurry Words

Introduction

In this challenge you will receive links to three images. Each image shows a sentence, written in English, as a poorly scanned piece of text. The images include typos, blurring, additional or missing spaces, and swapped letters.
The following image is an example:
The challenge is to determine what the true sentence is.
To help you do this you can use the Microsoft Cognitive Services API. In the example above the answer would be "How much did he charge to mend the chain?".

Challenge Objective

In the challenge, you will compete against one of our Housebots, housebot-competition.
The objective is to get the closest answer or, even better, the exact phrase for each of the three sentences that you are given. You will be scored by taking the Levenshtein distance (or edit distance) between your answer and the correct solution. The Levenshtein distance is calculated by taking your answer and the correct sentence and working out the number of deletions, insertions, or substitutions required to transform your answer into the correct sentence.
For example if the correct solution was:
  • "The quick brown fox jumps over the lazy dog"
And you submitted:
  • "The qucik brown fox jumps over the lazy dg"
The distance would be 3, as you would need to:
  • substitute the 'c' with an 'i'
  • substitute the 'i' with a 'c'
  • insert an 'o'.
When working out the distance, the test is NOT case sensitive, e.g. 'Apple' will correctly match 'apple' with no penalty.
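To estimate your score locally, the distance described above can be sketched as follows. This is a minimal illustration of the standard Levenshtein algorithm with case folding; the function name is our own, and how the site treats punctuation or surrounding whitespace is not stated here, so this sketch compares the strings as given.

```python
def levenshtein(answer: str, correct: str) -> int:
    """Count the insertions, deletions and substitutions needed to turn answer into correct."""
    # The challenge is not case sensitive, so compare lower-cased text.
    a, b = answer.lower(), correct.lower()
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("The qucik brown fox jumps over the lazy dg",
                  "The quick brown fox jumps over the lazy dog"))
```

Running this on the fox example reproduces the two substitutions and one insertion listed above.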
The distance for all three sentences will be calculated, and your result in the challenge is your opponent's total distance minus your own. For example, if your total distance across the 3 sentences is 15 and your opponent's is 20, you will score 5 and win this game.
During the game you will be presented with the current state of play:
In the image above we have highlighted three areas: one in red, one in yellow and one in blue. The area highlighted by the red box is your current guess for the first sentence. The area highlighted by the yellow box shows the current Levenshtein distance for you and your opponent. In this example you are currently losing 10 to 8 and your net score would therefore be -2. The area highlighted by the blue box is a visual representation of how well you are doing in the game: the bigger the green bar, the better your bot is performing.
At the end of the game all 3 differences are added together and this will be your overall score for the game. So in the example above if the game were to end at this point your score would be -6.
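As a small illustration of the scoring arithmetic, the sketch below subtracts your total distance from your opponent's. The function name and the per-sentence numbers are made up for the example; only the totals (15 against 20, and 10 against 8) come from the text above.

```python
def net_score(your_distances, opponent_distances):
    """Opponent's total distance minus yours; a positive result means you are winning."""
    return sum(opponent_distances) - sum(your_distances)


# Totals of 15 (you) and 20 (opponent) across three sentences.
print(net_score([5, 5, 5], [7, 7, 6]))
# A single sentence where you are losing 10 to 8.
print(net_score([10], [8]))
```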

Microsoft Cognitive Services

In order to recognise and interpret the text in the images, you can use the Microsoft Cognitive Services API.
Microsoft's Cognitive Services API provides several powerful functions that greatly simplify recognising the text in the images provided, and that let you start interpreting that text so you can convert it into a grammatically correct English sentence. This will give you the best chance of translating the image into the correct sentence.
To get your Microsoft Cognitive Services API Keys you need to sign up for a free Microsoft Azure trial using the following guide.

Recommended Microsoft Cognitive API functions

Including the two APIs we use in our example code, the Microsoft Cognitive Services API functions that might help you in this challenge and that we recommend you look at are:
    • Optical Character Recognition (OCR) detects text in an image and extracts the recognized characters into a machine-usable character stream.
      • Read text in images
        • Optical character recognition (OCR) detects text in an image and extracts the recognised words into a machine-readable character stream. Analyse images to detect embedded text, generate character streams and enable searching.
        • Use this function to make an initial scan of the image to convert it into text.
    • The Bing Spell Check API lets you perform contextual grammar and spell checking.
      • Spell check words.
        • You can switch between spelling correction for web searches (“Spell”) and documents (“Proof”). “Spell” is more aggressive in order to return better search results, while “Proof” is less aggressive and adds capitalisation, basic punctuation and other features to aid document creation.
        • When you are happy that you have converted the image into text, use this to find and deal with spelling errors.
    • The Web Language Model API lets you automate a variety of standard natural language processing tasks using state-of-the-art language modelling APIs.
      • Word breaking
        • Insert spaces into a string of words lacking spaces.
        • A common issue with scanned text is missing spaces or putting spaces in the wrong place. You can further analyse the image and the text you have with this function.
      • Joint probabilities
        • Calculate how often a particular sequence of words appears together.
        • When you have your text, see the probability that word pairs and word sequences are correct.
      • Conditional probabilities
        • Given a sequence of words, calculate how often a particular word tends to follow.
        • Similar to Joint Probabilities, determine whether words are likely to follow other words in your text.
      • Next word completions
        • Given a sequence of words, get the list of words most likely to follow.
        • If your following words are not correct, find out which words are likely to follow and narrow down your options to replace the word.
We've created a section on Bot Strategy that further discusses the Microsoft Cognitive API functions and how you can use them.

Python Getting Started Guides for Microsoft Cognitive API

Microsoft provide some great guides to using the Computer Vision and Languages functions with Python. To find these guides and read more about Getting Started with the Microsoft Cognitive Services API, follow the links below:

Creating a game playing bot

We have a fuller explanation in Using the Online Code Editor to write your first bots, but the very quick start guide is to:
  • Register and login to the AI Gaming site.
  • Go to the Editor page from the menu options at the top of the site.
  • Select the Microsoft Cognitive Challenge in the game type drop down.
  • Using the New button dropdown, create a new file using the Microsoft Cognitive Challenge option.
  • This will load example code for a working bot to play the Microsoft Cognitive Challenge.
  • You must update the code at the top of the example that asks for your Microsoft keys:
headers_visual = {'Ocp-Apim-Subscription-Key': 'YOUR MICROSOFT COMPUTER VISION API KEY HERE'}
headers_spell = {'Ocp-Apim-Subscription-Key': 'YOUR MICROSOFT BING SPELL CHECK API KEY HERE', 'Content-Type': 'application/x-www-form-urlencoded'}
Now you can click the Run button at the top right of the editor and play your first game.
It is now up to you to tweak and improve the example code that we have provided you with. Everything happens in the calculate_move() function of the example code. This is where you receive the images in the gamestate object and where you can call the Microsoft Cognitive API functions to start to interpret them.
You can get more information about the information you receive and the move that you have to return on our Programmer's Reference page for this Blurry Words game.
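Once the OCR call returns, you need to turn its JSON into a plain sentence. The sketch below is one way to do that, assuming the response follows the Computer Vision OCR layout of regions containing lines containing words; the function name and the sample response are our own illustration, not taken from the example bot.

```python
def ocr_result_to_text(ocr_json: dict) -> str:
    """Join every recognised word in an OCR response into one space-separated string."""
    pieces = []
    # Assumed shape: {"regions": [{"lines": [{"words": [{"text": "..."}]}]}]}
    for region in ocr_json.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            if words:
                pieces.append(" ".join(words))
    return " ".join(pieces)


# Hypothetical response, trimmed to the fields this sketch reads.
sample = {
    "language": "en",
    "regions": [
        {"lines": [
            {"words": [{"text": "How"}, {"text": "muchdid"}]},
            {"words": [{"text": "he"}, {"text": "chrage"}]},
        ]}
    ],
}
print(ocr_result_to_text(sample))
```

Using `.get` with empty defaults means an image where nothing was recognised yields an empty string rather than an error.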

Strategy

The template code provided will perform some basic functions using the Microsoft Cognitive Services, including reading the text from the image and a basic spell check.
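The spell check step can be sketched as below. This is not the template's own code: it assumes the Bing Spell Check response lists `flaggedTokens`, each with an `offset`, the original `token` and scored `suggestions`, and the function name and sample data are hypothetical.

```python
def apply_spell_suggestions(text: str, spell_json: dict) -> str:
    """Replace each flagged token with its highest-scoring suggestion."""
    flagged_tokens = spell_json.get("flaggedTokens", [])
    # Work from the end of the string so earlier offsets stay valid
    # even when a replacement changes the length of the text.
    for flagged in sorted(flagged_tokens, key=lambda t: t["offset"], reverse=True):
        suggestions = flagged.get("suggestions", [])
        if not suggestions:
            continue  # nothing to replace this token with
        best = max(suggestions, key=lambda s: s.get("score", 0.0))
        start = flagged["offset"]
        end = start + len(flagged["token"])
        text = text[:start] + best["suggestion"] + text[end:]
    return text


# Hypothetical response for the text "teh qucik fox".
response = {"flaggedTokens": [
    {"offset": 0, "token": "teh", "suggestions": [{"suggestion": "the", "score": 0.9}]},
    {"offset": 4, "token": "qucik", "suggestions": [
        {"suggestion": "quack", "score": 0.3},
        {"suggestion": "quick", "score": 0.8},
    ]},
]}
print(apply_spell_suggestions("teh qucik fox", response))
```

Taking the top suggestion blindly is only a starting point; you may prefer to keep the original token when the best score is low.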
Here are some further suggestions of strategies to improve your bot's performance.

Word breaking

Use the Web Language Model API word breaking function to split words when you think one or more spaces have been removed.
https://westus.api.cognitive.microsoft.com/text/weblm/v1.0/breakIntoWords?model=title&text=thequickbrownfox&order=5&maxNumOfCandidatesReturned=5
This would return a result of:
{
  "candidates": [{
    "words": "the quick brown fox",
    "probability": -7.5340000000000007
  }, {
    "words": "the quickbrownfox",
    "probability": -11.758999999999999
  }, {
    "words": "t h e quick brown fox",
    "probability": -14.189
  }, {
    "words": "th e quick brown fox",
    "probability": -14.531
  }, {
    "words": "th e quickbrownfox",
    "probability": -16.1
  }]
}
In the above example you can see the text "thequickbrownfox" was submitted to the function and it correctly identifies the most likely correct sentence would be "the quick brown fox".
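The request URL and the choice of candidate can be sketched as follows. The helper names are our own, and the sketch only builds the URL and reads a response; sending the request with your subscription key is left to your bot. It assumes the probabilities are logarithmic, so the value closest to zero is the most likely split.

```python
from urllib.parse import urlencode

BREAK_URL = "https://westus.api.cognitive.microsoft.com/text/weblm/v1.0/breakIntoWords"


def build_break_url(text, model="title", order=5, max_candidates=5):
    """Build the word breaking URL with the same query parameters as the example above."""
    query = urlencode({
        "model": model,
        "text": text,
        "order": order,
        "maxNumOfCandidatesReturned": max_candidates,
    })
    return BREAK_URL + "?" + query


def best_break(response_json):
    """Return the candidate split with the highest probability, or None if there are none."""
    candidates = response_json.get("candidates", [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["probability"])["words"]


example_response = {"candidates": [
    {"words": "the quickbrownfox", "probability": -11.759},
    {"words": "the quick brown fox", "probability": -7.534},
    {"words": "t h e quick brown fox", "probability": -14.189},
]}
print(build_break_url("thequickbrownfox"))
print(best_break(example_response))
```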

Next word completions

Use the Web Language Model API Generate Next Words function to see if the next word in the sentence matches. For example, suppose you have the sentence "the quick brown ofxx" after you have applied the spell check. You could take the first 3 words of the sentence to see what might come next.
https://westus.api.cognitive.microsoft.com/text/weblm/v1.0/generateNextWords?model=title&words=the%20quick%20brown&order=5&maxNumOfCandidatesReturned=5
This would return a result of:
{
  "candidates": [{
    "word": "fox",
    "probability": -0.014
  }]
}
If this matches the next word, move on. If it doesn't, try replacing the word and see if your answer is any better.
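One possible way to apply that check is sketched below. The function name is hypothetical, and replacing a mismatched word with the single most probable candidate is just one simple policy; you could instead compare several candidates against the OCR output.

```python
def fix_next_word(words, index, response_json):
    """Keep words[index] if the API suggested it, otherwise swap in the top candidate."""
    candidates = response_json.get("candidates", [])
    if not candidates:
        return list(words)  # no suggestions, so leave the sentence alone
    suggested = [c["word"].lower() for c in candidates]
    if words[index].lower() in suggested:
        return list(words)  # the word already matches a likely continuation
    best = max(candidates, key=lambda c: c["probability"])["word"]
    return list(words[:index]) + [best] + list(words[index + 1:])


response = {"candidates": [{"word": "fox", "probability": -0.014}]}
print(" ".join(fix_next_word(["the", "quick", "brown", "ofxx"], 3, response)))
```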

Keep track of your score

If your distance is small, don't make too many changes. If your distance is exactly 2, you could try swapping common two-letter words with other possibilities, e.g. replace 'to' with 'at'. You could also try adding in two-letter words.

There are only two words 'A' and 'I' that are one letter long

If you have any other letter on its own, you know it is wrong, so try removing the space either to the left or to the right of it.
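This rule can be sketched as a function that proposes repaired sentences for each stray letter. The function name is our own, and the sketch only generates the options; choosing between them (for instance with the joint probabilities function) is left to your bot.

```python
def merge_stray_letters(sentence):
    """For each lone letter other than 'a' or 'i', propose joining it to its left or right neighbour."""
    words = sentence.split()
    options = []
    for i, word in enumerate(words):
        is_stray = len(word) == 1 and word.isalpha() and word.lower() not in {"a", "i"}
        if not is_stray:
            continue
        if i > 0:
            # Remove the space to the left of the stray letter.
            options.append(" ".join(words[:i - 1] + [words[i - 1] + word] + words[i + 1:]))
        if i < len(words) - 1:
            # Remove the space to the right of the stray letter.
            options.append(" ".join(words[:i] + [word + words[i + 1]] + words[i + 2:]))
    return options


print(merge_stray_letters("the q uick brown fox"))
```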