Microsoft Cognitive Challenge

Introduction

In this challenge you will receive three links to images of English sentences that look like poorly scanned text. The text in the images includes typos, blurring, additional or missing spaces, and swapped letters.
The following image is an example:

mcc_example1.png

The challenge is to determine what the true sentence is.

To help you do this you can use the Microsoft Cognitive Services API. In the example above the answer would be "How much did he charge to mend the chain?".

Challenge Objective

In the challenge, you will compete against one of our housebots, housebot-competition.

The objective is to get the closest answer or, even better, the exact phrase for each of the three sentences that you are given. You will be scored by taking the Levenshtein distance (or edit distance) between your answer and the correct solution. The Levenshtein distance is calculated by taking your answer and the correct sentence and working out the number of deletions, insertions, or substitutions required to transform your answer into the correct sentence.

For example if the correct solution was:

  • "The quick brown fox jumps over the lazy dog"

And you submitted:

  • "The qucik brown fox jumps over the lazy dg"

The distance would be 3, as you would need to:

  • substitute the 'c' with an 'i'
  • substitute the 'i' with a 'c'
  • insert an 'o'.

When working out the distance, the test is NOT case sensitive, e.g. 'Apple' will correctly match 'apple' with no penalty.
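
The scoring rule described above can be sketched in Python. This is a minimal illustration, assuming the standard dynamic-programming edit distance with both strings lower-cased first. The function name `levenshtein` is our own, and details of the game server's implementation (for example how it treats punctuation or surrounding whitespace) are not stated on this page.

```python
def levenshtein(answer: str, correct: str) -> int:
    """Edit distance between two strings, ignoring case."""
    a, b = answer.lower(), correct.lower()
    # previous[j] = distance between the prefix of a processed so far
    # (minus the current character) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into an empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


submitted = "The qucik brown fox jumps over the lazy dg"
solution = "The quick brown fox jumps over the lazy dog"
print(levenshtein(submitted, solution))  # 3
print(levenshtein("Apple", "apple"))     # 0
```

Running a function like this locally against sentences you already know is a cheap way to compare different clean-up strategies before playing a real game.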

The distance for all three sentences will be calculated, and the difference between your total and your opponent's total will be your result in the challenge. For example, if your total distance across the 3 sentences is 15 and your opponent's is 20, you will score 5 and win the game.
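
The arithmetic can be sketched as follows. The function name `game_score` and the per-sentence distances are made up for illustration; only the totals of 15 and 20 come from the example above.

```python
def game_score(your_distances, opponent_distances):
    """Opponent's total distance minus yours; positive means you win."""
    return sum(opponent_distances) - sum(your_distances)


# Hypothetical per-sentence distances that add up to 15 and 20.
print(game_score([4, 5, 6], [6, 7, 7]))  # 5
```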

During the game you will be presented with the current state of play:

mcc_example2.png

In the image above we have highlighted three areas: one in red, one in yellow and one in blue. The red box shows your current guess for the first sentence. The yellow box shows the current Levenshtein distance for you and your opponent. In this example you are currently losing 10 to 8, so your net score would be -2. The blue box is a visual representation of how well you are doing in the game: the bigger the green bar, the better your bot is performing.

At the end of the game, all 3 differences are added together to give your overall score for the game. So in the example above, if the game were to end at this point, your score would be -6.

Official Game Play Criteria

In order for a game to be included in the challenge, it must:

  • Play the Microsoft Cognitive Challenge game type
  • Play against housebot-competition
  • Play Game Style "901 | 3 Sentences | Stake 0 | Prize 0"

Microsoft Cognitive Services

In order to recognise and interpret the text in the images, you can use the Microsoft Cognitive Services API.

Microsoft's Cognitive Services API provides several powerful functions that greatly simplify recognising the text in the images provided, and let you start interpreting that text so that you can convert it into a grammatically correct English sentence. This will give you the best chance of translating the image into the correct sentence.

To get your Microsoft Cognitive Services API Keys you need to sign up for a free Microsoft Azure trial using the following guide.

Recommended Microsoft Cognitive API functions

The Microsoft Cognitive Services API functions that we recommend you look at for this challenge, including the two APIs used in our example code, are:

  • Computer Vision API
    • Optical Character Recognition (OCR) detects text in an image and extracts the recognized characters into a machine-usable character stream.
      • Read text in images
        • Analyse images to detect embedded text, generate character streams and enable searching.
        • Use this function to make an initial scan of the image to convert it into text.
  • Bing Spell Check
    • The Bing Spell Check API lets you perform contextual grammar and spell checking.
      • Spell check words.
        • You can switch between spelling correction for web searches (“Spell”) and documents (“Proof”). “Spell” is more aggressive in order to return better search results, while “Proof” is less aggressive and adds capitalisation, basic punctuation and other features to aid document creation.
        • When you are happy that you have converted the image into text, use this to find and deal with spelling errors.
  • Web Language Model API
    • Automate a variety of standard, natural language processing tasks using state-of-the-art language modelling APIs.
      • Word breaking
        • Insert spaces into a string of words lacking spaces.
        • A common issue with scanned text is missing spaces or putting spaces in the wrong place. You can further analyse the image and the text you have with this function.
      • Joint probabilities
        • Calculate how often a particular sequence of words appears together.
        • When you have your text, see the probability that word pairs and word sequences are correct.
      • Conditional probabilities
        • Given a sequence of words, calculate how often a particular word tends to follow.
        • Similar to Joint Probabilities, determine if words are likely to follow other words in your text.
      • Next word completions
        • Given a sequence of words, get the list of words most likely to follow.
        • If your following words are not correct, find out which words are likely to follow and narrow down your options to replace the word.
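
The first two steps in the list above (OCR, then spell checking) both return JSON that you need to turn back into a plain string. The sketch below shows one way to do that offline, without calling any service. It assumes the OCR response nests words as `regions` → `lines` → `words` → `text`, and that the spell check response lists `flaggedTokens` with an `offset`, the original `token`, and scored `suggestions`. Check the responses you actually receive against these assumptions. The function names and the sample data are our own.

```python
def ocr_to_text(ocr_response: dict) -> str:
    """Join every recognised word, in the order returned, into one string."""
    words = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                words.append(word["text"])
    return " ".join(words)


def apply_spell_suggestions(text: str, spell_response: dict) -> str:
    """Replace each flagged token with its highest-scoring suggestion."""
    # Work from the end of the string so earlier offsets stay valid
    # even when a replacement changes the length of the text.
    tokens = sorted(spell_response.get("flaggedTokens", []),
                    key=lambda t: t["offset"], reverse=True)
    for token in tokens:
        suggestions = token.get("suggestions", [])
        if not suggestions:
            continue  # nothing better on offer, keep the original word
        best = max(suggestions, key=lambda s: s.get("score", 0.0))
        start = token["offset"]
        end = start + len(token["token"])
        text = text[:start] + best["suggestion"] + text[end:]
    return text


# Made-up sample responses in the assumed shapes.
sample_ocr = {"regions": [{"lines": [{"words": [
    {"text": "How"}, {"text": "mcuh"}, {"text": "did"},
    {"text": "he"}, {"text": "charge"},
]}]}]}
sample_spell = {"flaggedTokens": [{
    "offset": 4, "token": "mcuh",
    "suggestions": [{"suggestion": "much", "score": 0.9}],
}]}

raw = ocr_to_text(sample_ocr)
print(raw)                                         # How mcuh did he charge
print(apply_spell_suggestions(raw, sample_spell))  # How much did he charge
```

Always taking the top suggestion is the simplest policy. Because the score is an edit distance, a wrong replacement can cost more than leaving a near-miss alone, so you may prefer to accept a suggestion only above some score threshold.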

We've created a page on Bot Strategy that further discusses the Microsoft Cognitive API functions and how you can use them.

Python Getting Started Guides for Microsoft Cognitive API

Microsoft provide some great guides to using the Computer Vision and Language functions with Python. To find these guides and read more about Getting Started with the Microsoft Cognitive Services API, follow the links below:

Creating a game playing bot

We have a fuller explanation in Using the Online Code Editor to write your first bots, but the very quick start guide is:

  • Register and login to the AI Gaming site.
  • Go to the Editor page from the menu options at the top of the site.
  • Select the Microsoft Cognitive Challenge in the game type drop down.
  • Using the New button dropdown, create a new file using the Microsoft Cognitive Challenge option.
  • This will load example code for a working bot to play the Microsoft Cognitive Challenge.
  • You must update the code at the top of the example that asks for your Microsoft keys:
headers_visual = {'Ocp-Apim-Subscription-Key': 'YOUR MICROSOFT COMPUTER VISION API KEY HERE'}
headers_spell = {'Ocp-Apim-Subscription-Key': 'YOUR MICROSOFT BING SPELL CHECK API KEY HERE', 'Content-Type': 'application/x-www-form-urlencoded'}

Now you can click the Run button at the top right of the editor and play your first game.

It is now up to you to tweak and improve the example code that we have provided you with. Everything happens in the calculateMove() function of the example code. This is where you receive the images in the gameState object and where you can call the Microsoft Cognitive API functions to start to interpret them.
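
As an illustration of what such a call might look like, the sketch below builds (but does not send) an OCR request for one image link using only the Python standard library. The endpoint URL is a placeholder assumption: use the endpoint shown for your own Azure resource. The function name `build_ocr_request` is ours, and the example code loaded by the editor may use a different HTTP library and structure.

```python
import json
import urllib.request

# Assumption: placeholder endpoint, replace with the one for your Azure resource.
OCR_ENDPOINT = "https://YOUR-REGION.api.cognitive.microsoft.com/vision/v1.0/ocr"


def build_ocr_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST request asking the OCR service to read one image URL."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(OCR_ENDPOINT, data=body,
                                  headers=headers, method="POST")


request = build_ocr_request("https://example.com/sentence.png", "YOUR KEY HERE")
print(request.get_method())  # POST

# Sending it needs network access and a valid key:
# with urllib.request.urlopen(request) as response:
#     ocr_result = json.load(response)
```

Keeping the request building separate from the sending makes it easy to check the URL, headers and body before spending any of your API quota.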

You can find out more about the data you receive and the move that you have to return on our Programmer's Reference page for this Microsoft Cognitive Challenge game.

----

This should have you up and running for this challenge. There is more information on this site and we've included links to some of the pages that we think will be most useful to you below:

Useful Links

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License