AI Gaming

Form Data Extraction

Form Data Extraction is a game about optical character recognition. You will be given a series of images (via a URL) which contain data headers ("labels" or "keys"), like "Name", and entries ("values"), like "Ada Lovelace". Using Microsoft's Computer Vision API, you must analyse the image you are given and then process the API response to find the labels and values you are looking for. In the Game Styles "3 Images", "10 Images" and "30 Images", you are given a list of labels on the form. In the harder Game Style "No labels", you have to work these out for yourself.

Microsoft Computer Vision API

You will need a subscription key to Microsoft's Computer Vision API to play this game. See Signing Up For Azure for more information.


The template code gives you a structure for analysing the Microsoft API response. Begin by thinking only about the Game Styles in which you are given the labels on the form. Some initial questions you might ask are:
  • What calls to guess_at_label() do I need to make? (There is a glaring inefficiency in the template code which can be fixed in a couple of lines.)
  • How can I improve the distance function? That is, what do I know about where the data corresponding to a label is likely to be?
  • Can find_closest_line() take into account anything more than just the distance between two boxes? Are there any other traits that might give away the data field that corresponds to a label?
Throughout, bear in mind that your code should ideally still perform well even if the database of images is completely replaced.
The Game Style in which you are not given any labels is much more difficult. How can you analyse the API response to work out where the labels are, or what sort of text a label generally consists of? Again, aim to keep your code as general as possible, rather than specific to one type of form or one set of labels.
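As one possible starting point for the "No labels" style, offered only as an illustration: assume that labels tend to be short lines ending in a colon, as in "Name:". The function candidate_labels() and its pattern (including the 40-character limit) are hypothetical assumptions, and many forms will break them, so treat this as a baseline to improve on rather than a general solution.

```python
import re

# Hypothetical heuristic: a label is a short line that starts with a letter
# and ends with a colon, e.g. "Expiry Date:". Many forms will not follow this.
LABEL_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9 '/()&.-]{0,40}:$")


def candidate_labels(line_texts):
    """Return the line texts that look like labels, with the trailing colon removed."""
    labels = []
    for text in line_texts:
        stripped = text.strip()
        if LABEL_PATTERN.match(stripped):
            labels.append(stripped[:-1].strip())
    return labels


print(candidate_labels(["Name:", "Ada Lovelace", "Expiry Date:", "12/05/2020"]))
```

Here the values "Ada Lovelace" and "12/05/2020" are rejected because they do not end in a colon, leaving ["Name", "Expiry Date"].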

Programmer's Reference

Making a move

The calculate_move() function is the equivalent of your main function. This is where you need to make your changes and from where you will control the game. The function:
  1. Receives information about the game in a Python dictionary called 'gamestate'.
  2. Returns a game move.
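A minimal skeleton of calculate_move() showing these two steps might look like the following. Only the gamestate keys ("Style", "Labels") come from this page; the empty analysis placeholder and the stub guess_at_label() are hypothetical stand-ins for the template's API call and helper function.

```python
def guess_at_label(label, analysis):
    """Hypothetical stub standing in for the template helper of the same name."""
    return ""


def calculate_move(gamestate):
    """Read the gamestate and return a move: a dictionary mapping labels to values."""
    # Assumption: replace this placeholder with the API analysis of gamestate["Image"].
    analysis = {"regions": []}
    if gamestate["Style"] == "user_given_labels":
        labels = gamestate["Labels"]
    else:
        labels = []  # "user_not_given_labels": you must discover the labels yourself
    return {label: guess_at_label(label, analysis) for label in labels}
```

With the stub in place, every label maps to an empty string; the real work is in replacing the analysis placeholder and the stub.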

The gamestate

To work out what move to make, calculate_move() receives the gamestate as input. The gamestate is a Python dictionary that holds all of the game information. Examples of accessing data in the gamestate are:
gamestate["Image"] # the URL where the current image is held
gamestate["MyScore"] # the number of points you have scored
The gamestate contains the following keys:
  • Style – A string describing the style of the game, either "user_given_labels" or "user_not_given_labels".
  • RoundNumber – The round you are on, 0-indexed so that the first round is Round 0. (That is, it is the number of images you have already submitted.)
  • Image – A URL to the image of the form you are to collect data from. (Images are given to you one at a time, and you cannot get the next one until you have submitted your answers for the current one.) This URL always ends in "image_k.png", where k is the image's position in the game (that is, k = RoundNumber + 1).
  • Labels – Present only if Style is "user_given_labels". A list of strings, each a key on the form (e.g. "Name") whose value (e.g. "Ada Lovelace") you are aiming to find.
  • IsMover – A Boolean indicating whether it is your turn to move. It is always True when calculate_move() is called.
  • EndTime – The epoch time, in milliseconds, at which the game will end unless both players finish submitting images before then. (Epoch time is the time elapsed since midnight UTC on 1 January 1970. Python's time.time() returns it in seconds, so multiply by 1000 before comparing it with EndTime.)
  • GameStatus – "RUNNING" while the game is in progress; otherwise, the reason the game ended.
  • MyScore – Your current score: the number of correct key-value pairs you have submitted so far in the current game.
  • OppScore – Your opponent’s score: the number of correct key-value pairs they have submitted so far in the current game.
An example gamestate:
{
    'Style': 'user_given_labels',
    'RoundNumber': 0,
    'Image': '',
    'Labels': ['Contingency MOT number', 'Vehicle Registration', 'Vehicle Identification Number', 'Make', 'Model', 'Colour', "Issuer's name", 'EU Classification', 'Country of Registration', 'Expiry Date', 'Issued', 'Test Station', 'Odometer Reading and History', 'Inspection Authority', 'Result of the test'],
    'IsMover': True,
    'EndTime': 1565187141764,
    'GameStatus': 'RUNNING',
    'MyScore': 0,
    'OppScore': 0
}
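Because EndTime is in milliseconds while Python's time.time() is in seconds, it is easy to mix the units up. The small helpers below are a sketch of how you might check the time left and the expected image name; seconds_remaining() and expected_image_suffix() are hypothetical names, not part of the template.

```python
import time


def seconds_remaining(gamestate, now=None):
    """Seconds until gamestate["EndTime"], which is an epoch time in milliseconds."""
    now_seconds = time.time() if now is None else now  # time.time() returns seconds
    return gamestate["EndTime"] / 1000 - now_seconds


def expected_image_suffix(gamestate):
    """Round r (0-indexed) shows the image whose URL ends in image_{r+1}.png."""
    return "image_{}.png".format(gamestate["RoundNumber"] + 1)
```

For the example gamestate above, expected_image_suffix() gives "image_1.png", and ten seconds before the end time seconds_remaining() returns roughly 10.0.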

Processing the form

You need a Microsoft API subscription key to play this game. The template code shows you how to obtain a JSON analysis of the form by making a POST request with your key and the image URL. To understand the format of this object, read the comments in the template code under the header "API return value". The template code also provides helper functions for processing some of this information.
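For orientation, here is a standard-library sketch of such a request to the Computer Vision OCR operation, which accepts a JSON body containing the image URL and a key in the Ocp-Apim-Subscription-Key header, and returns "regions" containing "lines" containing "words". The endpoint host, the API version in OCR_PATH and the query parameters are assumptions: use the values from your own Azure resource and from the template code, which may use a different HTTP library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API version in this path is illustrative; match the template code.
OCR_PATH = "/vision/v2.0/ocr"


def build_ocr_request(endpoint, subscription_key, image_url):
    """Build (but do not send) a POST request asking the OCR service to read image_url."""
    query = urllib.parse.urlencode({"language": "unk", "detectOrientation": "true"})
    return urllib.request.Request(
        endpoint.rstrip("/") + OCR_PATH + "?" + query,
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def analyse_image(endpoint, subscription_key, image_url):
    """Send the request and return the parsed JSON analysis (regions -> lines -> words)."""
    request = build_ocr_request(endpoint, subscription_key, image_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating the request construction from sending it makes it possible to inspect the request without network access or a key.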

Making a valid move

Every time it is your turn, the game engine runs your calculate_move() function, which needs to return a valid move. A valid move is any dictionary object. Its keys should be the labels you have found on the form and its values the corresponding data. (If you are in the Style "user_given_labels", the labels are simply those listed in gamestate["Labels"].)

Helper functions

  • guess_at_label(label, analysis) – Given a label and the dictionary object representing the Microsoft API analysis of the image, make a guess at what data corresponds to the label. This function uses the helper functions below to do this.
  • find_closest_line(region, labelBox) – Given a "region" (part of the API analysis) and a boundingBox corresponding to the label we are looking for, find the line l in that region which minimises the value of the helper function distance(labelBox, l).
  • distance(labelBox, potentialValueBox) – Given the boundingBox of a label and the boundingBox of a piece of text, return an integer score for how likely the text is to be the data corresponding to the label. The higher the score, the less likely the text is to match the label. In the template code, the score is the distance between the two boxes, plus one penalty if the second box is not below the first and another if it is not to the right of the first, since we expect the label to be above or to the left of the data.
  • merge(line) – Given a "line" (part of the API analysis), merge it into one boundingBox and a string of all the text inside it.
  • format_bounding_box(boundingBoxString) – Given a boundingBox in the form returned by the API analysis, i.e. a string of four comma-delimited values, return the same information as a list of integers.
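The descriptions above are enough to sketch how these helpers fit together. The following is a minimal sketch reconstructed only from those descriptions, not the template's actual code. It assumes the OCR response shape of regions, lines and words, each with a "left,top,width,height" boundingBox string. The PENALTY value, measuring distance between top-left corners, matching the label's line case-insensitively with any trailing colon removed, and skipping the label's own line are all assumptions, and SAMPLE_ANALYSIS is a hypothetical toy form.

```python
import math

PENALTY = 1000  # assumption: weight added when a box is not where we expect data


def format_bounding_box(bounding_box_string):
    """Turn an API boundingBox string "left,top,width,height" into a list of ints."""
    return [int(value) for value in bounding_box_string.split(",")]


def merge(line):
    """Combine a line's words into one [left, top, width, height] box and one string."""
    boxes = [format_bounding_box(word["boundingBox"]) for word in line["words"]]
    left = min(box[0] for box in boxes)
    top = min(box[1] for box in boxes)
    right = max(box[0] + box[2] for box in boxes)
    bottom = max(box[1] + box[3] for box in boxes)
    text = " ".join(word["text"] for word in line["words"])
    return [left, top, right - left, bottom - top], text


def distance(label_box, value_box):
    """Lower score means value_box is more likely to hold the label's data."""
    score = int(math.hypot(value_box[0] - label_box[0], value_box[1] - label_box[1]))
    if value_box[1] < label_box[1]:  # value is not below the label
        score += PENALTY
    if value_box[0] < label_box[0]:  # value is not to the right of the label
        score += PENALTY
    return score


def find_closest_line(region, label_box):
    """Return (box, text) of the line in region with the lowest distance score."""
    best, best_score = None, None
    for line in region["lines"]:
        box, text = merge(line)
        if box == label_box:
            continue  # skip the label's own line
        score = distance(label_box, box)
        if best_score is None or score < best_score:
            best, best_score = (box, text), score
    return best


def guess_at_label(label, analysis):
    """Find the line whose text is the label, then return the closest other line's text."""
    for region in analysis["regions"]:
        for line in region["lines"]:
            box, text = merge(line)
            if text.rstrip(":").strip().lower() == label.lower():
                found = find_closest_line(region, box)
                return found[1] if found else ""
    return ""


def _line(*words):
    """Build a toy OCR line from (boundingBox, text) pairs (hypothetical test data)."""
    return {"words": [{"boundingBox": b, "text": t} for b, t in words]}


SAMPLE_ANALYSIS = {"regions": [{"lines": [
    _line(("10,10,50,20", "Name:")),
    _line(("100,10,50,20", "Ada"), ("160,10,60,20", "Lovelace")),
    _line(("10,150,50,20", "Make:")),
    _line(("100,150,40,20", "Ford")),
]}]}
```

On this toy form, guess_at_label("Name", SAMPLE_ANALYSIS) returns "Ada Lovelace" and guess_at_label("Make", SAMPLE_ANALYSIS) returns "Ford". If a label sat directly below another label with no value in between, however, this sketch could return that second label as the value, which is the kind of weakness the questions at the top of this page invite you to fix.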