Form Data Extraction
Form Data Extraction is a game about optical character recognition. You will be given a series of images (via a URL) which contain data headers ("labels" or "keys"), like "Name", and entries ("values"), like "Ada Lovelace". Using Microsoft's Computer Vision API, you must analyse the image you are given and then process the API response to find the labels and values you are looking for. In the Game Styles "3 Images", "10 Images" and "30 Images", you are given a list of labels on the form. In the harder Game Style "No labels", you have to work these out for yourself.
The template code gives you a structure for analysing the Microsoft API response. You should begin by thinking only about the Game Styles in which you are given the labels on the form. Some initial questions you might ask are:
- What calls to guess_at_label() do I need to make? (There is a glaring inefficiency in the template code which can be fixed in a couple of lines.)
- How can I improve the distance function? That is, what do I know about where the data corresponding to a label is likely to be?
- Can find_closest_line() take into account anything more than just the distance between two boxes? Are there any other traits that might give away the data field that corresponds to a label?
Throughout, bear in mind that your code should ideally still perform fairly well even if the database of images is completely replaced.
The Game Style in which you are not given any labels is much more difficult. How can you analyse the API response to work out where the labels are, or what sort of text a label generally consists of? Ideally, your code should still be as general as possible, rather than specific to one type of form or a specific set of labels.
The calculate_move() function is the equivalent of your main function. This is where you need to make your changes and from where you will control the game. The function:
1. Receives information about the game in a Python dictionary called 'gamestate'.
2. Returns a game move.
To work out what move you want to make, the calculate_move() function is given the gamestate as input. The gamestate is a Python dictionary which holds all of the game information. Examples of accessing data in the gamestate are:
gamestate["Image"] # the URL where the current image is held
gamestate["MyScore"] # the number of points you have scored
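The lookups above can be sketched as a small helper. This is a minimal illustration, assuming only the keys mentioned in this document ("Image", "MyScore" and "Labels"); the function name summarise_gamestate is hypothetical and is not part of the template code.

```python
def summarise_gamestate(gamestate):
    """Pull out the pieces of the gamestate used in this guide."""
    # URL of the image to analyse this turn.
    image_url = gamestate["Image"]
    # Points scored so far.
    my_score = gamestate["MyScore"]
    # Assumption: "Labels" may be missing in the "No labels" Game Style,
    # so fall back to an empty list rather than raising KeyError.
    labels = gamestate.get("Labels", [])
    return image_url, my_score, labels
```

Using .get() for "Labels" is a defensive choice rather than documented behaviour; check a real gamestate before relying on it.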
The gamestate contains the following keys:
An example gamestate:
'Labels': ['Contingency MOT number', 'Vehicle Registration', 'Vehicle Identification Number', 'Make', 'Model', 'Colour', "Issuer's name", 'EU Classification', 'Country of Registration', 'Expiry Date', 'Issued', 'Test Station', 'Odometer Reading and History', 'Inspection Authority', 'Result of the test'],
You need a Microsoft API subscription key to play this game. The template code you are given takes you through how to get a JSON object analysis of the form by making a POST request with your key and the image URL. To understand the format of the object, take a look at the comments in the template code under the header "API return value". To help you process some of its information, helper functions are given in the template code.
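The template code already contains its own version of this request, so the following is only a sketch of the general shape using the Python standard library. The endpoint URL (region and API version) is an assumption and must be replaced with the one for your own subscription; build_ocr_request and analyse_image are hypothetical names.

```python
import json
import urllib.request

# Assumption: placeholder endpoint. Use the region and version of your own resource.
OCR_ENDPOINT = "https://westeurope.api.cognitive.microsoft.com/vision/v2.0/ocr"


def build_ocr_request(image_url, subscription_key, endpoint=OCR_ENDPOINT):
    """Build (but do not send) a POST request asking the API to analyse image_url."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        # The subscription key is passed in this header.
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def analyse_image(image_url, subscription_key):
    """Send the request and return the JSON analysis as a Python dictionary."""
    request = build_ocr_request(image_url, subscription_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect the request without spending an API call.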
Every time it is your turn, the game engine will run your calculate_move() function, which needs to return a valid move. A valid move is any dictionary object; it should map each label you have found on the form to its corresponding data. (If you are in the Style "user_given_labels", the labels are simply those listed in gamestate["Labels"].)
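As an illustration of the move format, the sketch below builds a dictionary from a list of labels. The guess_at_label() here is a stand-in stub, not the template's implementation: it simply looks the label up in a dictionary so that the example runs on its own. build_move is a hypothetical name.

```python
def guess_at_label(label, analysis):
    # Stub for illustration only: the real helper analyses bounding boxes.
    # Here "analysis" is assumed to be a plain dict of label -> value.
    return analysis.get(label, "")


def build_move(labels, analysis):
    """Return a move: a dictionary mapping each label to the guessed data."""
    return {label: guess_at_label(label, analysis) for label in labels}


move = build_move(["Make", "Model"], {"Make": "Ford"})
# move == {"Make": "Ford", "Model": ""}
```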
- guess_at_label(label, analysis) – Given a label and the dictionary object representing the Microsoft API analysis of the image, make a guess at what data corresponds to the label. This function uses the helper functions below to do this.
- find_closest_line(region, labelBox) – Given a "region" (part of the API analysis) and a boundingBox corresponding to the label we are looking for, find the line l in that region which minimises the value of the helper function distance(labelBox, l).
- distance(labelBox, potentialValueBox) – Given a boundingBox for a label and a boundingBox for a piece of text, return an integer score: the higher the score, the less likely the text is to be the data corresponding to the label. In the template code, this is a measurement of the distance between the two boxes, with two penalties: one if the second box is not below the first and one if it is not to the right of it, as we expect the label to be either to the left of or above the data.
- merge(line) – Given a "line" (part of the API analysis), merge it into one boundingBox and a string of all the text inside it.
- format_bounding_box(boundingBoxString) – Given a boundingBox in the form returned by the API analysis, i.e. a string of four comma-delimited values, return the same information as a list of integers.
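To make the descriptions above concrete, here is one possible sketch of how these helpers could fit together. It is not the template code. It assumes each boundingBox string has the form "left,top,width,height", that a line is a dictionary with "boundingBox" and "words" keys, and that a region has a "lines" key. The penalty size and the exact penalty conditions are one interpretation of the description and are arbitrary choices.

```python
import math


def format_bounding_box(bounding_box_string):
    """'10,20,30,40' -> [10, 20, 30, 40]"""
    return [int(value) for value in bounding_box_string.split(",")]


def merge(line):
    """Return (box, text) for a line: its box as integers and its words joined."""
    box = format_bounding_box(line["boundingBox"])
    text = " ".join(word["text"] for word in line["words"])
    return box, text


def distance(label_box, value_box, penalty=1000):
    """Lower is better. Penalise boxes that start left of or above the label."""
    label_x, label_y = label_box[0], label_box[1]
    value_x, value_y = value_box[0], value_box[1]
    score = int(math.hypot(value_x - label_x, value_y - label_y))
    if value_x < label_x:  # candidate starts to the left of the label
        score += penalty
    if value_y < label_y:  # candidate starts above the label
        score += penalty
    return score


def find_closest_line(region, label_box):
    """Return (score, box, text) for the best line in the region, or None."""
    best = None
    for line in region["lines"]:
        box, text = merge(line)
        if box == label_box:
            continue  # do not match the label with itself
        score = distance(label_box, box)
        if best is None or score < best[0]:
            best = (score, box, text)
    return best
```

Comparing top-left corners is the simplest choice; measuring from the label's right or bottom edge, or weighting horizontal and vertical offsets differently, are natural refinements to experiment with.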