Rummy Vision

Match the tiles in our version of the classic Rummy game. You will have to identify the contents of each tile's image.


Rummy Vision is our version of the classic card game, Rummy. In this version each of the cards will have pictures on them. To make a play you will have to analyse the subject matter of the cards in your hand, which we can do using the A.I. power of Azure Cognitive Services.


In this turn-based game each card in the deck shows one of four landmarks. The whole deck contains 40 cards: 10 cards of each landmark (the landmarks change every game). At the start of the game, each player is dealt 10 cards and the starting player is chosen randomly.

On each turn the player may look at their own cards and their opponent's cards. The player makes a move by doing one of the following:

  1. Lay down a set of cards. This may be 3, 4 or 5 cards of the same landmark. It may also be 3 or 4 cards of all different landmarks. This will score the player points as shown in the table below.

  2. Pick a card to take from the opponent's hand.

  3. Give a card from your hand to the opponent.

The turn ends by the player drawing a card from the deck.

Note: if a player lays down an invalid set of 3, 4 or 5 cards (maybe due to an error in the knowledge of card types), the cards are returned to their hand and the play continues with the opponent. In effect the player misses their turn.

The game ends when one of the players runs out of cards, or when 20 turns are taken and the deck runs out. Each player loses 3 points for every card still in their hand at the end of the game, and the player with the highest score wins. If the players score the same number of points, the player who started second wins. The point rewards are listed below:



Set laid down                                   Points

Set of three different landmarks                25 points
Set of four different landmarks                 35 points
Set of three of the same landmark               20 points
Set of four of the same landmark                30 points
Set of five of the same landmark                50 points
Penalty for cards held at the end of the game   -3 points per card
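
The scoring rules above can be expressed as a small helper. This is a minimal sketch and not part of the game templates: the names score_set and end_of_game_penalty are hypothetical, and the input is assumed to be a list of landmark names that your own image recognition has already produced for the cards.

```python
def score_set(landmarks):
    """Return the points for laying cards showing these landmarks, or 0 if invalid.

    `landmarks` is a list of landmark names, one per card (hypothetical input
    built from your own image recognition results).
    """
    count = len(landmarks)
    distinct = len(set(landmarks))
    if distinct == 1:
        # All cards show the same landmark: 3, 4 or 5 cards are allowed
        return {3: 20, 4: 30, 5: 50}.get(count, 0)
    if distinct == count:
        # Every card shows a different landmark: 3 or 4 cards are allowed
        return {3: 25, 4: 35}.get(count, 0)
    # A mixture (for example two of one landmark and one of another) is invalid
    return 0


def end_of_game_penalty(cards_in_hand):
    """Points lost for cards still held when the game ends (3 per card)."""
    return -3 * cards_in_hand
```

A return value of 0 marks an invalid set, which is worth checking before laying cards because an invalid set costs you the turn.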

Azure Cognitive Services

In order to analyse an image and identify what the subject of the image is, you will need to use Azure Cognitive Services. The data returned from the API will contain all the necessary information to identify the landmark that is on the card.

Your hand might contain two cards showing The Taj Mahal, but every card in the deck is a unique picture, so they might be taken from a different angle. For this reason we can't compare images pixel by pixel. We have to understand what each one is showing.

Signing up for Azure Cognitive Services

If you are a student, you can sign up for a student account on Azure that gives you $100 of credit. You can then create a Computer Vision API Key that will let you make 20 API calls per second.

Alternatively, you can get a free trial account on Microsoft's Cognitive API that lets you make 20 API calls every minute to a maximum of 5,000 calls every month.

You can follow our guide to signing up for a Microsoft free trial at this link.

Once you have a Microsoft Account, you will need to create resources within it and obtain API keys for one of Microsoft's Cognitive Services, Computer Vision, to play this game.

Training your own image recognition

For an alternative version of this game, you will use Microsoft's Custom Vision technology to train your own image classifier to recognise planets. You will need to create and train a Custom Vision model before using it in the code. To do this, follow these steps:

Navigate to the Custom Vision Portal and sign in with the same account as your Azure portal.

Select 'New Project', which will open the following dialogue:

Fill in the dialogue as follows:

  • Name - enter your project name e.g. 'Planets'.

  • Description - add a sentence to summarise the project.

  • Resource - click on 'create new' in order to make a resource. Make sure that if you are creating a new group you set its location to West Europe. Do the same for the location of the resource. Make sure to leave the 'Kind' as 'CognitiveServices', since this will allow you to use this resource for both training and prediction:

  • Project Types - choose 'Classification', since we want to categorise whole images in this project.

  • Classification Types - choose 'Multiclass', since our pictures will contain only one subject.

  • Domain - you may leave this as 'General'.

After clicking 'Create project' you will be taken to the project page.

To add training images click on 'Add images' and select the first batch of images of the same type. Here is an example doing this with Jupiter:

Type in the tag for your images and press 'Enter' to label them. Then click on 'Upload' to add them to the training library. Repeat this process for as many images and as many categories as necessary.

Once you are done uploading and labeling all images, you can click the 'Train' button. Now select the 'Quick training' option and 'Train'. This will automatically take you to the performance page and show you the results for this new training iteration. You may go back, add more images and train your model again to produce new iterations until you are happy with the result.

To use this model in code, you will need to select the iteration that you would like to use and click 'Publish' in the top left corner. You will be presented with the following dialogue:

Give your model a suitable name and select the same resource that you created when setting up this project. Click 'Publish'.

Now you will be able to click on the 'Prediction URL' button to get all the necessary information in order to use this model in your code:

Since in this game the card pictures will be presented as URLs, you will need to look at the top section: 'If you have an image URL'. Note down the prediction URL in the grey box and the prediction key.

Writing Code to Play the Game

Template code

When you first load Rummy Vision in the editor, the code will be a template that plays a very simple game by submitting random moves. This version of the code does not require any API keys.

There are other versions of the template code that make use of Azure Cognitive Services. You can access all templates in the Online Code Editor under the 'New' button. These are explained in more detail in the sections below.

The calculate_move() function

The calculate_move() function is the equivalent of your main function. This is where you need to make your changes and from where you will control the game.

  • Called each time you need to submit a move to the game.

  • Receives information about the game in a Python dictionary that describes the current game state.

  • Must return a game move.

Understanding the game state

Your calculate_move() function will be passed the current state of the game, the 'gamestate'.

The gamestate is where all of the game information is held. It is a Python dictionary. The following shows an example of the gamestate information that you will receive for each move of the game and some examples of how to access data within it.

For Rummy Vision, the most important fields are MyHand and OppHand.

Fields that you are unlikely to need in this game are ResponseDeadline, GameStatus, IsMover, GameId and OpponentId.

An example of the gamestate JSON

{
    'AllCardTypes': ["brandenburg gate", "jal mahal", "sydney opera house", "viking ship museum", "pyramid of djoser", "forbidden city", "palais garnier", "palace of westminster", "freedom monument", "trevi fountain", "sunwapta falls", "alamo mission in san antonio", "mount vernon", "slieve league", "sheikh zayed mosque", "haystack rock", "giralda", "big ben", "marine corps war memorial", "mount of olives", "bryggen", "chhatrapati shivaji terminus", "hanauma bay", "empire state building", "phi phi islands", "fitz roy", "koutoubia mosque", "milwaukee art museum", "alcatraz island", "saint joseph's oratory", "city hall, london", "westminster bridge"],
    'MyHand': [
        [0, ''],
        [5, ''],
        [2, ''],
        [8, ''],
        [9, ''],
        [26, '']
    ],
    'OppHand': [
        [15, ''],
        [16, ''],
        [19, ''],
        [22, ''],
        [11, ''],
        [27, ''],
        [28, '']
    ],
    'MyPoints': 50,
    'OppPoints': 35,
    'MyScore': 0,
    'OppScore': 0,
    'RemainingRounds': 7,
    'RemainingMoves': 18,
    'CardsInDeck': 18,
    'ResponseDeadline': 1593100352476,
    'GameStatus': 'RUNNING',
    'IsMover': True,
    'GameId': 2398560,
    'OpponentId': 'housebot-practise'
}

Example to access the game state fields

Examples of accessing data in the gamestate would be:
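
As a minimal sketch, the helper below reads the hand and score fields from a cut-down gamestate. The function name summarise_gamestate and the example.com URLs are placeholders for illustration and are not part of the game templates.

```python
def summarise_gamestate(gamestate):
    """Pull out the values most bots need from the gamestate dictionary."""
    # Each card is a two element list: [ID, image URL]
    my_card_ids = [card[0] for card in gamestate["MyHand"]]
    my_card_urls = [card[1] for card in gamestate["MyHand"]]
    opp_card_ids = [card[0] for card in gamestate["OppHand"]]
    return {
        "my_card_ids": my_card_ids,
        "my_card_urls": my_card_urls,
        "opp_card_ids": opp_card_ids,
        "points_lead": gamestate["MyPoints"] - gamestate["OppPoints"],
        "cards_in_deck": gamestate["CardsInDeck"],
    }


# Cut-down example gamestate; the URLs are placeholders, not real card images
example = {
    "MyHand": [[0, "https://example.com/card0.jpg"], [5, "https://example.com/card5.jpg"]],
    "OppHand": [[15, "https://example.com/card15.jpg"]],
    "MyPoints": 50,
    "OppPoints": 35,
    "CardsInDeck": 18,
}
summary = summarise_gamestate(example)
print(summary["my_card_ids"])   # [0, 5]
print(summary["points_lead"])   # 15
```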


The gamestate fields explained

The following list gives a description of what each element in the gamestate represents:

  • AllCardTypes - A list of all the possible card types this game style might present you with.

    • You can use this list to make sure your image recognition is able to deal with all possible cards.

    • Keep in mind that this particular instance of the game may not use all of these types.

  • MyHand - A list representing the cards you hold in your hand. Each card is represented by a two element list consisting of its ID and image URL.

    • To determine how many cards you are holding you could use len(gamestate["MyHand"]).

    • An example of accessing the ID of your first card is gamestate["MyHand"][0][0].

    • An example of accessing the image URL of your first card is gamestate["MyHand"][0][1].

  • OppHand - A list representing the cards in your opponent's hand. Each card is represented by a two element list consisting of its ID and image URL.

    • To determine how many cards your opponent is holding you could use len(gamestate["OppHand"]).

    • An example of determining the first card of your opponent is gamestate["OppHand"][0].

  • MyPoints - Your current points score for this game.

  • OppPoints - Your opponent's current points score for this game.

  • RemainingRounds - The number of rounds of Rummy you have remaining with your opponent.

  • RemainingMoves - The number of moves you have left until the end of the game.

    • You can use this value to determine how many turns you have left for your plays.

  • CardsInDeck - The number of cards not yet drawn from the deck.

  • ResponseDeadline - The epoch time, in milliseconds, by which your move must be sent and received to prevent you from timing out.

    • There is a time limit on how long you have to calculate your move. If you exceed this time limit your game will be terminated and your opponent will be declared the winner.

    • It is unlikely that you will need to check this time as timeouts are set generously to allow you time to calculate your move, however, if you see yourself timing out a lot, you may need to limit yourself using this value.

  • GameStatus - A string that will have value "RUNNING" if the game is in progress or a reason the game has ended otherwise.

    • You are unlikely to need the GameStatus for this game type.

  • IsMover - In this turn-based game, this will always be True.

  • GameId - An integer representing the unique game id for the current game.

    • You are unlikely to need the GameId for this game type.

  • OpponentId - A string containing the name of your opponent.

    • You are unlikely to need the OpponentId for this game type.
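
If you do find yourself timing out, the ResponseDeadline field described above can be compared with the current time. The following is a minimal sketch under two assumptions: that your bot's clock is reasonably in step with the game server, and that a safety margin of 2000 milliseconds is enough (an arbitrary illustrative value). The function names are hypothetical.

```python
import time


def milliseconds_left(gamestate, now_ms=None):
    """Return how many milliseconds remain before ResponseDeadline."""
    if now_ms is None:
        now_ms = time.time() * 1000  # current epoch time in milliseconds
    return gamestate["ResponseDeadline"] - now_ms


def should_stop_analysing(gamestate, safety_margin_ms=2000, now_ms=None):
    """Return True when fewer than safety_margin_ms milliseconds remain."""
    return milliseconds_left(gamestate, now_ms) < safety_margin_ms
```

You could call should_stop_analysing(gamestate) before each image analysis and fall back to a simple move when it returns True.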

Making a valid move

The whole point of the calculate_move() function is for you to return the move you want to make in the game. In Rummy Vision, there are three types of moves.

  1. You may give your opponent a card in your hand by returning a dictionary with the key "Give" whose value is the ID of the card you want to give.

  2. You may take a card from your opponent's hand by returning a dictionary with the key "Take" whose value is the ID of the card you want to take.

  3. You may lay a set of cards on the board to gain points by returning a dictionary with the key "Lay" whose value is the list of card IDs you want to lay down. This list may contain 3, 4 or 5 cards of the same type. It may also contain 3 or 4 cards of unique types.

An example of each move:

return {"Give": 2}
return {"Take": 10}
return {"Lay": [5, 6, 14, 1]}

Finding the subject from the landmark Computer Vision API call

Understanding the celebrity card template

There is template code that makes use of the Computer Vision API in order to process cards that have celebrity pictures on them. Here we will understand what the code is doing and see how we can adapt it in order to recognise landmarks. Celebrity cards are represented by a list of IDs and image URLs (the same way cards are described in the gamestate):

CelebrityCardList = [
    [100, ""],
    [101, ""],
    [102, ""],
    [103, ""],
]

Every turn the template code picks a random celebrity card and calls the 'analyse_card' function:

import json

# analysed, analyse_url, headers_vision and microsoft_api_call are defined elsewhere in the template
def analyse_card(card):
    ident, image = card
    result = analysed.get(ident, None)
    # If not analysed in the past, use the API to analyse the image
    if result is None:
        params = {"details": "celebrities,landmarks"}  # analyse details about celebrities and landmarks
        data = {"url": image}
        msapi_response = microsoft_api_call(analyse_url, params, headers_vision, data)
        print("JSON Response:\n" + json.dumps(msapi_response, indent=2) + "\n")  # log API response
        # Search the JSON response to find what celebrity has been recognised
        if "categories" not in msapi_response:
            return None
        for category in msapi_response["categories"]:
            if "detail" in category:
                if "celebrities" in category["detail"]:
                    for celebrity in category["detail"]["celebrities"]:
                        result = celebrity["name"]
        # Add the analysed card to the global dictionary for future access
        analysed[ident] = result
    return result

The function first checks whether the card has been analysed in a previous move by looking up its ID in the global 'analysed' dictionary. If it has not, the Computer Vision API is called to analyse the celebrities and landmarks in the image. The JSON response for one of the celebrity cards looks something like this:

{
  "categories": [
    {
      "name": "people_portrait",
      "score": 0.9921875,
      "detail": {
        "celebrities": [
          {
            "name": "Tom Hanks",
            "confidence": 0.7836723923683167,
            "faceRectangle": {
              "left": 53,
              "top": 89,
              "width": 120,
              "height": 120
            }
          }
        ]
      }
    }
  ],
  "requestId": "74d3dc76-f374-4aee-aee9-45a12fbc2f61",
  "metadata": {
    "width": 220,
    "height": 306,
    "format": "Jpeg"
  }
}

Such a response is analysed in the code by iterating through the 'categories' list and looking into the 'detail' section, then the 'celebrities' list. The celebrity name found there is returned (after storing it in 'analysed' for future use).

Adapting the template to recognise landmarks

Instead of putting a celebrity card into the 'analyse_card' function, use one of the landmark cards from the gamestate. Check the log to see what kind of JSON response this would give. Now adapt the JSON search to find the name of the landmark within this function in a similar way to how we found the name of the celebrity in the template code.
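
One possible shape for the adapted search is sketched below. It assumes the landmark response mirrors the celebrity response, with a 'landmarks' list in place of 'celebrities' and each entry holding 'name' and 'confidence'; check your own log output to confirm this before relying on it. The function name find_landmark is hypothetical.

```python
def find_landmark(msapi_response):
    """Return the name of the most confident landmark in the response, or None.

    Assumes landmarks appear under categories -> detail -> landmarks,
    each with "name" and "confidence" keys.
    """
    best_name = None
    best_confidence = -1.0
    for category in msapi_response.get("categories", []):
        detail = category.get("detail", {})
        for landmark in detail.get("landmarks", []):
            confidence = landmark.get("confidence", 0.0)
            if confidence > best_confidence:
                best_name = landmark["name"]
                best_confidence = confidence
    return best_name
```

The names in AllCardTypes are lower case in the example gamestate, so you may want to call .lower() on the result before comparing it with that list.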

Using Custom Vision in your code

(based on template for planets...)
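
As a rough sketch, a Custom Vision call for an image URL might look like the following. It assumes the request format shown in the 'If you have an image URL' section of the 'Prediction URL' dialogue (a POST with a Prediction-Key header and a JSON body containing the URL) and a response holding a 'predictions' list whose entries have 'tagName' and 'probability' fields. The function names and the 0.5 probability threshold are hypothetical, and the planets template may structure this differently.

```python
import json
import urllib.request


def best_tag(prediction_response, min_probability=0.5):
    """Return the tag with the highest probability, or None if nothing is confident enough."""
    predictions = prediction_response.get("predictions", [])
    if not predictions:
        return None
    top = max(predictions, key=lambda p: p["probability"])
    if top["probability"] < min_probability:
        return None
    return top["tagName"]


def classify_card(image_url, prediction_url, prediction_key):
    """Send one card image URL to the published model and return its best tag.

    prediction_url and prediction_key are the values you noted down from
    the 'Prediction URL' dialogue.
    """
    body = json.dumps({"Url": image_url}).encode("utf-8")
    headers = {"Prediction-Key": prediction_key, "Content-Type": "application/json"}
    request = urllib.request.Request(prediction_url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return best_tag(json.load(response))
```

Keeping best_tag separate from the network call lets you try it on a logged response without spending API calls.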

Steps to Improve your Code

When writing bots to play this version of Rummy, there are a few things to consider which will improve the quality of your play.

Laying cards

Clearly, the only way to gain points in this game is by laying down valid sets of cards. At any given point you may have the opportunity to choose which set you want to lay down. Make sure you consider which set will give you the most points and which set will get rid of the most cards.
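
One way to compare the available sets is to try every combination of 3, 4 or 5 cards in your hand and keep the one worth the most points. The sketch below assumes you already hold a dictionary mapping each card ID in your hand to its recognised landmark name; the names landmark_by_id, set_points and best_set are hypothetical.

```python
from itertools import combinations

# Point values taken from the scoring table
POINTS_SAME = {3: 20, 4: 30, 5: 50}
POINTS_DIFFERENT = {3: 25, 4: 35}


def set_points(landmarks):
    """Points for a candidate set, or 0 if the set is not valid."""
    count, distinct = len(landmarks), len(set(landmarks))
    if distinct == 1:
        return POINTS_SAME.get(count, 0)
    if distinct == count:
        return POINTS_DIFFERENT.get(count, 0)
    return 0


def best_set(landmark_by_id):
    """Return (card_ids, points) for the highest scoring valid set, or (None, 0)."""
    best_ids, best_points = None, 0
    for size in (3, 4, 5):
        for ids in combinations(sorted(landmark_by_id), size):
            points = set_points([landmark_by_id[card_id] for card_id in ids])
            if points > best_points:
                best_ids, best_points = list(ids), points
    return best_ids, best_points


hand = {1: "big ben", 2: "big ben", 3: "big ben", 4: "giralda", 5: "bryggen", 6: "fitz roy"}
ids, points = best_set(hand)
print({"Lay": ids}, points)  # {'Lay': [1, 4, 5, 6]} 35
```

This ranks sets by points only. You may prefer to weigh in how many cards each set removes from your hand, since every card left at the end costs 3 points.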

Good hand

The biggest advantage you can get in this game is by running out of cards first. This not only makes you the last one to score points, but also gives your opponent a penalty for every card they have left. Assess which of your possible moves will leave you with the best hand for getting rid of your remaining cards. For example, if you only have two cards left, it would be bad if they were of the same type, since the only draw that would give you a valid set is a third card of this type. Try to avoid situations like this.

Look at your opponent

In this version of the game we have the ability to see exactly what the other player's position is. To use this to your advantage, make sure that if you're unable to lay down cards you make the move that not only benefits you the most, but also leaves your opponent in the worst situation.
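
As one illustrative heuristic (an assumption, not a recommended strategy from the game authors), you could take the opponent's card whose landmark you already hold most often, breaking ties in favour of landmarks the opponent holds several of, so that the take also disrupts their hand. The names choose_take, my_landmarks and opp_landmark_by_id are hypothetical.

```python
from collections import Counter


def choose_take(my_landmarks, opp_landmark_by_id):
    """Pick an opponent card to take and return it as a "Take" move.

    my_landmarks: list of landmark names recognised in my hand.
    opp_landmark_by_id: dict mapping each opponent card ID to its landmark name.
    Returns None if no opponent cards have been identified.
    """
    if not opp_landmark_by_id:
        return None
    held = Counter(my_landmarks)
    opp_held = Counter(opp_landmark_by_id.values())

    def usefulness(card_id):
        landmark = opp_landmark_by_id[card_id]
        # First: how many matching cards I hold. Second: how many the opponent holds.
        return (held[landmark], opp_held[landmark])

    # Iterating over sorted IDs makes ties resolve to the lowest card ID
    best_id = max(sorted(opp_landmark_by_id), key=usefulness)
    return {"Take": best_id}
```

Remember that taking a card makes your own hand larger, so a heuristic like this is only worthwhile when the card brings you closer to a set you can lay.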