Microsoft Rummy Vision

Match the cards in our version of the classic Rummy game. To do so, you will have to identify what each card's image shows.

Introduction

Rummy Vision is our version of the classic card game, Rummy. In our version the cards have pictures on them. To play the game, you will have to analyse each image to identify what it shows. You can do this using the A.I. power of Azure Cognitive Services.

Objective

The objective of the game is to score more points than your opponent. However the game ends, the player with the most points at that moment is the winner.

Rules

In this turn-based game, each card in the deck shows one of four landmarks. The whole deck contains 40 cards: 10 cards of each landmark (the landmarks change every game). At the start of the game, each player is dealt 10 cards and the starting player is chosen randomly.

At the start of each turn, the player receives one new card from the deck, until all cards in the deck have been used.

The player can look at their own cards and their opponent's cards. The player makes a move by doing one of the following:

  1. Lay down a set of cards. This may be 3, 4 or 5 cards of the same landmark. It may also be 3 or 4 cards of all different landmarks. This will score the player points as shown in the table below.

  2. Take a card from the opponent's hand.

  3. Give a card from your hand to the opponent.

Note: If a player lays an invalid set of cards, the cards are returned to their hand and play continues with the opponent's turn. In effect, the player has wasted a turn.

The game ends when one of the players runs out of cards, or when 20 turns have been taken and the deck has run out.

Each player loses 3 points for every card still in their hand at the end of the game, and the player with the highest score wins. If the players score the same number of points, the player who started second wins.

These are the valid hands that you can play with the points they earn:

Action                                             Reward
-------------------------------------------------  ------------------
Take a card from your opponent's hand              0 points
Give a card to your opponent                       0 points
Set of three different landmarks                   5 points
Set of four different landmarks                    10 points
Set of three of the same landmark                  20 points
Set of four of the same landmark                   40 points
Set of five of the same landmark                   80 points
Penalty for cards held at the end of the game      -3 points per card
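
The scoring rules above can be sketched as a small Python function. This is only an illustration: score_lay and end_penalty are hypothetical helper names, not part of the game's API, and they work on landmark names rather than card IDs.

```python
def score_lay(landmarks):
    """Return the points for laying cards with these landmark names, or 0 if the set is invalid."""
    count = len(landmarks)
    distinct = len(set(landmarks))
    # 3, 4 or 5 cards that all show the same landmark
    if distinct == 1 and count in (3, 4, 5):
        return {3: 20, 4: 40, 5: 80}[count]
    # 3 or 4 cards that all show different landmarks
    if distinct == count and count in (3, 4):
        return {3: 5, 4: 10}[count]
    return 0


def end_penalty(cards_in_hand):
    """Return the (negative) points for cards still held when the game ends."""
    return -3 * cards_in_hand
```

For example, three Taj Mahal cards score 20 points, while a mixed set such as two Taj Mahal cards and one Eiffel Tower card is invalid and scores nothing.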

Identifying the card images and Azure Cognitive Services

To identify the subject of the image on each card, you will need to use Microsoft's Azure Cognitive Services. You can use Microsoft's "Analyse" API function to do this. The data returned from the "Analyse" API function contains all the information needed to identify the landmark on the card.

Your hand might contain two cards showing The Taj Mahal, but the pictures will not be the same. There will be 10 Taj Mahal cards in the deck, but all 10 images of the Taj Mahal will be different. This means that we cannot use simple image processing techniques like comparing images pixel by pixel. We have to use an A.I. service like Microsoft's Azure Cognitive Services to understand what each one is showing.

Signing up for Azure Cognitive Services

If you are a student, you can sign up for a student account on Azure that gives you $100 of credit. You can then create a Computer Vision API Key that will let you make 20 API calls per second.

Alternatively, you can get a free trial account on Microsoft's Cognitive API that lets you make 20 API calls every minute to a maximum of 5,000 calls every month.

You can follow our guide to signing up for a Microsoft free trial at this link.

Once you have a Microsoft Account, you will need to create resources within it and obtain API keys for one of Microsoft's Cognitive Services, Computer Vision, to play this game.
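
If you are on the free trial, you may need to space out your API calls to stay within the limit of 20 calls per minute. The following is a minimal sketch of one way to do that, assuming an even spacing between calls is acceptable. CallThrottle is a hypothetical helper, not something provided by Azure or AI Gaming, and the clock and sleep parameters exist only so that the class can be tested without real waiting. Remember that any time spent waiting counts against the time you have to calculate your move.

```python
import time


class CallThrottle:
    """Spaces calls evenly so that roughly `calls_per_minute` are made each minute."""

    def __init__(self, calls_per_minute=20, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / calls_per_minute  # seconds between calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None  # time of the previous call, if any

    def wait(self):
        """Call this just before each API request; it sleeps if the last call was too recent."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

You would create one CallThrottle and call its wait() method immediately before each request to the API.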

Template code

When you are in the Editor window of the AI Gaming site and you first select the Rummy Vision Game Type, the editor will load some template code that plays a very simple game by submitting random moves. This version of the code does not require any API keys and you can use it as a very simple example of how the game works.

There are other versions of the template code that help guide you to make use of Azure Cognitive Services. You can access all templates in the Online Code Editor under the 'New' button.

The calculate_move() function

The calculate_move() function is the equivalent of your main function. This is where you need to make your changes and from where you will control the game.

  • Called each time you need to submit a move to the game.

  • Receives information about the current state of the game in a Python dictionary.

  • Must return a game move.

Understanding the game state

Your calculate_move() function will be passed the current state of the game in a Python dictionary, the 'gamestate'.

The gamestate is where all of the game information is held. The following shows an example of the gamestate information that you will receive for each move of the game and some examples of how to access data within it.

For Rummy Vision, the most important fields are MyHand and OppHand.

Fields that you are unlikely to need in this game (but can still use if you want to) are ResponseDeadline, GameStatus, IsMover, GameId and OpponentId.

An example of the gamestate JSON

{
'MyHand': [
[0, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/75bf1977-7eb7-4068-9d03-e1fa75097f62.jpg'],
[5, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/e943c52e-2756-46c8-8c8c-06f6a3b73d2d.jpg'],
[2, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/8dca83fd-afd8-4b5d-8b95-913a77c7f13f.jpg'],
[8, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/6f9c27a9-7ac9-421c-90c1-b3fd63177e2d.jpg'],
[9, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/9741bb43-48d3-417c-bbc0-455e01b61344.jpg'],
[26, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/ecb3b0f2-6ede-4566-b7da-8954f361f05c.jpg']
],
'OppHand': [
[15, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/78e77507-e6b8-4bf7-94d3-6622ddfc8cb0.jpg'],
[16, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/1f509ec4-f1af-4f85-b5fe-f6da907f53b5.jpg'],
[19, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/4514c342-938a-462c-95b8-6f21c3f15f8c.jpg'],
[22, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/ddb932ac-6414-499d-aede-cecf96219cfa.jpg'],
[11, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/778f4e50-4ccd-43ba-a880-b9d25f723eb4.jpg'],
[27, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/2f1b110c-94f5-457f-8cfe-fa55afa5625f.jpg'],
[28, 'https://matchgameimages.blob.core.windows.net/rummy-vision/b637423e-f067-4dbf-b073-3ccbb5aa7673/04fae2b7-100e-444e-8c21-0e11487aae0e.jpg']
],
'MyPoints': 50,
'OppPoints': 35,
'MyScore': 0,
'OppScore': 0,
'ScoreLimit': 3,
'RemainingMoves': 18,
'CardsInDeck': 18,
'ResponseDeadline': 1593100352476,
'GameStatus': 'RUNNING',
'IsMover': True,
'GameId': 2398560,
'InvalidMove': None,
'OpponentId': 'housebot-practise'
}

Example to access the game state fields

Examples of accessing data in the gamestate would be:

gamestate["MyHand"][0]
gamestate["OppScore"]

The gamestate fields explained

The following list gives a description of what each element in the gamestate represents:

  • MyHand - A list representing the cards you hold in your hand. Each card is represented by a two element list consisting of its ID and image URL.

    • To determine how many cards you are holding you could use len(gamestate["MyHand"]).

    • An example of accessing the ID of your first card is gamestate["MyHand"][0][0].

    • An example of accessing the image URL of your first card is gamestate["MyHand"][0][1].

  • OppHand - A list representing the cards in your opponent's hand. Each card is represented by a two element list consisting of its ID and image URL.

    • To determine how many cards your opponent is holding you could use len(gamestate["OppHand"]).

    • An example of determining the first card of your opponent is gamestate["OppHand"][0].

  • MyPoints - Your current points score for this game.

  • OppPoints - Your opponent's current points score for this game.

  • MyScore - Your current count of games won in a best of X games game style.

  • OppScore - Your opponent's count of games won in a best of X games game style.

  • ScoreLimit - The number of games you have to win in a best of X games game style, for example, you need to win 3 games in a best of 5 game style.

  • RemainingMoves - The number of moves left until the end of the game, assuming neither you nor your opponent runs out of cards first.

  • CardsInDeck - The number of cards not yet drawn from the deck.

  • ResponseDeadline - The epoch time, in milliseconds, that a successful move has to be sent and received by to prevent you from timing out.

    • There is a time limit to how long you have to calculate your move. If you exceed this time limit your game will be terminated and your opponent will be declared the winner.

    • It is unlikely that you will need to check this time as timeouts are set generously to allow you time to calculate your move. However, if you find yourself timing out a lot, you may need to limit yourself using this value.

  • GameStatus - A string that will have value "RUNNING" if the game is in progress or a reason the game has ended otherwise.

    • You are unlikely to need the GameStatus for this game type.

  • IsMover - In this turn-based game, this will always be True.

  • GameId - An integer representing the unique game id for the current game.

    • You are unlikely to need the GameId for this game type.

  • OpponentId - A string containing the name of your opponent.

    • You are unlikely to need the OpponentId for this game type.
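
As a small illustration of working with these fields, the sketch below turns a hand into a dictionary of card IDs and image URLs, and works out how long is left before the ResponseDeadline. Both function names are hypothetical, and the deadline check assumes that the clock on the machine running your code agrees with the game server's clock.

```python
import time


def cards_by_id(hand):
    """Map each card ID to its image URL for a hand of [card_id, image_url] pairs."""
    return {card_id: url for card_id, url in hand}


def ms_until_deadline(gamestate, now_ms=None):
    """Return the milliseconds left before ResponseDeadline (negative if it has passed)."""
    if now_ms is None:
        now_ms = time.time() * 1000  # current epoch time in milliseconds
    return gamestate["ResponseDeadline"] - now_ms
```

For example, cards_by_id(gamestate["MyHand"]) gives you a dictionary that you can look up by card ID when deciding which cards to lay.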

Making a valid move

The whole point of the calculate_move() function is for you to return the move you want to make in the game. In Rummy Vision, there are three types of moves.

  1. Give: You may give your opponent a card in your hand by returning a dictionary with the key "Give" whose value is the ID of the card you want to give.

  2. Take: You may take a card from your opponent's hand by returning a dictionary with the key "Take" whose value is the ID of the card you want to take.

  3. Lay: You may lay a set of cards on the board to gain points by returning a dictionary with the key "Lay" whose value is the list of card IDs you want to lay down. This list may contain 3, 4 or 5 cards of the same type. It may also contain 3 or 4 cards of unique types.

An example of each move:

return {"Give": 2}
return {"Take": 10}
return {"Lay": [5, 6, 14, 1]}
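
Putting the gamestate and the move format together, a very simple calculate_move() function might look like the sketch below. It is not the site's template code, just a minimal illustration that picks a random Give or Take move using the MyHand and OppHand fields. It assumes that your own hand is never empty on your turn.

```python
import random


def calculate_move(gamestate):
    """Return a random Give or Take move based on the cards currently visible."""
    my_ids = [card[0] for card in gamestate["MyHand"]]
    opp_ids = [card[0] for card in gamestate["OppHand"]]
    # Take a card about half of the time, provided the opponent has one to take.
    if opp_ids and random.random() < 0.5:
        return {"Take": random.choice(opp_ids)}
    # Otherwise give away one of our own cards (assumes my_ids is not empty).
    return {"Give": random.choice(my_ids)}
```

A bot like this will never score points, because it never lays a set, but it shows the shape of the value that your function must return.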

Finding the landmark name from the Computer Vision API call

The first challenge for you to solve in this game is to have the Microsoft Computer Vision API analyse the card image and tell you which landmark is in the image.

We're not going to tell you how to do this, but we have provided some example code that uses the Microsoft Computer Vision API to identify celebrities in images. This is very similar to identifying landmarks, and it is a good example that you should be able to adapt.

Using the example code in the AI Gaming Python Editor

To see our example Python code, select the Python Editor game type in the AI Gaming Editor window. This Python Editor lets you write and execute simple Python code without needing a calculate_move() function or being called by the Game Manager.

Then create a new file using the Computer Vision API Call Example.py template.

This template contains example Python code that:

  • Makes a call to the Microsoft Computer Vision API

  • Extracts the celebrity name from the Microsoft Computer Vision API response

Let's look at this code to understand how it works and how we can adapt it to extract landmark names from images of landmarks.

Understanding the 'Computer Vision API Call Example.py' script

Add your Computer Vision API key

First of all, you must add your Computer Vision API key to the code in order for the code to be able to access the Computer Vision API.

If you haven't already, learn how to get a Computer Vision API key at this link.

Example celebrity images

The code starts with a list of example celebrity images for you to practise with.

celebrity_list = [
{'name': 'beyonce', 'url': 'https://cdn1.thr.com/sites/default/files/2017/03/beyonce.jpg'},
{'name': 'idris elba', 'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d6/Idris_Elba-5272.jpg/220px-Idris_Elba-5272.jpg'},
{'name': 'halle berry', 'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Halle_Berry_by_Gage_Skidmore_2.jpg/1024px-Halle_Berry_by_Gage_Skidmore_2.jpg'},
{'name': 'tom hanks', 'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Tom_Hanks_TIFF_2019.jpg/220px-Tom_Hanks_TIFF_2019.jpg'},
{'name': 'julia roberts', 'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/54/Julia_Roberts_%2843838880775%29.jpg/220px-Julia_Roberts_%2843838880775%29.jpg'},
{'name': 'halle berry', 'url': 'https://upload.wikimedia.org/wikipedia/commons/4/4b/Halle_Berry_-_USS_Kearsarge_a.jpg'},
{'name': 'group people', 'url': 'https://dujye7n3e5wjl.cloudfront.net/photographs/1080-tall/time-100-influential-photos-ellen-degeneres-oscars-selfie-100.jpg'},
]

It then loops through these images, sending each image to the Microsoft Cognitive Services API.

The important code here sets the parameters for the API call:

  • analyse_url - The URL endpoint of the Microsoft Computer Vision API Analyse function. Once set, this does not need to change.

  • headers - The API key that authorises our access to the API.

  • params - The features that we want the Analyse function to look for in the image.

  • data - The location of the image that we want the Analyse function to analyse.

Finally, the code calls the API using requests.post with all of the parameters we have just set.

import requests  # third-party HTTP library used to call the API

# API_KEY and celebrity_list are assumed to be defined earlier in the template
image_count = 0
for image in celebrity_list:
    image_count += 1
    print(f'{image_count}. {image["name"].upper()}')
    # set all of the parameters required to call the API
    analyse_url = "https://westeurope.api.cognitive.microsoft.com/vision/v2.0/analyze"
    headers = {'Ocp-Apim-Subscription-Key': API_KEY}
    params = {'visualFeatures': 'categories,tags,description,faces,imageType,color,adult',
              'details': 'celebrities,landmarks'}  # turns on checking for both celebrities and landmarks
    data = {"Url": image['url']}
    # make the API call
    response = requests.post(analyse_url, params=params, headers=headers, json=data).json()
    print(f' API Response: {response}')
    # Extract the celebrity details from the API response, making sure to check that the
    # response contains the required sections before trying to read them, otherwise
    # the code will fail with a runtime error.
    print(f' Prediction results for image of {image["name"]} are:')
    for category in response.get("categories", []):
        if "detail" in category:
            if "celebrities" in category["detail"]:
                for celebrity in category["detail"]["celebrities"]:
                    print(f" {category['name']:>17} | {celebrity['name']:<16} | {celebrity['confidence']*100:.2f}%")
        else:
            print(f" No celebrity details in {category}")

Once we have made the call to the API, we can check the response to extract the information about the image. In this example we have requested all of the features of the Analyse API call, so you can look at the response to see all of the information that it can provide.

We want to find the celebrity name, so our code checks whether each entry in the categories section of the response has a detail section, and whether that detail section has a celebrities section. There can be multiple categories, and multiple celebrities reported in each, so we use loops to check them all.

The JSON response for one of the celebrity images looks something like this:

{
  "categories": [
    {
      "name": "people_portrait",
      "score": 0.9921875,
      "detail": {
        "celebrities": [
          {
            "name": "Tom Hanks",
            "confidence": 0.7836723923683167,
            "faceRectangle": {
              "left": 53,
              "top": 89,
              "width": 120,
              "height": 120
            }
          }
        ]
      }
    }
  ],
  "requestId": "74d3dc76-f374-4aee-aee9-45a12fbc2f61",
  "metadata": {
    "width": 220,
    "height": 306,
    "format": "Jpeg"
  }
}
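
In your own code you will probably want the names returned as data rather than printed. As a sketch, the same checks can be wrapped in a function that works on a response dictionary shaped like the one above. The name extract_celebrities is hypothetical and is not part of the template.

```python
def extract_celebrities(response):
    """Return a list of (name, confidence) pairs found in an Analyse response dictionary."""
    found = []
    # .get() with a default avoids a KeyError when a section is missing
    for category in response.get("categories", []):
        detail = category.get("detail", {})
        for celebrity in detail.get("celebrities", []):
            found.append((celebrity["name"], celebrity["confidence"]))
    return found
```

If the response has no categories section at all, for example because the API returned an error, the function simply returns an empty list.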

Adapting the template to recognise landmarks

We've included a set of test landmark images for you to experiment with.

landmark_list = [
{'name': 'Roman Baths', 'url': 'https://matchgameimages.blob.core.windows.net/match-game/00007d55-a6a2-4968-869d-dfd6ff05b503/197ccc9b-1844-48f3-b3da-565acc1bf774.jpg'},
{'name': 'Puerto Princesa Subterranean River National Park', 'url': 'https://matchgameimages.blob.core.windows.net/match-game/00007d55-a6a2-4968-869d-dfd6ff05b503/1cf1e69e-f045-4dc2-bf1c-dbd32fd0fdc5.jpg'},
{'name': 'Millennium Park', 'url': 'https://matchgameimages.blob.core.windows.net/match-game/00007d55-a6a2-4968-869d-dfd6ff05b503/39c2c53b-bbfc-4840-ab4b-ee0e44aa711f.jpg'},
{'name': 'Worcester Cathedral', 'url': 'https://matchgameimages.blob.core.windows.net/match-game/00007d55-a6a2-4968-869d-dfd6ff05b503/5585579c-c9f5-4e63-84f5-baec77c74c7c.jpg'},
{'name': 'Rizal Park', 'url': 'https://matchgameimages.blob.core.windows.net/match-game/00007d55-a6a2-4968-869d-dfd6ff05b503/5f609007-a80d-4e98-9c43-956ce798e0c4.jpg'},
{'name': 'Eiffel Tower', 'url': 'https://kids.nationalgeographic.com/explore/monuments/eiffel-tower/_jcr_content/content/textimage_6.img.jpg/1581608715365.jpg/'},
]

You can copy the for loop that analyses the celebrity images and adapt it to analyse these landmark images.

The API parameters don't need to be changed as we already requested the landmark details. You will need to figure out how to get to the name of the landmark by checking whether each entry in the categories section has a detail section that contains landmark information.

The JSON response for one of the landmark images looks something like this:

{
  'categories': [{
    'name': 'building_',
    'score': 0.19140625,
    'detail': {
      'landmarks': [{
        'name': 'Worcester Cathedral',
        'confidence': 0.9910168647766113
      }]
    }
  }, {
    'name': 'building_street',
    'score': 0.328125,
    'detail': {
      'landmarks': [{
        'name': 'Worcester Cathedral',
        'confidence': 0.9910168647766113
      }]
    }
  }, {
    'name': 'outdoor_',
    'score': 0.00390625,
    'detail': {
      'landmarks': [{
        'name': 'Worcester Cathedral',
        'confidence': 0.9910168647766113
      }]
    }
  }]
}

The code you create here to get the landmark names using the Microsoft Computer Vision API will be the same code you need to add to the Rummy Vision template code.

Using the Microsoft API template

Now that you know how to call the Microsoft Computer Vision API and get the landmark names, you can return to the Rummy Vision game in the AI Gaming Editor and load the Microsoft API Template.

This template contains a complete framework for playing the Rummy Vision game and remembering the cards you have seen. For it to work, you need to solve the same two problems that we have just looked at in the Python Editor example above:

  1. Sending the image to the Microsoft API Analyse function and getting the response.

  2. Extracting the landmark name from the response.

You can find where you need to add these features by searching the template code file for the TODO comments.
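
To give an idea of what "remembering the cards you have seen" can look like, here is a minimal sketch of a cache keyed by image URL. It is not the template's own code. The identify argument is a hypothetical placeholder for your own function that sends an image URL to the API and returns a landmark name, and the sketch assumes that module-level variables keep their values between calls to calculate_move().

```python
landmark_cache = {}  # image URL -> landmark name, kept between calls


def remember_landmarks(hand, identify):
    """Return {card_id: landmark} for a hand, calling identify(url) only for unseen images."""
    result = {}
    for card_id, url in hand:
        if url not in landmark_cache:
            # Only images we have not seen before cost an API call.
            landmark_cache[url] = identify(url)
        result[card_id] = landmark_cache[url]
    return result
```

Because cards stay in play for several turns, a cache like this means each image only needs to be analysed once, which saves both API calls and time.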

Steps to Improve your Code

Once you have the basics in place and you can identify the landmarks in the images, it's time to add some strategy to your code. This is where you have free rein to come up with the best strategy you can to score the most points in the game.

Some suggestions for you to consider including in your strategy:

  • You can see your opponent's cards. Make sure to analyse these as well as your own cards.

  • The highest score comes from a set of 5 of the same card. Try to wait until you have 5 matching cards for the highest individual score in the game.

  • Similarly, check your opponent's hand. Do they have 4 or 5 of the same card? Take one from them to prevent them from making a high-scoring move.

  • Remember, the highest score wins, not the person that disposes of all of their cards first. Check to see if your score is higher than your opponent's before you lay a hand that disposes of all of your cards (remember to take into account the penalty points your opponent will incur for every card left in their hand).

  • Don't want to make a move? Making an invalid Lay move will effectively forfeit your move without you having to lose any of the cards you have.
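
As a starting point for your strategy, the sketch below finds the highest-scoring set that can be laid from a hand. It is one possible approach, not the template's own logic. The name best_lay is hypothetical, and the function assumes you have already built a dictionary mapping each card ID in your hand to its landmark name.

```python
from collections import defaultdict


def best_lay(landmark_by_id):
    """Return (points, card_ids) for the highest-scoring valid set, or (0, []) if there is none."""
    groups = defaultdict(list)  # landmark name -> IDs of the cards showing it
    for card_id, landmark in landmark_by_id.items():
        groups[landmark].append(card_id)

    best = (0, [])
    # Same-landmark sets: 3, 4 or 5 cards score 20, 40 or 80 points.
    for ids in groups.values():
        size = min(len(ids), 5)
        if size >= 3:
            points = {3: 20, 4: 40, 5: 80}[size]
            if points > best[0]:
                best = (points, ids[:size])
    # All-different sets: one card from each of 3 or 4 landmarks scores 5 or 10 points.
    distinct = min(len(groups), 4)
    if distinct >= 3:
        points = {3: 5, 4: 10}[distinct]
        if points > best[0]:
            chosen = [ids[0] for ids in list(groups.values())[:distinct]]
            best = (points, chosen)
    return best
```

The returned list of card IDs can be used directly as the value of a "Lay" move. Note that this sketch always takes the best set available right now; it does not decide whether waiting for a bigger set would be better.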