The Robo Cup Imitative Agent Project

In this page I will describe the Robo Cup imitative agent project, the current challenges of this project, and the problem I am set to study.

The Robo Cup Simulation League

The Robo Cup Simulation League is a platform for testing software agents playing in a soccer team. Each agent is a soccer player client that connects to a soccer server that keeps track of the game, sends messages to the clients and receives actions from them.

The server provides the agents with different kinds of information. There are three types of messages sent by the server: the see, the hear, and the sense body messages. The see messages describe objects such as other players, the ball, field lines and flags. These objects also have properties such as distance and direction.

Agents send back one of several types of actions to the server, such as dash, kick, or turn. These actions also have properties such as velocity and angle of a turn.

The Robo Cup Imitative Agent Project

The purpose of the Robo Cup Imitative Agent Project is to build Robo Cup agents that imitate other soccer playing (target) agents.

The objective is to develop a soccer-playing Robo Cup agent with the ability to draw from observations of other Robo Cup agents and use this knowledge to guide its own decision-making process, all with limited or no domain expert intervention.
Kevin Lam, Babak Esfandiari, David Tudino. A Scene Learning and Recognition Framework for RoboCup Clients. In Modeling Others from Observations (MOO 2006), Boston, 2006. PDF document

Observations are obtained from log files of the communication between an agent and the server. The captured data is transformed into an spatial representation (scenes) that includes information from the see messages together with the corresponding action sent by the agent at discrete times.

The scenes constitute the knowledge of an imitative agent, which is employed in a case-based fashion when playing a soccer game. The agent transforms the current game situation into an scene representation and then compares it to the stored scenes, the agent finds the k-nearest scenes and selects an action among the actions found on the k-nearest scenes.

The scene representation

A scene consists of two parts: a list of the objects (object type, position, velocity and direction) described in the see message, and the action sent back by the agent during a time period after receiving the see message. An entire game is represented by a collection of scenes.

A scene only represents the objects surrounding the agent that are inside its field of vision, as opposed to representing all the objects on the field. Thus, the number and type of objects represented may change from scene to scene.

Scene recognition

Whereas the target agent may be endowed with rules to deal with different numbers of objects, or to focus on certain objects, the imitative agent has no access to those rules and relies on recognizing the current game scene as similar to some of the stored scenes.

During a game the imitative agent calculates the distance between the current scene (for each game scene) and all the stored scenes (at least 3000 stored scenes), keeping a list of the k-nearest stored scenes. Then it selects (or calculates) the action that is going to be sent back to the server from the actions found on the k-nearest scenes.

The stored cases search must be done very quickly, otherwise the imitative agent response will be timed out by the server. Thus, scene recognition, and the distance calculation, must be very efficient.

Distance Calculation

Distance calculation is a (weighted) euclidean distance in the space of scenes, where each scene is a point on a multidimensional space defined by a set of features of the scenes, such as ball presence and distance (each feature may have a different weight on the computation of the euclidean distance.)

Feature Selection

The more features employed to describe the space of scenes, the more expensive is the scene recognition. Thus, it is preferable to keep the set of features to a minimum, and here is where feature selection plays a role. It is desirable to employ a small set of (relevant) features to describe the space of scenes, but not too small that it falls short to discriminate between different scene outputs, i.e. the action performed by the target agent on such cases with the same scene representation.

So far, three approaches to feature selection have been employed. The first is a by-design selection of features. The second is also a by-design selection, but with a discretized distance-to-object feature, rather than the continuous feature. The third approach employed a genetic algorithm to find close to optimal weights of the features employed.

Contribution

The three approaches to feature selection have had relative success, especially when imitating naive agents. My task is to analyze the set of features employed, find the cases where the feature sets fail to discriminate output accurately, and come up with a new set (or sets) of features such that when the scenes are represented in terms of these features, the imitative performance of the agent is improved.

The problem

Since we pursue to build agents that learn to behave like other agents from observations, with no (or very little) human intervention, we must look at the learning process of the agent in order to improve its imitation performance. Case-based reasoning, and k-nearest neighbors are very well established learning techniques to start with. Past experiences and similarity play an important role on how biological agents learn. However, there is more to past experiences than cases. For reasons of efficiency, as well as limited resources, it is also useful to consider pieces of information (features) that were relevant in the past.

Thus, we aim to improve the imitative performance of the agents by looking at the set of features that are employed on describing/representing the scenes. There are a number of problems that must be sorted out, for example, we would like the imitative agent to discover the features that are relevant on imitating another agent (or a group of agents), and not only so, but also to discover the operations that other agents perform on those features, and the knowledge about discrete categories within each feature that other agents possess.

Advantages/ Desiderata

Being able to do so would represent a number of advantages for the agent learning process, which would result on improved imitative behavior.

Having the ability to discover sets/hierarchies of relevant features would allow the imitative agent to:

  • have a very efficient decision making process because it takes into account only those features that are useful, reducing the response time.
  • represent the scenes in a concise way by ignoring those features that are not useful.
  • avoid storing duplicate scenes (those whose improved representation is the same.)
  • allocate more unseen scenes (since the improved representation requires less space, it liberates memory resources for more scenes.)

Having the ability to discover the operations that other agents perform on relevant features would allow our agent to imitate behavior more accurately by being able to reproduce not only the same action as the target agent, but also the action properties, such as direction or speed of dash or kick.

Having the ability to discover the discrete categories within each feature that target agents possess would allow
the imitative agent to further reduce the size of scene representations and to speed up the decision making task.

Current goal

By now, I will set most of those problems aside and, in order to get some insight on how to address them, I will concentrate on manually finding sets of features such that when employed to describe the scenes, the performance of the imitative agent is improved. The current imitative engine is not going to be modified.

In particular, the problem with the sets of features employed so far is that they are not able to discriminate some cases that have the same scene representation but a different output, which makes it impossible for the agent to decide the correct action to perform on such situations.

Thus, we propose to improve the agent's performance by finding a set of features such that there are no two cases with the same scene representation and different target agent's output. In this way, ambiguous scenes are eliminated allowing for more accurate decisions.

Of course, we can only aim to eliminate ambiguity on the training set, meaning that the feature sets obtained can fail to accurately discriminate unobserved cases. Thus, we don't expect to achieve perfect imitation agents, but only to improve their performance.

We expect, however, to be able to distinguish well-defined regions of scene outputs on the multidimensional space defined by the set of features employed. That is, that we can find that every scene output o on the training set either is an isolated point (with no other point on certain vicinity of o), or belongs to a cluster of points, all of them of the same scene output category as o. Such a result would allow us to have more confidence on the accuracy of the imitation agent, particularly since the k-nearest algorithm would likely tend to find scenes from a particular scene output region.

Such a result would also allow us to move on to discretizing the multidimensional space of scenes, which would allow us to employ a decision of the region to which scenes belong to rather than a k-nearest neighbor algorithm. Even more, such a result would allow us to discretize the values of the features employed, which in turn would facilitate the employment of other decision making techniques, such as decision trees.

Methodology

Finding a useful set of features doesn't seem to be an easy task. In fact, it seems to require a lot of systematic experimentation. The very same way the scientific method works, which calls for a plan.

My first goal is to identify both, the set of features employed by the current imitation agents, and the superset of features possibly available (some of which have not been tried). In this way, I expect to learn which are the features that the current agent pays attention to, and to hypothesize which other features may be relevant.

The second stage consists of measuring the accuracy of the agent on imitating agents with different soccer playing skill, probably the same as those already compared on Lam, Esfandiari, and Tudino (2006), and identify both, the cases where the scene output is ambiguous, and the cases that the imitation agent fails to accurately imitate. The accuracy measure will be employed when comparing the performance improvement on a later stage. The ambiguous scenes will become my target scenes, i.e. those which should be discriminated by other feature sets. The cases on which the imitation agent fails are going to be used for a further check, they should either be (close to) an ambiguous scene(s), or be far from the unambiguous scenes (being that the reason why the k-nearest scenes are not useful). If none of these two conditions are satisfied, then it would mean that the case also calls for other features, or there is a problem with the agent's decision making process.

On the third stage, I would apply a rough set approach (basically equivalent to an entropy based approach) to determine whether the current set of features includes dispensable features (features which remotion doesn't create more ambiguous scenes), and then reduce the set of current features as to keep a subset of only indispensable features (these concepts are discussed here.) There may be more than one such a subset, all of them having the property that they cannot discriminate exactly the same training cases as the original set of features. One such subset of indispensable features is going to be used as the initial set of features, in this way we can be sure that the sets of features we try don't create more ambiguous scenes.

On the fourth stage I will start trying different sets of features by adding features, one by one, to the initial set of features found on the previous stage. A feature will be included on the set of features if it allows to discriminate at least one of the ambiguous scenes (i.e. if the cases with the same scene representation and different output have different values on the new feature, allowing us to separate those cases on different scenes.) After including a feature, the set of ambiguous scenes will be updated to include only those scenes that remain ambiguous. This process will continue until there are no more ambiguous scenes, or there are no more features to try out. Hopefully, the former will happen first and we will succeed on finding a suitable set of features with which to represent the scenes.

A last stage is required to reduce the set of features obtained. It may happen that the features included along the way are correlated with other features in the set, and that only one of those correlated features is required, rather than all of them. This is achieved with the same process as in the third stage, by eliminating those features that are dispensable.

As a remark, we will likely perform this process for each of the target agents for which we will try to improve the performance of the imitative agent. Thus, we will maintain a set of features for each target agent, which will allow us to compare them.

Validation

Since we want to improve the performance of the agent, we basically have to measure the imitation accuracy when the original set of features, and the new set of features are employed to represent the scenes. Therefore, we are going to compare the accuracy measures before and after.

To this end I can use the same metrics that have been employed so far. However, Babak is not so happy with it, and I also think that we need a more rigorous validation. Then, I propose the following validation process.

I ignore the exact reasons why the number of observations is reduced to about 3000 to 6000. I guess is a matter of space and/or of speed on the case-based reasoning process. Thus we can keep it like that for now, i.e. we can still use only one game as the training example.

However, rather than evaluating performance by comparing the imitation agent outputs with the target agent outputs on just one another game, I would rather use 2 to 4 games. Ideally I would like to use 2 to 4 games as the training set, and about 5 as the validation set. The main reason is that games are inherently stochastic, thus, one game does not necessarily provide a wide range of possible game cases examples, therefore, the more example games, the better variety of examples. And the same goes for the validation games. It may turn out that a very tiny proportion of the observed cases are actually close to the cases on the validation game. Thus, the more validation games, the better is our accuracy measure, i.e. there are more chances to succeed and fail.

The accuracy measure, in any case, is the rate of actions that the imitation agent got right, compared to the target agent.

Here I have a question: As far as I understood, the imitation agent is validated against the log file of another game. And Babak told me that the behaviors were similar (at least for the Krislet agent), however I wonder, how can this possibly work, i.e. if the imitation agent gets an action wrong and such difference make the validation game very different from the logged game. Now, again, as far as I understood, you don't actually run the validation game on the actual RoboCup server, but on one that simply sends the logged game messages. Then, I guess you see the game on the monitor, then therefore what you see as the "agent" is not really the imitation agent, but rather the logged target agent. And then I wonder if that is the reason why the imitation agent seems to behave very closely as the target agent.

Now, from my experience, it is very hard to rely on subjective evaluations of behavior similarity. So, assuming that Babak comment was not about the monitored logged game, but about comparisons of behavior on actual games, in order to do so well, we would need to actually run the experiment with a good number of participants, say 20 or more, under the same laboratory conditions, which I think is beyond what we want to do.

I think that, in addition to the evaluation method that has been employed so far, we could try an interesting approach which would consists of ranking a good number of agents, and then see what would be the rank of the imitation agent, specially with respect to its target agent. I am assuming, of course, that similar game strategies would yield about the same rank.

Page tags: robo-cup-simulation
page_revision: 23, last_edited: 1195536706|%e %b %Y, %H:%M %Z (%O ago)
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.