Notes on Characters, Agents, and Narrative Play
Workshop on Artificial Intelligence and Interactive Entertainment
Q. What is the difference between agents and characters in traditional drama? Are characters simpler? Are agents more robust?
In terms of their traits and knowledge about themselves and the world, the behaviors of agents are of necessity less complex than dramatic characters because their responses must be dynamically generated. This may seem counter-intuitive, but consider. In drama, a playwright may represent very many of a character's past experiences and may assert a rich array of traits by simply revealing details of the character's history as well as through expository action. We know that Hamlet has been at the university through exposition and can therefore infer that he is a well-read, serious sort of fellow. Any action he takes that can be derived from what he might have learned in this context is thus made probable without specific or explicit causal links. For an agent to demonstrate behavior of similar complexity, the particular pieces of knowledge and the traits that produce actions on the basis of that knowledge must be explicitly represented. At present these requirements create a practical limit to complexity on the level of behavior.
Growth and change in a dramatic character also demonstrate this principle. We are able to accept the radical change in King Lear's character, for instance, primarily through the quality of the playwright's assertions. Shakespeare did not have to find organic ways to change Lear's traits; he simply had to represent the change convincingly. An autonomous agent could not be produced today that would be able to generate changes of equivalent complexity or magnitude. Probability is supplied in roughly equal parts by the playwright, the actor, and the audience, each of whom is making inferences, constructing explanations, and fleshing out the character with mental activity that is distinct from the character.
The complement of this observation is that agents must be more robust than dramatic characters for the same reason - because their actions must be dynamically generated. The program that constitutes an agent assumes the role of improvisational actor. But an improvisational actor can make assertions about his character in the same way that a playwright can, coasting along on the audience's natural tendency to construct explanations and fill in the gaps. Since the actor has some degree of control over what happens in the improvisation, he can naturally limit the conditions under which his assertions will be put to the tests of consistency and probability. Quite simply, an actor can steer clear of train wrecks. His lightning-fast judgments are based on an evaluation of alternatives that is exceedingly complex and which requires knowledge of dramatic form, expert knowledge of natural language and discourse, and knowledge of the subtle techniques of manipulating an audience's acceptance of assertions through the orchestration of their attention and emotions - variables like pacing, selective emphasis, and emotional coloring. Teaching a human being to perform such feats takes years and is successful only in the rare cases of extremely talented actors. How such things are taught and learned is still very much a dark art; modeling the process computationally is an impossibility.
An agent cannot have the rich intuitive skills that are necessary to employ assertion as a way of establishing character. All actions must proceed from an explicit set of traits. When human interaction is part of the system, those traits must be represented robustly enough to produce appropriate behavior in an unforeseeable variety of conditions. Neither scripted nor improvised dramatic characters are required to meet such stringent criteria for robustness.
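The requirement that every action proceed from explicitly represented traits and knowledge can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Agent` class, the trait names, the numeric thresholds, and the rule table are hypothetical and are not taken from any system discussed in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A hypothetical agent whose behavior comes only from explicit data."""
    name: str
    traits: dict = field(default_factory=dict)    # trait name -> strength in [0, 1]
    knowledge: set = field(default_factory=set)   # facts the agent explicitly holds


# Hypothetical rules: (required trait, minimum strength, required fact, action).
RULES = [
    ("studious", 0.5, "rhetoric", "argue the point at length"),
    ("cautious", 0.5, "danger nearby", "hold back and observe"),
]


def choose_actions(agent, rules=RULES):
    """Return the actions licensed by some rule; nothing is inferred or asserted."""
    return [
        action
        for trait, threshold, fact, action in rules
        if agent.traits.get(trait, 0.0) >= threshold and fact in agent.knowledge
    ]


# A playwright can imply Hamlet's learning by exposition; here it must be spelled out.
hamlet = Agent("Hamlet", traits={"studious": 0.9}, knowledge={"rhetoric"})
print(choose_actions(hamlet))
```

An agent with no listed traits or knowledge produces no actions at all in this sketch, which is the practical limit on complexity described above: nothing can be left to the audience to fill in.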
Q. Should we be building agents or characters in interactive worlds?
Traditional dramatic characters exist in closed systems; characters in interactive worlds do not. Dramatic characters are representations; agents are engines of representation. The answer is that we must build agents that can create characters. And the characters they create will be more robust but less complex than the characters in plays.
Q. What about the difference between actors and agents? Actors know they are acting; agents just live. Should we be building actors?
An agent that can generate a character is a kind of actor - a simulated improvisational actor. Such agents can draw upon a limited set of dramatic techniques such as the relatively simple playwriting heuristics that I outlined in my dissertation. Acting expertise is much more difficult to model and depends increasingly upon humanlike facility with natural language and conversation. As Dr. Susan Brennan has observed in her work on the application of a theory of conversation to human-computer interaction, the principal activity is the construction of common ground, which relies heavily on back-channels and repair activities. This kind of knowledge is essential for a computer-based character.
A solution that is likely to be preferable in the foreseeable future is to limit computer-based characters to relatively minor roles in an interactive drama. Like ELIZA, they can ask questions to elicit choices and actions from human interactors. To the extent that a human dominates the action, the human's imaginative 'filling in' will be evoked.
In his book The Actor's Freedom, Michael Goldman asserts that "the characters of drama are actors." I think that what Goldman is pointing to is the fact that a character as it exists on paper is only a recipe to be used by a living actor on the stage. The essence of drama is enactment of characters by intelligent human beings. This observation is relevant to us in two ways. The first is that a human is always the leading character in an interactive drama. His is the only mind engaged in dynamic interpretation and construction of the dramatic action. Regardless of what is going on, he is at the wellspring of the dramatic event - the center of the world. Therefore our emphasis should be on giving him fertile material to work with, with his pleasure in the act as our primary goal.
Goldman also says, "Dramatic character always recapitulates and rides on, draws energy from, the essential strangeness and fascination of an actor for those who gather to watch him perform." Acting, even in our relentlessly secular world, has a strong spiritual component. Something is going on that transcends everyday life and the actual world. It has to do with opening one's mind and heart by witnessing the enactment of possibility, and with the deliberate assumption of otherness. It is the most powerful form of the universal ritual of masquerade; it also derives from an ancient belief that masking is a kind of unmasking that creates a conduit for new kinds of illumination - an aspect of human activity that is masterfully captured in Batman Returns. This truth, along with many others of which I am sure you are aware, suggests that the most powerful interactive dramatic experiences will involve more than one human. A computer-based agent cannot witness. Even if it could emulate human responsiveness in every other way, it cannot replicate the effect of being in the living presence of another.
Q. In order to maximize the likelihood of the user having a dramatic experience, is an explicit drama controller needed, or are characters enough? If the latter, what kind of characters?
One of the nice things about writing a book is that you can get past it. This is the purgative theory of publication.
In my earlier work I have relied heavily on a traditional structuralist model of dramatic experience. Although I basically loathe the discourse of post-structuralism and post-modernism - in fact, anything that calls itself post- - I find that there is, as usual, some truth in the underlying observations. The meaning of a story or a play is indeed constructed in the mind of the beholder. The particular characteristics of that construction are, however, strongly influenced by the artifact itself - the story or the play that has been experienced. This is interpretation. Interpretation is not a derivative, parasitic, or second-rate activity. Rather it legitimately refers to the time-displaced collaboration that occurs between authors and readers or playwrights, actors, and audiences in the construction of meaning.
A drama is a representation of an action that is organic and complete in some way. The sense of completion is also the product of interpretation. Aristotle's principles about how a sense of completion could be achieved in drama represent one reliable and proven approach to predisposing an audience to achieve emotional pleasure or catharsis. There are others. As I said in my dissertation and my book, we should begin with the most tractable and continue to explore alternatives.
I think that the answer to Joe's question incorporates some of the things I said this morning. My experience with improvisation, participatory theatre, psychodrama, and children's play leads me to believe that people's requirements for formal and structural elegance in a dramatic experience are significantly relaxed when they are interactors as opposed to audience members. Generating action is also generating structure. When kids play cowboys and Indians they naturally perform this function and seem to have little difficulty with it. In fact, the distinctions between authoring, enacting, and observing dissolve completely. What replaces these traditional functions is a process I call assertion and witnessing: one child makes some assertions about the situation and characters in narrative language, and the others indicate acceptance by incorporating those assertions into their enactment. This active acceptance constitutes witnessing, and it is the way that common ground gets constructed in the context of play. Children's play is a fluid dance between assertion-and-witnessing and dramatic enactment. They change person and verb tense without hesitation, and the activity is perceived as modeless. The expertise for this kind of dramatic interaction logically resides in agent/characters and other human interactors.
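The assertion-and-witnessing process can be sketched as a small bookkeeping structure. This is only one possible reading, under the assumption that an assertion enters common ground the first time a participant other than its author incorporates it into an enactment. The `PlaySession` class, its method names, and the example assertion are hypothetical.

```python
class PlaySession:
    """Hypothetical tracker for assertions and the common ground built from them."""

    def __init__(self):
        self.pending = {}           # assertion -> participant who proposed it
        self.common_ground = set()  # assertions that have been witnessed

    def propose(self, speaker, assertion):
        """Record an assertion made in narrative language; it is not yet shared."""
        self.pending[assertion] = speaker

    def enact(self, actor, uses):
        """Enact something that draws on the given assertions.

        An assertion is witnessed only when someone other than its author
        incorporates it. Returns the assertions newly added to common ground.
        """
        witnessed = []
        for assertion in uses:
            author = self.pending.get(assertion)
            if author is not None and author != actor:
                del self.pending[assertion]
                self.common_ground.add(assertion)
                witnessed.append(assertion)
        return witnessed


session = PlaySession()
session.propose("first child", "the scarf is a river")
session.enact("second child", ["the scarf is a river"])  # acceptance through enactment
print(session.common_ground)
```

The sketch deliberately has no separate author, actor, or audience role: any participant may propose or enact, which mirrors the dissolving of those distinctions described above.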
Of course, in addition to these fundamentally conversational skills, computer-based characters need other attributes. A principal component of dramatic action is conflict, and I still know of no better way to achieve it than through the goal-and-plan paradigm.
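How conflict falls out of a goal-and-plan representation can be shown with a minimal sketch. It assumes, purely for illustration, that each character's plan lists the exclusive resources it needs and that two plans conflict when they belong to different characters and need the same resource. The `Plan` class, the character names, and the goals are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Plan:
    """A hypothetical plan: who holds it, what it is for, and what it requires."""
    character: str
    goal: str
    needs: frozenset  # exclusive resources the plan depends on


def find_conflicts(plans):
    """Return (character, character, contested resources) for each clashing pair."""
    conflicts = []
    for a, b in combinations(plans, 2):
        contested = a.needs & b.needs
        if a.character != b.character and contested:
            conflicts.append((a.character, b.character, sorted(contested)))
    return conflicts


plans = [
    Plan("Trickster", "take the lantern", frozenset({"lantern"})),
    Plan("Keeper", "guard the lantern", frozenset({"lantern", "gate"})),
    Plan("Traveler", "cross the river", frozenset({"boat"})),
]
print(find_conflicts(plans))
```

Nothing in the sketch scripts the clash between the first two characters; it arises from their independently stated goals, which is the appeal of the paradigm for generating dramatic conflict.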
There is no question in my mind that both system architects and content providers must understand dramatic form and structure in order to create worlds with dramatic potential. But it's unclear to me at this point how much additional top-down control is necessary. We don't have to capture the expertise of artists; we have to be artists! I think that an effective strategy would be to concentrate on embedding potential in environments and characters and to see just how successful human interactors are at achieving satisfying experiences through their own natural structural contributions. This is where I have most seriously revised my thinking about the research agenda.
Q. In order to provide the user the feeling of real power in the simulated world, is it sufficient to always maintain the appearance of freedom or is true freedom needed? That is, can we manipulate the user so they never do anything too unusual, or will any such manipulation necessarily feel manipulative?
In some of my recent work, I have been looking at the issue of control. There is a train wreck where the traditional idea of authoring meets the requirement of interactivity. The explanation lies in the notion of control over form. The authors of stories, books, plays, or movies construct form through the selection and arrangement of their materials; authors are in control of the anatomy and metabolism of their plots. In interactive media, this sort of exclusive control is theoretically impossible; to the extent that participants have the power to make significant differences in what goes on, the author's control is eroded. Ultimately, to provide robust interactivity while preserving formal control, one would have to manage the plot 'live' like a puppeteer, or build an exceedingly complex expert system. I think that, while both approaches are feasible in limited ways, we all sense a fundamental inelegance in these solutions. I believe that the solution lies in a radical reworking of our notion of authoring - one that replaces control with the notion of time-displaced collaboration, as I mentioned earlier.
Keywords for this discussion that arise from the work I've been doing that I don't have time to talk about are:
My colleague Rachel Strickland and I have been trying to come at the problem another way. Instead of asking how to author narratives that people can interact with, we've been looking at how kids construct narratives, if you will, in their story play. What predisposes kids at play to have an interesting time, to be inventive, to generate 'plots' that are emotionally satisfying? What kinds of environments and materials are most generative? Do highly specified toys like Barbie Dolls and GI Joes help or hinder?
During 1990-91, we did extensive field work in a kindergarten classroom and watched how a master teacher facilitated narrative play. We used Native American story materials, imagery, and objects in an attempt to avoid evoking the traditional narrative structures that are embedded in Western-style fairy tales. We think we've rediscovered some promising ideas.
One is the notion of ambiguity. Rudolf Steiner, in his writings about the Waldorf approach to education, said that if a child has a doll made of a folded napkin "he has to fill it from his own imagination with all that is needed to make it real and human. This work of the imagination moulds and builds the forms of the brain." Visual ambiguity also requires a certain kind of complexity to work - people see faces in rocks and clouds but rarely in polygon-based displays - a warning to the devotees of chrome-ball photorealism. In the classroom, we watched kids make silk scarves into wings and weather and wrinkly walnuts into the faces of fairies and kings.
Something else that evokes intensely imaginative responses among kids is anything they can manipulate - both natural and symbolic objects. The theory was articulated by another important educator, Maria Montessori:
A child is delighted to make and unmake something, to place and replace things many times over and continue the process for a long time. A very beautiful toy, an attractive picture, a wonderful story, can, without doubt, rouse a child's interest, but if he may simply look at, or listen to, or touch an object, but dares not move it, his interest will be superficial and will pass from object to object.
- Maria Montessori, The Discovery of the Child, 1948.
Our experiences in the classroom confirmed the educators' wisdom. We are working on the design of a virtual world that incorporates various kinds of manipulatives, from animistic objects that have minds of their own, to 'stuff' - things that other things can be molded or shaped out of.
As part of the Coyote project, we've also been investigating the kinds of relationships that people have with environments as a source of design insight. We've been inspired by the work of environmentalist/writer Barry Lopez, who observed in an essay entitled "Children in the Woods" that a natural landscape - even one that is as seemingly sparse and uniform as a desert or a grassy plain - contains literally too much detail to be taken in as 'content.' Lopez asserts that when we engage with landscapes, what we perceive is patterns, and what we infer is relationships. This suggests to us that a classical object-oriented approach to creating virtual landscapes may be inadequate for the task of creating relationships among objects and between objects and their environments, and for specifying the dynamics from which patterns can emerge.
An approach that begins to address this issue and also achieves a satisfying kind of ambiguity is to work with camera-originated imagery. In the next phase of the Coyote project at Banff next summer, we are working with Michael Naimark to capture information about distance and size of features at the same time we capture their visual characteristics. Locales will be constructed like concentric cycloramas, with video-based imagery texture-mapped onto two- and three-dimensional models. In the near field, environment becomes much more like character. Near-field features will have the same sensory characteristics but will become increasingly intelligent and manipulable. Critters like Coyote will be represented simplistically (think of Native American petroglyphs) and will have limited interaction with humans, functioning primarily as guides, question-askers, and shit-disturbers. We will forbid finger-flying and limit gross motion to walking and will utilize non-contiguous spaces and magical transformations (many drawn from Native American conventions) for moving from here to there.
It's going to be an interesting summer. Assuming that it all comes off, I look forward to reporting our progress to this group next year.