Markup for RPG textboxes

Conversation is a large part of most RPGs. Information about the game world is communicated through conversation: the progress of quests, the world's background and stories of all sorts. Most modern games use voice acting instead of text, but voice acting is expensive. I want to make in-game conversation as expressive as possible using just text. To make authoring text easy I want good tools. One of those tools is a special language just for writing conversation, called Discourse.

How to Make an RPG : Conversation

The book How to Make an RPG teaches the reader to make three mini-RPG games that are similar to the Final Fantasy games on the Super Nintendo. Each of these mini-RPGs has conversation with NPCs.

Here's some of the conversation from the first game:

The in-game menu without a magic option.

The conversations in the How to Make an RPG games are a good base to build upon. The conversations in the mini-RPGs don't have:

  • Typed text
  • Different type speed and pausing
  • Text coloring
  • Embedded icons
  • Special effects such as shaking or blinking words

Using these features can give text more emotional impact. To support these features I'm giving the dialog system a bit of an overhaul. I'm starting by considering how a writer might want to author a game conversation in an ideal world.

Writing a conversation

Conversations in the current game code look like this:

local intro =
        {"handin", { def = "thief", id="thief", x = 4, y = 10}},
        {"handin", { def = "mage", id="mage", x = 6, y = 11}},
    SOP.MoveNPC("major", "handin",
    SOP.Say("handin", "major", "So, in conclusion...", 3.05, sayDef),
    SOP.Say("handin", "major", "Head north to the mine.", 4.3, sayDef),
    SOP.Say("handin", "major", "Find the skull ruby.", 4.3, sayDef),
    SOP.Say("handin", "major", "Bring it back here to me.", 4.25, sayDef),
    SOP.Say("handin", "major",
        "Then I'll give you the second half of your fee.", 5.25, sayDef),
    SOP.Say("handin", "major", "Do we have an agreement?", 4.0, sayDef),
    SOP.Say("handin", "hero", "Yes.", 1.5, sayDef),
    SOP.Say("handin", "major", "Good.", 1.5, sayDef),
    SOP.Say("handin", "major",
        "Here's the first half of the fee...", 4.0, sayDef),
    SOP.Say("handin", "major", "Now get going.", 2.5, sayDef),
    -- Party members can walk into the hero and
    -- return control to the player.
        SOP.MoveNPC("thief", "handin",
    SOP.FadeOutChar("handin", "thief"),
    SOP.RunAction("RemoveNPC", {"handin", "thief"},
        SOP.MoveNPC("mage", "handin",
    SOP.FadeOutChar("handin", "mage"),
    SOP.RunAction("RemoveNPC", {"handin", "mage"},
           = true
                    -- give the reward amount!
                    gGame.World.mGold = gGame.World.mGold + 500

This is straight-up Lua code: tricky to write and not particularly easy to test.

Instead of Lua code, I want to write the conversations in Discourse, a simple, easy-to-write language. If I rewrote the first cutscene in this language it would look something like this:

Major: So in conclusion...

Head north to the mine.

Find the skull ruby.

Bring it back here to me.

Then I'll give you the second half of your fee.

Do we have an agreement?

Hero: Yes

Major: Good.

Here's the first half of the fee. <script>GiveGold(500)</script>

Now get going.

This conversation is much shorter and easier to read. The scripts aren't totally equivalent; some of the setup is assumed to occur outside of this script. Moving some of the setup outside of the script gives a better separation of concerns and, I think, makes the code better. Conversations shouldn't be responsible for fading out the screen or changing the world; that should be the responsibility of a cutscene that makes use of conversations.

Discourse Definitions

Discourse itself is an unholy mix of Markdown and HTML. Easy to read, easy to write; a little tricky to parse.

Speech Unit

A conversation is broken up into a list of speech units. A speech unit is made of a speaker, one or more pages of text and some tags. The start of a new speech unit is denoted by a speaker label.


Speaker

All speech needs a speaker. You want an anonymous voice? That's fine, but in Discourse that voice needs labelling as a speaker. A speaker label is usually a word followed by a colon.

Voice: I'm a mysterious voice.

The speaker is used to tag the text; the game code can use it in any way you can imagine.
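As a rough illustration, a speaker label could be picked out with a Lua pattern. This is only a sketch and not the actual Discourse parser: the name ParseSpeaker is made up, and it assumes a label is a single word followed by a colon at the very start of a line.

```lua
-- Hypothetical helper (not part of the real parser): split a line into a
-- speaker label and the remaining text.
-- Assumes a label is one word (letters, digits, underscores) plus a colon.
local function ParseSpeaker(line)
    local speaker, text = line:match("^(%w[%w_]*):%s*(.*)$")
    if speaker then
        return speaker, text
    end
    -- No label found: the line belongs to whoever is already speaking.
    return nil, line
end

local speaker, text = ParseSpeaker("Voice: I'm a mysterious voice.")
print(speaker, text)
```

A line without a label returns nil for the speaker, so the caller can keep attributing the text to the current speaker.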


Pages

Long speech can be broken into a series of pages of text. Pages are small units of speech that are easy for the player to read. A new page is started when there's a speaker label or an empty line between sentences.

This is a page.

This is another page.

An empty line means these two lines of text are two separate pages.

Pete: This is a page.
Pete: This is another page.

Here there are two pages because the speaker label has been used twice.

This is a page.
This is the same page.
This is also the same page.

This text has no empty lines, so it's all considered a single page.
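The empty-line rule can be sketched in a few lines of Lua. This is an assumption-laden illustration rather than the real parser: SplitPages is a made-up name, it only handles text from a single speaker, and it ignores speaker labels and tags entirely.

```lua
-- Hypothetical helper: split one speaker's text into pages.
-- Assumes pages are separated by one or more empty (or whitespace-only) lines.
local function SplitPages(text)
    local pages = {}
    local current = {}
    -- Append a trailing newline so the final line is matched as well.
    for line in (text .. "\n"):gmatch("(.-)\r?\n") do
        if line:match("^%s*$") then
            -- An empty line closes the current page, if it holds any text.
            if #current > 0 then
                pages[#pages + 1] = table.concat(current, "\n")
                current = {}
            end
        else
            current[#current + 1] = line
        end
    end
    if #current > 0 then
        pages[#pages + 1] = table.concat(current, "\n")
    end
    return pages
end

local pages = SplitPages("This is a page.\n\nThis is another page.")
print(#pages, pages[1], pages[2])
```

Consecutive lines with no empty line between them end up joined into one page, which mirrors the last example above.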


Tags

Pages can include tags. The parser doesn't define which tags exist; you can add whichever tags make sense for your game. When you start the parser you tell it which tags you expect to find and what type they are. There are three types: short tags, wide tags and cut tags.

  • Short tags : Denote some one-off action in the speech. Pause is a good example. This tag stops the text typing in for a short amount of time, which helps suggest the speaker is considering their words or thinking. Short tags only have an opening tag.
  • Wide tags : Wide tags have an opening and closing tag and usually mark up some section of the text. For instance, you might have a keyword tag that highlights the text in yellow.
  • Cut tags : Cut tags are like wide tags but all the text they enclose is stripped from the conversation. Script is a cut tag; everything inside the script tags is removed from the conversation.
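Telling the parser which tags to expect might look something like the table below. The table layout, the TagType names and the NeedsClose helper are all assumptions made for illustration; the real parser may describe its tags quite differently.

```lua
-- Hypothetical tag table handed to the parser; names and fields are assumed.
local TagType = { Short = "short", Wide = "wide", Cut = "cut" }

local tagDefs =
{
    pause   = { type = TagType.Short }, -- one-off action, opening tag only
    keyword = { type = TagType.Wide },  -- marks up a section of text
    script  = { type = TagType.Cut },   -- enclosed text is stripped out
}

-- Returns true if the tag must be followed by a matching closing tag.
local function NeedsClose(id)
    local def = tagDefs[id]
    assert(def, "unknown tag: " .. tostring(id))
    return def.type ~= TagType.Short
end

print(NeedsClose("pause"), NeedsClose("keyword"), NeedsClose("script"))
```

Keeping the tag list in data means a game can add its own tags without touching the parsing code.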

Here's an example conversation script using these tags.

Hello, <pause>
Here's the <keyword>Dungeon Key</keyword>.<script>GiveItemById("dungeon_key")</script>

The output from this script might look like this:

The in-game menu without a magic option.

Tags can run over multiple pages, which probably contributes to how tricky the parser was to write!

As the conversation script is parsed, all tags are stripped out of the text and their locations are recorded.
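To give a feel for the stripping step, here is a minimal sketch that handles a single page. It is not the real parser: StripTags is a made-up name, the tag types are passed in as plain strings, and the offset it records is simply the number of plain-text bytes emitted before the tag, which is an assumed convention and may not match the real parser's numbers.

```lua
-- Hypothetical sketch: strip tags from one page and record where they were.
-- tagTypes maps a tag id to "short", "wide" or "cut" (an assumed format).
local function StripTags(page, tagTypes)
    local out, tags = {}, {}
    local length = 0 -- plain-text bytes emitted so far
    local pos = 1

    local function emit(text)
        out[#out + 1] = text
        length = length + #text
    end

    while true do
        local s, e, slash, id = page:find("<(/?)(%a+)>", pos)
        if not s then break end
        emit(page:sub(pos, s - 1)) -- plain text before the tag
        pos = e + 1

        local kind = tagTypes[id]
        if kind == "cut" and slash == "" then
            -- Cut tag: drop everything up to the closing tag, keep it as data.
            local cs, ce = page:find("</" .. id .. ">", pos, true)
            assert(cs, "unclosed cut tag: " .. id)
            tags[#tags + 1] = { offset = length, id = id, op = "open",
                                data = page:sub(pos, cs - 1) }
            pos = ce + 1
        elseif kind then
            tags[#tags + 1] = { offset = length, id = id,
                                op = (slash == "/") and "close" or "open" }
        else
            emit(page:sub(s, e)) -- unknown tag: keep it as literal text
        end
    end
    emit(page:sub(pos)) -- whatever follows the last tag
    return table.concat(out), tags
end

local text, tags = StripTags(
    "Hello,<pause> take the <keyword>Dungeon Key</keyword>.<script>GiveGold(500)</script>",
    { pause = "short", keyword = "wide", script = "cut" })
print(text)
for _, tag in ipairs(tags) do
    print(tag.offset, tag.id, tag.op, tag.data)
end
```

Tags that span several pages, nesting checks and whitespace trimming are all left out here, and those are likely where most of the real difficulty lies.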

From Language to Game Data

The parser is the program that takes our nice Discourse scripts and outputs a data structure that's easy to use in code.

The parser's output looks like this snippet:

    {
        ["speaker"] = "Speaker",
        ["text"] = { "Hello,\nHere's the Dungeon Key." },
        ["tags"] =
        {
            { page = 1, offset = 6, id = "pause", op = "open" },
            { page = 1, offset = 18, id = "keyword", op = "open" },
            { page = 1, offset = 28, id = "keyword", op = "close" },
            { page = 1, offset = 30, id = "script", op = "open",
                data = 'GiveItemById("dungeon_key")' },
        },
    }

This code is pretty readable but would not be fun to write! This data structure is used to run conversations in the game.
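As a sketch of how game code might consume this data, the helper below collects the tags that a typing textbox has just passed. The speech unit is copied from the snippet above; the TagsReached function and the idea of querying by an offset range are assumptions, not the actual textbox code.

```lua
-- A speech unit in the shape shown above (values copied from the snippet).
local unit =
{
    speaker = "Speaker",
    text = { "Hello,\nHere's the Dungeon Key." },
    tags =
    {
        { page = 1, offset = 6, id = "pause", op = "open" },
        { page = 1, offset = 18, id = "keyword", op = "open" },
        { page = 1, offset = 28, id = "keyword", op = "close" },
        { page = 1, offset = 30, id = "script", op = "open",
          data = 'GiveItemById("dungeon_key")' },
    },
}

-- Hypothetical helper: collect the tags on one page whose offset lies in the
-- range fromOffset < offset <= toOffset, e.g. those passed since last frame.
-- Pass -1 as fromOffset to include a tag sitting at offset 0.
local function TagsReached(speechUnit, page, fromOffset, toOffset)
    local reached = {}
    for _, tag in ipairs(speechUnit.tags) do
        if tag.page == page
            and tag.offset > fromOffset
            and tag.offset <= toOffset then
            reached[#reached + 1] = tag
        end
    end
    return reached
end

-- Typing has advanced from offset 0 to offset 20 on page 1.
for _, tag in ipairs(TagsReached(unit, 1, 0, 20)) do
    print(tag.id, tag.op)
end
```

A textbox could call something like this every frame and react to each tag it gets back, for example by pausing the typing or switching the text color.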

Parser Code

I have an implementation of the parser. It has a nice test suite but the code quality is not up to standard; it turns out mixing significant whitespace (and newlines) with tags is tricky. Check it out here. It requires Lua and may need a little fiddling if your dev environment differs from mine.

Feature-wise I don't think I'll be extending the language any more, but that's not an ironclad promise.

The next step is refactoring the textbox code and then upgrading the game code to run Discourse scripts.