If you don't know what a Gameduino is, start here.


Comments or suggestions can be emailed to: gameduino@artlum.com
Get updates to this page on twitter @Fungus_B
Update, July 31, 2011: New page - "Using an NES controller with a Gameduino",
an update to joystick library and a modified version of Gameduino asteroids.

Update, June 27, 2011: New section and code update - "SPI performance - a case study"

Update, June 19, 2011: New section and code update - "Digital sound playback"

Update, June 12, 2011: New section - "Raster effects without Forth"

Update, June 10, 2011: Due to popular demand I've just ordered an NES controller from ebay and I'll be adding a "howto" page and full joystick_lib support when it arrives.

Universal joystick reader:

There's different types of joystick in the world and on an Arduino there's plenty of different pins to connect the wires to. To ensure that every Gameduino game works with everybody's joystick with the minimum of fuss we need a joystick library which sorts out the physical differences.

The library needs to:

With this in mind, I propose the following solution: joystick_lib.zip


Unzip the 'joystick_lib' folder and copy it into your Arduino projects folder.

If everything is correct you'll have a new entry called 'joystick_lib' in your Arduino sketches menu:

The sample sketch will display the current state of your joystick on screen. If you have a joystick and it's configured correctly you should see the movements of the stick and the button changes when you press them.

If you're not seeing this then maybe you need to...

Configure your joystick:

By default the library is configured for the Sparkfun joystick shield. If you have a different joystick you need to edit "joystick.cpp" and define the layout of your joystick.

See comments in "joystick.cpp" for more info on how to do this.

Using the library:

To use the library in your own sketches, copy the files "joystick.cpp" and "joystick.h" into the sketch folder (close the Arduino editor while you do this so it will recognize the new files).


What to do if you download a game which uses the library:

Simple! Just copy your customized version of joystick.cpp into the game's sketch folder (overwrite the existing file) and you're good to go...

Code sample 1:

  Read the joystick and continuously display the state
  using 'Joystick::dump()'

// Uncomment this line to select non-Sparkfun joystick
//#define SPARKFUN 0

#include <SPI.h>
#include <GD.h>
#include <joystick.h>

// Create a joystick object
Joystick joystick;

void setup()
  delay(250);   // Give Gameduino time to boot
  // Calibrate analogue joystick at startup (optional)

void loop()

Code sample 2:

  Use the joystick in a game
#include <SPI.h>
#include <GD.h>
#include <joystick.h>

// Create a joystick object
Joystick joystick;

void setup()

void loop()
  if (joystick.isPressed(Joystick::ButtonA)) {
    if (joystick.changed(Joystick::ButtonA)) {
      // Button A went from "not-pressed" to "pressed" state...
  if (joystick.left()) {
  if (joystick.right()) {
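
The isPressed()/changed() pair used above is ordinary edge detection: remember last frame's button bits and compare them with this frame's. The internals of joystick_lib aren't shown on this page, so the following is only a host-side sketch of the idea. The class name "ButtonTracker", the bit assignments and the "justPressed()" helper are all made up for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the library's button handling; the names
// and bit layout are assumptions, not the joystick_lib code.
class ButtonTracker {
public:
  enum Button : uint8_t { ButtonA = 0x01, ButtonB = 0x02 };

  // Call once per frame with the raw button bits (1 = pressed)
  void update(uint8_t rawBits) {
    previous = current;
    current = rawBits;
  }

  // True while the button is held down
  bool isPressed(Button b) const { return (current & b) != 0; }

  // True only on the frame where the button changed state
  bool changed(Button b) const { return ((current ^ previous) & b) != 0; }

private:
  uint8_t current = 0;
  uint8_t previous = 0;
};

// True exactly once per press ("not-pressed" -> "pressed")
bool justPressed(const ButtonTracker& t, ButtonTracker::Button b) {
  return t.isPressed(b) && t.changed(b);
}
```

Holding the button down keeps isPressed() true, but changed() is only true on the first frame, which is why the nested test in the sample fires once per press.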

See also: Making a Gameduino joystick and Using an NES controller with a Gameduino

Extra: Here's a version of Gameduino Asteroids (aka "potato shoot") which I modified to use the joystick library - it's much easier to play! (nb. I also hacked the game to make the background stars scroll much more smoothly...enjoy!)

Gameduino data compressor / source code generator:

The easiest way to include binary data in your Gameduino sketches is to convert it to source code (eg. header files) then include it directly in your sketches.

Here's a command line tool to compress files in 'Gameduino' format and output them as source code. The data can be decompressed using the GD library's built-in "uncompress()" function.

gdcompress.zip - executable file for Windows, plus complete C++ source code.

Command line options:


gdcompress [-n name] [-u] [-v[v]] <file>

<file> is the name of the file to convert to source code.
"name" is the name of the array in the output source code.
'-u' will disable data compression.
'-v' will display compression statistics on the console.
nb. Type "gdcompress -?" to see these options.
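
To give a feel for what "convert it to source code" means, here is a minimal sketch of a generator that does no compression at all (roughly what the '-u' option asks for). This is not the gdcompress source. The function name "toSourceCode()" and the eight-values-per-row layout are assumptions, and only the array declaration follows the "font.h" example on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Sketch of a "binary to source code" generator (no compression).
// NOT the gdcompress implementation, just the general idea.
std::string toSourceCode(const std::string& name,
                         const std::vector<uint8_t>& data) {
  std::string out = "static PROGMEM prog_uchar " + name + "[" +
                    std::to_string(data.size()) + "] = {\n";
  char buf[8];
  for (std::size_t i = 0; i < data.size(); ++i) {
    if (i % 8 == 0) out += "  ";   // indent the start of each row
    std::snprintf(buf, sizeof buf, "0x%02x,",
                  static_cast<unsigned>(data[i]));
    out += buf;
    // Eight values per row; also finish the last (partial) row
    out += (i % 8 == 7 || i + 1 == data.size()) ? "\n" : " ";
  }
  out += "};\n";
  return out;
}
```

Writing the returned string to a ".h" file gives you something you can #include in a sketch; the real tool additionally compresses the data first.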


Given a file "font8x8.bin", the following command:

gdcompress -n compressed_font font8x8.bin >font.h

Will create a file "font.h" containing the following:

static PROGMEM prog_uchar compressed_font[432] = {

To use the data in a sketch, use the uncompress() function...

#include "font.h"

void makeFont()

Gameduino Invaders:

Defend the Earth against the attacking aliens in a pixel-perfect Gameduino version of this classic game!

Download the complete sketch here

Version 0.9 alpha - still some things to add...check back for updates!

Update: David Cuartielles (one of the creators of Arduino) playing Gameduino Invaders at Campus Party 2011.

Raster effects without Forth:

Here's a little utility I wrote to make it much easier to do split screen effects on a Gameduino.

I used this method for the colors in Gameduino Invaders. All the sprites in the game are actually the same color but the coprocessor changes the sprite color palette as the screen is drawn: red in the area where the flying saucer appears; white in the area where the invaders are; green at the bottom for the shields and player.

If you look very closely at the game you can see the color of the player's bullet changing as it goes through the different zones. Also the invader's bombs...

This effect is totally authentic. The original arcade machines had strips of colored plastic stuck to the screen to achieve it; these days we emulate it using J1 coprocessors...

The code:

The way it works is that you set up a series of 'instructions' in the Gameduino RAM (a "copperlist") and a special coprocessor program then interprets them as the video frame progresses.

The code has two classes:

Copperlist instructions:

The available copperlist instructions are:

• Wait for a screen line.
• Write a byte to Gameduino memory.
• Write a 16-bit value to memory (eg. a scroll register...).
• Copy a block of Gameduino memory from one place to another (eg. a color palette).
• Stop executing instructions and wait for the next video frame (nb. this is added automatically by "CopperlistBuilder::end()").
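
To make the idea concrete, here is a host-side simulation of how such an instruction list could be interpreted once per screen line. It is only a sketch: the opcode values and byte layout are hypothetical (chosen so that a "wait" is 3 bytes and a 16-bit write is 5 bytes), a std::vector stands in for Gameduino memory, and the real interpreter runs on the coprocessor, not in C++.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical opcodes; the real library's encoding is not shown here.
enum : uint8_t { OP_END = 0, OP_WAIT = 1, OP_WRITE = 2,
                 OP_WRITE16 = 3, OP_COPY = 4 };

using Ram = std::vector<uint8_t>;   // stand-in for Gameduino memory

uint16_t get16(const Ram& ram, uint16_t a) {
  return ram[a] | (ram[a + 1] << 8);
}
void put16(Ram& ram, uint16_t a, uint16_t v) {
  ram[a] = v & 0xff;
  ram[a + 1] = v >> 8;
}

// Run the list from 'pc' for the current screen 'line'. Returns the new
// list position: either a wait for a later line, or the end marker.
uint16_t runCopperlist(Ram& ram, uint16_t pc, uint16_t line) {
  for (;;) {
    switch (ram[pc]) {
      case OP_WAIT:                                // opcode + line
        if (get16(ram, pc + 1) > line) return pc;  // not there yet
        pc += 3;
        break;
      case OP_WRITE:                               // opcode + addr + byte
        ram[get16(ram, pc + 1)] = ram[pc + 3];
        pc += 4;
        break;
      case OP_WRITE16:                             // opcode + addr + value
        put16(ram, get16(ram, pc + 1), get16(ram, pc + 3));
        pc += 5;
        break;
      case OP_COPY: {                              // opcode + src + dst + count
        uint16_t src = get16(ram, pc + 1);
        uint16_t dst = get16(ram, pc + 3);
        uint16_t n = get16(ram, pc + 5);
        for (uint16_t i = 0; i < n; ++i) ram[dst + i] = ram[src + i];
        pc += 7;
        break;
      }
      default:                                     // OP_END: stop until next frame
        return pc;
    }
  }
}
```

Calling runCopperlist() once for each line of a frame, and resetting the position to the start of the list at the top of the next frame, gives the "change something part way down the screen" behaviour described above.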

Modifying copperlists:

We need to be able to update copperlists to reflect current values from the game, eg. In the copperlist above we need to be able to set the game's X,Y scroll position.

If you look at the code which builds the list you'll see we set a variable "scrollInst" just before the write to the scroll registers. This points to the memory which needs to be updated. Now we use CopperlistBuilder again:

unsigned int scrollInst;

// Write current values of 'sx' and 'sy' into the copperlist...
CopperlistBuilder cp;
cp.rebuild(scrollInst);     // Start rebuild here

// nb. Do NOT do "cp.end()" when rebuilding...!


splitscreen2.zip - Complete sketch containing the source code for the copperlist library plus a modified version of the Gameduino "split screen" demo which uses copperlists.
Note: The code in this library is obsolete. The section "Digital sound playback" has an updated version
of this library which simultaneously does raster effects and sampled sound playback.

Bonus trivia question:

Where does the name "copperlist" come from...? :-)

Digital sound playback

The Gameduino can play back digital sound via the SAMPLE_L and SAMPLE_R registers. These registers are accessible from the host CPU but sample playback needs accurate timing and this would be difficult to achieve when doing other processing. The best solution is to play back the sound with the coprocessor.

The Gameduino demo "Sample playback" shows the basic method for doing this. A buffer is created in Gameduino memory and the coprocessor is programmed to play back this buffer in a loop. The main CPU can then keep the buffer filled with sample data, making sure it stays ahead of the coprocessor.

The code will look something like this:

// "sampleReadPos" is the 8-bit address the coprocessor is
// reading from in the playback buffer. This value will
// advance as the coprocessor consumes the sample data.

unsigned int rp = sampleReadPos;

// "sampleWritePos" is the 8-bit address of the last sample
// we wrote to the buffer.

unsigned int wp = sampleWritePos;

// The difference between "sampleReadPos" and "sampleWritePos"
// is the amount of empty space in the buffer. We need to fill
// this space with new sample data.

unsigned int emptySpace = (rp-wp)&255;

// nb. We leave a tiny gap in the buffer to avoid
// confusion between '0' and '256' when using 8-bit math.
if (emptySpace > bufferGap) {
  emptySpace -= bufferGap;

  // Write new samples to the buffer
  writeSamples(emptySpace);
}

The function "writeSamples()" writes new sample data to the buffer (updating "sampleWritePos" in the process).
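
Here is a small host-side sketch of the same bookkeeping, so the wrap-around math can be tried out on a PC. The value of "bufferGap", the signature of "writeSamples()" and the choice to treat the write position as "next byte to write" are assumptions for illustration; the library's own version talks to the Gameduino over SPI.

```cpp
#include <cstdint>

const uint8_t bufferGap = 4;   // assumed gap size, not the library's value

// Free space in a 256-byte circular buffer, given the coprocessor's read
// position and our write position. Masking with 255 handles wrap-around.
uint8_t freeSpace(uint8_t readPos, uint8_t writePos) {
  uint8_t space = (readPos - writePos) & 255;
  return (space > bufferGap) ? space - bufferGap : 0;
}

// Copy up to freeSpace() samples from 'src' into the buffer and advance
// the write position. Returns the number of samples actually written.
unsigned writeSamples(uint8_t* buffer, uint8_t readPos, uint8_t& writePos,
                      const uint8_t* src, unsigned count) {
  unsigned n = freeSpace(readPos, writePos);
  if (n > count) n = count;
  for (unsigned i = 0; i < n; ++i) {
    buffer[writePos++] = src[i];   // 8-bit index wraps 255 -> 0 by itself
  }
  return n;
}
```

Note that equal read and write positions count as "full" here, so the writer only starts filling once the coprocessor has moved ahead.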

Choosing a buffer size

256 bytes is an obvious choice for the playback buffer size because Gameduino memory is organized in 256 byte pages. eg. We can use a color palette or a sprite image for the playback buffer.

256 bytes is also easy to code for - we're working with a circular buffer and 8-bit integers, which naturally wrap around when they reach 256.

Choosing a playback rate

Choosing a playback rate is a bit more difficult. High sample rates obviously sound nicer but sound data occupies a lot of memory, something which most Gameduino hosts don't have.

An 8kHz sample rate seems a good compromise for the limitations of the system. Two seconds of sound at 8kHz is enough for a few simple sound effects and will fit in most Arduinos with enough memory left over for a game.
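
The arithmetic behind that choice is easy to check. The sketch below assumes 8-bit mono samples (one byte each) and the 256-byte playback buffer described above; the function names are made up for illustration.

```cpp
#include <cstdint>

const uint32_t sampleRate = 8000;   // samples per second, one byte each
const uint32_t bufferSize = 256;    // one Gameduino memory page

// Bytes of sample data needed for 'ms' milliseconds of sound
constexpr uint32_t bytesForMs(uint32_t ms) {
  return sampleRate * ms / 1000;
}

// How long a full playback buffer lasts before it runs dry, in ms
constexpr uint32_t bufferDurationMs() {
  return bufferSize * 1000 / sampleRate;
}
```

Two seconds of sound comes to bytesForMs(2000) = 16000 bytes, and a full buffer lasts bufferDurationMs() = 32 milliseconds, so the host has to top it up at least that often.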

Sound playback library

The sound playback library below is based on the "copperlist" library presented earlier. With the new library you can do raster effects and sample playback at the same time! (This is the code used by Gameduino Invaders...)

The coprocessor code in the copperlist demo was extended to play back sample data at 8kHz and the "Coprocessor" object in the library was extended as follows:

Now we need functions to manage the sounds and put sample data into the buffer. A new object "SoundController" was created for this, it has the following functions:

Code sample

  Demonstration of raster effects and sample
  playback using libartlum.

#include "joystick.h"
#include "libartlum.h"

// The sound samples as source code
#include "samples.h"

// Joystick object
Joystick joystick;

void setup()

  // Start the coprocessor with sample page at 0x3f00

void loop()
  // Frame counter

  // Start a sound when you press a joystick button
  if (joystick.isPressed(Joystick::buttonA)) {
    if (joystick.changed(Joystick::buttonA)) {

  // Update sound (keep the sample buffer filled)


libartlum_demo.zip - complete sketch which does copperlist raster effects and uses the joystick buttons to play back sampled drum sounds.

The sound player in this version of the library supports four sound channels.

SPI performance, a case study

The digital sound playback demo sketch I wrote last week had some split screen and raster effects built into it to give the new version of the artlum library a workout. A big part of this raster effect was rebuilding a copperlist to change the SCROLL_X register on every line of the bottom half of the screen.

For every screen line there was a "wait" instruction (3 bytes) plus a 16-bit "write" instruction (5 bytes). With 150 lines in the bottom half of the screen, this gives a total of 150*(3+5) = 1200 bytes of data written via SPI.

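But...for each copperlist instruction there's also the overhead of setting up the SPI transfer. Remember that an SPI write needs to do the following:

• Select the SPI device (set Arduino pin 9 low)
• Send the two address bytes over SPI
• Send the data bytes over SPI
• Deselect the SPI device (set Arduino pin 9 high)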

ie. For each block of memory written there's two extra bytes of data for the address plus you have to toggle an Arduino output pin from high to low and back again.

The copperlist rebuild writes 300 blocks of data so we need to add 600 more bytes of SPI traffic to the 1200 bytes of actual data, giving 1800 total. There's also 600 calls to digitalWrite() to set the state of the Gameduino device select.

It takes two microseconds to transfer one byte of data to the Gameduino over SPI so the data transfer should take about 3.6 milliseconds with all the rest of the time spent toggling the Arduino output pin.

To time the copperlist rebuild I looked at the coprocessor YLINE register after the rebuild and displayed it on screen. This is a simple method, but very effective when you're trying to get things running in a single video frame (which is game programmer Nirvana). The Artlum library has some simple functions to read/display the current raster line:

// Read current value of YLINE
unsigned int yline = Coprocessor::yline();

// Display it at (30,12)

One video frame at 72Hz is about 14 milliseconds. Our finger-in-the-air calculation says we need about 25% of that for the SPI data transfer plus a bit extra for SPI device selection. There's 300 video lines so I expected YLINE to be somewhere around 100...
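
The finger-in-the-air calculation can be written down as a few lines of code, which makes it easy to redo for a different number of lines. All the constants come from the text above (3-byte wait, 5-byte write, two address bytes and two pin changes per block, two microseconds per byte); the struct and function names are just for illustration.

```cpp
struct SpiEstimate {
  unsigned dataBytes;     // copperlist instruction bytes
  unsigned addressBytes;  // two address bytes per block written
  unsigned pinToggles;    // select + deselect per block written
  double   transferMs;    // time spent on the wire only
};

// Estimate the SPI traffic for rebuilding 'lines' screen lines, with
// one "wait" and one 16-bit "write" instruction per line.
SpiEstimate estimateRebuild(unsigned lines) {
  const unsigned waitBytes = 3, write16Bytes = 5;
  const unsigned blocks = lines * 2;      // each instruction is its own block
  const double usPerByte = 2.0;
  SpiEstimate e;
  e.dataBytes = lines * (waitBytes + write16Bytes);
  e.addressBytes = blocks * 2;
  e.pinToggles = blocks * 2;
  e.transferMs = (e.dataBytes + e.addressBytes) * usPerByte / 1000.0;
  return e;
}

// Fraction of one video frame used up by 'ms' milliseconds of work
double frameFraction(double ms, double refreshHz) {
  return ms / (1000.0 / refreshHz);
}
```

For 150 lines this gives 1200 data bytes, 600 address bytes, 600 pin changes and 3.6 milliseconds on the wire, which is roughly a quarter of a 72Hz frame. The cost of the pin changes is not modelled.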

But ... when I ran the program ... YLINE was 175 - it was taking nearly twice as long as I expected to rebuild the copperlist. Something was very wrong!

Investigating the cause...

After a round of checking for bugs/mistakes I decided that the value of YLINE being displayed was correct, it really was taking that long to rebuild the copperlist.

The time spent on SPI data transfer is easy to calculate so that only leaves one possible culprit - toggling the device select pin. Is it really possible that toggling an Arduino output pin 600 times takes nearly as long as transferring 1800 bytes of data over SPI...?

Twenty minutes spent googling turned up an awful lot of complaints about the speed of the digitalWrite() function. Some pages even claimed it was taking as much as fifty times longer than it ought to. A quick look at the Arduino library source code confirmed that digitalWrite() is a very expensive function.

Fortunately a solution was offered: A replacement called digitalWriteFast which claims to compile an Arduino output pin change down to a single machine instruction.

I downloaded the new header file and hacked GD.cpp so that GDClass::__start() and GDClass::__end() functions became:

void GDClass::__start(unsigned int addr) {
  digitalWriteFast(SS_PIN, LOW);
  SPI.transfer(highByte(addr));
  SPI.transfer(lowByte(addr));
}

void GDClass::__end() {
  digitalWriteFast(SS_PIN, HIGH);
}

The result? YLINE instantly dropped from 175 down to 100, exactly where the calculation predicted it should be. A whopping 42 percent of the total time taken to rebuild the copperlist was being wasted just toggling the Gameduino device select pin!

Further optimization

The overhead of the device select is now very small but we still have 600 bytes of data being sent over SPI which are just memory addresses. This represents one third of the total data transfer.

We know all the instructions in our copperlist are in consecutive memory locations so sending all this data is a complete waste of time - we need a smarter version of "CopperlistBuilder".

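Remember that an SPI transaction works like this:

• Set the device select line LOW
• Send two address bytes
• Send any number of data bytes
• Set the device select line HIGH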

ie. Whenever the device select line goes LOW the Gameduino uses the next two bytes received over SPI as address bytes and everything else as data bytes until the device select line goes HIGH again.

I'm a C++ programmer so to me this immediately says "RAII". We need an object which grabs control of SPI and releases it when the object goes out of scope in the destructor. C programmers will probably think "Ugh!" at this point but I don't care. I've done the math many times and C++ always wins.

Our object will look something like this:

class GDwriter {
public:
  // Open SPI transaction
  GDwriter(unsigned int address);

  // Close the transaction when I go out of scope
  ~GDwriter();

  // Close the transaction manually (if needed)
  void close();

  // Start writing to a new address (intelligently)
  void reset(unsigned int address);

  // Send some data over SPI
  // nb. We follow the GD library syntax to make it easier
  // to edit your code...
  GDwriter& wr();
  GDwriter& wr16();
};

I also reworked CopperlistBuilder so that it writes all data to a GDwriter instead of writing it directly.

class CopperlistBuilder {


All the overhead of toggling the device select line and sending address bytes was removed by the intelligent writer. The result was that YLINE went from 100 down to 42, a massive saving!
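
To show why the "intelligent" part matters, here is a host-side mock that counts SPI traffic instead of talking to real hardware. It is a sketch in the spirit of GDwriter, not the library code: "MockBus" and "Writer" are made-up names, and the rule "only re-address when the new address differs from where we already are" is an assumption about how reset() might work.

```cpp
#include <cstdint>
#include <vector>

// Mock SPI bus that just counts traffic (stands in for real hardware)
struct MockBus {
  unsigned selects = 0;      // times the select line went LOW
  unsigned bytesSent = 0;    // address bytes + data bytes
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x8000, 0);
};

// RAII writer sketch: opens a transaction on construction and
// closes it when the object goes out of scope.
class Writer {
public:
  Writer(MockBus& b, uint16_t address) : bus(b) { open(address); }
  ~Writer() { close(); }
  Writer(const Writer&) = delete;
  Writer& operator=(const Writer&) = delete;

  void close() { isOpen = false; }   // select line goes HIGH

  // Start a new transaction only if 'address' isn't where we already are
  void reset(uint16_t address) {
    if (!isOpen || address != pos) { close(); open(address); }
  }

  Writer& wr(uint8_t v) {
    if (!isOpen) open(pos);          // re-open after a manual close()
    bus.ram[pos++] = v;
    ++bus.bytesSent;
    return *this;
  }
  Writer& wr16(uint16_t v) { return wr(v & 0xff).wr(v >> 8); }

  uint16_t address() const { return pos; }

private:
  void open(uint16_t address) {      // select LOW + two address bytes
    ++bus.selects;
    bus.bytesSent += 2;
    pos = address;
    isOpen = true;
  }
  MockBus& bus;
  uint16_t pos = 0;
  bool isOpen = false;
};
```

Writing 300 consecutive four-byte blocks through one Writer costs a single select and 1202 bytes in this mock, instead of 300 selects and 1800 bytes when every block opens its own transaction.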

The original version of the demo took about sixty percent of the available CPU time to rebuild the copperlist, the optimized version takes about fifteen percent - four times faster!

(nb. Pedants will note I haven't factored the vertical blanking time into the equation but the savings still speak for themselves....)


Transferring data over SPI is a big overhead, so any effort spent optimizing SPI transactions is worthwhile. The standard GD library functions add a lot of overhead to every SPI transaction through use of digitalWrite().


libartlum_demo2.zip - the optimized version of the Artlum library with GDwriter and new CopperlistBuilder (also includes an updated version of the sound/raster effect demo).

Code sample 1 - make a copperlist using GDwriter

  // Make a copperlist to change a palette color on line 100
  // The copperlist is stored on the Gameduino at address 0x3f80
  GDwriter gdw(0x3f80);
  CopperlistBuilder cp(gdw);
  cp.write16(PALETTE4A, 0x7fff);
  cp.end();  // And start using the new copperlist

Code sample 2 - modify an existing copperlist using GDwriter

  unsigned int XscrollInst;
  void setup() {
    GDwriter gdw(...);
    CopperlistBuilder cp(gdw);
    XscrollInst = gdw.address();    // Get current output location
    cp.write16(SCROLL_X, 0);
  }

  void loop() {
    GDwriter gdw(XscrollInst);      // Where the 'write16()' instruction is
    CopperlistBuilder cp(gdw);
    cp.write16(SCROLL_X, xscroll);  // Overwrite the previous instruction
  }

Code sample 3 - remember to close GDwriter

  // This code will fail!
  GDwriter gdw(0x3f80);
  CopperlistBuilder cp(gdw);
  cp.write16(PALETTE4A, 0x7fff);

  // Error, you forgot to call gdw.close() before
  // using SPI with other functions

Correct code would be:

  GDwriter gdw(0x3f80);
  CopperlistBuilder cp(gdw);
  cp.write16(PALETTE4A, 0x7fff);

  // Finish the SPI transaction
  gdw.close();

  // OK to use other SPI functions

Or even:

  { GDwriter gdw(0x3f80);    // The gdw object is scoped
    CopperlistBuilder cp(gdw);
    cp.write16(PALETTE4A, 0x7fff);
    // The compiler will do the right thing here...
    // no need for you to do anything
  }

  // OK to use other SPI functions


I also added functions GDrd(), GDrd16(), GDwr() and GDwr16() which work just like the matching "GD." functions except they use digitalWriteFast() to avoid the overhead of the Arduino digitalWrite() function.

With these four functions a quick search and replace in your code can give your sketch an instant speed boost (maybe as much as 50%!)

Digital sound compression

Save memory by compressing your sounds to half size!

Coming soon, honestly...!

Tips and tricks for Gameduino programmers:

  1. Use the smallest possible integers in your code

    The Arduino processor is an 8 bit processor and is happiest working with 8 bit numbers. Using "int" in your sketches means your programs will be much larger and slower than they need to be. Use char for signed integers and byte for unsigned integers wherever possible.

    For extra style points define a small integer type like this:

    typedef int8_t int8;

    That makes it clear when you're using integers and when you're using ASCII characters, eg.

    // A 'char' as an index...?
    for (char i=0; i<10; ++i) {
      thing[i] = i;
    }
    // This seems better to me
    for (int8 i=0; i<10; ++i) {
      thing[i] = i;
    }

    You can use "int8_t" directly if you want to but I think it makes your code look messy and it's much harder to type.

    If you really want 10/10 for good coding practice then do this as well:

    typedef int16_t int16;

    ...then only use int16 and int8 in your code.
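
    Put together, the two typedefs and the loop look like the sketch below. The static_assert lines and the "fillTable()" helper are additions for illustration only. One more reason to prefer int8_t over plain char: whether char is signed or unsigned depends on the compiler, while int8_t is always signed.

```cpp
#include <cstdint>

typedef int8_t  int8;    // -128..127
typedef int16_t int16;   // -32768..32767

// These sizes hold wherever the types exist, unlike 'int', which is
// 16 bits on the Arduino's AVR processor but usually 32 bits on a PC.
static_assert(sizeof(int8) == 1, "int8 must be one byte");
static_assert(sizeof(int16) == 2, "int16 must be two bytes");

// Fill a small table using an 8-bit loop counter
void fillTable(int8* thing, int8 count) {
  for (int8 i = 0; i < count; ++i) {
    thing[i] = i;
  }
}
```

    Just remember the range: an int8 counter can't go past 127, so use int16 (or byte) for anything bigger.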

  2. Wait a little while before calling GD.begin()

    The FPGA takes a few milliseconds to load its microcode so give it time to finish before you start using it. If you don't do this the Gameduino may not power on correctly.

    void setup() {
      delay(250);   // A quarter of a second is enough...
      GD.begin();
    }

  3. Write data to the Gameduino in batches

    All communication with the Gameduino is done via an SPI interface.

    To write a byte of data to the Gameduino you have to:

    • Select the SPI device (set Arduino pin 9 low)
    • Send the high byte of the address over SPI.
    • Send the low byte of the address over SPI.
    • Send the byte of data over SPI
    • Deselect the SPI device (set Arduino pin 9 high)

    It's obvious that there's a lot of overhead if you're only sending a single byte (only about 25% of the time is spent sending actual data, all the rest is overhead!)

    However, there's a trick: Every time you send a data byte the Gameduino's internal address register is incremented. If you send another data byte it will be written to the next address in memory. eg. To write two adjacent bytes you can do this:

    • Select the SPI device (set Arduino pin 9 low)
    • Send the high byte of the address over SPI.
    • Send the low byte of the address over SPI.
    • Send byte 1...
    • Send byte 2...
    • Deselect the SPI device (set Arduino pin 9 high)

    This is a big speedup...about 40% faster than writing the bytes separately!

    The Gameduino library has built-in functions for SPI control so you can easily combine your read/write operations into blocks, eg. to write four bytes:

    // This is slow...
    GD.wr(addr,   1);
    GD.wr(addr+1, 2);
    GD.wr(addr+2, 3);
    GD.wr(addr+3, 4);
    // This is approximately twice as fast as writing individual bytes
    GD.__wstart(addr);
    SPI.transfer(1);
    SPI.transfer(2);
    SPI.transfer(3);
    SPI.transfer(4);
    GD.__end();

    Efficient use of SPI can make a big difference to the speed of your programs.
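
    If you want to see where figures like "twice as fast" come from, a simple byte-counting model is enough. This is only a rough sketch: the function names are made up, and it counts bytes and pin changes without trying to model how long a pin change actually takes.

```cpp
// Bytes on the wire when every data byte gets its own transaction:
// two address bytes plus one data byte, every time.
unsigned bytesSeparate(unsigned n) { return n * 3; }

// Bytes on the wire when all n bytes share one transaction
unsigned bytesBatched(unsigned n) { return n ? 2 + n : 0; }

// Select pin changes (LOW, then HIGH again) for each approach
unsigned togglesSeparate(unsigned n) { return n * 2; }
unsigned togglesBatched(unsigned n)  { return n ? 2 : 0; }
```

    For the four-byte example that's 12 bytes and 8 pin changes when written separately, against 6 bytes and 2 pin changes as one block. The saving keeps growing with the size of the block.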

    nb. This tip has turned into a full article

  4. The most important tip of all: Keep checking this page for updates... ;-)