If you don't know what a Gameduino is, start here.
There are different types of joystick in the world, and on an Arduino there are plenty of different pins to connect the wires to. To ensure that every Gameduino game works with everybody's joystick with the minimum of fuss, we need a joystick library which sorts out the physical differences.
The library needs to:
With this in mind, I propose the following solution: joystick_lib.zip
Unzip the 'joystick_lib' folder and copy it into your Arduino projects folder.
If everything is correct you'll have a new entry called 'joystick_lib' in your Arduino sketches menu:
The sample sketch will display the current state of your joystick on screen. If you have a joystick and it's configured correctly, you should see the display change as you move the stick and press the buttons.
If you're not seeing this then maybe you need to...
By default the library is configured for the SparkFun joystick shield. If you have a different joystick, you need to edit "joystick.cpp" and define the layout of your joystick.
See comments in "joystick.cpp" for more info on how to do this.
To use the library in your own sketches, copy the files "joystick.cpp" and "joystick.h" into the sketch folder (close the Arduino editor while you do this so it will recognize the new files).
Advantages:
Simple! Just copy your customized version of joystick.cpp into the game's sketch folder (overwrite the existing file) and you're good to go...
/*--------------------------------------------------------
Read the joystick and continuously display the state
using 'Joystick::dump()'
---------------------------------------------------------*/
// Uncomment this line to select non-Sparkfun joystick
//#define SPARKFUN 0
#include <SPI.h>
#include <GD.h>
#include <joystick.h>
// Create a joystick object
Joystick joystick;
void setup()
{
  delay(250); // Give Gameduino time to boot
  GD.begin();
  GD.ascii();
  // Calibrate analogue joystick at startup (optional)
  joystick.recalibrate();
}
void loop()
{
  joystick.read();
  joystick.dump(4,10);
  GD.waitvblank();
}
/*--------------------------------------------------------
Use the joystick in a game
---------------------------------------------------------*/
#include <SPI.h>
#include <GD.h>
#include <joystick.h>
// Create a joystick object
Joystick joystick;
void setup()
{
  delay(250);
  GD.begin();
  ...
}
void loop()
{
  joystick.read();
  if (joystick.isPressed(Joystick::ButtonA)) {
    if (joystick.changed(Joystick::ButtonA)) {
      // Button A went from "not-pressed" to "pressed" state...
      fireABullet();
    }
  }
  if (joystick.left()) {
    movePlayerLeft();
  }
  if (joystick.right()) {
    movePlayerRight();
  }
}
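The isPressed()/changed() pair used above amounts to edge detection: the bullet fires once when the button goes down, not on every frame while it is held. Here is a minimal host-side sketch of how that could work, assuming the library remembers the previous frame's button bits. The class name "ButtonTracker" and its members are hypothetical and are not taken from joystick.cpp.

```cpp
#include <cstdint>

// Hypothetical stand-in for the button half of a joystick class.
// The real joystick.cpp may track state differently.
class ButtonTracker {
public:
    // Call once per frame with the raw button bits (1 = pressed).
    void read(uint8_t rawBits) {
        previous_ = current_;
        current_ = rawBits;
    }

    // True while the button is held down.
    bool isPressed(uint8_t mask) const {
        return (current_ & mask) != 0;
    }

    // True only on the frame where the button's state flipped.
    bool changed(uint8_t mask) const {
        return ((current_ ^ previous_) & mask) != 0;
    }

private:
    uint8_t current_ = 0;
    uint8_t previous_ = 0;
};
```

Combining the two tests gives "just pressed"; testing changed() on its own would also trigger when the button is released.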
See also: Making a Gameduino joystick and Using an NES controller with a Gameduino
Extra: Here's a version of Gameduino Asteroids (aka "potato shoot") which I modified to use the joystick library - it's much easier to play! (nb. I also hacked the game to make the background stars scroll much more smoothly...enjoy!)
The easiest way to include binary data in your Gameduino sketches is to convert it to source code (eg. header files) then include it directly in your sketches.
Here's a command line tool to compress files in 'Gameduino' format and output them as source code. The data can be decompressed using the GD library's built-in "uncompress()" function.
gdcompress.zip - executable file for Windows, plus complete C++ source code.
Usage:
gdcompress [-n name] [-u] [-v[v]] <file>
<file> is the name of the file to convert to source code.
"name" is the name of the array in the output source code.
'-u' will disable data compression.
'-v' will display compression statistics on the console.
nb. Type "gdcompress -?" to see these options.
Given a file "font8x8.bin", the following command:
gdcompress -n compressed_font font8x8.bin >font.h
Will create a file "font.h" containing the following:
static PROGMEM prog_uchar compressed_font[432] = {
0x41,0x01,0x03,0x00,0x08,0x70,0x50,0x60,0x04,0x28,0x28,0x00,0xD8,0x04,0x20,0xD2,
0xD8,0xB0,0xF1,0x0F,0x28,0x36,0x00,0x60,0xF0,0xB3,0xC0,0x07,0x9A,0x5F,0x4C,0x18,
0x30,0x03,0x03,0x03,0x03,0x33,0x60,0x00,0x70,0x24,0x00,0x07,0x5B,0x66,0xB8,0xA3,
0x13,0x96,0x5C,0x47,0x35,0x00,0x30,0xC0,0x00,0x08,0x6C,0x00,0x22,0x07,0x83,0x1F,
0x9E,0x74,0x56,0x28,0x2B,0xC1,0xAB,0x73,0x9A,0xF9,0xB5,0xBE,0xCD,0x09,0x04,0x56,
0x5E,0x06,0x7D,0xC1,0xC3,0x8C,0x1D,0x3F,0x6E,0xCC,0xF0,0xF4,0x07,0xC7,0xC0,0x12,
0x11,0x9E,0x18,0x05,0x1F,0x0E,0xB0,0xE8,0x82,0x81,0x83,0x87,0x0D,0xBF,0x5A,0xE4,
0x81,0x01,0x1F,0x60,0xE1,0x83,0x93,0x05,0x3E,0xCC,0xD2,0xCB,0x2F,0x7F,0xBD,0xC8,
0x6F,0x1E,0xC2,0x17,0x5C,0x7C,0x95,0xC0,0x71,0xF2,0xFA,0x07,0x7F,0x13,0x4D,0x9B,
0x5F,0xBF,0x06,0xB4,0xDA,0x9E,0x75,0xF7,0x3C,0xEF,0xEB,0x2F,0x2B,0x76,0x0C,0xF2,
0xCF,0xAF,0x3A,0xCC,0x00,0xD4,0x31,0x50,0xD3,0x65,0x10,0x80,0xE9,0xF2,0x60,0x13,
0x55,0x36,0x3C,0xFA,0x3B,0x80,0x88,0x14,0x36,0xF8,0x3B,0x88,0x2F,0xBB,0xE1,0x65,
0x36,0xFB,0x6F,0xDC,0x3F,0xBE,0xD5,0xAC,0x85,0xCD,0x70,0x32,0xCB,0x21,0xB7,0x00,
0x8A,0x5A,0xC2,0x5F,0x30,0xE6,0xCE,0x9F,0x35,0x6B,0xC6,0x8C,0xD3,0xCB,0x8D,0x5F,
0x0E,0xB7,0xD7,0xCE,0xEF,0xB4,0xAF,0xFC,0x7D,0x64,0xC5,0x06,0x5B,0xFF,0x3B,0xD9,
0x3F,0x3C,0xFF,0xAB,0xFE,0x25,0xD3,0xDD,0xFE,0x6C,0x3D,0xA2,0x14,0xE5,0xE0,0xCF,
0xBD,0x7C,0x61,0xF8,0xED,0xF0,0xA2,0xF2,0x4B,0x7E,0x77,0x5B,0xEE,0x92,0x36,0xF9,
0x9D,0xBF,0x01,0xFB,0x07,0x3E,0x60,0x01,0xC2,0xF7,0x91,0x1F,0x10,0x8A,0x42,0xC0,
0x83,0xE5,0x1F,0x0E,0x36,0x0C,0x7C,0x30,0x60,0xB8,0x4E,0x0A,0xDA,0x81,0x8F,0x59,
0x7C,0x9E,0xF9,0xC1,0xEC,0x3E,0xEC,0xBF,0x99,0x7D,0x64,0x76,0xC2,0xF7,0x9B,0xF7,
0xC0,0x49,0x43,0x7A,0x31,0x14,0x32,0xBA,0x55,0xC0,0x93,0xFE,0xFF,0x2E,0x04,0xC7,
0x7F,0xCF,0x13,0xFC,0xCD,0x70,0xF8,0x0E,0x33,0x36,0xFF,0x0B,0x7B,0x74,0x47,0x80,
0x8D,0xFF,0x46,0xFE,0x33,0x81,0xF4,0xD7,0xFF,0x7F,0xE1,0x5F,0x63,0xF9,0x97,0x00,
0x7C,0x03,0x6C,0xB8,0xED,0xB2,0x7E,0x6D,0x01,0xB6,0x9F,0x76,0x63,0x80,0xB3,0xC9,
0xFF,0x0D,0xEF,0xFF,0x87,0x64,0x5C,0x0E,0xFE,0xD8,0xA2,0xF7,0x9D,0xCB,0xC4,0x7F,
0x7C,0x1F,0xE0,0x07,0x06,0xC6,0x79,0x83,0x30,0x98,0x09,0x80,0x79,0xCF,0x54,0x02,
0x8F,0x61,0x2E,0xE0,0x16,0xB0,0x1D,0x8C,0xAC,0x89,0xFD,0xDB,0x7F,0x80,0x83,0x02
};
To use the data in a sketch, use the uncompress() function...
#include "font.h"
void makeFont()
{
  // GD.uncompress() takes the destination address first, then the data
  GD.uncompress(RAM_CHR+..., compressed_font);
}
Defend the Earth against the attacking aliens in a pixel-perfect Gameduino version of this classic game!
Download the complete sketch here
Update: David Cuartielles (one of the creators of Arduino) playing Gameduino Invaders at Campus Party 2011.
Here's a little utility I wrote to make it much easier to do split screen effects on a Gameduino.
I used this method for the colors in Gameduino Invaders. All the sprites in the game are actually the same color but the coprocessor changes the sprite color palette as the screen is drawn: red in the area where the flying saucer appears; white in the area where the invaders are; green at the bottom for the shields and player.
If you look very closely at the game you can see the color of the player's bullet changing as it goes through the different zones. Also the invader's bombs...
This effect is totally authentic. The original arcade machines had strips of colored plastic stuck to the screen to achieve it. These days we emulate it using J1 coprocessors...
The way it works is that you set up a series of 'instructions' in the Gameduino RAM (a "copperlist") and a special coprocessor program then interprets them as the video frame progresses.
The code has two classes:
'Coprocessor' - this class initializes the Gameduino coprocessor and loads the microcode to interpret copperlists. You need to initialize this in your sketch's "setup()" function.
#include "artlum_util.h"
void setup()
{
  delay(250);
  GD.begin();
  // Load the copperlist code
  Coprocessor::reset();
  ...
}
'CopperlistBuilder' - this class helps you build copperlists in Gameduino RAM.
eg. Create a copperlist in memory at location 0x3f00. The copperlist scrolls the top part of the screen in X and Y, then resets the scroll registers to (0,0) on line 200.
CopperlistBuilder cp(0x3f00);
cp.write16();
// See section "modifying copperlists"...
scrollInst = cp.location();
// Set scroll X,y for top part of screen
cp.write16(SCROLL_X,0);
cp.write16(SCROLL_Y,0);
// Wait for line 200
cp.wait(200);
// Set scroll to 0,0
cp.write16(SCROLL_X,0);
cp.write16(SCROLL_Y,0);
// Add a 'halt' instruction and start
// executing the copperlist...
cp.end();
The available copperlist instructions are:
wait(line) | Wait for a screen line.
write(addr,val) | Write a byte to Gameduino memory.
write16(addr,val) | Write a 16-bit value to memory (eg. a scroll register...)
copy(from,to,size) | Copy a block of Gameduino memory from one place to another (eg. a color palette).
'halt' | Stop executing instructions and wait for the next video frame (nb. this is added automatically by "CopperlistBuilder::end()").
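To make the idea of "instructions interpreted as the frame progresses" concrete, here is a small host-side simulation of the instructions listed above. It is only a sketch under assumptions: the "Instr" structure, the "runFrame" function and the register address are invented for the demo and say nothing about the real byte encoding or the coprocessor microcode. It handles wait, write16 and halt only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of a copperlist (not the real encoding).
enum class Op { Wait, Write16, Halt };
struct Instr {
    Op op;
    uint16_t a;  // Wait: screen line.  Write16: address.
    uint16_t b;  // Write16: value.
};

// Placeholder address of the register we watch in the demo.
const uint16_t kWatchedReg = 0x2804;

// Run the list over one frame and record the watched register's
// value as seen by each screen line.
std::vector<uint16_t> runFrame(const std::vector<Instr>& list,
                               unsigned int lines, uint16_t initial) {
    std::vector<uint16_t> perLine(lines, 0);
    uint16_t reg = initial;
    std::size_t pc = 0;
    bool halted = false;
    for (unsigned int y = 0; y < lines; ++y) {
        // Execute until an instruction has to wait for a later line.
        while (!halted && pc < list.size()) {
            const Instr& in = list[pc];
            if (in.op == Op::Wait) {
                if (y < in.a) break;  // not there yet
                ++pc;
            } else if (in.op == Op::Write16) {
                if (in.a == kWatchedReg) reg = in.b;
                ++pc;
            } else {
                halted = true;  // stop until the next frame
            }
        }
        perLine[y] = reg;
    }
    return perLine;
}
```

Feeding it "write16, wait(200), write16, halt" gives one value for lines 0 to 199 and another from line 200 onwards, which is the split screen effect in miniature.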
We need to be able to update copperlists to reflect current values from the game, eg. In the copperlist above we need to be able to set the game's X,Y scroll position.
If you look at the code which builds the list you'll see we set a variable "scrollInst" just before the write to the scroll registers. This points to the memory which needs to be updated. Now we use CopperlistBuilder again:
unsigned int scrollInst;
...
// Write current values of 'sx' and 'sy' into the copperlist...
CopperlistBuilder cp;
cp.rebuild(scrollInst); // Start rebuild here
cp.write16(SCROLL_X,sx);
cp.write16(SCROLL_Y,sy);
// nb. Do NOT do "cp.end()" when rebuilding...!
The Gameduino can play back digital sound via the SAMPLE_L and SAMPLE_R registers. These registers are accessible from the host CPU, but sample playback needs accurate timing, which would be difficult to achieve while doing other processing. The best solution is to play back the sound with the coprocessor.
The Gameduino demo "Sample playback" shows the basic method for doing this. A buffer is created in Gameduino memory and the coprocessor is programmed to play back this buffer in a loop. The main CPU can then keep the buffer filled with sample data, making sure it stays ahead of the coprocessor.
The code will look something like this:
// "sampleReadPos" is the 8-bit address the coprocessor is
// reading from in the playback buffer. This value will
// advance as the coprocessor consumes the sample data.
unsigned int rp = sampleReadPos;
// "sampleWritePos" is the 8-bit address of the last sample
// we wrote to the buffer.
unsigned int wp = sampleWritePos;
// The difference between "sampleReadPos" and "sampleWritePos"
// is the amount of empty space in the buffer. We need to fill
// this space with new sample data.
unsigned int emptySpace = (rp-wp)&255;
// nb. We leave a tiny gap in the buffer to avoid
// confusion between '0' and '256' when using 8-bit math.
if (emptySpace > bufferGap) {
  emptySpace -= bufferGap;
  // Write new samples to the buffer
  writeSamples(emptySpace);
}
The function "writeSamples()" writes new sample data to the buffer (updating "sampleWritePos" in the process).
256 bytes is an obvious choice for the playback buffer size because Gameduino memory is organized in 256 byte pages. eg. We can use a color palette or a sprite image for the playback buffer.
256 bytes is also easy to code for - we're working with a circular buffer, and 8-bit integers naturally wrap around when they reach 256.
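The wrap-around arithmetic is easy to check on a PC. The sketch below repeats the "(rp-wp)&255" calculation from the snippet above as a standalone function. The function name and the gap of 8 bytes are assumptions made for the example, not values taken from the library.

```cpp
#include <cstdint>

// Assumed safety gap in bytes; the library's real value may differ.
const unsigned int kBufferGap = 8;

// How many bytes can be written to a 256-byte ring without
// overtaking the reader. Positions are offsets within the page.
unsigned int writableBytes(uint8_t readPos, uint8_t writePos) {
    // The subtraction may go negative; masking with 255 wraps it
    // back into the range 0..255, just like 8-bit arithmetic.
    unsigned int emptySpace =
        static_cast<unsigned int>(readPos - writePos) & 255u;
    if (emptySpace <= kBufferGap) {
        return 0;
    }
    return emptySpace - kBufferGap;
}
```

With the reader at 10 and the writer at 250 the free space wraps past the end of the page: (10-250)&255 is 16, so 8 bytes can be written once the gap is taken off.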
Choosing a playback rate is a bit more difficult. High sample rates obviously sound nicer but sound data occupies a lot of memory, something which most Gameduino hosts don't have.
An 8kHz sample rate seems a good compromise for the limitations of the system. Two seconds of sound at 8kHz is enough for a few simple sound effects and will fit in most Arduinos with enough memory left over for a game.
The sound playback library below is based on the "copperlist" library presented earlier. With the new library you can do raster effects and sample playback at the same time! (This is the code used by Gameduino Invaders...)
The coprocessor code in the copperlist demo was extended to play back sample data at 8kHz and the "Coprocessor" object in the library was extended as follows:
The "reset()" function now takes an optional parameter to set the location of the sample playback buffer.
void setup()
{
  delay(250);
  GD.begin();
  // Start the coprocessor with sample buffer at 0x3f00
  Coprocessor::reset(0x3f00);
  ...
}
A function was added to get the current sample read position:
class Coprocessor {
public:
  ...
  // Where the coprocessor is currently reading
  // samples from in the sample buffer.
  static byte sampleReadPos();
};
Now we need functions to manage the sounds and put sample data into the buffer. A new object "SoundController" was created for this. It has the following functions:
reset()
Stop all sounds.
update()
Keep the sample buffer filled with sample data. You need to call this function at least once per video frame.
playSample(prog_char *data, int numSamples, byte channel)
Play back a sample from program memory. The function needs a pointer to the sample data (in PROGMEM), the number of samples to play ("numSamples") and the sound channel to play the sample on ("channel").
/*---------------------------------------------------
Demonstration of raster effects and sample playback
using libartlum.
---------------------------------------------------*/
#include
libartlum_demo.zip - complete sketch which does copperlist raster effects and uses the joystick buttons to play back sampled drum sounds.
The sound player in this version of the library supports four sound channels.
The digital sound playback demo sketch I wrote last week had some split screen and raster effects built into it to give the new version of the artlum library a workout. A big part of this raster effect was rebuilding a copperlist to change the SCROLL_X register on every line of the bottom half of the screen.
For every screen line there was a "wait" instruction (3 bytes) plus a 16-bit "write" instruction (5 bytes). This gives a total of 150*(3+5) = 1200 bytes of data written via SPI.
But...for each copperlist instruction there's also the overhead of setting up the SPI transfer. Remember that an SPI write needs to do the following:
ie. For each block of memory written there are two extra bytes of data for the address, plus you have to toggle an Arduino output pin from high to low and back again.
The copperlist rebuild writes 300 blocks of data so we need to add 600 more bytes of SPI traffic to the 1200 bytes of actual data, giving 1800 total. There are also 600 calls to digitalWrite() to set the state of the Gameduino device select.
It takes two microseconds to transfer one byte of data to the Gameduino over SPI so the data transfer should take about 3.6 milliseconds with all the rest of the time spent toggling the Arduino output pin.
To time the copperlist rebuild I looked at the coprocessor YLINE register after the rebuild and displayed it on screen. This is a simple method, but very effective when you're trying to get things running in a single video frame (which is game programmer Nirvana). The Artlum library has some simple functions to read/display the current raster line:
// Read current value of YLINE
unsigned int yline = Coprocessor::yline();
// Display it at (30,12)
showNumber(yline,30,12);
One video frame at 72Hz is about 14 milliseconds. Our finger-in-the-air calculation says we need about 25% of that for the SPI data transfer plus a bit extra for SPI device selection. There are 300 video lines so I expected YLINE to be somewhere around 100...
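The finger-in-the-air calculation can be written out as a few lines of host-side code. Every number comes from the text above; the struct and function names are made up for illustration.

```cpp
// Back-of-envelope estimate of the copperlist rebuild cost.
struct RebuildEstimate {
    unsigned int dataBytes;
    unsigned int addressBytes;
    unsigned int totalBytes;
    double transferMs;
    double frameMs;
    double expectedLine;
};

RebuildEstimate estimateRebuild() {
    const unsigned int lines = 150;              // screen lines rebuilt
    const unsigned int waitBytes = 3;            // one "wait" instruction
    const unsigned int write16Bytes = 5;         // one 16-bit "write"
    const unsigned int blocksPerLine = 2;        // one SPI block per instruction
    const unsigned int addressBytesPerBlock = 2;
    const double microsPerByte = 2.0;            // SPI transfer time per byte
    const double frameRateHz = 72.0;
    const unsigned int linesPerFrame = 300;

    RebuildEstimate e;
    e.dataBytes = lines * (waitBytes + write16Bytes);               // 1200
    e.addressBytes = lines * blocksPerLine * addressBytesPerBlock;  // 600
    e.totalBytes = e.dataBytes + e.addressBytes;                    // 1800
    e.transferMs = e.totalBytes * microsPerByte / 1000.0;           // 3.6
    e.frameMs = 1000.0 / frameRateHz;                               // about 13.9
    e.expectedLine = linesPerFrame * e.transferMs / e.frameMs;      // about 78
    return e;
}
```

The data transfer alone accounts for roughly 78 of the 300 lines. The guess of "around 100" leaves the remainder as an allowance for device selection.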
But ... when I ran the program ... YLINE was 175 - it was taking nearly twice as long as I expected to rebuild the copperlist. Something was very wrong!
After a round of checking for bugs/mistakes I decided that the value of YLINE being displayed was correct, it really was taking that long to rebuild the copperlist.
The time spent on SPI data transfer is easy to calculate so that only leaves one possible culprit - toggling the device select pin. Is it really possible that toggling an Arduino output pin 600 times takes nearly as long as transferring 1800 bytes of data over SPI...?
Twenty minutes spent googling turned up an awful lot of complaints about the speed of the digitalWrite() function. Some pages even claimed it was taking as much as fifty times longer than it ought to. A quick look at the Arduino library source code confirmed that digitalWrite() is a very expensive function.
Fortunately a solution was offered: A replacement called digitalWriteFast which claims to compile an Arduino output pin change down to a single machine instruction.
I downloaded the new header file and hacked GD.cpp so that the GDClass::__start() and GDClass::__end() functions became:
void GDClass::__start(unsigned int addr)
{
  digitalWriteFast(SS_PIN, LOW);
  SPI.transfer(highByte(addr));
  SPI.transfer(lowByte(addr));
}
void GDClass::__end()
{
  digitalWriteFast(SS_PIN, HIGH);
}
The result? YLINE instantly dropped from 175 down to 100, exactly where the calculation predicted it should be. A whopping 42 percent of the total time taken to rebuild the copperlist was being wasted just toggling the Gameduino device select pin!
The overhead of the device select is now very small but we still have 600 bytes of data being sent over SPI which are just memory addresses. This represents one third of the total data transfer.
We know all the instructions in our copperlist are in consecutive memory locations so sending all this data is a complete waste of time - we need a smarter version of "CopperlistBuilder".
Remember that an SPI transaction works like this:
I'm a C++ programmer so to me this immediately says "RAII". We need an object which grabs control of SPI in its constructor and releases it in its destructor when the object goes out of scope. C programmers will probably think "Ugh!" at this point but I don't care. I've done the math many times and C++ always wins.
Our object will look something like this:
class GDwriter {
public:
  // Open SPI transaction
  GDwriter(unsigned int address);
  // Close the transaction when I go out of scope
  ~GDwriter();
  // Close the transaction manually (if needed)
  void close();
  // Start writing to a new address (intelligently)
  void reset(unsigned int address);
  // Send some data over SPI
  //
  // nb. We follow the GD library syntax to make it easier
  // to edit your code...
  GDwriter& wr();
  GDwriter& wr16();
};
I also reworked CopperlistBuilder so that it writes all data to a GDwriter instead of writing it directly:
class CopperlistBuilder {
public:
  CopperlistBuilder(GDwriter&);
  ...
};
All the overhead of toggling the device select line and sending address bytes was removed by the intelligent writer. The result was that YLINE went from 100 down to 42, a massive saving!
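To see where the saving comes from, here is a host-side mock of the RAII idea. It is a sketch, not the library's GDwriter: "MockBus" and "ScopedWriter" are hypothetical names, the bus only counts traffic, and the address bytes are sent without whatever flag bits the real hardware expects.

```cpp
#include <cstdint>

// Host-side stand-in for the SPI bus: it only counts traffic.
struct MockBus {
    unsigned int selectToggles = 0;
    unsigned int bytesSent = 0;
    void select()   { ++selectToggles; }
    void deselect() { ++selectToggles; }
    void transfer(uint8_t) { ++bytesSent; }
};

// Hypothetical RAII writer: opens one transaction in the constructor,
// streams data, and closes the transaction on scope exit.
class ScopedWriter {
public:
    ScopedWriter(MockBus& bus, uint16_t address) : bus_(bus), open_(true) {
        bus_.select();
        bus_.transfer(static_cast<uint8_t>(address >> 8));
        bus_.transfer(static_cast<uint8_t>(address & 0xff));
    }
    ~ScopedWriter() { close(); }

    // One owner per transaction, so copying is disabled.
    ScopedWriter(const ScopedWriter&) = delete;
    ScopedWriter& operator=(const ScopedWriter&) = delete;

    // Safe to call more than once; only the first call deselects.
    void close() {
        if (open_) {
            bus_.deselect();
            open_ = false;
        }
    }

    ScopedWriter& wr(uint8_t v) {
        bus_.transfer(v);
        return *this;
    }
    ScopedWriter& wr16(uint16_t v) {
        bus_.transfer(static_cast<uint8_t>(v & 0xff));  // low byte first
        bus_.transfer(static_cast<uint8_t>(v >> 8));
        return *this;
    }

private:
    MockBus& bus_;
    bool open_;
};
```

Streaming 150 lines of 3+5 bytes through one ScopedWriter costs 1202 bytes and 2 pin toggles on the mock bus, against the 1800 bytes and 600 toggles counted earlier for one transaction per instruction.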
The original version of the demo took about sixty percent of the available CPU time to rebuild the copperlist; the optimized version takes about fifteen percent - four times faster! (nb. Pedants will note I haven't factored the vertical blanking time into the equation but the savings still speak for themselves....)
Transferring data over SPI is a big overhead, so any effort spent optimizing SPI transactions is worthwhile. The standard GD library functions add a lot of overhead to every SPI transaction through use of digitalWrite().
libartlum_demo2.zip - the optimized version of the Artlum library with GDwriter and new CopperlistBuilder (also includes an updated version of the sound/raster effect demo).
// Make a copperlist to change a palette color on line 100
// The copperlist is stored on the Gameduino at address 0x3f80
GDwriter gdw(0x3f80);
CopperlistBuilder cp(gdw);
cp.wait(100);
cp.write16(PALETTE4A, 0x7fff);
cp.end(); // And start using the new copperlist
unsigned int XscrollInst;
void setup() {
  ...
  GDwriter gdw(...);
  CopperlistBuilder cp(gdw);
  cp.wait(220);
  XscrollInst = gdw.address(); // Get current output location
  cp.write16(SCROLL_X, 0);
  cp.end();
}
void loop() {
  GDwriter gdw(XscrollInst); // Where the 'write16()' instruction is
  CopperlistBuilder cp(gdw);
  cp.write16(SCROLL_X, xscroll); // Overwrite the previous instruction
}
// This code will fail!
GDwriter gdw(0x3f80);
CopperlistBuilder cp(gdw);
cp.wait(100);
cp.write16(PALETTE4A, 0x7fff);
cp.end();
// Error, you forgot to call gdw.close() before
// using SPI with other functions
GD.wr16(...);
Correct code would be:
GDwriter gdw(0x3f80);
CopperlistBuilder cp(gdw);
cp.wait(100);
cp.write16(PALETTE4A, 0x7fff);
cp.end();
// Finish the SPI transaction
gdw.close();
// OK to use other SPI functions
GD.wr16(...);
Or even:
{
  GDwriter gdw(0x3f80); // The gdw object is scoped
  CopperlistBuilder cp(gdw);
  cp.wait(100);
  cp.write16(PALETTE4A, 0x7fff);
  cp.end();
  // The compiler will do the right thing here...
  // no need for you to do anything
}
// OK to use other SPI functions
GD.wr16(...);
I also added functions GDrd(), GDrd16(), GDwr() and GDwr16() which work just like the matching "GD." functions except they use digitalWriteFast() to avoid the overhead of the Arduino digitalWrite() function.
With these four functions a quick search and replace in your code can give your sketch an instant speed boost (maybe as much as 50%!)
Save memory by compressing your sounds to half size!
Coming soon, honestly...!
Use the smallest possible integers in your code
The Arduino processor is an 8 bit processor and is happiest working with 8 bit numbers. Using "int" in your sketches means your programs will be much larger and slower than they need to be. Use char for signed integers and byte for unsigned integers wherever possible.
For extra style points define a small integer type like this:
typedef int8_t int8;
That makes it clear when you're using integers and when you're using ASCII characters, eg.
// A 'char' as an index...?
for (char i=0; i<10; ++i) {
  thing[i] = i;
}
// This seems better to me
for (int8 i=0; i<10; ++i) {
  thing[i] = i;
}
You can use "int8_t" directly if you want to but I think it makes your code look messy and it's much harder to type.
If you really want 10/10 for good coding practice then do this as well:
typedef int16_t int16;
...then only use int16 and int8 in your code.
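Here is a small sketch of the two typedefs in use, written so it also compiles on a PC. The function "sumTo" is just an example I made up. Note the one trap with 8-bit loop counters: an int8 cannot count past 127, so the loop bound has to stay below that.

```cpp
#include <cassert>
#include <cstdint>

typedef int8_t int8;
typedef int16_t int16;

// These sizes are fixed on every platform, unlike plain "int"
// (16 bits on an AVR Arduino, usually 32 bits on a PC).
static_assert(sizeof(int8) == 1, "int8 is one byte");
static_assert(sizeof(int16) == 2, "int16 is two bytes");

// Sum 1..n using an 8-bit loop counter and a 16-bit total.
int16 sumTo(int8 n) {
    // With n == 127 the test "i <= n" could never become false,
    // because the counter cannot reach 128. Keep the bound below that.
    assert(n >= 0 && n < 127);
    int16 total = 0;
    for (int8 i = 1; i <= n; ++i) {
        total = static_cast<int16>(total + i);
    }
    return total;
}
```

The largest result, for n = 126, is 8001, which fits comfortably in an int16.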
Wait a little while before calling GD.begin()
The FPGA takes a few milliseconds to load its microcode so give it time to finish before you start using it. If you don't do this the Gameduino may not power on correctly.
void setup()
{
  delay(250); // A quarter of a second is enough...
  GD.begin();
  ...
}
Write data to the Gameduino in batches
All communication with the Gameduino is done via an SPI interface.
To write a byte of data to the Gameduino you have to:
It's obvious that there's a lot of overhead if you're only sending a single byte (only about 25% of the time is spent sending actual data, all the rest is overhead!)
However, there's a trick: Every time you send a data byte the Gameduino's internal address register is incremented. If you send another data byte it will be written to the next address in memory. eg. To write two adjacent bytes you can do this:
This is a big speedup...about 40% faster than writing the bytes separately!
The Gameduino library has built-in functions for SPI control so you can easily combine your read/write operations into blocks, eg. to write four bytes:
// This is slow...
GD.wr(addr, 1);
GD.wr(addr+1, 2);
GD.wr(addr+2, 3);
GD.wr(addr+3, 4);
// This is approximately twice as fast as writing individual bytes
GD.__wstart(addr);
SPI.transfer(1);
SPI.transfer(2);
SPI.transfer(3);
SPI.transfer(4);
GD.__end();
Efficient use of SPI can make a big difference to the speed of your programs.
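A quick way to see the difference is to count what goes over the wire. This host-side sketch assumes each SPI transaction costs two address bytes plus one select and one deselect of the device pin, as described in this tip. The struct and function names are invented for the example.

```cpp
// Traffic needed to write n adjacent bytes to the Gameduino.
struct SpiCost {
    unsigned int bytesOnWire;
    unsigned int pinToggles;
};

// One transaction per byte: select, 2 address bytes, 1 data byte, deselect.
SpiCost costSeparate(unsigned int n) {
    SpiCost c = { 3 * n, 2 * n };
    return c;
}

// One transaction for the whole run: the address is sent once and
// the Gameduino increments it after every data byte.
SpiCost costBatched(unsigned int n) {
    SpiCost c = { 0, 0 };
    if (n > 0) {
        c.bytesOnWire = 2 + n;
        c.pinToggles = 2;
    }
    return c;
}
```

For the four byte example, separate writes send 12 bytes and toggle the pin 8 times; the batched version sends 6 bytes and toggles it twice, which fits the "approximately twice as fast" figure.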
nb. This tip has turned into a full article
The most important tip of all: Keep checking this page for updates... ;-)