Frameskipping, interpolation and re-entrant IRQ code by Lasse Öörni
--------------------------------------------------------------------
This is a theoretical rant about what you can possibly do when it seems you run
out of rastertime and your game/demo starts to slow down. Of course, the
obvious solution is to optimize code or leave routines out entirely, but this
is not about that...
1. Running out of time
----------------------
Traditionally games & demos use frame-based movement; for example 50 times a
second the screen & sprites move a little bit to create the illusion of
motion. But for the C64, trying to do all the movement/logic code, as well
as the actual graphics code (scrolling the screen, raster IRQs etc.) on
each frame can simply be too much.
If a program has been designed well, going "over" that "limit" doesn't involve
ugly graphical effects like flickering/jerking of screen, but it's just visible
as an overall slowdown. Practically, as I wrote in a previous rant, this
design involves making sure that no matter what happens, raster interrupts
have enough time to do their screen-update task; in fact their task should be
just the immediate setting of VIC registers (like screen-splits and sprite
multiplexing) and playing music/sound.
All the time-consuming things like movement, AI & scrolling are done in
the main program; if it is too "slow" the screen will nevertheless show
correctly. However, this rant challenges that tried-and-true approach with
a new, more complex but more powerful approach.
The basic question is: what can we do once we see that the C64 simply can't
handle the load? Of course, there's no magic in this; no approach can
give you more clock cycles, but there are ways to sort of "cheat", while
keeping the internal logic of the program uncompromised.
2. Movement/logic vs. graphics update
-------------------------------------
To know what we can do about the slowdown, we have to see whether two distinct
areas can be identified in the code:
Movement/logic
- For advancing the internal state of the game/demo. This means moving
the characters, processing AI, doing collision detection, and executing
virtual machine bytecode :)
Graphics update
- For rendering the internal state onscreen. On C64, this means things like
scrolling (shifting the screen memory if no VIC tricks are used), sorting
the sprites for multiplexing, and actually showing the multiplexed sprites.
Usually in games these are easy to separate, while in demo effects this is
not always the case. If they can't be separated in your project, the rest of
this rant will not be of much use to you.
3. Frameskipping
----------------
Let's first look at an approach many know from PC games/demos: frameskipping.
PCs are usually powerful enough to handle the AI & movement on each frame but
not necessarily to render complex graphics scenes.
The idea is to keep track of when a frame was last rendered onscreen; if,
for example, the previous rendering was 3 frames ago, call the movement/logic
code 3 times before the next rendering call. Now the motion becomes more
jerky, as not every frame is drawn, but the apparent speed remains the same.
Frameskipping can be very useful in 3D games or isometric 2D bitmap games
even on the C64. Of course, if the movement/logic code also takes a lot of
time, it will not help much.
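A minimal sketch of that bookkeeping, not taken from any actual game: it
assumes that a raster IRQ increments a free-running counter once per frame.
The labels framecount, lastframe, logiccount, movelogic and render, the
zeropage addresses and the MAXSKIP clamp are all assumptions made for
illustration.

```asm
framecount = $02                ;Assumed: incremented once per frame by an IRQ
lastframe  = $03                ;Counter value at the previous rendering
logiccount = $04                ;Logic calls still owed
MAXSKIP    = 4                  ;Assumed upper limit for skipped frames

mainloop:
        lda framecount          ;Read the counter only once, so an IRQ
        tax                     ;in between cannot make us lose a frame
        sec
        sbc lastframe           ;A = frames elapsed since last rendering
        beq mainloop            ;Nothing elapsed yet: wait
        stx lastframe
        cmp #MAXSKIP+1          ;Clamp, so one long stall doesn't turn
        bcc countok             ;into a burst of logic calls
        lda #MAXSKIP
countok:
        sta logiccount
logicloop:
        jsr movelogic           ;Advance the game state by one frame
        dec logiccount
        bne logicloop
        jsr render              ;Draw only the most recent state
        jmp mainloop

movelogic:                      ;Placeholders for the real routines
render:
        rts
```

With the clamp in place, missing more than MAXSKIP frames shows up as a
slowdown rather than as a sudden jump in the action.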
Also in games like Last Ninja, where the sprites have to be masked against
the background (very time-consuming!), frameskipping can help immensely. Now,
I'm not sure if the Last Ninja games actually frameskip. I remember seeing the
characters move with bigger steps in some CPU-intensive scenes (like the
maintenance areas in LN2:Office level), so likely at least LN2 frameskipped.
4. Interpolation
----------------
But what if we want to still keep running at 50Hz, scrolling almost the whole
screen with lots of sprites flying around? And what if we have some really
complex AI routines that take a lot of time? Perhaps we want to depack sprites
in realtime, too?
All of this is (within some limits) possible on a stock C64. The key is to not
run your full movement/logic every frame. Run it, for example, every 2nd or
every 4th frame, and you'll have quite a bit of time left over. In the
in-between frames, interpolate the movement of sprites with a line equation
(linear interpolation).
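Written out, the line equation for in-between frame k of N is

        shown = old + (new - old) * k / N

With N = 2 the only in-between frame is the midpoint (old + new) / 2, which
is cheap on the 6502. The sketch below is an assumption-laden illustration:
it uses 8-bit coordinates only (a real sprite X coordinate needs a ninth
bit), and the table names and addresses are hypothetical.

```asm
NUMSPR = 8                      ;Assumed number of logical sprites
oldx   = $c000                  ;Positions before the last logic call
oldy   = $c010
newx   = $c020                  ;Positions after the last logic call
newy   = $c030
showx  = $c040                  ;Positions to display on this frame
showy  = $c050

interpolate:
        ldx #NUMSPR-1
iloop:  clc
        lda oldx,x
        adc newx,x              ;9-bit sum: carry holds the top bit
        ror                     ;Shift carry back in: (old+new)/2
        sta showx,x
        clc
        lda oldy,x
        adc newy,x
        ror
        sta showy,x
        dex
        bpl iloop
        rts
```

For N = 4, one option would be to add a signed step of (new - old) / 4 on
each frame instead of averaging.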
I'm not sure if anyone has used this in a C64 game before. The realization came
to me at the end of 2001, and after that I started developing this idea. I know
that some games like Gauntlet 3 or Myth scroll at 50Hz, while updating the
movement of sprites at a lower rate; however, they don't show the in-between
frames for sprites.
There are different ways to do this. The easier but not-so-powerful approach
is this: The main program handles everything, raster interrupts just show the
screen & sprites in the way the main program wishes. The update loop could then
be something like:
1st frame:
- Do full movement/logic code. Before movement, store all the "old"
positions of sprites.
- Scroll screen if necessary
- Sort sprites
- Instruct raster IRQs to update screen on next frame
2nd frame:
- Interpolate sprite positions between old & new (much faster
than performing the full movement)
- Scroll screen if necessary
- Sort sprites
- Instruct raster IRQs to update screen on next frame
3rd frame:
Start the cycle again from the beginning
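The two-frame cycle above could be sketched roughly as follows. This is only
an outline under assumptions: every routine name is a placeholder,
updateflag is a hypothetical handshake byte that the raster IRQ clears once
it has taken the new frame into use, and the sorted sprite tables are
assumed to be double-buffered so that sorting can happen before the wait.

```asm
updateflag = $02                ;Assumed: nonzero = new frame waiting for IRQ

mainloop:
        jsr storeold            ;1st frame: remember the "old" positions
        jsr movelogic           ;Full movement/logic
        jsr finishframe
        jsr interpolate         ;2nd frame: halfway between old & new
        jsr finishframe
        jmp mainloop

finishframe:
        jsr scroll              ;Scroll screen if necessary
        jsr sortsprites         ;Sort sprites for the multiplexer
waitirq:
        lda updateflag          ;Previous frame not taken into use yet?
        bne waitirq             ;Only then do we have to wait
        inc updateflag          ;Ask the raster IRQ to show the new frame
        rts

storeold:                       ;Placeholders for the real routines
movelogic:
interpolate:
scroll:
sortsprites:
        rts
```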
The key to good performance is to always calculate as much as you can;
don't stop to wait! If you have a double-buffered screen, you don't have to
care about the raster beam position when you scroll the screen-RAM (except if
you use character-sprites like Turrican etc.) Of course, you *do* have to
care about it when you scroll the color-RAM :)
However, the bottleneck is still the full movement/logic code. By doing only
the interpolation on each second frame we win a little time but still, if
the movement/logic takes too much time, the frame update won't be ready
on time and the whole action slows down.
5. Advanced interpolation & re-entrant IRQs
-------------------------------------------
This is a bit like multitasking. The concept of this was quite complex &
disgusting to me at first but then I realized it's the best way to do things.
Now the movement/logic can continue on its task whenever the CPU has free
time and frame updates will still come in time.
We'll have 3 areas of code:
- Movement/logic, handled by main program. Execution of this is triggered
every 2 or 4 frames. This code doesn't touch any VIC registers or handle
scrolling on its own!
- Code for handling the next frame update and interpolating sprite
movement. Scrolling goes here also. This will be executed from
IRQs, however, whenever needed, it will be interrupted by
- The actual low-level raster IRQ code. Screen splits, setting
multiplexed sprites onscreen and playing music/sound.
The idea is that once the movement/logic part has completed processing, it waits
until a frame counter maintained by the frame update IRQ has reached its end
value (2 or 4). Then it resets this counter, and the frame update IRQ
interpolates the next 2 or 4 frames. After this, it stops. The catch is that
if the movement/logic part is too slow or has too little CPU time left over
from the other areas, a visible "pause" will occur at this point.
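A rough sketch of this handshake, with hypothetical names throughout
(framecount, INTERVAL, publish, updatestep and the stubbed routines are
assumptions, not the MW4 code):

```asm
framecount = $02                ;Frames shown since the last logic call
INTERVAL   = 2                  ;Logic runs every 2nd (or 4th) frame

mainloop:
        jsr movelogic           ;Runs in whatever time the IRQs leave over
waitframes:
        lda framecount          ;Has the frame update IRQ shown all the
        cmp #INTERVAL           ;in-between frames yet?
        bcc waitframes
        jsr publish             ;IRQ is idle now: hand over new positions
        lda #$00
        sta framecount          ;This restarts the interpolation
        jmp mainloop

;Called from inside the guarded part of the frame update IRQ
updatestep:
        lda framecount
        cmp #INTERVAL
        bcs idle                ;All frames done: stop until main resets
        jsr interpolate         ;Prepare the next in-between frame
        inc framecount
idle:   rts

movelogic:                      ;Placeholders for the real routines
publish:
interpolate:
        rts
```

If movelogic needs more than INTERVAL frames, updatestep keeps taking the
idle branch, and that is the visible "pause".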
Another thing to consider (especially on NTSC machines) is that if a big
portion of the screen is scrolled, with color-RAM, and with lots of sprites to
be interpolated & sorted, the frame update IRQ might also run too slowly. In
that case it should just wait for the next frame.
Now the big question is: how do we invoke the frame update IRQ while also
letting the low-level IRQs run? I don't want to use multiple interrupt sources,
so I did the following:
For starters, all IRQs must save CPU registers on the stack. Using fixed
zeropage addresses instead would corrupt the regs in case of nested IRQs
(that will occur with this approach!)
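A sketch of a handler that saves the registers this way. It assumes the
handler is reached directly through the $fffe/$ffff vector with the KERNAL
ROM switched out (the KERNAL's own handler already pushes the registers
before jumping through $0314). The labels rasterirq and setsplit are
hypothetical.

```asm
rasterirq:
        pha                     ;Push A first, since txa/tya destroy it
        txa
        pha                     ;Then X
        tya
        pha                     ;Then Y
        jsr setsplit            ;The time-critical VIC register writes
        dec $d019               ;Acknowledge raster interrupt
        pla                     ;Restore in reverse order: Y, X, A
        tay
        pla
        tax
        pla
        rti

;The tempting shortcut "sta savea / stx savex / sty savey" would let a
;nested IRQ overwrite the values saved by the interrupted one.

setsplit:                       ;Placeholder for the real routine
        rts
```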
I usually have one raster interrupt at the top of the screen, to set up the
gamescreen display and to start firing the sprite multiplexing interrupts,
and then another at the bottom of the screen for scorepanel display & playing
music.
Now, whenever either the top or bottom interrupt has completed processing,
we do, before exiting it:
        dec $d019               ;Acknowledge raster interrupt
        cli                     ;Allow further interrupts
        jmp frameupdate         ;Jump to the frame update code
The frame update IRQ code itself must not be re-entered, so the first thing
it has to do is to maintain an execution counter like:
frameupdate:
        inc exec_count
        lda exec_count          ;If already executing, skip the update
        cmp #$02                ;code
        bcs skip
        <actual frame update code here>
skip:   dec exec_count
        pla                     ;Exit the interrupt
        tay
        pla
        tax
        pla
        rti
We see that the frame update code can be invoked from either the top or the
bottom interrupt. So it must maintain some kind of internal state about what
it is going to do next. And whenever it can't do anything useful for the
time being, it exits. For example a color-RAM update is only sensible to do
after the bottom interrupt (must not be visible!) This complicates things
a bit...
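One conceivable way to keep that internal state is a small state variable
plus a record of which raster IRQ fired last. Everything in this sketch is
an assumption: the two-state split, the names fustate and lastirq, and the
idea that each raster IRQ stores its own ID in lastirq before jumping to the
frame update code.

```asm
fustate    = $02                ;0 = sprites next, 1 = color-RAM next
lastirq    = $03                ;Assumed: each raster IRQ stores its ID here
IRQ_TOP    = 0
IRQ_BOTTOM = 1

updatestep:
        lda fustate
        bne docolors
        jsr interpolate         ;Safe at any beam position
        jsr sortsprites
        inc fustate             ;Remember that this part is done
docolors:
        lda lastirq             ;Was the bottom IRQ the latest to fire?
        cmp #IRQ_BOTTOM
        bne done                ;No: nothing useful to do now, exit
        jsr shiftcolors         ;Beam is below the gamescreen
        lda #$00
        sta fustate             ;Start over with the next frame
done:   rts

interpolate:                    ;Placeholders for the real routines
sortsprites:
shiftcolors:
        rts
```

A real engine would also have to make sure the color-RAM shift finishes
before the beam reaches the gamescreen again.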
A problem with this "multitasking" approach is that each of the code areas
must maintain its own set of sprite information, so that they don't step
on each other's toes, causing wrong sprite movement/animation to be shown.
You might already be familiar with this if you've done double-buffered sprite
multiplexing; this is just extending that idea by one more layer.
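As an illustration of one such hand-over, the sketch below copies the
positions owned by the movement/logic code into the set owned by the frame
update code. Table names and addresses are hypothetical, and the copy is
assumed to run only while the frame update code is idle; the third set (the
sorted tables read by the raster IRQs) is not shown.

```asm
NUMSPR = 8                      ;Assumed number of logical sprites
logx   = $c000                  ;Owned by movement/logic (main program)
logy   = $c010
oldx   = $c020                  ;Owned by the frame update code
oldy   = $c030
newx   = $c040
newy   = $c050

publish:
        ldx #NUMSPR-1
ploop:  lda newx,x              ;The previous target becomes the new
        sta oldx,x              ;starting point
        lda newy,x
        sta oldy,x
        lda logx,x              ;The latest logic result becomes the
        sta newx,x              ;new target
        lda logy,x
        sta newy,x
        dex
        bpl ploop
        rts
```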
Doing movement/logic only every 4th frame is already quite heavy interpolation.
The actual motion happens in big steps. I do this in MW4's current work version
that runs at 50Hz, unlike the preview. I was personally worried that control
might feel too laggy with this low update rate, so I got myself some
guinea pigs (thanks CreaMD and Pixman), and after they confirmed that it
didn't feel too laggy I carried on with my plans.
6. Conclusion
-------------
I hope these ideas were interesting. Any example code would be as big as an
entire game engine itself so I'll leave that out :), but if things go well I
will soon release the 50Hz-version of the MW4 engine for public use with full
sources, and that contains the "advanced interpolation & re-entrant IRQs" idea
fully implemented.
The greatest thing about "advanced interpolation" is that you don't have to
make compromises. I was going to perhaps have 50Hz update, but only
block-colors in MW4. The next day I altered the engine for 25Hz full-color
again, and kept juggling between the alternatives. :) Then at one point I
realized that I need packed sprites (depacked in realtime for use), and thought
that now the 50Hz option was lost forever. But this method gave me the
possibility to do everything I wanted (50Hz, full-color scrolling, realtime
sprite-depacking) + run some heavy AI routines & virtual machines on top of it!
Of course, MW4 still slows down occasionally when the CPU load is just too
much (6+ Ninjas / Agents onscreen, or something like that.) This isn't
about doing miracles either :)
My projects tend to be quite heavy on the AI & movement code, as I don't like
simplifying things or doing them incorrectly for the sake of speed. So I
wonder what the results could be if a Turrican/Enforcer-like game (simple
AI, no sprite depacking, no virtual machine execution :) but TONS of sprites
& movement) used this interpolation idea. Maybe that's something for
Protovision to ponder. This technique is completely free for commercial use :)
Lasse Öörni
loorni@student.oulu.fi