Developing a BASIC language interpreter in 2025

Mattel Electronics Intellivision along with the ECS, running Jetsons

Recently, I had the chance to get an Intellivision II console along with the ECS (Entertainment Computer System) and keyboard. I found and typed in a game, Bomb Run 1, using its integrated BASIC language, and I was pretty surprised to see how incredibly slow it is.

Apparently, Mattel Electronics only developed the ECS to avoid paying a daily penalty of $10,000 to the USA government, because they were advertising a keyboard component that wasn't yet available. They developed the ECS in secret, while putting all their money in the ill-fated Keyboard Component, which was a full 6502 computer built around the Intellivision, with an officially licensed MS-Basic.

After getting the ECS on sale, Mattel forgot completely about it. It is a shame, because the keyboard is reasonable enough, it looks nice, and it could have been a nice starter BASIC platform.

Anyway, the turtle speed intrigued me as I was pretty convinced that the Intellivision processor could do faster floating-point, so why not write my own extended BASIC interpreter?

For the remainder of this article, the original Mattel ECS BASIC will be simply called ECS BASIC.

System background

The implementation of my extended BASIC interpreter is for a General Instrument CP1610 processor. This is a 16-bit processor introduced in 1975, with a resemblance to the PDP-11. However, General Instrument didn't allow second sources, only wanted big orders, and ignored small requests (basically shooting itself in the foot), so this processor was widely used only in the Mattel Intellivision, and it ceased production in 1985, at the same time as the Intellivision's demise.

Full documentation of the instruction set is available along with the jzintv emulator, which can be downloaded from http://spatula-city.org/~im14u2c/intv/.

I'll be using the basic 8K words between $5000-$6fff for the BASIC language, and I'll ignore altogether the EXEC (the Intellivision BIOS) and the ECS ROMs (I didn't even bother disabling them).

I had never coded a full BASIC language before, because it was not needed. In the eighties, I had access to a Z80 BASIC that was already ported to the homebrew computer I was using, and in the nineties, I managed to put floating-point in Li-Chen Wang's Tiny BASIC, did some statement extensions for it, and also tried to do a tokenized BASIC, but I never went too far.

The floating-point core

I started on Sep/17/2025 by coding a floating-point addition subroutine. I didn't have a format in mind, it only had to be 32-bit because the CP1610 registers are 16-bit, so two registers fit nicely for keeping a floating-point number, and another two for the second operand.

The format was decided on the basis of how easy it was to extract the sign, the exponent, and the mantissa with 16-bit operations, as 8-bit operations are difficult to do. This automatically discarded an IEEE-754 compatible format, and I settled for a format based on a 24-bit mantissa in the higher bits, followed by the sign bit, and the exponent in the lower 7 bits. The exponent is a bit smaller than the classic IEEE-754, but we get an extra precision bit.

The code for extracting the mantissa is pretty short:


    ANDI #$FF00,R1  ; Remove the sign and exponent.
    SETC
    RRC R0,1        ; Insert the top bit of the mantissa (fixed one)
    RRC R1,1        ; Now we have a 25-bit mantissa,
                    ; aligned at the higher bit.

I didn't define denormalized numbers, nor infinity and NaN (Not a Number), because these weren't supported in the BASIC interpreters of the eighties.
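The layout described above can be modeled in a few lines of Python. This is only a sketch under assumptions: the function names are hypothetical, R0 is taken as the register holding the upper 16 mantissa bits (as the assembly snippet suggests), and nothing is assumed about the exponent bias or how zero is encoded.

```python
def unpack_fields(r0, r1):
    """Split the two 16-bit words into (sign, exponent, stored 24-bit mantissa)."""
    sign = (r1 >> 7) & 1              # Bit 7 of the low word.
    exponent = r1 & 0x7F              # Lower 7 bits of the low word.
    mantissa24 = (r0 << 8) | (r1 >> 8)
    return sign, exponent, mantissa24


def extract_mantissa(r0, r1):
    """Mirror ANDI #$FF00,R1 / SETC / RRC R0,1 / RRC R1,1 on 16-bit values."""
    r1 &= 0xFF00                                     # ANDI: drop sign and exponent.
    carry = 1                                        # SETC: the implicit top bit.
    r0, carry = (carry << 15) | (r0 >> 1), r0 & 1    # RRC R0,1
    r1, carry = (carry << 15) | (r1 >> 1), r1 & 1    # RRC R1,1
    return r0, r1                                    # 25-bit mantissa, top-aligned.


print(unpack_fields(0x0001, 0xFFC3))
print([hex(word) for word in extract_mantissa(0x0001, 0xFFC3)])
```

The second call shows the fixed one landing in bit 15 of the first word, with the 24 stored bits following it.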

It was easy to get the subtraction routine once I got the addition subroutine working, because you only need to flip the sign bit in the second operand.

Later I did the multiplication routine, but I stumbled over a problem where sometimes the mantissa overflowed. I simply added some very complicated code to move the mantissa by one bit. It was several days later when I studied it, and I discovered you can only have a result of x+y-1 bits or x+y bits (where x is the number of significant bits in the first operand, and y is the number of significant bits in the second operand), and I could optimize it by simply inserting an extra zero bit at the left to account for the carry.
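The x+y-1 or x+y property is easy to check with plain integers. The following Python sketch (the helper name is made up for illustration) multiplies the smallest and the largest normalized 25-bit mantissas and counts the bits of the product:

```python
def product_bits(a, b):
    """Number of significant bits in the product of two integers."""
    return (a * b).bit_length()


BITS = 25                          # Mantissa width including the fixed one.
smallest = 1 << (BITS - 1)         # Top bit set, all other bits zero.
largest = (1 << BITS) - 1          # All 25 bits set.

print(product_bits(smallest, smallest))   # x+y-1 bits
print(product_bits(largest, largest))     # x+y bits
```

Every other pair of normalized mantissas falls between these two extremes, so a single conditional one-bit adjustment is enough after the multiplication.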

Of course, I made a small test program to check for the validity of the arithmetic operations with several cases. It took me four days to code the fully functional floating-point library.

The interface

I couldn't start a BASIC interpreter without the keyboard reading code and terminal-style output. Fortunately, Joe Zbiciak (intvnut) had already developed routines for reading the ECS keyboard, and I integrated these with a ROM header, adding terminal handling for displaying letters, scrolling the screen, and moving the cursor.

With all this integrated I had a dumb terminal working: you type anything on the keyboard, and you get the same keys displayed on the screen. This was on Sep/19/2025.

The ECS keyboard.

Inner guts

The CP1610 processor cannot directly address the internal memory in byte terms; instead, everything is handled in full words. I had to take this into account for my tokenized BASIC representation. A standard Intellivision doesn't have enough memory for a BASIC interpreter, so the ECS included 2K of 8-bit RAM.

However, a few years ago, the JLP-Flash cartridge was manufactured and it provides 8K of 16-bit RAM over $8000-$9fff, so for my extended BASIC this was excellent.

When I talk about tokenization, I mean that all the language's reserved words are represented with a token. This speeds up the execution of the language, as it doesn't have to run a word match every time.

My first version of the internal representation for BASIC lines was the line number as a word, followed by a pointer to the next line, followed by the tokenized BASIC code for the line, ended with a zero word.

As I coded the line insertion routines on Sep/22/2025, I discovered the pointer to the next line wasn't a good idea, because every pointer had to be updated after a line insertion. Instead, I converted the pointer into a length (the number of words used by the tokenized line). This allowed for very compact code to jump over lines:


    INCR R4     ; Jump over the line number.
    ADD@ R4,R4  ; Add the tokenized length to the current pointer
                ; Et voila! It jumped over the line.
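To illustrate why a length is friendlier than a pointer, here is a small Python model of the line layout. It is a sketch under assumptions: the helper names are hypothetical, and the length is assumed to count the token words plus the terminating zero word (the exact convention in the interpreter may differ).

```python
def store_line(number, tokens):
    """Build one stored line: line number, length, tokens, zero terminator."""
    body = list(tokens) + [0]
    return [number, len(body)] + body


def iter_lines(program):
    """Walk the program by adding each length to the current position."""
    pos = 0
    while pos < len(program):
        number, length = program[pos], program[pos + 1]
        yield number, program[pos + 2 : pos + 2 + length - 1]
        pos += 2 + length          # Skip header and body in one step.


program = store_line(10, [0x0103]) + store_line(20, [0x0108, 0x31, 0x30])
for number, tokens in iter_lines(program):
    print(number, [hex(t) for t in tokens])
```

Inserting a line in the middle only shifts the following words; no stored value has to be patched, because lengths do not depend on where a line sits in memory.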

With the line insertion routines completed, I went to implement the BASIC tokenization subroutine. I decided against handling tokenization byte-per-byte, and instead made each token a word. Of course, it is wasted space if you are using strings, but it is faster on execution. Token numbers start at $0100. It only remained to interface the input with the new routines.

I decided to read the text directly from the screen. It was very unplanned, and it is probably buggy, but it has worked so far. And maybe later I'll extend it into a full-screen editor.


keywords:
	DECLE ":",0	; $0100
	DECLE "LIST",0
	DECLE "NEW",0
	DECLE "CLS",0
	DECLE "RUN",0	; $0104
	DECLE "STOP",0
	DECLE "PRINT",0
	DECLE "INPUT",0
	DECLE "GOTO",0	; $0108
	DECLE "IF",0
	DECLE "THEN",0
	DECLE "ELSE",0
	DECLE "FOR",0	; $010C
	DECLE "TO",0
	DECLE "STEP",0
	DECLE "NEXT",0
	DECLE "GOSUB",0	; $0110

Excerpt of the tokenization table.
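A word-per-token tokenizer over this table could look like the Python sketch below. The keyword list and the $0100 base come from the excerpt; everything else is an assumption made for illustration: first match in table order, text inside quotes left untouched, every other character (spaces included) stored as its character code, and a zero word at the end.

```python
KEYWORDS = [":", "LIST", "NEW", "CLS", "RUN", "STOP", "PRINT", "INPUT",
            "GOTO", "IF", "THEN", "ELSE", "FOR", "TO", "STEP", "NEXT", "GOSUB"]
TOKEN_BASE = 0x0100


def tokenize(text):
    """Convert statement text into a list of 16-bit words."""
    words, i, in_string = [], 0, False
    while i < len(text):
        if text[i] == '"':
            in_string = not in_string      # Keywords are ignored inside strings.
        keyword_index = None
        if not in_string:
            for index, keyword in enumerate(KEYWORDS):
                if text.startswith(keyword, i):
                    keyword_index = index
                    break
        if keyword_index is None:
            words.append(ord(text[i]))     # Plain character, one word each.
            i += 1
        else:
            words.append(TOKEN_BASE + keyword_index)
            i += len(KEYWORDS[keyword_index])
    words.append(0)                        # End-of-line marker.
    return words


print([hex(w) for w in tokenize("GOTO 20")])
```

Like many classic tokenizers, this naive matching would also split a keyword found inside a longer name; a real interpreter needs some rule for that case.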

Execution

Now I was able to edit, correct, and delete BASIC code lines. The next logical step was the execution of the program. I implemented RUN by reading each program line sequentially, and each token found directly selects the command to execute.

My first program was simply 10 CLS and I was happy when I typed RUN and the screen was cleared.

This was followed shortly by PRINT and GOTO, where PRINT was only capable of putting a string on the screen, and GOTO changed the execution flow. I added a check for the Esc key to exit an infinite loop.

I was also pretty impatient to see if my extended BASIC language was speedier than the ECS BASIC, so I decided to implement IF, and a small expression parser supporting the relational operators and the basic arithmetic operators (+, -, * and /), along with numbers and variables.

The numbers were simply read as integers and converted to floating-point format, while the variables used 26 double-word memory spaces covering the A to Z variables.

In order to create a loop, it was required to implement variable assignment.


    10 A=1
    20 PRINT "Hello"
    30 A=A+1
    40 IF 6>A THEN 20

The tokenization of the BASIC program.

For some reason, I couldn't type the less-than operator with the emulated ECS keyboard. Later, I discovered that intvnut missed the character in the Shift table, and it was a matter of a simple fix.

It was past midnight when I finally could try the benchmark. As I didn't yet have a FOR statement, I had to replicate it using increment and comparison.

I ran it, and I was amazed when I discovered it took only 15 seconds. In the ECS BASIC it takes 210 seconds! There are screenshots of the programs in the git repository.

More floating-point curiosities

This wasn't the first time I programmed a floating-point package. My first one was for a Z80-based computer. I don't remember if it was complete, if it had bugs, or if it was actually used. What I can remember is that I was never able to make a proper subroutine for displaying floating-point numbers. I got stuck with a simple conversion to integer, and printing the integer.

The display of a number followed by fraction and exponent was, for me, closer to black magic than anything. I believed that a single routine did everything, but I was wrong. Enlightenment came from reading a Commodore 64 BASIC manual: it says something like numbers in this range are displayed complete, while in other cases the number will be displayed in exponent format.

This triggered a pattern in my mind: If the whole integer fits in the mantissa, display it alongside a small fraction, and if the number doesn't fit, make it bigger or smaller so it fits in an integer, and this one can be displayed in exponent format.

The algorithm is as follows:

1. The 25-bit mantissa allows integers from 0 to 33554431. We limit it to the biggest integer made of all nines, or 9999999.
2. If the floating-point number is less than 10,000,000 then the integer part is displayed, and then it gets 2 fraction digits.
3. If the floating-point number is less than 0.01 then it is multiplied by 10 until it reaches the range 1,000,000 - 9,999,999. The first digit will be the integer part, the following digits will be the fractional part, and the exponent will be displayed alongside.
4. If the floating-point number is greater than 9,999,999 then it is divided by 10 until it fits the same range. And again it is displayed like in step 3.
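The steps above can be sketched in Python using ordinary floats. This is an illustration, not the interpreter's routine: the function name is hypothetical, the fixed-point branch always prints two fraction digits, and the exact text of the exponent part (`E+10`, `E-3`) is an assumption.

```python
def format_number(value):
    """Format a number in fixed or exponent notation using integer-sized steps."""
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    value = abs(value)
    if 0.01 <= value < 10_000_000:
        hundredths = int(value * 100)          # Integer part plus 2 fraction digits.
        return f"{sign}{hundredths // 100}.{hundredths % 100:02d}"
    exponent = 6                               # 1,000,000 is 1.000000E+6.
    while value < 1_000_000:                   # Too small: scale up.
        value *= 10
        exponent -= 1
    while value >= 10_000_000:                 # Too big: scale down.
        value /= 10
        exponent += 1
    digits = round(value)                      # Now a 7-digit integer.
    if digits == 10_000_000:                   # Rounding spilled into 8 digits.
        digits //= 10
        exponent += 1
    text = str(digits)
    return f"{sign}{text[0]}.{text[1:]}E{exponent:+d}"


for sample in (2.5, -2.5, 12345678900.0, 0.005):
    print(format_number(sample))
```

The key point is that both branches only ever print an integer; the decimal point and the exponent are just bookkeeping around it.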

And this way, thirty years later, I discovered printing floating-point numbers isn't so obscure as I believed, but indeed it has a lot of magic.

Core completion

The GOSUB/RETURN statements allow creating small subroutines, and these have their own stack to keep track of where to return (a pointer plus the line number).

The FOR/NEXT loop is one of the best-known statements of the core BASIC language. Implementing it required a redesign of my execution loop, because I was executing line-by-line: NEXT changed the current line, but on the following statement the interpreter would lose track and go back to the line following the NEXT.

The loops also require their own stack, but including the counter variable address, a pointer to the TO expression, and a pointer to the STEP expression (5 words in total).
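A loop stack of this kind can be modeled in Python as follows. This is a loose sketch with several assumptions: the names are hypothetical, the two words not listed in the text are guessed to be the loop-body pointer and line number (as in the GOSUB stack), the TO and STEP expressions are represented as callables that are evaluated again on every NEXT, and the termination test for negative steps is the conventional one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ForFrame:
    var: str                        # Stands in for the counter variable address.
    limit: Callable[[], float]      # Stands in for the pointer to the TO expression.
    step: Callable[[], float]       # Stands in for the pointer to the STEP expression.
    body_pos: int                   # Assumed: where the loop body starts.
    body_line: int                  # Assumed: line number of the loop body.


def do_next(stack: List[ForFrame], variables: Dict[str, float]) -> Optional[int]:
    """Handle NEXT: return the position to jump back to, or None when done."""
    frame = stack[-1]
    step = frame.step()
    variables[frame.var] += step
    limit = frame.limit()
    value = variables[frame.var]
    finished = value > limit if step >= 0 else value < limit
    if finished:
        stack.pop()                 # Loop is over, execution falls through.
        return None
    return frame.body_pos


variables = {"I": 1}
stack = [ForFrame("I", lambda: 3, lambda: 1, body_pos=10, body_line=100)]
runs = 1                            # The body runs once before the first NEXT.
while do_next(stack, variables) is not None:
    runs += 1
print(runs, variables["I"])
```

Keeping a separate stack for loops means a GOSUB inside a FOR body does not disturb the loop bookkeeping.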

The RUN statement was replaced with code that runs sequentially over the tokens, and jumps over the line headers. This way it is easier to change the execution flow to a new token.

I also added the negation operator (required for STEP -1) and some functions like INT, ABS, SGN, and RND. The RND function in particular allows creating little games for guessing numbers, and so on.

Checking against the ECS BASIC, I was only missing READ, DATA, and REM. So I bit the bullet and implemented these, adding RESTORE along the way.

DIM for creating arrays was pretty easy, and I adjusted all the variable access paths of the interpreter in a way that any indexed access is the same as accessing a normal variable.

Diverging ways

At this point, my extended BASIC language was already many times faster than the ECS BASIC (15 seconds against 210 seconds in my benchmark), and it could be used to write little text games (well, using only numbers).

However, it didn't yet handle the controllers, sound, graphics, and sprites. The ECS BASIC had some statements for these, but the sprites could not be defined, and instead had to be "grabbed" from a game cartridge. Of course, the user was limited to these game sprites. Also, positioning sprites was done with multiple variable assignments in an array-like style of access.

For my extended BASIC I decided for a kind of advanced statements patterned after the ones from my compiled IntyBASIC language but not exactly the same:

MODE 0,0 for setting the color stack mode.
MODE 1 for setting the foreground/background mode.
DEFINE 0,"55AA55AA55AA55AA" for defining GRAM cards.
COLOR for setting the paint color used in PRINT.
SPRITE for displaying a sprite on the screen.
WAIT for waiting for the next video frame.
SOUND for accessing the sound chip.
STICK(0) for reading the 16 disc directions.
STRIG(0) for reading the side-buttons.
KEY(0) for reading the keypad.
BK(0-239) for accessing the screen.

Once these were implemented, I started coding a minimal game to test the interpreter, and I called it UFO Invasion. Of course, I found a few bugs in my interpreter and fixed them.

The game was working, and at a reasonable speed. What about testing on real hardware? I loaded the interpreter into an LTO-Flash cartridge and connected my ECS system.

My first attempt crashed continuously. I lost half an hour looking for errors, until I noticed that anything at all crashed the interpreter. I had forgotten to enable the extra RAM of the JLP cartridge. And finally it worked!

Typing the program was difficult, as the keyboard bounced a lot. This happens when you read the keyboard too fast, so fast that you can see the key contact isn't perfect. I had to add a small wait before reading the keyboard, and it solved most of the problems.

At the end, my extended BASIC interpreter was coded in six days! I think it is way faster when you are enjoying programming it.

10 CLS:REM UFO INVASION. NANOCHESS 2025
20 DEFINE 0,"183C00FF007E3C000018183C3C7E7E000000183C3C3C3C7EFF2400"
50 x=96:w=0:v=0:u=0:t=159
60 SPRITE 0,776+x,344,2061
70 SPRITE 1,776+v,256+w,2066
80 SPRITE 2,1796+t,256+u,6149
90 WAIT:c=STICK(0)
100 IF c>=3 AND c<=7 THEN IF x<152 THEN x=x+4
110 IF c>=11 AND c<=15 THEN IF x>0 THEN x=x-4
120 IF w=0 THEN SOUND 2,,0:IF STRIG(0) THEN v=x:w=88
130 t=t+5:IF t>=160 THEN t=0:u=INT(RND*32)+8
140 IF w THEN SOUND 2,w+20,12:w=w-4:IF ABS(w-u)<8 AND ABS(v-t)<8 THEN
    t=164:w=0:SOUND 3,8000,9:SOUND 1,2048,48
150 GOTO 60


UFO Invasion running on the Mattel Intellivision ECS, and a partial listing of the game.

What's in the future

A big difference against "standard" BASIC is the lack of proper strings. In the ECS BASIC, you could read a string from the keyboard using GET, put it again on the screen using PUT, and there were only three string variables (A$, B$, and C$). That was all!

Adding the support for standard strings would mean my interpreter could run some text-processing programs like Eliza in BASIC, and some other small games could be easily translated.

This was one of the portability things that the BASIC language had at the time, and it was used by many books in a way that the programs were written with that "core" BASIC language in mind, and these could be typed into almost any computer with a decent interpreter.

The source code is released at https://github.com/nanochess/ecsbasic. I tried to release it as early as possible, so you can get a glimpse of how it was growing in the commits.

Enjoy it!

But... why so slow?

After publishing this article on Sep/28/2025, several people pointed out to me that I hadn't explained why the ECS BASIC was so slow. Truth be told, I was so happy with my working extended BASIC that I didn't even bother to look further into the ECS BASIC.

First of all, there is a thread on AtariAge about the ECS BASIC Color Patent, and the eighth post, also by intvnut, explains in great detail how he disassembled the code and found a terrible way of doing a shift of the floating-point accumulator.

However, there are a few other details that make it slow. For example, the extra RAM is 2K of 8 bits, and all the Intellivision memory accesses are for 16 bits (one word), so every single access to variables requires the SDBD instruction. This instruction tells the CP1610 processor to read the word in two steps.
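The two-step read can be pictured with a short Python model. It is only an illustration with a hypothetical helper name, and it assumes the low byte is stored at the first address and the high byte at the next one.

```python
def read_word_8bit(ram, address):
    """Model an SDBD-style read: two byte-wide accesses build one 16-bit word."""
    low = ram[address] & 0xFF          # First access: low byte (assumed order).
    high = ram[address + 1] & 0xFF     # Second access: high byte.
    return (high << 8) | low


ram = {0x47D4: 0x34, 0x47D5: 0x12}     # Two cells of the 8-bit RAM.
print(hex(read_word_8bit(ram, 0x47D4)))
```

Every 16-bit value kept in the 8-bit RAM therefore costs two memory accesses plus the prefix instruction, where 16-bit RAM needs a single access.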

I did my own disassembly, and after giving a look around the same zone disassembled by intvnut, I found this code that extracts the exponent of a floating-point number:


 $E1DD: PSHR R5                                     
 $E1DE: MVI@ R1,R2                                  
 $E1DF: ANDI #$007F,R2                              
 $E1E1: MOVR R2,R5                                  
 $E1E2: ANDI #$0040,R5                              
 $E1E4: BNEQ $E1E8                                  
 $E1E6: NEGR R2                                     
 $E1E7: PULR R7                                     
 $E1E8: XORI #$0040,R2                              
 $E1EA: PULR R7

It extracts the seven bits of the exponent. A value in the range $00-$3f is made negative, and a value in the range $40-$7f is converted to $00-$3f. So simply reading the exponent takes 7 instructions. Whoever developed this code didn't take into account that the exponent could be saved in two's complement format offset by $40, and this code is used nine times.

For comparison, my code for extracting the exponent is simply ANDI #$007F,R1.
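The difference between the two encodings can be shown in Python. The first function follows the disassembly at $E1DD step by step; the second one models the offset-by-$40 alternative mentioned above. Both function names are made up for this sketch.

```python
def ecs_exponent(byte):
    """Mirror the routine at $E1DD: bit 6 set means positive, clear means negative."""
    value = byte & 0x7F               # ANDI #$007F
    if value & 0x40:                  # ANDI #$0040 / BNEQ
        return value ^ 0x40           # XORI #$0040: $40-$7F becomes 0..63.
    return -value                     # NEGR: $00-$3F becomes 0..-63.


def offset_exponent(byte):
    """Alternative encoding: the same 7 bits stored as exponent + $40."""
    return (byte & 0x7F) - 0x40


print(ecs_exponent(0x45), ecs_exponent(0x05))
print(offset_exponent(0x45), offset_exponent(0x3B))

# With the offset encoding, comparing the raw 7-bit fields orders the
# exponents correctly, so no decoding step is needed before a comparison.
print(all((a < b) == (offset_exponent(a) < offset_exponent(b))
          for a in range(128) for b in range(128)))
```

With the sign-and-magnitude style used by the ECS routine, the raw fields do not sort in exponent order, which is why it has to decode both values before comparing them.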

It gets worse: I found the code calling $E1DD, and it extracts two exponents and compares them:


 $E147: PSHR R5                                     
 $E148: MOVR R3,R1                                  
 $E149: JSR  R5,$E1DD                               
 $E14C: MOVR R2,R0                                  
 $E14D: MOVR R4,R1                                  
 $E14E: JSR  R5,$E1DD                               
 $E151: CMPR R0,R2                                  
 $E152: BEQ  $E159                                  
 $E154: BMI  $E15B                                  
 $E156: MVII #$0001,R0                              
 $E158: PULR R7                                     
 $E159: CLRR R0                                     
 $E15A: PULR R7                                     
 $E15B: CLRR R0                                     
 $E15C: DECR R0                                     
 $E15D: PULR R7

My code for exponent comparison is composed of only three instructions (two ANDI and one CMPR). This big code would be kind of reasonable if it wasn't for the fact that it is only called one time, by the floating-point addition subroutine starting at $E059:


 $E067: CLRR R0                                     
 $E068: SDBD                                        
 $E069: MVII #$47D4,R3                              
 $E06C: MOVR R3,R4                                  
 $E06D: SUBI #$0007,R4                              
 $E06F: JSR  R5,$E147                               
 $E072: TSTR R0                                     
 $E073: BEQ  $E081                                  
 $E075: MOVR R4,R1                                  
 $E076: TSTR R0                                     
 $E077: BMI  $E07B                                  
 $E079: ADDI #$0007,R1                              
 $E07B: CLRR R0                                     
 $E07C: JSR  R5,$E15E                               
 $E07F: B    $E067

This routine calls the exponent comparison at $E06F, and if both are equal it jumps out to $E081, else it adjusts the exponent of one number, and repeats the comparison (notice the B $E067 instruction). The mantissa shifting routine at $E15E shifts in steps of 4 bits.

At $E081 (not shown), it checks the sign of the second operand, and if it is negative, it calls $E194 to do a negation of the number.

At $E091 it does the addition of the two numbers calling $E1C0, and calls $E183 to check if both signs are equal.

The code at $E0A4 reinserts the exponent in a very slow way, and it ends by calling $E21A to normalize the floating-point number, again shifting the mantissa in steps of 4 bits.

That has been a lot of code already, but let's look at the shifting routine:


 $E238: MVI@ R1,R2                                  
 $E239: MOVR R2,R5                                  
 $E23A: ANDI #$000F,R2                              
 $E23C: SLL  R2,2                                   
 $E23D: SLL  R2,2                                   
 $E23E: XORR R4,R2                                  
 $E23F: MVO@ R2,R1                                  
 $E240: MOVR R5,R2                                  
 $E241: ANDI #$00F0,R2                              
 $E243: SLR  R2,2                                   
 $E244: SLR  R2,2                                   
 $E245: MOVR R2,R4                                  
 $E246: DECR R1                                     
 $E247: SDBD                                        
 $E248: CMPI #$47CD,R1                              
 $E24B: BNEQ $E238

It takes a byte, shifts it left 4 bits, inserts the carry, and copies the extra 4 bits as the new carry. I couldn't resist showing how it could be made a lot smaller and faster this way:


LE238:
    MVI@ R1,R2  ; R2 = 0x00ff
    SLL R2,2
    SLL R2,2    ; R2 = 0x0ff0
    XORR R4,R2  ; R2 = 0x0ff0 + carry
    MVO@ R2,R1  ; This saves the low byte.
    SWAP R2     ; R2 = 0xf00f
    ANDI #$000F,R2
    MOVR R2,R4  ; R4 = 0x000f
    DECR R1
    SDBD
    CMPI #$47CD,R1
    BNEQ LE238

This saves four instructions in the loop, and it is faster.
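Both loops can be checked against each other with a Python model of the memory and registers. This is a sketch for verification only: the function names are hypothetical, the carry register R4 is assumed to start at zero, and the byte-wide RAM is modeled by masking every store to 8 bits.

```python
def shift_original(ram, start, stop):
    """Model of the loop at $E238: shift a multi-byte value left by 4 bits."""
    carry, addr = 0, start
    while addr != stop:
        byte = ram[addr]                                   # MVI@ R1,R2 / MOVR R2,R5
        ram[addr] = (((byte & 0x0F) << 4) ^ carry) & 0xFF  # ANDI, SLL x2, XORR, MVO@
        carry = (byte & 0xF0) >> 4                         # ANDI, SLR x2, MOVR
        addr -= 1                                          # DECR R1
    return carry


def shift_optimized(ram, start, stop):
    """Model of the shorter loop using SWAP to recover the high nibble."""
    carry, addr = 0, start
    while addr != stop:
        word = (ram[addr] << 4) ^ carry                    # SLL x2, XORR
        ram[addr] = word & 0xFF                            # MVO@ keeps the low byte.
        swapped = ((word << 8) | (word >> 8)) & 0xFFFF     # SWAP R2
        carry = swapped & 0x000F                           # ANDI / MOVR R2,R4
        addr -= 1
    return carry


first = [0x00, 0x12, 0x34, 0x56]     # Index 0 plays the role of the stop address.
second = list(first)
print(shift_original(first, 3, 0), [hex(b) for b in first])
print(shift_optimized(second, 3, 0), [hex(b) for b in second])
```

Both versions turn the bytes 12 34 56 into 23 45 60 and leave the nibble 1 as the final carry, so the shorter loop is a drop-in replacement in this model.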

The ECS BASIC has a mantissa of six bytes, so it has more precision than my extended BASIC (three bytes), but it is done in a very slow way!

For sure the ECS BASIC could be optimized to run at least two times faster.

Exact cycles

Ok, but I haven't yet answered a simple question. How many cycles does adding 3.0 and 7.0 take?

I ran the jzintv emulator with this command line (remember we need a game cartridge so the ECS BASIC works):

./jzintv -d -s1 -z3 Basketball.bin

I entered the R command to make it run the ECS BASIC. When I reached the ECS BASIC, I typed PRIN 3+7 and before pressing Enter, I went to the debugger window and pressed Ctrl+C. Then I put breakpoints at selected places (start of the floating-point addition, the place where it calls the addition, and the return instruction):

B E060
B E091
B E0BA

Again, I typed the R command, along with M47C0 to watch the memory addresses where the floating-point addition happened, until I saw the numbers 3 and 7.

I took note of the cycle number at the right. The operation 3+7 takes exactly 1558 cycles. By the way, the interpreter executes three further floating-point additions when processing the numbers, and another two for displaying the number.

jzintv debug window with the ECS BASIC. The cycle count at the right shows how long a floating-point addition takes.

Now, let's do the same with my extended BASIC interpreter.

After assembling with as1600, I generated a .lst file where I searched for the fpadd label (start) and the fpadd.2 label (return). The addresses are $61d1 and $625b.

I ran the jzintv emulator again with the debugging option, and I typed PRINT 3+7, and before pressing Enter, I stopped the debugger using Ctrl+C, and I set up these breakpoints:

B 61D1
B 625B

Then I entered the command R, and pressed Enter on the BASIC screen. It took me four R commands to see the values 3 and 7 in the registers, and the result 10.

And the total cycles used were 479. This means that in the floating-point addition alone, the Mattel ECS BASIC is more than three times slower (1558 cycles against 479).

jzintv debug window with my extended ECS BASIC. The cycle count at the right shows how long a floating-point addition takes.