Tuesday, January 18, 2022

Making a Good Thing Better: A Forth Unit Testing Framework

Regardless of the programming language you're coding in, unit testing is an obvious best practice. However, given Forth's anything-goes philosophy, testing goes from a nice-to-have to the only sane way to code without losing your mind.

Testing Without a Framework

The interactive nature of Forth, the ease of word definition, and the handy assert( word in Gforth provide most of what I need for testing my code.

Consider these tests for coins.fs, a module that simulates coin flips.

: test-coin-basics ( -- )
    assert( heads heads? )
    assert( tails tails? )
    assert( heads tails? false  = )
    assert( heads coin? )
    assert( tails coin? )
    assert( 100 coin? false = )
    assert( flip coin? )
;

0 value #heads      0 value #tails
: close-enough? ( x y -- b )
    - abs 30 < ;

: test-coin-flip-test ( -- )
    randomize
    0 to #heads
    0 to #tails
    assert( #heads 0 = )
    assert( #tails 0 = )

    100 0 +do
        flip
        heads? if 1 0 else 0 1 endif
        #tails + to #tails
        #heads + to #heads
    loop

    assert( #heads  #tails close-enough? )
;

\ Run the tests!
test-coin-basics test-coin-flip-test

By executing require on this test file, the tests are not only defined but executed. If there's a failure in the test, the system will alert you via a failed assertion. Otherwise, the tests will quietly succeed.

This approach works well, but there are a few minor details that nag at me. I don't love that I have to duplicate the names of the tests: first to create them, and then to execute them. And I don't love that I'm defining test words in the global scope which may be unnecessary for the rest of the application, or may conflict with other tests defined later.

The latter problem I could solve by defining the tests as private words in a module. But the issue of having to name tests, and then repeat that name below still stands.

What I wanted was to borrow a page out of my lightweight PHP unit test framework. A central idea there is that tests are not named; they're simply functions added to a list that can be executed later by a test runner.

Somewhat surprisingly, Gforth makes it almost trivial to implement this model.

Testing With a Framework

Here's how the above tests look when they're taking advantage of my lightweight testing framework:

\ test out coins.fs

:test
    assert( heads heads? )
    assert( tails tails? )
    assert( heads tails? false  = )
    assert( heads coin? )
    assert( tails coin? )
    assert( 100 coin? false = )
    assert( flip coin? )
;

0 value #heads     0 value #tails
: close-enough? ( x y -- b )
    - abs 30 < ;

:test
    randomize
    0 to #heads
    0 to #tails
    assert( #heads 0 = )
    assert( #tails 0 = )

    100 0 +do
        flip
        heads? if 1 0 else 0 1 endif
        #tails + to #tails
        #heads + to #heads
    loop

    assert( #heads  #tails close-enough? )
;

The defining word :test* creates a new, anonymous test which will be executed when run-tests is invoked. Not only did I avoid having to duplicate the name of the test to execute, I didn't have to name the test in the first place. Whoo!

shuffler.fs uses the functionality in coins.fs, so it requires the coin tests as well.

\ Import libraries
require lib/modules.fs
require lib/utils.fs
require lib/arrays.fs
require lib/strings.fs
require lib/random.fs
require lib/testing.fs
require lib/coins.fs   \ <<< The Code
require lib/cards.fs
require lib/decks.fs
require lib/shuffle.fs  

\ Import tests
require tests/utils.fs 
require tests/modules.fs
require tests/cards.fs
require tests/coins.fs \ <<< The Tests
require tests/decks.fs
require tests/shuffle.fs 
require tests/random.fs

\ Run the tests
cr run-tests cr cr

\ Top level code that uses the libraries
new-deck shuffle .deck cr
new-deck 7*shuffle .deck cr cr

When I execute shuffler.fs I see the following messages:

Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
s" /home/ben/dt/i2x/code/src/master/forth/shuffler.fs" included redefined .card  
0 Failures, 13 Tests
...

The first four lines of output are typical for Gforth. The 0 Failures, 13 Tests message indicates that my tests all ran and there were no failures.

This approach bakes running the tests into every execution of my Forth project, and does so in a streamlined way.

If I want to be verbose, it's possible to ask the test framework for the status of every test that was run:

tests. 0  OK /home/ben/dt/i2x/code/src/master/forth/tests/utils.fs:5 
1  OK /home/ben/dt/i2x/code/src/master/forth/tests/utils.fs:12 
2  OK /home/ben/dt/i2x/code/src/master/forth/tests/utils.fs:22 
3  OK /home/ben/dt/i2x/code/src/master/forth/tests/utils.fs:33 
4  OK /home/ben/dt/i2x/code/src/master/forth/tests/modules.fs:20 
5  OK /home/ben/dt/i2x/code/src/master/forth/tests/modules.fs:28 
6  OK /home/ben/dt/i2x/code/src/master/forth/tests/cards.fs:3 
7  OK /home/ben/dt/i2x/code/src/master/forth/tests/coins.fs:3 
8  OK /home/ben/dt/i2x/code/src/master/forth/tests/coins.fs:20 
9  OK /home/ben/dt/i2x/code/src/master/forth/tests/decks.fs:3 
10  OK /home/ben/dt/i2x/code/src/master/forth/tests/decks.fs:13 
11  OK /home/ben/dt/i2x/code/src/master/forth/tests/shuffle.fs:6 
12  OK /home/ben/dt/i2x/code/src/master/forth/tests/random.fs:3 
 ok

The number in the first column can be used to re-execute a test, should I want to do so interactively:

8 run-test
8 run-test OK ok

These extra capabilities are a nice bonus, but ultimately, it's that one terse line telling me that all the tests passed that really saves the day. As soon as I see a reported failure, I stop what I'm doing and focus on fixing that issue.

A Surprisingly Simple Implementation

Building this unit test framework mirrored the development of my Forth module system. I struggled with a number of false starts, and when I finally figured out an ideal approach, the code came together effortlessly.

The :test defining word is a core part of the system, and it's delightfully short:

: :test ( -- )
  noname : latestxt register-test ;

:test makes use of Gforth's noname word, which has the following characteristics:

The next defined word will be anonymous. The defining word will leave the input stream alone. The xt of the defined word will be given by latestxt.
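
To see those three characteristics in isolation, here is a minimal sketch. The name doubler is made up for illustration and is not part of the framework:

```forth
\ Sketch: define an anonymous word, then capture its xt with latestxt.
noname : ( n -- 2n ) 2 * ;      \ ':' does not parse a name here
latestxt constant doubler       \ 'doubler' is a hypothetical name for the xt

21 doubler execute .            \ prints 42
```

The anonymous word never enters the dictionary under a name, so the only handle on it is the xt. That is exactly what :test passes along to register-test.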

The execution token of the anonymous word, retrieved with latestxt, is handed to register-test, which stores it away for later use.

run-tests loops through the registered execution tokens and calls catch on them. catch leaves a value on the stack that says whether the word executed without raising an exception: 0 on success, a nonzero throw code otherwise. This value is stored in outcomes, which can be inspected later.

: run-tests ( -- )
    #tests 0 +do
        i tests @ catch i outcomes !
    loop .summary ;
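
To make the stored outcome concrete, here is a small sketch of what catch hands back. The words passing-test and failing-test are hypothetical stand-ins for registered tests, and I'm relying only on the fact that a failed assert( throws:

```forth
\ Sketch: what catch returns for a passing and a failing test.
: passing-test ( -- ) assert( 1 1 = ) ;
: failing-test ( -- ) assert( 1 2 = ) ;   \ the failed assertion throws

' passing-test catch .        \ prints 0: no exception was raised
' failing-test catch 0<> .    \ prints -1 (true): a nonzero throw code came back
```

Gforth still prints its usual failed assertion message for the second word, but because the exception is caught, execution carries on to the next test instead of aborting.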

.summary prints out a summary of the return codes in the outcomes array.
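
For readers who want to see how the pieces could fit together, here is one possible sketch of the supporting words. Only the names tests, outcomes, #tests, register-test and .summary come from the code above. The fixed capacity, the array layout, the #failures helper and the body of .summary are my assumptions for illustration, not the framework's actual source. :test and run-tests are repeated so the block runs on its own.

```forth
\ Sketch only: storage layout and .summary body are assumptions.
64 constant max-tests                    \ hypothetical fixed capacity
0 value #tests                           \ number of registered tests
create test-xts      max-tests cells allot
create test-outcomes max-tests cells allot

: tests    ( i -- addr )  cells test-xts + ;        \ slot holding test i's xt
: outcomes ( i -- addr )  cells test-outcomes + ;   \ slot holding test i's result

: register-test ( xt -- )
    #tests max-tests < 0= abort" too many tests"
    #tests tests !
    #tests 1+ to #tests ;

: #failures ( -- n )   \ hypothetical helper: count nonzero outcomes
    0  #tests 0 +do  i outcomes @ if 1+ then  loop ;

: .summary ( -- )
    #failures . ." Failures, "  #tests . ." Tests" ;

\ Repeated from the post so this block is self-contained.
: :test ( -- )
    noname : latestxt register-test ;

: run-tests ( -- )
    #tests 0 +do
        i tests @ catch i outcomes !
    loop .summary ;

\ Usage: two anonymous tests, then run them.
:test assert( 1 1 = ) ;
:test assert( 2 2 + 4 = ) ;
run-tests    \ prints 0 Failures, 2 Tests
```

A pair of parallel arrays keeps the sketch short. Anything that maps a test index to an xt and an outcome would serve equally well.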

You can find the unit testing framework's source code here. I love that I've simplified test definition, and that test execution becomes a seamless part of app development. It's also a nice bonus that I've simplified the process of coding in Forth without losing my mind.


*I'm still not sure whether the best convention is :test or test:. Oh well; naming.
