Sunday, November 2, 2014

Trying Out Property-Based Testing in Clojure

I am taking Erik Meijer’s Introduction to Functional Programming course on edX. In the homework for the lesson on list comprehensions, the Caesar Cipher is introduced. The way the cipher works is that each character in a message is shifted a number of characters. If the number is 3, then a becomes d, b becomes e, and it wraps around so z becomes c.

Implementing the cipher in Clojure seems like a good project to try out test.check, a property based testing library. With traditional unit testing it is up to the programmer to come up with example cases and their expected results. In property based testing, the programmer describes the relationship between the inputs and outputs of a function and specifies the valid inputs and the testing tool generates the examples itself.

It was really easy to get started, I just added test.check to my dependencies, and then referred the appropriate namespaces in to the core_test.clj file that leiningen creates by default. Tests can be run either from the repl or with lein test.

Phase One
In the project, the code is in core.clj, and the tests are in core_test.clj.

My implementation of the Caesar Cipher is only going to work with lowercase letters. The first thing I need to do is translate a character to a number between 0 and 25. 

(defn let-to-num [l]
  (- (int l) (int \a)))
 
To test the function, I am going to tell test.check to generate random letters. Because I only want to test lowercase letters, I am going to use a lower-case? function as a filter:

(defn lower-case? [c]
  (and (>= (int c) (int \a))
       (<= (int c) (int \z))))

(defspec let-to-num-returns-0-to-25
  100
  (prop/for-all [c (gen/such-that lower-case? gen/char-alpha 100)]
    (let [v (let-to-num c)]
      (and (>= v 0)
           (< v 26)))))

defspec is used to hook the test into the clojure.test runner. I name the test, then specify that I want it to be run 100 times each time the tests are run. prop/for-all specifies that for all of the values in the binding, I want the expression after the binding to evaluate to true. For each iteration of the tests, c is going to be bound to a value generated by test.check.

The base filter I am using is char-alpha which generates a random letter that can be either uppercase or lowercase. gen/such-that takes a predicate that can be used to filter the values from a generator. The optional third parameter specifies the number of times the generator should attempt to generate a qualifying value before throwing a runtime error. By specifying 100, I am saying that the generator should try to generate a lowercase letter 100 times before giving up and throwing a runtime exception. 100 is a ridiculous number of times to try a 50-50 proposition, but I did have one run where I got a runtime exception using the default value of 10. 

lein test produces the following output:

lein test caesar-cipher.core-test
{:test-var let-to-num-returns-0-to-25, :result true, :num-tests 100, :seed 1414982695288}

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.

For each test, we can see the result and how many times each test was run. The :seed value is provided so if there is a failing test we can supply that seed value to the test function and it will run the tests against the same test cases that were generated when the tests failed. This allows us to ensure that tests are passing because the problem was actually fixed, not because we got lucky and didn’t hit the failing test case the next time.

Adding a number-to-letter function provides a new relationship to test. Translating from a letter to a number and back again should yield the original letter. Because I am again using the lower-case letter generator, I have refactored it into its on def.

(defn num-to-let [l]
  (char (+ l (int \a))))

(def gen-lower-case (gen/such-that lower-case? gen/char-alpha 100))

(defspec let-to-num-to-let
  100
  (prop/for-all [c gen-lower-case]
    (= c (num-to-let (let-to-num c)))))

Adding a shift function gives another relationship that should yield the initial value after a round trip. Shifting down 3 letters the result of shifting up 3 letters should yield the original result.

(defn shift [n i]
  (+ i n))

(defspec shift-round-trip
  100
  (prop/for-all [n (gen/elements (range 26))
                 c gen-lower-case]
                (= (num-to-let (shift (* -1 n) (shift n (let-to-num c))))
                   c)))

Testing shift requires a new generator. gen/elements selects a random element from a collection, in this case (range 26), which is the full range of translations we want to allow. 

The encoding is a 3 step process: convert to int, shift, convert to letter. While not strictly necessary, it is nice to have a function that aggregates these steps. The test just duplicates the logic of the translate (xlate) function, but because we have the test, if later we find a more efficient way to perform the translation we can still rely on the known good algorithm to validate it.

(defn xlate-let [n let]
  (->> (let-to-num let)
       (shift n)
       (num-to-let)))

(defspec xlate-let-combines-the-steps
  100
  (prop/for-all [n (gen/elements (range 26))
                 c gen-lower-case]
                (= (num-to-let (shift n (let-to-num c)))
                   (xlate-let n c))))

All that remains is to translate a whole phrase:

(defn translate [n phrase]
  (for [letter phrase]
    (xlate-let n letter)))

(defspec translate-round-trip
  100
  (prop/for-all [n (gen/elements (range 26))
                 v (gen/vector gen-lower-case)]
                (= (translate (* -1 n) (translate n v))
                   v)))

We have added another generator here. gen/vector generates a vector containing 0 or more elements from the provided generator, in this case gen-lower-case.

Running all 5 tests yields this result:

lein test caesar-cipher.core-test
{:test-var xlate-let-combines-the-steps, :result true, :num-tests 100, :seed 1414985079696}
{:test-var shift-round-trip, :result true, :num-tests 100, :seed 1414985079723}
{:test-var translate-round-trip, :result true, :num-tests 100, :seed 1414985079734}
{:test-var let-to-num-to-let, :result true, :num-tests 100, :seed 1414985079784}
{:test-var let-to-num-returns-0-to-25, :result true, :num-tests 100, :seed 1414985079790}

Ran 5 tests containing 5 assertions.

Phase Two
The tests are passing, but I am not happy with the results. The translate function is passed a string but it returns a sequence of characters:

(translate -13 (translate 13 "Property-based testing is fun."))
(\P \r \o \p \e \r \t \y \- \b \a \s \e \d \space \t \e \s \t \i \n \g \space \i \s \space \f \u \n \.)

Also, translate will yield unprintable characters:  
(translate 13 "Property-based testing is fun.")
(\] \ \| \} \r \ \ \ \: \o \n \ \r \q \- \ \r \ \ \v \{ \t \- \v \ \- \s \ \{ \;)

For phase two of this project, I want to only translate lowercase letters, I want the result of translating a lowercase character to always yield a lowercase character and I want the result of translation to be a string, not a sequence of characters. Each of these are testable properties.

Adding the test that shift returns a number between 0 and 25 (which we translate to lowercase letters) lets us see what a failing test looks like.

(defspec shift-returns-0-to-25
  100
  (prop/for-all [n (gen/elements (range 26))
                 c gen-lower-case]
    (let [v (shift n (let-to-num c))]
      (and (>= v 0)
           (< v 26)))))

{:test-var shift-returns-0-to-25, :result false, :seed 1414986620265, :failing-size 2, :num-tests 3, :fail [23 l], :shrunk {:total-nodes-visited 19, :depth 3, :result false, :smallest [15 l]}}

Here we see that we failed when we tried shifting the character ‘l’ 23 characters. test.check has a feature called shrinking, that tries to find the smallest value for an input that will cause the test to fail. Here we see that test.check has determined that shifting the letter l by 15 is the minimum value that will cause it to exceed 25. L, as the 12th letter has a value of 11, since we start counting at 0. In this case, it is not an important feature, but if we were testing complicated data structures it would be really nice to have our failing cases simplified for us.

This problem is easily solved:

(defn shift [n i]
  (mod (+ i n) 26))

To test that we are translating only lower-case letters, we can generate a string and compare the non-lowercase letters to the non-lowercase letters of the translated string.

(defspec translate-only-lowercase
  100
  (prop/for-all [n (gen/elements (range 26))
                 s gen/string-ascii]
    (= (remove lower-case? s)
       (remove lower-case? (translate n s)))))

For this test to work, instead of generating lowercase letters, we will use the build in gen/string-ascii.

To implement this, I will move the lower-case? function to the core namespace (since I am doing :refer :all in the test namespace, it will still be available in my tests).

(defn translate [n phrase]
  (for [letter phrase]
    (if (lower-case? letter)
      (xlate-let n letter)
      letter)))

For the last step the translate-round-trip test needs to be modified to generate strings instead of vectors so that the translate function will only pass the tests if it returns a string.

(defspec translate-round-trip
  100
  (prop/for-all [n (gen/elements (range 26))
                 v gen/string-ascii]
    (= (translate (* -1 n) (translate n v))
       v)))

And modify the function to return a string:

(defn translate [n phrase]
  (apply str
         (for [letter phrase]
           (if (lower-case? letter)
             (xlate-let n letter)
             letter))))

Now we have 7 passing tests and if we call our translate function, the results are more satisfying:

(translate -13 (translate 13 "Property-based testing is fun."))
"Property-based testing is fun."

(translate 13 "Property-based testing is fun.")
"Pebcregl-onfrq grfgvat vf sha."

A Few More Tests
For the sake of completeness, there are 3 more tests I want to write. I want to test the lower-case? function and I want to test that if translate is called with 0, the translated string equals the original string, and I want to test that if translate is called with a number other than 0 that the translated string does not equal the original string.

(defspec lower-case?-works
  100
  (prop/for-all [c gen/char-ascii]
    (= (lower-case? c)
       (and (>= (int c) (int \a))
            (<= (int c) (int \z))))))

(defspec translate-0-change-nothing
  100
  (prop/for-all [s gen/string-ascii]
    (= s (translate 0 s))))

(defspec translate-changes-things
  100
  (prop/for-all [n (gen/elements (range 1 26))
                 v (gen/vector gen-lower-case)]
    (not (= v (translate n v)))))

Notice, for the test that translate really does change its inputs, I had to go back to the lower-case generator. If I had used the string generator, sometimes the generator would have produced strings that had no lowercase values, and the tests would have failed because there was nothing to translate.

Conclusion
My core namespace has 6 functions in it. The 10 tests in the test namespace test the return values for all legal inputs to these 6 functions. How many unit tests would it take to do that? How would you know if you had tested all of the possible inputs?

If you did manage to write unit tests to cover every possible case, how many tests would you have to change when refactoring rendered the tests obsolete? I don’t know if property-based tests will be more resilient against such changes, but I do know that when I have to change my tests, I will have a lot fewer tests to change.

Today is my first day writing property-based tests. And right away I am a believer. I am completely confident in my code. It is a nice feeling.

You can find the code for this project on github.