Begriffs

The Design and Use of QuickCheck

2017-01-14

QuickCheck is the grandfather of property-based testing libraries. Despite being imitated in over thirty languages, the original implementation remains pre-eminent due to the type system and consistent logic of the Haskell language in which it is written.

Prior randomized testing tools required learning a special language and grammar to generate complex test cases. QuickCheck embeds the generator language in Haskell for a succinct domain specific language that reduces the learning curve and build system dependencies.

Property-based testing is enormously effective and it pays to learn to use it well. Getting started with QuickCheck is easy, but there is a lot to learn under the surface to achieve greater customization and better results. The typical examples you’ll find online have deceptively simple code (usually involving list reversal). The examples make it hard to learn how their succinct tests are possible, and how to customize them when things go wrong. This guide puts it all together piece by piece so you can use QuickCheck with confidence.

First things first, what is property-based testing? A property of a program is an observation that we expect to hold true regardless of the program’s inputs. It may involve only the output (“always outputs a positive number”) or compare input and output (“preserves list length”) or even assess external effects (“matches the output of a trusted external program”).

That is, instead of writing individual unit tests consisting of certain expected input-output pairs of a function, we write the desired properties of functions. Then QuickCheck automatically generates random values which can be run to verify (or rather, falsify) the property.

Using QuickCheck, the developer is forced to think about what the code should do at a high level rather than grunt out a few unmotivated test cases. The tool finds corner-cases where the specification is violated, which leads to either the code or the specification getting fixed. Additionally the specifications persist as rich machine-checkable documentation.

Proponents of formal methods sometimes stress the notion of specification above that of implementation. However it is the inconsistencies between these two independent descriptions of the desired behavior that reveal the truth. We discover incomplete understanding in the specs and bugs in the implementation. Programming does not flow in a single direction from specifications to implementation but evolves by cross-checking and updating the two. Property-based testing quickens this evolution.

Of course the only way to truly guarantee properties of programs is by mathematical proof. However property-based tests approximate deductive confidence with less work by checking properties for a finite number of randomized inputs called test cases.

Methodology of Testing

Given that property based tests can discover only the presence of bugs and never their absence, when should tests be considered adequate? One criterion is code coverage: that every program statement be executed at least once. QuickCheck does not measure code coverage, nor does it knowingly generate test data to exercise particular code paths. When running a test suite you should enable a code coverage report.

# Be sure to enable a code coverage report
# https://docs.haskellstack.org/en/latest/coverage/

$ stack test --coverage

Random data tends to exercise small programs considerably, so applying QuickCheck against one small function at a time is a good way to build total coverage. Also test cases whose statistical distribution follows that of actual data tend to generate higher coverage. For example, think of a function that manipulates email addresses. The vast majority of long random strings will exercise only the invalid email code path. Realistic data will exercise more. QuickCheck’s built-in test case generators use a uniform distribution but the library provides a way to customize generation.

Note there are situations when random test cases are not the most effective tool. QuickCheck will be less definitive than an alternative like SmallCheck for functions which take a finite number of possible input values. SmallCheck generates not just a random set of test cases but all possible cases beneath a given “size.” For example all Unicode strings of length ten or less. If a function has a finite (and small enough) number of possible input values then SmallCheck can prove conclusively when properties hold.

Generating Random Test Cases

As discussed, QuickCheck generates random test cases to falsify program properties. It uses a newtype with a single type argument, Gen a, to generate values of type a. Gen wraps a function mapping a random number generator and a size parameter to a value of a given type.

newtype Gen a = MkGen {
    unGen :: QCGen -> Int -> a
  }

QuickCheck provides combinators to obtain and modify Gen values. For instance here are two basic functions to make Gens:

-- Generate a random element in the given inclusive range
-- (using Random from the "random" package)
choose :: Random a => (a, a) -> Gen a

-- Pick elements at random from a (non-empty) list
elements :: [a] -> Gen a

You get values out of Gen by using generate. It runs in the IO monad so that it can advance along an internal sequence of pseudo-random values.

-- generate :: Gen a -> IO a

-- produce 1, 2, or 3
generate $ elements [1,2,3]

-- produce a lowercase letter
generate $ choose ('a', 'z')

-- produce a constant value (since Gen has a Monad instance)
generate $ return 1

So far the grammar is fairly limited, but QuickCheck uses a typeclass called Arbitrary to produce generators for more varied types.

class Arbitrary a where
  arbitrary :: Gen a

Generally it’s a Haskell convention that typeclass methods are inter-related by laws (like m >>= return ≡ m for Monads), but Arbitrary doesn’t follow that convention. It’s an “unprincipled” typeclass in this regard, but still useful.

You can think of Arbitrary as providing common-sense default generators for basic types. You’re still free to create your own generators or use those from third-party libraries, but QuickCheck provides plenty of Arbitrary instances. Compare two ways to generate the same distribution of values:

-- manually generating random Bool
generate $ choose (False,True)

-- ...vs using the Bool instance of Arbitrary
generate (arbitrary :: Gen Bool)

QuickCheck add-on libraries differ in their approach to providing new Gen types. Some (like quickcheck-unicode) provide functions which return Gen directly, and others (like quickcheck-instances) introduce newtypes with Arbitrary instances.

Type inference allows Arbitraries to combine recursively and easily build more complicated generators from simpler ones. This compounding means it is helpful to take the time to create Arbitrary instances for application-specific data types so they can be more easily incorporated into property tests.

-- there are many recursive instances like these
-- Arbitrary a => Arbitrary [a]
-- (Arbitrary a, Arbitrary b) => Arbitrary (a, b)

-- QuickCheck uses them to build things like this
generate (arbitrary :: Gen [(Int, Bool)])

Gen is also an instance of Applicative, so it can be applied to fill the fields of custom datatypes.

data MyType = MyType {
    foo :: Int
  , bar :: Bool
  , baz :: Float
  } deriving (Show)
  
generate $ MyType <$> arbitrary <*> arbitrary <*> arbitrary

If you’re truly lazy you can avoid repeatedly applying arbitrary by using the generic-random package.

{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics
import Generic.Random.Generic

data MyType = MyType {
    foo :: Int
  , bar :: Bool
  , baz :: Float
  } deriving (Show, Generic)

generate (genericArbitrary :: Gen MyType)

To generate more complicated recursive test cases, use the Monad and Applicative instance of Gen. For instance,

-- Randomly use one of several generators with
-- oneof :: [Gen a] -> Gen a

myList :: Arbitrary a => Gen [a]
myList = oneof
  [ return []
  , (:) <$> arbitrary <*> myList
  ]

This generator tends to produce very short lists, many of them empty. Statistically that’s how it works when at each step the function is fifty-percent likely to terminate. Rather than use oneof to choose generators we can skew the probability.

-- like oneof, but with weighted distribution
-- frequency :: [(Int, Gen a)] -> Gen a

myList' :: Arbitrary a => Gen [a]
myList' = frequency
  [ (1, return [])
  , (4, (:) <$> arbitrary <*> myList')
  ]

Now the function is four times as likely to proceed at every step, generating longer lists on average. Between myList and myList' – one which generates short lists and the other slightly longer – there is a common pattern. It would be more flexible to have an adjustable size.

In general when generating unbounded types like Integer or list the question naturally arises: how large an integer or long a list will we choose? It’s impossible to pick from among all integers with equal probability. QuickCheck resolves this conundrum by including the size parameter in a generator. Remember the Gen type wraps a function whose second parameter is an Int which holds the “size.” Some generators such as choose (0, 9) disregard it, but others consult it.

We can modify our list generator to heed its size parameter. We’ll tie the size to the likelihood of continuing to build a list and use the sized function to read the parameter.

-- sized :: (Int -> Gen a) -> Gen a

flexList :: Arbitrary a => Gen [a]
flexList = sized $ \n ->
  frequency
    [ (1, return [])
    , (n, (:) <$> arbitrary <*> flexList)
    ]

The size parameter is especially useful when generating structures like trees which are less likely to terminate when randomly spawning new branches. The size can be decreased deeper in the tree to stop its growth.

Our old friend the generate function sets size to thirty. Notice the length of lists produced by the flexList generator by default:

-- there is a debugging function to call generate repeatedly,
-- sample :: Show a => Gen a -> IO ()

> sample (flexList :: Gen [Int])
[]
[-1]
[3,0,1,1]
[0]
[-4]
[0,7,-2,-2,2,-5,6,10,4,9,-5,-4]
[-12,-2,-1,-11,7,-3]
[-6,8,-3,-2,-6,9,-14,14,14,3,-6]
[0,-10,-14,11,-12,-12,-6,4,-6,-16,-16,-12,-9,4,-13,-7,-7,1,13]
[-16,-6,18,1,0,-17,-14,-18,2,-7,7,6,12,15]
[-12,11,-15,13,-17,-20,10,-18,-15,-5]

The size parameter applies to all generators at once in a complex result. Notice that the Int values in the generated lists above vary between -30 and 30 (in fact a strictly smaller range in the example above). Generating longer lists also generates larger numbers inside. QuickCheck is designed this way to avoid correlation between the sizes of different parts of the test data which may distort test results.

There are helper function to adjust the size parameter in a generator as well.

-- We can modify a generator to use a fixed size with
-- resize :: Int -> Gen a -> Gen a
--
-- Here's a big list of big numbers
generate (resize 1000 flexList :: Gen [Int])

-- Alternately we can alter a generator's sensitivity
-- to size. This will be useful later
generate (scale (*33) flexList :: Gen [Int])

Now that we can generate a variety of test cases it is time to move on to the fun part, testing programs.

Specifying Laws

Laws, or properties are predicates that can be checked by testing. QuickCheck represents them as Gen Result where Result holds various facts about a test including:

Is the test successful, failed, or should it be discarded?
Whether to retry the test
Statistics about test cases
Reason for failure
Exceptions thrown

At a high level each property generates a test case inside the Gen monad, evaluates the data against part of the program under test, then constructs a Result. As all this happens inside Gen, the result is of type Gen Result aka Property. To run the tests, QuickCheck generates the results and displays statistics about them.

Let’s make a Property by hand (the hard way) and then see how QuickCheck makes it easier.

-- returned values are modified versions of
-- succeeded :: Result
-- failed :: Result

prop_commutativeAdd :: Gen Result
prop_commutativeAdd = do
  (x, y) <- arbitrary :: Gen (Int, Int)
  return $ if x + y == y + x
    then succeeded
    else failed { reason = "stupid non-commutative addition" }

(This is a slight simplification in that nowadays Property is defined as Gen of a tree of Results rather than literally Gen Result, but it’s fine for illustration.)

Unfortunately prop_commutativeAdd contains a lot of boilerplate to test a simple assertion. QuickCheck offers a Testable typeclass to help us simplify. Testable is to Property as Arbitrary is to Gen. In particular Testable a provides a function property :: a -> Property. QuickCheck actually runs Testable and extracts a Property rather than expecting Property directly.

-- These two instances make life easy:
--
-- Testable Bool
-- (Arbitrary a, Show a, Testable prop) => Testable (a -> prop)

prop_commutativeAdd :: Int -> Int -> Bool
prop_commutativeAdd x y = x + y == y + x

The Testable instance of Bool is similar to what we wrote by hand for the first version of prop_commutativeAdd. With that and because Functions from Arbitrary to Testable are themselves Testable, a simple function like the one above is Testable. It’s important to see how Properties work because the flexibility of Testable makes properties so short that it obscures how things work under the hood.

In fact our example can get shorter. Simple properties like commutativity, associativity, or idempotence have helper methods in the test-invariant package.

import Test.Invariant

prop_commutativeAdd :: Int -> Int -> Bool
prop_commutativeAdd = commutative (+)

To check an individual property use the quickCheck function:

-- quickCheck :: Testable prop => prop -> IO ()

> quickCheck prop_commutativeAdd
+++ OK, passed 100 tests.

Note that properties must use monomorphic types, you can’t have a property with polymorphism like [a] -> [a] -> Bool. You must pick a monomorphic instance for which to generate test cases. Be careful which instance you choose! Observe the difference between addition of Ints vs Doubles.

> quickCheck $ associative ((+) :: Int -> Int -> Int)
+++ OK, passed 100 tests.

> quickCheck $ associative ((+) :: Double -> Double -> Double)
*** Failed! Falsifiable (after 5 tests and 188 shrinks):
-1.4210880596288213e-14
2.842174061166933e-14
128.0

Test Case Distribution and Shrinking

Now that we’ve seen the basics of properties and test case generation let’s see them in action. Imagine we want to test two functions, encode to perform Base64 encoding of binary data, and decode to restore the data back again. Let’s focus on encode first. Our goal is to determine properties of this function that hold for all inputs.

Base64 encoding breaks its input into groups of six bits, and assigns the sixty-four possible values to characters which are safe for various kinds of transmission. Now, regardless of the target alphabet, this maps each six bits to one byte, leading to an output size of ceil(4n/3) where n is the number of input bits. (The encoding pads its output with trailing equals signs to make up for input whose bit length is not divisible by six.)

Let’s express the input-output bit ratio property in QuickCheck. The encode function will deal with lazy ByteString to read raw binary data. ByteString does not have an Arbitrary instance supplied by core QuickCheck, but it’s available in the quickcheck-instances package.

import Test.QuickCheck.Instances ()
import qualified Data.ByteString.Lazy as BL

-- assuming our encode function is defined
-- encode :: BL.ByteString -> BL.ByteString
import Codec.Binary.Base64 (encode)

-- Haskell infers the type of b because it is passed to
-- to encode. The quickcheck-instances provides an Arbitrary
-- for it, and this whole property is thus Testable. That's
-- a reminder of why it is written so concisely.
prop_sizeRatio b =
  BL.length (encode b) ==
      4 * ceiling (fromIntegral (BL.length b) / 3)

If we tried to use the identity function, id :: a -> a, as our supposed encode then this test will fail with any non-empty series of bytes:

> quickCheck prop_sizeRatio

=== prop_sizeRatio ===
*** Failed! Falsifiable (after 3 tests and 1 shrink):
"\NUL"

Examine the output more carefully. We haven’t talked yet about how long and in what order QuickCheck tries test cases. The quickCheck function generates Results starting with a small size parameter, gradually increasing it. This makes it more likely to find simpler counterexamples if they exist. By default it checks one hundred cases and declares the test a success if they all pass.

In our case it found a counterexample and then started “shrinking” it to find a simpler one. Shrinking is an operation defined per-type as an optional method in the Arbitrary class. It has type shrink :: a -> [a], returning a list of simpler potential counterexamples. Candidates are tried in the order they appear in the list.

If not customized by Arbitrary instances the default behavior is to return an empty list, which will prevent shrinking. The documentation in the Test.QuickCheck.Arbitrary module is quite good, and suggests that implementations of shrink should try at least three things:

Shrink a term to any of its immediate subterms.
Recursively apply ‘shrink’ to all immediate subterms.
Type-specific shrinkings such as replacing a constructor by a simpler constructor.

Additionally when defining shrink on a recursive type it’s important that the shrunken results are actually smaller in some measureable way to avoid infinite descending chains. Never-ending or cyclic shrinkage will prevent QuickCheck from terminating.

In the case of ByteString, shrinking shortens the string and causes the byte values inside to move their values closer to zero (0x00). That’s why our string stopped shrinking at a single zero byte, denoted \NUL. It can’t go any closer to zero and any shorter string passes the test.

Not only ought the input and output sizes be in a constant ratio, but Base64 encoding should pad the end with equals signs when needed to keep the output a multiple of four bytes. For instance “A” maps to “QQ==”, “AB” to “QUI=” and “ABC” doesn’t need any padding and maps to “QUJD.” This is another property we can test.

import Text.Regex.Posix

prop_endsWithPadding b =
  (encB =~ ("(^|[^=])" <> suffix <> "$"))      -- at end
    && not (encB =~ ("=[^=]"::BL.ByteString)) -- only at end
 where
  encB = encode b
  remainder = fromIntegral $ BL.length b `rem` 3
  suffix = BL.replicate
    ((3 - remainder) `rem` 3)
    (fromIntegral $ ord '=')

This checks with regexes that the output ends with the correct number of equals signs and that no stray ones exist earlier. If our encode function is written correctly then it passes! However, are the random inputs really exercising all the padding possibilities? What if the input cases never attain certain sizes modulo four? We can get a better window into the test by labeling each case.

-- Just add this to the beginning of the property
-- collect :: (Show a, Testable prop) => a -> prop -> Property

prop_endsWithPadding b =
  collect suffix $ ...

The test results show that the cases we care about are all represented about equally.

=== prop_endsWithPadding ===
+++ OK, passed 100 tests:
36% ""
35% "="
29% "=="

We’ve now established that the encode function outputs the correct length and pads it properly. The point of Base64 encoding however is to keep its output within a limited subset of ASCII for transmission over restricted protocols. It’s important to verify that the output stays within its allowed alphabet. Here’s the first version of the test:

import Data.Char (ord)
import Data.Monoid ((<>))
import qualified Data.Set as S

prop_outputAlphabet b =
  used `S.isSubsetOf` allowed
 where
  used = S.fromList . BL.unpack $ encode b
  allowed = S.fromList . map (fromIntegral . ord) $
    ['A'..'Z'] <> ['a'..'z'] <> ['0'..'9'] <> ['+','/','=']

If the encode function is written correctly then this test will pass! However, as with the padding property, we want to know if it is testing thoroughly enough. Does encode output to the full alphabet? It might never hit certain values (maybe a range is off). To check we can classify certain outputs as being “half-alphabet” or “full-alphabet.”

-- classify :: Testable prop =>
--             Bool
--          -> String
--          -> prop -> Property


prop_outputAlphabet b =
  classify (S.size used >= 32) "half-alphabet"
    . classify (S.size used >= 63) "full-alphabet"
    $ used `S.isSubsetOf` allowed
 where
   ...

Running it reveals some of the outputs are half-alphabet, but none are full-alphabet. Our tests are not giving us the full picture.

=== prop_outputAlphabet ===
+++ OK, passed 100 tests (36% half-alphabet).

Calling classify provides a nice way to keep a casual eye on distributions but we want a stronger guarantee. To insist that we get at least one full-alphabet output we switch from classify to cover.

prop_outputAlphabet b =
  cover (S.size used >= 63) 1 "full-alphabet"
    $ used `S.isSubsetOf` allowed
 where
  ...

Now the test fails the way we want.

=== prop_outputAlphabet ===
*** Insufficient coverage after 100 tests (only 0% full-alphabet, not 1%).

Either the encode function has a bug or we’re not giving it big enough test cases to exercise the full output alphabet. Let’s see how to crank up the generator.

Refining Generated Data

As mentioned, QuickCheck begins with a small size parameter when generating results and raises it slowly as testing proceeds. If our failure to get output coverage in the previous section was due to insufficiently large input then there are at least two ways to enlarge the input: use more test cases to allow the size to grow further, or scale the generator sensitivity.

Using more test cases is a matter of configuring the test runner, which we will discuss in a later section. Instead let’s make the most of the number of tests already used. We’ll employ a custom generator which is a scaled version of one provided by quickcheck-instances.

-- To use a custom generator in a Property use
-- forAll :: (Show a, Testable prop)
--        => Gen a -> (a -> prop) -> Property

prop_outputAlphabet =
  forAll (scale (*3) (arbitrary :: Gen BL.ByteString)) $ \b ->
    let ... in
    cover (S.size used >= 63) 1 "full-alphabet"
      $ used `S.isSubsetOf` allowed

We saw how to scale a generator a few sections before, so that part is nothing new. As luck would have it, running this modified property with a suitable encode function reliably covers full-alphabet in one hundred runs.

We succeeded above by modifying the generator. Another way to affect generated values is by using newtypes specially designed to adjust generators of underlying types. This approach avoids the need for forAll. The Test.QuickCheck.Modifiers module includes helper newtypes like NonNegative, NonZero, Positive, OrderdedList etc. Internally there is not much to these types, they simply allow a way for QuickCheck to define new Arbitrary instances. You use them in Testable type signatures:

prop_foo :: (Positive i) -> Bool
prop_foo = ...

A useful member of this family is Small a. Restricting the number of distinct values taken in test cases is appropriate for testing certain code paths. For instance Gen HashMap will detect more corner cases if the keys’ cardinalities are reduced to cause collisions.

Some people recommend creating a newtype and Arbitrary instance any time you use a custom Gen. The idea is that it may be more likely that you’ll use the custom Gen uniformly by using the newtype throughout code and tests. However it’s a matter of style.

Getting back to the Base64 example, we want to enforce that encode and decode are inverses. Encoding first and then decoding is the easy direction because encode is not picky about its inputs. The propery looks like this:

-- assume the decode function has this type:
-- decode :: ByteString -> Either ParseError ByteString

-- the easy direction
prop_decEnc b = decode (encode b) == Right b

The other direction does not work as well because many strings are not valid Base64 messages and cannot be decoded.

import Data.Either (rights)

-- this will quickly fail
prop_encDec b =
  [b] == (encode <$> rights [decode b])

We can discard test cases that aren’t suitable by using the conditional operator.

-- (==>) :: Testable prop => Bool -> prop -> Property

prop_encDec b =
  legit ==> [b] == (encode <$> rights [dec])
 where
  dec  = decode b
  legit = isRight dec

This works but it is inefficient when the ratio of suitable to unsuitable test cases is small. After getting too many rejections QuickCheck fails.

=== prop_encDec ===
*** Gave up! Passed only 45 tests.

It’s always more efficient to use a custom generator than to conditionally discard test cases. Here’s generator for messages in the Base64 alphabet with proper equals padding, plus the modified test.

-- using more Gen combinators: listOf, vectorOf

encoded :: Gen BL.ByteString
encoded = do
  body <- concat <$> listOf (group 0)
  end <- group =<< choose (0, 2)
  return . BL.pack $ body <> end
 where
  group :: Int -> Gen [Word8]
  group pad = do
    letters <- vectorOf (4 - pad)
      . elements . map (fromIntegral . ord)
      $ ['A'..'Z'] <> ['a'..'z'] <> ['0'..'9'] <> ['+','/','=']
    return $ letters <> replicate pad 61  -- 61 is ascii for =

prop_encDec = forAll encoded $ \b ->
  [b] == (encode <$> rights [decode b])

QuickCheck, being the powerful tool it is, actually discovers a problem.

=== prop_encDec ===
*** Failed! Falsifiable (after 1 test):
"1yx="

Decoding “1yx=” and re-encoding produces “1yw=”. How can this be? RFC4648 Section 3.5 designates certain encodings as “canonical” and the Base64 decode function is only injective when restricted to the canonical encodings. The prop_encDec test will pass only if the custom generator produces nothing but canonical forms. Modifying the generator is an exercise for the reader!

Testing encoder/decoders is a natural task for QuickCheck, and this examples goes to show you should make two properties to check that functions are inverses of one another – check both directions. Even our check that the encoder hits all letters of the alphabet did not reveal the importance of its canonical encodings.

A final note about generators. When the data type under test is internally complicated but presents an API for modification it may be easier to generate an Arbitrary series of actions against the API. Have test cases execute the API actions, thus building the datatype. This is the approach used at Pusher to test their Raft protocol implementation with QuickCheck.

Running Multiple Tests

We already used the quickCheck function to run a single test, but given a file full of Testable properties how do you run them all? The Test.QuickCheck.All module uses Template Haskell to detect properties and run them.

{-# LANGUAGE TemplateHaskell #-}
module MyTests where

import Test.QuickCheck.All

prop_foo = ...

prop_bar = ...

--------------------------
return []
runTests :: IO Bool
runTests = $quickCheckAll

This creates a runTests function which detects the tests due to the prop_ prefix in their names. That is the reason for the naming convention which might have looked a little odd.

To adjust the number of test cases run against each property (or other parameters) use forAllProperties rather than quickCheckAll.

-- quickCheckWithResult :: Testable prop => Args -> prop -> IO Result
-- stdArgs :: Args

return []
runTests :: IO Bool
runTests = $forAllProperties $
  quickCheckWithResult (stdArgs {maxSuccess = 10000})

Your main test runner itself can call runTests for each testing module and exit with the appropriate Unix status code.

module Main where

import qualified MyTests as MT
import System.Exit

main :: IO ()
main = do
  -- add test runners into the array for each module
  good <- and <$> sequence [MT.runTests]
  if good
     then exitSuccess
     else exitFailure

While the Tasty, HSpec, and test-framework testing frameworks can run QuickCheck Properties they do not output statistics equally well. The built-in Template Haskell test runner in QuickCheck outputs full information (including collect and classify results), as does Tasty. The other two hide the statistics and obscure when failures are caused by too many rejected conditionals.

Testing IO or ST Actions

QuickCheck was originally designed for pure functions, but the Test.QuickCheck.Monadic module supports tests which perform IO actions like calling external programs. There is some highly academic material about it online but it’s quite easy in practice. Here’s the general form of monadic tests:

-- using the functions
-- monadicIO :: PropertyM IO a -> Property
-- run :: Monad m => m a -> PropertyM m a
-- assert :: Monad m => Bool -> PropertyM m ()

prop_monadic a b = monadicIO $ do
  a' <- run (f a)
  b' <- run (f b)
  -- ...
  assert someBoolean

The module exposes a bunch of new monadic functions but they correspond to concepts we already know. The examples in Test.QuickCheck.Monadic are very clear and will quickly get you up and running.

pre is (==>)
forAllM (and pick) is forAll
monitor adapts collect/classify
monadic, monadicIO, monadicST all fit the new code into regular Properties

CoArbitrary and Gen (a -> b)

Sometimes it’s useful to generate not just values, but functions. Consider a test involving images. The JuicyPixels package provides a function generateImage to make them. It requires a function from coordinate to pixel value as its first argument, and the image width and height. Suppose we want to use it to generate not a specific picture with known pixel function, but random pictures.

We know how to generate an arbitrary width and height, but how do we get a random function to assign colors to pixels? It turns out all we have to do is provide an Arbitrary instance for pixels and QuickCheck can “magically” generate functions Pixel px => (Int -> Int -> px). Let’s see it in action first, and then discover how it works internally.

-- Magically generates a function "f" and also uses
-- suchThat :: Gen a -> (a -> Bool) -> Gen a

import Codec.Picture

instance Arbitrary PixelRGB8 where
  arbitrary = PixelRGB8 <$> arbitrary <*> arbitrary <*> arbitrary

genImage :: Gen (Image PixelRGB8)
genImage = do
  f <- arbitrary
  (x, y) <- arbitrary `suchThat` ( \(x,y) -> x > 0 && y > 0 )
  return $ generateImage f x y

Here is one of these images.

img <- generate $ resize 1000 genImage
BL.writeFile "/tmp/qc.png" $ encodePng img

The randomly generated image

(As an aside: in real test code you won’t even have to write your own image generator. Image and other media generators are provided by the QuickFuzz library.)

QuickCheck was able to automatically generate a random function for us because the type of the function’s arguments, Int, is an instance of the CoArbitrary typeclass.

The fundamental trick is the observation that a function which modifies a generator can be transformed into a generator of functions. That is, it is possible to write a function promote :: (a -> Gen b) -> Gen (a -> b). (The modern incarnation of QuickCheck generalizes to Monad m => m (Gen a) -> Gen (m a).) The way such a function would work is not too deep, it’s a matter of rearranging arguments:

a -> Gen b = {meaning of Gen}  a -> Int -> Rand -> b
           = {reorder args}    Int -> Rand -> a -> b
           = {meaning of Gen}  Gen (a -> b)

The real trick is how to get a function a -> Gen b in the first place. QuickCheck provides a way to get a related function, a -> Gen b -> Gen b for a variety of choices of a through instances of the CoArbitrary typeclass:

class CoArbitrary a where
  coarbitrary :: a -> Gen b -> Gen b

The type signature of coarbitrary allows it to fit into the crucial instance for Arbitrary (a -> b).

instance (CoArbitrary a, Arbitrary b) => Arbitrary (a -> b) where
  arbitrary = promote (\a -> coarbitrary a arbitrary)

It was this instance which, through repeated application, gave us Arbitrary (Int -> Int -> PixelRGB8).

How do CoArbitrary instances even work? How do you write a function a -> Gen b -> Gen b to transform any generator, without knowing something about the b type and how to alter its values? The answer is to deterministically alter the stream of pseudo-random numbers feeding the generator. QuickCheck provides a primitive variant :: Int -> Gen a -> Gen a function to do this.

With high probability, variant alters a generator differently for different integer values. In fact the function has been designed carefully so that any composition variant i_0 . variant i_1 . … . variant i_k modifies a generator differently for distinct i_0, i_1, …, i_k.

The variant primitive allows CoArbitrary instances to modify unknown generators:

instance CoArbitrary Bool where
  coarbitrary b =
    if b then variant 0 else variant 1

The variant composition property above allows the technique to extend to recursively defined datatypes. QuickCheck provides CoArbitrary instances for a large number of basic types in the Test.QuickCheck.Arbitrary module, even Map and Set. Creating variants can be automated in your own datatypes by deriving Generic, at which point you can simply write instance CoArbitrary MyType.

Resources

Helper Libraries

https://hackage.haskell.org/package/quickcheck-unicode
Wider sample of Char with unicode weirdness.
https://hackage.haskell.org/package/quickcheck-special
Boundary values for ints and floats.
https://hackage.haskell.org/package/quickcheck-instances
Arbitrary instances for many types, including dates and times.
https://hackage.haskell.org/package/quickcheck-regex
Generator for strings matching a regex.
http://quickfuzz.org/
Generators for images, audio, documents, net addresses, time.
https://hackage.haskell.org/package/props, https://hackage.haskell.org/package/test-invariant
Helper abbreviations to test laws like associativity and idempotence.
https://hackage.haskell.org/package/checkers
Generator combinators and properties for common typeclasses.
https://hackage.haskell.org/package/fgl-arbitrary
Arbitrary instances for graphs.

Written by Joe "begriffs" Nelson