Tagless with Discipline - Testing Scala Code the Right Way

With the recent boom in the adoption of the so-called final tagless encoding in Scala land, which in turn seems to address the shortcomings of the Free monad approach, the testability of programs is better than ever. The general consensus is that one of the main benefits of the Free / tagless style is that it makes unit testing programs easy, without the tedious process of setting up dependencies, etc…

class TaglessService[M[_]: Monad](taglessRepository: TaglessRepository[M]) {
  def getList(limit: Int): M[Seq[Item]] = {
    taglessRepository.getList(limit)
  }
}

You simply bind to the Id monad (or swap interpreters if you’re a Free fan) and you’re good to test all the pure logic. Obviously, these techniques also help with integration testing, since components can easily be transformed between various monadic contexts.
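To make the unit testing side concrete, here is a minimal, dependency-free sketch of binding the service above to Id. The Monad trait and Id alias are simplified stand-ins for the ones in cats, and Item, the in-memory repository, and its contents are assumptions made up for illustration.

```scala
object IdBindingSketch {
  // Minimal stand-ins for cats.Monad and cats.Id, so the sketch needs no dependencies.
  trait Monad[F[_]] {
    def pure[A](a: A): F[A]
    def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  }
  type Id[A] = A
  implicit val idMonad: Monad[Id] = new Monad[Id] {
    def pure[A](a: A): Id[A] = a
    def flatMap[A, B](fa: Id[A])(f: A => Id[B]): Id[B] = f(fa)
  }

  // Hypothetical domain type; the article does not show Item's definition.
  final case class Item(name: String)

  trait TaglessRepository[M[_]] {
    def getList(limit: Int): M[Seq[Item]]
  }

  class TaglessService[M[_]: Monad](taglessRepository: TaglessRepository[M]) {
    def getList(limit: Int): M[Seq[Item]] = taglessRepository.getList(limit)
  }

  // Hypothetical in-memory repository bound to Id: results are plain values.
  val inMemoryRepository: TaglessRepository[Id] = new TaglessRepository[Id] {
    def getList(limit: Int): Id[Seq[Item]] =
      Seq(Item("a"), Item("b"), Item("c")).take(limit)
  }

  val service = new TaglessService[Id](inMemoryRepository)

  // Because Id[A] is just A, the result can be asserted on directly.
  val firstTwo: Seq[Item] = service.getList(2)
}
```

With Id there is nothing to run or await, so a unit test boils down to comparing plain values.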

For example, you can produce a DBIO instance out of your computation and interpret it as a Future in an automatically rolled back transaction. No more setting up fixtures and maintaining the ever-elusive db state in tests. Integration tests can be fully parallel and much less flaky. That being said, it is still a bit of a burden to actually write these tests, mainly because of the informality of a component’s specification that results in a lot of repetitive case-by-case testing.

What do I mean?

Let’s see what the typical approach might look like.

Problem Domain

Let’s say that we have a system storing users along with their preferences and e-mails. GDPR aside, we want to identify these users by their e-mail, set some properties, etc… So, in the tagless manner, we have created two repositories with abstracted monadic context:

trait Emails[F[_]] {
    def save(email: Email): F[Either[EmailAlreadyExists.type, Email]]
    def known(email: Email): F[Boolean]
    def findEmail(email: Email): F[Option[Email]]
}

trait Users[F[_]] {
    def createUser(primaryEmail: Email, 
                   userProfile: UserProfile = UserProfile()): F[PersistedUser]
    def findUser(uid: UID): F[Option[PersistedUser]]
    def identifyUser(email: Email): F[Option[PersistedUser]]

    def attachEmail(user: PersistedUser, email: Email): F[Int]
    final def attachEmails(user: PersistedUser, emails: Email*)(
                           implicit F: Applicative[F]): F[Int] = {
      import cats.instances.list._
      val fa = Traverse[List].sequence(emails.map(attachEmail(user, _)).toList)
      F.map(fa)(_.sum)
    }

    def getEmails(uid: UID): F[Option[NonEmptyList[Email]]]
    def updateUserProfile(uid: UID,
                          f: UserProfile => UserProfile): 
                          F[Option[PersistedUser]]
}

In short – you have a bunch of emails and you can attach them to the user. Nothing extraordinary here. Additionally, you can do some lookups and updates. Presumably, you’d like these structures to be kept in a relational database, so you implement these repositories for DBIO (if you use Slick) or something similar (if you don’t).

final class EmailRepository(implicit ec: ExecutionContext)
    extends Emails[DBIO] {

  // ...

  override def save(email: Email) = {
    val row = EmailRow.from(email)

    (EmailsTable += row)
      .map(_ => Right(email): Either[EmailAlreadyExists.type, Email])
      .recoverPSQLException {
        case UniqueViolation("emails_pkey", _) => Left(EmailAlreadyExists)
      }
  }
  override def known(email: Email) = existsQuery(email).result
  override def findEmail(email: Email) =
    filterEmailQuery(email).result.headOption

  //...
}

class UserRepository(emailRepository: EmailRepository)(
                     implicit ec: ExecutionContext) extends Users[DBIO] {
  //...

  override def createUser(primaryEmail: Email, userProfile: UserProfile) = {
    val row = DbUser.from(primaryEmail, userProfile)
    (UsersTable += row).map(PersistedUser(_, row))
  }

  override def identifyUser(email: Email) = identifyQuery(email).result.flatMap{
    case Seq()          => DBIO.successful(None)
    case Seq(singleRow) => DBIO.successful(Some(singleRow))
    case _              => DBIO.failed(
      new IllegalStateException(s"More than one user uses email: $email"))
  }

  override def attachEmail(user: PersistedUser, email: Email) = {
    val id = user.id
    emailRepository.upsert(email, id)
  }

 //...

}

The details of the implementation do not really matter. It’s sufficient to say that it maps abstract operations to the “real ones.” What matters is that it needs to be tested at some point to ensure correctness of mapping, constraint violation handling, etc…

Typical Integration Testing

Normally, you would write a test case for each expected behavior. First, you would need some fixtures and transactions to prepare the db for a test scenario:

class BaseFixture(db: Database) {
  private case class IntentionalRollbackException[R](result: R)
      extends Exception("Rolling back transaction after test")

  def withRollback[A](testCode: => DBIO[A])(
                      implicit ec: ExecutionContext): Future[A] = {
    val testWithRollback = testCode flatMap (a => 
      DBIO.failed(IntentionalRollbackException(a)))

    val testResult = db.run(testWithRollback.transactionally)

    testResult.recover {
      case IntentionalRollbackException(success) => success.asInstanceOf[A]
    }
  }
}

class UserFixtures(db: Database) extends BaseFixture(db) {

  //...

  def mkUser(primaryEmail: String, 
             userProfile: UserProfile = UserProfile()): User =
    User.from(Email(primaryEmail), userProfile)

  def mkUser(primaryEmail: String,
             emails: NonEmptyList[Email],
             userProfile: UserProfile): UserWithEmails =
    UserWithEmails(mkUser(primaryEmail, userProfile), emails)

  def withUser[A](user: User)(testCode: UID => DBIO[A])(
                  implicit ec: ExecutionContext): Future[A] =
    withRollback(usersRepository.insert(user).flatMap(testCode))

  def withUser[A](primaryEmail: String,
                  userProfile: UserProfile = UserProfile())(
      testCode: UID => DBIO[A])(
      implicit ec: ExecutionContext): Future[A] =
    withUser(mkUser(primaryEmail, userProfile))(testCode)

  def withUsers[A](users: User*)(testCode: Seq[UID] => DBIO[A])(
                   implicit ec: ExecutionContext): Future[A] =
    withRollback(DBIO
                 .sequence(users.map(user => usersRepository.insert(user)))
                 .flatMap(testCode))

  //...
}

And then, using all the test infrastructure you wrote, you can write integration tests:

class UserRepositorySpecs extends ItTest with OptionValues {
  val repository = new UserRepository(new EmailRepository())

  val fixture = new UserFixtures(db)
  import fixture._

  it should "insert user to database" in withRollback {
    repository.createUser(
      Email("[email protected]"),
      UserProfile(
        name = Some(Name("John")),
        aboutMe = Some("I am John"),
        birthdate = Some(Birthdate(LocalDate.of(1982, 11, 12))),
        languagesSpoken = Some("Polish, English, German, Russian"),
        language = Some(Language(new Locale("pl", "PL")))
      )
    )
  }.map { fromDb =>
    fromDb.primaryEmail shouldEqual "[email protected]"
    val profile = fromDb.profile

    profile.name.value shouldEqual "John"
    profile.aboutMe.value shouldEqual "I am John"
    profile.birthdate.value shouldEqual LocalDate.of(1982, 11, 12)
    profile.languagesSpoken.value shouldEqual "Polish, English, German, Russian"
    profile.language.value shouldEqual new Locale("pl", "PL")
  }

  it should "identify user by email" in withUser(
    mkUser("[email protected]",
           userProfile = UserProfile(aboutMe = Some("I am John"),
                                     language = Some(Language(Locale.US)))))(_=>
    repository.identifyUser(Email("[email protected]")) zip repository.
      identifyUser(Email("[email protected]"))).map {
    case (maybeJohn, noOne) =>
      noOne shouldBe empty

      val john = maybeJohn.value
      john.profile.aboutMe shouldEqual Some("I am John")
      john.profile.language shouldEqual Some(Locale.US)
      john.primaryEmail shouldEqual "[email protected]"
  }

  it should "get users by uid" in withUsers(
    mkUser("[email protected]"),
    mkUser("[email protected]")
  ) {
    case Seq(uid1, uid2) =>
      repository.findUser(UID()) zip repository.
        findUser(uid1) zip repository.
        findUser(uid2)
  }.map {
    case ((noUser, user1), user2) =>
      noUser shouldBe empty

      user1.value.primaryEmail shouldEqual "[email protected]"
      user2.value.primaryEmail shouldEqual "[email protected]"
  }
}

//...

class EmailRepositorySpecs extends ItTest 
                           with EitherValues 
                           with OptionValues {
  val repository = new EmailRepository

  val fixture = new EmailFixtures(db)
  import fixture._

  it should "save email to the db" in withRollback {
    repository.save(mkEmail("[email protected]"))
  }.map { errorOrEmail =>
    val fromDb = errorOrEmail.right.value
    fromDb shouldEqual "[email protected]"
  }

  it should "not save email if it already exists" in withEmail(
    "[email protected]") {
    repository.save(mkEmail("[email protected]"))
  }.map { errorOrEmail =>
    val fromDb = errorOrEmail.left.value
    fromDb shouldEqual EmailAlreadyExists
  }

  it should "check if email is known" in withEmail("[email protected]")(
    repository.known(Email("[email protected]")) zip repository.known(
      Email("[email protected]"))).map {
    case (known, unknown) =>
      known shouldBe true
      unknown shouldBe false
  }

  //...

}

Unfortunately, writing these tests is very tedious and it’s tempting to cut corners by skipping some important cases. For instance, would you test what happens when you save a string consisting entirely of whitespace? Or the various combinations where parts of a UserProfile are missing? Or all the VARCHAR constraints?

Moreover, you need to test a lot of implicit interactions between various methods. What should a double save do? What should find do after a successful create? If find returns something, then what is its relation to getEmails? All these facts are tested by checking the behavior of a method with respect to some implicit database state. This is achieved by preparing a vast array of fixtures that meticulously recreate the desired state before the test.

All this has one detrimental effect when it comes to generality – people work with tagless to abstract away effects. Yet, if you were to exercise this benefit, you’d have to rewrite all the tests with all the specifics of the new effect, making it prohibitively expensive.

Bring on Some Discipline

So, we seek to obtain the following:

Write Less
Test More
Be Explicit about How Methods of Algebra Should Behave
Be Generic

These things seem a bit contradictory. With the typical approach, the only way to test more is to write more tests and fixtures! And how can you be more explicit while being more generic if being explicit means writing fixtures that are anything but generic? It turns out that these properties cannot be obtained through the typical approach, so the only way to proceed is to change the approach.

Instead of writing tests, let’s formulate laws that the implementation of algebra should respect. Then, let’s use the automated law-checking library, Discipline, which will generate a large number of random test cases with ScalaCheck. This will allow us to test with sufficient confidence that any implementation is following our laws.

Thus, we get the following benefits:

We do not have to write tests – only laws and some infrastructure code (data generators, equality definitions).
Tests can exercise cases that are hard to come by when writing them manually (e.g., very large strings, empty values).
Tests work regardless of implementation.
Laws serve as an explicit documentation of behavior.

Let’s see the details!

Writing Laws

When you take a look at the Emails algebra, the following laws come to mind:

For every saved email e, findEmail(e) returns Some(e).
For every saved email e, known(e) returns true.
findEmail is consistent with known i.e., findEmail(e) is defined IFF known(e) is true.
Saving the same email twice always returns the EmailAlreadyExists error.

By translating these laws into operations using this algebra, you get (in pseudo-code):

save(e) >> findEmail(e) <-> pure(Some(e))
save(e) >> known(e) <-> pure(true)
findEmail(e).fmap(_.isDefined) <-> known(e)
save(e) *> save(e) <-> pure(Left(EmailAlreadyExists))

In the example above, we use the standard cats syntax where a >> b means a flatMap (_ => b), and a *> b means product(a, b).map(_._2). We also use the <-> symbol to express the “is equivalent to” relation.
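If the operators are unfamiliar, the following sketch spells them out for Option, using nothing but the standard library. The OptionOps class is a hypothetical stand-in for the cats syntax, written only to mirror the two definitions given above.

```scala
object OperatorSketch {
  // Hypothetical syntax class mimicking the cats operators for Option only.
  implicit class OptionOps[A](fa: Option[A]) {
    // a >> b: run a, discard its result, then run b (b is passed by name).
    def >>[B](fb: => Option[B]): Option[B] = fa.flatMap(_ => fb)

    // a *> b: pair both results (the "product"), then keep the right one.
    def *>[B](fb: Option[B]): Option[B] =
      fa.flatMap(a => fb.map(b => (a, b))).map(_._2)
  }

  val bothPresent = Option(1) >> Option("x")         // Some("x")
  val leftMissing = Option.empty[Int] *> Option("x") // None
}
```

For Option both operators give the same answers; they differ only in how the right-hand side is passed (by name versus by value).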

UPDATE: Oleg Pyzhcov commented on Reddit:

You need to be careful to also capture the effects in laws, not just the result. A good litmus test is to see if the law specifies a possible refactoring that doesn’t break anything.
For example, I would not be able to blindly substitute by this law:
save(e) >> known(e) <-> pure(true)
because it completely removes the effect of saving stuff. The correct law would be
save(e) >> known(e) <-> save(e) >> pure(true)

I agree with his insight. For one thing, it makes reasoning about longer expressions correct. Thus save(e) *> save(e) <-> pure(Left(EmailAlreadyExists)) should similarly be rewritten as: save(e) *> save(e) <-> save(e) >> pure(Left(EmailAlreadyExists))
You should keep this advice in mind when working on your laws.
Thanks, Oleg!
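To see Oleg’s point in action, here is a small sketch that models the effect as an explicit state transition over an in-memory set of emails. The State type, the String-based emails and errors, and the sample address are all assumptions made for this illustration; none of them come from the actual repositories.

```scala
object EffectfulLawSketch {
  type Store = Set[String]

  // A tiny state "monad": a function from the current store to (new store, result).
  final case class State[A](run: Store => (Store, A)) {
    def flatMap[B](f: A => State[B]): State[B] = State[B] { s =>
      val (s1, a) = run(s)
      f(a).run(s1)
    }
    def >>[B](next: => State[B]): State[B] = flatMap(_ => next)
  }
  object State {
    def pure[A](a: A): State[A] = State[A](s => (s, a))
  }

  // Simplified algebra: emails and errors are plain strings here.
  def save(email: String): State[Either[String, String]] =
    State[Either[String, String]] { s =>
      if (s.contains(email)) (s, Left("EmailAlreadyExists"))
      else (s + email, Right(email))
    }
  def known(email: String): State[Boolean] =
    State[Boolean](s => (s, s.contains(email)))

  val email = "a@example.com" // hypothetical sample address

  // Each value is the pair (final store, result), starting from an empty store.
  val lhs       = (save(email) >> known(email)).run(Set.empty)
  val weakRhs   = State.pure(true).run(Set.empty)
  val strongRhs = (save(email) >> State.pure(true)).run(Set.empty)
}
```

Both right-hand sides produce true, but only strongRhs leaves the store in the same state as lhs, which is exactly the difference between the original law and the corrected one.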

Similarly, you can devise a set of laws for Users algebra:

For every created user u, identifyUser(primaryEmail(u)) returns u.
For every created user u, identifyUser(e) returns u IFF e has been attached to the user u.
For every user u with profile p, creating the user and then updating their profile with a function f is equivalent to creating the user with the profile already updated i.e., createUser(e, p) >>= (u => updateUserProfile(uid(u), f)) <-> createUser(e, f(p)).map(Some(_)).
Attaching n emails via n calls to attachEmail is equivalent to calling attachEmails once with collection of all n-emails.
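As a rough illustration of the profile-update law (create then update with f behaves like creating with f already applied), here is a dependency-free sketch against a hypothetical in-memory store. Profile, the Int-based uid, and InMemoryUsers are simplified assumptions, not the article’s real types.

```scala
object UserLawSketch {
  // Simplified, hypothetical stand-ins for UserProfile and PersistedUser.
  final case class Profile(name: Option[String])
  final case class PersistedUser(uid: Int, email: String, profile: Profile)

  // A fresh instance plays the role of a clean database.
  final class InMemoryUsers {
    private var users = Map.empty[Int, PersistedUser]

    def createUser(email: String, profile: Profile): PersistedUser = {
      val user = PersistedUser(users.size + 1, email, profile)
      users = users.updated(user.uid, user)
      user
    }

    def updateUserProfile(uid: Int, f: Profile => Profile): Option[PersistedUser] =
      users.get(uid).map { u =>
        val updated = u.copy(profile = f(u.profile))
        users = users.updated(uid, updated)
        updated
      }
  }

  // Left-hand side of the law: create, then update the profile with f.
  def createThenUpdate(email: String, p: Profile, f: Profile => Profile): Option[PersistedUser] = {
    val repo = new InMemoryUsers
    val created = repo.createUser(email, p)
    repo.updateUserProfile(created.uid, f)
  }

  // Right-hand side of the law: create with the profile already updated.
  def createUpdated(email: String, p: Profile, f: Profile => Profile): Option[PersistedUser] = {
    val repo = new InMemoryUsers
    Some(repo.createUser(email, f(p)))
  }
}
```

Note that the right-hand side has to be wrapped in Some, because updateUserProfile returns an optional user while createUser does not.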

To be complete, we should have written laws governing the behavior of the remaining methods – findUser, getEmails, etc… I took the liberty of skipping them to be concise.

Let’s see how we implement these laws in Discipline.

Implementing Laws

The implementation of law checking needs to be tailored to ScalaCheck to achieve automated testing. That is, a law must be convertible to a valid ScalaCheck property. We’ll be using the IsEq type provided by cats-kernel-laws for that. The purpose of this type is twofold. First, IsEq(lhs, rhs) states that the left-hand side of the IsEq expression is equivalent to its right-hand side. Second, it is convertible to a ScalaCheck Prop by Discipline. We form IsEq instances by using the handy <-> operator.

So, the implementation of the first law for the Emails algebra might look like this:

import cats.Monad
import cats.kernel.laws._

trait EmailAlgebraLaws[F[_]] {
  def algebra: Emails[F]
  implicit def M: Monad[F]

  import cats.syntax.apply._
  import cats.syntax.flatMap._
  import cats.syntax.functor._

  def saveFindComposition(email: Email) =
    algebra.save(email) >> algebra.findEmail(email) <-> M.pure(Some(email))
}

object EmailAlgebraLaws {

  def apply[F[_]](instance: Emails[F])(implicit ev: Monad[F]) =
    new EmailAlgebraLaws[F] {
      override val algebra = instance
      override implicit val M: Monad[F] = ev
    }

}

The way we read the method is:

For any email: Email, the expression algebra.save(email) >> algebra.findEmail(email) must be equivalent to M.pure(Some(email)). And that’s what we want every implementation of the Emails algebra to respect.
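If it helps to demystify the machinery, the sketch below imitates it without any libraries: IsEq merely records the two sides, and something else decides how to compare them. The IsEq and <-> definitions here are simplified stand-ins for the cats-kernel-laws ones, and InMemoryEmails, the holds helper, and the sample inputs are assumptions for illustration only.

```scala
object IsEqSketch {
  // Simplified stand-ins for cats.kernel.laws.IsEq and its `<->` syntax.
  final case class IsEq[A](lhs: A, rhs: A)
  implicit class IsEqArrow[A](lhs: A) {
    def <->(rhs: A): IsEq[A] = IsEq(lhs, rhs)
  }

  // Hypothetical in-memory implementation; a fresh instance acts as a clean database.
  final class InMemoryEmails {
    private var stored = Set.empty[String]
    def save(email: String): Either[String, String] =
      if (stored.contains(email)) Left("EmailAlreadyExists")
      else { stored = stored + email; Right(email) }
    def findEmail(email: String): Option[String] = stored.find(_ == email)
  }

  // The law only builds a description of what should be equal.
  def saveFindComposition(email: String): IsEq[Option[String]] = {
    val algebra = new InMemoryEmails
    algebra.save(email)
    algebra.findEmail(email) <-> Some(email)
  }

  // Deciding equality is a separate step (Discipline uses an Eq instance for this).
  def holds[A](law: IsEq[A]): Boolean = law.lhs == law.rhs

  // Hand-picked inputs standing in for what ScalaCheck would generate.
  val samples = List("a@example.com", "", "   ", "x" * 300)
  val allHold = samples.forall(e => holds(saveFindComposition(e)))
}
```

Keeping the statement of the law separate from the way equality is decided is what lets the same law be checked against different effect types.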

We’ll also prepare a generic test suite (called Laws in Discipline’s terminology):

import org.typelevel.discipline.Laws
import cats.kernel.laws.discipline._
import cats.{Eq, Monad}
import org.scalacheck.Arbitrary
import org.scalacheck.Prop._

trait EmailAlgebraTests[F[_]] extends Laws {
  def laws: EmailAlgebraLaws[F]

  def algebra(implicit arbEmail: Arbitrary[Email],
              eqFOptEmail: Eq[F[Option[Email]]]) =
    new SimpleRuleSet(
      name = "Emails",
      "find and save compose" -> forAll(laws.saveFindComposition _)
    )
}

object EmailAlgebraTests {

  def apply[F[_]: Monad](instance: Emails[F]) = new EmailAlgebraTests[F] {
    override val laws: EmailAlgebraLaws[F] = EmailAlgebraLaws(instance)
  }

}

Now, we have to tell ScalaCheck how to generate emails. I recommend reading a ScalaCheck tutorial first, but it’s very simple in essence. There has to be an Arbitrary[Email] instance in the implicit scope of tests:

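import org.scalacheck.{Arbitrary, Gen}
import org.scalacheck.Gen.{alphaLowerChar, alphaNumChar, nonEmptyListOf}

trait ArbitraryInstances {
  // Builds addresses of the form <alphanumeric mailbox>@<lowercase host>.com,
  // discarding anything longer than 254 characters.
  final val MailGen: Gen[Email] = (for {
    mailbox  <- nonEmptyListOf(alphaNumChar).map(_.mkString)
    hostname <- nonEmptyListOf(alphaLowerChar).map(_.mkString)
  } yield s"$mailbox@$hostname.com") suchThat (_.length <= 254) map (Email(_))

  implicit final val ArbitraryEmail: Arbitrary[Email] = Arbitrary(MailGen)
}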

Finally, we’ll have to specify how to check the equivalence for a given monad M. For the DBIO context, we’ll have to run both sides of the action that we check against the test db in a rolled-back transaction (to give each test a clean db state) and then compare the outputs. (This is an adaptation of a snippet found in slick-cats.)

trait DBIOTestInstances extends ScalaFutures {
  import cats.instances.either._

  protected implicit def executionContext: ExecutionContext
  protected def db: Database

  private case class IntentionalRollbackException[R](result: R)
      extends Exception("Rolling back transaction after test")

  private def withRollback[A](testCode: => DBIO[A]): Future[A] = {
    val testWithRollback = testCode flatMap (a => DBIO.failed(
      IntentionalRollbackException(a)))

    val testResult = db.run(testWithRollback.transactionally)

    testResult.recover {
      case IntentionalRollbackException(success) => success.asInstanceOf[A]
    }
  }

  private def futureEither[A](fut: Future[A]): Future[Either[Throwable, A]] =
    fut.map(Right(_)).recover { case t => Left(t) }

  implicit final val throwableEq: Eq[Throwable] = Eq.fromUniversalEquals
  implicit def eqDBIO[A: Eq]: Eq[DBIO[A]] =
    (fx: DBIO[A], fy: DBIO[A]) => {
      val fz = futureEither(withRollback(fx)) zip futureEither(withRollback(fy))
      fz.map { case (tx, ty) => 
        implicitly[Eq[Either[Throwable, A]]].eqv(tx, ty) }.futureValue
    }
}

Given all these pieces, we can finally test our implementation:

class EmailRepositorySpecs extends ArbitraryInstances 
                                   with DBIOTestInstances 
                                   with Discipline {
  checkAll("EmailRepository", EmailAlgebraTests(new EmailRepository).algebra)
}

We now have a test suite that checks whether a bunch of random emails can be saved to the db and subsequently looked up. Not only is the logic of these operations tested, but you can also catch errors stemming from an incorrect mapping of the db schema. Let’s see how the whole suite looks:

trait EmailAlgebraLaws[F[_]] {
  def algebra: Emails[F]
  implicit def M: Monad[F]

  import cats.syntax.apply._
  import cats.syntax.flatMap._
  import cats.syntax.functor._

  def saveFindComposition(email: Email) =
    algebra.save(email) >> algebra.findEmail(email) <-> M.pure(Some(email))

  def saveKnownComposition(email: Email) =
    algebra.save(email) >> algebra.known(email) <-> M.pure(true)

  def alreadyExistsCondition(email: Email) =
    algebra.save(email) *> algebra.save(email) <-> M.pure(
       Left(EmailAlreadyExists))

  def findKnownConsistency(email: Email, f: Email => Email) = {
    (algebra.save(email) >> algebra
      .findEmail(f(email))
      .map(_.isDefined)) <-> (algebra
      .save(email) >> algebra.known(f(email)))
  }
}

trait EmailAlgebraTests[F[_]] extends Laws {
  def laws: EmailAlgebraLaws[F]

  def algebra(implicit arbEmail: Arbitrary[Email],
              arbEmailF: Arbitrary[Email => Email],
              eqFBool: Eq[F[Boolean]],
              eqFOptId: Eq[F[Option[Email]]],
              eqFEither: Eq[F[Either[EmailAlreadyExists.type, Email]]]) =
    new SimpleRuleSet(
      name = "Emails",
      "find consistent with known"  -> forAll(laws.findKnownConsistency _),
      "find and save compose"       -> forAll(laws.saveFindComposition _),
      "known and save compose"      -> forAll(laws.saveKnownComposition _),
      "ensure AlreadyExists"        -> forAll(laws.alreadyExistsCondition _)
    )
}

You can see that you usually need to write additional generators as you add tests. But since they are composable, writing them is quite easy and bounded by the size of your domain. (You only have to write new generators for domain-specific things that ScalaCheck does not know how to generate.) Another idea that may be worth exploring is the automatic derivation of Arbitrary instances for regular product/sum types with Magnolia.

You might be wondering why we are interested in generating a function. Sometimes you can form compact laws by stating that the law holds under any transformation f, just as we did in the findKnownConsistency test: def findKnownConsistency(email: Email, f: Email => Email).

It can be read as: given any saved email e and an arbitrary transformation f, the result of findEmail is defined for f(e) if and only if known(f(e)) is true. This is a stronger statement than saying that the law holds for any saved email e, letting us skip separately testing the case when findEmail returns None.

To see why, let’s consider that f is a function that prepends xyz to the mailbox part of the email address. The equivalence should hold for this choice of f, and, indeed, save(email) >> findEmail(Email(s"xyz$email")).map(_.isDefined) yields false (because findEmail returns None), and save(email) >> known(Email(s"xyz$email")) yields false as well. Alternatively, when f is the identity function, we expect findEmail to return Some and known to return true. Thus, we have combined two cases into one law.
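The two cases can be played through with a small in-memory sketch. As before, InMemoryEmails, the String-based emails, and the sample address are assumptions for illustration and not the real repository.

```scala
object FindKnownSketch {
  // Hypothetical in-memory implementation; a fresh instance acts as a clean database.
  final class InMemoryEmails {
    private var stored = Set.empty[String]
    def save(email: String): Unit = stored = stored + email
    def findEmail(email: String): Option[String] = stored.find(_ == email)
    def known(email: String): Boolean = stored.contains(email)
  }

  // Evaluates both sides of the findKnownConsistency law for a given transformation f.
  def bothSides(email: String, f: String => String): (Boolean, Boolean) = {
    val left = new InMemoryEmails
    left.save(email)
    val right = new InMemoryEmails
    right.save(email)
    (left.findEmail(f(email)).isDefined, right.known(f(email)))
  }

  val sample = "a@example.com" // hypothetical sample address

  val viaIdentity = bothSides(sample, e => e)        // the saved email is looked up
  val viaPrefix   = bothSides(sample, e => s"xyz$e") // an unsaved email is looked up
}
```

In both cases the two sides agree: they are both true for the identity function and both false for the prefixing function, so a single law covers the hit and the miss.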

Conclusion

I hope that my article has convinced you that testing in tagless should be based on abstract laws rather than ad-hoc test cases. Can you think of any cases where this approach would be inferior to writing tests manually? Certainly, writing laws is harder than coming up with a bunch of test cases, and it requires some getting used to.

Conceptually, all type classes come with laws. These laws constrain implementations for a given type and can be exploited and used to reason about generic code. typelevel.org

Comments

Thank you for reading. I hope you enjoyed the article. Let’s start the discussion!

1. Do you think it’s a good idea to use these tests?

2. What kind of drawbacks might there be?

3. What kind of laws would be difficult to formulate?

4. What’s your approach to integration testing in the presence of tagless?

We’d love to hear from you!

Thanks, great article!

Great insights and examples, thank you.

While studying this, I am wondering whether there are limitations in terms of the kinds of effects and error conditions under test, and combinations of those.

See “Error handling in Http4s with classy optics – Part 2” https://typelevel.org/blog/2018/11/28/http4s-error-handling-mtl-2.html for ways to combine a service/repo/etc algebra with its error algebra and enforcing consistency of error algebra at web endpoint also.

Would it make sense to have a second iteration of this testing approach to show how it scales for Gabriel’s examples?

Hi, sorry for keeping your comment in limbo for such a long time. It got buried in spam :-/ I saw Gabriel’s talk on this, it’s certainly a nice approach. I am thinking that introducing Coproduct-based handlers does not change that much – the set of laws for respective algebras remain the same, while `CoXErrorHandler` is tested in isolation for its own laws eg. that all error types that are part of the coproduct are handled. Does that seem viable?
