Tagless Final Encoding in Haskell
Table of Contents
- Source Code
- Introduction
- Why Tagless Final?
- What is Tagless Final Encoding?
- Tagless Final Encoding in practice
- Provide and Test with different implementations using Type Application
- Extensibility
- Conclusion
Source Code
You can find source code of the example described in this post here
Introduction
In this post i am going to explore a simple technique for organizing our programs which is called Tagless Final Encoding to write testable programs in Haskell. I am also use TypeApplication LANGUAGE directive to write more readable and flexible test.
Why Tagless Final?
Nowadays in Haskell Community there is an open discussion about using Free Monads, mtl or Tagless Final Encoding to write internal DSL (Domain Specific Language) for representing our programs in a descriptive and Functional way.
In my personal opinion i think all of these tools, theories and techniques are suitable to do it but depends on the context of the person, team or solution you are writing to decide which is more useful.
For example:
Free Monad: I think it is great to have tools that are based on Category Theory concepts such as Free Applicative, Free Monad and so on. There is a great paper about this Notions of Computation as Monoid. In that sense Free Monad not only help us to describe our programs but also to have certain Math Properties in our toolbox to manipulate them. Although i have never benchmarked any of the Free Monad implementations out there, i know there are complains about their performance in the community. Beyond this i think for beginners it is a little difficult to implement.
Monad Transformers (mtl): Also great tool, based on simple Monad concept which is easier to understand for beginners and without performance penalties if you are using carefully. It is also the most used tool for dealing with different Monads in a single program from the beginning of Haskell. The only disadvantage i could pointed out is that is less readable and understandable in the code when you are stacking more than 3 Monads. Also a drawback for beginners.
Tagless Final: You only need to define and implement Typeclasses. In the original post i have wrongly mentioned "it is a technique not based on any paper or Math Theory", but thank you to p-alik who pointed me out in reddit channel that there is paper for this here Typed final (tagless-final) style. The main advantage for me is it is beginner friendly, readable, easy to understand, test and extend.
Having said that, I would like to talk about Tagless Final as an approach for Haskell beginners in order to help them to organize and describe programs; make them extensible and testable.
What is Tagless Final Encoding?
Tagless Final Encoding is a technique for embedding a DSL (Domain Specific Language) in a Functional Programming Language. We need to define a Language for using it and an Interpreter to indicate how it should behave on each defined term. For this purpose we are going to use Typeclasses.
To sum up in Tagless Final Encoding style there are:
Typeclasses: Set of operations over a Type.
Interpreter: Instances of those Typeclasses for each specific Type
Tagless Final Encoding in practice
We are going to build a basic program which request some user data. The program is going to do the following:
- First try to recover the data from a cache
- If data is found it is returned
- If there is no data in the cache, search user data in source repository and update cache
type UserName = String
data DataResult = DataResult String
requestData userName = do
cache <- getFromCache userName
result <- case cache of
Just dataResult -> return dataResult
Nothing -> do
dataResult <- getFromSource userName
storeCache dataResult
return dataResult
storeCache result
return result
Here it is our basic program which implements what we described above, but obviously this code doesn't work because we need to define functions such as getFromCache
, getFromSource
and storeCache
.
For defining that we are going to use Typeclasses as we mentioned, in order to represent our program capabilities.
Why are we defining Cache
and DataSource
Typeclasses as Monad
also? Basically because we want to combine and chain our DSL terms in a single program.
But we still need to change our program definition since we are constraining only on Monad
and we want to use Cache
and DataSource
terms from the implicit context.
= do
cache <- getFromCache userName
result <- case cache of
Just dataResult -> return dataResult
Nothing -> getFromSource userName
storeCache result
return result
requestData userName
Notice that we don't need anymore Monad
Constraint in our signature because both Cache
and DataSource
are Monad
s also.
The only thing left is to write our Instances to provide some implementation. We are going to provide a fake implementation for IO
.
= return Nothing
storeCache _ = return ()
getFromSource user = return $ [DataResult $ "source: " <> user]
getFromCache _
If we run our program from ghci we are going to see it is working:
λx.x> import Data
λx.x> requestData "john"
[DataResult "source: john"]
Provide and Test with different implementations using Type Application
One of the things I have announced on the beginning of this post is i am going to show how easy it is to test our programs using this technique combined with TypeApplication LANGUAGE extension. This combination enable us not only to test, but also to provide and interchange different instances of our Typeclasses in a straightforward way.
Instances
In order to provide different instances of Cache
and DataSource
, and play around with different cases, for example when data is in cache or not, i am going to wrappe IO
type in different newtype
representations.
newtype NotInCache a = NotInCache { unNoCache :: IO a }
getFromCache _ = NotInCache $ return Nothing
storeCache _ = NotInCache $ return ()
getFromSource user = return $ [DataResult $ "source: " <> user]
For the first instance we need to do enable GeneralisedNewtypeDeriving extension to allow us deriving Functor
, Monad
and Applicative
because our Typeclasses Cache
and DataSource
are also Monad
and we need to provide implementations of those Typeclasses for our custom type NotInCache
Now if we are trying to run this in ghci we are getting the following error:
λx.x> requestData "john"
<interactive>:5:1: error:
• Ambiguous type variable ‘m0’ arising from a use of ‘print’
prevents the constraint ‘(Show
(m0 [DataResult]))’ from being solved.
Probable fix: use a type annotation to specify what ‘m0’ should be.
These potential instances exist:
instance (Show a, Show b) => Show (Either a b)
-- Defined in ‘Data.Either’
instance Show a => Show (Maybe a) -- Defined in ‘GHC.Show’
instance (Show a, Show b) => Show (a, b) -- Defined in ‘GHC.Show’
...plus 14 others
...plus 89 instances involving out-of-scope types
(use -fprint-potential-instances to see them all)
• In a stmt of an interactive GHCi command: print it
Basically the compiler is saying us that it cannot find an unambiguous instance to use for our program. But also as the compiler is pointed out we can use TypeApplication extension to tell the compiler what instance should use and provide an explicit evidence of that.
λx.x> :set -XTypeApplications
λx.x> :t requestData "john"
requestData "john"
:: (Data.Cache m, DataSource m, Data.Logging m) => m [DataResult]
λx.x> :t requestData @NotInCache "john"
requestData @NotInCache "john" :: NotInCache [DataResult]
Here we've enabled extension and after that we are running our program with NotInCache
type. Notice that now we need to call unNoCache
to unwrap our underlying IO
and effectively running in our ghci IO
loop.
x.x> unNoCache $ requestData "john"
[DataResult "source: john"]
We can also do it from our .hs file.
= (unNoCache $ requestData "john") >>= (putStrLn . show)
main
Now we are ready for different instances!!!
newtype InCache a = InCache { unInCache :: IO a }
getFromCache user = InCache $ return $ Just [DataResult $ "cache: " <> user]
storeCache _ = InCache $ return ()
getFromSource _ = undefined
main = do
(unNoCache $ requestData "john") >>= (putStrLn . show)
(unInCache $ requestData "john") >>= (putStrLn . show)
The outputs now look like this:
λx.x> Data.main
[DataResult "source: john"]
[DataResult "cache: john"]
Extensibility
One of the important aspects of Tagless Final Encoding is its extensibility property. It is extensible in 2 dimensions:
Vertical Extensibility: It is what we have just done adding different implementations for the same Typeclasses without altering our program.
Horizontal Extensibility: It is adding new capabilities to the program in order to extend some functionality inside it.
Horizontal Extensibility
Our program capabilities beyond Monad
, Functor
and Applicative
Typeclasses are Cache
and DataSource
. If we are saying that it is Horizontal Extensible we can add more capabilities apart from those mentioned. For example what about Logging
?.
Let do it with our example:
= do
cache <- getFromCache userName
result <- case cache of
Just dataResult -> return dataResult
Nothing -> getFromSource userName
storeCache result
logMsg $ "Result data for user: " <> userName <> " - data: " <> show result
return result
requestData userName
And now providing instances for Logging
= NotInCache . putStrLn
logMsg = InCache . putStrLn
logMsg
If we run the program we obtain the following:
λx.x> Data.main
Result data for user: john - data: [DataResult "source: john"]
[DataResult "source: john"]
Result data for user: john - data: [DataResult "cache: john"]
[DataResult "cache: john"]
Conclusion
As we can see, Tagless Final Encoding is a pretty good technique to build testable and extensible programs.
We have also demonstrated how easy is to interchange and provide different instances using TypeApplication extension.