Why type systems matter

A fistful of formal methods

We want our programming language environments at large to be able to tell well-behaved programs from those that behave poorly. It doesn't matter how it's achieved.

Runtime monitoring which includes things like operations on incompatible objects (a la Python and JavaScript) and underappreciated contract programming based on preconditions and postconditions, as well as invariant checking (a la DLang) [4].
Some will remember model-driven engineering with UML modelling (my Vim highlighted "UML" as a non-existent word! It brings me joy). Automatically deriving constraints from such models and rejecting models that are self-contradicting or breaking some constraints. (a la EMFtoCSP [3]).
Both digital and analog circuits can be accepted or rejected based on automatically derived finite state machine models and checking for desired properties [1][2].
Type systems for rejecting classes of poorly-behaved programs statically, during compilation (a la Java).

Rather interesting observation is that these discriminators should be reproducible, which calls for underlying formalisms. Furthermore, it's preferred that domain experts (JavaScript programmers, UML architects, embedded systems engineers) can reap benefits from those. That property is called "lightweight" in culture. When we put these considerations together, we see that all of these things, including JavaScript's runtime monitoring, which many people may deem as basic, are lightweight formal methods! Not scary at all.

Not all formal methods, however, are made for the same reason and not everything achievable with one can be achieved with another. To illustrate that, consider the following use-case: we build up an array of validation functions and then, at the validation site we call them one by one.

01 |	let module1 = {
02 |	  defaultValidators: [
03 |	    (x) => 2 == x.split(' ').length,
04 |	  ],
05 |	  validate: (input) => (fs) =>
06 |	    fs.reduce((acc, f) => acc && f(input), true),
07 |	};
08 |
09 |	let module2 = {
10 |	  alsoCapitalised: [
11 |	    (x) =>
12 |	       x.split(' ').reduce(
13 |	         (acc, x) => acc && (/[A-Z]/.test(x[0])), true
14 |	       )
15 |	  ] + module1.defaultValidators,
16 |	}
17 |
18 |	let main = {
19 |	  main: (input) => {
20 |	    let validators = module2.alsoCapitalised;
21 |	    if (module1.validate(input)(validators)) {
22 |	      console.log("It's time to open the door");
23 |	    }
24 |	  }
25 |	}
26 |
27 |	main.main("Viktor Tsoi");

When we run this code, the following error will be reported:

Uncaught TypeError: fs.reduce is not a function
    validate debugger eval code:6
    main debugger eval code:21
     debugger eval code:27

The true place where the error happens is line 15. Getting there from lines 6 and 21 would probably require a little bit of debugging, especially in a real project. Indeed, the error happens due to nonsense operation + over two arrays. Let's fix it:

15 |	  ].concat(module1.defaultValidators),

When we run the code again, we get the expected message in the log:

It's time to open the door

Let's compare runtime monitoring with a type system.

Here's code with the same error in Haskell.

01 |	{-# LANGUAGE OverloadedStrings #-}
02 |	import qualified Data.Text as T
03 |	import Data.Text( Text )
04 |	import Data.Char( isUpper )
05 |
06 |	defaultValidators :: [Text -> Bool]
07 |	defaultValidators = [\x -> 2 == (length $ T.splitOn " " x)]
08 |
09 |	validate :: Text -> [Text -> Bool] -> Bool
10 |	validate input fs = foldl (\acc f -> acc && f input) True fs
11 |
12 |	alsoCapitalised :: [Text -> Bool]
13 |	alsoCapitalised = [\x -> foldl (\acc w -> acc && (isUpper $ T.head w))
14 |	                               True
15 |	                               (T.splitOn " " x)] + defaultValidators
16 |
17 |	main :: IO ()
18 |	main =
19 |	  case validate input defaultValidators of
20 |	    True -> putStrLn "It's time to open the door"
21 |	    _    -> putStrLn "Close the door behind me"
22 |	  where
23 |	    input = "Viktor Tsoi"

When we try to compile it, we're going to get an error that says exactly what is wrong. To be able to apply +, operands had to be classified as numbers via typeclass Num. This typeclass doesn't include lists of validator functions. Note that if we would really want to define addition on such values, we would be able to, by providing an appropriate instance of Num. But it's a rather horrible idea, so we'll fix the bug instead.

/tmp/hi.hs:13:19: error:
    No instance for (Num [Text -> Bool]) arising from a use of '+'

With this error, we quickly can replace line 15 with one using ++, the list concatenation operator:

15 |	                               (T.splitOn " " x)] ++ defaultValidators

When we run this one, we get the correct result!

Static checking allows us to catch many classes of errors like this one early, which saves a lot of money, since--as systems engineering teaches us--the cost of fixing a fault in a system grows exponentially as a function of how far it is from the requirement gathering stage in the lifecycle of a system.

Benefits of using type systems

Type systems come in different shapes and sizes: different type systems may be geared to eliminate different classes of incorrect programs. However, most of type systems--collaterally--eliminate programmers' errors such as incomplete case analyses, mismatched units, et cetera. For example, if we get rid of line 21 in the Haskell listing entirely, in a well-configured GHC (which treats warnings as errors and warns about everything "suspicious), we'll get the following error, indicating incomplete case analysis:

/tmp/hi.hs:19:3: error: [-Wincomplete-patterns, -Werror=incomplete-patterns]
    Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: False
   |
19 |   case validate input defaultValidators of
   |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...

Of course, it's impossible to reap this benefit without a certain wit and discipline. I like to call that discipline "tight typing", many Haskellers call it "having as concrete data structures as possible" (it's an elaboration of mantra "abstract functions, concrete data". For instance, the DummyTag type you have seen earlier is loose, because it is used in the places of the model, because it is used to type incompatible terms. Quoting my colleague:

Nothing prevents you to subtract height from pressure if both of those are encoded as Float.

Type checkers are powerful refactoring tools. Anecdotally, at work, once we had to restructure approximately fifty modules, while separating part of those into a library and fusing arguments into structures in another part of those. The whole refactoring was done and released by one person in one working day. It would have been impossible without a type checker to ensure completeness of said refactoring. Another anecdote comes from my colleague:

When I worked for AlphaSheets.com (a startup "acquihired" by Google), I refactored the whole codebase, threading through an App monad instead of IO. I did this for a week in part due to painful and regular rebasing onto the main branch. But when the thing compiled, the only tests that failed were the ones covering a place that was stubbed with undefined, due to its implementation being open for discussion.

Notorious for a steep learning curve and perceived feature development slow-down, languages with type systems--granted a certain engineering savviness--can serve as amazing facilitators of rapid prototyping. The goal of prototyping is often figure out the best specification for the requirements at hand. Expressive type systems often allow to encode relationships between domain entities without writing complex business logic. If one views types as propositions and values of said types as proofs of said propositions, this approach makes a lot of sense. Of course, mileage can vary and it works better the less side-effects there are in a system, but remember that you can simulate side effects via type encodings too! For instance, instead of figuring out a proper way to do cryptography, we can encode an interface of a crypto-system and populate it with dummy functions, complying with it:

data PKC pass sk pk slip sig sigmsg enc plain cipher = PKC
  { -- | Initial key derivation function
    kdf :: pass -> (sk, pk, slip),
    -- | Rederive with kdf
    rekdf :: slip -> pass -> (sk, pk),
    -- | Sign data with key @sk@ and produce a detached @signature@
    -- possibly containing @pk@ for verification.
    sign :: sigmsg -> (sk, pk) -> signature pk,
    -- | Verifies @signature@ container's validity.
    verify :: sigmsg -> signature pk -> Bool,
    -- | Encrypt data of type @plain@ to the key @pk@ and produce @encrypted@
    -- containing @cipher@.
    encrypt :: plain -> pk -> encrypted pk cipher,
    -- | Decrypts @cipher@, contained in an @encrypted@ container into @plain@.
    decrypt :: encrypted pk cipher -> sk -> Maybe plain
  }

This is a model of public key cryptography. Populating this model with functions would (together with tests that verify correct behaviours and error handling of a cryptosystem) serve as a proof of possibility of a correct implementation of public key cryptography in Haskell and provide an interactive specification for doing so, perhaps even in another language! Let's provide a quick example of some functions slotting into that model.

data DummyTag = DummySK | DummyPK | DummySlip
  deriving (Show, Eq)

-- | Note that we abuse the fact that in dummy implementation
-- secret key and public key are both represented with a
-- DummyTagged type alias to embed secret key together with
-- the message.
newtype DummySigned msg key = DummySigned {sig :: (key, (key, msg))}
  deriving (Show, Eq)

-- ...

type DummyTagged = (DummyTag, ByteString)

-- ...

dummyKdf :: ByteString -> IO (Maybe (DummyTagged, DummyTagged, DummyTagged))
dummyKdf pass = pure $ Just ((DummyPK, pass), (DummySK, pass), (DummySlip, pass))

dummyRekdf :: DummyTagged -> ByteString -> Maybe (DummyTagged, DummyTagged)
dummyRekdf (DummySlip, x) pass =
  go (x == pass)
  where
    go True = Just ((DummyPK, pass), (DummySK, pass))
    go False = Nothing
dummyRekdf _ _ = error "DummySlip expected in the 1st argument"

-- ...

dummySign :: (DummyTagged, DummyTagged) -> msg -> DummySigned msg DummyTagged
dummySign (verificationKey@(DummyPK, _), signingKey@(DummySK, _)) blob =
  DummySigned (verificationKey, (signingKey, blob))
dummySign _ _ = error "The first argument has to be a tuple of DummySK and DummyPK"

-- | Note that we're not comparing public key with secret key
-- but rather compare embedded bytestrings which match if the keys
-- were derived from the same password by kdf
dummyVerify :: Eq a => DummySigned a DummyTagged -> a -> Bool
dummyVerify (DummySigned ((DummyPK, signedAs), ((DummySK, signedWith), signedWhat))) candidate =
  (signedAs == signedWith) && (candidate == signedWhat)

Now to keep prototyping, we just need to instantiate PKC data type with a collection of suitable functions. Later on, when we switch to a real cryptographic system, a value of type PKC we'll build with it shall serve as a proof to the claim that "public key cryptographic systems exist as modeled". We can and should verify that early, but not too early to slow the prototyping down. Furthermore, in a similar fashion a HTTP client-server interactions can be modeled and after the prototype is completed, one will end up with a runnable, compilable specification for the software they're about to write. What's fairly amazing is that if the company doesn't want to switch from Haskell to another language for the actual product, they can repurpose this prototype for incremental rewrite into an MVP!

These days, usage of higher-order and anonymous functions is prominent even in more conservative ecosystems like Java's. Type systems greatly assist in reasoning about the code's overall behaviour. In general, type systems are one of many tools for writing self-documenting code. An illustration for those, who (like me) keep forgetting the order of accumulator and an iterated value in reducers:

Prelude> :t foldl
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b

Yeah, if the language's ecosystem is tightly integrated with the underlying type system, amazing things are possible, from simple type signature lookups at your fingertips to full blown type signature search engines like Hoogle [8] and Serokell's hackage-search (which, somewhat ironically, is written in Rust) [9]. Conversely, if it evolved independently from type system, such as TypeScript extending JavaScript with success typings or Dialyzer extending Erlang and Elixir with success typings, likely, self-documenting benefits and improved discoverability will be way less pronounced, if at all.

Type systems also enable and encourage writing composable code, meaning that the programmers are nudged towards writing well-structured and modular code bases, no matter what is the unit of modularity: a Java class or a Haskell module. Of course, there are ways to write spaghetti code in any language, but it's harder when the whole ecosystem imposes structure.

Drawbacks of type systems

Type systems aren't entirely free. Both the users (programmers) and the computer itself have to do extra work to write a program that would be accepted by a language with a type system. Sometimes, that work takes non-trivial amounts of computational time, but in practice, it's seldom more computationally intense than, say, code generation via templates.

Also, not quite a drawback, but rather something many people don't understand about type systems. There's an "if it compiles, it works" meme, but it's extremely misleading. Type systems, by their static nature, can't prove presence of features of a program, only absence. But this yields a couple of actual drawbacks:

With type systems, you almost always pick your fights. Languages geared towards eliminating one kind of bad program behaviours won't eliminate another, perhaps, side-stepping it altogether by deferring it to the language runtime.

Scrutiny, thus power, of type systems is always in tension with expressiveness, which is understood as the measure of the programs that are well-behaved at runtime, which are rejected by the type checker.

Sometimes complex type checking algorithms rely on heuristics, which limit expressiveness in ways, unexpected by the user [7].

Bottom line

We'll end this section with a nice heuristic for architects to think about using type systems is to perform the following thought experiment:

Approximate how much would it cost to write, deploy and run a 100%-coverage random testing system, searching for runtime errors (in our JS example, it would be a test making sure that there is no validator crashed by a string and there is no call in the program that crashes is).

Approximate how much would it cost to train developers to use a "tight" type system, which would give the same sort of guarantees "for free" and in static fashion (many different kinds of tests are still needed in this approach, but the tests we've described in p.1 are given to us for free).

If cost 2 is even comparable to, let alone lower than, cost 1, going for using type systems to refuse incorrect programs is warranted. Quoting "The Toyota Way":

Focusing on quality actually reduced cost more than focusing only on cost.

Links

Formal methods

=> [1]: Formal Verification (in Embedded Systems Design), Prof. Anupam Basu, NPTEL | [2]: Formal Verification Methodologies for Nonlinear Analog Circuits, Dr. S. Steinhorst, Ph.D. thesis | [3]: EMFtoCSP: A tool for the lightweight verification of EMF models, C. A. Gonzalez et al. | [4]: Contract Programming, DLang website | [7]: Dependency Analysis of Haskell Declarations, Artem Kuznetsov

Proxy Information

Original URL: gemini://soviet.circumlunar.space/sweater/hindsight/2021-08-22-type-systems-pt-1.gmi
Status Code: Success (20)
Meta: text/gemini; lang=en
Capsule Response Time: 82.160736 milliseconds
Gemini-to-HTML Time: 2.791224 milliseconds

This content has been proxied by September (3851b).