=> Next: Innovative type systems: case studies of mainstream adoption | Previous: The Problem with Sid Meyer's Civilization
22 August 2021
We want our programming language environments at large to be able to tell well-behaved programs from those that behave poorly. It doesn't matter how it's achieved.
Rather interesting observation is that these discriminators should be reproducible, which calls for underlying formalisms. Furthermore, it's preferred that domain experts (JavaScript programmers, UML architects, embedded systems engineers) can reap benefits from those. That property is called "lightweight" in culture. When we put these considerations together, we see that all of these things, including JavaScript's runtime monitoring, which many people may deem as basic, are lightweight formal methods! Not scary at all.
Not all formal methods, however, are made for the same reason and not everything achievable with one can be achieved with another. To illustrate that, consider the following use-case: we build up an array of validation functions and then, at the validation site we call them one by one.
01 | let module1 = { 02 | defaultValidators: [ 03 | (x) => 2 == x.split(' ').length, 04 | ], 05 | validate: (input) => (fs) => 06 | fs.reduce((acc, f) => acc && f(input), true), 07 | }; 08 | 09 | let module2 = { 10 | alsoCapitalised: [ 11 | (x) => 12 | x.split(' ').reduce( 13 | (acc, x) => acc && (/[A-Z]/.test(x[0])), true 14 | ) 15 | ] + module1.defaultValidators, 16 | } 17 | 18 | let main = { 19 | main: (input) => { 20 | let validators = module2.alsoCapitalised; 21 | if (module1.validate(input)(validators)) { 22 | console.log("It's time to open the door"); 23 | } 24 | } 25 | } 26 | 27 | main.main("Viktor Tsoi");
When we run this code, the following error will be reported:
Uncaught TypeError: fs.reduce is not a function validate debugger eval code:6 main debugger eval code:21debugger eval code:27
The true place where the error happens is line 15. Getting there from lines 6 and 21 would probably require a little bit of debugging, especially in a real project. Indeed, the error happens due to nonsense operation +
over two arrays. Let's fix it:
15 | ].concat(module1.defaultValidators),
When we run the code again, we get the expected message in the log:
It's time to open the door
Let's compare runtime monitoring with a type system.
Here's code with the same error in Haskell.
01 | {-# LANGUAGE OverloadedStrings #-} 02 | import qualified Data.Text as T 03 | import Data.Text( Text ) 04 | import Data.Char( isUpper ) 05 | 06 | defaultValidators :: [Text -> Bool] 07 | defaultValidators = [\x -> 2 == (length $ T.splitOn " " x)] 08 | 09 | validate :: Text -> [Text -> Bool] -> Bool 10 | validate input fs = foldl (\acc f -> acc && f input) True fs 11 | 12 | alsoCapitalised :: [Text -> Bool] 13 | alsoCapitalised = [\x -> foldl (\acc w -> acc && (isUpper $ T.head w)) 14 | True 15 | (T.splitOn " " x)] + defaultValidators 16 | 17 | main :: IO () 18 | main = 19 | case validate input defaultValidators of 20 | True -> putStrLn "It's time to open the door" 21 | _ -> putStrLn "Close the door behind me" 22 | where 23 | input = "Viktor Tsoi"
When we try to compile it, we're going to get an error that says exactly what is wrong. To be able to apply +
, operands had to be classified as numbers via typeclass Num
. This typeclass doesn't include lists of validator functions. Note that if we would really want to define addition on such values, we would be able to, by providing an appropriate instance of Num
. But it's a rather horrible idea, so we'll fix the bug instead.
/tmp/hi.hs:13:19: error: No instance for (Num [Text -> Bool]) arising from a use of '+'
With this error, we quickly can replace line 15 with one using ++
, the list concatenation operator:
15 | (T.splitOn " " x)] ++ defaultValidators
When we run this one, we get the correct result!
Static checking allows us to catch many classes of errors like this one early, which saves a lot of money, since--as systems engineering teaches us--the cost of fixing a fault in a system grows exponentially as a function of how far it is from the requirement gathering stage in the lifecycle of a system.
Type systems come in different shapes and sizes: different type systems may be geared to eliminate different classes of incorrect programs. However, most of type systems--collaterally--eliminate programmers' errors such as incomplete case analyses, mismatched units, et cetera. For example, if we get rid of line 21 in the Haskell listing entirely, in a well-configured GHC (which treats warnings as errors and warns about everything "suspicious), we'll get the following error, indicating incomplete case analysis:
/tmp/hi.hs:19:3: error: [-Wincomplete-patterns, -Werror=incomplete-patterns] Pattern match(es) are non-exhaustive In a case alternative: Patterns not matched: False | 19 | case validate input defaultValidators of | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
Of course, it's impossible to reap this benefit without a certain wit and discipline. I like to call that discipline "tight typing", many Haskellers call it "having as concrete data structures as possible" (it's an elaboration of mantra "abstract functions, concrete data". For instance, the DummyTag
type you have seen earlier is loose, because it is used in the places of the model, because it is used to type incompatible terms. Quoting my colleague:
Nothing prevents you to subtract height from pressure if both of those are encoded as Float.
Type checkers are powerful refactoring tools. Anecdotally, at work, once we had to restructure approximately fifty modules, while separating part of those into a library and fusing arguments into structures in another part of those. The whole refactoring was done and released by one person in one working day. It would have been impossible without a type checker to ensure completeness of said refactoring. Another anecdote comes from my colleague:
When I worked for AlphaSheets.com (a startup "acquihired" by Google), I refactored the whole codebase, threading through an App monad instead of IO. I did this for a week in part due to painful and regular rebasing onto the main branch. But when the thing compiled, the only tests that failed were the ones covering a place that was stubbed with undefined, due to its implementation being open for discussion.
Notorious for a steep learning curve and perceived feature development slow-down, languages with type systems--granted a certain engineering savviness--can serve as amazing facilitators of rapid prototyping. The goal of prototyping is often figure out the best specification for the requirements at hand. Expressive type systems often allow to encode relationships between domain entities without writing complex business logic. If one views types as propositions and values of said types as proofs of said propositions, this approach makes a lot of sense. Of course, mileage can vary and it works better the less side-effects there are in a system, but remember that you can simulate side effects via type encodings too! For instance, instead of figuring out a proper way to do cryptography, we can encode an interface of a crypto-system and populate it with dummy functions, complying with it:
data PKC pass sk pk slip sig sigmsg enc plain cipher = PKC { -- | Initial key derivation function kdf :: pass -> (sk, pk, slip), -- | Rederive with kdf rekdf :: slip -> pass -> (sk, pk), -- | Sign data with key @sk@ and produce a detached @signature@ -- possibly containing @pk@ for verification. sign :: sigmsg -> (sk, pk) -> signature pk, -- | Verifies @signature@ container's validity. verify :: sigmsg -> signature pk -> Bool, -- | Encrypt data of type @plain@ to the key @pk@ and produce @encrypted@ -- containing @cipher@. encrypt :: plain -> pk -> encrypted pk cipher, -- | Decrypts @cipher@, contained in an @encrypted@ container into @plain@. decrypt :: encrypted pk cipher -> sk -> Maybe plain }
This is a model of public key cryptography. Populating this model with functions would (together with tests that verify correct behaviours and error handling of a cryptosystem) serve as a proof of possibility of a correct implementation of public key cryptography in Haskell and provide an interactive specification for doing so, perhaps even in another language! Let's provide a quick example of some functions slotting into that model.
data DummyTag = DummySK | DummyPK | DummySlip deriving (Show, Eq) -- | Note that we abuse the fact that in dummy implementation -- secret key and public key are both represented with a -- DummyTagged type alias to embed secret key together with -- the message. newtype DummySigned msg key = DummySigned {sig :: (key, (key, msg))} deriving (Show, Eq) -- ... type DummyTagged = (DummyTag, ByteString) -- ... dummyKdf :: ByteString -> IO (Maybe (DummyTagged, DummyTagged, DummyTagged)) dummyKdf pass = pure $ Just ((DummyPK, pass), (DummySK, pass), (DummySlip, pass)) dummyRekdf :: DummyTagged -> ByteString -> Maybe (DummyTagged, DummyTagged) dummyRekdf (DummySlip, x) pass = go (x == pass) where go True = Just ((DummyPK, pass), (DummySK, pass)) go False = Nothing dummyRekdf _ _ = error "DummySlip expected in the 1st argument" -- ... dummySign :: (DummyTagged, DummyTagged) -> msg -> DummySigned msg DummyTagged dummySign (verificationKey@(DummyPK, _), signingKey@(DummySK, _)) blob = DummySigned (verificationKey, (signingKey, blob)) dummySign _ _ = error "The first argument has to be a tuple of DummySK and DummyPK" -- | Note that we're not comparing public key with secret key -- but rather compare embedded bytestrings which match if the keys -- were derived from the same password by kdf dummyVerify :: Eq a => DummySigned a DummyTagged -> a -> Bool dummyVerify (DummySigned ((DummyPK, signedAs), ((DummySK, signedWith), signedWhat))) candidate = (signedAs == signedWith) && (candidate == signedWhat)
Now to keep prototyping, we just need to instantiate PKC data type with a collection of suitable functions. Later on, when we switch to a real cryptographic system, a value of type PKC we'll build with it shall serve as a proof to the claim that "public key cryptographic systems exist as modeled". We can and should verify that early, but not too early to slow the prototyping down. Furthermore, in a similar fashion a HTTP client-server interactions can be modeled and after the prototype is completed, one will end up with a runnable, compilable specification for the software they're about to write. What's fairly amazing is that if the company doesn't want to switch from Haskell to another language for the actual product, they can repurpose this prototype for incremental rewrite into an MVP!
These days, usage of higher-order and anonymous functions is prominent even in more conservative ecosystems like Java's. Type systems greatly assist in reasoning about the code's overall behaviour. In general, type systems are one of many tools for writing self-documenting code. An illustration for those, who (like me) keep forgetting the order of accumulator and an iterated value in reducers:
Prelude> :t foldl foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b
Yeah, if the language's ecosystem is tightly integrated with the underlying type system, amazing things are possible, from simple type signature lookups at your fingertips to full blown type signature search engines like Hoogle [8] and Serokell's hackage-search (which, somewhat ironically, is written in Rust) [9]. Conversely, if it evolved independently from type system, such as TypeScript extending JavaScript with success typings or Dialyzer extending Erlang and Elixir with success typings, likely, self-documenting benefits and improved discoverability will be way less pronounced, if at all.
Type systems also enable and encourage writing composable code, meaning that the programmers are nudged towards writing well-structured and modular code bases, no matter what is the unit of modularity: a Java class or a Haskell module. Of course, there are ways to write spaghetti code in any language, but it's harder when the whole ecosystem imposes structure.
Type systems aren't entirely free. Both the users (programmers) and the computer itself have to do extra work to write a program that would be accepted by a language with a type system. Sometimes, that work takes non-trivial amounts of computational time, but in practice, it's seldom more computationally intense than, say, code generation via templates.
Also, not quite a drawback, but rather something many people don't understand about type systems. There's an "if it compiles, it works" meme, but it's extremely misleading. Type systems, by their static nature, can't prove presence of features of a program, only absence. But this yields a couple of actual drawbacks:
We'll end this section with a nice heuristic for architects to think about using type systems is to perform the following thought experiment:
If cost 2 is even comparable to, let alone lower than, cost 1, going for using type systems to refuse incorrect programs is warranted. Quoting "The Toyota Way":
Focusing on quality actually reduced cost more than focusing only on cost.
=> [1]: Formal Verification (in Embedded Systems Design), Prof. Anupam Basu, NPTEL | [2]: Formal Verification Methodologies for Nonlinear Analog Circuits, Dr. S. Steinhorst, Ph.D. thesis | [3]: EMFtoCSP: A tool for the lightweight verification of EMF models, C. A. Gonzalez et al. | [4]: Contract Programming, DLang website | [7]: Dependency Analysis of Haskell Declarations, Artem Kuznetsov This content has been proxied by September (3851b).Proxy Information
text/gemini; lang=en