Sunday, September 28, 2014

Themes from Strangeloop 2014

One of the many reasons making the Strangeloop conference special is the interdisciplinary perspective, taking on themes as diverse as core functional programming concepts - what could fit this description better than the infinite tower of interpreters seen on Nada Amin's keynote - to upcoming software deployment approaches.

Still, some common themes seem to have emerged. One candidate is the spreadsheet as inspiration for programming. One thinker that seems to have taken some inspiration for his work is Jonathan Edwards, that opened the Future of Programming workshop with a talk showcasing the latest version of his research language Subtext.  Earlier prototypes explored the idea of programming without names, directly linking tree nodes to each other via a kind of cut-and-paste mechanism. In its latest incarnation it appears to have evolved into a reactive language with a completely observable evaluation model, the entire evaluation tree is always available for exploration, and a two stage reactive architecture allows for relating input events to evaluation steps. User interface is auto generated, sharing the environment with the code, much like their older reactive cousing, the spreadsheet.

Kaya, a new language created by David Broderick, explores the spreadsheet metaphor in a more literal manner: what if spreadsheets and cells were composable, allowing for naturally nested structures? Moreover, what if we could query this structure in a SQL-like manner? The result is a small set of abstractions generating complex emergent behavior, including, as in Subtext, a generic user interface.

Data dependency graph driven evaluation is an important part of both modern functional reactive programming languages and of all spreadsheet packages since 1978's VisiCalc. We saw some of the first on Evan Czaplicki's highly approachable talk "Controlling Time And Space: Understanding The Many Formulations Of Frp". And a bit of the latter on Felienne Hermans's talk "Spreadsheets for Developers", sort of wrapping around the metaphor and looking to software engineering for inspiration to improve spreadsheet usage.

One of the great aspects of spreadsheets is the experience of direct manipulation of a live environment. This is at the crux of what Brett Victor has been demonstrating on many of his demos, showing how different programming could be if our tools were guided by this principle. Though he did not present, the idea of direct manipulation was present in Strangeloop in several of the talks.  Subtext's transparency and live reflection of code updates on the generated UI moves in this direction. Still on the Future of Programming workshop, Shadershop is an environment whose central concept seems to be directly manipulating real-valued functions by composing simpler functions while live inspecting the resulting plots. Stephen Wolfram's keynote was an entertaining demonstration of his latest product, the Wolfram Language.  Its appeal was due, among other reasons, to the interactive exploration environment, particularly the visual representation of non-textual data and the seamless jump from evaluating expressions to building small exploratory UIs. 

Czaplicki's talk discussed several of the decisions involved in designing Elm, his functional reactive programming language. I found noteworthy that many of those were taken in order to allow live update of running code and an awesome time-traveling debugger.

Taking a different perspective at the buzzword du Jour, reactive, is another candidate theme for this year's Strangeloop: the taming of callbacks. They were repeatedly mentioned as one evil to be banished from the world of programming, including on Joe Armstrong's keynote, "The mess we are in" and all the functional reactive programming content took aim at the construct. Not only functional, another gem from this year's Future of Programming workshop was the imperative reactive programming language Céu. Created by Fransico Sant'anna at PUC Rio - the home of the Lua programming language - Céu compiles an imperative language with embedded event based concurrency constructs down to a deterministic state machine in C.  Achieving, among other tricks, fully automated memory management without a garbage collector.

Befitting our age of microservices and commodity cloud computing, another interesting current was looking at modern approaches to testing distributed systems. Michael Nygard exemplified simulation testing - which can be characterized as property based testing in the large - with Simulant, a clojure framework to prepare, run, record events, make assertions and analyze the results of sophisticated black box tests. Kyle @aphyr Kingsbury delivered another amazing performance torturing distributed databases to their breaking point. Most interesting was the lengths he had to go to in order to control the combinatorial explosion of the state space and actually verify global ordering properties like linearizability.

Speaking of going to great lengths to torture database systems, we come to what might have been my favorite talk at the conference, by the FoundationDb team, "Testing Distributed Systems w/ Deterministic Simulation".  Like @aphyr's Jepsen, they control the external environemnt to inject failures while generating transactions and asserting the system maintains its properties. They take great care to mock out all sources of non-determisim, including time, random number generation, even extend C++ to add better behaved concurrency abstractions.

Tests run thousands of times each night, nondeterministic behavior is weeded out by running twice each set of parameters and checking outputs don't change. FoundationDb's team goes further than Jepsen in the types of failures they can inject; not only causing network partitions and removing entire nodes from the cluster, but also simulating network lag, data corruption, and even operator mistakes, like swapping data files between nodes! Of course the test harness itself could be buggy, failing to exercise certain failure conditions; to overcome this specter, they generate real hardware failures with programmable power supplies connected to physical servers (they report no bugs were found on FoundationDb with this strategy, but Linux and Zookeeper had defects surfaced - the latter isn't in use anymore).

What I particularly enjoyed from this talk was the attitude towards the certainty of failures in production. Building a database is a serious endeavor, data loss is simply not acceptable, and they understood this from the start.

Closing the conference in the perfect key was Carin Meier and Sam Aaron's keynote demonstrating the one true underlying theme: Our Shared Joy of Programming.

Sunday, October 21, 2012

Security and Objects

Mobile code is one of the great challenges for software security. Lets say  you are writing an email application. The idea that people could send little apps to each other in email messages might seem like a potentially interesting feature: users could build polls, schedule meetings, play games, share interactive documents. Kind of cool.

And if the platform you are building upon supports reflectively evaluating code, it could be as easy as something like this (in OO pseudocode):

    define load_message(message)
        ...
        eval(message.code)

Of course it can't be that easy. What if the code in the message does something like:

    new stdlib.io.File("/").delete()

The standard way to avoid the vulnerability is to put the code in a so-called sandbox. It sounds very secure, but in practice this usually amounts to gathering up a list of "dangerous" call sites and inserting in each some code to check if the caller has permission to proceed. So the implementation of delete would include code along the lines of:

    define delete()
        if VM.callStackContainsEvilCode?()
            raise YouShallNotPassException()

        ...

This is fraught with problems. It requires runtime support for inspecting the call stack and a system for declaring that certain code has some level of authorization while some other code has a lower level. Not to mention the busywork of going trough the code and peppering that little snippet over every suspect call site. If you miss one — say, for instance, a method that gets the addresses of all contacts on your email application — and you have a security bug on your hands.

A better way ?

Perhaps there is a better way. Take another look at the offending line: new stdlib.io.File("/").delete(). It is only able to call the dangerous delete() method because it has a reference to a file object pointing to the root of the filesystem. And it only has that reference because it could reach for the File class on a global namespace. What if there was no global namespace?

It might seem weird, but it's not that hard to imagine a programming system lacking a global namespace. Many object-oriented languages, following Smalltalk's lead, have a notion of a metaclass, an object that represents a class. Many of them (also following Smalltak) also get by without a "new" operator. Objects are created by calling a method — usually named new() — on the metaclass object.

We are very close now. The last step, unfortunately not taken by most common languages, is to avoid anchoring the metaclass object onto a global namespace. The result is that code can only create objects of the classes it holds a reference to. And it only has a reference if it is given one via a method or constructor parameter.

Proceeding recursively, we end up with a stratified program. There is an entry point that receives a reference to the entire standard library, and each call site decides how much authority to grant each callee. On our example, when we evaluate external code we can grant very little authority, meaning we can pass the evaluated code just a handful of references. Care must be taken so that none of them will direct or indirectly provide a way to create a File. In a way, object design becomes security policy.

And we get very fine-grained control over such policy. We could, for instance, grant loaded code authority to write on a designated directory just by passing it a reference to the Directory object for that directory. Our choices get even more interesting when we realize we can pass references to proxies instead of real objects in order to attenuate authority. Continuing with our example, hoping it doesn't get too contrived, we could build a proxy for the Directory that checks if callers exceed a given quota of disk space.

Research

I have mentioned above that most common languages don't fit this post's description. But there are languages that do, a prime example is E. In fact, there is a whole area of research for dealing with security in this manner, it's called "object capability security".

I'm not really a security guy, I got interested in the area due to the implications for language and system design. If you got interested, for any reason, please check out Mark Miller's work. He is the creator of the E language and the javascript-based Caja project. His thesis is very readable.