Tuesday, 5 July 2016

Backing Up Files To Cloud Storage

I have an application that backs up files to cloud storage such as OneDrive. Doing this manually on a PC is easy using Windows Explorer: just copy and paste the files of interest to the local OneDrive folder. How could I automate it? If I just wanted to back up files in a fairly inefficient manner, I could write a .NET console application that does simple file copy operations on the folders of interest.

But, unlike for my local backups, I didn’t necessarily want all files to be readable. I found a free encryption application that was also programmable from C#. However, it is restricted to encrypting files rather than folders. It is easy to get around this: programmatically zip up the folder and encrypt the zip file instead.

Having done that, the next step is to programmatically copy the encrypted zip to the OneDrive folder. I can then use Windows Task Scheduler to run the application at regular intervals.
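As a rough illustration of those steps (this is a sketch, not the original application’s code), the zip and copy can be done with the built-in System.IO.Compression API. The folder paths below are hypothetical and the encryption call is only indicated by a comment, since it depends on whichever third-party tool is used.

// Minimal sketch of the zip-and-copy steps using System.IO.Compression.
using System.IO;
using System.IO.Compression;

class BackupSketch
{
    static void Main()
    {
        string sourceFolder = @"C:\Data\Documents";                  // hypothetical
        string zipPath = @"C:\Temp\Documents.zip";                   // hypothetical
        string oneDrivePath = @"C:\Users\Me\OneDrive\Documents.zip"; // hypothetical

        // 1. Zip the folder of interest.
        if (File.Exists(zipPath))
            File.Delete(zipPath);
        ZipFile.CreateFromDirectory(sourceFolder, zipPath);

        // 2. Optionally encrypt zipPath here using the chosen encryption tool.

        // 3. Copy the result to the local OneDrive folder; the OneDrive client
        //    then syncs it to cloud storage.
        File.Copy(zipPath, oneDrivePath, overwrite: true);
    }
}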

Local Backup

 

I currently have three backups scheduled. One of them is a differential backup using SyncToy, which detects the changes since the last backup and copies just those. So far my cloud backup backs up everything every time. Not very efficient. And, as I’m backing up over the internet, it’s unnecessarily eating into my data allowance.

Comparing Zip Files

 

I found a tool, ZipDiff, that compares zip files looking for differences. For each zipped folder I can run this and then only back up when something has changed. I might still have a big backup, as each zip file can itself be quite large, but it’s better than unnecessarily uploading several zipped folders when nothing has changed.
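ZipDiff is a separate tool with its own API; purely to illustrate the idea (this is not how ZipDiff itself works), a rough comparison can be written against System.IO.Compression by treating two archives as unchanged when they contain the same entries with the same sizes and timestamps.

// Illustrative only: compare two zip files by entry name, uncompressed size and
// last-write time, as an approximate "has anything changed?" check.
using System.IO.Compression;
using System.Linq;

static class ZipComparison
{
    public static bool LooksUnchanged(string zipA, string zipB)
    {
        using (var a = ZipFile.OpenRead(zipA))
        using (var b = ZipFile.OpenRead(zipB))
        {
            var entriesA = a.Entries
                .Select(e => new { e.FullName, e.Length, e.LastWriteTime })
                .OrderBy(e => e.FullName)
                .ToList();
            var entriesB = b.Entries
                .Select(e => new { e.FullName, e.Length, e.LastWriteTime })
                .OrderBy(e => e.FullName)
                .ToList();

            return entriesA.SequenceEqual(entriesB);
        }
    }
}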

Parallel Operation

 

Roughly speaking, for each folder, I need to
  1. Zip
  2. Encrypt (optionally)
  3. Backup
This is easily parallelisable (embarrassingly parallel, as they say), so I can use a parallel for loop. Handling errors requires some care though. One scenario is that certain types of file cause the zip operation to fail if the file is in use; Microsoft Word documents are one such type. However, I wanted the algorithm to continue processing other folders in such cases instead of terminating. This requires a loop like the one below.
try
{
    BackupEncryptedToOneDrive(sourceFolderPathsForEncryption);
}
catch (AggregateException ae)
{
    LogAggregateErrors(ae);
}
private static void BackupEncryptedToOneDrive(IEnumerable<string> sourceFolderPathsForEncryption)
{
    Console.WriteLine(LogMessageParts.FoldersForEncryption);
    Logger.Info(LogMessageParts.FoldersForEncryption);
    Console.WriteLine(Environment.NewLine);

    var exceptions = new ConcurrentQueue<Exception>();

    Parallel.ForEach(sourceFolderPathsForEncryption, path =>
    {
        try
        {
            Console.WriteLine(LogMessageParts.Processing, path);
            Logger.Info(LogMessageParts.Processing, path);

            if (TryCreateZip(path))
            {
                Encrypt(path);
                BackupToOneDrive(path);
            }
            else
            {
                string noChangesDetected = string.Format("No changes detected in {0}...", path);
                Console.WriteLine(noChangesDetected);
                Logger.Info(noChangesDetected);
            }
        }
        catch (Exception ex)
        {
            exceptions.Enqueue(ex);
        }
    });

    Console.WriteLine(Environment.NewLine);

    if (exceptions.Any())
        throw new AggregateException(exceptions);
}

private static void LogAggregateErrors(AggregateException ae)
{
    ae = ae.Flatten(); // flatten tree to process exceptions at the leaves
    foreach (var ex in ae.InnerExceptions) LogError(ex);
}

The idea here is that we queue up the exceptions from each parallel iteration, wrap them up in an AggregateException and then unwrap and log them at the top level. So a failure in one parallel iteration still allows the others to run to completion.

Thursday, 29 October 2015

Exploring Akka.NET for Concurrency and Distributed Computing

Akka.NET is described as “a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on .NET & Mono.” It is a port of the Akka framework for the JVM written in Scala. Its initial release was in April 2015, not long after Microsoft’s similar cloud-oriented Project Orleans (February 2015). Orleans is described as “a framework that provides a straightforward approach to building distributed high-scale computing applications, without the need to learn and apply complex concurrency or other scaling patterns.”

Each of these frameworks is based on the Actor Model of concurrency, of which more later.

Background

I first heard of Akka via a polyglot developer colleague who has extensive experience of both Java and .NET. He happened to get into some Scala development and was fortunate enough to get some experience with Akka. Later on I started encountering various references to .NET Actor frameworks/libraries, almost all in their very early stages. In February 2014 I came across a link to Roger Johansson’s Pigeon project on GitHub that later became Akka.NET. A year later, via my F# Weekly feed, I saw that Akka.NET was in beta, so I browsed to the site and was amazed at how much information was there. There was also a NuGet package for Visual Studio that I tried, and it “just worked”, with no faffing around with configuration. That’s not always the case with open source projects. Then a few weeks after that it reached 1.0.

The Actor Model of Concurrency

The Actor Model in computer science is “a mathematical model of concurrent computation that treats ‘actors’ as the universal primitives of concurrent computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.”

The Actor Model was invented by Carl Hewitt in 1973 and you can find him explaining the basic ideas at Microsoft’s Channel 9. This is also available on YouTube should you wish to view it there.

“According to Carl Hewitt, unlike previous models of computation, the Actor model was inspired by physics, including general relativity and quantum mechanics.”

Wow! But don't worry. You don't need to understand general relativity and quantum mechanics to get started!

One way of thinking about the Actor Model is by analogy to garbage collection or other automated memory management schemes. You can view garbage collection as providing a high-level abstraction over manual memory management. Similarly you can view the Actor Model as providing a high-level abstraction over manual thread management and synchronization. The Actor Model is attracting a lot of attention now because of the rise of multi-processor and multi-core machines combined with the growth of the internet and highly distributed computing. Actor-based frameworks such as Akka and Orleans are more easily able to handle these scenarios, freeing the developer to concentrate on solving business problems rather than getting bogged down in “low-level” concurrency issues.

Akka.NET

Akka.NET provides an actor system that the user typically arranges into a hierarchy (tree) of actors that communicate with each other via immutable messages. Actors supervise the actors directly below them in the tree and are responsible for handling their failures. When an actor crashes, its parent can either restart or stop it, or escalate the failure up the hierarchy of actors. It is this that enables “self-healing” – fault tolerance and resilience.
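For example, a parent actor can override its supervision strategy to decide, per exception type, whether a failing child is restarted, stopped, or the failure escalated. The following is only a sketch using the standard Akka.NET API, not code from a real application.

// Sketch: a parent actor choosing a supervision directive per exception type.
using System;
using Akka.Actor;

public class ParentActor : UntypedActor
{
    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new OneForOneStrategy(
            3,                          // max number of retries
            TimeSpan.FromSeconds(30),   // within this time window
            ex =>                       // decider
            {
                if (ex is ArithmeticException) return Directive.Restart;
                if (ex is NotSupportedException) return Directive.Stop;
                return Directive.Escalate;
            });
    }

    protected override void OnReceive(object message)
    {
        // Work would be delegated to child actors here.
    }
}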

Each actor has its own state that is not shared with other actors. Actors send messages to other actors asynchronously so that they don’t block. Actors process received messages one at a time. They can also determine how to respond to the next message received. This is called switchable behaviour. Supervision and switchable behaviours are two of the “killer” features of the Actor Model.
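Here is a minimal sketch of these ideas using Akka.NET’s ReceiveActor; the actor, message and system names are made up for illustration. The actor handles one message, then switches to a different behaviour with Become().

using System;
using Akka.Actor;

// An immutable message.
public class Greet
{
    public Greet(string who) { Who = who; }
    public string Who { get; private set; }
}

// An actor with switchable behaviour.
public class GreeterActor : ReceiveActor
{
    public GreeterActor()
    {
        Awake();
    }

    private void Awake()
    {
        Receive<Greet>(msg =>
        {
            Console.WriteLine("Hello, {0}", msg.Who);
            Become(Asleep); // respond differently to the next message
        });
    }

    private void Asleep()
    {
        Receive<Greet>(msg => Console.WriteLine("Zzz... ignoring {0}", msg.Who));
    }
}

public static class Program
{
    public static void Main()
    {
        var system = ActorSystem.Create("demo");
        var greeter = system.ActorOf(Props.Create(() => new GreeterActor()), "greeter");

        greeter.Tell(new Greet("world"));  // handled by the Awake behaviour
        greeter.Tell(new Greet("again"));  // handled by the Asleep behaviour

        Console.ReadLine(); // keep the process alive while messages are processed
    }
}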

Well, that’s the basic idea. There are a lot more features available but I hope this gives you a flavour. Apart from the Akka.NET site you can also find some excellent, well-written blog posts by Petabridge (one of the creators of the framework). They also provide a free online Bootcamp. If you have a subscription to Pluralsight then, at the time of writing, there are four excellent courses on Akka.NET.

 

Monday, 16 February 2015

JavaScript Server-Side Logging with JSNlog

Web applications have become increasingly JavaScript-heavy in recent years as we’ve moved to richer and more responsive applications. It’s fine debugging JavaScript errors in the browser during development, but what about in deployed applications? JSNlog is an open source framework that sends client-side JavaScript log messages to the server, where they can be handled by standard .NET logging frameworks such as NLog, log4net and Elmah. Below I show an example of how to use it with NLog.

Installing NLog

NLog has an installer that’s worth running once, as it supplies some Visual Studio item templates and a code snippet for declaring a logger instance.

private static NLog.Logger logger = NLog.LogManager.GetCurrentClassLogger();

But the installer is not essential; you can install NLog via NuGet. You will need to run both of these commands:

Install-Package NLog



Install-Package NLog.Config



The latter adds a config file (NLog.config). This is where you declare your logging targets (such as log files) and logging rules. For example:



<targets>
  <!-- add your targets here -->
  <target name="logfile" xsi:type="File" fileName="${basedir}/file.txt" />
</targets>

<rules>
  <!-- add your logging rules here -->
  <logger name="*" minlevel="Info" writeTo="logfile" />
</rules>


Logging a Message From NLog



Suppose we have an ASP.NET MVC application. After setting up the above, edit the Home controller like this:



using NLog;
using System.Web.Mvc;

namespace WebApplicationNLog2.Controllers
{
    public class HomeController : Controller
    {
        private static Logger logger = LogManager.GetCurrentClassLogger();

        public ActionResult Index()
        {
            logger.Info("Sample trace message");
            return View();
        }
    }
}


 



Then a message is written to the file file.txt in the project folder. It will look something like this.



2015-02-13 12:32:22.5442|INFO|WebApplicationNLog2.Controllers.HomeController|Sample trace message



Installing JSNlog



There is a specific NuGet package to go with whichever logging framework we happen to be using, so for this example it is:



Install-Package JSNLog.NLog



This installs the dependent JSNLog package among others and also updates the Web.config as required.



Logging JavaScript



Let’s place some arbitrary JavaScript in the Home controller’s Index view.



First we need to configure JSNlog by placing this line before any script tag that uses JSNlog.



@Html.Raw(JSNLog.JavascriptLogging.Configure())


In a real application we would most likely place this in _Layout.cshtml. Now we can start logging.



<script type="text/javascript">
    JL().info("This is a log message");
</script>

Then a message is written to the file file.txt in the project folder. It will look something like this.

2015-02-16 11:27:55.7520|INFO|ClientRoot|This is a log message


All of the logging levels and layout rules that are configurable in frameworks such as NLog and log4net are carried over to the logging of JavaScript in the same way.

Thursday, 9 February 2012

Web Browser Process Statistics Using Windows PowerShell

I use a number of web browsers on my Windows PC. One of them is Google Chrome, which I have been using from not long after its initial release. From Wikipedia: “A multi-process architecture is implemented in Chrome where, by default, a separate process is allocated to each site instance and plugin.” This makes it awkward to work out its memory consumption. It is in fact possible to obtain this information from Chrome itself, though I only discovered that quite recently. Chrome has its own task manager that can report such statistics. Tools –> Task Manager –> Stats for nerds displays the results in a tab called About Memory. It also reports stats for other running browsers. Here are some stats from the top of the About Memory tab:

[Screenshot: the top of Chrome’s About Memory tab]

Notice that here it only reports the usage for the Chrome processes minus plugins and extensions. To get the total figure you need to view the figure at the bottom.

[Screenshot: the total memory figure at the bottom of the About Memory tab]

Windows PowerShell is also able to calculate the total memory consumption by summing up all the processes named Chrome:

$p = (Get-Process Chrome | Measure-Object -Sum WorkingSet).Sum / 1024

Write-Host "Total = "$p" K"

This produces a similar result (consumption fluctuates from moment to moment):

Total =  621508 K
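For comparison, and purely as an illustration (it is not part of the original post), the same sum can be computed from C# using System.Diagnostics:

// Sum the working sets of all chrome.exe processes and report the total in KB.
using System;
using System.Diagnostics;
using System.Linq;

class ChromeMemory
{
    static void Main()
    {
        long totalBytes = Process.GetProcessesByName("chrome")
                                 .Sum(p => p.WorkingSet64);
        Console.WriteLine("Total = {0} K", totalBytes / 1024);
    }
}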

Chrome’s About Memory also produces stats for Firefox. However, once again this excludes plugins and extensions, so Chrome doesn’t help us out here. We can write similar code for Firefox, but this time we also need to include another process called plugin-container, of which there may be zero or more depending on whether the current Firefox instance has had to start one up (i.e., whether the user has happened to run Flash or a PDF reader). The code for this is slightly more involved:

$f = (Get-Process Firefox | Measure-Object -Sum WorkingSet).Sum / 1024

Write-Host "Firefox Total = "$f" K"

$p = (Get-Process "plugin-container" -ErrorAction SilentlyContinue | Measure-Object -Sum WorkingSet).Sum / 1024

Write-Host "Plugin Container Total = "$p" K"

$c = $f + $p

Write-Host "Combined Total = "$c" K"

The first part is the same except for substituting Firefox for Chrome. Then we define another variable for summing up the plugin-container processes. Adding the two variables together gives us the total consumption.

Firefox Total =  556116 K
Plugin Container Total =  45536 K
Combined Total =  601652 K

But notice there’s some extra code we had to use:

-ErrorAction SilentlyContinue

This is required because if there are no active plugin-container processes PowerShell will report an error. The SilentlyContinue argument does what it says.

The current release of PowerShell is v2.0. It is included by default in Windows 7 and Windows Server 2008 R2. It is also available as a free download for Windows XP SP3, Vista and Servers 2003 and 2008.

NuGet, Microsoft’s package manager for Visual Studio 2010, also makes use of PowerShell in its console window. PowerShell comes with a basic script editor supplied by Microsoft but there are more powerful IDEs out there. A good one is PowerGUI, which also has excellent IntelliSense amongst other capabilities. It also has an add-in for Visual Studio 2010 if desired.

Monday, 23 January 2012

New Year, New Language

Functional programming languages are all the rage at the moment. They’re well-suited to parallel programming and the multi-core world. On the Microsoft .NET platform we have F#. I’ve made one or two attempts at learning F# before but lost heart once the going got tough. This time around I’ve decided to make more of an effort. I’ve found that it helps to try more than one learning source, as they differ in the degree of explanation they give for each concept.

Thus far I am consulting primarily F# Programming, Real World Functional Programming (online partial version of the book) and MSDN’s F# Language Reference.

I’ve been thinking about whether the learning-curve from procedural to object-oriented programming is greater than that from OO (or procedural) to functional.

I think the harder part about going from procedural to OO was not the mechanics but OO design perhaps. Whereas with functional I think even the mechanics are quite difficult.

However, it could be that I’ve just forgotten how difficult the procedural to OO transition was!

One initial difficulty, especially for those coming from a C-syntax background, is F#’s syntax. It does look quite alien. Syntax itself should not be that big a deal but when combined with new concepts it does add to the mental load, especially once examples start to get elaborate.

A similar language on the Java JVM is Scala. Its syntax is a cross between C-syntax and Ruby/Python’s. I looked briefly at Scala some time ago and it does seem more accessible initially. Though once you get beyond the basics it becomes as scary as F#! A colleague of mine who’s been using Scala commercially for many months tells me it’s a matter of practice.

Thursday, 7 July 2011

Reactive Extensions 1.0 Stable is Released

Some months ago Microsoft made Reactive Extensions (Rx) an officially supported product and moved it out of Dev Labs to its new site. On June 29th it was officially released as version 1.0. It now also has some very accessible starter documentation on MSDN. Until now, documentation had been scattered between videos, blogs, hands-on labs and the MSDN Rx forum.

Rx is also consumable from LINQPad, which subscribes to the observables that you dump. The example below writes “Hello World” every second but stops after the first five values. If we removed the call to Take(5) it would run for ever; in that case you can stop it by hitting the Stop button in LINQPad.
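The query looks something like this (a reconstruction from the description above rather than the exact screenshot; it assumes a LINQPad C# statements query with System.Reactive.Linq imported):

// Emits a value every second, maps it to "Hello World", takes the first five,
// and dumps the observable so that LINQPad subscribes and displays each value.
Observable
    .Interval(TimeSpan.FromSeconds(1))
    .Select(i => "Hello World")
    .Take(5)
    .Dump();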

[Screenshot: the query and its output in LINQPad]

Thursday, 30 June 2011

Leveraging LINQPad’s Object Visualisation in Visual Studio

When you run a query in LINQPad it produces formatted output on the results tab that you can optionally export to Word, Excel or HTML. Here I show two techniques for visualising any arbitrary .NET object.

The first way is non-invasive, i.e., requires no change to your Visual Studio solution or compiled assemblies.

The second way is invasive but allows you to export any arbitrary .NET object to HTML by leveraging LINQPad’s Dump() extension method.

Technique 1: Add a Visual Studio LINQPad Debugger Visualizer

Download the LINQPad Visualizer from Google Code and follow the instructions from the point where it says “If using the download do this:”.

You should also remember to unblock the DLLs after downloading. Consider this simplified code with fields made public for brevity.

class User
{
    public int UserId;
    public List<Course> Courses;
    public List<Student> Students;
}

class Course
{
    public string CourseCode;
}

class Student
{
    public int StudentId;
}


Set up a collection of Users.



var users = new List<User>
{
    new User
    {
        UserId = 1,
        Courses = new List<Course>
        {
            new Course { CourseCode = "ECU120" },
            new Course { CourseCode = "ECU121" },
            new Course { CourseCode = "ECU122" },
        },
        Students = new List<Student>
        {
            new Student { StudentId = 1 },
            new Student { StudentId = 2 },
        }
    },
    new User
    {
        UserId = 2,
        Courses = new List<Course>
        {
            new Course { CourseCode = "ECU124" },
            new Course { CourseCode = "ECU125" },
            new Course { CourseCode = "ECU126" },
        },
        Students = new List<Student>
        {
            new Student { StudentId = 1 },
            new Student { StudentId = 3 },
        }
    },
    new User
    {
        UserId = 3,
        Courses = new List<Course>
        {
            new Course { CourseCode = "BESU022" },
            new Course { CourseCode = "BESU023" },
            new Course { CourseCode = "ECT034" },
        },
        Students = new List<Student>
        {
            new Student { StudentId = 1 },
            new Student { StudentId = 2 },
        }
    },
};

users.Dump();



Then we can debug into it.



[Screenshot: debugging the users collection in the Watch window]



To do this, set a watch for your object of interest. Here we are inspecting the users collection. In order to view its contents we need to enter



new System.WeakReference(users)



in the Name column as shown.



Clicking on the dropdown for Value shows two LINQPad debugger visualizers. The default visualizer is a JSON visualizer that allows us to see the contained values of objects that haven’t been marked as Serializable. Clicking on the visualizer pops up a Windows Form containing the object’s values.



[Screenshot: the JSON visualizer showing the object’s values]



In this case we lose the type information but we can still see all the values neatly laid out.



However, if we mark the User, Student and Course classes as Serializable



[Serializable]
class User
{
    public int UserId;
    public List<Course> Courses;
    public List<Student> Students;
}

[Serializable]
class Course
{
    public string CourseCode;
}

[Serializable]
class Student
{
    public int StudentId;
}



and then select the lower LINQPad visualizer, we get the type information as well, with a neater layout.



[Screenshot: the LINQPad visualizer showing typed output]



Technique 2: Add a Reference to LINQPad.exe



For this technique we add a reference to LINQPad.exe that allows us to leverage LINQPad’s Dump() extension method for any object…



using System;
using System.Diagnostics;
using System.IO;

/// <summary>
/// LINQPad extension methods.
/// </summary>
public static class LinqPadExtensions
{
    /// <summary>
    /// Writes the object's properties to HTML
    /// and displays them in the default browser.
    /// </summary>
    /// <typeparam name="T">Type of the object being dumped.</typeparam>
    /// <param name="o">The object to dump.</param>
    /// <param name="heading">Optional heading for the output.</param>
    public static void Dump<T>(this T o, string heading = null)
    {
        string localUrl = Path.GetTempFileName() + ".html";

        using (var writer = LINQPad.Util.CreateXhtmlWriter(true))
        {
            if (!String.IsNullOrWhiteSpace(heading))
                writer.Write(heading);

            writer.Write(o);
            File.WriteAllText(localUrl, writer.ToString());
        }

        Process.Start(localUrl);
    }
}



Then we can write users.Dump() as in the example above. On execution this writes the object’s values to HTML and launches the default web browser. This produces the same results as the typed version above.



[Screenshot: the object’s values rendered as HTML in the browser]



This method has been adapted from an example on StackOverflow. One issue to be aware of is that your project should target the full .NET 4, not the Client Profile.