Above the Blue

Thursday, 11 May 2017

Investigating Some New Programming Languages

Pony

Having had some experience with the actor model and Akka.NET I had been reading up and watching the odd video segment on the Pony programming language that is currently under active development. Pony is described as

"an open-source, object-oriented, actor-model, capabilities-secure, high performance programming language."

Most of that description will be familiar to mainstream developers. "Actor-model" means it supports the actor model, which many developers will not be familiar with. "Capabilities-secure" is something a bit more mysterious, which I will get to in a minute. "High performance" in this context means "performance comparable to C/C++." It compiles ahead of time to native code, as contrasted with languages that target the .NET Common Language Runtime (CLR) or the Java Virtual Machine (JVM).

The actor model can be implemented in a number of programming languages, via frameworks. So, for example, on the JVM and .NET there is the Akka toolkit, which can be consumed by JVM- or .NET-supported languages, e.g., Java, Scala, C#, F#, etc. And you can create an actor framework in C++ if you want.

But you can also create an object-oriented framework in C, and it is still far easier to use a language that supports the concepts natively, e.g., C++. In the same way, Pony is a language that supports actors natively. In that respect it is conceptually similar to Erlang, the difference being that Erlang is a dynamically typed functional language while Pony is a statically typed, native-code OO language that supports a functional style if desired.

As a simple illustration of supporting actors natively, Pony has both class and actor types as language elements. Pony syntax looks like a cross between Eiffel and Scala, but closer to the former. As such, Pony code is very readable but the language itself is conceptually quite tough. (See "capabilities security," aka "reference capabilities," below.)

What is Pony trying to achieve? Well, firstly the appeal of the actor model is that it provides a higher level abstraction over concurrent and parallel computation and is the increasingly preferred approach for programming distributed applications in a multi-core world. So far, so good. Why not use something like Akka on the JVM? Pony originated in the financial sector where the developers were working on trading applications. These typically require both high performance and low latency. While the likes of Akka are used in that sector they still also use concurrent C++ since the stronger the real-time requirements the more something like C++ becomes necessary, as the JVM supports garbage collection and GC pauses can be a hindrance or even unacceptable in such systems. It would be nicer and less error-prone to make use of an actor model framework in C++ but there are no mature ones currently available.

The main advantage of the actor model compared to traditional concurrency approaches is its avoidance of deadlocks and (easier) avoidance of race conditions.

But Pony wanted to do better than this. It definitely wanted memory management but wanted to improve on traditional GC, so as to avoid the latency issue. It fine-grains GC to per-actor, so it's not a stop-the-world affair.

It also wanted to completely banish deadlocks and data races and to ensure this at compile time. This is what "capabilities-secure" is all about. Data races are ruled out by some subtle extensions to the type system (reference capabilities), while deadlocks cannot occur because the language has no locks. This is the main innovation of Pony.

As such, Pony makes a number of bold claims.

Here are a few...

It's type safe. Really type safe. There's a mathematical proof and everything.
It's memory safe. Ok, this comes with type safe, but it's still interesting. There are no dangling pointers, no buffer overruns, heck, the language doesn't even have the concept of null!
It's exception safe. There are no runtime exceptions. All exceptions have defined semantics, and they are always handled.
It's data-race free. Pony doesn't have locks or atomic operations or anything like that. Instead, the type system ensures at compile time that your concurrent program can never have data races. So you can write highly concurrent code and never get it wrong.
It's deadlock free. This one is easy, because Pony has no locks at all! So they definitely don't deadlock, because they don't exist.

The capabilities-security is by far the hardest feature of Pony for newcomers to grasp. It's rather like the difficulty in transitioning from procedural to object-oriented code, or from OO to functional. The Rust programming language, which I've barely looked at, has some similarly difficult concepts, partially addressing the same issues I think.

Pony is still at a very early stage of development. But there is a very readable tutorial. It is also usable fairly easily via Docker. In fact, it was my initial motivation for installing Docker a while back. There is also a Visual Studio Code extension for basic syntax highlighting. It's not completely up to date, but it's better than nothing.

I don't know how far away from 1.0 Pony is at the moment but it's something to keep an eye on. It has some interesting ideas that I'm sure will gain some traction either with Pony or via adoption in other languages.

Go

From the Go FAQ

"Go is an attempt to combine the ease of programming of an interpreted, dynamically typed language with the efficiency and safety of a statically typed, compiled language."

Other goals were fast compilation and easy (or easier) concurrency for a distributed applications world. It was positioned originally as a possible alternative to C and C++ at least for certain tasks but in practice it has been picked up more by the dynamic languages crowd. So it has turned out to be an extra string in the bow for Python developers, who want more performance and scalability combined with concurrency.

Go shuns object orientation and generics, although the former is not quite true. It has objects (types with methods, and interfaces) but no formal inheritance. That fits the modern philosophy of favouring composition over inheritance and keeping any inheritance hierarchy shallow. The Go team say they are open to adding generics at a later stage.

Go is opinionated. It uses a C-like syntax and enforces a brace style, specifically K & R, similar to that commonly used in Java and JavaScript. Putting an opening brace on its own line is a compilation error. Unused local variables and unused imported packages also generate compilation errors. These rules lead to very clean-looking code.

Go does not have exception handling (although there is a stop the world "panic," intended to be used when the application really can't proceed).

The normal way of handling errors is via return values. This is achieved quite conveniently via Go's multiple return values feature.

Go is way easier to learn than Pony. This is not a slight against Pony. Its goals are different. Go's approach to concurrency is similar to Pony's. It is based on message passing, as in the actor model, but is less formalised. But you can formalise an actor model on top of it. In fact, there is at least one such framework in development as I write.

However, it is still possible to create deadlocks in Go, unlike in Pony. The Go runtime can detect the case where every goroutine is blocked and will then terminate the program, explaining why.

Go uses allegedly very efficient garbage collection but it is not as fine-grained as Pony's per-actor GC.

Rust

Unlike with the other two languages I've yet to even dabble in Rust, though it keeps popping up in the tech press. Rust seems on the surface to occupy the same space as Go. It has some of the same concerns, e.g., safe, concurrent programming. But Rust is aimed much more squarely at systems programming, manages memory through ownership and borrowing rules checked at compile time rather than through a garbage collector (reference counting is available as an opt-in library feature), and appears to be a worthy alternative to systems-oriented C/C++. It appears to have something similar to Pony's reference capabilities, specifically its idea of reference ownership. But, at the time of writing, I know next to nothing about it.

Conclusion

Summing up, conceptually, it appears that Go and Rust intersect in some areas, but Rust has a different rationale. Rust and Pony intersect in some areas but Pony has a different rationale. E.g., Rust and Pony both aim at eliminating data races via safe referencing. Go isn’t quite as thoroughgoing in this respect, although it does make it easier to tame them compared to traditional approaches. But Go is aimed at fast compile times and simplicity. Rust and Pony aren’t. But all three are native and comparable to C/C++ in raw performance.

Tuesday, 5 July 2016

Backing Up Files To Cloud Storage

I have an application that backs up files to cloud storage such as OneDrive. Manually it is easy to do this on a PC using Windows Explorer. Just copy and paste files of interest to the local OneDrive folder. How could I automate this? If I just wanted to back up files in a fairly inefficient manner I could write a .NET console application that does simple file copy operations on folders of interest.

But, unlike for my local backups, I didn’t necessarily want all files to be readable. I found a free encryption application that was also programmable from C#. However, this is restricted to encrypting folders not files. It is easy to get around this. Programmatically zip up the folder and encrypt the zip file instead.

Having done that, then programmatically copy the encrypted zip to the OneDrive folder. I can then use Windows Task Scheduler to run the application at regular intervals.

Local Backup

I currently have three backups scheduled. One of them is a differential backup using SyncToy. So it detects the changes since the last backup and just does those. So far my cloud backup backs up everything every time. Not very efficient. But also, as I’m backing up over the internet, it’s unnecessarily eating into my data allowance.

Comparing Zip Files

I found a tool, ZipDiff, that compares zip files looking for differences. For each zipped folder I can run this and then only back up when something has changed. I might still have a big backup as each zip file can itself be quite big but it’s better than unnecessarily backing up several zipped files when nothing has changed.

Parallel Operation

Roughly speaking, for each folder, I need to

Zip
Encrypt (optionally)
Backup

This is easily parallelisable (embarrassingly parallel, as they say). So I can use a parallel for loop. Handling errors requires some care though. One scenario is that certain types of file cause the zip operation to fail if the file is in use. Microsoft Word documents are one such type. However, I wanted the algorithm to continue processing other folders in such cases instead of terminating. This requires a loop like the one below.

try
{
    BackupEncryptedToOneDrive(sourceFolderPathsForEncryption);
}
catch (AggregateException ae)
{
    LogAggregateErrors(ae);
}

private static void BackupEncryptedToOneDrive(IEnumerable<string> sourceFolderPathsForEncryption)
{
    Console.WriteLine(LogMessageParts.FoldersForEncryption);
    Logger.Info(LogMessageParts.FoldersForEncryption);
    Console.WriteLine(Environment.NewLine);

    var exceptions = new ConcurrentQueue<Exception>();

    Parallel.ForEach(sourceFolderPathsForEncryption, path =>
    {
        try
        {
            Console.WriteLine(LogMessageParts.Processing, path);
            Logger.Info(LogMessageParts.Processing, path);

            if (TryCreateZip(path))
            {
                Encrypt(path);
                BackupToOneDrive(path);
            }
            else
            {
                string noChangesDetected = string.Format("No changes detected in {0}...", path);
                Console.WriteLine(noChangesDetected);
                Logger.Info(noChangesDetected);
            }
        }
        catch (Exception ex)
        {
            exceptions.Enqueue(ex);
        }
    });

    Console.WriteLine(Environment.NewLine);

    if (exceptions.Any())
        throw new AggregateException(exceptions);
}

private static void LogAggregateErrors(AggregateException ae)
{
    ae = ae.Flatten(); // flatten tree to process exceptions at the leaves
    foreach (var ex in ae.InnerExceptions) LogError(ex);
}

The idea here is that we queue up the exceptions from each parallel iteration, wrap them up in an AggregateException and then unwrap and log them at the top level. So a failure in one parallel iteration still allows the others to run to completion.

Thursday, 29 October 2015

Exploring Akka.NET for Concurrency and Distributed Computing

Akka.NET is described as “a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on .NET & Mono.” It is a port of the Akka framework for the JVM written in Scala. Its initial release was in April 2015, not long after Microsoft’s similar cloud-oriented Project Orleans (February 2015). Orleans is described as “a framework that provides a straightforward approach to building distributed high-scale computing applications, without the need to learn and apply complex concurrency or other scaling patterns.”
Each of these frameworks is based on the Actor Model of concurrency, of which more later.

Background

I first heard of Akka via a polyglot developer colleague who has extensive experience of both Java and .NET. He happened to get into some Scala development and was fortunate enough to get some experience with Akka. Later on I started encountering various references to .NET Actor frameworks/libraries, almost all in their very early stages. In February 2014 I came across a link to Roger Johansson’s Pigeon project on GitHub that later became Akka.NET. A year later via my F# Weekly feed I saw that Akka.NET was in beta, so I browsed to the site and was amazed at how much information was there. There was also a NuGet package for Visual Studio that I tried and it “just worked,” no faffing around with configuration. That’s not always the case with open source projects. Then a few weeks after that it reached 1.0.

The Actor Model of Concurrency

The Actor Model in computer science is “a mathematical model of concurrent computation that treats ‘actors’ as the universal primitives of concurrent computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.”
The Actor Model was invented by Carl Hewitt in 1973 and you can find him explaining the basic ideas at Microsoft’s Channel 9. This is also available on YouTube should you wish to view it there.
“According to Carl Hewitt, unlike previous models of computation, the Actor model was inspired by physics, including general relativity and quantum mechanics.”
Wow! But don't worry. You don't need to understand general relativity and quantum mechanics to get started!
One way of thinking about the Actor Model is by analogy to garbage collection or other automated memory management schemes. You can view garbage collection as providing a high-level abstraction over manual memory management. Similarly you can view the Actor Model as providing a high-level abstraction over manual thread management and synchronization. The reason why the Actor Model is attracting a lot of attention now is due to the rise of multiple processors and multi-cores combined with the growth of the internet and highly distributed computing. Actor-based frameworks such as Akka and Orleans are more easily able to handle these scenarios, freeing the developer to concentrate on solving business problems rather than getting bogged down in “low-level” concurrency issues.

Akka.NET

Akka.NET provides an actor system that the user typically arranges into a hierarchy (tree) of actors that communicate with each other via immutable messages. Actors supervise the actors directly below them in the tree and are responsible for handling their failures. When an actor crashes, its parent can either restart or stop it, or escalate the failure up the hierarchy of actors. It is this that enables “self-healing” – fault tolerance and resilience.
Each actor has its own state that is not shared with other actors. Actors send messages to other actors asynchronously so that they don’t block. Actors process received messages one at a time. They can also determine how to respond to the next message received. This is called switchable behaviour. Supervision and switchable behaviours are two of the “killer” features of the Actor Model.
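As a rough illustration of switchable behaviour, here is a minimal sketch assuming the Akka NuGet package is referenced. The message types (Work, Pause, Resume) and the WorkerActor class are hypothetical names invented for this example. The actor replies differently to the same Work message depending on which behaviour it has switched to.

```csharp
using System;
using Akka.Actor;

// Hypothetical immutable messages, invented for this sketch.
public sealed class Pause { }
public sealed class Resume { }
public sealed class Work
{
    public Work(string description) { Description = description; }
    public string Description { get; }
}

// Hypothetical actor that switches between two behaviours.
public class WorkerActor : ReceiveActor
{
    public WorkerActor()
    {
        Running(); // initial behaviour
    }

    private void Running()
    {
        Receive<Work>(w => Sender.Tell("done: " + w.Description));
        Receive<Pause>(_ => Become(Paused));   // swap message handlers
    }

    private void Paused()
    {
        Receive<Work>(w => Sender.Tell("rejected: " + w.Description));
        Receive<Resume>(_ => Become(Running));
    }
}

public static class Program
{
    public static void Main()
    {
        using (var system = ActorSystem.Create("demo"))
        {
            IActorRef worker = system.ActorOf(
                Props.Create(() => new WorkerActor()), "worker");

            var timeout = TimeSpan.FromSeconds(5);

            // Ask sends a message and returns a Task for the reply.
            Console.WriteLine(worker.Ask<string>(new Work("a"), timeout).Result);

            worker.Tell(new Pause()); // fire-and-forget, does not block

            Console.WriteLine(worker.Ask<string>(new Work("b"), timeout).Result);
        }
    }
}
```

Blocking on Result is only for the sake of a short console demo. Inside an actor system you would normally stay asynchronous and let actors reply with further messages.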
Well, that’s the basic idea. There are a lot more features available but I hope this gives you a flavour. Apart from the Akka.NET site you can also find some excellent, well-written blog posts by Petabridge (one of the creators of the framework). They also provide a free online Bootcamp. If you have a subscription to Pluralsight then, at the time of writing, there are four excellent courses on Akka.NET.

Monday, 16 February 2015

JavaScript Server-Side Logging with JSNlog

Web applications have become increasingly JavaScript-heavy in recent years as we’ve moved to richer and much more responsive web applications. It’s fine debugging JavaScript errors in the browser during development but what about in deployed applications? JSNlog is an open source framework that sends client-side JavaScript log messages to the server, where they can be handled by standard .NET logging frameworks such as NLog, log4net and Elmah. Below I show an example of how to use it with NLog.

Installing NLog

NLog has an installer that’s worth running once, as it supplies some Visual Studio item templates and a code snippet for declaring a logger instance.

private static NLog.Logger logger = NLog.LogManager.GetCurrentClassLogger();

But it’s not essential. You can install it via NuGet. You will need to run both of these commands.

Install-Package NLog

Install-Package NLog.Config

The latter adds a config file (NLog.Config). This is where you declare your log files and logging rules. For example

<targets>
  <!-- add your targets here -->
  <target name="logfile" xsi:type="File" fileName="${basedir}/file.txt" />
</targets> 
<rules>
  <!-- add your logging rules here -->
  <logger name="*" minlevel="Info" writeTo="logfile" />
</rules>

Logging a Message From NLog

Suppose we have an ASP.NET MVC application. After setting up the above, edit the Home controller like this.

using System.Web.Mvc;
using NLog;

namespace WebApplicationNLog2.Controllers
{
    public class HomeController : Controller
    {
        // One logger per class, named after the class.
        private static Logger logger = LogManager.GetCurrentClassLogger();

        public ActionResult Index()
        {
            logger.Info("Sample trace message");
            return View();
        }
    }
}

Then a message is written to the file file.txt in the project folder. It will look something like this.

2015-02-13 12:32:22.5442|INFO|WebApplicationNLog2.Controllers.HomeController|Sample trace message

Installing JSNlog

There is a specific NuGet package to go with the logging framework we happen to be using. So for this example it is:

Install-Package JSNLog.NLog

This installs the dependent package JSNlog among others and also updates the Web.Config as required.

Logging JavaScript

Let’s place some arbitrary JavaScript in the Home controller’s Index view.

First we need to configure JSNlog by placing this line before any script tag that uses JSNlog.

@Html.Raw(JSNLog.JavascriptLogging.Configure())

In a real application we would most likely place this in _Layout.cshtml. Now we can start logging.

<script type="text/javascript">
    JL().info("This is a log message");
</script>

Then a message is written to the file file.txt in the project folder. It will look something like this.

2015-02-16 11:27:55.7520|INFO|ClientRoot|This is a log message

All of the logging levels and layout rules that are configurable in frameworks such as NLog and log4net are carried over to the logging of JavaScript in the same way.

Thursday, 9 February 2012

Web Browser Process Statistics Using Windows PowerShell

I use a number of web browsers on my Windows PC. One of them is Google Chrome, which I have been using from not long after its initial release. From Wikipedia: “A multi-process architecture is implemented in Chrome where, by default, a separate process is allocated to each site instance and plugin.” This makes it awkward to work out its memory consumption. It is in fact possible to obtain this information from Chrome itself, though I only discovered that quite recently. Chrome has its own task manager with which you can report such statistics. Tools –> Task Manager –> Stats for nerds displays the results in a tab called About Memory. It also reports stats for other running browsers. Here are some stats from the top of the About Memory tab:

Notice that here it only reports the usage for the Chrome processes minus plugins and extensions. To get the total figure you need to view the figure at the bottom.

Windows PowerShell is also able to calculate the total memory consumption by summing up all the processes named Chrome:

$p = (Get-Process Chrome | Measure-Object -Sum WorkingSet).Sum / 1024

Write-Host "Total = "$p" K"

This produces a similar result (consumption fluctuates from moment to moment):

Total = 621508 K

Chrome’s About Memory also produces stats for Firefox. However, once again this excludes plugins and extensions. So Chrome doesn’t help us out here. We can write similar code for Firefox but this time we also need to include another process called plugin-container, of which there may be zero or more depending on whether the current Firefox instance has had to start one up or not (i.e., whether the user has happened to run Flash or a PDF reader). The code for this is slightly more involved:

$f = (Get-Process Firefox | Measure-Object -Sum WorkingSet).Sum / 1024

Write-Host "Firefox Total = "$f" K"

$p = (Get-Process "plugin-container" -ErrorAction SilentlyContinue | Measure-Object -Sum WorkingSet).Sum / 1024

Write-Host "Plugin Container Total = "$p" K"

$c = $f + $p

Write-Host "Combined Total = "$c" K"

The first part is the same except for substituting Firefox for Chrome. Then we define another variable for summing up the plugin-container processes. Adding the two variables together gives us the total consumption.

Firefox Total = 556116 K
Plugin Container Total = 45536 K
Combined Total = 601652 K

But notice there’s some extra code we had to use:

-ErrorAction SilentlyContinue

This is required because if there are no active plugin-container processes PowerShell will report an error. The SilentlyContinue argument does what it says.

The current release of PowerShell is v2.0. It is included by default in Windows 7 and Windows Server 2008 R2. It is also available as a free download for Windows XP SP3, Vista and Servers 2003 and 2008.

Microsoft’s package manager, NuGet, for Visual Studio 2010 also makes use of PowerShell in its console window. PowerShell comes with a basic script editor supplied by Microsoft but there are more powerful IDEs out there. A good one is PowerGUI, which also has excellent IntelliSense amongst other capabilities. It also has an add-in for Visual Studio 2010 if desired.

Monday, 23 January 2012

New Year, New Language

Functional programming languages are all the rage at the moment. They’re well-suited to parallel programming and the multi-core world. On the Microsoft .NET platform we have F#. I’ve made one or two attempts at learning F# before but lost heart once the going got tough. This time around I’ve decided to make more of an effort. I’ve found that it helps to try more than one learning source as they differ in the degree of explanation they give for each concept.

Thus far I am consulting primarily F# Programming, Real World Functional Programming (online partial version of the book) and MSDN’s F# Language Reference.

I’ve been thinking about whether the learning-curve from procedural to object-oriented programming is greater than that from OO (or procedural) to functional.

I think the harder part about going from procedural to OO was not the mechanics but OO design perhaps. Whereas with functional I think even the mechanics are quite difficult.

However, it could be that I’ve just forgotten how difficult the procedural to OO transition was!

One initial difficulty with F#, especially for those coming from a C-syntax background, is F#’s syntax. It does look quite alien. Syntax itself should not be that big a deal but when combined with new concepts it does add to the mental load, especially once examples start to get elaborate.

A similar language on the Java JVM is Scala. Its syntax is a cross between C-syntax and Ruby/Python’s. I looked briefly at Scala some time ago and it does seem more accessible initially. Though once you get beyond the basics it becomes as scary as F#! A colleague of mine who’s been using Scala commercially for many months tells me it’s a matter of practice.

Thursday, 7 July 2011

Reactive Extensions 1.0 Stable is Released

Some months ago Microsoft made Reactive Extensions (Rx) an officially supported product and moved it out of Dev Labs to its new site. On June 29th it was officially released as version 1.0. It now also has some very accessible starter documentation in MSDN. Until now documentation has been scattered between videos, blogs, hands on labs and the MSDN Rx forum.

Rx is also consumable from LINQPad. LINQPad subscribes to the observables that you dump. The example below writes “Hello World” every second but stops after the first five values. If we removed the call to Take(5) it would run for ever. In that case you can stop it by hitting the Stop button in LINQPad.
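A minimal sketch of such a query follows. It assumes a LINQPad “C# Statements” query with the Rx assemblies referenced and the System.Reactive.Linq namespace imported through Query Properties; those setup details are assumptions rather than something stated above.

```csharp
// LINQPad "C# Statements" query.
// Assumes the Rx assemblies are referenced and System.Reactive.Linq is imported.
Observable.Interval(TimeSpan.FromSeconds(1))   // emits 0, 1, 2, ... once per second
    .Select(_ => "Hello World")                // replace each tick with the text
    .Take(5)                                   // complete after five values
    .Dump();                                   // LINQPad subscribes and displays each value
```

Dump() is LINQPad's own extension method, so outside LINQPad you would call Subscribe(Console.WriteLine) instead and keep the process alive until the sequence completes.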

Thursday, 30 June 2011

Leveraging LINQPad’s Object Visualisation in Visual Studio

When you run a query in LINQPad it produces formatted output on the results tab that you can optionally export to Word, Excel or HTML. Here I show two techniques for visualising any arbitrary .NET object.

The first way is non-invasive, i.e., requires no change to your Visual Studio solution or compiled assemblies.

The second way is invasive but allows you to export any arbitrary .NET object to HTML by leveraging LINQPad’s Dump() extension method.

Technique 1: Add a Visual Studio LINQPad Debugger Visualizer

Download the LINQPad Visualizer from Google Code and follow the instructions from the point where it says “If using the download do this:.”

You should also remember to unblock the DLLs after downloading. Consider this simplified code with fields made public for brevity.

class User
{
    public int UserId;
    public List<Course> Courses;
    public List<Student> Students;
}

class Course
{
    public string CourseCode;
}

class Student
{
    public int StudentId;
}

Set up a collection of Users.

var users = new List<User> {
    new User {
        UserId = 1,
        Courses = new List<Course> {
            new Course {
                CourseCode = "ECU120"
            },
            new Course {
                CourseCode = "ECU121"
            },
            new Course {
                CourseCode = "ECU122"
            },
        },
        Students = new List<Student> {
            new Student {
                StudentId = 1
            },
            new Student {
                StudentId = 2
            },
        }
    },
    new User {
        UserId = 2,
        Courses = new List<Course> {
            new Course {
                CourseCode = "ECU124"
            },
            new Course {
                CourseCode = "ECU125"
            },
            new Course {
                CourseCode = "ECU126"
            },
        },
        Students = new List<Student> {
            new Student {
                StudentId = 1
            },
            new Student {
                StudentId = 3
            },
        }
    },
    new User {
        UserId = 3,
        Courses = new List<Course> {
            new Course {
                CourseCode = "BESU022"
            },
            new Course {
                CourseCode = "BESU023"
            },
            new Course {
                CourseCode = "ECT034"
            },
        },
        Students = new List<Student> {
            new Student {
                StudentId = 1
            },
            new Student {
                StudentId = 2
            },
        }
    },
};

users.Dump();

Then we can debug into it.

To do this set a watch for your object of interest. Here we are inspecting the users collection. In order to view its contents we need to enter

new System.WeakReference(users)

in the Name column as shown.

Clicking on the dropdown for Value shows two LINQPad debugger visualizers. The default visualizer is a JSON visualizer that allows us to see the contained values of objects that haven’t been marked as Serializable. Clicking on the visualizer pops up a Windows Form containing the object’s values.

In this case we lose the type information but we can still see all the values neatly laid out.

However, if we mark the User, Student and Course classes as Serializable…

[Serializable]
class User
{
    public int UserId;
    public List<Course> Courses;
    public List<Student> Students;
}

[Serializable]
class Course
{
    public string CourseCode;
}

[Serializable]
class Student
{
    public int StudentId;
}

If we then select the lower LINQPad visualizer, we get the type information as well, with a neater layout.

Technique 2: Add a Reference to LINQPad.exe

For this technique we add a reference to LINQPad.exe that allows us to leverage LINQPad’s Dump() extension method for any object…

using System;
using System.Diagnostics;
using System.IO;

/// <summary>
/// LINQPad extension methods.
/// </summary>
public static class LinqPadExtensions
{
    /// <summary>
    /// Writes object properties to HTML 
    /// and displays them in default browser.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="o"></param>
    /// <param name="heading"></param>
    public static void Dump<T>(
        this T o, 
        string heading = null
    )
    {
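        // Build a unique .html path without leaving an empty temp file behind.
        string localUrl = Path.Combine(
            Path.GetTempPath(),
            Path.GetRandomFileName() + ".html");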
        using (
            var writer = 
                LINQPad.Util.CreateXhtmlWriter(true)
        )
        {
            if (!String.IsNullOrWhiteSpace(heading)) 
                writer.Write(heading);
            writer.Write(o);
            File.WriteAllText(localUrl, writer.ToString());
        }
        Process.Start(localUrl);
    }
}

Then we can write users.Dump() as in the example above. On execution this writes the object’s values to HTML and launches the default web browser. This produces the same results as the typed version above.

This method has been adapted from an example on StackOverflow. One issue to be aware of is that your project should target the full .NET 4, not the Client Profile.

Thursday, 18 November 2010

ASP.NET MVC and LINQ to SQL – LINQPad to the Rescue

Recently I was rewriting a broken algorithm in some ASP.NET MVC code. For part of the fix I needed to write some LINQ to SQL. I needed to do a comparison against a nullable field but I was thinking in C# not in SQL. Here is a simplified version of the real problem. Suppose we have a table of customers where LastName is a nullable field. We want to return customers whose last name is NULL.

string lastName = null;

var customers =
    from c in Customers
    where c.LastName == lastName
    select c;

This was unexpectedly returning no values. The correct solution is

var customers =
    from c in Customers
    where c.LastName == null
    select c;

The reason was the “IS NULL vs. = NULL” issue in the underlying SQL. This was easy to check by launching the excellent LINQPad application that allows you to view the underlying SQL after running a LINQ statement. I’d just assumed that from the C#/LINQ side it would automatically generate the appropriate code.

Comparing directly against null from the C# generates

SELECT [t0].[CustomerID], [t0].[FirstName], [t0].[LastName]
FROM [Customers] AS [t0]
WHERE [t0].[LastName] IS NULL

Comparing against a variable that’s null generates

DECLARE @p0 VarChar(1000) = null

SELECT [t0].[CustomerID], [t0].[FirstName], [t0].[LastName]
FROM [Customers] AS [t0]
WHERE [t0].[LastName] = @p0

In my real problem I was comparing against a nullable ID column and I wanted to return both null and non-null matches in one clause of a compound where. So I wanted to do a more elaborate version of this.

string name = (lastName != null) ? "Fred" : null;

var customers =
    from c in Customers
    where c.LastName == name
    select c;

But I was forced to use an if-else instead.

if (lastName != null)
{
    var customers =
        from c in Customers
        where c.LastName == lastName
        select c;
    // ...more code
}
else
{
    var customers =
        from c in Customers
        where c.LastName == null
        select c;
    // ...more code
}

I couldn’t think of a more concise way around this. Anyway, at least I solved the bigger problem of which this was a part. :)

Wednesday, 25 August 2010

JavaScript Unit Testing with QUnit

Recently, I’ve been trying to learn more about the Reactive Extensions library (currently still in Microsoft DevLabs) which I’ve blogged about previously. Since then Microsoft have pushed out a JavaScript version of the library and published a couple of Hands-on Labs for both the .NET and JavaScript versions. I thought I’d start looking at the JavaScript version as well. However, since the lab makes use of the jQuery JavaScript library it helps to be familiar with it too. So I’ve spent some time with jQuery as well. This in turn led me to realise that I needed to learn more about JavaScript in general.

In following Matthew Podwysocki’s Rx for JavaScript tutorials (not got very far yet) I noticed that he makes use of QUnit, a JavaScript unit testing library that is an offshoot of jQuery. So I made a diversion into this. It turns out that there are a number of unit testing libraries for JavaScript, but I’d not bothered to look into them before. Although QUnit is an offshoot of jQuery, and is what jQuery itself uses to test jQuery, it can be used for generic JavaScript. I have a number of JavaScript utility methods so after an initial play with QUnit I thought I’d try creating a few tests. Lo and behold I discovered a bug in the first method I tried. Ah, the benefits of unit testing.

There’s a one-page runnable example on the home page and a couple of tutorial links. Following the lead of the second tutorial I created a web site in Visual Studio, added the tester page and then separate “project under test” and unit tests JavaScript files.

<script type="text/javascript" src="Scripts/qunit.js"></script> 
<!-- Library -->  
<script type="text/javascript" src="Scripts/utility.js"></script>
<!-- Tests -->  
<script type="text/javascript" src="Scripts/tests.js"></script>

I have a method called isInteger(text) that checks whether a text entry is a non-negative integer (zero counts as valid). Its unit tests are:

/// <reference path="utility.js" />
module('isInteger Tests');

test('isInteger_positiveInteger_true', function () {
  var text = '32';
  ok(isInteger(text), text + ' is an integer');
})

test('isInteger_negativeInteger_false', function () {
  var text = '-32';
  ok(!isInteger(text), text + ' is not an integer');
})

test('isInteger_zero_true', function () {
  var text = '0';
  ok(isInteger(text), text + ' is an integer');
})

test('isInteger_decimal_false', function () {
  var text = '32.3';
  ok(!isInteger(text), text + ' is not an integer');
})

test('isInteger_alpha_false', function () {
  var text = 'abc';
  ok(!isInteger(text), text + ' is not an integer');
})

test('isInteger_alphanumeric_false', function () {
  var text = 'abc123';
  ok(!isInteger(text), text + ' is not an integer');
})

test('isInteger_whitespace_false', function () {
  var text = ' ';
  ok(!isInteger(text), 'Whitespace text is not an integer');
})

test('isInteger_empty_false', function () {
  var text = '';
  ok(!isInteger(text), 'Empty text is not an integer');
})
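
The implementation of isInteger is not shown in this post. A minimal sketch that would pass the tests above might look like the following. The regular expression is an assumption, not the original code from utility.js, and it assumes that only strings of ASCII digits should count as integers.

```javascript
// Hypothetical implementation: the original utility.js is not shown.
function isInteger(text) {
  // Accept one or more ASCII digits and nothing else: no sign,
  // decimal point, letters or whitespace. The empty string fails
  // because the pattern requires at least one digit.
  return /^\d+$/.test(String(text));
}
```

Anchoring the pattern at both ends with ^ and $ is what rejects inputs such as 'abc123' and '32.3', which a looser check like parseInt would partly accept.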

QUnit produces output like this (showing tests 1 to 5 for isInteger with some tests collapsed):

This is just the simplest usage of QUnit. It has some more assertions and also an asynchronous testing capability. It is this that Matthew Podwysocki uses in one of his Rx for JavaScript blog posts.

Friday, 11 June 2010

A Useful New String Method in .NET 4.0

In .NET 2.0 the String class introduced a method, IsNullOrEmpty, that allows you to check whether a string is null or the Empty string. However, sometimes you also want to disallow a string that contains just white space. I created a static utility method called IsNullOrEmptyOrBlank that additionally checked for white space.

public static bool IsNullOrEmptyOrBlank(string s)
{
    return String.IsNullOrEmpty(s) || s.Trim().Length == 0;
}

In .NET 3.5 I turned this into an extension method.

public static bool IsNullOrEmptyOrBlank(this string s)
{
    return String.IsNullOrEmpty(s) || s.Trim().Length == 0;
}

Then to my pleasure I noticed that in .NET 4.0 Microsoft has fixed this oversight and added this functionality to the String class. The method is IsNullOrWhiteSpace. The description is: “Indicates whether a specified string is null, empty, or consists only of white-space characters.”

Kudos to Microsoft. :-)

Monday, 7 June 2010

Finding the Missing Number in a Sequence

This is one of those algorithm problems that come up in programmer interview tests from time to time. Suppose you receive a sequence of numbers from 1 to 100 in random order but one is missing. How do you find the missing number?

The laborious way would be to sort the numbers in ascending order and then loop until the difference in two successive numbers is 2. But this would be very slow if we had to sort 10 million numbers.

A slick solution I encountered was to make use of the formula for the sum of an arithmetic progression which, for a sequence of n consecutive numbers starting from 1, is n(n+1)/2. The full formula is n(2a + (n - 1)d)/2 where a is the first term and d is the common difference. If we substitute a = 1 and d = 1 then it simplifies to the first formula.

To find the missing number we simply add up the numbers from the random sequence and then subtract that sum from the sum given by the arithmetic progression formula. No sorting involved.
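
As a worked example of the arithmetic, take n = 10 and suppose the nine numbers received add up to 52. That figure is purely illustrative, and the symbol S_n for the sum is introduced here for convenience.

```latex
\begin{align*}
S_n &= \frac{n\,(2a + (n-1)d)}{2} \\
    &= \frac{n\,(2 + (n-1))}{2} = \frac{n(n+1)}{2} \qquad (a = 1,\ d = 1) \\
S_{10} &= \frac{10 \cdot 11}{2} = 55 \\
\text{missing number} &= S_{10} - 52 = 3
\end{align*}
```

The whole computation is a single pass over the received numbers plus one subtraction, which is why no sorting is needed.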

I thought I’d try this out using my unique random number generator class. This class takes any range of numbers and randomly shuffles them. So we can take a range of consecutive numbers from 1 to n and generate a random set. Then we can remove one at random and find it using the foregoing technique.

// Generate ordered sequence of numbers
int start = 1;
int count = 10;
var orderedNumbers = 
    Enumerable.Range(start, count);

// Print ordered numbers
Console.WriteLine("Ordered Numbers");
orderedNumbers
    .ToList()
    .ForEach(x => Console.Write(x + " "));
Console.WriteLine();

// Randomly shuffle them
var randomNumbers = new HashSet<int>();
var g = new UniqueRandomNumberGenerator(orderedNumbers);

// Print random numbers
Console.WriteLine("Random Numbers");
while (g.RemainingNumbersCount > 0)
{
    int number = g.NewRandomNumber();
    randomNumbers.Add(number);
    Console.Write(number + " ");
}
Console.WriteLine();

// Verify we have same numbers after random shuffle
Debug.Assert(randomNumbers.SetEquals(orderedNumbers));

// Randomly generate a number to remove
var rand = new Random();
int removedNumber = 
    rand.Next(1, orderedNumbers.Count() + 1);
Console.WriteLine(
    "Number to remove is " + removedNumber
);

// Remove it from the random numbers
randomNumbers.Remove(removedNumber);

// Print random numbers with number removed
Console.WriteLine(
    "Random Numbers with {0} missing", removedNumber
);
randomNumbers
    .ToList()
    .ForEach(x => Console.Write(x + " "));
Console.WriteLine();

// Sum random numbers
int randomNumbersSum = 0;
var enumerator = 
    randomNumbers.GetEnumerator();
while (enumerator.MoveNext())
{
    int number = enumerator.Current;
    randomNumbersSum += number;
}

// Sum ordered numbers using arithmetic progression formula
int n = orderedNumbers.Count();
int orderedNumbersSum = n * (n + 1) / 2;

// The missing number is the difference between 
// the sums of the ordered and random numbers
Console.WriteLine("Searching for missing number...");
int missingNumber = orderedNumbersSum - randomNumbersSum;
Console.WriteLine("Missing number is " + missingNumber);

I could have computed the sums using the Enumerable.Sum() extension method on the two sets of numbers but I wanted to illustrate the algorithms.

Typical output for 10 numbers is:

Ordered Numbers
1 2 3 4 5 6 7 8 9 10
Random Numbers
10 3 5 9 1 7 4 2 8 6
Number to remove is 3
Random Numbers with 3 missing
10 5 9 1 7 4 2 8 6
Searching for missing number...
Missing number is 3

Tuesday, 30 March 2010

Reactive Extensions, Silverlight and WCF

Here I will illustrate how Reactive Extensions (Rx) can be applied to asynchronous calls to a WCF web service in a Silverlight application.

We start with a simple customer database consisting of a single table of first and last names.

ID  First   Last

1   Fred    McFarlane
4   Susan   McFarlane
6   David   Green
10  Joe     Bloggs

Then we set up a WCF service to return a collection of customers. I used LINQ to SQL to generate the CustomersDemo DataContext class.

[ServiceContract(Namespace = "")]
[AspNetCompatibilityRequirements(
    RequirementsMode = 
    AspNetCompatibilityRequirementsMode.Allowed)
]
public class CustomersService
{
    private string connection =
        @"[My connection]";

    [OperationContract]
    public List<Customers> GetCustomers()
    {
        CustomersDemo db = new CustomersDemo(connection);
        var customers = from customer in db.Customers
                        select customer;
        return customers.ToList();
    }
}

Asynchronous Access Without Rx

To consume the service in the Silverlight client…

Make the asynchronous call in the Page constructor.

public Page()
{
    InitializeComponent();

    CustomersServiceClient client = 
        new CustomersServiceClient();
    client.GetCustomersCompleted += 
        new EventHandler<GetCustomersCompletedEventArgs>(
            client_GetCustomersCompleted
        );
    client.GetCustomersAsync();
}

Handle the completed event and display customers in a data grid.

void client_GetCustomersCompleted(
    object sender, 
    GetCustomersCompletedEventArgs e
)
{
    if (e.Error == null)
    {
        var customers = e.Result;
        dg.ItemsSource = customers;
    }
}

Results:

Asynchronous Access With Rx

In this case we dispense with the completed handler and centralise the code in the Page constructor. Instead of a separate handler we create an IObservable from the GetCustomersCompleted event using the Observable.FromEvent method. The appended Take(1) takes only the first value from the observable sequence, which also implicitly unsubscribes the observer once that value has arrived.

public Page()
{
    InitializeComponent();

    CustomersServiceClient client = 
        new CustomersServiceClient();
    IObservable<IEvent<GetCustomersCompletedEventArgs>> observable =
        Observable.FromEvent<GetCustomersCompletedEventArgs>(
            client, "GetCustomersCompleted"
        ).Take(1);

    observable.Subscribe(
        e =>
        {
            if (e.EventArgs.Error == null)
            {
                var customers = e.EventArgs.Result;
                dg.ItemsSource = customers;
            }
        });
    client.GetCustomersAsync();
}

This produces the same results:

Concurrent File Processing Using Reactive Extensions

This follows on from my first post on Reactive Extensions for .NET back in January. We now consider a simple scenario of performing a number of operations on a file concurrently. This example was inspired by a post from functional programming guru, Matthew Podwysocki. Here I just flesh out his suggestion at the end of that post.

Here is a simple text file – test.txt – consisting of two lines (there is a new line after the first period)

We're proud to announce the availability of Reactive Extensions for JavaScript.
This port brings the power of Reactive programming to JavaScript.

I use a short file so we can easily check the results.

Now let’s perform three concurrent asynchronous operations on it.

Count the words
Count the letters
Count the vowels

This seems quite simple but was in fact a bit trickier than meets the eye. The first thing to do is load the contents of the file into an enumerable collection. We do this so that we can subsequently convert the IEnumerable into an IObservable to which we’ll subscribe three times, once for each operation. There are various ways of reading the file but it turned out to be easiest to read it into a single item collection.

Then create an observable

List<string> text = new List<string>();
text.Add(File.ReadAllText(@"test.txt"));

var observable = text.ToObservable();

Count the words

private static void CountWords(string text)
{
    string[] separator = 
        new string[] { " ", ",", ".", "\r\n"};

    string[] words =
        text.Split(
            separator,
            StringSplitOptions.RemoveEmptyEntries
        );
    int count = words.Count();
    Console.WriteLine("Number of words = {0}", count);
}

Count the letters

private static void CountLetters(string text)
{
    string[] separator = 
        new string[] { " ", ",", ".", "\r\n" };

    string[] words =
        text.Split(
            separator,
            StringSplitOptions.RemoveEmptyEntries
        );
    int count = words.Sum(word => word.Length);
    Console.WriteLine("Number of letters = {0}", count);
}

Count the vowels

private static void CountVowels(string text)
{
    char[] vowels = {'a', 'e', 'i', 'o', 'u'}; 
    char[] chars = text.ToCharArray();
    int count = 
        chars.Count(
            c => vowels.Contains(Char.ToLowerInvariant(c))
        );
    Console.WriteLine("Number of vowels = {0}", count);
}

Putting it all together…

// Read file and store as single item string collection
List<string> text = new List<string>();
text.Add(File.ReadAllText(@"test.txt"));

// Create an observable from our collection
var observable = text.ToObservable();

// Print some file stats concurrently
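using (observable.Subscribe(CountWords))
using (observable.Subscribe(CountLetters))
using (observable.Subscribe(CountVowels))
{
    // Keep the subscriptions alive until a key is pressed,
    // so they are not disposed before the results are printed
    Console.ReadKey();
}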

Results:

Number of words = 21
Number of letters = 123
Number of vowels = 47