Tuesday, 5 July 2016

Backing Up Files To Cloud Storage

I have an application that backs up files to cloud storage such as OneDrive. Manually, this is easy to do on a PC using Windows Explorer: just copy and paste the files of interest into the local OneDrive folder. How could I automate it? If I just wanted to back up files in a fairly inefficient manner, I could write a .NET console application that does simple file copy operations on the folders of interest.
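The naive version might look something like the sketch below: a console application that recursively mirrors each folder of interest into the OneDrive directory. The paths and the CopyFolder helper are mine for illustration, not the application's actual code.

using System.IO;

class NaiveBackup
{
    static void Main()
    {
        // Hypothetical paths for illustration only.
        CopyFolder(@"C:\Data\Documents", @"C:\Users\Me\OneDrive\Backup\Documents");
    }

    // Recursively copy every file, overwriting whatever is already in the target.
    static void CopyFolder(string source, string target)
    {
        Directory.CreateDirectory(target);
        foreach (string file in Directory.GetFiles(source))
            File.Copy(file, Path.Combine(target, Path.GetFileName(file)), overwrite: true);
        foreach (string dir in Directory.GetDirectories(source))
            CopyFolder(dir, Path.Combine(target, Path.GetFileName(dir)));
    }
}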

But, unlike for my local backups, I didn’t necessarily want all the files to be readable. I found a free encryption application that was also programmable from C#. However, it is restricted to encrypting files rather than folders. It is easy to get around this: programmatically zip up the folder and encrypt the zip file instead.
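The zip step is straightforward with the framework's System.IO.Compression (the ZipFile class lives in the System.IO.Compression.FileSystem assembly). The encryption call below is deliberately a placeholder, since it depends entirely on the API of whichever encryption application you choose.

using System.IO;
using System.IO.Compression;

static string ZipAndEncrypt(string folderPath)
{
    string zipPath = folderPath + ".zip";
    if (File.Exists(zipPath))
        File.Delete(zipPath); // CreateFromDirectory will not overwrite an existing archive
    ZipFile.CreateFromDirectory(folderPath, zipPath);

    // Hypothetical call: substitute the API of your chosen encryption tool.
    // EncryptionTool.EncryptFile(zipPath);

    return zipPath;
}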

Having done that, the next step is to programmatically copy the encrypted zip to the OneDrive folder. I can then use Windows Task Scheduler to run the application at regular intervals.
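The copy itself is a one-liner. A hypothetical helper might look like this, with the OneDrive path obviously depending on the machine and account:

using System.IO;

static void CopyToOneDrive(string encryptedZipPath)
{
    // Hypothetical OneDrive path for illustration.
    const string oneDriveFolder = @"C:\Users\Me\OneDrive\Backups";
    string destination = Path.Combine(oneDriveFolder, Path.GetFileName(encryptedZipPath));
    File.Copy(encryptedZipPath, destination, overwrite: true);
    // The OneDrive client then syncs the copied file to the cloud in the background.
}

A daily trigger in Task Scheduler pointing at the compiled console application is all the scheduling needed.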

Local Backup


I currently have three backups scheduled. One of them is a differential backup using SyncToy, which detects the changes since the last backup and copies only those. So far, my cloud backup backs up everything every time. Not only is that inefficient; since I’m backing up over the internet, it’s also unnecessarily eating into my data allowance.

Comparing Zip Files


I found a tool, ZipDiff, that compares zip files looking for differences. For each zipped folder I can run this and only back up when something has changed. A backup might still be big, as each zip file can itself be quite large, but it’s better than unnecessarily uploading several zipped files when nothing has changed.
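ZipDiff is a standalone tool, but the underlying idea is easy to sketch in C# directly: open the old and new archives and compare their entries. Comparing entry names and uncompressed lengths, as below, is a crude stand-in for ZipDiff's real comparison, but it illustrates the check.

using System.IO.Compression;
using System.Linq;

static bool ZipsDiffer(string oldZipPath, string newZipPath)
{
    using (var oldZip = ZipFile.OpenRead(oldZipPath))
    using (var newZip = ZipFile.OpenRead(newZipPath))
    {
        // Project each archive to (name, uncompressed size) pairs and compare as sets.
        var oldEntries = oldZip.Entries.Select(e => new { e.FullName, e.Length }).ToList();
        var newEntries = newZip.Entries.Select(e => new { e.FullName, e.Length }).ToList();
        return oldEntries.Count != newEntries.Count || oldEntries.Except(newEntries).Any();
    }
}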

Parallel Operation


Roughly speaking, for each folder, I need to
  1. Zip
  2. Encrypt (optionally)
  3. Backup
This is easily parallelisable (embarrassingly parallel, as they say), so I can use a parallel for loop. Handling errors requires some care though. One scenario is that certain types of file cause the zip operation to fail if the file is in use; Microsoft Word documents are one such type. However, I wanted the algorithm to continue processing the other folders in such cases instead of terminating. This requires a loop like the one below.
// Usings needed by the snippets below.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

try
{
    BackupEncryptedToOneDrive(sourceFolderPathsForEncryption);
}
catch (AggregateException ae)
{
    LogAggregateErrors(ae);
}

private static void BackupEncryptedToOneDrive(IEnumerable<string> sourceFolderPathsForEncryption)
{
    Console.WriteLine(LogMessageParts.FoldersForEncryption);
    Logger.Info(LogMessageParts.FoldersForEncryption);
    Console.WriteLine(Environment.NewLine);

    // Collect failures thread-safely so one folder's error doesn't stop the others.
    var exceptions = new ConcurrentQueue<Exception>();

    Parallel.ForEach(sourceFolderPathsForEncryption, path =>
    {
        try
        {
            Console.WriteLine(LogMessageParts.Processing, path);
            Logger.Info(LogMessageParts.Processing, path);

            // TryCreateZip returns false when ZipDiff finds no changes since the last backup.
            if (TryCreateZip(path))
            {
                Encrypt(path);
                BackupToOneDrive(path);
            }
            else
            {
                string noChangesDetected = string.Format("No changes detected in {0}...", path);
                Console.WriteLine(noChangesDetected);
                Logger.Info(noChangesDetected);
            }
        }
        catch (Exception ex)
        {
            exceptions.Enqueue(ex);
        }
    });

    Console.WriteLine(Environment.NewLine);

    // Rethrow everything that failed as a single AggregateException for the caller to log.
    if (exceptions.Any())
        throw new AggregateException(exceptions);
}

private static void LogAggregateErrors(AggregateException ae)
{
    ae = ae.Flatten(); // flatten tree to process exceptions at the leaves
    foreach (var ex in ae.InnerExceptions) LogError(ex);
}

The idea here is that we queue up the exceptions from each parallel iteration, wrap them up in an AggregateException and then unwrap and log them at the top level. So a failure in one parallel iteration still allows the others to run to completion.
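For completeness, TryCreateZip might be structured along the lines below: build the new zip at a temporary path, compare it against the previous backup, and keep it only if something changed. This is an assumption about how the pieces fit together (using the ZipsDiffer sketch from earlier), not the application's actual code.

private static bool TryCreateZip(string folderPath)
{
    string zipPath = folderPath + ".zip";
    string tempZipPath = zipPath + ".tmp";

    if (File.Exists(tempZipPath))
        File.Delete(tempZipPath);
    ZipFile.CreateFromDirectory(folderPath, tempZipPath);

    // First backup, or the contents differ: promote the new zip.
    if (!File.Exists(zipPath) || ZipsDiffer(zipPath, tempZipPath))
    {
        File.Delete(zipPath); // File.Delete is a no-op if the file doesn't exist
        File.Move(tempZipPath, zipPath);
        return true;
    }

    // Nothing changed; discard the temporary zip.
    File.Delete(tempZipPath);
    return false;
}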