Automatic Recovery & Restart in .Net application

NET%20LogoThe goal here is to improve the reliability of a .net client application by managing cases where things goes wrong and the universe of your product fall on himself, for example due to memory corruption, unhandled exception, Stack/Memory/<place your stuff-Overflow and so on.

The Issue

In the current situation, when you want to know if things goes wrong there’s natively nothing really in place. You can register for unhandled exception, but if for some reason your application hangs more than 60 seconds and come in the famous “Not Responding” state… well there’s not a lot of things to do. When it occurs, we could want to be able to backup some data’s, cleanly closed some connections, alert the backend, etc

The Answer

The answer is provided by the Windows API Codepack. This codepack available through NuGet package work on windows vista and higher (tested on Vista, 7 and 8, haven’t test on 8.1, ut it should too) and expose some P/Invoke calls to kernel32.dll functions (so be carefull too, when used, it’s out of the “nice pink unicorn .net land”). Once you’ve added the NuGet package reference you gain access to two “events”, Recovery and Restart. Here’s a method to register to theses.

protected override void OnStartup(StartupEventArgs e)
{
    RegisterARR();
    base.OnStartup(e);
}

private void RegisterARR()
{
    if (CoreHelpers.RunningOnVista)
    {
        // register for Application Restart
        ApplicationRestartRecoveryManager.RegisterForApplicationRestart(new RestartSettings(string.Empty, RestartRestrictions.None));

        // register for Application Recovery
        ApplicationRestartRecoveryManager.RegisterForApplicationRecovery(new RecoverySettings(new RecoveryData(PerformRecovery, null), 5000));
    }
}

For my part I put the code in the App.cs and call the register method within the OnStartup. If it’s “running on vista” which means “running on higher version than vista”, we first register to the restart event. When registering for restart we provide an instance of RestartSettings. The first parameter it gets is the command line arguments that will be used for the restart, in case we want to define some special parameters. The second parameters is an enum that allows us to restrict the restart in some cases with 5 defined states.

  • None: No restart restrictions
    NotOnCrash: Do not restart the process if it terminates due to an unhandled exception.
  • NotOnHang: Do not restart the process if it terminates due to the application not responding.
  • NotOnPatch: Do not restart the process if it terminates due to the installation of an update.
  • NotOnReboot: Do not restart the process if the computer is restarted as the result of an update.

The second thing we do is register for recovery. This means that if the application will need a restart (from the same reasons as before), what function do we want to run to allow later recovery.

When registering for recovery we supply an instance of RecoverySettings. The first parameter it gets is the RecoveryData object, which wraps a delegate to be called and some state parameter that will be passed (in this example, null). The second parameter is the keep alive interval, which will be explained shortly. The recovery function should obey some rules in order to avoid the application getting stuck (again) in the recovery function. You must call ApplicationRecoveryInProgress every few miliseconds (in the example, KeepAliveInterval = 5000). This tells the ARR mechanism, “I know it takes some time, but don’t worry, I’m still alive and working on the recovery stuff”.

private int PerformRecovery(object parameter)
{
    try
    {
        ApplicationRestartRecoveryManager.ApplicationRecoveryInProgress();
        // Perform recovery here
        ApplicationRestartRecoveryManager.ApplicationRecoveryFinished(true);
    }
    catch
    {
        ApplicationRestartRecoveryManager.ApplicationRecoveryFinished(false);
    }
    return 0;
}

And finally we just need to define an unregister method, called after the Recovery & Restart to end the process and called from the OnExit event handler of the application. (Yeah, sometimes things goes well… sometimes…)

private void UnregisterApplicationRecoveryAndRestart()
{
    if (CoreHelpers.RunningOnVista)
    {
        ApplicationRestartRecoveryManager.UnregisterApplicationRestart();
        ApplicationRestartRecoveryManager.UnregisterApplicationRecovery();
    }
}

Conclusion

That’s it, now you can face the death and tackle the hell of that kind of situation. But I want to precise some things to conclude. This mechanism doesn’t prevent your application to fall in an inconsistent state. So it means, if this mechanism is called, something really bad happened and you can’t guarantee the consistency of the memory, you should always rely on other mechanism to manage the persistence scenario when things goes wrong, work with little transaction, cache datas, provide validity flags on the database records etc. Most of the time I use it to :

1. Send a memory dump to an ftp server for further debugging
2. Report the issue in a tracking system
3. Dump the memory, check the file on reboot and try to recover the lost documents/datas and always ask the user to check the validity if he want to re-inject them in the system or avoid him to re-inject them based on automatic validation.
3. Clean connections, context, etc… We all knows server with badly managed sessions, not responding due to a massive client crash with persistent connection for whatever reason or a faulty batch/transaction process hanging due to the same reasons…

Catch you next time and keep it bug free !

Published by Emmanuel Istace

Musician, Software developer and guitar instructor. https://emmanuelistace.be/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: