Category Archives: DotNet

Integrate Syncrosoft Metro Studio 2 in your development process.

syncmslogoAs a lot of WPF developer, I love the free Metro Studio from Synchrosoft. Most of my WPF projects, professional or not, use it at a moment or another. But, out of the box, the integration between Metro Studio and Visual Studio is not great. At the end of this article you’ll be able to regenerate dynamically your resource file on project build. Also you’ll be able to do this on a build server as we don’t need Metro Studio to be installed on the build machine.

The issue

Here’s a usual situation:

  • Create a project in Metro Studio and pick icons.
  • Configure some basic styling stuff.
  • Copy paste the generated output to visual studio.
  • Adapt the templates to match the needs or just extract the path information
  • Tweak the styling.

There’s a lot of issues when working that way. If you want theming, if you want to add/remove/update icons, if you want to change the overall style… The generate xaml feature of MetroStudio is nice but it’s a real trap too. In fact, the only useful information from MetroStudio are the path. So our goal will be to retrieve them and manage on our side all the styling.

The solution

First, let’s take a look at how Metro Studio manage his project. Here’s a “.metrop” xml structure :

<IconProject Version="2.0" Name="LogiX">
    <Icon Name="AddApplicationPath" GroupName="Office" Data="M21.686602,15.029519C20.406802,... And so on" HasCharacterMap="false" IsDirty="false" ExportCommand="MetroGraphicsPackage.IconCommand">
        <Settings FontFamily="Webdings" Character=">" 
				  IconBrush="#FFFFFFFF" 
				  FlipCommand="MetroGraphicsPackage.IconCommand" 
				  FlipX="1" 
				  FlipY="1" 
				  ContentHeight="26" 
				  ContentWidth="26" 
				  SizeIndex="4" 
				  MainWidth="48" 
				  Angle="0" 
				  MainHeight="48" 
				  Padding="11" 
				  CustomSize="48" 
				  MaximumPadding="15" 
				  BackgroundBrush="#FF67C2EA" 
				  IconShape="Square" 
				  IsBackgroundVisible="true" />
    </Icon>
</IconProject> 

A metrop file is a basic xml file containing all the informations about an icon. Including the path data.

To retrieve them and produce a resource dictionary with them decided to use XSLT.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
	<xsl:template match="/">
		<ResourceDictionary xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:system="clr-namespace:System;assembly=mscorlib">
			<xsl:for-each select="/IconProject/Icon">
				<system:String>
					<xsl:attribute name="x:Key">
						<xsl:value-of select="@Name"/>
					</xsl:attribute>
					<xsl:value-of select="@Data" />
				</system:String>
			</xsl:for-each>
		</ResourceDictionary>
	</xsl:template>
</xsl:stylesheet>

The path is stored in a string object, much more flexible than in a path. If you want to generate entire controls or use other attribute from the metrop file, you just need to adapt the xslt file. But you should be careful IMHO. If you decide to generate more complex code with the xslt you’ll need someone to maintain it too in the future. As it’s a WPF application the other persons working on the project will be able to maintain the styles using the path in string but not always able to maintain complex XLST sheets. Also it can be hard to debug and the debug can quickly require additional tools like Oxygen XML Developer or similar.

To apply the stylesheet to the metrop file I use msxsl.exe, a lightweight (25kb) tool from Microsoft allowing you to apply xslt to xml file. And to call the tool I simply add the following prebuild event line to execute it :

"$(ProjectDir)/Resources/Tools/msxsl.exe"
"$(ProjectDir)/Resources/Icons/LogiX.metrop"
"$(ProjectDir)/Resources/Icons/Generate.xslt"
-o "$(ProjectDir)/Resources/Icons/IconsPath.xaml"

In a generic way :

"<Path to msxsl.exe>" "<Metrop file path>" "<Xslt file path>" -o "<Output xaml file>"

Now, everytime you do modification in metro studio you’ll just need to rebuild the project.

Also don’t forget to include your output xaml file in your solution. For my part I usually also include the metrop file and add the generated xaml as a dependent generated item. To do this simply edit your csproj file like the following. For the parent metrop file, just include it as none action (in my case located in “Resources\Icons\LogiX.metrop”):

<None Include="Resources\Icons\LogiX.metrop" />

And here’s the entry for the generated xaml page, changes are in Bold. Basically we say to visual studio that this is an autogenerated item and the parent is LogiX.metrop :

<Page Include="Resources\Icons\IconsPath.xaml">
      <SubType>Designer</SubType>
      <AutoGen>True</AutoGen>
      <DependentUpon>LogiX.metrop</DependentUpon>
      <Generator>MSBuild:Compile</Generator>
</Page>

And here’s the result in the solution explorer:

screendepautogen

Also, to complete this easy integration, you can add metrop studio as a custom editor for the metrop files, allowing you to directly open the project from visual studio. Right click on the metrop file, “Open With…”, “Add”, browse MetroStudio executable, press “Ok”, now select the newly created item in the list and press “Set as default” and then Ok, now when you double click metrop files they’re opened with MetroStudio.

Example

Here’s a template for a button. Basically the button is filled with the icon. To use it, you’ll need to set the style and the path as the content of the button.


<Style TargetType="{x:Type Button}" x:Key="MetroIconButtonStyle">
	<Setter Property="Template">
		<Setter.Value>
			<ControlTemplate TargetType="Button">
				<Viewbox>
					<Grid Width="48" Height="48">
						<Grid>
							<Rectangle Fill="{DynamicResource IconBackground}" x:Name="bg"/>
						</Grid>
						<ContentPresenter x:Name="contentPresenter" Width="30" Height="30" />
					</Grid>
				</Viewbox>
				<ControlTemplate.Triggers>
					<Trigger Property="IsMouseOver" Value="True">
						<Setter TargetName="contentPresenter" Property="Opacity" Value="1" />
						<Setter TargetName="bg" Property="Opacity" Value=".7" />
						<Setter TargetName="bg" Property="Fill" Value="{DynamicResource IconBackgroundOver}" />
					</Trigger>
					<Trigger Property="IsMouseOver" Value="False">
						<Setter TargetName="contentPresenter" Property="Opacity" Value=".7" />
						<Setter TargetName="bg" Property="Opacity" Value="1" />
						<Setter TargetName="bg" Property="Fill" Value="{DynamicResource IconBackground}" />
					</Trigger>
					<Trigger Property="IsPressed" Value="True">
						<Setter TargetName="contentPresenter" Property="Opacity" Value="1" />
						<Setter TargetName="bg" Property="Opacity" Value="1" />
						<Setter TargetName="bg" Property="Fill" Value="{DynamicResource IconBackgroundPressed}" />
					</Trigger>
				</ControlTemplate.Triggers>
			</ControlTemplate>
		</Setter.Value>
	</Setter>
	<Setter Property="ContentTemplate">
		<Setter.Value>
			<DataTemplate>                
				<Path Data="{Binding}" Stretch="Uniform" Fill="{DynamicResource IconForeground}" RenderTransformOrigin="0.5,0.5" />
			</DataTemplate>
		</Setter.Value>
	</Setter>
</Style>

Usage

<Button Content="{DynamicResource AddApplicationIconPath}" Style="{StaticResource MetroIconButtonStyle}" />

Once it’s done…

Possible improvement would be to create a real MSBuild custom Task and integrate it to be able to use mscsl.exe as a custom generate build instead of a pre build event, but I’m not sure about the real benefit of the solution compared to the actual one, does it worth the price? It’s up to you J

That’s all! If you have questions to not hesitate to contact me.

Read serial input in C#

NET%20LogoHere’s a simple code snippet to read the serial inputs in C#. I had to use this code several times when working with arduino like devices.

Hope you’ll enjoy it.
Catch you next time and keep it bug free !

    using System;
    using System.IO.Ports;

    class Program
    {
        [STAThread]
        static void Main(string[] args)
        {
            new Program();
        }

        // Create the serial port with basic settings
        // port name, bauds, parity, packet size and stop bits.
        private SerialPort port = new SerialPort("COM3", 115200, Parity.None, 8, StopBits.Two);

        private Program()
        {
            // Called when data received through serial
            port.DataReceived += new SerialDataReceivedEventHandler(port_DataReceived);

            // Can crash if the port is already used, haven't found a way to detect it...
            try
            {
                // Start to listen
                port.Open();
            }
            catch(Exception ex)
            {
                Console.WriteLine(ex.Message);
            }

            // Just avoid console to return
            Console.ReadKey();
        }

        private void port_DataReceived(object sender, SerialDataReceivedEventArgs e)
        {
            // Write the incoming data
            Console.Write(port.ReadExisting());
        }
    }

Truly random algorithm in C# – The chaotic user

NET%20LogoIf you’ve already use System.Random in a fine tuned app you’ve probably notice something strange. You get the same random values across sessions. Why ?
If you take a look at the code behind System.Random you’ll notice that it’s mainly based on CPU thick. If the same number of thick elapse each time you do the new/next you potentially get the same random number.
How can we put a kind of truly random, non predictable and chaotic randomness in the algo ? User mouse !
Indeed that’s perhaps the only chaotic stuff in a computer.
Some cryptographic algos use this and it’s not so hard to implement. Indeed it will work only on client app with access to the mouse object and require the user to move the mouse (or the algo will hang).

Here’s the pseudo-code algo I’ll implement :

define RealRandom := Limit += number
	sampleRate = 10 < Random < 20
	numberOfSample = 1 < Random < 10

	for(i < numberOfSample)
		// HERE COMES THE CHAOS !
		samples[i] = MousePositionY * MousePositionX;
		Wait MousePosition != datas[numberOfSample]
		Add 1 to i
		Wait sampleRate
	end

	foreach sample in samples
		returnValue += sample % limit
	end

	return returnValue % limit
end

And here’s the C# implementation.

public static class TrueRandom
{
    public static int NextRandom(int limit)
    {
        int SampleRate = (new Random()).Next(10, 20);
        int NumberOfSample = (new Random()).Next(1, 10);
        long result = 0;

        for(int i = 0; i < NumberOfSample; i++)
        {
            long sample = 0;
            do
            {
                sample = Cursor.Position.X * Cursor.Position.Y;
                Thread.Sleep(10);
            }
            while (sample == Cursor.Position.X * Cursor.Position.Y);
            result += sample % limit;
            Thread.Sleep(SampleRate);
        }

        return (int)result % limit; // Can't cause issue as limit is an integer so the modulo result will always be lower than Int32.MaxValue
    }
}

Catch you next time and keep it bug free !

The inline if statement in C# : Surprise !

NET%20LogoHi, I got a strange issue today I posted on stackoverflow.

The Issue

Here’s the situation, I’ve abstracted the name and logic to focus on the issue. Got 3 types, A, B and C. B & C have implicit operators defined to convert to A object.

public class A
{
    public static implicit operator A(B input){ /* Convert B to A */ }
    public static implicit operator A(C input) { /* Convert C to A*/ }
}

public class B { }
public class C { }

Then, when I do this, the code compile and work fine :

A myObject = null;
if (condition)
    myObject = new B();
else
    myObject = new C();

But when I write the same logic with an inline if, I got an error :

A myObject = condition ? new B() : new C();
Type of conditional expression cannot be determined because there is no implicit conversion between 'B' and 'C'

WHY ? Why does it work in classical if statement and not in inline if statement ?

The Answer

1775769

Here’s his answer :

Absolutely. The type of a conditional operator expression must either be the type of the second operand or the type of the third operand. (And if those two types aren’t the same, exactly one of the types has to be implicitly convertible to the other.) The compiler doesn’t try to find a “lower common denominator” type, and the use of the result isn’t important either (the compiler doesn’t “notice” that you’re assigning the result to a variable of type A).

You can fix this yourself by just explicitly casting either operand:

A myObject = condition ? (A) new B() : new C();

or

A myObject = condition ? new B() : (A) new C();

Note that this isn’t limited to user-defined conversion operators; the same is true for simple reference conversions based on derived classes:

Button x = new Button();
String y = "foo";
object z = condition ? x : y; // Compile-time error

See section 7.14 of the C# specification for more details.

Behind the scene

But the answer wasn’t totally complete for me. So I took my lovely decompiler and anaylyze what’s generated in the case of an inline and classical if. The code is “self explanatory” and I’ve commented each MSIL instruction to make it readable.

static void ClassicalIf(bool condition)
{
    int i = 0;
    if (condition)
        i = 1;
    else
        i = 2;
}

static void InlineIf(bool condition)
{
    int i = condition ? 1 : 2;
}

For the inline if :

.method private hidebysig static void InlineIf(bool condition) cil managed
{
    .maxstack 1
    .locals init (
        [0] int32 i)
    L_0000: nop 
    L_0001: ldarg.0         -- Load argument '0' onto the stack
    L_0002: brtrue.s L_0007 -- Branch to L_0007 if value is non-zero
    L_0004: ldc.i4.2        -- Push 2 onto the stack
    L_0005: br.s L_0008     -- Branch to L_0008
    L_0007: ldc.i4.1        -- Push 1 onto the stack
    L_0008: nop 
    L_0009: stloc.0         -- Pop from stack into local variable 0
    L_000a: ret 
}

And here’s the one for the “normal” if :

.method private hidebysig static void ClassicalIf(bool condition) cil managed
{
    .maxstack 2
    .locals init (
        [0] int32 i,
        [1] bool CS$4$0000) -- Additional bool for if
    L_0000: nop 
    L_0001: ldc.i4.0        -- Push 0 onto the stack
    L_0002: stloc.0         -- Pop from stack into local variable '0'
    L_0003: ldarg.0         -- Load argument '0' onto the stack
    L_0004: ldc.i4.0        -- Push 0 onto the stack 
    L_0005: ceq             -- Push 1 if value1 equals value2 (on stack), else push 0.
    L_0007: stloc.1         -- Pop from stack into local variable '1'
    L_0008: ldloc.1         -- Load local variable '1' onto stack.
    L_0009: brtrue.s L_000f -- Branch to L_000f if value is non-zero
    L_000b: ldc.i4.1        -- Push 1 onto the stack 
    L_000c: stloc.0         -- Pop from stack into local variable '0'
    L_000d: br.s L_0011     -- Branch to L_0011
    L_000f: ldc.i4.2        -- Push 2 onto the stack 
    L_0010: stloc.0         -- Pop from stack into local variable '0'
    L_0011: ret 
}

Now same thing with an additional cast :

static void ClassicalIf(bool condition)
{
    object i = 0;
    if (condition)
        i = new object();
    else
        i = 2;
}
    
static void InlineIf(bool condition)
{
    object i = condition ? new object() : 2;
}

For the inline if :

.method private hidebysig static void InlineIf(bool condition) cil managed
{
    .maxstack 1
    .locals init (
        [0] object i)
    L_0000: nop 				
    L_0001: ldarg.0 			-- Load argument '0' onto the stack
    L_0002: brtrue.s L_000c     -- Branch to L_000c if value is non-zero (true)
    L_0004: ldc.i4.2            -- Push 2 onto the stack
    L_0005: box int32           -- Boxing
    L_000a: br.s L_0011         -- Branch to L_0008
    L_000c: newobj instance void-- Create new object  
    L_0011: nop                                                     
    L_0012: stloc.0 			-- Pop from stack into local variable 0
    L_0013: ret 
}

And here’s the one for the “normal” if :

.method private hidebysig static void ClassicalIf(bool condition) cil managed
{
    .maxstack 2
    .locals init (
        [0] object i,
        [1] bool CS$4$0000)
    L_0000: nop 					-- Push 0 onto the stack
    L_0001: ldc.i4.0                -- Pop from stack into local variable '0'
    L_0002: box int32               -- Boxing
    L_0007: stloc.0                 -- Pop from stack into local variable 0
    L_0008: ldarg.0                 -- Load argument '0' onto the stack
    L_0009: ldc.i4.0                -- Push 0 onto the stack 
    L_000a: ceq                     -- Push 1 if value1 equals value2 (on stack), else push 0.
    L_000c: stloc.1                 -- Pop from stack into local variable '1'
    L_000d: ldloc.1                 -- Load local variable '1' onto stack.
    L_000e: brtrue.s L_0018         -- Branch to L_0018 if value is non-zero (true)
    L_0010: newobj instance void [ms-- Create new object
    L_0015: stloc.0                 -- Pop from stack into local variable '0'
    L_0016: br.s L_001f             -- Branch to L_001f
    L_0018: ldc.i4.2                -- Push 2 onto the stack 
    L_0019: box int32               -- Boxing
    L_001e: stloc.0 				-- Pop from stack into local variable '0'
    L_001f: ret 
}

Conclusion

1) So I will also try to use inline if as possible. Less instructions (so CPU) and less memory usage.
2) Also it’s now crystal clear when we see how an inline if is converted to MSIL 🙂

Catch you next time and keep it bug free !

Automatic Recovery & Restart in .Net application

NET%20LogoThe goal here is to improve the reliability of a .net client application by managing cases where things goes wrong and the universe of your product fall on himself, for example due to memory corruption, unhandled exception, Stack/Memory/<place your stuff-Overflow and so on.

The Issue

In the current situation, when you want to know if things goes wrong there’s natively nothing really in place. You can register for unhandled exception, but if for some reason your application hangs more than 60 seconds and come in the famous “Not Responding” state… well there’s not a lot of things to do. When it occurs, we could want to be able to backup some data’s, cleanly closed some connections, alert the backend, etc

The Answer

The answer is provided by the Windows API Codepack. This codepack available through NuGet package work on windows vista and higher (tested on Vista, 7 and 8, haven’t test on 8.1, ut it should too) and expose some P/Invoke calls to kernel32.dll functions (so be carefull too, when used, it’s out of the “nice pink unicorn .net land”). Once you’ve added the NuGet package reference you gain access to two “events”, Recovery and Restart. Here’s a method to register to theses.

protected override void OnStartup(StartupEventArgs e)
{
    RegisterARR();
    base.OnStartup(e);
}

private void RegisterARR()
{
    if (CoreHelpers.RunningOnVista)
    {
        // register for Application Restart
        ApplicationRestartRecoveryManager.RegisterForApplicationRestart(new RestartSettings(string.Empty, RestartRestrictions.None));

        // register for Application Recovery
        ApplicationRestartRecoveryManager.RegisterForApplicationRecovery(new RecoverySettings(new RecoveryData(PerformRecovery, null), 5000));
    }
}

For my part I put the code in the App.cs and call the register method within the OnStartup. If it’s “running on vista” which means “running on higher version than vista”, we first register to the restart event. When registering for restart we provide an instance of RestartSettings. The first parameter it gets is the command line arguments that will be used for the restart, in case we want to define some special parameters. The second parameters is an enum that allows us to restrict the restart in some cases with 5 defined states.

  • None: No restart restrictions
    NotOnCrash: Do not restart the process if it terminates due to an unhandled exception.
  • NotOnHang: Do not restart the process if it terminates due to the application not responding.
  • NotOnPatch: Do not restart the process if it terminates due to the installation of an update.
  • NotOnReboot: Do not restart the process if the computer is restarted as the result of an update.

The second thing we do is register for recovery. This means that if the application will need a restart (from the same reasons as before), what function do we want to run to allow later recovery.

When registering for recovery we supply an instance of RecoverySettings. The first parameter it gets is the RecoveryData object, which wraps a delegate to be called and some state parameter that will be passed (in this example, null). The second parameter is the keep alive interval, which will be explained shortly. The recovery function should obey some rules in order to avoid the application getting stuck (again) in the recovery function. You must call ApplicationRecoveryInProgress every few miliseconds (in the example, KeepAliveInterval = 5000). This tells the ARR mechanism, “I know it takes some time, but don’t worry, I’m still alive and working on the recovery stuff”.

private int PerformRecovery(object parameter)
{
    try
    {
        ApplicationRestartRecoveryManager.ApplicationRecoveryInProgress();
        // Perform recovery here
        ApplicationRestartRecoveryManager.ApplicationRecoveryFinished(true);
    }
    catch
    {
        ApplicationRestartRecoveryManager.ApplicationRecoveryFinished(false);
    }
    return 0;
}

And finally we just need to define an unregister method, called after the Recovery & Restart to end the process and called from the OnExit event handler of the application. (Yeah, sometimes things goes well… sometimes…)

private void UnregisterApplicationRecoveryAndRestart()
{
    if (CoreHelpers.RunningOnVista)
    {
        ApplicationRestartRecoveryManager.UnregisterApplicationRestart();
        ApplicationRestartRecoveryManager.UnregisterApplicationRecovery();
    }
}

Conclusion

That’s it, now you can face the death and tackle the hell of that kind of situation. But I want to precise some things to conclude. This mechanism doesn’t prevent your application to fall in an inconsistent state. So it means, if this mechanism is called, something really bad happened and you can’t guarantee the consistency of the memory, you should always rely on other mechanism to manage the persistence scenario when things goes wrong, work with little transaction, cache datas, provide validity flags on the database records etc. Most of the time I use it to :

1. Send a memory dump to an ftp server for further debugging
2. Report the issue in a tracking system
3. Dump the memory, check the file on reboot and try to recover the lost documents/datas and always ask the user to check the validity if he want to re-inject them in the system or avoid him to re-inject them based on automatic validation.
3. Clean connections, context, etc… We all knows server with badly managed sessions, not responding due to a massive client crash with persistent connection for whatever reason or a faulty batch/transaction process hanging due to the same reasons…

Catch you next time and keep it bug free !

Noninvasive global software mocking with registry

RegEditI recently had to do something unusual for me.

The issue

In production-like environment, I had to “mock” a third party application called by mine. My application call the third party app just like as we do when we run something from command prompt with some arguments. The problem was a little bug in theses arguments, but I can’t update directly my application and had to find a workaround until I can. Also I would like at the same time to debug a bit what’s done by logging the passed arguments. So here’s the initial situation :

exereplace1

The solution

Here’s what I want to do to solve the problem :

exereplace2

Adding an intermediate application that my application will call and then forward the call to the third party app. A kind of software man-in-the-middle. But as I can’t modify my app I just can’t ask her to point to another place and the call target is not soft coded. I finally, after a day of search, found a solution, this good ol’ registry !

If you go to :

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options

You can create a registry key with the name of your app, in my case it’s ThirdPartyApplication.exe and then create a string value called Debugger with the name of the application to run instead. Each time we’ll call ThirdPartyApplication.exe, the mentioned application will be executed with the normal target as arguments. I can then call my third party app and patch the transmitted values until I can patch my application and remove the registry key.

Also it can be a good joke to do to some colleague you (don’t) like… Let’s say, replacing explorer.exe by mspaint… or making iexplore to call iexplore, resulting in a beautiful call loop… I’m sure you have enough imagination to do something a bit fun for 2 minutes the monday morning with that 😉 But never forget, the shorter jokes are the better…

Catch you next time and keep it bug free !

Extract data from HTML page with XPath and Linq

As developer, we usually need to extract data from a html web page in our projects, but most of the time we try to find another solution due to the complexity of the task. Indeed, the classical approach would be to use regex or similar and try to find out our informations in the page structure. But there’s something really helpfull, the HTMLAgilityPack. I’ll give you 2 example, the first one for a basic overview of the HTMLAgilityPack and a second one with much more details about the process I follow for that kind of tasks.

So, HTMLAgilityPack allow us to download a webpage and then consult the content with his methods or with the provided XPath API. To install it, just download the NuGet package from VS2010 >= or grab the project from the codeplex page.

Once done, you can use it to parse and extract data from html page.

using HtmlAgilityPack;

Then you can use it to download the page :

string url = "http://www.simpleweb.org/";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);

Or you can use LoadHtml(string htmlCode) on the HtmlDocument object to work on raw html code contained in memory.

Once done, you can write XPath query to retreive data. For this example, I want to retreive the url of the wiki (second link in the left menu).
According to the google chrome debugger, the quicker and optimzed XPath to this property is : (thank you chrome :p)

//*[@id="top_option"]/div[2]/span/a

For those not familiar with XPath, it’s really simple. This query just mean, from the root (//) all tags (*) with id attribute set to “top option” ([@id=”top_option”]) then select the second div (div[2]) then select span and within the span select link (/span/a). That give us the following Linq query :

var result = from link in htmlDoc.DocumentNode.SelectNodes("//*[@id='top_option']/div[2]/span/a")
                         select link.Attributes["href"].Value;

The intellisense will be once again your best friends as this assembly is really well designed and intuitive. So here I select all the nodes who match the XPath query and then select the href attribute value.

Let’s take another example, I want to get an object tree which represent the whole menu. Just before that, I would like to mention something about XPath. This is a great query language… if you know it ! I mean, you can write really powerful but complex query using it, and I always think about the next one who’ll need to understand and/or update the query. So I usually prefer if I have to share it or in a collaborative context to write multiple steps/subqueries to make it more “developer friendly”. Even some simple query can seems really complex for someone not familiar with it. That said, let’s write the query to rebuild the full menu tree of the same page. This time I’ll also share the steps I follow for that kind of task.

First, analyze the page structure. We can see that the menu is contained within a td (/html/body/table/tbody/tr[3]/td/table/tbody/tr/td[1]). This td contain 7 div, but only the 3 first one interest us. To know that, I use the Chrome debugger, when you place your move hover a tag, Chrome will highlight in the page the corresponding area.

<div style="float:left;" id="top_option" class="aTopLink">
	<div>
		<span><a class="aTopLink" href="/">Home</a></span>
	</div>
	<div>
		<span><a class="aTopLink" href="/wiki/">Wiki</a></span>
	</div>
</div>
<div style="float:left;" id="my_menu" class="sdmenu">
    <div class="collapsed">
        <span>MIBs</span>
		<a href="/ietf/mibs/">Browser</a>
                <a href="/ietf/mibs/validate/">MIB Validation</a>
                <a href="/ietf/mibs/byEncoding?encoding=txt">Plain text</a>
                <a href="/ietf/mibs/byEncoding?encoding=highlighting">Syntax Highlighting</a>
                <a href="/ietf/mibs/byEncoding?encoding=html">HTML encoding</a>
		<a href="/ietf/enterprise/">Vendor MIBs</a>
		<a href="/ietf/mibs/search/">Search</a>
            </div>
	    <div class="collapsed">
		<span>RFCs</span>
		<a href="/ietf/rfcs/rfcbynumber.php" class="current">By Number</a>
		<a href="/ietf/rfcs/rfcbytopic.php">By Topic</a>
		<a href="/ietf/rfcs/rfcbystatus.php">By Status</a>
		<a href="/ietf/rfcs/rfcbymodule.php">By Module</a>
                <a href="/ietf/rfcs/complete/">Complete</a>
		<a href="/ietf/rfcs/">Search</a>
	    </div>
            <div class="collapsed">
                <span>Software</span>
                <a href="/software/select_obj.php?orderBy=package&amp;oldorderBy=package&amp;orderDir=asc&amp;free=free">Freely available</a>
                <a href="/software/select_obj.php?orderBy=package&amp;oldorderBy=package&amp;orderDir=asc&amp;com=com">Commercial</a>
                <a href="/software/">Search</a>
            </div>
            <div class="collapsed">
                <span>CFP/Conferences</span>
                <a href="/wiki/Events">Conferences</a>
                <a href="/wiki/Cfp">Call for Papers</a>
            </div>
            <div class="collapsed">
                <span>Bibliography</span>
                <a href="/wiki/Books">Books</a>
                <a href="/wiki/Journals">Journals</a>
                <a href="/wiki/Theses">Theses</a>
            </div>
            <div class="collapsed">
                <span>Tutorials</span>
                <a href="/wiki/Internet_Management_Tutorials">Slides</a>
                <a href="/wiki/Video_tutorials_on_Internet_management">Podcasts</a>
                <a href="/wiki/Exercises_in_Internet_Management">Exercises</a>
                <a href="/wiki/Other_tutorials">Other tutorials</a>
		<a href="/tutorials/demo/snmp/v1/">SNMPv1 demo</a>
		<a href="/tutorials/demo/snmp/v2c/">SNMPv2c demo</a>
		<a href="/tutorials/demo/snmp/v3/">SNMPv3 demo</a>
    </div>
</div>
<div style="float:left;" id="bottom_option" class="aTopLink">
    <div>
        <span><a class="aTopLink" href="/wiki/Podcasts">Podcasts</a></span>
    </div>
    <div>
        <span><a class="aTopLink" href="http://www.simpleweb.org/wiki/Traces">Traffic Traces</a></span>
    </div>
    <div>
        <span><a class="aTopLink" href="/ifip/">IFIP WG6.6</a></span>
    </div>
    <div>
        <span><a class="aTopLink" href="mailto:simpleweb@simpleweb.org">Contact</a></span>
    </div>
</div>

To represent it we will have the following object model in C# :

public class Node
{
    public List<Node> Childs { get; set; }
    public string Name { get; set; }
    public string Url { get; set; }
}

And the query will be the following :

var result = (from topNode in htmlDoc.DocumentNode.SelectNodes("//div[@id='top_option']/div/span/a")
                select new Node()
                            {
                                Name = topNode.InnerText,
                                Url = topNode.Attributes["href"].Value
                            })
                            .Union(
                from bottomNode in htmlDoc.DocumentNode.SelectNodes("//div[@id='bottom_option']/div/span/a")
                select new Node()
                        {
                            Name = bottomNode.InnerText,
                            Url = bottomNode.Attributes["href"].Value
                        })
                        .Union(
            from groupNode in htmlDoc.DocumentNode.SelectNodes("//*[@id='my_menu']/div[*]")
            let groupChilds = (from nodes in groupNode.SelectNodes("a")
                                select new Node()
                                            {
                                                Name = nodes.InnerText,
                                                Url = nodes.Attributes["href"].Value
                                            })
            select new Node()
                        {
                            Name = groupNode.SelectSingleNode("span").InnerText,
                            Childs = groupChilds.ToList()
                        });
                                    
DisplayMenu(result.ToList());

We use 3 query with union. As said, I tried to decompose a bit the query, the two first are the same, select the links and create nodes with the innerText of the a tag and the href attribute. Then we have the centre of the menu with hierarchy. For this, we select each div representing a group, then we create a first list of node like for the two first request based on the link contained within the div, and we create the node object with the list we’ve just created as child’s object list and the span content as Name.

As you can see, we can easily write complex requests in no time (this one took me about 30minutes to write, test and include in this article). HTMLAgilityPack is a great tool that allow us to use XPath and finally address a great answer to the critical html data parsing issue !

See you next time and keep it bug free ! 🙂