Friday, May 29, 2009

XML Serialization With Subclasses

The .NET XML serialization libraries are a huge time saver for quickly implementing configuration files and small local data stores. All it takes is a few attribute markup tags to allow for quick saving and loading of an object graph to disk.

One of the shortcomings of this approach is how .NET handles the serialization of subclasses. To illustrate this, let's take a quick look at a configuration file I developed for a data processing service I was working on recently. The service allowed the user to configure jobs that ran on a scheduled basis, performed configurable data searches against a remote database, transformed the data, and then ran the data through one or more output plug-ins. These plug-ins performed a variety of different actions on the retrieved data. Each output plug-in needed a different configuration object, since it did different things. The service architecture allowed us to write new plug-ins as needed to perform custom integration tasks for specific clients without needing to release a new version of the app.

The architecture of the output system is pretty simple: configuration classes all implement the same IOutputConfig interface, and we used a simple factory class to instantiate the corresponding IOutputPlugin classes from their saved config. This means that each persistent job configuration class had a property like this:



    public SerializableList<IOutputConfig> Outputs
    {
        get { return m_Outputs; }
        set { m_Outputs = value; }
    }
    private SerializableList<IOutputConfig> m_Outputs;



This list might contain several different output configuration types.


Now, .NET will serialize this list of IOutputConfig objects to disk just fine, but on loading the configuration it will fail, since it doesn't know what classes to create for each saved XML output configuration element.

There is a built-in mechanism to resolve this problem: using the XmlArrayItemAttribute, you can specify all the subclasses that should be deserialized.



    [XmlArrayItem(typeof(EmailOutputConfig)),
     XmlArrayItem(typeof(DiskOutputConfig)),
     XmlArrayItem(typeof(SharepointOutputConfig))]
    public List<IOutputConfig> Outputs;
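
As a rough illustration of the attribute approach, here is a minimal round-trip sketch. It is not the service's code: OutputConfigBase, JobConfig, XmlArrayItemDemo, and the single field on each config class are hypothetical stand-ins, and an abstract base class is assumed in place of the interface so the sketch doesn't depend on how XmlSerializer treats interface-typed members.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-ins for the configuration types (assumption, not the real classes).
public abstract class OutputConfigBase { }

public class EmailOutputConfig : OutputConfigBase
{
    public string SMTPServer;
}

public class DiskOutputConfig : OutputConfigBase
{
    public string BasePath;
}

public class JobConfig
{
    // Every concrete type has to be named here at compile time.
    [XmlArrayItem(typeof(EmailOutputConfig)),
     XmlArrayItem(typeof(DiskOutputConfig))]
    public List<OutputConfigBase> Outputs = new List<OutputConfigBase>();
}

public static class XmlArrayItemDemo
{
    // Serializes the job to an in-memory string and reads it back.
    public static JobConfig RoundTrip(JobConfig job)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(JobConfig));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, job);
            using (StringReader reader = new StringReader(writer.ToString()))
            {
                return (JobConfig)serializer.Deserialize(reader);
            }
        }
    }

    public static void Main()
    {
        JobConfig job = new JobConfig();
        job.Outputs.Add(new EmailOutputConfig { SMTPServer = "127.0.0.1" });
        job.Outputs.Add(new DiskOutputConfig { BasePath = @"c:\test" });

        JobConfig restored = RoundTrip(job);
        foreach (OutputConfigBase output in restored.Outputs)
        {
            // Each element comes back as its concrete type.
            Console.WriteLine(output.GetType().Name);
        }
    }
}
```

The round trip only works because both concrete types appear in the attribute list, which is exactly the coupling at issue.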



This didn't really suit my need for two reasons:
  1. It's messy and breaks encapsulation. We are leaking knowledge of the subclasses upwards, which is bad design. New implementations of outputs would have to modify the attribute in the library classes.
  2. Since we are implementing new functionality via plug-ins (which are loaded dynamically), we don't know the full list of possible subclasses at compile time and thus can't list them in the attribute, even if we were willing to hold our noses and do so.
We could also work around this by having our subclasses implement the IXmlSerializable interface, but this would require mucking about with XML readers and writers for every configuration type we implemented, which takes time and thus negates a lot of the benefits of this approach.
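
To give a sense of that per-class cost, here is a minimal sketch of a single configuration type implementing IXmlSerializable by hand. The two fields are borrowed from the disk output example, but the class body and the PerClassDemo helper are assumptions made for illustration, not the service's actual code.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical hand-written serialization for one config type.
public class DiskOutputConfig : IXmlSerializable
{
    public string BasePath;
    public bool OverwriteFile;

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void WriteXml(XmlWriter writer)
    {
        // Every field has to be written explicitly.
        writer.WriteElementString("BasePath", BasePath);
        writer.WriteElementString("OverwriteFile", XmlConvert.ToString(OverwriteFile));
    }

    public void ReadXml(XmlReader reader)
    {
        // Consume the wrapper element, then read the fields in the
        // same order WriteXml wrote them (both are always present here).
        reader.ReadStartElement();
        BasePath = reader.ReadElementString("BasePath");
        OverwriteFile = XmlConvert.ToBoolean(reader.ReadElementString("OverwriteFile"));
        reader.ReadEndElement();
    }
}

public static class PerClassDemo
{
    // Serializes the config to an in-memory string and reads it back.
    public static DiskOutputConfig RoundTrip(DiskOutputConfig config)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(DiskOutputConfig));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, config);
            using (StringReader reader = new StringReader(writer.ToString()))
            {
                return (DiskOutputConfig)serializer.Deserialize(reader);
            }
        }
    }

    public static void Main()
    {
        DiskOutputConfig original = new DiskOutputConfig { BasePath = @"c:\test\", OverwriteFile = true };
        DiskOutputConfig restored = RoundTrip(original);
        Console.WriteLine(restored.BasePath + " " + restored.OverwriteFile);
    }
}
```

Every new field means touching both ReadXml and WriteXml, and every new configuration type repeats the same boilerplate.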

The approach I used was to implement two custom classes, SerializableList and SerializableDictionary. These classes implement IXmlSerializable to wrap each of the subclasses in an <Item> tag. This tag records the type of the saved object so we know what type to feed an XmlSerializer when restoring the data.



    <Outputs>
      <Item type="APP.Output.DiskOutputConfig, APPDAC">
        <Disk>
          <BasePath>c:\test\exrs\two\</BasePath>
          <PathPattern>{DATESTAMP}_{TIMESTAMP}</PathPattern>
          <OverwriteFile>true</OverwriteFile>
        </Disk>
      </Item>
      <Item type="APP.Output.EmailOutputConfig, APPDAC">
        <Email>
          <SMTPServer>127.0.0.1</SMTPServer>
          <Recipient>a@test.com</Recipient>
          <Recipient>b@test.com</Recipient>
          <FilePattern>{DATESTAMP}_{TIMESTAMP}</FilePattern>
        </Email>
      </Item>
    </Outputs>



One point to note is that I am stripping the assembly version info from the saved type name. This was to avoid version changes in the assembly causing load errors. For major schema changes I would need to implement another set of classes anyway, so as to be able to have both loaded at once for translation.
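
A small sketch of the idea, with hypothetical names throughout (Demo.Output.DiskOutputConfig, TypeNameDemo, and ShortTypeName are stand-ins, not part of the service): it prints the full assembly-qualified name next to the shortened one, then asks Type.GetType to resolve the short form.

```csharp
using System;

namespace Demo.Output
{
    // Hypothetical config type, used only to have something to name.
    public class DiskOutputConfig { }
}

public static class TypeNameDemo
{
    // Builds "Namespace.ClassName, AssemblyName" with no version,
    // culture, or public key token.
    public static string ShortTypeName(Type t)
    {
        return t.FullName + ", " + t.Assembly.GetName().Name;
    }

    public static void Main()
    {
        Type original = typeof(Demo.Output.DiskOutputConfig);

        // Full form, including Version=..., Culture=..., PublicKeyToken=...
        Console.WriteLine(original.AssemblyQualifiedName);

        // Shortened form that would be written to the "type" attribute.
        string shortName = ShortTypeName(original);
        Console.WriteLine(shortName);

        // Resolve the short name back to a Type and compare.
        Type resolved = Type.GetType(shortName);
        Console.WriteLine(resolved == original);
    }
}
```

Whether the lookup succeeds depends on the runtime being able to locate the named assembly from its simple name alone, so treat the last line as something to check in your own setup.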

Here is the full code for both classes, as well as a link to a project you can use directly.



    using System;
    using System.Collections.Generic;
    using System.Xml;
    using System.Xml.Schema;
    using System.Xml.Serialization;
    using Pragmatix.Serialization; // the Serialization helper class shown further down

    [Serializable]
    public class SerializableList<TValue>
        : List<TValue>, IXmlSerializable
    {
        #region IXmlSerializable Members

        public XmlSchema GetSchema()
        {
            return null;
        }

        public void ReadXml(XmlReader reader)
        {
            bool wasEmpty = reader.IsEmptyElement;
            reader.Read();
            if (wasEmpty)
                return;

            // Skip any whitespace before the first <Item> element.
            reader.MoveToContent();
            while (reader.NodeType != XmlNodeType.EndElement)
            {
                // The wrapper's "type" attribute names the concrete class to create.
                string stateTypeDescriptor = reader.GetAttribute("type");
                Type stateType = Type.GetType(stateTypeDescriptor);

                reader.ReadStartElement();
                XmlSerializer valueSerializer = new XmlSerializer(stateType);
                this.Add((TValue)valueSerializer.Deserialize(reader));

                reader.ReadEndElement();
                reader.MoveToContent();
            }
            reader.ReadEndElement();
        }

        public void WriteXml(XmlWriter writer)
        {
            foreach (TValue item in this)
            {
                // Use the runtime type, not TValue, so subclasses are written in full.
                Type valueType = item.GetType();
                XmlSerializer valueSerializer = new XmlSerializer(valueType);

                writer.WriteStartElement("Item");
                writer.WriteAttributeString("type", Serialization.GetTypeName(valueType));

                valueSerializer.Serialize(writer, item);

                writer.WriteEndElement();
            }
        }

        #endregion
    }





    using System;
    using System.Collections.Generic;
    using System.Xml;
    using System.Xml.Schema;
    using System.Xml.Serialization;
    using Pragmatix.Serialization; // the Serialization helper class shown further down

    [Serializable]
    public class SerializableDictionary<TKey, TValue>
        : Dictionary<TKey, TValue>, IXmlSerializable
    {
        #region IXmlSerializable Members

        public XmlSchema GetSchema()
        {
            return null;
        }

        public void ReadXml(XmlReader reader)
        {
            XmlSerializer keySerializer = new XmlSerializer(typeof(TKey));

            bool wasEmpty = reader.IsEmptyElement;
            reader.Read();
            if (wasEmpty)
                return;

            // Skip any whitespace before the first <item> element.
            reader.MoveToContent();
            while (reader.NodeType != XmlNodeType.EndElement)
            {
                // The wrapper's "type" attribute names the concrete value class.
                string stateTypeDescriptor = reader.GetAttribute("type");
                Type stateType = Type.GetType(stateTypeDescriptor);
                XmlSerializer valueSerializer = new XmlSerializer(stateType);

                reader.ReadToFollowing("key");
                reader.ReadStartElement("key");
                TKey key = (TKey)keySerializer.Deserialize(reader);
                reader.ReadEndElement();

                reader.ReadStartElement("value");
                TValue value = (TValue)valueSerializer.Deserialize(reader);
                reader.ReadEndElement();

                this.Add(key, value);
                reader.ReadEndElement();
                // Skip any whitespace before the next <item> or the closing tag.
                reader.MoveToContent();
            }

            reader.ReadEndElement();
        }

        public void WriteXml(XmlWriter writer)
        {
            // The keys can't be subclassed, only the values can.
            XmlSerializer keySerializer = new XmlSerializer(typeof(TKey));

            foreach (TKey key in this.Keys)
            {
                TValue value = this[key];
                Type valueType = value.GetType();
                XmlSerializer valueSerializer = new XmlSerializer(valueType);

                writer.WriteStartElement("item");
                writer.WriteAttributeString("type", Serialization.GetTypeName(valueType));

                // Serialize the key.
                writer.WriteStartElement("key");
                keySerializer.Serialize(writer, key);
                writer.WriteEndElement();

                // Serialize the value using its runtime type.
                writer.WriteStartElement("value");
                valueSerializer.Serialize(writer, value);
                writer.WriteEndElement();

                writer.WriteEndElement();
            }
        }

        #endregion
    }





    using System;

    namespace Pragmatix.Serialization
    {
        public class Serialization
        {
            // Returns "Name.Space.ClassName, Assembly" without version,
            // culture, or public key token.
            public static string GetTypeName(Type t)
            {
                string className = t.FullName;
                // The simple assembly name is the text before the first comma.
                string simpleAssembly = t.Assembly.FullName.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries)[0];
                return className + ", " + simpleAssembly;
            }
        }
    }



Source code

2 comments:

Eric said...

Hi,

Just wondering where Serialization.GetTypeName() is? Is this a static helper class that is not shown in the example, or is it in some deep dark corner of .Net that I have not yet explored?
:-)

Thanks

Brian said...

Sorry about that, that is a custom class.

I'm modifying the class name to avoid outputting the full assembly name, with version info. Basically I want:

Name.Space.ClassName, Assembly

not

Name.Space.ClassName, Assembly, Version=x.x.x, Culture=neutral, PublicKeyToken=yyyyyyyyyyy.

I wanted to be able to version my assemblies without worrying about conflicts when loading the class, especially for dynamically loaded code like plug-ins.

I've added the class to the blog post.