Fast, flexible, and version-tolerant serializer for .NET
There are many existing serialization libraries and formats which are efficient, fast, and support schema evolution, so why create this?
Existing serialization libraries which support version tolerance tend to restrict how data is modelled, usually by providing a very restricted type system which supports few of the features found in common type systems, features such as:
Hagar is a new serialization library which supports these features, is fast & compact, supports schema evolution, and requires minimal input from the developer.
[W W W] [S S] [F F F]

where:
- W is a wire type bit.
- S is a schema type bit.
- F is a field identifier bit.

[1 1 1] [E E] [X X X]

where:
- E is an extended wire type bit.
- X is reserved for use in the context of the extended wire type.

Tag Schema FieldId FieldData

The wire type is not fixed by the declared type of a field: nothing dictates that int64 is encoded as a VarInt and that float32 is encoded as a fixed 32-bit field. Instead, the serializer can decide whether a long is encoded as VarInt, Fixed32, or Fixed64
at runtime depending on which takes up the least space.

    /// <summary>
    /// Represents a 3-bit wire type, shifted into position
    /// </summary>
    public enum WireType : byte
    {
        VarInt = 0b000 << 5, // Followed by a VarInt
        TagDelimited = 0b001 << 5, // Followed by field specifiers, then an Extended tag with EndTagDelimited as the extended wire type.
        LengthPrefixed = 0b010 << 5, // Followed by VarInt length representing the number of bytes which follow.
        Fixed32 = 0b011 << 5, // Followed by 4 bytes
        Fixed64 = 0b100 << 5, // Followed by 8 bytes
        Reference = 0b110 << 5, // Followed by a VarInt reference to a previously defined object. Note that the SchemaType and type specification must still be included.
        Extended = 0b111 << 5, // This is a control tag. The schema type and embedded field id are invalid. The remaining 5 bits are used for control information.
    }

    public enum SchemaType : byte
    {
        Expected = 0b00 << 3, // This value has the type expected by the current schema.
        WellKnown = 0b01 << 3, // This value is an instance of a well-known type. Followed by a VarInt type id.
        Encoded = 0b10 << 3, // This value is of a named type. Followed by an encoded type name.
        Referenced = 0b11 << 3, // This value is of a type which was previously specified. Followed by a VarInt indicating which previous type is being reused.
    }

    public enum ExtendedWireType : byte
    {
        EndTagDelimited = 0b00 << 3, // This tag marks the end of a tag-delimited object. Field id is invalid.
        EndBaseFields = 0b01 << 3, // This tag marks the end of a base object in a tag-delimited object.
    }

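
The tag layout and the enum values above can be combined with plain bit masks. The following is a minimal sketch, not Hagar's actual implementation: TagSketch, Pack, Unpack, VarIntLength, and SmallestWireType are hypothetical names, the enum values are copied from the definitions above, and the size comparison assumes a base-128 VarInt (7 payload bits per byte) and an unsigned value. How field ids that do not fit in three bits are encoded is not covered here.

```csharp
using System;

// Values copied from the enums above; each is already shifted into position.
public enum WireType : byte
{
    VarInt = 0b000 << 5,
    TagDelimited = 0b001 << 5,
    LengthPrefixed = 0b010 << 5,
    Fixed32 = 0b011 << 5,
    Fixed64 = 0b100 << 5,
    Reference = 0b110 << 5,
    Extended = 0b111 << 5,
}

public enum SchemaType : byte
{
    Expected = 0b00 << 3,
    WellKnown = 0b01 << 3,
    Encoded = 0b10 << 3,
    Referenced = 0b11 << 3,
}

// Hypothetical helper for illustration; not part of Hagar's API.
public static class TagSketch
{
    private const byte WireTypeMask = 0b1110_0000;
    private const byte SchemaTypeMask = 0b0001_1000;
    private const byte FieldIdMask = 0b0000_0111;

    // Combines the three parts into one tag byte: [W W W] [S S] [F F F].
    public static byte Pack(WireType wireType, SchemaType schemaType, byte embeddedFieldId)
    {
        if (embeddedFieldId > FieldIdMask)
            throw new ArgumentOutOfRangeException(nameof(embeddedFieldId));
        return (byte)((byte)wireType | (byte)schemaType | embeddedFieldId);
    }

    // Splits a tag byte back into its three parts by masking.
    public static (WireType Wire, SchemaType Schema, byte FieldId) Unpack(byte tag)
    {
        return ((WireType)(tag & WireTypeMask),
                (SchemaType)(tag & SchemaTypeMask),
                (byte)(tag & FieldIdMask));
    }

    // Bytes needed by a base-128 VarInt (assumed encoding): 7 payload bits per byte.
    public static int VarIntLength(ulong value)
    {
        var length = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            length++;
        }
        return length;
    }

    // Picks whichever of VarInt, Fixed32 (4 bytes), or Fixed64 (8 bytes) is smallest.
    public static WireType SmallestWireType(ulong value)
    {
        var varIntLength = VarIntLength(value);
        if (varIntLength <= 4) return WireType.VarInt;
        if (value <= uint.MaxValue) return WireType.Fixed32;
        return varIntLength <= 8 ? WireType.VarInt : WireType.Fixed64;
    }
}
```

Under these assumptions, Pack(WireType.Fixed32, SchemaType.WellKnown, 2) yields 0b0110_1010, and a value such as 1UL << 28 is written as Fixed32 because its VarInt form would need five bytes.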
If a type has base types, the fields of the base types are serialized before the subtype fields. Between the base type fields and the subtype fields is an EndBaseFields tag. This allows base types and subtypes to have overlapping field ids without ambiguity. Object encoding therefore follows this pattern: [StartTagDelimited] [Base Fields]* [EndBaseFields] [Sub Type fields]* [EndTagDelimited].
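
To see why the EndBaseFields marker removes the ambiguity, the pattern can be modelled with string tokens. This is only a sketch under simplifying assumptions: ObjectLayoutSketch and its members are hypothetical, tokens are strings rather than tag bytes, and field payloads are omitted.

```csharp
using System.Collections.Generic;

// Hypothetical token model for illustration; Hagar writes tag bytes, not strings.
public static class ObjectLayoutSketch
{
    public const string Start = "StartTagDelimited";
    public const string EndBase = "EndBaseFields";
    public const string End = "EndTagDelimited";

    // Emits base fields first, then the marker, then subtype fields.
    public static List<string> Write(int[] baseFieldIds, int[] subFieldIds)
    {
        var tokens = new List<string> { Start };
        foreach (var id in baseFieldIds) tokens.Add(id.ToString());
        tokens.Add(EndBase);
        foreach (var id in subFieldIds) tokens.Add(id.ToString());
        tokens.Add(End);
        return tokens;
    }

    // Attributes each field id to a hierarchy level: 0 = base, 1 = subtype.
    public static List<(int Level, int FieldId)> Read(IEnumerable<string> tokens)
    {
        var fields = new List<(int Level, int FieldId)>();
        var level = 0;
        foreach (var token in tokens)
        {
            if (token == Start) continue;
            if (token == EndBase) { level++; continue; }
            if (token == End) break;
            fields.Add((level, int.Parse(token)));
        }
        return fields;
    }
}
```

Writing base field ids 0 and 1 followed by subtype field id 0 and reading the tokens back attributes the two fields with id 0 to different levels, because the reader advances its level each time it sees EndBaseFields.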
Third-party serializers such as ProtoBuf, Bond, .NET’s BinaryFormatter, JSON.NET, etc., are supported by writing a serializer-specific type id and embedding the payload using the length-prefixed wire type. This has the advantage of supporting any number of well-known serializers and does not require double-encoding the concrete type, since the external serializer is responsible for that.
Allowing arbitrary types to be specified in a serialized payload is a vector for security vulnerabilities. Because of this, all types should be checked against a whitelist.
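
A whitelist check of this kind could look roughly like the following. It is a sketch only: AllowedTypeFilter and Resolve are hypothetical names, not Hagar's API, and it assumes type names arrive as strings that Type.GetType can resolve.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type filter; the class and method names are illustrative only.
public sealed class AllowedTypeFilter
{
    private readonly HashSet<string> _allowed;

    public AllowedTypeFilter(IEnumerable<string> allowedTypeNames)
    {
        _allowed = new HashSet<string>(allowedTypeNames, StringComparer.Ordinal);
    }

    // Resolves a type name read from a payload only if it is on the whitelist.
    // The check happens before any type lookup or instantiation.
    public Type Resolve(string typeName)
    {
        if (!_allowed.Contains(typeName))
            throw new InvalidOperationException($"Type '{typeName}' is not allowed.");
        return Type.GetType(typeName, throwOnError: true);
    }
}
```

The important design point is ordering: the name is rejected before Type.GetType is ever called, so a payload cannot cause an unlisted type to be loaded.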
Version Tolerance is supported provided the developer follows a set of rules when modifying types. If the developer is familiar with systems such as ProtoBuf and Bond, then these rules will come as no surprise.

Composite types (class & struct):
- Field ids can be reused between a base class and a subclass. For example, a Base class can declare a field with id 0 and a different field can be declared by Sub : Base with the same id, 0.

Numeric types:
- Conversions between int & uint are invalid.
- Conversions from int to long or ulong to ushort are supported.
- Conversions from ulong to ushort are only supported if the value at runtime is less than ushort.MaxValue.
- Conversions from double to float are only supported if the runtime value is between float.MinValue and float.MaxValue.
- The same applies to decimal, which has a narrower range than both double and float.
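
The narrowing rules amount to a range check at runtime. The sketch below shows one way such a check could be written; NarrowingSketch and its methods are hypothetical and are not Hagar's implementation. As an assumption, it accepts any value that fits in the target type, including the maximum value itself.

```csharp
using System;

// Hypothetical helpers sketching the range checks; not Hagar's implementation.
public static class NarrowingSketch
{
    // checked(...) throws OverflowException when the value does not fit in a ushort.
    public static ushort ToUInt16(ulong value) => checked((ushort)value);

    // Rejects finite or infinite values outside [float.MinValue, float.MaxValue].
    public static float ToSingle(double value)
    {
        if (value < float.MinValue || value > float.MaxValue)
            throw new OverflowException("Value is outside the range of float.");
        return (float)value;
    }
}
```

A value that was written as a ulong can therefore be read into a ushort field as long as the stored value is small enough; otherwise the conversion fails loudly instead of silently truncating.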

Type names cannot be changed unless the type is a WellKnown type or a TypeCodec is used to translate between the old and new name.

Packages are published to nuget.org: https://www.nuget.org/packages?q=hagar
Running build.ps1 will build and locally publish packages for testing purposes.