Message serialization

IPv8 gives you as much control as possible over the messages you send over the Internet. The Overlay (or Community) class lets you send arbitrary strings over the (UDP) endpoint. However efficient this may be, having non-standardized string construction for each message of your overlay can distract from the overall overlay design. This is the age-old dichotomy of maintainable versus performant code.

The basic class for serializing objects from and to strings/network packets is the Serializer (ipv8/messaging/serialization.py). Though the Serializer is extensible, you will usually only need the default instance, default_serializer. You can use the Serializer with classes of the following types:

Serializable classes
class	path	description
Serializable	ipv8/messaging/serialization.py	Base class for all things serializable. Should support the instance method to_pack_list() and the class method from_unpack_list().
Payload	ipv8/messaging/payload.py	Extension of the Serializable class with logic for pretty printing.
VariablePayload	ipv8/messaging/lazy_payload.py	Less verbose way to specify Payloads, at the cost of performance.
dataclass	ipv8/messaging/payload_dataclass.py	Use dataclasses to send messages, at the cost of control and performance.

Other than the dataclass, each of these serializable classes specifies a list of primitive data types it will serialize to and from. The primitive data types are listed in the Datatypes section. Each serializable class has to specify the following class members (dataclass does this automatically):

Serializable class members
member	description
format_list	A list containing valid data type primitive names.
names	Only for VariablePayload classes, the instance fields to bind the data types to.

As an example, we will now define five completely wire-format compatible messages using the four classes (VariablePayload is used twice, once plainly and once with the @vp_compile decorator). Each of the messages will serialize to a (four-byte) unsigned integer followed by a (two-byte) unsigned short. If the dataclass had used plain int types, these would have been two signed 8-byte integers instead. Each instance will have two fields, field1 and field2, corresponding to the integer and the short.

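from __future__ import annotations

from dataclasses import dataclass

from ipv8.messaging.lazy_payload import VariablePayload, vp_compile
from ipv8.messaging.payload import Payload
from ipv8.messaging.payload_dataclass import DataClassPayload, type_from_format
from ipv8.messaging.serialization import Serializable


class MySerializable(Serializable):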
    format_list = ["I", "H"]

    def __init__(self, field1: int, field2: int) -> None:
        self.field1 = field1
        self.field2 = field2

    def to_pack_list(self) -> list[tuple]:
        return [("I", self.field1),
                ("H", self.field2)]

    @classmethod
    def from_unpack_list(cls: type[MySerializable],
                         field1: int, field2: int) -> MySerializable:
        return cls(field1, field2)


class MyPayload(Payload):
    format_list = ["I", "H"]

    def __init__(self, field1: int, field2: int) -> None:
        self.field1 = field1
        self.field2 = field2

    def to_pack_list(self) -> list[tuple]:
        return [("I", self.field1),
                ("H", self.field2)]

    @classmethod
    def from_unpack_list(cls: type[MyPayload],
                         field1: int, field2: int) -> MyPayload:
        return cls(field1, field2)


class MyVariablePayload(VariablePayload):
    format_list = ["I", "H"]
    names = ["field1", "field2"]


@vp_compile
class MyCVariablePayload(VariablePayload):
    format_list = ["I", "H"]
    names = ["field1", "field2"]


I = type_from_format("I")
H = type_from_format("H")


@dataclass
class MyDataclassPayload(DataClassPayload):
    field1: I
    field2: H

To show some of the differences, let’s check out the output of the following script using these definitions:

serializable1 = MySerializable(1, 2)
serializable2 = MyPayload(1, 2)
serializable3 = MyVariablePayload(1, 2)
serializable4 = MyCVariablePayload(1, 2)
serializable5 = MyDataclassPayload(1, 2)

print("As string:")
print(serializable1)
print(serializable2)
print(serializable3)
print(serializable4)
print(serializable5)

As string:
<__main__.MySerializable object at 0x7f732a23c1f0>
MyPayload
| field1: 1
| field2: 2
MyVariablePayload
| field1: 1
| field2: 2
MyCVariablePayload
| field1: 1
| field2: 2
MyDataclassPayload
| field1: 1
| field2: 2
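Although their string representations differ, all five definitions describe the same wire format. As a stand-alone sketch that only uses Python's struct module (it does not call IPv8; it assumes the big-endian layout described in the Datatypes section), the bytes for field1=1 and field2=2 look like this:

```python
import struct

# format_list = ["I", "H"]: a big-endian unsigned int (4 bytes)
# followed by a big-endian unsigned short (2 bytes)
WIRE_FORMAT = struct.Struct(">IH")

packed = WIRE_FORMAT.pack(1, 2)
print(packed.hex())  # 000000010002
print(len(packed))   # 6

field1, field2 = WIRE_FORMAT.unpack(packed)
print(field1, field2)  # 1 2

# Two plain Python ints in a dataclass would instead be two signed 8-byte integers
print(struct.calcsize(">qq"))  # 16
```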

Datatypes

Next to the unsigned integer and unsigned short data types, the default Serializer has many more data types to offer. The following table lists all data types available by default; all values are big-endian and most follow the default Python struct format. A Serializer can be extended with additional data types by calling serializer.add_packer(name, packer), where packer represents the object responsible for (un)packing the data type. The most commonly used packer is DefaultStruct, which can be used with arbitrary struct formats (for example serializer.add_packer("I", DefaultStruct(">I"))).
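To illustrate what a struct-based packer does, here is a minimal sketch. It assumes the (data, offset, unpack_list) contract returning the new offset that the PackerJSON example later in this document uses; StructPackerSketch and the packers dictionary are hypothetical illustrations, not IPv8's DefaultStruct or Serializer. Multi-value formats such as "BH" are assumed to unpack to a list, matching the bracketed entries in the table.

```python
import struct
from typing import Any


class StructPackerSketch:
    """Hypothetical stand-in for a DefaultStruct-style packer (not IPv8's class)."""

    def __init__(self, fmt: str) -> None:
        self.struct = struct.Struct(fmt)

    def pack(self, *data: Any) -> bytes:
        return self.struct.pack(*data)

    def unpack(self, data: bytes, offset: int, unpack_list: list) -> int:
        values = self.struct.unpack_from(data, offset)
        # Single-value formats yield a scalar, multi-value formats a list
        unpack_list.append(values[0] if len(values) == 1 else list(values))
        return offset + self.struct.size


# Hypothetical registry playing the role of serializer.add_packer(name, packer)
packers = {"I": StructPackerSketch(">I"), "BH": StructPackerSketch(">BH")}

blob = packers["I"].pack(7) + packers["BH"].pack(1, 300)
out: list = []
offset = packers["I"].unpack(blob, 0, out)
offset = packers["BH"].unpack(blob, offset, out)
print(out, offset)  # [7, [1, 300]] 7
```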

Available data types
member	bytes	unserialized type
?	1	boolean
B	1	unsigned byte
BBH	4	[unsigned byte, unsigned byte, unsigned short]
BH	3	[unsigned byte, unsigned short]
c	1	signed byte
f	4	signed float
d	8	signed double
H	2	unsigned short
HH	4	[unsigned short, unsigned short]
I	4	unsigned integer
l	4	signed long
LL	8	[unsigned long, unsigned long]
q	8	signed long long
Q	8	unsigned long long
QH	10	[unsigned long long, unsigned short]
QL	12	[unsigned long long, unsigned long]
QQHHBH	23	[unsigned long long, unsigned long long, unsigned short, unsigned short, unsigned byte, unsigned short]
ccB	3	[signed byte, signed byte, unsigned byte]
4SH	6	[str (length 4), unsigned short]
20s	20	str (length 20)
32s	32	str (length 32)
64s	64	str (length 64)
74s	74	str (length 74)
c20s	21	[unsigned byte, str (length 20)]
bits	1	[bit 0, bit 1, bit 2, bit 3, bit 4, bit 5, bit 6, bit 7]
ipv4	6	[str (length 7-15), unsigned short]
raw	?	str (length ?)
varlenBx2	1 + ? * 2	[str (length = 2), ... ] (length < 256)
varlenH	2 + ?	str (length ? < 65536)
varlenHutf8	2 + ?	str (encoded length ? < 65536)
varlenHx20	2 + ? * 20	[str (length = 20), ... ] (length < 65536)
varlenH-list	1 + ? * (2 + ??)	[str (length < 65536)] (length < 256)
varlenI	4 + ?	str (length < 4294967296)
doublevarlenH	2 + ?	str (length ? < 65536)
payload	2 + ?	Serializable
payload-list	?	[Serializable]
arrayH-?	2 + ? * 1	[bool]
arrayH-q	2 + ? * 8	[int]
arrayH-d	2 + ? * 8	[float]
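The variable-length types follow a length-prefix pattern: a size of "2 + ?" for varlenH means a two-byte big-endian length followed by that many bytes of data, which is also why the length must stay below 65536. A stand-alone sketch of that framing using only struct (the function names are hypothetical, and this is not IPv8's packer code):

```python
import struct


def pack_varlen_h(payload: bytes) -> bytes:
    """Prefix the payload with its length as a big-endian unsigned short."""
    if len(payload) >= 2 ** 16:
        raise ValueError("varlenH payloads must be shorter than 65536 bytes")
    return struct.pack(">H", len(payload)) + payload


def unpack_varlen_h(data: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Return the payload and the offset just past it."""
    (size,) = struct.unpack_from(">H", data, offset)
    start = offset + 2
    return data[start:start + size], start + size


encoded = pack_varlen_h(b"hello")
print(encoded)                   # b'\x00\x05hello'
print(unpack_varlen_h(encoded))  # (b'hello', 7)
```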

Some of these data types represent common usage of serializable classes:

Common data types
member	description
4SH	(IP, port) tuples
20s	SHA-1 hashes
32s	libnacl signatures
64s	libnacl public keys
74s	libnacl public keys with prefix
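For the (IP, port) case, the table describes four address bytes followed by an unsigned short port. A minimal sketch, assuming the four bytes are the raw IPv4 address as produced by socket.inet_aton (note that Python's struct spells the byte-string code as a lowercase "s"; the helper names are hypothetical):

```python
import socket
import struct


def pack_address(ip: str, port: int) -> bytes:
    # 4 raw address bytes ("4s") followed by a big-endian unsigned short port ("H")
    return struct.pack(">4sH", socket.inet_aton(ip), port)


def unpack_address(data: bytes) -> tuple[str, int]:
    raw_ip, port = struct.unpack(">4sH", data)
    return socket.inet_ntoa(raw_ip), port


wire = pack_address("127.0.0.1", 8090)
print(len(wire), unpack_address(wire))  # 6 ('127.0.0.1', 8090)
```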

Special cases are the raw and payload data types.

raw: can only be used as the last element in a format list as it will consume the remainder of the input string (avoid if possible).
payload: will nest another Serializable instance into this instance. When used, the format_list should specify the class of the nested Serializable and the to_pack_list() output should give a tuple of ("payload", the_nested_instance). The VariablePayload automatically infers the to_pack_list() for you. See the NestedPayload class definition for more info.
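The reason raw has to come last is that it carries no length information, so a decoder can only know where it ends by reaching the end of the packet. A small sketch with plain struct, using a hypothetical format list of ["H", "raw"]:

```python
import struct

# Hypothetical format ["H", "raw"]: a fixed two-byte field followed by raw data
packet = struct.pack(">H", 42) + b"everything that follows"

(number,) = struct.unpack_from(">H", packet, 0)
remainder = packet[2:]  # raw takes whatever is left, so no field can follow it

print(number, remainder)  # 42 b'everything that follows'
```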

The ez_pack family for Community classes

All subclasses of the EZPackOverlay class (most commonly subclasses of the Community class) have a shortcut for serializing messages belonging to the particular overlay. This standardizes the prefix and message identifiers of overlays. Concretely, it uses the first 23 bytes of each packet to handle versioning and to route (demultiplex) packets to the correct overlay.
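The 23 bytes can be read as a 22-byte overlay prefix plus a one-byte message identifier (the custom-handling example in this document strips len(self.get_prefix()) + 1 bytes). A sketch of the demultiplexing step, assuming the prefix consists of a zero byte, a one-byte version marker and the 20-byte community identifier; this split and the version value are assumptions for illustration, not taken from the IPv8 source:

```python
import os
from typing import Optional, Tuple

# Assumed layout for illustration: a zero byte, a one-byte version marker
# (value chosen arbitrarily here) and the 20-byte community identifier.
COMMUNITY_ID = os.urandom(20)
PREFIX = b"\x00" + b"\x02" + COMMUNITY_ID
HEADER_LENGTH = len(PREFIX) + 1  # prefix plus the one-byte message id


def demultiplex(packet: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (message_id, body) if the packet belongs to this overlay, else None."""
    if len(packet) < HEADER_LENGTH or not packet.startswith(PREFIX):
        return None
    return packet[len(PREFIX)], packet[HEADER_LENGTH:]


print(HEADER_LENGTH)                            # 23
print(demultiplex(PREFIX + b"\x01" + b"body"))  # (1, b'body')
print(demultiplex(b"\xff" * 30))                # None
```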

The ezr_pack method of EZPackOverlay subclasses takes an (integer) message number and a variable number of Serializable instances. Optionally, you can choose not to have the message signed: supply the sig=True or sig=False keyword argument for, respectively, a signature or no signature over the packet.

The lazy_wrapper and lazy_wrapper_unsigned decorators can then be used to unserialize payloads that are signed or not signed, respectively. Simply pass the payload classes you wish to unserialize into as arguments to the decorator.
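To make the decorator's role concrete, here is a stand-alone sketch of a lazy_wrapper-style decorator. It is not IPv8's implementation: unpacking_wrapper, PointPayload and its unpack() classmethod are hypothetical, and the sketch leaves out what the real decorators also deal with, such as the signature and handing the handler a Peer (as in the MyCommunity example in this section) instead of a raw source address.

```python
import functools
import struct
from typing import Any, Callable


class PointPayload:
    """Hypothetical payload: two big-endian unsigned shorts."""
    FORMAT = struct.Struct(">HH")

    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    @classmethod
    def unpack(cls, data: bytes, offset: int) -> tuple["PointPayload", int]:
        x, y = cls.FORMAT.unpack_from(data, offset)
        return cls(x, y), offset + cls.FORMAT.size


def unpacking_wrapper(*payload_classes: type) -> Callable:
    """Sketch of a lazy_wrapper-style decorator (NOT IPv8's implementation):
    raw bytes are turned into payload instances before the handler runs."""
    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(self: Any, source: Any, data: bytes) -> Any:
            offset, payloads = 0, []
            for cls in payload_classes:
                payload, offset = cls.unpack(data, offset)
                payloads.append(payload)
            return handler(self, source, *payloads)
        return wrapper
    return decorator


class Handler:
    @unpacking_wrapper(PointPayload, PointPayload)
    def on_message(self, source: Any, first: PointPayload,
                   second: PointPayload) -> tuple:
        return first.x, first.y, second.x, second.y


result = Handler().on_message(("127.0.0.1", 8090), struct.pack(">HHHH", 1, 2, 3, 4))
print(result)  # (1, 2, 3, 4)
```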

As some internal messages and deprecated messages use part of the message range, you have the message identifiers from 0 through 234 available for your custom message definitions. Once you register the message handler and have the appropriate decorator on the specified handler method, your overlay can communicate with the Internet. In practice, given a COMMUNITY_ID and the payload definitions MyMessagePayload1 and MyMessagePayload2, this will look something like this example (see the overlay tutorial for a complete runnable example):

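from ipv8.community import Community, CommunitySettings
from ipv8.lazy_community import lazy_wrapper
from ipv8.peer import Peer

# COMMUNITY_ID, MyMessagePayload1 and MyMessagePayload2 are assumed to be defined elsewhere.


class MyCommunity(Community):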
    community_id = COMMUNITY_ID

    def __init__(self, settings: CommunitySettings) -> None:
        super().__init__(settings)

        self.add_message_handler(1, self.on_message)

    @lazy_wrapper(MyMessagePayload1, MyMessagePayload2)
    def on_message(self, peer: Peer, payload1: MyMessagePayload1,
                   payload2: MyMessagePayload2) -> None:
        print("Got a message from:", peer)
        print("The message includes the first payload:\n", payload1)
        print("The message includes the second payload:\n", payload2)

    def send_message(self, peer: Peer) -> None:
        packet = self.ezr_pack(1, MyMessagePayload1(), MyMessagePayload2())
        self.endpoint.send(peer.address, packet)
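Conceptually, add_message_handler() maps a message identifier to a handler. A plain-Python sketch of that lookup, assuming the overlay prefix has already been stripped; the dictionary, dispatch() and the 0..234 range check are hypothetical illustrations of the text above, not IPv8's code:

```python
from typing import Callable, Dict

# Hypothetical stand-in for add_message_handler(): message id -> handler
handlers: Dict[int, Callable[[bytes], str]] = {}


def add_message_handler(msg_id: int, handler: Callable[[bytes], str]) -> None:
    # Sketch-only guard for the 0..234 range that is free for custom messages
    if not 0 <= msg_id <= 234:
        raise ValueError(f"message id {msg_id} is outside 0..234")
    handlers[msg_id] = handler


def dispatch(msg_id: int, body: bytes) -> str:
    # The overlay prefix is assumed to be stripped already
    return handlers[msg_id](body)


add_message_handler(1, lambda body: f"message 1 with {len(body)} bytes")
print(dispatch(1, b"abc"))  # message 1 with 3 bytes
```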

It is recommended (but not obligatory) to have single payload messages store the message identifier inside the Payload.msg_id field, as this improves readability:

    self.add_message_handler(MyMessage1, self.on_message1)
    self.add_message_handler(MyMessage2, self.on_message2)
    self.ez_send(peer, MyMessage1(42))
    self.ez_send(peer, MyMessage2(7))

If you are using the @dataclass wrapper you can specify the message identifier through parameterization of the DataClassPayload base class instead. For example, class MyPayload(DataClassPayload[42]): would set the message identifier to 42.

Of course, IPv8 also ships with various Community subclasses of its own, if you need inspiration.

Using external serialization options

IPv8 is compatible with pretty much all third-party message serialization packages. However, before hooking one of these packages into IPv8 you may want to ask yourself whether you have fallen victim to marketing hype. After all, XML is the one unifying standard we will never switch away from, right? Oh wait, no, it’s JSON. My bad, it’s Protobuf. Or was it ASN.1? You get the point. In this world, only the core IPv8 serialization format remains constant.

There are three main ways to hook in external serialization: per message, per Serializer and per Community. The three methods can be freely mixed.

Custom serialization per message

If you only want to use custom serialization for (part of) a single overlay message, you can use VariablePayload field modification (this also works for dataclass payloads). This method involves implementing the methods fix_pack_<your field name> and fix_unpack_<your field name> for the fields of your message that use custom serialization. Check out the following example:

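from __future__ import annotations

import json
from dataclasses import dataclass

from ipv8.messaging.lazy_payload import VariablePayload, vp_compile
from ipv8.messaging.payload_dataclass import DataClassPayload


@vp_compile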
class VPMessageKeepDict(VariablePayload):
    msg_id = 1
    format_list = ["varlenH"]
    names = ["dictionary"]

    def fix_pack_dictionary(self, the_dictionary: dict) -> bytes:
        return json.dumps(the_dictionary).encode()

    @classmethod
    def fix_unpack_dictionary(cls: type[VPMessageKeepDict],
                              serialized_dictionary: bytes) -> dict:
        return json.loads(serialized_dictionary.decode())


@dataclass
class DCMessageKeepDict(DataClassPayload[2]):
    dictionary: str

    def fix_pack_dictionary(self, the_dictionary: dict) -> str:
        return json.dumps(the_dictionary)

    @classmethod
    def fix_unpack_dictionary(cls: type[DCMessageKeepDict],
                              serialized_dictionary: str) -> dict:
        return json.loads(serialized_dictionary)

In both classes we create a message with a single field dictionary. To pack this field, we use json.dumps() to create a string representation of the dictionary. When loading a message, json.loads() is used to create a dictionary from the serialized data. Instead of json you could also use any serialization of your liking.
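One consequence of choosing JSON is that the round trip is not always exact: JSON object keys are always strings and tuples come back as lists. A quick check with the same json.dumps/json.loads transformations, written here as plain functions outside any payload class:

```python
import json


def fix_pack_dictionary(the_dictionary: dict) -> bytes:
    return json.dumps(the_dictionary).encode()


def fix_unpack_dictionary(serialized_dictionary: bytes) -> dict:
    return json.loads(serialized_dictionary.decode())


print(fix_unpack_dictionary(fix_pack_dictionary({"key": "value"})))  # {'key': 'value'}
print(fix_unpack_dictionary(fix_pack_dictionary({1: (2, 3)})))       # {'1': [2, 3]}
```

If your messages rely on integer keys or tuples, convert them back in the unpacking method or pick a serialization that preserves those types.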

Repeating the same transformation for many fields quickly makes your payload definitions lengthy. In that case, you may want to look into specifying a custom serialization format instead.

Custom serialization formats

It is possible to specify new formats by adding packing formats to a Serializer instance. You can easily do so by overriding your Community.get_serializer() method. This Serializer is sandboxed per Community instance, so you don't have to worry about breaking other instances. Check out the following example and note that the message definition is now much shorter, at the expense of having to define a custom (more complicated) packing format.

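import json
import struct
from typing import Any

from ipv8.community import Community
from ipv8.messaging.lazy_payload import VariablePayload, vp_compile
from ipv8.messaging.serialization import Packer, Serializer


@vp_compile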
class Message(VariablePayload):
    msg_id = 1
    format_list = ["json", "json", "json", "json"]
    names = ["d1", "d2", "d3", "d4"]


class PackerJSON(Packer):

    def pack(self, data: Any) -> bytes:
        packed = json.dumps(data).encode()
        size = struct.pack(">H", len(packed))
        return size + packed

    def unpack(self, data: bytes, offset: int,
               unpack_list: list, *args: Any) -> int:
        size, = struct.unpack_from(">H", data, offset)

        json_data_start = offset + 2
        json_data_end = json_data_start + size

        serialized = data[json_data_start:json_data_end]
        unpack_list.append(json.loads(serialized))

        return json_data_end


class MyCommunity(Community):

    def get_serializer(self) -> Serializer:
        serializer = super().get_serializer()
        serializer.add_packer("json", PackerJSON())
        return serializer

The line serializer.add_packer('json', PackerJSON()) adds the new format json that is used in Message. In fact, any further message added to this Community can now use the json format. However, you may also note some additional complexity in the PackerJSON class.

Our custom packer PackerJSON implements two required methods: pack() and unpack(). The former serializes data using custom serialization (json.dumps() in this case) and prefixes the result with its length, encoded as a big-endian unsigned short (">H"). The unpack() method reads that length, creates a JSON object from the serialized data, appends the object to the unpack_list list and returns the new offset in the data stream.
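The returned offset is what lets the four json fields of Message be read back one after another, and the ">H" prefix also caps each field at 65535 bytes. The following sketch copies the PackerJSON logic into a class without the IPv8 Packer base class (PackerJSONSketch is a hypothetical name) so it can run on its own:

```python
import json
import struct
from typing import Any


class PackerJSONSketch:
    """The PackerJSON logic from above, minus the IPv8 Packer base class."""

    def pack(self, data: Any) -> bytes:
        packed = json.dumps(data).encode()
        return struct.pack(">H", len(packed)) + packed

    def unpack(self, data: bytes, offset: int, unpack_list: list) -> int:
        (size,) = struct.unpack_from(">H", data, offset)
        start = offset + 2
        unpack_list.append(json.loads(data[start:start + size]))
        return start + size  # the next field starts here


packer = PackerJSONSketch()
values = [{"a": 1}, [1, 2], "text", None]  # like d1..d4 of Message
stream = b"".join(packer.pack(value) for value in values)

decoded: list = []
offset = 0
for _ in values:
    offset = packer.unpack(stream, offset, decoded)
print(decoded == values, offset == len(stream))  # True True

# The ">H" length prefix caps every JSON field at 65535 bytes
try:
    packer.pack("x" * 70000)
except struct.error as exc:
    print("too large:", exc)
```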

Custom Community data handling

It is possible to circumvent IPv8 message formats altogether. In its most extreme form, you can override Community.on_packet(packet) to inspect all raw data sent to your Community instance. The packet is a tuple of (source_address, data). You can write raw data back to an address using self.endpoint.send(address, data).

If you still want to mix this with other messages, you should keep the message identifier byte that follows the Community prefix. The following example shows how to use JSON serialization without any IPv8 serialization. Note that we now need to do our own signature checks.

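import json
import os
from binascii import hexlify, unhexlify

from ipv8.community import Community, CommunitySettings
from ipv8.peer import Peer

Address = tuple[str, int]  # (host, port) pair, defined here for the annotations


def to_hex(data: bytes) -> str:
    """Helper for this example: convert bytes to a hexadecimal string."""
    return hexlify(data).decode()


class MyCommunity(Community):
    community_id = os.urandom(20)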

    def __init__(self, settings: CommunitySettings) -> None:
        super().__init__(settings)
        self.event = None
        self.add_message_handler(1, self.on_message)

    def send_message(self, peer: Peer) -> None:
        message = json.dumps({"key": "value", "key2": "value2"})
        public_key = to_hex(self.my_peer.public_key.key_to_bin())
        signature = to_hex(self.my_peer.key.signature(message.encode()))

        signed_message = json.dumps({"message": message,
                                     "public_key": public_key,
                                     "signature": signature}).encode()
        self.endpoint.send(peer.address,
                           self.get_prefix() + b"\x01" + signed_message)

    def on_message(self, source_address: Address, data: bytes) -> None:
        # Account for 1 byte message id
        header_length = len(self.get_prefix()) + 1
        # Strip the IPv8 multiplexing data
        received = json.loads(data[header_length:])

        public_key = self.crypto.key_from_public_bin(unhexlify(received["public_key"]))
        valid = self.crypto.is_valid_signature(public_key,
                                               received["message"].encode(),
                                               unhexlify(received["signature"]))
        self.logger.info("Received message %s from %s, the signature is %s!",
                         received["message"], source_address, valid)

Nested Payloads

It is possible to put a Payload inside another Payload. We call these nested payloads. You can specify them by using the "payload" datatype and setting the Payload class in the format list. For a VariablePayload this looks like the following example.

class A(VariablePayload):
    format_list = ["I", "H"]
    names = ["foo", "bar"]


class B(VariablePayload):
    format_list = [A, "H"]  # Note that we pass the class A
    names = ["a", "baz"]
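On the wire, the size column of the payload data type ("2 + ?") suggests that the nested A is written as a two-byte length followed by A's own bytes, then B continues with baz. A struct-only sketch of that assumed layout for B(A(1, 2), 3):

```python
import struct

# A(foo: "I", bar: "H") on its own: 6 bytes
a_bytes = struct.pack(">IH", 1, 2)

# B(a: A, baz: "H"), assuming a two-byte length prefix before the nested payload
b_bytes = struct.pack(">H", len(a_bytes)) + a_bytes + struct.pack(">H", 3)
print(b_bytes.hex())  # 00060000000100020003

# Decoding: read the length, then A's fields, then skip past A to read baz
(size,) = struct.unpack_from(">H", b_bytes, 0)
foo, bar = struct.unpack_from(">IH", b_bytes, 2)
(baz,) = struct.unpack_from(">H", b_bytes, 2 + size)
print(foo, bar, baz)  # 1 2 3
```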

For dataclass payloads this nesting is supported by simply specifying nested classes as follows.

@dataclass
class Message(DataClassPayload[1]):
    @dataclass
    class Item:
        foo: int
        bar: int

    item: Item
    items: [Item]  # Yes, you can even make this a list!
    baz: int