Message serialization
IPv8 gives you as much control as possible over the messages you send over the Internet.
The Overlay (or Community) class lets you send arbitrary strings over the (UDP) endpoint.
However efficient this may be, having non-standardized string construction for each message of your overlay can distract from the overall overlay design.
This is the age-old dichotomy of maintainable versus performant code.
The basic class for serializing objects from and to strings/network packets is the Serializer (ipv8/messaging/serialization.py).
Though the Serializer is extensible, you will mostly only need the default serializer default_serializer.
You can use the Serializer with classes of the following types:
class | path | description |
---|---|---|
Serializable | ipv8/messaging/serialization.py | Base class for all things serializable. Should support the instance method to_pack_list() and the class method from_unpack_list(). |
Payload | ipv8/messaging/payload.py | Extension of the Serializable class with logic for pretty printing. |
VariablePayload | ipv8/messaging/lazy_payload.py | Less verbose way to specify Payloads, at the cost of performance. |
dataclass | ipv8/messaging/payload_dataclass.py | Use dataclasses to send messages, at the cost of control and performance. |
Other than the dataclass, each of these serializable classes specifies a list of primitive data types it will serialize to and from.
The primitive data types are specified in the data types Section.
Each serializable class has to specify the following class members (dataclass does this automatically):
member | description |
---|---|
format_list | A list containing valid data type primitive names. |
names | Only for VariablePayload classes, the instance fields to bind the data types to. |
As an example, we will now define five completely wire-format compatible messages using the four classes (the VariablePayload appears twice: once plain and once compiled with vp_compile).
Each of the messages will serialize to a (four-byte) unsigned integer followed by a (two-byte) unsigned short.
If the dataclass had used normal int types, these would have been two signed 8-byte integers instead.
Each instance will have two fields, field1 and field2, corresponding to the integer and short.

    # Needed because the classes refer to themselves in their annotations.
    from __future__ import annotations

    # Import paths follow the class table above; adjust them if your IPv8 version differs.
    from ipv8.messaging.lazy_payload import VariablePayload, vp_compile
    from ipv8.messaging.payload import Payload
    from ipv8.messaging.payload_dataclass import dataclass, type_from_format
    from ipv8.messaging.serialization import Serializable


    class MySerializable(Serializable):
        format_list = ['I', 'H']

        def __init__(self, field1: int, field2: int) -> None:
            self.field1 = field1
            self.field2 = field2

        def to_pack_list(self) -> list[tuple]:
            return [('I', self.field1),
                    ('H', self.field2)]

        @classmethod
        def from_unpack_list(cls: type[MySerializable],
                             field1: int, field2: int) -> MySerializable:
            return cls(field1, field2)


    class MyPayload(Payload):
        format_list = ['I', 'H']

        def __init__(self, field1: int, field2: int) -> None:
            self.field1 = field1
            self.field2 = field2

        def to_pack_list(self) -> list[tuple]:
            return [('I', self.field1),
                    ('H', self.field2)]

        @classmethod
        def from_unpack_list(cls: type[MyPayload],
                             field1: int, field2: int) -> MyPayload:
            return cls(field1, field2)


    class MyVariablePayload(VariablePayload):
        format_list = ['I', 'H']
        names = ['field1', 'field2']


    @vp_compile
    class MyCVariablePayload(VariablePayload):
        format_list = ['I', 'H']
        names = ['field1', 'field2']


    # Type aliases that make the dataclass use the 'I' and 'H' formats.
    I = type_from_format('I')
    H = type_from_format('H')


    @dataclass
    class MyDataclassPayload:
        field1: I
        field2: H

To show some of the differences, let’s check out the output of the following script using these definitions:

    serializable1 = MySerializable(1, 2)
    serializable2 = MyPayload(1, 2)
    serializable3 = MyVariablePayload(1, 2)
    serializable4 = MyCVariablePayload(1, 2)
    serializable5 = MyDataclassPayload(1, 2)

    print("As string:")
    print(serializable1)
    print(serializable2)
    print(serializable3)
    print(serializable4)
    print(serializable5)

This prints:

    As string:
    <__main__.MySerializable object at 0x7f732a23c1f0>
    MyPayload
    | field1: 1
    | field2: 2
    MyVariablePayload
    | field1: 1
    | field2: 2
    MyCVariablePayload
    | field1: 1
    | field2: 2
    MyDataclassPayload
    | field1: 1
    | field2: 2

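As a rough illustration of the shared wire format (this is not IPv8's own code), the standard struct module can reproduce what the format list ['I', 'H'] is expected to produce for the values 1 and 2. The sketch assumes the default packers behave like plain big-endian struct formats, as the data types Section describes.

```python
import struct

# 'I' is a 4-byte unsigned integer and 'H' a 2-byte unsigned short, both big-endian.
wire = struct.pack(">I", 1) + struct.pack(">H", 2)

print(wire.hex())  # 000000010002
print(len(wire))   # 6

# Plain ints in a dataclass payload would map to signed 8-byte integers instead.
plain_int_size = struct.calcsize(">qq")
print(plain_int_size)  # 16
```

The six bytes versus sixteen bytes show why the explicit I and H types matter for the dataclass variant.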
Datatypes
Next to the unsigned integer and unsigned short data types, the default Serializer has many more data types to offer.
The following table lists all data types available by default; all values are big-endian and most follow the default Python struct format.
A Serializer can be extended with additional data types by calling serializer.add_packer(name, packer), where packer represents the object responsible for (un)packing the data type. The most commonly used packer is DefaultStruct, which can be used with arbitrary struct formats (for example serializer.add_packer("I", DefaultStruct(">I"))).
member | bytes | unserialized type |
---|---|---|
? | 1 | boolean |
B | 1 | unsigned byte |
BBH | 4 | [unsigned byte, unsigned byte, unsigned short] |
BH | 3 | [unsigned byte, unsigned short] |
c | 1 | signed byte |
f | 4 | signed float |
d | 8 | signed double |
H | 2 | unsigned short |
HH | 4 | [unsigned short, unsigned short] |
I | 4 | unsigned integer |
l | 4 | signed long |
LL | 8 | [unsigned long, unsigned long] |
q | 8 | signed long long |
Q | 8 | unsigned long long |
QH | 10 | [unsigned long long, unsigned short] |
QL | 12 | [unsigned long long, unsigned long] |
QQHHBH | 23 | [unsigned long long, unsigned long long, unsigned short, unsigned short, unsigned byte, unsigned short] |
ccB | 3 | [signed byte, signed byte, unsigned byte] |
4SH | 6 | [str (length 4), unsigned short] |
20s | 20 | str (length 20) |
32s | 32 | str (length 32) |
64s | 64 | str (length 64) |
74s | 74 | str (length 74) |
c20s | 21 | [unsigned byte, str (length 20)] |
bits | 1 | [bit 0, bit 1, bit 2, bit 3, bit 4, bit 5, bit 6, bit 7] |
ipv4 | 6 | [str (length 7-15), unsigned short] |
raw | ? | str (length ?) |
varlenBx2 | 1 + ? * 2 | [str (length = 2), ... ] (length < 256) |
varlenH | 2 + ? | str (length ? < 65536) |
varlenHutf8 | 2 + ? | str (encoded length ? < 65536) |
varlenHx20 | 2 + ? * 20 | [str (length = 20), ... ] (length < 65536) |
varlenH-list | 1 + ? * (2 + ??) | [str (length < 65536)] (length < 256) |
varlenI | 4 + ? | str (length ? < 4294967296) |
doublevarlenH | 2 + ? | str (length ? < 65536) |
payload | 2 + ? | Serializable |
payload-list | ? | [Serializable] |
arrayH-? | 2 + ? * 1 | [bool] |
arrayH-q | 2 + ? * 8 | [int] |
arrayH-d | 2 + ? * 8 | [float] |
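To make the "2 + ?" byte counts concrete, here is a minimal standard-library sketch of a length-prefixed field in the style of varlenH. The helper names pack_varlen_h and unpack_varlen_h are made up for this illustration and the layout (a big-endian unsigned short length followed by the data) is an assumption based on the table, not a copy of IPv8's packer.

```python
from __future__ import annotations

import struct


def pack_varlen_h(data: bytes) -> bytes:
    """Prefix the data with its length as a big-endian unsigned short."""
    return struct.pack(">H", len(data)) + data


def unpack_varlen_h(buffer: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Return the field and the offset of the first byte after it."""
    length, = struct.unpack_from(">H", buffer, offset)
    start = offset + 2
    end = start + length
    return buffer[start:end], end


packed = pack_varlen_h(b"hello")
print(packed.hex())  # 000568656c6c6f

value, next_offset = unpack_varlen_h(packed)
print(value, next_offset)  # b'hello' 7
```

The two-byte length prefix is what limits such a field to fewer than 65536 bytes.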
Some of these data types represent common usage of serializable classes:
member | description |
---|---|
4SH | (IP, port) tuples |
20s | SHA-1 hashes |
32s | libnacl signatures |
64s | libnacl public keys |
74s | libnacl public keys with prefix |
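The six bytes listed for (IP, port) tuples can be pictured as four address bytes followed by a big-endian unsigned short. The following sketch uses only the standard library; pack_address and unpack_address are hypothetical helpers, and whether IPv8 performs the conversion in exactly this way is an assumption.

```python
from __future__ import annotations

import socket
import struct


def pack_address(ip: str, port: int) -> bytes:
    """Encode an IPv4 address and port into 4 + 2 bytes."""
    return socket.inet_aton(ip) + struct.pack(">H", port)


def unpack_address(data: bytes) -> tuple[str, int]:
    """Decode the 6-byte form back into an (ip, port) tuple."""
    port, = struct.unpack(">H", data[4:6])
    return socket.inet_ntoa(data[:4]), port


packed = pack_address("127.0.0.1", 8090)
print(packed.hex())            # 7f0000011f9a
print(unpack_address(packed))  # ('127.0.0.1', 8090)
```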
Special instances are the raw and payload data types.
- raw: can only be used as the last element in a format list as it will consume the remainder of the input string (avoid if possible).
- payload: will nest another Serializable instance into this instance. When used, the format_list should specify the class of the nested Serializable and the to_pack_list() output should give a tuple of ("payload", the_nested_instance). The VariablePayload automatically infers the to_pack_list() for you. See the NestedPayload class definition for more info.
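The reason raw has to come last can be seen in a small standard-library sketch: a raw field carries no length information, so the only thing an unpacker can do is take everything that is left. The function name unpack_h_then_raw is hypothetical and the code only mimics a format list of ['H', 'raw'].

```python
from __future__ import annotations

import struct


def unpack_h_then_raw(data: bytes) -> tuple[int, bytes]:
    """Read one unsigned short, then treat all remaining bytes as the raw field."""
    number, = struct.unpack_from(">H", data, 0)
    # No length prefix exists, so nothing could be parsed after this field.
    return number, data[2:]


packet = struct.pack(">H", 7) + b"anything at all"
print(unpack_h_then_raw(packet))  # (7, b'anything at all')
```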
The ez_pack family for Community classes
All subclasses of the EZPackOverlay class (most commonly subclasses of the Community class) have a short-cut for serializing messages belonging to the particular overlay.
This standardizes the prefix and message ids of overlays.
Concretely, it uses the first 23 bytes of each packet to handle versioning and routing (demultiplexing) packets to the correct overlay.
The ezr_pack method of EZPackOverlay subclasses takes an (integer) message number and a variable amount of Serializable instances.
Optionally, you can choose not to have the message signed: supply the keyword argument sig=True for a signature over the packet, or sig=False for no signature.
The lazy_wrapper and lazy_wrapper_unsigned decorators can then respectively be used for unserializing payloads which are signed or not signed.
Simply pass the payload classes you wish to unserialize to as arguments to the decorator.
As some internal messages and deprecated messages use some of the message range, you have the message identifiers from 0 through 234 available for your custom message definitions.
Once you register the message handler and have the appropriate decorator on the specified handler method, your overlay can communicate with the Internet.
In practice, given a COMMUNITY_ID and the payload definitions MyMessagePayload1 and MyMessagePayload2, this will look something like this example (see the overlay tutorial for a complete runnable example):

    class MyCommunity(Community):
        community_id = COMMUNITY_ID

        def __init__(self, settings: CommunitySettings) -> None:
            super().__init__(settings)
            # Route packets with message number 1 to on_message().
            self.add_message_handler(1, self.on_message)

        @lazy_wrapper(MyMessagePayload1, MyMessagePayload2)
        def on_message(self, peer: Peer, payload1: MyMessagePayload1,
                       payload2: MyMessagePayload2) -> None:
            print("Got a message from:", peer)
            print("The message includes the first payload:\n", payload1)
            print("The message includes the second payload:\n", payload2)

        def send_message(self, peer: Peer) -> None:
            packet = self.ezr_pack(1, MyMessagePayload1(), MyMessagePayload2())
            # Hand the serialized packet to the endpoint to actually send it.
            self.endpoint.send(peer.address, packet)

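To picture what the 23 header bytes do, the following standard-library sketch demultiplexes raw packets by hand. It assumes the header consists of a 22-byte overlay prefix followed by a single message identifier byte; the names dispatch, PREFIX_LENGTH and the all-zero placeholder prefix are made up for this illustration and are not part of IPv8.

```python
from __future__ import annotations

from typing import Callable

HEADER_LENGTH = 23                 # header size named in the text above
PREFIX_LENGTH = HEADER_LENGTH - 1  # assumption: the last header byte is the message id


def dispatch(data: bytes, my_prefix: bytes,
             handlers: dict[int, Callable[[bytes], None]]) -> bool:
    """Call the handler registered for this packet, if it belongs to our overlay."""
    if len(data) < HEADER_LENGTH:
        return False
    prefix = data[:PREFIX_LENGTH]
    message_id = data[PREFIX_LENGTH]  # indexing bytes yields an int
    body = data[HEADER_LENGTH:]
    if prefix != my_prefix or message_id not in handlers:
        return False
    handlers[message_id](body)
    return True


received: list[bytes] = []
my_prefix = bytes(PREFIX_LENGTH)  # placeholder prefix of 22 zero bytes
handlers = {1: received.append}

handled = dispatch(my_prefix + b"\x01" + b"payload bytes", my_prefix, handlers)
foreign = dispatch(b"\xff" * PREFIX_LENGTH + b"\x01" + b"other overlay", my_prefix, handlers)
unknown = dispatch(my_prefix + b"\x02" + b"no handler", my_prefix, handlers)
print(handled, foreign, unknown, received)  # True False False [b'payload bytes']
```

In the real overlay, add_message_handler and the lazy_wrapper decorators take care of this bookkeeping for you.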
It is recommended (but not obligatory) to have single payload messages store the message identifier inside the Payload.msg_id field, as this improves readability:

    self.add_message_handler(MyMessage1, self.on_message1)
    self.add_message_handler(MyMessage2, self.on_message2)
    self.ez_send(peer, MyMessage1(42))
    self.ez_send(peer, MyMessage2(7))

If you are using the @dataclass wrapper you can specify the message identifier through an argument instead.
For example, @dataclass(msg_id=42) would set the message identifier to 42.
Of course, IPv8 also ships with various Community subclasses of its own, if you need inspiration.
Using external serialization options
IPv8 is compatible with pretty much all third-party message serialization packages.
However, before hooking one of these packages into IPv8 you may want to ask yourself whether you have fallen victim to marketing hype.
After all, XML is the one unifying standard we will never switch away from, right?
Oh wait, no, it’s JSON.
My bad, it’s Protobuf.
Or was it ASN.1?
You get the point.
In this world, only the core IPv8 serialization format remains constant.
There are three main ways to hook in external serialization: per message, per Serializer and per Community. The three methods can be freely mixed.
Custom serialization per message
If you only want to use custom serialization for (part of) a single overlay message, you can use VariablePayload field modification (this also works for dataclass payloads).
This method involves implementing the methods fix_pack_<your field name> and fix_unpack_<your field name> for the fields of your message that use custom serialization.
Check out the following example:

    import json

    # The IPv8 imports (VariablePayload, vp_compile, dataclass) are omitted here.


    @vp_compile
    class VPMessageKeepDict(VariablePayload):
        msg_id = 1
        format_list = ['varlenH']
        names = ["dictionary"]

        def fix_pack_dictionary(self, the_dictionary: dict) -> bytes:
            return json.dumps(the_dictionary).encode()

        @classmethod
        def fix_unpack_dictionary(cls: type[VPMessageKeepDict],
                                  serialized_dictionary: bytes) -> dict:
            return json.loads(serialized_dictionary.decode())


    @dataclass(msg_id=2)
    class DCMessageKeepDict:
        dictionary: str

        def fix_pack_dictionary(self, the_dictionary: dict) -> str:
            return json.dumps(the_dictionary)

        @classmethod
        def fix_unpack_dictionary(cls: type[DCMessageKeepDict],
                                  serialized_dictionary: str) -> dict:
            return json.loads(serialized_dictionary)

In both classes we create a message with a single field dictionary.
To pack this field, we use json.dumps() to create a string representation of the dictionary.
When loading a message, json.loads() is used to create a dictionary from the serialized data.
Instead of json you could also use any serialization of your liking.
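Whatever serialization you pick, it is worth checking that a value survives the round trip through your pack and unpack functions. The sketch below does this for the JSON transformation outside of IPv8; pack_dictionary and unpack_dictionary are stand-ins for the fix_pack_dictionary and fix_unpack_dictionary methods described here.

```python
import json


def pack_dictionary(the_dictionary: dict) -> bytes:
    """Same transformation as the VariablePayload fix_pack_dictionary example."""
    return json.dumps(the_dictionary).encode()


def unpack_dictionary(serialized: bytes) -> dict:
    """Same transformation as the VariablePayload fix_unpack_dictionary example."""
    return json.loads(serialized.decode())


original = {"a": 1, "b": [1, 2, 3]}
print(unpack_dictionary(pack_dictionary(original)) == original)  # True

# JSON only knows string keys, so integer keys come back as strings.
with_int_keys = {1: "one"}
print(unpack_dictionary(pack_dictionary(with_int_keys)))  # {'1': 'one'}
```

The second case is a property of JSON itself and a reminder that the receiving side sees whatever the chosen format can represent, not necessarily the exact object that was sent.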
Using the same transformations for all fields makes your payloads very lengthy. In this case, you may want to look into specifying a custom serialization format.
Custom serialization formats
It is possible to specify new formats by adding packing formats to a Serializer instance.
You can easily do so by overriding your Community.get_serializer() method.
This Serializer is sandboxed per Community instance, so you don’t have to worry about breaking other instances.
Check out the following example and note that the message definition is now much smaller, at the expense of having to define a custom (complicated) packing format.

    import json
    import struct
    from typing import Any

    # The IPv8 imports (Community, Packer, Serializer, VariablePayload, vp_compile) are omitted here.


    @vp_compile
    class Message(VariablePayload):
        msg_id = 1
        format_list = ['json', 'json', 'json', 'json']
        names = ["d1", "d2", "d3", "d4"]


    class PackerJSON(Packer):

        def pack(self, data: Any) -> bytes:
            packed = json.dumps(data).encode()
            # Prefix the JSON data with its length (big-endian unsigned short).
            size = struct.pack(">H", len(packed))
            return size + packed

        def unpack(self, data: bytes, offset: int,
                   unpack_list: list, *args: Any) -> int:
            size, = struct.unpack_from(">H", data, offset)
            json_data_start = offset + 2
            json_data_end = json_data_start + size
            serialized = data[json_data_start:json_data_end]
            unpack_list.append(json.loads(serialized))
            # Return the offset of the first byte after this field.
            return json_data_end


    class MyCommunity(Community):

        def get_serializer(self) -> Serializer:
            serializer = super().get_serializer()
            serializer.add_packer('json', PackerJSON())
            return serializer

The line serializer.add_packer('json', PackerJSON()) adds the new format json that is used in Message.
In fact, any further message added to this Community can now use the json format.
However, you may also note some additional complexity in the PackerJSON class.
Our custom packer PackerJSON implements two required methods: pack() and unpack().
The former serializes data using custom serialization (json.dumps() in this case).
We use a big-endian unsigned short (">H") to store the length of the serialized JSON data.
The unpack() method recreates objects from the serialized JSON data, adding each object to the unpack_list list and returning the new offset in the data stream.
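The offset handling is easiest to follow when the packing logic is run on its own. This sketch copies the body of pack() and unpack() into two free functions (pack_json and unpack_json, names chosen for this illustration) so that it runs without IPv8, and walks through a stream of four fields the way a four-field json message would be read.

```python
from __future__ import annotations

import json
import struct
from typing import Any


def pack_json(data: Any) -> bytes:
    """Length-prefix the JSON encoding of data, as PackerJSON.pack() does."""
    packed = json.dumps(data).encode()
    return struct.pack(">H", len(packed)) + packed


def unpack_json(data: bytes, offset: int, unpack_list: list) -> int:
    """Append one decoded object to unpack_list and return the next offset."""
    size, = struct.unpack_from(">H", data, offset)
    start = offset + 2
    end = start + size
    unpack_list.append(json.loads(data[start:end]))
    return end


values = [{"a": 1}, [1, 2], "text", None]
stream = b"".join(pack_json(value) for value in values)

unpacked: list = []
offset = 0
while offset < len(stream):
    offset = unpack_json(stream, offset, unpacked)

print(unpacked == values)  # True
```

Because the length is stored in an unsigned short, pack_json raises struct.error for JSON data longer than 65535 bytes; a larger prefix such as ">I" would be needed for bigger fields.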
Custom Community data handling
It is possible to circumvent IPv8 message formats altogether.
In its most extreme form, you can override Community.on_packet(packet) to inspect all raw data sent to your Community instance.
The packet is a tuple of (source_address, data).
You can write raw data back to an address using self.endpoint.send(address, data).
If you want to mix raw packets with other messages of your overlay, you should still use the message byte. The following example shows how to use JSON serialization without any IPv8 serialization. Note that we need to do our own signature checks now.

    import json
    import os
    from binascii import unhexlify

    # The IPv8 imports (Address, Community, CommunitySettings, Peer, to_hex) are omitted here.


    class MyCommunity(Community):
        community_id = os.urandom(20)

        def __init__(self, settings: CommunitySettings) -> None:
            super().__init__(settings)
            self.event = None
            self.add_message_handler(1, self.on_message)

        def send_message(self, peer: Peer) -> None:
            message = json.dumps({"key": "value", "key2": "value2"})
            public_key = to_hex(self.my_peer.public_key.key_to_bin())
            signature = to_hex(self.my_peer.key.signature(message.encode()))
            signed_message = json.dumps({"message": message,
                                         "public_key": public_key,
                                         "signature": signature}).encode()
            # Prefix and message byte keep the packet routable to this overlay.
            self.endpoint.send(peer.address,
                               self.get_prefix() + b'\x01' + signed_message)

        def on_message(self, source_address: Address, data: bytes) -> None:
            # Account for 1 byte message id
            header_length = len(self.get_prefix()) + 1
            # Strip the IPv8 multiplexing data
            received = json.loads(data[header_length:])
            public_key = self.crypto.key_from_public_bin(unhexlify(received["public_key"]))
            valid = self.crypto.is_valid_signature(public_key,
                                                   received["message"].encode(),
                                                   unhexlify(received["signature"]))
            self.logger.info("Received message %s from %s, the signature is %s!",
                             received['message'], source_address, valid)

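The shape of such a do-it-yourself check (serialize the inner message, sign those exact bytes, hex-encode the binary parts into an outer JSON envelope, and verify the same bytes on receipt) can be tried out without IPv8. Note that the sketch below swaps the public-key signature for an HMAC with a shared demo key purely so that it runs with the standard library; SHARED_KEY, wrap and unwrap are hypothetical names and an HMAC does not offer the same guarantees as the signatures used in the example above.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from binascii import hexlify, unhexlify

SHARED_KEY = b"demo key"  # stand-in only: IPv8 would use the peer's key pair instead


def sign(message: bytes) -> bytes:
    """HMAC-SHA256 as a placeholder for a real public-key signature."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()


def wrap(message: str) -> bytes:
    """Put the message and its hex-encoded signature into a JSON envelope."""
    signature = hexlify(sign(message.encode())).decode()
    return json.dumps({"message": message, "signature": signature}).encode()


def unwrap(data: bytes) -> tuple[str, bool]:
    """Return the inner message and whether its signature checks out."""
    received = json.loads(data)
    expected = sign(received["message"].encode())
    valid = hmac.compare_digest(expected, unhexlify(received["signature"]))
    return received["message"], valid


wrapped = wrap(json.dumps({"key": "value"}))
message, valid = unwrap(wrapped)
print(valid)  # True

# Changing the inner message after signing invalidates the check.
tampered = wrapped.replace(b"value", b"VALUE")
print(unwrap(tampered)[1])  # False
```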
Nested Payloads
It is possible to put a Payload inside another Payload.
We call these nested payloads.
You can specify them by using the "payload" datatype and setting the Payload class in the format list.
For a VariablePayload this looks like the following example.

    class A(VariablePayload):
        format_list = ['I', 'H']
        names = ["foo", "bar"]


    class B(VariablePayload):
        format_list = [A, 'H']  # Note that we pass the class A
        names = ["a", "baz"]

For dataclass payloads this nesting is supported by simply specifying nested classes as follows.

    @dataclass(msg_id=1)
    class Message:
        @dataclass
        class Item:
            foo: int
            bar: int

        item: Item
        items: [Item]  # Yes, you can even make this a list!
        baz: int

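For the VariablePayload classes A and B above, the nesting can be pictured on the wire with a short standard-library sketch. It assumes that a nested payload is written as a two-byte length followed by the serialized inner payload, which is how the "2 + ?" entry for the payload data type reads; pack_a and pack_b are hypothetical helpers, not IPv8 functions.

```python
import struct


def pack_a(foo: int, bar: int) -> bytes:
    """Format list ['I', 'H']: unsigned integer followed by unsigned short."""
    return struct.pack(">IH", foo, bar)


def pack_b(a: bytes, baz: int) -> bytes:
    """Format list [A, 'H']: length-prefixed nested payload, then unsigned short."""
    # Assumption: the nested payload is preceded by its length as an unsigned short.
    return struct.pack(">H", len(a)) + a + struct.pack(">H", baz)


packed = pack_b(pack_a(1, 2), 3)
print(packed.hex())  # 00060000000100020003
print(len(packed))   # 10
```

Read from left to right, the bytes are the length of A (6), the fields foo and bar, and finally baz.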