construct: add adapter Utf8Adapter to safely interpret utf8 text
Uninitialized Files, File records, or fields within a File record or File
usually contain a string of 0xff bytes. This becomes a problem when the
content is normally encoded/decoded as utf8 by the construct parser: the
parser throws an exception when it tries to decode the 0xff string as
utf8. This is an especially serious problem in pySim-trace, where an
exception stops the parser.
Let's fix this by interpreting a string consisting only of 0xff bytes as
an empty string.
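The decoding rule can be illustrated with a small stand-alone sketch. The
helper names decode_utf8_field/encode_utf8_field are hypothetical; they
mirror Utf8Adapter._decode/_encode but do not depend on the construct
library:

```python
import codecs


def decode_utf8_field(raw: bytes) -> str:
    """Hypothetical helper mirroring Utf8Adapter._decode.

    A field made up only of 0xff bytes (the uninitialized state) is
    treated as an empty string; anything else is decoded as UTF-8 and
    may still raise UnicodeDecodeError.
    """
    # b'\xff' * len(raw) builds a 0xff run of the same length; an empty
    # input also compares equal and therefore yields "".
    if raw == b'\xff' * len(raw):
        return ""
    return codecs.decode(raw, "utf-8")


def encode_utf8_field(text: str) -> bytes:
    """Hypothetical helper mirroring Utf8Adapter._encode."""
    return codecs.encode(text, "utf-8")


if __name__ == "__main__":
    print(repr(decode_utf8_field(b'\xff' * 8)))               # ''
    print(repr(decode_utf8_field("Grüße".encode("utf-8"))))   # 'Grüße'
    try:
        # A partially written field padded with 0xff is not covered.
        decode_utf8_field(b'abc\xff\xff')
    except UnicodeDecodeError as exc:
        print("still fails:", exc.reason)
```

Note that under this rule only fields consisting entirely of 0xff are
mapped to ""; a field such as b'abc\xff\xff' still raises
UnicodeDecodeError.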
Related: OS#6094
Change-Id: Id114096ccb8b7ff8fcc91e1ef3002526afa09cb7
diff --git a/pySim/construct.py b/pySim/construct.py
index ab44a63..af96b49 100644
--- a/pySim/construct.py
+++ b/pySim/construct.py
@@ -6,6 +6,7 @@
from construct.lib import integertypes
from pySim.utils import b2h, h2b, swap_nibbles
import gsm0338
+import codecs
"""Utility code related to the integration of the 'construct' declarative parser."""
@@ -34,6 +35,18 @@
     def _encode(self, obj, context, path):
         return h2b(obj)
+class Utf8Adapter(Adapter):
+    """convert a bytes() type that contains utf8 encoded text to human readable text."""
+
+    def _decode(self, obj, context, path):
+        # In case the string contains only 0xff bytes we interpret it as an empty string
+        if obj == b'\xff' * len(obj):
+            return ""
+        return codecs.decode(obj, "utf-8")
+
+    def _encode(self, obj, context, path):
+        return codecs.encode(obj, "utf-8")
+
 class BcdAdapter(Adapter):
     """convert a bytes() type to a string of BCD nibbles."""