| OLD | NEW |
| (Empty) |
| 1 <html> | |
| 2 <title> | |
| 3 PyASN1 codecs | |
| 4 </title> | |
| 5 <head> | |
| 6 </head> | |
| 7 <body> | |
| 8 <center> | |
| 9 <table width=60%> | |
| 10 <tr> | |
| 11 <td> | |
| 12 <h3> | |
| 13 2. PyASN1 Codecs | |
| 14 </h3> | |
| 15 | |
| 16 <p> | |
| 17 In the ASN.1 context, a | |
| 18 <a href=http://en.wikipedia.org/wiki/Codec>codec</a> | |
| 19 is a program that transforms between concrete data structures and a stream | |
| 20 of octets, suitable for transmission over the wire. This serialized form of | |
| 21 data is sometimes called <i>substrate</i> or <i>essence</i>. | |
| 22 </p> | |
| 23 | |
| 24 <p> | |
| 25 In the pyasn1 implementation, substrate takes the shape of Python 3 bytes or | |
| 26 Python 2 string objects. | |
| 27 </p> | |
| 28 | |
| 29 <p> | |
| 30 One of the properties of a codec is its ability to cope with incomplete | |
| 31 data and/or substrate, which implies that the codec is stateful. In other words, | |
| 32 when the decoder runs out of substrate while the data item being recovered is still | |
| 33 incomplete, a stateful decoder suspends and completes the recovery of that item | |
| 34 once the rest of the substrate becomes available. Similarly, a stateful encoder | |
| 35 encodes data items in multiple steps, waiting for source data to | |
| 36 arrive. Codec restartability is especially important when an application deals | |
| 37 with large volumes of data and/or runs on little RAM. For an interesting | |
| 38 discussion of codec options and design choices, refer to | |
| 39 <a href=http://directory.apache.org/subprojects/asn1/>Apache ASN.1 project</a> | |
| 40 . | |
| 41 </p> | |
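pyasn1's own codecs are stateless, but the idea of a restartable decoder can be sketched in a few lines. The class below is a hypothetical illustration in plain Python, not part of pyasn1: it buffers incoming substrate and returns only complete TLV (tag, length, value) items, suspending while an item is still incomplete. It assumes single-octet tags and definite lengths.

```python
class RestartableTLVReader:
    """Hypothetical stateful reader: buffers substrate until a whole TLV arrives.

    Assumptions: single-octet identifiers (tag number below 31) and
    definite-length encodings only.
    """

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> list:
        """Append more substrate and return every TLV that is now complete."""
        self._buffer += chunk
        complete = []
        while True:
            size = self._frame_size()
            if size is None or len(self._buffer) < size:
                break  # item still incomplete: suspend until more data arrives
            complete.append(self._buffer[:size])
            self._buffer = self._buffer[size:]
        return complete

    def _frame_size(self):
        """Total TLV size, or None while the header itself is incomplete."""
        buf = self._buffer
        if len(buf) < 2:
            return None
        first = buf[1]
        if first < 0x80:  # short-form length
            return 2 + first
        num_octets = first & 0x7F
        if num_octets == 0:
            raise ValueError("indefinite length is not handled in this sketch")
        if len(buf) < 2 + num_octets:
            return None
        return 2 + num_octets + int.from_bytes(buf[2:2 + num_octets], "big")


reader = RestartableTLVReader()
print(reader.feed(b"\x02\x03\x01"))   # []
print(reader.feed(b"\xe2@\x01\x01"))  # [b'\x02\x03\x01\xe2@']
print(reader.feed(b"\xff"))           # [b'\x01\x01\xff']
```

The substrate is cut at arbitrary points, yet each data item is delivered exactly once, as soon as its last octet arrives.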
| 42 | |
| 43 <p> | |
| 44 As of this writing, codecs implemented in pyasn1 are all stateless, mostly | |
| 45 to keep the code simple. | |
| 46 </p> | |
| 47 | |
| 48 <p> | |
| 49 The pyasn1 package currently supports the | |
| 50 <a href=http://en.wikipedia.org/wiki/Basic_encoding_rules>BER</a> codec and | |
| 51 its variants, | |
| 52 <a href=http://en.wikipedia.org/wiki/Canonical_encoding_rules>CER</a> and | |
| 53 <a href=http://en.wikipedia.org/wiki/Distinguished_encoding_rules>DER</a>. | |
| 54 More ASN.1 codecs are planned for implementation in the future. | |
| 55 </p> | |
| 56 | |
| 57 <a name="2.1"></a> | |
| 58 <h4> | |
| 59 2.1 Encoders | |
| 60 </h4> | |
| 61 | |
| 62 <p> | |
| 63 The encoder transforms pyasn1 value objects into substrate. Only | |
| 64 pyasn1 value objects can be serialized; attempts to process pyasn1 type | |
| 65 objects will cause the encoder to fail. | |
| 66 </p> | |
| 67 | |
| 68 <p> | |
| 69 The following code creates a pyasn1 Integer object and serializes it with | |
| 70 the BER encoder: | |
| 71 </p> | |
| 72 | |
| 73 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 74 <pre> | |
| 75 >>> from pyasn1.type import univ | |
| 76 >>> from pyasn1.codec.ber import encoder | |
| 77 >>> encoder.encode(univ.Integer(123456)) | |
| 78 b'\x02\x03\x01\xe2@' | |
| 79 >>> | |
| 80 </pre> | |
| 81 </td></tr></table> | |
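The five octets above follow BER's tag, length, value (TLV) layout: 0x02 is the universal INTEGER tag, 0x03 the length, and 01 E2 40 is 123456 in minimal two's complement (0x40 prints as '@'). A minimal plain-Python sketch of that layout, assuming definite lengths; encode_length and encode_integer are hypothetical names, not pyasn1 functions:

```python
def encode_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_integer(value: int) -> bytes:
    """Encode an ASN.1 INTEGER as a universal, primitive TLV (tag 0x02)."""
    # Minimal two's-complement content: just enough octets to hold the sign bit.
    size = (value + (value < 0)).bit_length() // 8 + 1
    content = value.to_bytes(size, "big", signed=True)
    return b"\x02" + encode_length(len(content)) + content


print(encode_integer(123456))  # b'\x02\x03\x01\xe2@'
```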
| 82 | |
| 83 <p> | |
| 84 The BER standard also defines a so-called <i>indefinite length</i> encoding form, | |
| 85 which makes processing of large data items more memory efficient. It is mostly | |
| 86 useful when the encoder does not have the whole value at once and the | |
| 87 length of the value cannot be determined at the beginning of encoding. | |
| 88 </p> | |
| 89 | |
| 90 <p> | |
| 91 <i>Constructed encoding</i> is another feature of BER closely related to the | |
| 92 indefinite length form. In essence, a large scalar value (such as an ASN.1 | |
| 93 OCTET STRING, BIT STRING or character string) can be chopped into smaller | |
| 94 chunks by the encoder and transmitted incrementally to limit memory consumption. | |
| 95 Unlike the indefinite length case, the length of the whole value must be known | |
| 96 in advance when using the constructed, definite length encoding form. | |
| 97 </p> | |
| 98 | |
| 99 <p> | |
| 100 Since pyasn1 codecs are not restartable, the pyasn1 encoder can only encode a data | |
| 101 item all at once. However, even in this case, generating the indefinite length | |
| 102 encoding may help a low-memory receiver running a restartable decoder | |
| 103 to process a large data item. | |
| 104 </p> | |
| 105 | |
| 106 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 107 <pre> | |
| 108 >>> from pyasn1.type import univ | |
| 109 >>> from pyasn1.codec.ber import encoder | |
| 110 >>> encoder.encode( | |
| 111 ... univ.OctetString('The quick brown fox jumps over the lazy dog'), | |
| 112 ... defMode=False, | |
| 113 ... maxChunkSize=8 | |
| 114 ... ) | |
| 115 b'$\x80\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \ | |
| 116 t\x04\x08he lazy \x04\x03dog\x00\x00' | |
| 117 >>> | |
| 118 >>> encoder.encode( | |
| 119 ... univ.OctetString('The quick brown fox jumps over the lazy dog'), | |
| 120 ... maxChunkSize=8 | |
| 121 ... ) | |
| 122 b'$7\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \ | |
| 123 t\x04\x08he lazy \x04\x03dog' | |
| 124 </pre> | |
| 125 </td></tr></table> | |
| 126 | |
| 127 <p> | |
| 128 Passing <b>defMode</b>=False to the encoder disables the definite length encoding mode, | |
| 129 while the optional <b>maxChunkSize</b> parameter specifies the desired | |
| 130 substrate chunk size, which influences memory requirements at the decoder's end. | |
| 131 </p> | |
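The substrate produced above can be reproduced by hand, which shows what defMode and maxChunkSize change on the wire. This is a minimal sketch in plain Python, not pyasn1's encoder: 0x24 is the constructed OCTET STRING identifier, each chunk is a primitive OCTET STRING (0x04), and the indefinite form uses length octet 0x80 followed by the end-of-contents octets 00 00. The function names are hypothetical.

```python
def encode_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_octet_string_chunked(data: bytes, chunk_size: int, definite: bool) -> bytes:
    """Constructed OCTET STRING: outer tag 0x24 wrapping primitive 0x04 chunks."""
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    chunks = b"".join(b"\x04" + encode_length(len(p)) + p for p in pieces)
    if definite:
        # The total length of all chunks must be known before the first octet goes out.
        return b"\x24" + encode_length(len(chunks)) + chunks
    # Indefinite form: length octet 0x80, terminated by end-of-contents octets 00 00.
    return b"\x24\x80" + chunks + b"\x00\x00"


text = b"The quick brown fox jumps over the lazy dog"
indefinite = encode_octet_string_chunked(text, 8, definite=False)
definite = encode_octet_string_chunked(text, 8, definite=True)
print(indefinite[:2], indefinite[-2:])  # b'$\x80' b'\x00\x00'
print(definite[:2])                     # b'$7'
```

The definite form has to compute the 55-octet total (0x37, printed as '7') up front, while the indefinite form can start emitting chunks immediately.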
| 132 | |
| 133 <p> | |
| 134 To use the CER or DER encoders, import and call them explicitly; their | |
| 135 APIs are all compatible. | |
| 136 </p> | |
| 137 | |
| 138 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 139 <pre> | |
| 140 >>> from pyasn1.type import univ | |
| 141 >>> from pyasn1.codec.ber import encoder as ber_encoder | |
| 142 >>> from pyasn1.codec.cer import encoder as cer_encoder | |
| 143 >>> from pyasn1.codec.der import encoder as der_encoder | |
| 144 >>> ber_encoder.encode(univ.Boolean(True)) | |
| 145 b'\x01\x01\x01' | |
| 146 >>> cer_encoder.encode(univ.Boolean(True)) | |
| 147 b'\x01\x01\xff' | |
| 148 >>> der_encoder.encode(univ.Boolean(True)) | |
| 149 b'\x01\x01\xff' | |
| 150 >>> | |
| 151 </pre> | |
| 152 </td></tr></table> | |
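The only difference in the output above is the content octet for TRUE: BER accepts any non-zero octet, while CER and DER require exactly 0xFF. A hypothetical plain-Python sketch of that rule (the choice of 0x01 for plain BER simply mirrors the output above and is one of many valid BER encodings):

```python
def encode_boolean(value: bool, canonical: bool) -> bytes:
    """BOOLEAN TLV (tag 0x01, length 1).

    canonical=True follows the CER/DER rule (TRUE is 0xFF);
    canonical=False emits 0x01, one of many valid BER encodings of TRUE.
    """
    if not value:
        content = b"\x00"
    elif canonical:
        content = b"\xff"
    else:
        content = b"\x01"
    return b"\x01\x01" + content


print(encode_boolean(True, canonical=False))  # b'\x01\x01\x01'
print(encode_boolean(True, canonical=True))   # b'\x01\x01\xff'
```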
| 153 | |
| 154 <a name="2.2"></a> | |
| 155 <h4> | |
| 156 2.2 Decoders | |
| 157 </h4> | |
| 158 | |
| 159 <p> | |
| 160 In the process of decoding, pyasn1 value objects are created and linked to | |
| 161 each other based on the information contained in the substrate. Thus, | |
| 162 the original pyasn1 value object(s) are recovered. | |
| 163 </p> | |
| 164 | |
| 165 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 166 <pre> | |
| 167 >>> from pyasn1.type import univ | |
| 168 >>> from pyasn1.codec.ber import encoder, decoder | |
| 169 >>> substrate = encoder.encode(univ.Boolean(True)) | |
| 170 >>> decoder.decode(substrate) | |
| 171 (Boolean('True(1)'), b'') | |
| 172 >>> | |
| 173 </pre> | |
| 174 </td></tr></table> | |
| 175 | |
| 176 <p> | |
| 177 As the snippet above shows, the pyasn1 decoder accepts substrate | |
| 178 as an argument and returns a tuple of the recovered pyasn1 value object (the | |
| 179 top-level one in case of a constructed object) and the unprocessed part | |
| 180 of the input substrate. | |
| 181 </p> | |
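The (value, remainder) contract is what lets a caller peel several items off one buffer. A minimal plain-Python sketch of that contract for a few universal primitive types, assuming single-octet tags and definite lengths; decode() here is a hypothetical stand-in, not pyasn1's decoder:

```python
def decode_length(substrate: bytes, offset: int):
    """Return (length, offset of first content octet); definite lengths only."""
    first = substrate[offset]
    if first < 0x80:
        return first, offset + 1
    num_octets = first & 0x7F
    if num_octets == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    end = offset + 1 + num_octets
    return int.from_bytes(substrate[offset + 1:end], "big"), end


def decode(substrate: bytes):
    """Decode one primitive TLV and return (value, unprocessed remainder)."""
    tag = substrate[0]
    length, start = decode_length(substrate, 1)
    end = start + length
    if len(substrate) < end:
        raise ValueError("substrate is shorter than the encoded length")
    content, rest = substrate[start:end], substrate[end:]
    if tag == 0x01:  # BOOLEAN: any non-zero octet means TRUE
        return content != b"\x00", rest
    if tag == 0x02:  # INTEGER: big-endian two's complement
        return int.from_bytes(content, "big", signed=True), rest
    if tag == 0x04:  # OCTET STRING: content octets as-is
        return content, rest
    raise ValueError(f"no built-in decoder for tag {tag:#04x}")


# Two items back to back: decode() peels off the first and returns the rest.
value, rest = decode(b"\x02\x03\x01\xe2@\x01\x01\xff")
print(value, rest)   # 123456 b'\x01\x01\xff'
print(decode(rest))  # (True, b'')
```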
| 182 | |
| 183 <p> | |
| 184 All pyasn1 decoders handle both definite and indefinite length | |
| 185 encoding modes automatically; explicitly switching from one mode | |
| 186 to the other is not required. | |
| 187 </p> | |
| 188 | |
| 189 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 190 <pre> | |
| 191 >>> from pyasn1.type import univ | |
| 192 >>> from pyasn1.codec.ber import encoder, decoder | |
| 193 >>> substrate = encoder.encode( | |
| 194 ... univ.OctetString('The quick brown fox jumps over the lazy dog'), | |
| 195 ... defMode=False, | |
| 196 ... maxChunkSize=8 | |
| 197 ... ) | |
| 198 >>> decoder.decode(substrate) | |
| 199 (OctetString(b'The quick brown fox jumps over the lazy dog'), b'') | |
| 200 >>> | |
| 201 </pre> | |
| 202 </td></tr></table> | |
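No mode switch is needed because the length octet itself tells the decoder which form it is reading (0x80 means indefinite). The recursive sketch below, plain Python and not pyasn1's decoder, handles both forms of an OCTET STRING; 0x04 and 0x24 are the universal primitive and constructed OCTET STRING identifiers, and the function names are hypothetical.

```python
def read_length(buf: bytes, pos: int):
    """Return (length, next position); length is None for the indefinite form."""
    first = buf[pos]
    if first < 0x80:
        return first, pos + 1
    num_octets = first & 0x7F
    if num_octets == 0:
        return None, pos + 1
    end = pos + 1 + num_octets
    return int.from_bytes(buf[pos + 1:end], "big"), end


def decode_octet_string(buf: bytes, pos: int = 0):
    """Decode a primitive (0x04) or constructed (0x24) OCTET STRING.

    Returns (value, position after the item). The length form is detected
    from the substrate, so the caller never says which one was used.
    """
    tag = buf[pos]
    length, pos = read_length(buf, pos + 1)
    if tag == 0x04:
        if length is None:
            raise ValueError("primitive encodings must use a definite length")
        return buf[pos:pos + length], pos + length
    if tag != 0x24:
        raise ValueError(f"unexpected tag {tag:#04x}")
    parts = []
    if length is None:
        # Indefinite form: read chunks until the end-of-contents octets 00 00.
        while buf[pos:pos + 2] != b"\x00\x00":
            part, pos = decode_octet_string(buf, pos)
            parts.append(part)
        return b"".join(parts), pos + 2
    # Definite form: read chunks until the announced length is consumed.
    end = pos + length
    while pos < end:
        part, pos = decode_octet_string(buf, pos)
        parts.append(part)
    return b"".join(parts), end


print(decode_octet_string(b"$\x80\x04\x03abc\x04\x02de\x00\x00"))  # (b'abcde', 13)
print(decode_octet_string(b"$\x09\x04\x03abc\x04\x02de"))          # (b'abcde', 11)
```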
| 203 | |
| 204 <p> | |
| 205 With BER, CER and DER encodings, in many situations the substrate does not contain | |
| 206 all the information needed for complete and accurate recovery of ASN.1 | |
| 207 values. The most obvious cases are implicitly tagged ASN.1 types | |
| 208 and constrained types. | |
| 209 </p> | |
| 210 | |
| 211 <p> | |
| 212 As discussed earlier in this handbook, when an ASN.1 type is implicitly | |
| 213 tagged, its previous outermost tag is lost and never appears in the substrate. | |
| 214 If the lost tag is the base tag, the decoder cannot pick a type-specific | |
| 215 value decoder from its table of built-in types and therefore cannot recover | |
| 216 the value from the information contained in the substrate alone. The | |
| 217 approach taken by the pyasn1 decoder is to use a prototype pyasn1 type object (or | |
| 218 a set of them) to <i>guide</i> the decoding process by matching [possibly | |
| 219 incomplete] tags recovered from the substrate with those found in the prototype | |
| 220 pyasn1 type objects (further in this paper, also called pyasn1 specification objects). | |
| 221 </p> | |
| 222 | |
| 223 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 224 <pre> | |
| 225 >>> from pyasn1.type import univ | |
| 226 >>> from pyasn1.codec.ber import decoder | |
| 227 >>> decoder.decode(b'\x02\x01\x0c', asn1Spec=univ.Integer()) | |
| 228 (Integer(12), b'') | |
| 229 </pre> | |
| 230 </td></tr></table> | |
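Conceptually, the specification object extends the decoder's tag lookup. The sketch below models this with a plain dict; `spec` is a hypothetical stand-in for asn1Spec and does not reflect how pyasn1 represents specifications internally. It handles short-form lengths only.

```python
def int_content(content: bytes) -> int:
    """Content decoder for INTEGER: big-endian two's complement."""
    return int.from_bytes(content, "big", signed=True)


# Table of built-in types, keyed by identifier octets (INTEGER only, for brevity).
BUILT_IN = {b"\x02": int_content}


def guided_decode(substrate: bytes, spec=None):
    """Decode one TLV, choosing the content decoder by its identifier octets.

    `spec` (hypothetical) maps identifier octets to content decoders and is
    consulted before the built-in table. Only short-form lengths are handled.
    """
    pos = 1
    if substrate[0] & 0x1F == 0x1F:  # high-tag-number form: skip base-128 digits
        while substrate[pos] & 0x80:
            pos += 1
        pos += 1
    identifier, length = substrate[:pos], substrate[pos]
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    end = pos + 1 + length
    table = {**BUILT_IN, **(spec or {})}
    if identifier not in table:
        raise LookupError(f"identifier {identifier!r} not in spec")
    return table[identifier](substrate[pos + 1:end]), substrate[end:]


print(guided_decode(b"\x02\x01\x0c"))                          # (12, b'')
print(guided_decode(b"\x9f(\x0209", {b"\x9f(": int_content}))  # (12345, b'')
```

Without the spec entry, the second call raises LookupError, which is the situation the next example demonstrates with pyasn1 itself.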
| 231 | |
| 232 <p> | |
| 233 The decoder neither modifies the pyasn1 specification object nor uses | |
| 234 its current value (if it is a pyasn1 value object); it only uses it as | |
| 235 a hint for choosing the proper decoder and as a pattern for creating new objects: | |
| 236 </p> | |
| 237 | |
| 238 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 239 <pre> | |
| 240 >>> from pyasn1.type import univ, tag | |
| 241 >>> from pyasn1.codec.ber import encoder, decoder | |
| 242 >>> i = univ.Integer(12345).subtype( | |
| 243 ... implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40) | |
| 244 ... ) | |
| 245 >>> substrate = encoder.encode(i) | |
| 246 >>> substrate | |
| 247 b'\x9f(\x0209' | |
| 248 >>> decoder.decode(substrate) | |
| 249 Traceback (most recent call last): | |
| 250 ... | |
| 251 pyasn1.error.PyAsn1Error: | |
| 252 TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec | |
| 253 >>> decoder.decode(substrate, asn1Spec=i) | |
| 254 (Integer(12345), b'') | |
| 255 >>> | |
| 256 </pre> | |
| 257 </td></tr></table> | |
| 258 | |
| 259 <p> | |
| 260 Notice in the example above that an attempt to run the decoder without passing | |
| 261 a pyasn1 specification object fails, because the recovered tag does not belong | |
| 262 to any of the built-in types. | |
| 263 </p> | |
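The substrate b'\x9f(\x0209' from the example above can be taken apart by hand. The implicit context tag [40] replaced the universal INTEGER tag (2), and because 40 does not fit into the five low bits of the identifier octet, the high-tag-number form is used: 0x9F followed by 0x28, which prints as '('. A plain-Python sketch of identifier encoding, with hypothetical names:

```python
TAG_CLASS = {"universal": 0x00, "application": 0x40, "context": 0x80, "private": 0xC0}


def encode_identifier(tag_class: str, constructed: bool, tag_id: int) -> bytes:
    """Identifier octets: class bits, primitive/constructed bit, then tag number."""
    first = TAG_CLASS[tag_class] | (0x20 if constructed else 0x00)
    if tag_id < 31:
        return bytes([first | tag_id])
    # High-tag-number form: low five bits all set, then base-128 digits,
    # every digit except the last carrying the 0x80 continuation bit.
    digits = [tag_id & 0x7F]
    tag_id >>= 7
    while tag_id:
        digits.append(0x80 | (tag_id & 0x7F))
        tag_id >>= 7
    return bytes([first | 0x1F] + digits[::-1])


print(encode_identifier("context", False, 40))  # b'\x9f('
print(encode_identifier("universal", False, 2))  # b'\x02'
```

Since no built-in type is registered under b'\x9f(', the decoder has nothing to dispatch on until a specification supplies the mapping.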
| 264 | |
| 265 <p> | |
| 266 Another important feature of guided decoding is the use of | |
| 267 value constraints that may be present in the pyasn1 specification object. | |
| 268 To illustrate this, we will decode the same integer substrate into a generic Integer | |
| 269 and into a constrained one. | |
| 270 </p> | |
| 271 | |
| 272 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 273 <pre> | |
| 274 >>> from pyasn1.type import univ, constraint | |
| 275 >>> from pyasn1.codec.ber import encoder, decoder | |
| 276 >>> class DialDigit(univ.Integer): | |
| 277 ... subtypeSpec = constraint.ValueRangeConstraint(0,9) | |
| 278 >>> substrate = encoder.encode(univ.Integer(13)) | |
| 279 >>> decoder.decode(substrate) | |
| 280 (Integer(13), b'') | |
| 281 >>> decoder.decode(substrate, asn1Spec=DialDigit()) | |
| 282 Traceback (most recent call last): | |
| 283 ... | |
| 284 pyasn1.type.error.ValueConstraintError: | |
| 285 ValueRangeConstraint(0, 9) failed at: 13 | |
| 286 >>> | |
| 287 </pre> | |
| 288 </td></tr></table> | |
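The same check can be sketched without pyasn1: the specification acts as a pattern that the freshly decoded value is validated against, and the pattern itself is never modified. IntegerSpec, DIAL_DIGIT and decode_integer are hypothetical names, and the error message format is not pyasn1's.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IntegerSpec:
    """Hypothetical INTEGER specification with an optional inclusive range."""

    value_range: Optional[Tuple[int, int]] = None

    def check(self, value: int) -> int:
        """Return value unchanged, or raise if it violates the range."""
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                raise ValueError(f"range {low}..{high} failed at: {value}")
        return value


def decode_integer(substrate: bytes, spec: IntegerSpec = IntegerSpec()):
    """Decode a short-form INTEGER TLV, then apply the spec's constraint."""
    if substrate[0] != 0x02 or substrate[1] >= 0x80:
        raise ValueError("expected a short-form INTEGER TLV")
    end = 2 + substrate[1]
    value = int.from_bytes(substrate[2:end], "big", signed=True)
    return spec.check(value), substrate[end:]


DIAL_DIGIT = IntegerSpec(value_range=(0, 9))
substrate = b"\x02\x01\x0d"       # INTEGER 13
print(decode_integer(substrate))  # (13, b'')
try:
    decode_integer(substrate, DIAL_DIGIT)
except ValueError as exc:
    print(exc)                    # range 0..9 failed at: 13
```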
| 289 | |
| 290 <p> | |
| 291 As with encoders, to use the CER or DER decoders an application has to | |
| 292 import and call them explicitly; all APIs are compatible. | |
| 293 </p> | |
| 294 | |
| 295 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 296 <pre> | |
| 297 >>> from pyasn1.type import univ | |
| 298 >>> from pyasn1.codec.ber import encoder as ber_encoder | |
| 299 >>> substrate = ber_encoder.encode(univ.OctetString('http://pyasn1.sf.net')) | |
| 300 >>> | |
| 301 >>> from pyasn1.codec.ber import decoder as ber_decoder | |
| 302 >>> from pyasn1.codec.cer import decoder as cer_decoder | |
| 303 >>> from pyasn1.codec.der import decoder as der_decoder | |
| 304 >>> | |
| 305 >>> ber_decoder.decode(substrate) | |
| 306 (OctetString(b'http://pyasn1.sf.net'), b'') | |
| 307 >>> cer_decoder.decode(substrate) | |
| 308 (OctetString(b'http://pyasn1.sf.net'), b'') | |
| 309 >>> der_decoder.decode(substrate) | |
| 310 (OctetString(b'http://pyasn1.sf.net'), b'') | |
| 311 >>> | |
| 312 </pre> | |
| 313 </td></tr></table> | |
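The compatible APIs hide the fact that CER and DER are stricter than BER about what they accept as valid substrate. As a hypothetical illustration (plain Python, not pyasn1's decoders, and not a claim about how pyasn1 reports such errors), here is BOOLEAN decoding in a lenient BER mode and a strict DER-like mode:

```python
def decode_boolean(substrate: bytes, strict: bool):
    """Return (value, remainder) for a BOOLEAN TLV.

    strict=False accepts any non-zero octet as TRUE (BER);
    strict=True also demands the single canonical encoding (DER-like).
    """
    if len(substrate) < 3 or substrate[:2] != b"\x01\x01":
        raise ValueError("expected a one-octet BOOLEAN TLV")
    octet = substrate[2]
    if strict and octet not in (0x00, 0xFF):
        raise ValueError(f"non-canonical BOOLEAN octet {octet:#04x}")
    return octet != 0, substrate[3:]


print(decode_boolean(b"\x01\x01\x01", strict=False))  # (True, b'')
print(decode_boolean(b"\x01\x01\xff", strict=True))   # (True, b'')
```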
| 314 | |
| 315 <a name="2.2.1"></a> | |
| 316 <h4> | |
| 317 2.2.1 Decoding untagged types | |
| 318 </h4> | |
| 319 | |
| 320 <p> | |
| 321 As mentioned earlier, ASN.1 has two "special case" types: | |
| 322 CHOICE and ANY. They differ from other types in terms of | |
| 323 tagging: unless they are additionally tagged, neither of them | |
| 324 has a tag of its own. Therefore these types are invisible in the substrate | |
| 325 and cannot be recovered without passing a pyasn1 specification object to the | |
| 326 decoder. | |
| 327 </p> | |
| 328 | |
| 329 <p> | |
| 330 To illustrate the issue, let's first prepare a Choice object: | |
| 331 </p> | |
| 332 | |
| 333 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 334 <pre> | |
| 335 >>> from pyasn1.type import univ, namedtype | |
| 336 >>> class CodeOrMessage(univ.Choice): | |
| 337 ... componentType = namedtype.NamedTypes( | |
| 338 ... namedtype.NamedType('code', univ.Integer()), | |
| 339 ... namedtype.NamedType('message', univ.OctetString()) | |
| 340 ... ) | |
| 341 >>> | |
| 342 >>> codeOrMessage = CodeOrMessage() | |
| 343 >>> codeOrMessage.setComponentByName('message', 'my string value') | |
| 344 >>> print(codeOrMessage.prettyPrint()) | |
| 345 CodeOrMessage: | |
| 346 message=b'my string value' | |
| 347 >>> | |
| 348 </pre> | |
| 349 </td></tr></table> | |
| 350 | |
| 351 <p> | |
| 352 Let's now encode this Choice object and then decode its substrate | |
| 353 with and without a pyasn1 specification object: | |
| 354 </p> | |
| 355 | |
| 356 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 357 <pre> | |
| 358 >>> from pyasn1.codec.ber import encoder, decoder | |
| 359 >>> substrate = encoder.encode(codeOrMessage) | |
| 360 >>> substrate | |
| 361 b'\x04\x0fmy string value' | |
| 362 >>> encoder.encode(univ.OctetString('my string value')) | |
| 363 b'\x04\x0fmy string value' | |
| 364 >>> | |
| 365 >>> decoder.decode(substrate) | |
| 366 (OctetString(b'my string value'), b'') | |
| 367 >>> codeOrMessage, substrate = decoder.decode(substrate, asn1Spec=CodeOrMessage()) | |
| 368 >>> print(codeOrMessage.prettyPrint()) | |
| 369 CodeOrMessage: | |
| 370 message=b'my string value' | |
| 371 >>> | |
| 372 </pre> | |
| 373 </td></tr></table> | |
| 374 | |
| 375 <p> | |
| 376 The first thing to notice in the listing above is that the substrate produced | |
| 377 for our Choice value object is identical to the substrate for an OctetString | |
| 378 object initialized to the same value. In other words, the encoding carries | |
| 379 no trace of the enclosing Choice type. | |
| 380 </p> | |
| 381 | |
| 382 <p> | |
| 383 Consequently, such substrate decodes into an OctetString object | |
| 384 unless the original Choice type object is passed to the decoder to guide the decoding | |
| 385 process. | |
| 386 </p> | |
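Since the substrate carries only the chosen alternative's own TLV, resolving a Choice amounts to matching that tag against the alternatives in the specification; this works because ASN.1 requires the alternatives of a CHOICE to have distinct tags. A hypothetical plain-Python sketch (single-octet tags, short-form lengths), not pyasn1's decoder:

```python
# Hypothetical Choice specification: identifier octet -> alternative name.
CODE_OR_MESSAGE = {0x02: "code", 0x04: "message"}


def decode_choice(substrate: bytes, alternatives: dict):
    """Return ((alternative_name, value), remainder) for an untagged Choice.

    The substrate carries only the chosen alternative's own TLV, so the
    alternative is identified purely by matching its tag against the spec.
    Single-octet tags and short-form lengths are assumed.
    """
    tag, length = substrate[0], substrate[1]
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    name = alternatives.get(tag)
    if name is None:
        raise ValueError(f"tag {tag:#04x} matches no alternative")
    content, rest = substrate[2:2 + length], substrate[2 + length:]
    # INTEGER content is two's complement; OCTET STRING content is kept as-is.
    value = int.from_bytes(content, "big", signed=True) if tag == 0x02 else content
    return (name, value), rest


print(decode_choice(b"\x04\x0fmy string value", CODE_OR_MESSAGE))
# (('message', b'my string value'), b'')
```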
| 387 | |
| 388 <p> | |
| 389 Similarly, an untagged ANY type behaves differently in the decoding phase: when the | |
| 390 decoder encounters an Any object in the pyasn1 specification, it stops decoding | |
| 391 and puts the corresponding substrate into a new Any value object in the form of an octet | |
| 392 string. The application can then re-run the decoder with an additional, | |
| 393 more exact pyasn1 specification object to recover the contents of the Any | |
| 394 object. | |
| 395 </p> | |
| 396 | |
| 397 <p> | |
| 398 As mentioned elsewhere in this paper, the Any type allows an incomplete | |
| 399 or changing ASN.1 specification to be handled gracefully by the decoder and | |
| 400 applications. | |
| 401 </p> | |
| 402 | |
| 403 <p> | |
| 404 To illustrate how the Any type works, we'll set the stage | |
| 405 by encoding a pyasn1 object and then putting its substrate into an Any | |
| 406 object. | |
| 407 </p> | |
| 408 | |
| 409 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 410 <pre> | |
| 411 >>> from pyasn1.type import univ | |
| 412 >>> from pyasn1.codec.ber import encoder, decoder | |
| 413 >>> innerSubstrate = encoder.encode(univ.Integer(1234)) | |
| 414 >>> innerSubstrate | |
| 415 b'\x02\x02\x04\xd2' | |
| 416 >>> any = univ.Any(innerSubstrate) | |
| 417 >>> any | |
| 418 Any(b'\x02\x02\x04\xd2') | |
| 419 >>> substrate = encoder.encode(any) | |
| 420 >>> substrate | |
| 421 b'\x02\x02\x04\xd2' | |
| 422 >>> | |
| 423 </pre> | |
| 424 </td></tr></table> | |
| 425 | |
| 426 <p> | |
| 427 As with Choice type encoding, there is no trace of the Any type in the substrate. | |
| 428 Consequently, this substrate decodes into the inner | |
| 429 [Integer] component unless a pyasn1 specification is given to guide the | |
| 430 decoder. Continuing the previous code: | |
| 431 </p> | |
| 432 | |
| 433 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 434 <pre> | |
| 435 >>> from pyasn1.type import univ | |
| 436 >>> from pyasn1.codec.ber import encoder, decoder | |
| 437 | |
| 438 >>> decoder.decode(substrate) | |
| 439 (Integer(1234), b'') | |
| 440 >>> any, substrate = decoder.decode(substrate, asn1Spec=univ.Any()) | |
| 441 >>> any | |
| 442 Any(b'\x02\x02\x04\xd2') | |
| 443 >>> decoder.decode(any.asOctets()) | |
| 444 (Integer(1234), b'') | |
| 445 >>> | |
| 446 </pre> | |
| 447 </td></tr></table> | |
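The two-step approach that an untagged Any enables can be sketched in plain Python: first slice out one complete TLV without interpreting it, then decode it later once its type is known. split_tlv is a hypothetical helper, not a pyasn1 function, and it assumes single-octet tags and definite lengths.

```python
def split_tlv(substrate: bytes):
    """Return (raw TLV octets, remainder) without interpreting the content.

    Single-octet tags and definite lengths are assumed.
    """
    first = substrate[1]
    if first < 0x80:
        header, length = 2, first
    else:
        num_octets = first & 0x7F
        if num_octets == 0:
            raise ValueError("indefinite length is not handled in this sketch")
        header = 2 + num_octets
        length = int.from_bytes(substrate[2:header], "big")
    end = header + length
    if len(substrate) < end:
        raise ValueError("substrate is shorter than the encoded length")
    return substrate[:end], substrate[end:]


# Step 1: keep the component as opaque octets, the way an Any value does.
raw, rest = split_tlv(b"\x02\x02\x04\xd2\x05\x00")
print(raw, rest)  # b'\x02\x02\x04\xd2' b'\x05\x00'
# Step 2: once the application knows it is an INTEGER, decode the content
# (raw[2:] skips the two header octets of this short-form TLV).
print(int.from_bytes(raw[2:], "big", signed=True))  # 1234
```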
| 448 | |
| 449 <p> | |
| 450 Both CHOICE and ANY types are widely used in practice. Readers are welcome to | |
| 451 take a look at | |
| 452 <a href=http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt> | |
| 453 ASN.1 specifications of X.509 applications</a> for more information. | |
| 454 </p> | |
| 455 | |
| 456 <a name="2.2.2"></a> | |
| 457 <h4> | |
| 458 2.2.2 Ignoring unknown types | |
| 459 </h4> | |
| 460 | |
| 461 <p> | |
| 462 When dealing with a loosely specified ASN.1 structure, the receiving | |
| 463 end may not be aware of some types present in the substrate. It may then be | |
| 464 convenient to switch the decoder into a recovery mode. In that mode, the decoder | |
| 465 does not bail out when it hits an unknown tag but rather treats it as an Any | |
| 466 type. | |
| 467 </p> | |
| 468 | |
| 469 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
| 470 <pre> | |
| 471 >>> from pyasn1.type import univ, tag | |
| 472 >>> from pyasn1.codec.ber import encoder, decoder | |
| 473 >>> taggedInt = univ.Integer(12345).subtype( | |
| 474 ... implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40) | |
| 475 ... ) | |
| 476 >>> substrate = encoder.encode(taggedInt) | |
| 477 >>> decoder.decode(substrate) | |
| 478 Traceback (most recent call last): | |
| 479 ... | |
| 480 pyasn1.error.PyAsn1Error: TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec | |
| 481 >>> | |
| 482 >>> decoder.decode.defaultErrorState = decoder.stDumpRawValue | |
| 483 >>> decoder.decode(substrate) | |
| 484 (Any(b'\x9f(\x0209'), b'') | |
| 485 >>> | |
| 486 </pre> | |
| 487 </td></tr></table> | |
| 488 | |
| 489 <p> | |
| 490 It's also possible to configure a custom decoder to handle unknown tags | |
| 491 found in the substrate. This can be done by means of the <b>defaultRawDecoder</b> | |
| 492 attribute, which holds a reference to a type decoder object. Refer to the source | |
| 493 for API details. | |
| 494 </p> | |
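Recovery mode and a custom raw decoder come down to the same idea: when the identifier octets are unknown, hand the raw TLV to a fallback instead of raising an error. The plain-Python sketch below models this with an `on_unknown` callable. The names are hypothetical and do not mirror pyasn1's defaultErrorState or defaultRawDecoder API.

```python
def dump_raw(raw: bytes):
    """Default fallback: keep an unknown item as opaque octets."""
    return ("raw", raw)


def decode_with_fallback(substrate: bytes, known: dict, on_unknown=None):
    """Decode one TLV with a short-form length.

    `known` maps identifier octets to content decoders. For unknown
    identifiers, `on_unknown` (if given) receives the whole raw TLV instead
    of the decoder raising an error.
    """
    pos = 1
    if substrate[0] & 0x1F == 0x1F:  # high-tag-number form
        while substrate[pos] & 0x80:
            pos += 1
        pos += 1
    identifier, length = substrate[:pos], substrate[pos]
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    end = pos + 1 + length
    if identifier in known:
        return known[identifier](substrate[pos + 1:end]), substrate[end:]
    if on_unknown is None:
        raise LookupError(f"identifier {identifier!r} not in spec")
    return on_unknown(substrate[:end]), substrate[end:]


KNOWN = {b"\x02": lambda content: int.from_bytes(content, "big", signed=True)}
tagged = b"\x9f(\x0209"  # 12345 with implicit context tag [40]
print(decode_with_fallback(tagged, KNOWN, on_unknown=dump_raw))
# (('raw', b'\x9f(\x0209'), b'')
```

Passing a different callable as `on_unknown`, for example one that interprets unknown items in an application-specific way, plays the role of the custom decoder described above.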
| 495 | |
| 496 <hr> | |
| 497 | |
| 498 </td> | |
| 499 </tr> | |
| 500 </table> | |
| 501 </center> | |
| 502 </body> | |
| 503 </html> | |