<html>
<head>
<title>
PyASN1 codecs
</title>
</head>
7 <body> | |
8 <center> | |
9 <table width=60%> | |
10 <tr> | |
11 <td> | |
12 <h3> | |
13 2. PyASN1 Codecs | |
14 </h3> | |
15 | |
16 <p> | |
In the ASN.1 context, a
<a href=http://en.wikipedia.org/wiki/Codec>codec</a>
is a program that transforms between concrete data structures and a stream
of octets suitable for transmission over the wire. This serialized form of
data is sometimes called <i>substrate</i> or <i>essence</i>.
22 </p> | |
23 | |
24 <p> | |
In the pyasn1 implementation, substrate takes the shape of Python 3 bytes or
Python 2 string objects.
27 </p> | |
28 | |
29 <p> | |
One desirable property of a codec is the ability to cope with incomplete
data and/or substrate, which implies that the codec is stateful. In other
words, when a stateful decoder runs out of substrate while the data item being
recovered is still incomplete, it suspends and then resumes recovery of the
data item once the rest of the substrate becomes available. Similarly, a
stateful encoder encodes data items in multiple steps, waiting for source data
to arrive. Codec restartability is especially important when an application
deals with large volumes of data and/or runs with little RAM. For an
interesting discussion of codec options and design choices, refer to the
<a href=http://directory.apache.org/subprojects/asn1/>Apache ASN.1 project</a>.
41 </p> | |
42 | |
43 <p> | |
44 As of this writing, codecs implemented in pyasn1 are all stateless, mostly | |
45 to keep the code simple. | |
46 </p> | |
47 | |
48 <p> | |
The pyasn1 package currently supports the
<a href=http://en.wikipedia.org/wiki/Basic_encoding_rules>BER</a> codec and
its variations,
<a href=http://en.wikipedia.org/wiki/Canonical_encoding_rules>CER</a> and
<a href=http://en.wikipedia.org/wiki/Distinguished_encoding_rules>DER</a>.
More ASN.1 codecs are planned for implementation in the future.
55 </p> | |
56 | |
57 <a name="2.1"></a> | |
58 <h4> | |
59 2.1 Encoders | |
60 </h4> | |
61 | |
62 <p> | |
An encoder transforms pyasn1 value objects into substrate. Only
pyasn1 value objects can be serialized; an attempt to process a pyasn1 type
object causes the encoder to fail.
66 </p> | |
67 | |
68 <p> | |
The following code creates a pyasn1 Integer object and serializes it with
the BER encoder:
71 </p> | |
72 | |
73 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
74 <pre> | |
75 >>> from pyasn1.type import univ | |
76 >>> from pyasn1.codec.ber import encoder | |
77 >>> encoder.encode(univ.Integer(123456)) | |
78 b'\x02\x03\x01\xe2@' | |
79 >>> | |
80 </pre> | |
81 </td></tr></table> | |
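
<p>
To make the octets in the listing above easier to read, here is a minimal,
pure-Python sketch of how a BER INTEGER is laid out as tag, length and value.
It is an independent illustration and not the pyasn1 implementation; the
function names <b>encode_length</b> and <b>encode_integer</b> are hypothetical.
</p>

```python
def encode_length(n):
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_integer(value):
    """Encode a Python int as a BER INTEGER (universal tag 2, primitive)."""
    # The value part is big-endian two's complement in the fewest octets
    # that still keep the sign bit correct.
    size = (value + (value < 0)).bit_length() // 8 + 1
    body = value.to_bytes(size, "big", signed=True)
    return b"\x02" + encode_length(len(body)) + body


substrate = encode_integer(123456)
print(substrate)  # b'\x02\x03\x01\xe2@'
```

<p>
The first octet is the INTEGER tag, the second is the length of the value part
(three octets), and the remaining octets hold 123456 as 0x01E240.
</p>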
82 | |
83 <p> | |
The BER standard also defines a so-called <i>indefinite length</i> encoding
form, which makes processing of large data items more memory efficient. It is
mostly useful when the encoder does not have the whole value at once and the
length of the value cannot be determined at the beginning of encoding.
88 </p> | |
89 | |
90 <p> | |
<i>Constructed encoding</i> is another feature of BER closely related to the
indefinite length form. In essence, a large scalar value (such as an ASN.1
character string or BitString) can be chopped into smaller chunks by the
encoder and transmitted incrementally to limit memory consumption. Unlike the
indefinite length case, the length of the whole value must be known in advance
when using the constructed, definite length encoding form.
97 </p> | |
98 | |
99 <p> | |
Since pyasn1 codecs are not restartable, the pyasn1 encoder can only encode a
data item all at once. However, even in this case, generating indefinite length
encoding may help a low-memory receiver, running a restartable decoder,
to process a large data item.
104 </p> | |
105 | |
106 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
107 <pre> | |
108 >>> from pyasn1.type import univ | |
109 >>> from pyasn1.codec.ber import encoder | |
110 >>> encoder.encode( | |
111 ... univ.OctetString('The quick brown fox jumps over the lazy dog'), | |
112 ... defMode=False, | |
113 ... maxChunkSize=8 | |
114 ... ) | |
115 b'$\x80\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \ | |
116 t\x04\x08he lazy \x04\x03dog\x00\x00' | |
117 >>> | |
118 >>> encoder.encode( | |
119 ... univ.OctetString('The quick brown fox jumps over the lazy dog'), | |
120 ... maxChunkSize=8 | |
121 ... ) | |
122 b'$7\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \ | |
123 t\x04\x08he lazy \x04\x03dog' | |
124 </pre> | |
125 </td></tr></table> | |
126 | |
127 <p> | |
Setting the <b>defMode</b> encoder parameter to False disables definite length
encoding mode, while the optional <b>maxChunkSize</b> parameter specifies the
desired substrate chunk size, which influences memory requirements at the
decoder's end.
131 </p> | |
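
<p>
The two chunked forms shown above can be reproduced with a short pure-Python
sketch. It assumes nothing about pyasn1 internals; the names
<b>encode_octet_string</b>, <b>chunk_size</b> and <b>definite</b> are
hypothetical stand-ins for the encoder call and its <b>maxChunkSize</b> and
<b>defMode</b> parameters.
</p>

```python
def encode_length(n):
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_octet_string(data, chunk_size=0, definite=True):
    """Encode bytes as a BER OCTET STRING, optionally split into chunks."""
    if not chunk_size or len(data) <= chunk_size:
        # Primitive form: tag 0x04, always definite length in this sketch.
        return b"\x04" + encode_length(len(data)) + data
    # Constructed form: tag 0x24 wrapping a series of primitive chunks.
    chunks = b"".join(
        b"\x04" + encode_length(len(data[i:i + chunk_size])) + data[i:i + chunk_size]
        for i in range(0, len(data), chunk_size)
    )
    if definite:
        # The total length of all chunks must be known before writing the header.
        return b"\x24" + encode_length(len(chunks)) + chunks
    # Indefinite form: length octet 0x80, closed by two end-of-contents octets.
    return b"\x24\x80" + chunks + b"\x00\x00"


text = b"The quick brown fox jumps over the lazy dog"
indefinite = encode_octet_string(text, chunk_size=8, definite=False)
definite = encode_octet_string(text, chunk_size=8)
```

<p>
In the definite form the header carries the total size of all chunks (55
octets, printed as the character 7), whereas the indefinite form can start
emitting chunks before the total size is known.
</p>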
132 | |
133 <p> | |
To use the CER or DER encoders, explicitly import and call them; the
APIs are all compatible.
136 </p> | |
137 | |
138 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
139 <pre> | |
140 >>> from pyasn1.type import univ | |
141 >>> from pyasn1.codec.ber import encoder as ber_encoder | |
142 >>> from pyasn1.codec.cer import encoder as cer_encoder | |
143 >>> from pyasn1.codec.der import encoder as der_encoder | |
144 >>> ber_encoder.encode(univ.Boolean(True)) | |
145 b'\x01\x01\x01' | |
146 >>> cer_encoder.encode(univ.Boolean(True)) | |
147 b'\x01\x01\xff' | |
148 >>> der_encoder.encode(univ.Boolean(True)) | |
149 b'\x01\x01\xff' | |
150 >>> | |
151 </pre> | |
152 </td></tr></table> | |
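
<p>
The different Boolean octets above reflect a rule of the encoding standards:
BER accepts any non-zero content octet as TRUE, while CER and DER require all
bits to be set (0xFF). A minimal sketch of that check follows; the
<b>decode_boolean</b> function and its <b>strict</b> flag are hypothetical and
not part of pyasn1.
</p>

```python
def decode_boolean(substrate, strict=False):
    """Decode a three-octet BOOLEAN encoding.

    With strict=True, apply the CER/DER rule that TRUE must be 0xFF.
    """
    if len(substrate) != 3 or substrate[:2] != b"\x01\x01":
        raise ValueError("not a BOOLEAN encoding")
    content = substrate[2]
    if strict and content not in (0x00, 0xFF):
        raise ValueError("CER/DER require 0x00 or 0xFF, got 0x%02x" % content)
    # In BER, any non-zero content octet means TRUE.
    return content != 0


ber_true = decode_boolean(b"\x01\x01\x01")
der_true = decode_boolean(b"\x01\x01\xff", strict=True)
```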
153 | |
154 <a name="2.2"></a> | |
155 <h4> | |
156 2.2 Decoders | |
157 </h4> | |
158 | |
159 <p> | |
In the process of decoding, pyasn1 value objects are created and linked to
each other, based on the information contained in the substrate. Thus,
the original pyasn1 value object(s) are recovered.
163 </p> | |
164 | |
165 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
166 <pre> | |
167 >>> from pyasn1.type import univ | |
168 >>> from pyasn1.codec.ber import encoder, decoder | |
169 >>> substrate = encoder.encode(univ.Boolean(True)) | |
170 >>> decoder.decode(substrate) | |
171 (Boolean('True(1)'), b'') | |
172 >>> | |
173 </pre> | |
174 </td></tr></table> | |
175 | |
176 <p> | |
As the code snippet above shows, the pyasn1 decoder accepts substrate
as an argument and returns a tuple of a pyasn1 value object (possibly
the top-level one in the case of a constructed object) and the unprocessed
part of the input substrate.
181 </p> | |
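
<p>
Returning the unprocessed remainder lets a caller walk through several values
packed back to back. The following pure-Python sketch shows the idea on raw
tag-length-value triplets; <b>read_tlv</b> is a hypothetical helper, handles
only single-octet tags and definite lengths, and is not how pyasn1 is
implemented.
</p>

```python
def read_tlv(substrate):
    """Split one TLV off the front of substrate; return (tag, value, rest)."""
    tag = substrate[0]
    if tag & 0x1F == 0x1F:
        raise NotImplementedError("multi-octet tags are not handled here")
    first = substrate[1]
    if first < 0x80:
        # Short form: the octet itself is the length.
        length, offset = first, 2
    elif first == 0x80:
        raise NotImplementedError("indefinite length is not handled here")
    else:
        # Long form: low seven bits give the number of length octets.
        count = first & 0x7F
        length = int.from_bytes(substrate[2:2 + count], "big")
        offset = 2 + count
    end = offset + length
    if end > len(substrate):
        raise ValueError("substrate is truncated")
    return tag, substrate[offset:end], substrate[end:]


# A BOOLEAN followed directly by an INTEGER in the same buffer.
tag1, value1, rest = read_tlv(b"\x01\x01\xff\x02\x01\x0c")
tag2, value2, rest = read_tlv(rest)
```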
182 | |
183 <p> | |
All pyasn1 decoders handle both definite and indefinite length
encoding modes automatically; explicit switching from one mode
to another is not required.
187 </p> | |
188 | |
189 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
190 <pre> | |
191 >>> from pyasn1.type import univ | |
192 >>> from pyasn1.codec.ber import encoder, decoder | |
193 >>> substrate = encoder.encode( | |
194 ... univ.OctetString('The quick brown fox jumps over the lazy dog'), | |
195 ... defMode=False, | |
196 ... maxChunkSize=8 | |
197 ... ) | |
198 >>> decoder.decode(substrate) | |
199 (OctetString(b'The quick brown fox jumps over the lazy dog'), b'') | |
200 >>> | |
201 </pre> | |
202 </td></tr></table> | |
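
<p>
A decoder can tell the two modes apart from the length octet alone: 0x80
announces the indefinite form, which ends at two zero octets, while any other
value is a definite length. Below is a minimal sketch for OCTET STRING only,
under that assumption; <b>read_length</b> and <b>decode_octet_string</b> are
hypothetical names and the code is independent of pyasn1.
</p>

```python
def read_length(substrate, pos):
    """Return (length, next_pos); length is None for the indefinite form."""
    first = substrate[pos]
    if first < 0x80:
        return first, pos + 1
    if first == 0x80:
        return None, pos + 1
    count = first & 0x7F
    return int.from_bytes(substrate[pos + 1:pos + 1 + count], "big"), pos + 1 + count


def decode_octet_string(substrate):
    """Return (payload, rest) for a primitive or chunked OCTET STRING."""
    if len(substrate) < 2:
        raise ValueError("substrate is truncated")
    tag = substrate[0]
    length, pos = read_length(substrate, 1)
    if tag == 0x04:
        if length is None:
            raise ValueError("primitive encoding must use a definite length")
        return substrate[pos:pos + length], substrate[pos + length:]
    if tag != 0x24:
        raise ValueError("not an OCTET STRING")
    payload = b""
    if length is None:
        # Indefinite form: read chunks until the end-of-contents octets.
        rest = substrate[pos:]
        while rest[:2] != b"\x00\x00":
            chunk, rest = decode_octet_string(rest)
            payload += chunk
        return payload, rest[2:]
    # Definite form: read chunks until the announced length is used up.
    body, rest = substrate[pos:pos + length], substrate[pos + length:]
    while body:
        chunk, body = decode_octet_string(body)
        payload += chunk
    return payload, rest


chunks = (b"\x04\x08The quic\x04\x08k brown \x04\x08fox jump"
          b"\x04\x08s over t\x04\x08he lazy \x04\x03dog")
from_indefinite = decode_octet_string(b"\x24\x80" + chunks + b"\x00\x00")
from_definite = decode_octet_string(b"\x24\x37" + chunks)
```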
203 | |
204 <p> | |
With BER/CER/DER encoding, in many situations the substrate may not contain
all the information needed for complete and accurate recovery of ASN.1
values. The most obvious cases include implicitly tagged ASN.1 types
and constrained types.
209 </p> | |
210 | |
211 <p> | |
As discussed earlier in this handbook, when an ASN.1 type is implicitly
tagged, the previous outermost tag is lost and never appears in the substrate.
If it is the base tag that gets lost, the decoder is unable to pick a
type-specific value decoder from its table of built-in types, and therefore
cannot recover the value part based only on the information contained in the
substrate. The approach taken by the pyasn1 decoder is to use a prototype
pyasn1 type object (or a set of them) to <i>guide</i> the decoding process by
matching [possibly incomplete] tags recovered from the substrate with those
found in the prototype pyasn1 type objects (also called pyasn1 specification
objects further in this paper).
221 </p> | |
222 | |
223 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
224 <pre> | |
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import decoder
>>> decoder.decode(b'\x02\x01\x0c', asn1Spec=univ.Integer())
(Integer(12), b'')
228 >>> | |
229 </pre> | |
230 </td></tr></table> | |
231 | |
232 <p> | |
The decoder neither modifies the pyasn1 specification object nor uses
its current value (if it is a pyasn1 value object), but rather uses it as
a hint for choosing the proper decoder and as a pattern for creating new
objects:
236 </p> | |
237 | |
238 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
239 <pre> | |
240 >>> from pyasn1.type import univ, tag | |
241 >>> from pyasn1.codec.ber import encoder, decoder | |
242 >>> i = univ.Integer(12345).subtype( | |
243 ... implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40) | |
244 ... ) | |
245 >>> substrate = encoder.encode(i) | |
246 >>> substrate | |
247 b'\x9f(\x0209' | |
248 >>> decoder.decode(substrate) | |
249 Traceback (most recent call last): | |
250 ... | |
251 pyasn1.error.PyAsn1Error: | |
252 TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec | |
253 >>> decoder.decode(substrate, asn1Spec=i) | |
254 (Integer(12345), b'') | |
255 >>> | |
256 </pre> | |
257 </td></tr></table> | |
258 | |
259 <p> | |
Notice in the example above that an attempt to run the decoder without passing
a pyasn1 specification object fails, because the recovered tag does not belong
to any of the built-in types.
263 </p> | |
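
<p>
The substrate in the example above begins with the two octets 0x9F 0x28
(printed as <b>\x9f(</b>) rather than the INTEGER tag 0x02. The sketch below
shows how such tag octets are formed in BER, including the multi-octet form
used for tag numbers of 31 and above. The <b>encode_tag</b> function is a
hypothetical helper written for this illustration, not a pyasn1 API.
</p>

```python
def encode_tag(tag_class, constructed, number):
    """Build BER identifier octets from class bits, form and tag number."""
    first = tag_class | (0x20 if constructed else 0x00)
    if number < 31:
        # Low tag numbers fit into the five low bits of a single octet.
        return bytes([first | number])
    # High tag numbers: five low bits all set, then the number in base 128
    # with the top bit set on every octet except the last.
    digits = [number & 0x7F]
    number >>= 7
    while number:
        digits.append(0x80 | (number & 0x7F))
        number >>= 7
    return bytes([first | 0x1F]) + bytes(reversed(digits))


CONTEXT = 0x80  # context-specific class bits; universal class is 0x00

implicit_tag = encode_tag(CONTEXT, False, 40)
# Tag octets, then length 2, then 12345 as 0x3039.
substrate = implicit_tag + b"\x02" + (12345).to_bytes(2, "big")
print(substrate)  # b'\x9f(\x0209'
```

<p>
Nothing in these octets says INTEGER: the universal tag has been replaced, so
only a specification object can tell the decoder how to interpret the value
part.
</p>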
264 | |
265 <p> | |
Another important feature of guided decoder operation is the use of
value constraints possibly present in the pyasn1 specification object.
To explain this, we will decode an arbitrary integer object into a generic
Integer and into a constrained one.
270 </p> | |
271 | |
272 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
273 <pre> | |
274 >>> from pyasn1.type import univ, constraint | |
275 >>> from pyasn1.codec.ber import encoder, decoder | |
276 >>> class DialDigit(univ.Integer): | |
277 ... subtypeSpec = constraint.ValueRangeConstraint(0,9) | |
278 >>> substrate = encoder.encode(univ.Integer(13)) | |
279 >>> decoder.decode(substrate) | |
280 (Integer(13), b'') | |
281 >>> decoder.decode(substrate, asn1Spec=DialDigit()) | |
282 Traceback (most recent call last): | |
283 ... | |
284 pyasn1.type.error.ValueConstraintError: | |
285 ValueRangeConstraint(0, 9) failed at: 13 | |
286 >>> | |
287 </pre> | |
288 </td></tr></table> | |
289 | |
290 <p> | |
Similarly to encoders, to use the CER or DER decoders an application has to
explicitly import and call them; all APIs are compatible.
293 </p> | |
294 | |
295 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
296 <pre> | |
297 >>> from pyasn1.type import univ | |
298 >>> from pyasn1.codec.ber import encoder as ber_encoder | |
299 >>> substrate = ber_encoder.encode(univ.OctetString('http://pyasn1.sf.net')) | |
300 >>> | |
301 >>> from pyasn1.codec.ber import decoder as ber_decoder | |
302 >>> from pyasn1.codec.cer import decoder as cer_decoder | |
303 >>> from pyasn1.codec.der import decoder as der_decoder | |
304 >>> | |
305 >>> ber_decoder.decode(substrate) | |
306 (OctetString(b'http://pyasn1.sf.net'), b'') | |
307 >>> cer_decoder.decode(substrate) | |
308 (OctetString(b'http://pyasn1.sf.net'), b'') | |
309 >>> der_decoder.decode(substrate) | |
310 (OctetString(b'http://pyasn1.sf.net'), b'') | |
311 >>> | |
312 </pre> | |
313 </td></tr></table> | |
314 | |
315 <a name="2.2.1"></a> | |
316 <h4> | |
317 2.2.1 Decoding untagged types | |
318 </h4> | |
319 | |
320 <p> | |
As already mentioned, ASN.1 has two "special case" types:
CHOICE and ANY. They differ from other types with respect to
tagging: unless these two are additionally tagged, neither of them
has its own tag. Therefore these types become invisible in the substrate
and cannot be recovered without passing a pyasn1 specification object to
the decoder.
327 </p> | |
328 | |
329 <p> | |
330 To explain the issue, we will first prepare a Choice object to deal with: | |
331 </p> | |
332 | |
333 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
334 <pre> | |
335 >>> from pyasn1.type import univ, namedtype | |
336 >>> class CodeOrMessage(univ.Choice): | |
337 ... componentType = namedtype.NamedTypes( | |
338 ... namedtype.NamedType('code', univ.Integer()), | |
339 ... namedtype.NamedType('message', univ.OctetString()) | |
340 ... ) | |
341 >>> | |
342 >>> codeOrMessage = CodeOrMessage() | |
343 >>> codeOrMessage.setComponentByName('message', 'my string value') | |
344 >>> print(codeOrMessage.prettyPrint()) | |
345 CodeOrMessage: | |
346 message=b'my string value' | |
347 >>> | |
348 </pre> | |
349 </td></tr></table> | |
350 | |
351 <p> | |
352 Let's now encode this Choice object and then decode its substrate | |
353 with and without pyasn1 specification object: | |
354 </p> | |
355 | |
356 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
357 <pre> | |
358 >>> from pyasn1.codec.ber import encoder, decoder | |
359 >>> substrate = encoder.encode(codeOrMessage) | |
360 >>> substrate | |
361 b'\x04\x0fmy string value' | |
362 >>> encoder.encode(univ.OctetString('my string value')) | |
363 b'\x04\x0fmy string value' | |
364 >>> | |
365 >>> decoder.decode(substrate) | |
366 (OctetString(b'my string value'), b'') | |
>>> codeOrMessage, substrate = decoder.decode(substrate, asn1Spec=CodeOrMessage())
368 >>> print(codeOrMessage.prettyPrint()) | |
369 CodeOrMessage: | |
370 message=b'my string value' | |
371 >>> | |
372 </pre> | |
373 </td></tr></table> | |
374 | |
375 <p> | |
The first thing to notice in the listing above is that the substrate produced
for our Choice value object is identical to the substrate for an OctetString
object initialized to the same value. In other words, no information about
the enclosing Choice is present in the encoding.
380 </p> | |
381 | |
382 <p> | |
Sure enough, that kind of substrate will decode into an OctetString object,
unless the original Choice type object is passed to the decoder to guide the
decoding process.
386 </p> | |
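
<p>
The guiding step can be pictured as a lookup from the tag found in the
substrate to the named alternatives of the Choice. The following sketch is a
deliberately simplified, pure-Python illustration of that idea; the
<b>CODE_OR_MESSAGE</b> table and <b>decode_choice</b> function are hypothetical
and do not mirror pyasn1 internals.
</p>

```python
# Universal tags of the two alternatives: INTEGER (0x02), OCTET STRING (0x04).
CODE_OR_MESSAGE = {0x02: "code", 0x04: "message"}


def decode_choice(substrate, alternatives):
    """Return (component_name, raw_value, rest) using a tag-to-name table."""
    tag, length = substrate[0], substrate[1]
    if length >= 0x80:
        raise NotImplementedError("only short-form lengths in this sketch")
    if tag not in alternatives:
        raise ValueError("tag 0x%02x matches no alternative" % tag)
    value, rest = substrate[2:2 + length], substrate[2 + length:]
    return alternatives[tag], value, rest


name, value, rest = decode_choice(b"\x04\x0fmy string value", CODE_OR_MESSAGE)
print(name, value)  # message b'my string value'
```

<p>
Without the table, the same octets identify only an OCTET STRING; with it, the
value can be reported as the <b>message</b> component of the Choice.
</p>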
387 | |
388 <p> | |
Similarly, the untagged ANY type behaves differently in the decoding phase:
when the decoder bumps into an Any object in the pyasn1 specification, it stops
decoding and puts all the substrate into a new Any value object in the form of
an octet string. A concerned application can then re-run the decoder with an
additional, more exact pyasn1 specification object to recover the contents of
the Any object.
395 </p> | |
396 | |
397 <p> | |
As mentioned elsewhere in this paper, the Any type allows an incomplete
or changing ASN.1 specification to be handled gracefully by the decoder and
applications.
401 </p> | |
402 | |
403 <p> | |
To illustrate how the Any type works, we first set the stage
by encoding a pyasn1 object and then putting its substrate into an Any
object.
407 </p> | |
408 | |
409 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
410 <pre> | |
411 >>> from pyasn1.type import univ | |
412 >>> from pyasn1.codec.ber import encoder, decoder | |
413 >>> innerSubstrate = encoder.encode(univ.Integer(1234)) | |
414 >>> innerSubstrate | |
415 b'\x02\x02\x04\xd2' | |
416 >>> any = univ.Any(innerSubstrate) | |
417 >>> any | |
418 Any(b'\x02\x02\x04\xd2') | |
419 >>> substrate = encoder.encode(any) | |
420 >>> substrate | |
421 b'\x02\x02\x04\xd2' | |
422 >>> | |
423 </pre> | |
424 </td></tr></table> | |
425 | |
426 <p> | |
As with Choice type encoding, there are no traces of the Any type in the
substrate. Obviously, the substrate we are dealing with will decode into the
inner [Integer] component, unless a pyasn1 specification is given to guide the
decoder. Continuing the previous code:
431 </p> | |
432 | |
433 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
434 <pre> | |
435 >>> from pyasn1.type import univ | |
436 >>> from pyasn1.codec.ber import encoder, decoder | |
437 | |
438 >>> decoder.decode(substrate) | |
439 (Integer(1234), b'') | |
440 >>> any, substrate = decoder.decode(substrate, asn1Spec=univ.Any()) | |
441 >>> any | |
442 Any(b'\x02\x02\x04\xd2') | |
>>> decoder.decode(bytes(any))
444 (Integer(1234), b'') | |
445 >>> | |
446 </pre> | |
447 </td></tr></table> | |
448 | |
449 <p> | |
Both CHOICE and ANY types are widely used in practice. The reader is welcome to
take a look at the
<a href=http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt>
ASN.1 specifications of X.509 applications</a> for more information.
454 </p> | |
455 | |
456 <a name="2.2.2"></a> | |
457 <h4> | |
458 2.2.2 Ignoring unknown types | |
459 </h4> | |
460 | |
461 <p> | |
When dealing with a loosely specified ASN.1 structure, the receiving
end may not be aware of some types present in the substrate. It may then be
convenient to switch the decoder into a recovery mode. While in this mode, the
decoder will not bail out when it hits an unknown tag but will rather treat it
as an Any type.
467 </p> | |
468 | |
469 <table bgcolor="lightgray" border=0 width=100%><TR><TD> | |
470 <pre> | |
471 >>> from pyasn1.type import univ, tag | |
472 >>> from pyasn1.codec.ber import encoder, decoder | |
473 >>> taggedInt = univ.Integer(12345).subtype( | |
474 ... implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40) | |
475 ... ) | |
476 >>> substrate = encoder.encode(taggedInt) | |
477 >>> decoder.decode(substrate) | |
478 Traceback (most recent call last): | |
479 ... | |
pyasn1.error.PyAsn1Error: TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec
481 >>> | |
482 >>> decoder.decode.defaultErrorState = decoder.stDumpRawValue | |
483 >>> decoder.decode(substrate) | |
(Any(b'\x9f(\x0209'), b'')
485 >>> | |
486 </pre> | |
487 </td></tr></table> | |
488 | |
489 <p> | |
It is also possible to configure a custom decoder to handle unknown tags
found in the substrate. This can be done by means of the
<b>defaultRawDecoder</b> attribute, which holds a reference to a type decoder
object. Refer to the source for API details.
494 </p> | |
495 | |
496 <hr> | |
497 | |
498 </td> | |
499 </tr> | |
500 </table> | |
501 </center> | |
502 </body> | |
503 </html> | |