Chromium Code Reviews| OLD | NEW |
|---|---|
| (Empty) | |
| 1 <h1>Courgette Internals</h1> | |
| 2 | |
| 3 <h2>Patch Generation</h2> | |
| 4 | |
| 5 <p><img src="generation.png" alt="Patch Generation" title="" /></p> | |
| 6 | |
| 7 <ul> | |
| 8 <li><p>courgette_tool.cc:GenerateEnsemblePatch kicks off the patch | |
| 9 generation by calling ensemble_create.cc:GenerateEnsemblePatch</p></li> | |
| 10 <li><p>The files are read in by in courgette:SourceStream objects</p></li> | |
| 11 <li><p>ensemble_create.cc:GenerateEnsemblePatch uses FindGenerators, which | |
| 12 uses MakeGenerator to create | |
| 13 patch_generator_x86_32.h:PatchGeneratorX86_32 classes.</p></li> | |
| 14 <li><p>PatchGeneratorX86_32's Transform method transforms the input file | |
| 15 using Courgette's core techniques that make the bsdiff delta | |
| 16 smaller. The steps it takes are the following:</p> | |
| 17 | |
| 18 <ul> | |
| 19 <li><p><em>disassemble</em> the old and new binaries into AssemblyProgram | |
| 20 objects,</p></li> | |
| 21 <li><p><em>adjust</em> the new AssemblyProgram object, and</p></li> | |
| 22 <li><p><em>encode</em> the AssemblyProgram object back into raw bytes.</p></li> | |
| 23 </ul></li> | |
| 24 </ul> | |
| 25 | |
| 26 <h3>Disassemble</h3> | |
| 27 | |
| 28 <ul> | |
| 29 <li><p>The input is a pointer to a buffer containing the raw bytes of the | |
| 30 input file.</p></li> | |
| 31 <li><p>Disassembly converts certain machine instructions that reference | |
| 32 addresses to Courgette instructions. It is not actually | |
| 33 disassembly, but this is the term the code-base uses. Specifically, | |
| 34 it detects instructions that use absolute addresses given by the | |
| 35 binary file's relocation table, and relative addresses used in | |
| 36 relative branches.</p></li> | |
| 37 <li><p>Done by disassemble:ParseDetectedExecutable, which selects the | |
| 38 appropriate Disassembler subclass by looking at the binary file's | |
| 39 headers.</p> | |
| 40 | |
| 41 <ul> | |
| 42 <li><p>disassembler_win32_x86.h defines the PE/COFF x86 disassembler</p></li> | |
| 43 <li><p>disassembler_elf_32_x86.h defines the ELF 32-bit x86 disassembler</p></li > | |
| 44 <li><p>disassembler_elf_32_arm.h defines the ELF 32-bit arm disassembler</p></li > | |
| 45 </ul></li> | |
| 46 <li><p>The Disassembler replaces the relocation table with a Courgette | |
| 47 instruction that can regenerate the relocation table.</p></li> | |
| 48 <li><p>The Disassembler builds a list of addresses referenced by the | |
| 49 machine code, numbering each one.</p></li> | |
| 50 <li><p>The Disassembler replaces and address used in machine instructions | |
| 51 with its index number.</p></li> | |
| 52 <li><p>The output is an assembly_program.h:AssemblyProgram class, which | |
| 53 contains a list of instructions, machine or Courgette, and a mapping | |
| 54 of indices to actual addresses.</p></li> | |
| 55 </ul> | |
| 56 | |
| 57 <h3>Adjust</h3> | |
| 58 | |
| 59 <ul> | |
| 60 <li><p>This step takes the AssemblyProgram for the old file and reassigns | |
| 61 the indices that map to actual addresses. It is performed by | |
| 62 adjustment_method.cc:Adjust().</p></li> | |
| 63 <li><p>The goal is the match the indices from the old program to the new | |
| 64 program as closely as possible.</p></li> | |
| 65 <li><p>When matched correctly, machine instructions that jump to the | |
| 66 function in both the new and old binary will look the same to | |
| 67 bsdiff, even the function is located in a different part of the | |
| 68 binary.</p></li> | |
| 69 </ul> | |
| 70 | |
| 71 <h3>Encode</h3> | |
| 72 | |
| 73 <ul> | |
| 74 <li><p>This step takes an AssemblyProgram object and encodes both the | |
| 75 instructions and the mapping of indices to addresses as byte | |
| 76 vectors. This format can be written to a file directly, and is also | |
| 77 more appropriate for bsdiffing. It is done by | |
| 78 AssemblyProgram.Encode().</p></li> | |
| 79 <li><p>encoded_program.h:EncodedProgram defines the binary format and a | |
| 80 WriteTo method that writes to a file.</p></li> | |
| 81 </ul> | |
| 82 | |
| 83 <h3>bsdiff</h3> | |
| 84 | |
| 85 <ul> | |
| 86 <li>simple_delta.c:GenerateSimpleDelta</li> | |
| 87 </ul> | |
| 88 | |
| 89 <h2>Patch Application</h2> | |
| 90 | |
| 91 <p><img src="application.png" alt="Patch Application" title="" /></p> | |
| 92 | |
| 93 <ul> | |
| 94 <li><p>courgette_tool.cc:ApplyEnsemblePatch kicks off the patch generation | |
| 95 by calling ensemble_apply.cc:ApplyEnsemblePatch</p></li> | |
| 96 <li><p>ensemble_create.cc:ApplyEnsemblePatch, reads and verifies the | |
| 97 patch's header, then calls the overloaded version of | |
| 98 ensemble_create.cc:ApplyEnsemblePatch.</p></li> | |
| 99 <li><p>The patch is read into an ensemble<em>apply.cc:EnsemblePatchApplication | |
| 100 object, which generates a set of patcher</em>x86<em>32.h:PatcherX86</em>32 | |
| 101 objects for the sections in the patch.</p></li> | |
| 102 <li><p>The original file is disassembled and encoded via a call | |
| 103 EnsemblePatchApplication.TransformUp, which in turn call | |
| 104 patcher<em>x86</em>32.h:PatcherX86_32.Transform.</p></li> | |
| 105 <li><p>The transformed file is then bspatched via | |
| 106 EnsemblePatchApplication.SubpatchTransformedElements, which calls | |
| 107 EnsemblePatchApplication.SubpatchStreamSets, which calls | |
| 108 simple_delta.cc:ApplySimpleDelta, Courgette's built-in | |
| 109 implementation of bspatch.</p></li> | |
| 110 <li><p>Finally, EnsemblePatchApplication.TransformDown assembles, i.e., | |
| 111 reverses the encoding and disassembly, on the patched binary data. | |
| 112 This is done by calling PatcherX86<em>32.Reform, which in turn calls | |
| 113 the global function encoded</em>program.cc:Assemble, which calls | |
| 114 EncodedProgram.AssembleTo.</p></li> | |
| 115 </ul> | |
| 116 | |
| 117 <h2>Glossary</h2> | |
| 118 | |
| 119 <p><strong>Adjust</strong>: Reassign address indices in the new program to match more | |
|
Ben Chan
2013/08/20 16:50:40
not sure if we care about the 80 char limit here.
dgarrett
2013/08/21 19:03:46
I'm guessing not, especially if the file is genera
| |
| 120 closely those from the old.</p> | |
| 121 | |
| 122 <p><strong>Assembly program</strong>: The output of <em>disassembly</em>. Conta ins a list of | |
| 123 <em>Courgette instructions</em> and an index of branch target addresses.</p> | |
| 124 | |
| 125 <p><strong>Assemble</strong>: Convert an <em>assembly program</em> back into an object file | |
| 126 by evaluating the <em>Courgette instructions</em> and leaving the machine | |
| 127 instructions in place.</p> | |
| 128 | |
| 129 <p><strong>Courgette instruction</strong>: Replaces machine instructions in the | |
| 130 program. Courgette instructions replace branches with an index to | |
| 131 the target addresses and replace part of the relocation table.</p> | |
| 132 | |
| 133 <p><strong>Disassembler</strong>: Takes a binary file and produces an <em>assemb ly | |
| 134 program</em>.</p> | |
| 135 | |
| 136 <p><strong>Encode</strong>: Convert an <em>assembly program</em> into an <em>enc oded program</em> by | |
| 137 serializing its data structures into byte vectors more appropriate | |
| 138 for storage in a file.</p> | |
| 139 | |
| 140 <p><strong>Encoded Program</strong>: The output of encoding.</p> | |
| 141 | |
| 142 <p><strong>Ensemble</strong>: A Courgette-style patch containing sections for th e list | |
| 143 of branch addresses, the encoded program. It supports patching | |
| 144 multiple object files at once.</p> | |
| 145 | |
| 146 <p><strong>Opcode</strong>: The number corresponding to either a machine or <em> Courgette | |
| 147 instruction</em>.</p> | |
| OLD | NEW |