| OLD | NEW |
| (Empty) | |
| 1 Courgette Internals |
| 2 =================== |
| 3 |
| 4 Patch Generation |
| 5 ---------------- |
| 6 |
| 7  |
| 8 |
| 9 - courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch |
| 10 generation by calling ensemble\_create.cc:GenerateEnsemblePatch |
| 11 |
| 12 - The files are read in by courgette:SourceStream objects |
| 13 |
| 14 - ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which |
| 15 uses MakeGenerator to create |
| 16 patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 classes. |
| 17 |
| 18 - PatchGeneratorX86\_32's Transform method transforms the input file |
| 19 using Courgette's core techniques that make the bsdiff delta |
| 20 smaller. The steps it takes are the following: |
| 21 |
| 22 - _disassemble_ the old and new binaries into AssemblyProgram |
| 23 objects, |
| 24 |
| 25 - _adjust_ the new AssemblyProgram object, and |
| 26 |
| 27 - _encode_ the AssemblyProgram object back into raw bytes. |
| 28 |
| 29 ### Disassemble |
| 30 |
| 31 - The input is a pointer to a buffer containing the raw bytes of the |
| 32 input file. |
| 33 |
| 34 - Disassembly converts certain machine instructions that reference |
| 35 addresses to Courgette instructions. It is not actually |
| 36 disassembly, but this is the term the code-base uses. Specifically, |
| 37 it detects instructions that use absolute addresses given by the |
| 38 binary file's relocation table, and relative addresses used in |
| 39 relative branches. |
| 40 |
| 41 - Done by disassemble:ParseDetectedExecutable, which selects the |
| 42 appropriate Disassembler subclass by looking at the binary file's |
| 43 headers. |
| 44 |
| 45 - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler |
| 46 |
| 47 - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler |
| 48 |
| 49 - disassembler\_elf\_32\_arm.h defines the ELF 32-bit ARM disassembler |
| 50 |
| 51 - The Disassembler replaces the relocation table with a Courgette |
| 52 instruction that can regenerate the relocation table. |
| 53 |
| 54 - The Disassembler builds a list of addresses referenced by the |
| 55 machine code, numbering each one. |
| 56 |
| 57 - The Disassembler replaces each address used in machine instructions |
| 58 with its index number. |
| 59 |
| 60 - The output is an assembly\_program.h:AssemblyProgram object, which |
| 61 contains a list of instructions, machine or Courgette, and a mapping |
| 62 of indices to actual addresses. |
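
The address-to-index replacement described above can be illustrated with a small, self-contained sketch. It is not Courgette's code: `ToyInstruction`, `ToyProgram`, and `ToyDisassemble` are hypothetical names, and numbering addresses in order of first appearance is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical toy types; the real AssemblyProgram is considerably richer.
struct ToyInstruction {
  uint8_t opcode;    // e.g. 0xE8 for an x86 call
  uint32_t operand;  // target address on input, label index on output
};

struct ToyProgram {
  std::vector<ToyInstruction> instructions;  // operands hold label indices
  std::vector<uint32_t> index_to_address;    // mapping of indices to addresses
};

// Replaces each referenced address with a small index. Addresses are
// numbered in order of first appearance (an assumption of this sketch).
ToyProgram ToyDisassemble(const std::vector<ToyInstruction>& input) {
  ToyProgram program;
  std::map<uint32_t, uint32_t> address_to_index;
  for (const ToyInstruction& instr : input) {
    auto it = address_to_index.find(instr.operand);
    if (it == address_to_index.end()) {
      // First time this address is seen: give it the next free index.
      uint32_t index = static_cast<uint32_t>(program.index_to_address.size());
      it = address_to_index.emplace(instr.operand, index).first;
      program.index_to_address.push_back(instr.operand);
    }
    program.instructions.push_back({instr.opcode, it->second});
  }
  assert(program.instructions.size() == input.size());
  return program;
}
```

Two calls to the same address end up with the same index, so the instruction stream no longer contains the address itself; only the index table does.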
| 63 |
| 64 ### Adjust |
| 65 |
| 66 - This step takes the AssemblyProgram for the new file and reassigns |
| 67 the indices that map to actual addresses. It is performed by |
| 68 adjustment_method.cc:Adjust(). |
| 69 |
| 70 - The goal is to match the indices from the old program to the new |
| 71 program as closely as possible. |
| 72 |
| 73 - When matched correctly, machine instructions that jump to the |
| 74 same function in both the new and old binary will look the same to |
| 75 bsdiff, even if the function is located in a different part of the |
| 76 binary. |
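
To see why matching indices helps, here is a minimal sketch that renumbers the labels of a new program so that they line up with the old one. The matching rule used here (pair labels by order of first use) is a deliberately naive stand-in chosen for illustration, not the heuristic implemented in adjustment_method.cc; `IndexStream`, `FirstUseOrder`, and `ToyAdjust` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// A program is reduced to the sequence of label indices its branches use
// (a simplification made for this sketch).
using IndexStream = std::vector<uint32_t>;

// Returns the distinct indices of `stream` in order of first use.
std::vector<uint32_t> FirstUseOrder(const IndexStream& stream) {
  std::vector<uint32_t> order;
  std::set<uint32_t> seen;
  for (uint32_t index : stream) {
    if (seen.insert(index).second) order.push_back(index);
  }
  return order;
}

// Renumbers `new_stream` so that its k-th first-used label receives the
// index of the old program's k-th first-used label. Labels with no
// counterpart in the old program get fresh indices.
IndexStream ToyAdjust(const IndexStream& old_stream,
                      const IndexStream& new_stream) {
  const std::vector<uint32_t> old_order = FirstUseOrder(old_stream);
  const std::vector<uint32_t> new_order = FirstUseOrder(new_stream);
  uint32_t next_fresh = 0;
  for (uint32_t index : old_order) next_fresh = std::max(next_fresh, index + 1);

  std::map<uint32_t, uint32_t> remap;
  for (std::size_t k = 0; k < new_order.size(); ++k) {
    remap[new_order[k]] = k < old_order.size() ? old_order[k] : next_fresh++;
  }

  IndexStream adjusted;
  for (uint32_t index : new_stream) adjusted.push_back(remap.at(index));
  assert(adjusted.size() == new_stream.size());
  return adjusted;
}
```

If the new program references the same functions in the same order but numbered them differently, the adjusted stream is identical to the old one, so a byte-level differ sees no change in those instructions.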
| 77 |
| 78 ### Encode |
| 79 |
| 80 - This step takes an AssemblyProgram object and encodes both the |
| 81 instructions and the mapping of indices to addresses as byte |
| 82 vectors. This format can be written to a file directly, and is also |
| 83 more appropriate for bsdiffing. It is done by |
| 84 AssemblyProgram.Encode(). |
| 85 |
| 86 - encoded_program.h:EncodedProgram defines the binary format and a |
| 87 WriteTo method that writes to a file. |
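
As a rough illustration of encoding into byte vectors, the sketch below serializes opcodes, label indices, and the index-to-address table into three separate streams. The three-stream layout, the little-endian 32-bit encoding, and the names `ToyEncoded`, `ToyEncode`, and `AppendUint32` are assumptions of this example; the actual format is the one defined by EncodedProgram.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using ByteVector = std::vector<uint8_t>;

// Appends `value` to `out` as four little-endian bytes.
void AppendUint32(uint32_t value, ByteVector* out) {
  for (int shift = 0; shift < 32; shift += 8) {
    out->push_back(static_cast<uint8_t>((value >> shift) & 0xFF));
  }
}

// Hypothetical encoded form: one byte vector per kind of data.
struct ToyEncoded {
  ByteVector opcodes;    // one byte per instruction
  ByteVector operands;   // label index of each instruction, 4 bytes each
  ByteVector addresses;  // index-to-address table, 4 bytes per entry
};

ToyEncoded ToyEncode(const std::vector<uint8_t>& opcodes,
                     const std::vector<uint32_t>& label_indices,
                     const std::vector<uint32_t>& index_to_address) {
  // Every instruction in this toy model has exactly one label operand.
  assert(opcodes.size() == label_indices.size());
  ToyEncoded encoded;
  encoded.opcodes = opcodes;
  for (uint32_t index : label_indices) AppendUint32(index, &encoded.operands);
  for (uint32_t address : index_to_address) {
    AppendUint32(address, &encoded.addresses);
  }
  return encoded;
}
```

Keeping the address table in its own stream means that a function moving to a new address changes only a few bytes of that table, while the opcode and operand streams stay the same.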
| 88 |
| 89 ### bsdiff |
| 90 |
| 91 - simple_delta.cc:GenerateSimpleDelta computes the bsdiff delta. |
| 92 |
| 93 Patch Application |
| 94 ----------------- |
| 95 |
| 96  |
| 97 |
| 98 - courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch application |
| 99 by calling ensemble\_apply.cc:ApplyEnsemblePatch |
| 100 |
| 101 - ensemble\_apply.cc:ApplyEnsemblePatch reads and verifies the |
| 102 patch's header, then calls the overloaded version of |
| 103 ensemble\_apply.cc:ApplyEnsemblePatch. |
| 104 |
| 105 - The patch is read into an ensemble_apply.cc:EnsemblePatchApplication |
| 106 object, which generates a set of patcher_x86_32.h:PatcherX86_32 |
| 107 objects for the sections in the patch. |
| 108 |
| 109 - The original file is disassembled and encoded via a call to |
| 110 EnsemblePatchApplication.TransformUp, which in turn calls |
| 111 patcher_x86_32.h:PatcherX86_32.Transform. |
| 112 |
| 113 - The transformed file is then bspatched via |
| 114 EnsemblePatchApplication.SubpatchTransformedElements, which calls |
| 115 EnsemblePatchApplication.SubpatchStreamSets, which calls |
| 116 simple_delta.cc:ApplySimpleDelta, Courgette's built-in |
| 117 implementation of bspatch. |
| 118 |
| 119 - Finally, EnsemblePatchApplication.TransformDown assembles the |
| 120 patched binary data, i.e., reverses the encoding and disassembly. |
| 121 This is done by calling PatcherX86_32.Reform, which in turn calls |
| 122 the global function encoded_program.cc:Assemble, which calls |
| 123 EncodedProgram.AssembleTo. |
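
The final assembly step, turning label indices back into concrete addresses, can be sketched as the inverse of the index substitution done during disassembly. `ToyInstruction` and `ToyAssemble` are hypothetical names; the real work is done by EncodedProgram.AssembleTo, which handles considerably more than this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction: the operand is a label index on input and a
// concrete address on output.
struct ToyInstruction {
  uint8_t opcode;
  uint32_t operand;
};

// Replaces each label index with the address stored for it in the
// (already patched) index-to-address table.
std::vector<ToyInstruction> ToyAssemble(
    const std::vector<ToyInstruction>& indexed,
    const std::vector<uint32_t>& index_to_address) {
  std::vector<ToyInstruction> output;
  output.reserve(indexed.size());
  for (const ToyInstruction& instr : indexed) {
    // A well-formed program never references an index outside the table.
    assert(instr.operand < index_to_address.size());
    output.push_back({instr.opcode, index_to_address[instr.operand]});
  }
  return output;
}
```

Because the patched table already holds the new addresses, the same indexed instruction stream assembles to the new binary's branch targets.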
| 124 |
| 125 |
| 126 Glossary |
| 127 -------- |
| 128 |
| 129 **Adjust**: Reassign address indices in the new program to match more |
| 130 closely those from the old. |
| 131 |
| 132 **Assembly program**: The output of _disassembly_. Contains a list of |
| 133 _Courgette instructions_ and an index of branch target addresses. |
| 134 |
| 135 **Assemble**: Convert an _assembly program_ back into an object file |
| 136 by evaluating the _Courgette instructions_ and leaving the machine |
| 137 instructions in place. |
| 138 |
| 139 **Courgette instruction**: Replaces machine instructions in the |
| 140 program. Courgette instructions replace branches with an index to |
| 141 the target addresses and replace part of the relocation table. |
| 142 |
| 143 **Disassembler**: Takes a binary file and produces an _assembly |
| 144 program_. |
| 145 |
| 146 **Encode**: Convert an _assembly program_ into an _encoded program_ by |
| 147 serializing its data structures into byte vectors more appropriate |
| 148 for storage in a file. |
| 149 |
| 150 **Encoded Program**: The output of encoding. |
| 151 |
| 152 **Ensemble**: A Courgette-style patch containing sections for the list |
| 153 of branch addresses and the encoded program. It supports patching |
| 154 multiple object files at once. |
| 155 |
| 156 **Opcode**: The number corresponding to either a machine or _Courgette |
| 157 instruction_. |