| Index: courgette/description.md
|
| diff --git a/courgette/description.md b/courgette/description.md
|
| new file mode 100644
|
| index 0000000000000000000000000000000000000000..f79f99e6e65eea83dd2fb2ec4bea1d16a9763ff6
|
| --- /dev/null
|
| +++ b/courgette/description.md
|
| @@ -0,0 +1,157 @@
|
| +Courgette Internals
|
| +===================
|
| +
|
| +Patch Generation
|
| +----------------
|
| +
|
| +
|
| +
|
| +- courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch
|
| + generation by calling ensemble\_create.cc:GenerateEnsemblePatch
|
| +
|
| +- The files are read in by in courgette:SourceStream objects
|
| +
|
| +- ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which
|
| + uses MakeGenerator to create
|
| + patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 classes.
|
| +
|
| +- PatchGeneratorX86\_32's Transform method transforms the input file
|
| + using Courgette's core techniques that make the bsdiff delta
|
| + smaller. The steps it takes are the following:
|
| +
|
| + - _disassemble_ the old and new binaries into AssemblyProgram
|
| + objects,
|
| +
|
| + - _adjust_ the new AssemblyProgram object, and
|
| +
|
| + - _encode_ the AssemblyProgram object back into raw bytes.
|
| +
|
| +### Disassemble
|
| +
|
| +- The input is a pointer to a buffer containing the raw bytes of the
|
| + input file.
|
| +
|
| +- Disassembly converts certain machine instructions that reference
|
| + addresses to Courgette instructions. It is not actually
|
| + disassembly, but this is the term the code-base uses. Specifically,
|
| + it detects instructions that use absolute addresses given by the
|
| + binary file's relocation table, and relative addresses used in
|
| + relative branches.
|
| +
|
| +- Done by disassemble:ParseDetectedExecutable, which selects the
|
| + appropriate Disassembler subclass by looking at the binary file's
|
| + headers.
|
| +
|
| + - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler
|
| +
|
| + - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler
|
| +
|
| + - disassembler\_elf\_32\_arm.h defines the ELF 32-bit arm disassembler
|
| +
|
| +- The Disassembler replaces the relocation table with a Courgette
|
| + instruction that can regenerate the relocation table.
|
| +
|
| +- The Disassembler builds a list of addresses referenced by the
|
| + machine code, numbering each one.
|
| +
|
| +- The Disassembler replaces and address used in machine instructions
|
| + with its index number.
|
| +
|
| +- The output is an assembly\_program.h:AssemblyProgram class, which
|
| + contains a list of instructions, machine or Courgette, and a mapping
|
| + of indices to actual addresses.
|
| +
|
| +### Adjust
|
| +
|
| +- This step takes the AssemblyProgram for the old file and reassigns
|
| + the indices that map to actual addresses. It is performed by
|
| + adjustment_method.cc:Adjust().
|
| +
|
| +- The goal is the match the indices from the old program to the new
|
| + program as closely as possible.
|
| +
|
| +- When matched correctly, machine instructions that jump to the
|
| + function in both the new and old binary will look the same to
|
| + bsdiff, even the function is located in a different part of the
|
| + binary.
|
| +
|
| +### Encode
|
| +
|
| +- This step takes an AssemblyProgram object and encodes both the
|
| + instructions and the mapping of indices to addresses as byte
|
| + vectors. This format can be written to a file directly, and is also
|
| + more appropriate for bsdiffing. It is done by
|
| + AssemblyProgram.Encode().
|
| +
|
| +- encoded_program.h:EncodedProgram defines the binary format and a
|
| + WriteTo method that writes to a file.
|
| +
|
| +### bsdiff
|
| +
|
| +- simple_delta.c:GenerateSimpleDelta
|
| +
|
| +Patch Application
|
| +-----------------
|
| +
|
| +
|
| +
|
| +- courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch generation
|
| + by calling ensemble\_apply.cc:ApplyEnsemblePatch
|
| +
|
| +- ensemble\_create.cc:ApplyEnsemblePatch, reads and verifies the
|
| + patch's header, then calls the overloaded version of
|
| + ensemble\_create.cc:ApplyEnsemblePatch.
|
| +
|
| +- The patch is read into an ensemble_apply.cc:EnsemblePatchApplication
|
| + object, which generates a set of patcher_x86_32.h:PatcherX86_32
|
| + objects for the sections in the patch.
|
| +
|
| +- The original file is disassembled and encoded via a call
|
| + EnsemblePatchApplication.TransformUp, which in turn call
|
| + patcher_x86_32.h:PatcherX86_32.Transform.
|
| +
|
| +- The transformed file is then bspatched via
|
| + EnsemblePatchApplication.SubpatchTransformedElements, which calls
|
| + EnsemblePatchApplication.SubpatchStreamSets, which calls
|
| + simple_delta.cc:ApplySimpleDelta, Courgette's built-in
|
| + implementation of bspatch.
|
| +
|
| +- Finally, EnsemblePatchApplication.TransformDown assembles, i.e.,
|
| + reverses the encoding and disassembly, on the patched binary data.
|
| + This is done by calling PatcherX86_32.Reform, which in turn calls
|
| + the global function encoded_program.cc:Assemble, which calls
|
| + EncodedProgram.AssembleTo.
|
| +
|
| +
|
| +Glossary
|
| +--------
|
| +
|
| +**Adjust**: Reassign address indices in the new program to match more
|
| + closely those from the old.
|
| +
|
| +**Assembly program**: The output of _disassembly_. Contains a list of
|
| + _Courgette instructions_ and an index of branch target addresses.
|
| +
|
| +**Assemble**: Convert an _assembly program_ back into an object file
|
| + by evaluating the _Courgette instructions_ and leaving the machine
|
| + instructions in place.
|
| +
|
| +**Courgette instruction**: Replaces machine instructions in the
|
| + program. Courgette instructions replace branches with an index to
|
| + the target addresses and replace part of the relocation table.
|
| +
|
| +**Disassembler**: Takes a binary file and produces an _assembly
|
| + program_.
|
| +
|
| +**Encode**: Convert an _assembly program_ into an _encoded program_ by
|
| + serializing its data structures into byte vectors more appropriate
|
| + for storage in a file.
|
| +
|
| +**Encoded Program**: The output of encoding.
|
| +
|
| +**Ensemble**: A Courgette-style patch containing sections for the list
|
| + of branch addresses, the encoded program. It supports patching
|
| + multiple object files at once.
|
| +
|
| +**Opcode**: The number corresponding to either a machine or _Courgette
|
| + instruction_.
|
|
|