| 1 ## vpython - simple and easy VirtualEnv Python |
| 2 |
| 3 `vpython` is a tool, written in Go, which enables the simple and easy invocation |
| 4 of Python code in [VirtualEnv](https://virtualenv.pypa.io/en/stable/) |
| 5 environments. |
| 6 |
| 7 `vpython` is a simple Python bootstrap which (almost) transparently wraps a |
| 8 Python interpreter invocation to run in a tailored VirtualEnv environment. The |
| 9 environment is expressed by a script-specific configuration file. This allows |
| 10 each Python script to trivially express its own package-level dependencies and |
| 11 run in a hermetic world consisting of just those dependencies. |
| 12 |
| 13 When invoking such a script via `vpython`, the tool downloads its dependencies |
| 14 and prepares an immutable VirtualEnv containing them. It then invokes the |
| 15 script, now running in that VirtualEnv, through the preferred Python |
| 16 interpreter. |
| 17 |
| 18 `vpython` does its best not to use hacky mechanisms to achieve this. It uses |
| 19 an unmodified VirtualEnv package, standard setup methods, and local system |
| 20 resources. The result is transparent, canonical VirtualEnv environment |
| 21 bootstrapping. `vpython` is also safe for concurrent invocation, using safe |
| 22 filesystem-level locking to perform any environment setup and management. |
| 23 |
| 24 `vpython` itself is very fast. The wheel downloads and VirtualEnvs may also be |
| 25 cached and re-used, optimally limiting the runtime overhead of `vpython` to just |
| 26 one initial setup per unique environment. |
| 27 |
| 28 ### Setup and Invocation |
| 29 |
| 30 For the standard case, employing `vpython` is as simple as: |
| 31 |
| 32 First, create and upload Python wheels for all of the packages that you will |
| 33 need. This is done in an implementation-specific way (e.g., upload wheels as |
| 34 packages to CIPD). |
| 35 |
| 36 Once the packages are available: |
| 37 |
| 38 * Add `vpython` to `PATH`. |
| 39 * Write an environment specification naming packages. |
| 40 * Change tool invocation from `python` to `vpython`. |
| 41 |
| 42 Using `vpython` offers several benefits over direct Python invocation, especially |
| 43 when vendoring packages. Notably, with `vpython`: |
| 44 |
| 45 * It trivially enables hermetic Python everywhere. |
| 46 * No `sys.path` manipulation is needed to load vendored or imported packages. |
| 47 * Any tool can define which package(s) it needs without requiring coordination |
| 48 or cooperation from other tools. (Note that the package must be made available |
| 49 for download first). |
| 50 * Adding new Python dependencies to a project is non-invasive and immediate. |
| 51 * Package downloading and deployment are baked into `vpython` and built on |
| 52 fast and secure Google Cloud Platform technologies. |
| 53 * No more custom bootstraps. Several projects and tools, including multiple |
| 54 within the infra code base, have bootstrap scripts that vendor packages or |
| 55 mimic a VirtualEnv. These are at best repetitive and, at worst, buggy and |
| 56 insecure. |
| 57 * Dependencies are explicitly stated, not assumed. |
| 58 |
| 59 ### Why VirtualEnv? |
| 60 |
| 61 VirtualEnv offers several benefits over system Python. Primarily, it is the |
| 62 |
| 63 By using the same environment everywhere, Python invocations become |
| 64 reproducible. A tool run on a developer's system will load the same versions |
| 65 of the same libraries as it will on a production system. A production system |
| 66 will no longer fail because it is missing a package, or because it has the |
| 67 wrong version. |
| 68 |
| 69 A direct mechanism for vendoring, `sys.path` manipulation, is nuanced, buggy, |
| 70 and unsupported by the Python community. It is difficult to get right on all |
| 71 platforms in all environments for all packages. A notorious example of this is |
| 72 `protobuf` and other domain-bound packages, which actively fight `sys.path` |
| 73 inclusion. Using VirtualEnv means that any compliant Python package can |
| 74 trivially be included into a project. |
| 75 |
| 76 ### Why CIPD? |
| 77 |
| 78 [CIPD](https://github.com/luci/luci-go/tree/master/cipd) is a cross-platform |
| 79 service and associated tooling and packages used to securely fetch and deploy |
| 80 immutable "packages" (~= zip files) into the local file system. Unlike "package |
| 81 managers" it avoids platform-specific assumptions, executable "hooks", or the |
| 82 complexities of dependency resolution. `vpython` uses this as a mechanism for |
| 83 housing and deploying wheels. |
| 84 |
| 85 infrastructure package deployment system. It is simple, accessible, fast, and |
| 86 backed by resilient systems such as Google Storage and AppEngine. |
| 87 |
| 88 Unlike `pip`, a CIPD package is defined by its content, enabling precise package |
| 89 matching instead of fuzzy version matching (e.g., `numpy >= 1.2`, and |
| 90 `numpy == 1.2` both can match multiple `numpy` packages in `pip`). |
| 91 |
| 92 CIPD also supports ACLs, enabling privileged Python projects to easily vendor |
| 93 sensitive packages. |
| 94 |
| 95 ### Why wheels? |
| 96 |
| 97 A Python [wheel](https://www.python.org/dev/peps/pep-0427/) is a simple binary |
| 98 distribution of Python code. A wheel can be generic (pure Python) or system- |
| 99 and architecture-bound (e.g., 64-bit Mac OSX). |
| 100 |
| 101 Wheels are preferred over eggs because they come packaged with compiled binaries. |
| 102 This makes their deployment simple (unpack via `pip`) and reduces system |
| 103 requirements and variation, since local compilation is not needed. |
| 104 |
| 105 The increased management burden of maintaining separate wheels for the same |
| 106 package, one for each architecture, is handled naturally by CIPD, removing the |
| 107 only real pain point. |
| 108 |
| 109 ## Wheel Guidance |
| 110 |
| 111 This section contains recommendations for building or uploading wheel CIPD |
| 112 packages, including platform-specific guidance. |
| 113 |
| 114 CIPD wheel packages are CIPD packages that contain Python wheels. A given CIPD |
| 115 package can contain multiple wheels for multiple platforms, but should only |
| 116 contain one version of any given package for any given architecture/platform. |
| 117 |
| 118 For example, you can bundle a Windows, Linux, and Mac OSX version of `numpy` and |
| 119 `coverage` in the same CIPD package, but you should not bundle `numpy==1.11` and |
| 120 `numpy==1.12` in the same package. |
| 121 |
| 122 The reason for this is that `vpython` identifies which wheels to install by |
| 123 scanning the contents of the CIPD package, and if multiple versions appear, |
| 124 there is no clear guidance about which should be used. |
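
The ambiguity described above can be checked before a bundle is uploaded. The following is a minimal sketch, not `vpython`'s own scanning logic: it assumes wheel filenames follow the PEP 427 layout (`name-version[-build]-python-abi-platform.whl`), and the helper names `parse_wheel_filename` and `find_version_conflicts` are hypothetical. The `numpy` filenames in the usage example are illustrative only.

```python
from collections import defaultdict


def parse_wheel_filename(filename):
    """Split a PEP 427 wheel filename into its tagged components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # PEP 427 layout: name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def find_version_conflicts(filenames):
    """Return {(name, platform): sorted versions} for each ambiguous pair."""
    versions = defaultdict(set)
    for filename in filenames:
        wheel = parse_wheel_filename(filename)
        versions[(wheel["name"], wheel["platform"])].add(wheel["version"])
    return {key: sorted(found) for key, found in versions.items()
            if len(found) > 1}


if __name__ == "__main__":
    bundle = [
        "coverage-4.3.4-cp27-cp27m-macosx_10_10_x86_64.whl",
        "coverage-4.3.4-cp27-cp27mu-manylinux1_x86_64.whl",
        "numpy-1.11.0-cp27-cp27mu-manylinux1_x86_64.whl",
        "numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl",
    ]
    # Only numpy on manylinux1_x86_64 appears with two versions.
    print(find_version_conflicts(bundle))
```

Bundling `coverage` for two platforms is accepted, while two `numpy` versions for the same platform are reported as a conflict.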
| 125 |
| 126 ### Mac OSX |
| 127 |
| 128 Use the `m` ABI suffix and the `macosx_...` platform. `vpython` installs wheels |
| 129 with the `--force` flag, so slight binary incompatibilities (e.g., specific OSX |
| 130 versions) can be glossed over. |
| 131 |
| 132 `coverage-4.3.4-cp27-cp27m-macosx_10_10_x86_64.whl` |
| 133 |
| 134 ### Linux |
| 135 |
| 136 Use wheels with the `mu` ABI suffix and the `manylinux1` platform. For example: |
| 137 |
| 138 `coverage-4.3.4-cp27-cp27mu-manylinux1_x86_64.whl` |
| 139 |
| 140 ### Windows |
| 141 |
| 142 Use wheels with the `cp27m` or `none` ABI tag. For example: |
| 143 |
| 144 `coverage-4.3.4-cp27-cp27m-win_amd64.whl` |
| 145 |
| 146 |
| 147 ## Setup and Invocation |
| 148 |
| 149 `vpython` can be invoked by replacing `python` in the command-line with |
| 150 `vpython`. |
| 151 |
| 152 `vpython` works with a default Python environment out of the box. To add |
| 153 vendored packages, you need to define an environment specification file that |
| 154 describes which wheels to install. |
| 155 |
| 156 An environment specification file is a text protobuf defined as `Spec` |
| 157 [here](./api/env/spec.proto). An example is: |
| 158 |
| 159 ``` |
| 160 # Any 2.7 interpreter will do. |
| 161 python_version: "2.7" |
| 162 |
| 163 # Include "numpy" for the current architecture. |
| 164 wheel { |
| 165   path: "infra/python/wheels/numpy/${platform}-${arch}" |
| 166   version: "version:1.11.0" |
| 167 } |
| 168 |
| 169 # Include "coverage" for the current architecture. |
| 170 wheel { |
| 171   path: "infra/python/wheels/coverage/${platform}-${arch}" |
| 172   version: "version:4.1" |
| 173 } |
| 174 ``` |
| 175 |
| 176 This specification can be supplied in one of four ways: |
| 177 |
| 178 * Explicitly, as a command-line option to `vpython` (`-spec`). |
| 179 * Implicitly, as a file alongside your entry point. For example, if you are |
| 180 running `test_runner.py`, `vpython` will look for `test_runner.py.vpython` |
| 181 next to it and load the environment from there. |
| 182 * Implicitly, inlined in your main file. `vpython` will scan the main entry point |
| 183 for sentinel text and, if present, load the specification from that. |
| 184 * Implicitly, through the `VPYTHON_VENV_SPEC_PATH` environment variable. This is |
| 185 set by a `vpython` invocation so that chained invocations default to the same |
| 186 environment. |
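
The lookup described in the list above can be pictured with a short sketch. It covers the explicit option, the `.vpython` file next to the entry point, and the `VPYTHON_VENV_SPEC_PATH` variable. The precedence order shown here is an assumption, the function name `find_spec_path` is hypothetical, and inline specifications are left out because the sentinel text is not given here.

```python
import os


def find_spec_path(script_path, explicit_spec=None, environ=None):
    """Return the specification file to use for script_path, or None.

    Assumed order: explicit option, then "<script>.vpython" next to the
    script, then the VPYTHON_VENV_SPEC_PATH environment variable.
    """
    environ = os.environ if environ is None else environ
    if explicit_spec:
        return explicit_spec
    sidecar = script_path + ".vpython"
    if os.path.isfile(sidecar):
        return sidecar
    # An unset or empty variable means no inherited specification.
    return environ.get("VPYTHON_VENV_SPEC_PATH") or None
```

For `test_runner.py`, this returns `test_runner.py.vpython` when that file exists and no explicit specification is given.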
| 187 |
| 188 ### Optimization and Caching |
| 189 |
| 190 `vpython` has several levels of caching that it employs to optimize setup and |
| 191 invocation overhead. |
| 192 |
| 193 #### VirtualEnv |
| 194 |
| 195 Once a VirtualEnv specification has been resolved, its resulting pinned |
| 196 specification is hashed and used as a key to that VirtualEnv. Other `vpython` |
| 197 invocations expressing the same environment will naturally re-use that |
| 198 VirtualEnv instead of creating their own. |
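
As a rough illustration of keying a VirtualEnv by its specification, the sketch below hashes the pinned specification text and uses the digest as a directory name. The choice of SHA-256, hashing the text form rather than a serialized protobuf, and the directory layout are all assumptions; `venv_key` and `venv_dir` are hypothetical names.

```python
import hashlib
import os


def venv_key(pinned_spec_text):
    """Derive a stable key from the pinned specification text."""
    return hashlib.sha256(pinned_spec_text.encode("utf-8")).hexdigest()


def venv_dir(cache_root, pinned_spec_text):
    """Map a pinned specification to its (assumed) cache directory."""
    return os.path.join(cache_root, venv_key(pinned_spec_text))
```

Two invocations with byte-identical pinned specifications map to the same directory, so the second can re-use the environment the first one built.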
| 199 |
| 200 #### Download Caching |
| 201 |
| 202 Download mechanisms (e.g., CIPD) can optionally include a package cache to avoid |
| 203 the overhead of downloading and/or resolving a package multiple times. |
| 204 |
| 205 ### Migration |
| 206 |
| 207 #### Command-line |
| 208 |
| 209 `vpython` is a natural replacement for `python` in the command line: |
| 210 |
| 211 ```sh |
| 212 python ./foo/bar/baz.py -d --flag value arg arg whatever |
| 213 ``` |
| 214 |
| 215 Becomes: |
| 216 ```sh |
| 217 vpython ./foo/bar/baz.py -d --flag value arg arg whatever |
| 218 ``` |
| 219 |
| 220 The `vpython` tool also accepts its own command-line arguments. When passing them, use |
| 221 a `--` separator to differentiate between `vpython` options and `python` options: |
| 222 |
| 223 ```sh |
| 224 vpython -spec /path/to/spec.vpython -- ./foo/bar/baz.py |
| 225 ``` |
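
The role of the `--` separator can be sketched as a split of the argument list at its first occurrence. This is a simplified illustration and not `vpython`'s actual argument parser: the function name `split_args` is hypothetical, and treating every argument as a `python` argument when no separator is present is an assumption made to keep the example short.

```python
def split_args(argv):
    """Split argv into (tool_args, python_args) at the first "--"."""
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    # Simplification: without a separator, pass everything to python.
    return [], list(argv)
```

With the invocation above, `["-spec", "/path/to/spec.vpython"]` would be kept for the tool and `["./foo/bar/baz.py"]` handed to the interpreter.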
| 226 |
| 227 #### Shebang (POSIX) |
| 228 |
| 229 If your script uses implicit specification (file or inline), replacing `python` |
| 230 with `vpython` in your shebang line will automatically work. |
| 231 |
| 232 ```sh |
| 233 #!/usr/bin/env vpython |
| 234 ``` |
| 235 |