From bbbf843ee5ffcd4233db6b6a121e41cab3cb829d Mon Sep 17 00:00:00 2001
From: Lukas Puehringer
Date: Fri, 22 Nov 2019 16:46:42 +0100
Subject: [PATCH] Update "Producing Consistent Snapshots"

Following discussions with @dstufft and @trishankatdatadog regarding file
uploads and simple index generation on PyPI (see secure-systems-lab/peps#70)
this commit once more refines the "producing consistent snapshots" section.
It includes the following changes:

- Remove the notion of *transaction processes* and instead talk about
  *uploads*.

  Background: Transaction processes are only relevant if multiple files of a
  project release need to be handled in a single transaction, which is not
  the case on PyPI, where each upload of a distribution file is
  self-contained. With this change, upload processes just place files into a
  queue, without updating bin-n metadata (as transaction processes would have
  done in parallel), and all the metadata update/creation work is done by the
  snapshot process in a strictly sequential manner.

- Add a paragraph about simple index pages and how their hashes should be
  included in *bin-n* metadata, and how they need to remain stable if
  re-generated dynamically.
---
 pep-0458.txt | 78 +++++++++++++++++++++++++---------------------------
 1 file changed, 37 insertions(+), 41 deletions(-)

diff --git a/pep-0458.txt b/pep-0458.txt
index 423f3c770bb..aa117ace76d 100644
--- a/pep-0458.txt
+++ b/pep-0458.txt
@@ -907,55 +907,51 @@ efficiently transfer consistent snapshots from PyPI.
 Producing Consistent Snapshots
 ------------------------------
 
-When a new project release is uploaded to PyPI, PyPI MUST update the *bin-n*
-metadata responsible for the target files of the project release. Remember that
-target files are sorted into bins by their filename hashes. Consequentially,
-PyPI MUST update *snapshot* to account for the updated *bin-n* metadata, and
-*timestamp* to account for the updated *snapshot* metadata. These updates
-SHOULD be handled by automated processes, e.g. one or more *transaction
-processes* and one *snapshot process*.
-
-Each transaction process keeps track of a project upload, adds all new target
-files to the most recent, relevant *bin-n* metadata and informs the
-snapshot process to produce a consistent snapshot. Each project release SHOULD
-be handled in an atomic transaction, so that a given consistent snapshot
-contains all target files of a project release. However, transaction processes
-MAY be parallelized under the following constraints:
-
-- Pairs of transaction processes MUST NOT concurrently work on the same project.
-- Pairs of transaction processes MUST NOT concurrently work on projects that
-  belong to the same *bin-n* role.
-
-When a transaction process is finished updating the relevant *bin-n* metadata
-it informs the snapshot process to generate a new consistent snapshot. The
-snapshot process does so by taking the updated *bin-n* metadata, incrementing
-their respective version numbers, signing them with the *bin-n* role key(s),
-and writing them to *VERSION_NUMBER.bin-N.json*.
-
-Similarly, the snapshot process then takes the most recent *snapshot* metadata,
-updates its *bin-n* metadata version numbers, increments its own version
-number, signs it with the *snapshot* role key, and writes it to
-*VERSION_NUMBER.snapshot.json*.
+When a new distribution file is uploaded to PyPI, PyPI MUST update the
+responsible *bin-n* metadata. Remember that all target files are sorted into
+bins by their filename hashes. PyPI MUST also update *snapshot* to account for
+the updated *bin-n* metadata, and *timestamp* to account for the updated
+*snapshot* metadata. These updates SHOULD be handled by an automated *snapshot
+process*.
+
+File uploads MAY be handled in parallel; however, consistent snapshots MUST be
+produced in a strictly sequential manner. Furthermore, as long as distribution
+files are self-contained, a consistent snapshot MAY be produced for each
+uploaded file. To do so, upload processes place new distribution files into a
+concurrency-safe FIFO queue, and the snapshot process reads from that queue
+one file at a time and performs the following tasks:
+
+First, it adds the new file path to the relevant *bin-n* metadata, increments
+its version number, signs it with the *bin-n* role key, and writes it to
+*VERSION_NUMBER.bin-N.json*.
+
+Then, it takes the most recent *snapshot* metadata, updates its *bin-n*
+metadata version numbers, increments its own version number, signs it with the
+*snapshot* role key, and writes it to *VERSION_NUMBER.snapshot.json*.
 
 And finally, the snapshot process takes the most recent *timestamp* metadata,
 updates its *snapshot* metadata hash and version number, increments its own
 version number, sets a new expiration time, signs it with the *timestamp* role
 key, and writes it to *timestamp.json*.
 
-The snapshot process MUST generate consistent snapshots sequentially, reading
-the notifications received from the transaction process(es) from a
-concurrency-safe FIFO queue. Fortunately, the operation of signing is fast
-enough that this may be done a thousand or more times per second.
+When updating *bin-n* metadata for a consistent snapshot, the snapshot process
+SHOULD also include any new or updated hashes of simple index pages in the
+relevant *bin-n* metadata. Note that simple index pages may be generated
+dynamically on API calls, so it is important that their output remains stable
+throughout the validity of a consistent snapshot.
 
-If there are multiple files in a release, a project MAY release these files in
-separate transactions. For example, a project MAY release files for Windows in
-one transaction, and the files for Linux in another transaction. However, a project
-SHOULD release files that must belong together in order for everything to work
-in the same transaction.
+Since the snapshot process MUST generate consistent snapshots in a strictly
+sequential manner, it constitutes a bottleneck. Fortunately, the operation of
+signing is fast enough that this may be done a thousand or more times per
+second.
 
-At any rate, PyPI SHOULD use a `transaction log`__ to record project
-transaction processes and the snapshot queue for auditing and to recover from
-errors after a server failure.
+Moreover, PyPI MAY serve distribution files to clients before the corresponding
+consistent snapshot metadata is generated. In that case the client software
+SHOULD inform the user that full TUF protection is not yet available but will
+be available shortly.
+
+PyPI SHOULD use a `transaction log`__ to record upload processes and the
+snapshot queue for auditing and to recover from errors after a server failure.
 
 __ https://en.wikipedia.org/wiki/Transaction_log
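The queue-based flow this patch describes (parallel upload processes feeding a
single, strictly sequential snapshot process) can be sketched as follows. This
is an illustrative sketch only: the class and function names, the bin count of
16, and the stand-in `sign` helper are assumptions for demonstration, not
PyPI's actual implementation; real metadata would be signed with the TUF role
keys and follow the TUF metadata format.

```python
import hashlib
import json
import queue

NUM_BINS = 16  # assumption for illustration; the PEP discusses larger bin counts


def bin_for(path):
    """Assign a target file to a bin by the prefix of its filename hash."""
    digest = hashlib.sha256(path.encode()).hexdigest()
    return int(digest[0], 16) % NUM_BINS


def sign(role, payload):
    """Stand-in for a real TUF signature made with the role's key."""
    blob = role + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()


class SnapshotProcess:
    """Single sequential consumer of a concurrency-safe FIFO upload queue."""

    def __init__(self):
        self.upload_queue = queue.Queue()  # upload processes put file paths here
        self.bins = {n: {"version": 0, "targets": {}} for n in range(NUM_BINS)}
        self.snapshot = {"version": 0, "meta": {}}
        self.timestamp = {"version": 0}
        self.written = {}  # stands in for the consistent-snapshot file store

    def run_once(self):
        """Produce one consistent snapshot for the next queued file."""
        path = self.upload_queue.get()
        n = bin_for(path)
        # 1. Add the file to the relevant bin-n metadata, increment its
        #    version, sign it, and write VERSION_NUMBER.bin-N.json.
        b = self.bins[n]
        b["targets"][path] = True
        b["version"] += 1
        self.written[f"{b['version']}.bin-{n}.json"] = sign(f"bin-{n}", b)
        # 2. Record the new bin-n version in snapshot, increment its own
        #    version, sign it, and write VERSION_NUMBER.snapshot.json.
        self.snapshot["meta"][f"bin-{n}"] = b["version"]
        self.snapshot["version"] += 1
        self.written[f"{self.snapshot['version']}.snapshot.json"] = sign(
            "snapshot", self.snapshot
        )
        # 3. Record the new snapshot version in timestamp, increment its own
        #    version, sign it, and write timestamp.json.
        self.timestamp["snapshot_version"] = self.snapshot["version"]
        self.timestamp["version"] += 1
        self.written["timestamp.json"] = sign("timestamp", self.timestamp)


sp = SnapshotProcess()
for name in ("foo-1.0.tar.gz", "bar-2.0-py3-none-any.whl"):
    sp.upload_queue.put(name)  # upload processes MAY enqueue in parallel
sp.run_once()  # consistent snapshots are produced one at a time
sp.run_once()
```

Because only the single snapshot process touches the metadata, no locking
beyond the queue itself is needed, which is the point of dropping the
parallel transaction processes in this patch.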