diff --git a/swarm/docs/smash/smash.rst b/swarm/docs/smash/smash.rst
index ba28f20964..f20bf93b8d 100644
--- a/swarm/docs/smash/smash.rst
+++ b/swarm/docs/smash/smash.rst
@@ -287,19 +287,19 @@ This type is expected to be used very rarely, since the only way they come about
 Repeatability and file-level audits
 ====================================
 
-In this secion we expose the problem of scalability with repeated audits of fixed sized chunks, then show that the solution lies in finding larger structures than the chunk which are to be audited simultaneously so storage critical audit masks can be reused without comporomising security. Incidentally, this same method offers a systemic and rather intuitive way of auditing documents and document collections (the units that are semantic to the users). We propose an algorithm to recursively generate seeds for the successive chunks of a larger collection and provide a partial secret verification scheme that offers error detection and efficient backtracking to identify missing chunks.
+In this section we expose the problem of scalability that comes with repeated audits of fixed-size chunks. We show that the solution lies in finding larger structures than the chunk which are to be audited directly, essentially auditing many chunks simultaneously. We do this in a way that storage-critical audit masks can be reused without compromising security. Incidentally, this same method offers a systematic and rather intuitive way of auditing documents and document collections (the units that are meaningful to users). We propose an algorithm to recursively generate seeds for the successive chunks of a larger collection and provide a partial secret verification scheme that offers error detection and efficient backtracking to identify missing chunks.
 
 This *collection-level recursive audit secret hash* (CRASH) will provide the basis for collective iterative auditing, an efficient automated integrity protection system for the swarm.
 
 The problem of scaling audit repeatability with fixed chunks
 --------------------------------------------------------------
 
-The choice of :math:`r` has an impact on the length of the merkle proofs which are needed for MASH-proofs. More importantly, though, since someone needs to remember the masks, this scheme has a fix absolute storage overhead that is independent of the size of the pieces we prove the storage of. Since it is not realistic to require more than 5-10% administrative storage overhead even for very long storage periods, larger :math:`r` values only scale if the same seeds can guard the integrity of larger data.
+The choice of the repeatability parameter :math:`r` has an impact on the length of the Merkle proofs needed for MASH-proofs. More importantly, though, since someone needs to remember the masks, this scheme has a fixed absolute storage overhead that is independent of the size of the pieces we prove the storage of. Since it is not realistic to require more than 5-10% administrative storage overhead even for very long storage periods, larger :math:`r` values only scale if the same seeds can guard the integrity of larger data.
 
 In particular, take the example of a standard swarm chunk size of 4096 bytes (:math:`m=12`) and assuming standard Keccak 256bit Sha3 hash we have :math:`h=5, d=7`.
 
-Given the MASH-base length of :math:`2^{r+h}`, 128 independent audits incurs a 100% storage overhead. Instead for a chunk :math:`r=0,1,2,3,4` seem realistic choices for :math:`r=0.8,1.6,3.125,6.3,12.5\%` storage overhead, respectively.
+Given the MASH-base length of :math:`2^{r+h}`, 128 independent audits incur a 100% storage overhead. Instead, for a single chunk, :math:`r=0,1,2,3,4` are realistic choices, yielding storage overheads of :math:`0.8, 1.6, 3.125, 6.3, 12.5\%`, respectively.
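+
+These figures follow directly from the definitions above. The following snippet is only an illustrative sketch and not part of the protocol: it assumes, as in the example above, that the MASH-base for one chunk occupies :math:`2^{r+h}` bytes with :math:`h=5` (32-byte hash segments) and that the chunk itself is :math:`2^m = 4096` bytes, and tabulates the resulting overhead.
+
+.. code-block:: go
+
+    package main
+
+    import "fmt"
+
+    func main() {
+        const (
+            h = 5  // log2 of the 32-byte hash segment size
+            m = 12 // log2 of the 4096-byte chunk size
+        )
+        // The MASH-base for one chunk is 2^(r+h) bytes, so the relative
+        // storage overhead is 2^(r+h) / 2^m.
+        for r := 0; r <= 7; r++ {
+            base := 1 << uint(r+h)
+            overhead := 100 * float64(base) / float64(1<<m)
+            fmt.Printf("r=%d  MASH-base=%4d bytes  overhead=%7.3f%%\n", r, base, overhead)
+        }
+    }
+
+For :math:`r=7`, i.e. 128 independent audits, the overhead reaches 100%, in line with the figures quoted above (which are rounded to one or two digits).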
 
-Ultimately, repeatability order should reflect the TTL (time to live = storage period) of the request, therefore *audit repeatability and fix chunk size cannot scale unless we compensate for the overhead by reusing seeds over several chunks*.
+Ultimately, the repeatability order should reflect the TTL (time to live, i.e. storage period) of the request; therefore *audit repeatability and fixed chunk size cannot scale unless we compensate for the overhead by reusing seeds over several chunks*.
 
 This problem does not occur with Storj since the shards can be sufficiently big, however with swarm, the base unit of contracting is the chunk.
 
 The insight here is that we can reuse the same seed over several chunks if and only if we query the integrity of those chunks at the same time.
@@ -320,9 +320,9 @@ First we define a strict ordering on all chunks in a document collection as foll
 2. Let :math:`\Pi(M) \subseteq \Dom(M)` be the set of unique paths in the manifest such that if several paths point to the same document take the first one in the order.
 
 .. math::
-   \pi \in \Pi(M) \defequiv \nexists \pi\prime \text{\ such that}\ M(\pi) = M(\pi\prime) \text{\ and\ } \pi\prime < \pi
+   \pi \in \Pi(M) \defequiv \pi \in \Dom(M) \textnormal{ and } \nexists \pi^{\prime} \text{\ such that}\ M(\pi) = M(\pi^{\prime}) \text{\ and\ } \pi^{\prime} < \pi
 
-3. This defines a unique set of documents and a strict ordering over documents.
+This defines a unique set of documents and a strict ordering over documents.
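+
+To make the deduplication behind :math:`\Pi(M)` concrete, here is a small illustrative sketch (not part of the specification; the ``uniquePaths`` helper and the lexicographic path order are assumptions made for the example): it treats the manifest as a map from paths to document references and keeps a path only if no earlier path points to the same document.
+
+.. code-block:: go
+
+    package main
+
+    import (
+        "fmt"
+        "sort"
+    )
+
+    // uniquePaths mirrors the definition of Pi(M): among all paths that
+    // point to the same document, only the first one in the order is kept.
+    func uniquePaths(manifest map[string]string) []string {
+        paths := make([]string, 0, len(manifest))
+        for p := range manifest {
+            paths = append(paths, p)
+        }
+        sort.Strings(paths) // stand-in for the strict ordering on paths
+
+        seen := make(map[string]bool) // document references already covered
+        unique := []string{}
+        for _, p := range paths {
+            if doc := manifest[p]; !seen[doc] {
+                seen[doc] = true
+                unique = append(unique, p)
+            }
+        }
+        return unique
+    }
+
+    func main() {
+        // hypothetical manifest: two paths point to the same document
+        manifest := map[string]string{
+            "index.html":   "7f02",
+            "img/logo.png": "c1a3",
+            "img/old.png":  "c1a3",
+        }
+        fmt.Println(uniquePaths(manifest)) // [img/logo.png index.html]
+    }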
 
 For each document, take the chunk tree of a document as defined by the swarm hash chunker. See :numref:`figure %s `.
@@ -330,8 +330,8 @@ For each document, take the chunk tree of a document as defined by the swarm has
 1. Let :math:`\triangle(\node)` be the set of all nodes in the subtree encoded in :math:`\node`. Now define a strict ordering of nodes in the chunk tree for document :math:`\doc`.
 
 .. math::
-   \node <_\doc \node\prime \defequiv \begin{cases}
-   \node \in \triangle(\node\prime), & \text{or}\\
+   \node <_\doc \node^{\prime} \defequiv \begin{cases}
+   \node \in \triangle(\node^{\prime}), & \text{or}\\
    \exists \node_t\ \text{such that}\ \exists \node_n, \node_m, i, j, \text{and}\ \node_t \ \text{\ such that}\\
    \ \Hash(\node_n) = \Segment{\node_t}{i}\text{\ and}\\
@@ -339,20 +339,20 @@
    \ i < j
    \end{cases}
 
-2. Combine this ordering of nodes and the ordering of uniq paths in the manifest, extend the ordering of nodes over the entire document collection as follows:
+2. Combining this ordering of nodes with the ordering of unique paths in the manifest, extend the ordering of nodes over the entire document collection as follows:
 
 .. math::
-   \node <_M \node\prime \defequiv \begin{cases}
-   \node <_\doc \node\prime, & \text{if}\ \exists \doc\text{\ such that}\ \node, \node\prime \in \triangle(\doc) \text{or}\\
-   \doc <_M \doc\prime, & \text{if}\ \exists \doc, \doc\prime\text{\ such that}\ \node \in \triangle(\doc)\text{ and\ } \node\prime \in \triangle(\doc\prime)
+   \node <_M \node^{\prime} \defequiv \begin{cases}
+   \node <_\doc \node^{\prime}, & \text{if}\ \exists \doc\text{\ such that}\ \node, \node^{\prime} \in \triangle(\doc) \text{or}\\
+   \doc <_M \doc^{\prime}, & \text{if}\ \exists \doc, \doc^{\prime}\text{\ such that}\ \node \in \triangle(\doc)\text{ and\ } \node^{\prime} \in \triangle(\doc^{\prime})
    \end{cases}
 
 3. Now define the set of unique nodes :math:`\Complement(M)` of the document collection.
 
 .. math::
-   \node\in \Complement(M) \defequiv \nexists \node\prime \text{\ such that}\
-   \SwarmHash(\node) = \SwarmHash(\node\prime) \text{\ and\ } \node\prime <_M \node
+   \node \in \Complement(M) \defequiv \nexists \node^{\prime} \text{\ such that}\
+   \SwarmHash(\node) = \SwarmHash(\node^{\prime}) \text{\ and\ } \node^{\prime} <_M \node
 
 .. _fig:swarmhash::
 
@@ -364,14 +364,14 @@
 The resulting ordered set of chunks will be used to define the collection-level recursive audit secret hash.
 
-1. Let :math:`M` be the manifest of a document collection and :math:`\Complement(M) = \{\chunk_0, \chunk_1, \dots\chunk_n\}` be the set of unique chunks such that :math:`\chunk_i<\chunk_j` for all :math:`0<=i