Client-side MRU signatures #784

jpeletier · 2018-07-11T16:30:56Z

Client-side MRU signatures

Note:

This is the final step for reviewing the client-side-MRU PR. This PR was reviewed in 5 steps, and this is the final review prior to merging into develop.

The already completed steps were:

Step 1 : (PR client-side mru signatures -- step 1/5 #732) Minimum set of changes for client-side MRU signatures to work + system-time based MRUs. This was the status roughly 8 days ago.
Step 2: (PR client-side mru signatures -- step 2/5 #733) Documentation, refactors and new, more concrete unit tests for the new structures
Step 3: (PR client-side mru signatures -- step 3/5 #734) Refactor LookupParams so that the different combinations are documented. Moved LookupParams to its own file. Reorder code and create separate files for a few structures
Step 4 (PR client-side mru signatures -- step 4/5 #748): Final refactors. Refactor serialization for extensibility. Refactor Timestamp to a "Timestamp" struct rather than a lone uint64. More comments throughout.
Step 5: (PR Client-side MRU signing -- step 5/5 #704) Rewrite CLI and simplify client and internal object structure according to comments to decouple resource creation from updating.

Abstract

The current implementation of MRU uses the running node's private key to sign all MRUs that are requested to that node. This is a limitation for environments in which a node is shared among different users (for example, a company-wide node) and also does not allow people using public nodes to test MRUs, Swarm's Power of the Dark Side!

This PR moves MRU signatures to the client, so anyone can create and update resources with their private/public key through any node, including support for Web3 browsers such as Mist and plug-ins like Metamask. Indeed, this would allow to update content directly from your browser!

This PR's continues @nolash's work that detaches ENS for authorization. WARNING: merging this PR will automatically merge @nolash's #743 !!

Additionally, this PR adds the following:

Fully removes ENS dependencies, such as having to name resources after domain names. The resource name is now a completely independent convenience user field.
Command-line tool (swarm resource) to manage creating, updating resources and reading their metadata easily
Code clean-up and refactoring. Segregated the metadata from updates in easily upgradeable structures
Refactored chunk serialization/deserialization for extensibility
Strong verification that the user updating a resource indeed owns the private key associated to it via a 2-step hash algorithm. There was a bug in the current implementation that allowed anyone to update your resource!!
Removed blockchain dependency for timing. Update frequencies are now expressed in seconds. This was implemented as an abstraction with a timestampProvider interface, so really any source of timing can be used, or time can be "frozen" for testing.
Added support for MRUs in the client (client.go), so MRUs can be used from other golang applications, including Swarm's cli.
Allow the user to set a start time in the future. This allows to publish a resource that won't resolve until the clock hits that time.
Documentation update

Support for automatically signing MRUs by the node's bzzaccount private key has been removed from the server side. However, the API easily allows the node itself to use MRUs with its own private key. The swarm command line tool (client) can pick your local account from the same place, so regular users wouldn't notice where signing is actually taking place.

Code refactoring and terminology standarization

Throughout the package, the code has been refactored with the following reusable and extensible structures. To avoid confusion, the use of ambiguous terms such as addror key has been removed.

Main structures:

resourceUpdate: encapsulates the information sent as part of a resource update: period, version, multihash flag, the data payload and a reference to the Mutable Resource's metadata chunk.
resourceMetadata: carries the immutable information about a mutable resource: start time, frequency, name and owner.
resource: resourceUpdate + resourceMetadata. Caches resource data. When synced it contains the most recent version of the resource data and the metadata of its root chunk
SignedResourceUpdate: resourceUpdate + signature
Request message: SignedResourceUpdate + resourceMetadata. Serializable structure used to issue Mutable Resource creation and update messages.

Terms:

ownerAddr: Address of the resource owner. Type: storage.Address
metaHash is the SHA3 hash of the encoded resourceMetadata, excluding ownerAddr
rootAddr: this is the key of the chunk that contains the Mutable Resource metadata. Calculated as the SHA3 hash of ownerAddr and metaHash. Type: storage.Address
updateAddr: this is the key of the chunk that contains the Mutable Resource update. Calculated as the SHA3 hash of period, version and rootAddr. Type: storage.Address

Chunk Validator refactor

The chunk validator (Validate method) has been rewritten to:

Validate medatata chunks, whose key is content-addressed not as a direct Swarm hash of the content, but rather as SHA3(ownerAddr, metaHash) where metaHash = SHA3(resource name, frequency, startTime).
Validate update chunks verifying signatures and proof of ownership of the resource to be updated without having to lookup/download the referenced metadata chunk.
- Ownership is proven if the following holds true rootAddr == SHA3(signature_address, metaHash), where signature_address is the recovered address from the signature of the update chunk digest; digest = SHA3(updateAddr, metaHash, data), where updateAddr is the update chunk address, updateAddr = SHA3(period, version, rootAddr).

HTTP API

The HTTP API for resources (/bzz-resource:/) now works as follows:

Resource creation:

POST /bzz-resource:/ with the following JSON as payload:

{
  "name": string,
  "frequency": number,
  "startTime": number,
  "ownerAddr" : hex string,
}

where:

name Resource name. This is a user field. You can use any name
frequency Time interval the resource is expected to update at, in seconds.
startTime Time the resource is valid from, in Unix time (seconds). Set to the current epoch.
- Advanced: you can also put a startTime in the past or in the future. Setting it in the future will prevent nodes from finding content until the clock hits startTime. Setting it in the past allows you to create a history for the resource retroactively.
ownerAddr is the address derived from your public key. Hex encoded, prefixed with 0x.

Returns: MRU Manifest Key, as a quoted string. As before, this key can be put in an ENS Resolver contract with setContent.
The MRU Manifest Key resolves to a manifest that points to the metadata root chunk (rootAddr).

Note that this method only creates the resource. The resource will not return any data until a first update is submitted. You will usually create resources and initialize them. To do that, use the following:

Simultaneous resource creation and data initialization:

POST /bzz-resource:/ with the following JSON as payload:

{
  "name": string,
  "frequency": number,
  "startTime": number,
  "rootAddr" : hex string,
  "data": hex string,
  "multihash": boolean,
  "period": number,
  "version": number,
  "signature": hex string
}

where:

multihash is a flag indicating whether the data field should be interpreted as raw data or a multihash
data contains hex-encoded raw data or a multihash of the content the mutable resource will be initialized with. Prefixed with 0x
period indicates for what period we are signing. This must be set to 1.
version indicates what resource version of the period we are signing. Must be set to 1 for creation.
signature Signature of the digest. Hex encoded. Prefixed with 0x. The signature is calculated as follows digest = H(period, version, rootAddr, metaHash, multihash, data), where:
- H() is the SHA3 algorithm.
- period, version are encoded as little-endian uint64
- rootAddr is encoded as a 32 byte array
- metaHash is encoded as a 32 byte array
- multihash is encoded as the least significant bit of a flags byte
- data is the plain data byte array.

Returns: MRU Manifest Key, as a quoted hex string. As before, this key can be put in an ENS Resolver contract with setContent.
The MRU Manifest Key resolves to a manifest that points to the metadata root chunk (rootAddr)

To update a resource:

Step 1:

either
GET /bzz-resource:/<MRU_MANIFEST_KEY>/meta
-or-
GET /bzz-resource:/<DOMAIN THAT POINTS TO MRU_MANIFEST_KEY>/meta

This produces a response like this:

{
  "name": "myresource",
  "frequency": 300,
  "startTime": 1528722352,
  "ownerAddr": "0x7a2e393025c567ec4089d34f393ae6b5c234536a",
  "rootAddr": "0x0c5acf8e3176900bc5b7b732261b5ebafc05a56da3b01c044214408e841f4ded",
  "metaHash": "0x0d6968416844245d605b5e4205858e9a4779936fc3a279dce2d38415de91d79f",
  "version": 1,
  "period": 8,
  "multiHash": true
}

where:

Metadata Fields
- name: filled with the original value used to create the resource
- frequency: filled with the original value used to create the resource
- startTime: filled with the original value used to create the resource
- ownerAddr: filled with the original value used to create the resource
- metaHash Metadata hash, calculated as metaHash=H(len(metadata), startTime, frequency,name). Hex encoded
- rootAddr Metadata Root Chunk Address, calculated as rootAddr = H(metaHash, ownerAddr). This scheme effectively locks the root chunk so that only the owner of the private key that ownerAddr was derived from can sign updates.
Information about current update
- version: filled with the version number of the MRU that must be used for the next update to be successful.
- period: filled with the period number of the MRU that must be used for the next update to be successful
- multihash: whether the current data pointed to by the resource is multihash

Step 2:

To update the resource, create a new flat JSON with the following fields:

data: new data you want to set
multihash: whether the new data should be considered a multihash
period: use the suggested period from Step 1.
version: use the suggested version from Step 1.
signature, calculated in the same way as explained above for simultaneous resource creation and update.

Then, POST the resulting JSON to:
POST /bzz-resource:/

do not change version/period values, changing these values may cause that users can't find your latest update or some users seeing different versions of your content.

Phew! That looks like a lot of stuff! But don't worry, we got you covered with API and tools. See below.

Reading a resource

This is unchanged, you can still use as before:

GET /bzz-resource:/<MRU_MANIFEST_KEY>/
GET /bzz-resource://<MRU_MANIFEST_KEY> - get latest update
GET /bzz-resource://<MRU_MANIFEST_KEY>/<n> - get latest update on period n
GET /bzz-resource://<MRU_MANIFEST_KEY>/<n>/<m> - get update version m of period n

Go API

Swarm client (package swarm/api/client) has been updated with the following new methods:

CreateResource(Request)
- CreateResource creates a Mutable Resource according to the data included in the Request parameter. To create a Request, use the mru.NewCreateRequest() function.
- Returns the resulting Mutable Resource manifest address that you can use to include in an ENS Resolver (setContent) or reference future updates (Client.UpdateResource)
UpdateResource(Request)
- UpdateResource allows you to send a new version of your content
GetResource(manifestAddressOrDomain string) (io.ReadCloser, error)
- Returns the latest data currently contained in the resource as an octect stream. TODO: allow to get specific period/version
GetResourceMetadata(manifestAddressOrDomain string) (*mru.UpdateRequest, error)
- Returns a Request object that describes the resource and can be used to construct an update. Typically you will call this method to obtain a Request and then:
  - call Request.SetData() to put the new data in
  - call Request.Sign() to sign the update
  - call UpdateResource() above to send the update off.

Command line tools:

In this first version, the swarm cli interface has been updated with a new (advanced) command resource to manage resources, with this syntax:

Create:

swarm --bzzaccount="<account>" resource create <frequency> [--name <name>] [--data <0x hex data> [--multihash=true/false]]

If successful, this command will output a Manifest Address that can be used in ENS's setContent or directly in your browser via http://localhost:8500/bzz:/MANIFEST_ADDRESS

Update:

swarm --bzzaccount="<account>" resource update <Manifest Address or ENS domain> <0x Hexdata> [--multihash=true/false]

the --multihash=false flag sets multihash to false. By default the data is considered to be a multihash (--multihash=true). If you want to use the output of swarm up, prefix it with 0x1b20 to indicate a keccak256 hash.

Quick and dirty test:

$ MH=$(swarm up file.jpg) && swarm --bzzaccount "your public key" resource create 300 --name myresource --data 0x1b20${MH}

This uploads file.jpg and creates a resource that is expected to update every 5 minutes.

Final notes:

Please let me know your feedback, questions and test issues while I document and clean up the code. I hope you like this feature. I am available on Gitter (@jpeletier). Enjoy!!

PR Review guide:

Client-side MRU signing.pdf

nolash · 2018-07-12T08:00:44Z

cmd/swarm/mru.go

I find the --multihash flag a bit clumsy. Why didn't we want to use --raw?

I have already changed this twice.
Originally it was --raw
Then --rawmru was suggested in a prior review
Finally --multihash defaulting to true was the final one, meaning that only --multihash=false actually does something.

I like raw too, however there were comments that the raw word is used for other things in swarm

I can't see that the --raw flag is currently in use. @zelig do you know what he means here?

--rawmru is not particularly sexy.

Originally I figured data would be raw unless multihash was explicitly set. But maybe that's counter intuitive to how the rest of the cli tools work.

nolash · 2018-07-12T11:15:54Z

swarm/api/api.go

It's not shown here but there is a method for resourceUpdate declared in resource.go (line 50), it should be in the file where resourceUpdate is declared.

Fixed, thanks.

nolash · 2018-07-16T09:47:33Z

cmd/swarm/clientaccountmanager.go

I still don't understand this temporary directory stuff here. Did you do any looking into why this is solved in this manner? Where did the code come from again? (I think you said you cut and pasted it from somewhere?)

This comes from /node/config.go

well, at the very least the key should come from our actual keystore, andr fail if there is no key (or keystore) there. To create a new key, users should use geth account new

I digged a bit deeper into the ephemeral directory stuff. An Ethereum node will create an ephemeral keystore folder that will be deleted once the node is shutdown, if a folder cannot be set up. This comes from the following:

// KeyStoreDir is the file system folder that contains private keys. The directory can // be specified as a relative path, in which case it is resolved relative to the // current directory. // // If KeyStoreDir is empty, the default location is the "keystore" subdirectory of // DataDir. If DataDir is unspecified and KeyStoreDir is empty, an ephemeral directory // is created by New and destroyed when the node is stopped.

In any case, I have disabled ephemeral directory creation. If a keystore is not present, it will fail with an error.

zelig

just a temporary review

zelig · 2018-07-16T09:33:50Z

cmd/swarm/clientaccountmanager.go

please comment exported methods

zelig · 2018-07-16T09:44:35Z

cmd/swarm/clientaccountmanager.go

identity management and crypto should be solved properly for both MRU signing and Access Control.
we need to use clef or a javascript directy in the browser for non-cli.

for cli we also need to share code @justelad will comment

This code allows the user to access keys stored in the Ethereum keystore for signing MRUs, for his/her convenience.

Otherwise, the user would have to provide a key in a different way. This was not meant to be a final implementation, but rather a way to quickly give access to the feature without much work for the user.

If this is an issue, I can remove clientaccountmanager.go altogether and ask the user to point the cli to a private key on disk, or environment variable, etc, e.g.

swarm resource create 300 --name "myresource" --key "/home/jm/mykey.txt"

but rather a way to quickly give access to the feature without much work for the user

Yes, it should be quick and dirty. Having to separately give access to (something that gives access to) the key is too complicated in my opinion. At least if it's the only option one is given.

Alternatively, and part of the current implementation, the user can point --bzzaccount to a private key file path (encrypted or unencrypted), instead of specifying an address.
If the private key is encrypted, the user can also use --password, otherwise they'll get a prompt.
Thus, the following schemes are possible:

swarm --bzzaccount="/home/jm/key.txt" resource create 300 --name "myresource"

swarm --bzzaccount="/home/jm/key.txt" --password "xyz1234" resource create 300 --name "myresource"
and the regular one that looks for the key in the keystore:

swarm --bzzaccount="0x77b68bf07bdf9986084affaca2eace59700bf644" --password "xyz1234" resource create 300 --name

This is not part of this PR, it is already the way it works for starting a node.

@jpeletier i didn't quite get why you need to shadow the ethereum code for managing keystores? can't you use the built-in ethereum node keystore bootstrapping (as it's being used in the bzzd action in swarm/cmd)?

i didn't quite get why you need to shadow the ethereum code for managing keystores?

@justelad I tried, the problem is that that account management code is tightly coupled to execute when the node is started, and I don't want to start a node. I also didn't want to touch core Ethereum packages.

Another option is to make node/config.go makeAccountManager() public and call it. But that meant touching a core Ethereum package, and I am not sure if we can do that within Swarm's scope.

btw, we just had to do the same thing (more or less) for the access control tries.

Yes, I see, however you start a node there, and for signing a MRU I shouldn't need to start a node.

@justelad @zelig @nolash . can we land this as is, and mabye make a PR to Ethereum to change node/config.go so we make makeAccountManager() public? Once that PR is in, I can refactor this code to use it instead of our copy.

Thanks

Look closer - we don’t really spin up a node there, we just use the builtin functionality to construct one and get the keystore data from it. For the time being no need to expose anything from the node package IMHO. I’d really be happy to see this file go, as the concept of changing something in the main geth ethereum keystore/account management logic will be very hard to swallow and we can’t count on it happening.

Hi @justelad. I took a closer look and I tried to implement your method. However, I ran into a problem that you will probably also have.

In your function doPKNew, line 166 you have the following code:

} initSwarmNode(bzzconfig, stack, ctx) privateKey := getAccount(bzzconfig.BzzAccount, ctx, stack)

This works, however, initSwarmNode() also calls getAccount() inside. Thus, you're calling getAccount() twice. The issue with this is that if the account is encrypted you get a password prompt twice and that is very annoying.

I distilled the essence of this method into the following function maybe we can both use:

// getPrivKey returns the private key of the specified bzzaccount // Used only by client commands, such as `resource` func getPrivKey(ctx *cli.Context) *ecdsa.PrivateKey { // booting up the swarm node just as we do in bzzd action bzzconfig, err := buildConfig(ctx) if err != nil { utils.Fatalf("unable to configure swarm: %v", err) } cfg := defaultNodeConfig if _, err := os.Stat(bzzconfig.Path); err == nil { cfg.DataDir = bzzconfig.Path } utils.SetNodeConfig(ctx, &cfg) stack, err := node.New(&cfg) if err != nil { utils.Fatalf("can't create node: %v", err) } // initSwarmNode(bzzconfig, stack, ctx) return getAccount(bzzconfig.BzzAccount, ctx, stack) }

Notice that I removed the initSwarmNode() call, compared with your code in doPKNew. Do you need this call to make your access control code work? (I don't, I just need the private key)

I have updated this PR adding this function and now the infamous clientaccountmanager.go is gone!

GitCop · 2018-07-16T21:49:45Z

There were the following issues with your Pull Request

Commit: 8646265604c86da164a894f7d7307f9334eb5679
Your subject line is longer than 72 characters

Guidelines are available at https://github.com/ethersphere/go-ethereum

This message was auto-generated by https://gitcop.com

jpeletier · 2018-07-17T07:52:08Z

@zelig @nolash Any further comments or can we proceed to merge this?
Thanks,
Javier

jpeletier · 2018-07-18T08:13:08Z

@nolash @zelig Rebased, fixed conflicts, gitcop issues and ready to merge.

nolash

Not done, just sending a few comments "incrementally"

nolash · 2018-07-16T19:04:53Z

cmd/swarm/main.go

Shouldn't this be raw not multihash?

I like raw too. But I already change this twice. First someone suggested rawmru and I changed it. Then the problem is that raw has other meanings within swarm, so I changed it to multihash as a flag that defaults to true as per @zelig's request.
Please let's not have this discussion again.

I think a comment has been lost here. Anyway, --multihash=false is a clumsy flag. I find the solution poor, so I am afraid I must insist that we discuss this again.

I can't see raw being a flag in swarm, at least not currently. I don't know what that "someone" (@zelig ? @justelad ? @holisticode ? @gbalint ? @lmars ?) meant. I don't like rawmru. I believe my original implementation was that multihash is the secondary data type to mru.

If raw is indeed a problem, I would prefer to have --multihash flag explicitly telling the cli that the data is multihash, and raw if not.

@jpeletier so your working assumption is that multihash is the default mode of operation?
why not change this into: --data, and once you use --data you can input verbatim data just as you would with the old MRUs, and if no --data is specified - data is treated as a multihash.

@justelad that's too ambiguous for my taste. One could get the impression that multihash isn't "data"

I don't think we're getting more opinions on this. I've asked twice now. Let me cut through and encourage that data is raw by default and --multihash is a bool switch that defines multihash data. That should be deleting 1-2 "!"'s no? Ok?

Ok, as a final change 🙃 I am reverting that flag to the --multihash behavior described by @nolash.

I was tempted to add a hidden flag --nolash and another one --justelad that would then activate one behavior or the other. But then the discussion would be which one is the default and then we'd need the --zelig flag.

Anyone in disagreement can create a PR and I will be the reviewer then. Muhahahahaha 😈

nolash · 2018-07-16T19:11:48Z

cmd/swarm/main.go

didn't we say that we should be able to supply any kind of data here, not only 0x prefixed stuff?

You can supply any kind of data, providing you hex-encode it first. In a command line, you don't want problems with character escaping and hex-encoding is very convenient. Another option would be to support base64 as well.

This is not meant to be final. The idea is that "serious" data comes in from stdin, but that'd be a new PR.

@jpeletier didn't we already discuss the possibility of echo arbitrary.bin | swarm resource update? I really would like that to be possible without requiring hex filter. Or is that what you mean by "serious" data? 🏋️‍♂️

Yes, we discussed the possibility of doing this echo arbitrary.bin | swarm resource update. I agree this would be very useful. However, I need to leave it for another PR, and would not be incompatble with having --data "0x ..." in the command line.

nolash · 2018-07-16T19:34:05Z

cmd/swarm/mru.go

I can't see that the --raw flag is currently in use. @zelig do you know what he means here?

--rawmru is not particularly sexy.

Originally I figured data would be raw unless multihash was explicitly set. But maybe that's counter intuitive to how the rest of the cli tools work.

nolash · 2018-07-16T19:50:16Z

cmd/swarm/mru.go

Let's be nice and add a update digest command too (that can be signed independently)? We can also have it in client side, and the caveats of requesting it from the node can be made clear and is at caller's peril.

I agree and I will add it, but this needs to be a new PR as well.

nolash

Still not done.... :)

nolash · 2018-07-18T08:29:33Z

swarm/api/client/client.go

s/send/set/

fixed. Thanks.

nolash · 2018-07-18T08:33:06Z

swarm/api/client/client.go

Should the meta instruction be a HTTP header rather than a subpath?

I don't agree with this. It is more complicated and obscure to use HTTP headers to specify the information you want to retrieve. Also more complicated for browser implementations and cors restrictions that need pre-flight when you specify weird headers.

Additionally, pasting in the browser address bar http://localhost:8500/bzz-resource/12e1beb28d86ed828f9c38f064402e4fac9ca7b56dab9cf59103268a62a2b35f/meta is much esier than having to open up a terminal and type curl -H 'X-SWARM-METADATA' http://localhost:8500/bzz-resource/12e1beb28d86ed828f9c38f064402e4fac9ca7b56dab9cf59103268a62a2b35f or something like that.

When designing an API, usually you specify the resource / identifier for the resource / subresource / identifier for subresource ...
In our case:
/bzz-resource:/<hash>/meta falls perfectly into this, since you want the "meta" subresource of the resource identified by <hash>

So there are two things to this.

I doubt that it's mostly humans that will be performing requests for specific versions and periods. Thus I'm not sure if the argument of what's easier to put in curl or a browser address field is very relevant. (Of course, it's frought with peril to make predictions about the future)

If we would want to allow resolving paths of underlying manifests through the mutable resource hash, then we would have to declare certain prefixes (period, version, meta) invalid. That's no good. I think a more natural comparison of expectations of how this should work would be query strings rather than subpaths (of which the permalink flavors like /period/1/version/2/ tend to be a mod_rewrite hack anyway). We already use query string for for example setting explicit content type on retrieval. So maybe that's what we should do here too?

If we are going to be using paths to retrieve files, etc, I thing that'd be out of scope of /bzz-resource:/ family of endpoints. I understand /bzz-resource:/ as a way to manipulate the existence and lifecycle of resources themselves.

Resolving paths and stuff like that shoud be /bzz:/ or a new root level endpoint.

What I mean is that I understand bzz-resource: as a low level API for resource lifecycle management, not something that supports higher level abstractions such as working with files and paths. That should be a higher layer service and should be on a different endpoint

I think we are mixing discussions. The semantics of bzz-resource: and bzz are different. And those semantics affect whether it is appropriate or not to use HTTP headers depending on the operation.

Headers are to specify how you want the resource delivered, conditions on delivery and to carry the user's context (location, language, session). This affects delivery but does not determine the what.

To specify what you want, you use the URL with its path arrangements and query string possibilities.

Additionally, consider that use of custom headers will cause browsers to issue an extra preflight CORS check HTTP request, and that sucks.

Also note that some proxies may strip custom headers silently and mercilessly. Regarding proxies and intermediate caches, including the browser's: Responses to requests with the same URL but different headers could be cached as the same, so the header would need to be present in request and response, along with a Vary header in the response to indicate that the presence of a header affects the cache key. A mess.

Therefore: Avoid custom headers as much as possible. The easiness of using a straight URL was just a bonus point, not the main one.

Allow me to describe what I understand are the bzz-resource: and bzz: semantics, please let me know if I am not on the right page:

bzz-resource:: Manipulate resources at a low level. Create and update resources. Retrieve straight binary data. Refer to each resource with period+version in the URL, query string or part of the path.
bzz: static site / filesystem abstraction on top of Swarm.

For bzz-resource: I would:

POST /bzz-resource:/: create / update resources

GET /bzz-resource:/{manifest}/meta: obtain resource information

GET /bzz-resource:/{manifest}/period/{period}/version/{version}: obtain raw octet stream of specific period, version. A querystring version of this would also be appropriate.

GET /bzz-resource:/{manifest}/period/{period}/: obtain raw octect stream of latest version in period. A querystring version of this would also be appropriate.

GET /bzz-resource:/{manifest}/: obtain raw content of latest version

Take into account that the /meta call will be needed by any client browser application prior to signing any MRU. Obtaining the resource's status information is indeed a GET. It doesn't make sense I get something completely different in the response just because I added a header saying I want the resource information.
Perhaps what is causing the confusion is the name /meta, maybe it should be renamed to /info.

With these semantics above, use of HTTP headers is not justified.

For bzz::

GET /bzz:/{manifest}/path: Resolve latest version of MRU as pointing to a manifest JSON... static site. That is, as it works today. Nothing new here.

Use HTTP headers to affect delivery of a particular resource. Thus, the same URL/site can be "viewed in the past" by adding, e.g.: X-Swarm-Mru-Time: Wed, Apr 04 2018. Swarm would then translate that to a period, version and retrieve that content, --or-- we could allow the user to specify period, version directly. We would also need to add the header to the response along with a Vary: X-Swarm-Mru-Time so that we don't confuse the browser cache or that of any intermediate proxy.
Use of HTTP headers for this is OK, though you will need a browser extension to set that header for you if you'd like to browse a site in the past. The browser will be issuing preflights if headers are set on XMLHttpRequests.

I agree with @jpeletier , I prefer rest api on bzz-resource.

for bzz I prefer query strings when you go back in history. but headers can be supported as well

Perhaps what is causing the confusion is the name /meta

For my part, no.

Anway, I think what you say about the how and what involving headers makes sense (although I don't think the cache and both sides headers really is such a mess). However, why not use query strings is still a question.

However, why not use query strings is still a question

I am OK with query strings, this was more of a cosmetic taste. Also, be prepared to escape the insidious & in shell commands. ;-)

for bzz I prefer query strings when you go back in history.

My only concern here is that advanced static applications (Angular.JS, etc) may use query strings, so you could potentially affect the appication. This is a corner case, though.

nolash · 2018-07-18T08:41:46Z

swarm/api/http/server.go

So did we explicitly decide on not using HTTP headers for the period and version params? I agree with earlier sentiments by @justelad that it would be more semantically correct.

This is out of scope of this PR, this behavior is untouched by it, but here is my take on this:

For the reasons I specified above, I would not put this as HTTP headers either. It is perfectly reasonable to retrieve a resource with a URL like this:

http://localhost:8500/bzz-resource/12e1beb28d86ed828f9c38f064402e4fac9ca7b56dab9cf59103268a62a2b35f/7/2

Easy, straight to the point, put it in a browser url bar and you are done. Avoid cors issues with HTTP headers and annoying non-standard Ajax implementations if you are accessing from browser javascript. No annoying curl parameters either.

I would, however, refactor it to this:

http://localhost:8500/bzz-resource/12e1beb28d86ed828f9c38f064402e4fac9ca7b56dab9cf59103268a62a2b35f/period/7/version/2

This is much more REST-like.
If GitHub can be any example, look at this URL:
https://github.com/ethersphere/go-ethereum/pull/784/commits/f43fe25bd42d9d6a41b2ed99dc4b28be509f6a27
Notice the URL points to commit f43fe ... of pull request 784. Very easy to read and understand. A URL I can copy and paste in a chat window or email to a friend.

TL;DR; Avoid using HTTP headers. Makes things complicated. I have never seen API definitions use headers as an everyday thing unless you want something strange like change encoding of content and things like that.

What I mean is that I understand bzz-resource: as a low level API for resource lifecycle management, not something that supports higher level abstractions such as working with files and paths. That should be in a different layer and not in bzz-resource:

@nolash land this as is - let's not turn this into a blocker. i wanna see this merged ASAP, we could discuss this later

nolash · 2018-07-18T08:55:12Z

swarm/storage/mru/handler.go

"If it's assumed to be .." ?

Fixed. Thanks,.

nolash · 2018-07-18T08:59:00Z

swarm/storage/mru/handler.go

Maybe we should be careful about using the word synced here. There is no mechanism that keeps this data synced, save for explicit retrieval (or update) calls. So if an update is promiscuously made through a different now, the data claimed to be "synced" in this node will in fact be stale. We should probably call it cache instead.

I really feel we still should have a loop that pulls the current state of selected resources, but that should be a follow-up PR.

I agree. This wording is inherited by this PR, though. I think we should revisit the whole "proactive cache" polling in a new PR.

We have an issue for this already: #569

Do you agree it's a "good first issue" (newbie task) @jpeletier ?

@nolash I am not sure if I'd put this as a "good first issue". There is some refactor involved here. Namely, I don't know if it makes sense to cache anything other than the metadata chunk. What'd be your polling frequency?. Lots of thoughts after weeks dealing with this code. I'll take the discussion to #569.

nolash · 2018-07-18T09:16:06Z

swarm/storage/mru/handler.go

Well, not really; swarm doesn't (currently) disambiguate between network problems and non-existent resources as causes for retrieve failures.

However, if we already have a version in the cache for this resource, maybe we can safely assume that it's network error?

I have edited the comment to specify it could be a network error and as far as the node understands there are no other updates.
Having a previous version in the cache does not guarantee anything, I think. Yes, you have a previous version, but how can you be sure it is the very latest?

nolash · 2018-07-18T09:19:00Z

swarm/storage/mru/handler.go

"Uses the cache to determine the current state of the resource" ?

Added. Also discovered and fixed a bug whereupon the resource cache was altered inadvertently when preparing the parameters for the lookup.

nolash · 2018-07-18T09:21:31Z

swarm/storage/mru/handler.go

This is the only cheap check we can do for sure
without having to lookup update chunks

Should this be a bit more thoroughly explained, and moved to the method comment?

Explained a bit better. Left the comment there, but added an extra explanation to the method comment

nolash · 2018-07-18T09:23:30Z

swarm/storage/mru/handler.go

A thought; should we provide a method, even interfaced through http, that tells us the maximum net size of data an MR accepts?

Leave for a new PR ;-) Maybe this could be more global, like being able to query the swarm node's parameters / network information / node info .... One of those could be MRU max size.

well, but won't the max size depend on the resource "name"? Or is the actual metadata of update chunks constant across resources?

The metadata of update chunks (updateHeader) is of constant length and does not depend on the "name"
The name is part of the root metadata chunk and limited to 255 chars and therefore is stored only once.

The max payload data of a resource update as defined in update.go is:
const maxUpdateDataLength = chunkSize - signatureLength - updateHeaderLength - chunkPrefixLength . This is 3954 bytes.

Gotcha. Nice round number :)

You gotta have inexplicably convoluted constants in your code for people to take you serious.

nolash · 2018-07-18T09:26:22Z

swarm/storage/mru/handler.go

resource index value

This is the cache. We should be specific about the terms.

Again, this is wording change was not my intention to introduce in this PR. I think we should revisit the whole "cache" stuff in a new PR.

yes we should revisit cache

nolash · 2018-07-19T07:25:59Z

swarm/storage/mru/lookup.go

Fixed. thanks.

nolash · 2018-07-19T07:33:01Z

swarm/storage/mru/metadata.go

"or to disambiguate resources with same starttime, frequency, owneraddr"

added. Thanks

nolash · 2018-07-19T07:45:21Z

swarm/storage/mru/request.go

Wouldn't it always be 1 here?

In most cases it will be 1, but you have to take into account what startTime the user put in the metadata parameter.

It could be that the user is creating the first update for a resource whose metadata has a startTime way in the past.

I mean, I may have created a resource yesterday for example with frequency=1h, however I send the first update today. Thus, my first update has a period = 25.

nolash · 2018-07-19T07:51:45Z

swarm/storage/mru/request.go

I still think we should contribute the expectedlength hex decoder to common/hexutils. If you need the name for log, perhaps just wrap it, and we can change the calls with grep in production code?

I am fine either way. Do you mind creating an issue to track this contribution? I would like not to add more work to this PR. It would be better if we leave this as is and when that contribution is accepted, then we refactor our code here in a very small PR.

I did already: #803

nolash · 2018-07-19T07:53:02Z

swarm/storage/mru/resource.go

similarly we have to change this text not to promise sync

fixed. thanks

nolash · 2018-07-19T07:53:48Z

swarm/storage/mru/resource.go

remove please (we should add sync loop instead)

This code is untouched by this PR. I would like that this kind of work is part of the sync loop / cache enhancement PR.

nolash · 2018-07-19T08:04:33Z

swarm/storage/mru/signedupdate.go

Should this check be here? This is called by handler.Validator and handler.updateIndex, and the latter doesn't really need the check, no?

I could remove it and add Verify() to handler.Validate() so that handler.updateIndex() does not have this burden, however this would mean trusting what comes from the disk is not corrupt somehow. What do you think?
In my opinion, better safe than sorry.

this would mean trusting what comes from the disk is not corrupt somehow

I don't think we check any other chunks for corruption when reading from disk. I think if someone broke into your node's dbstore, nothing can be trusted....

nolash · 2018-07-19T08:07:00Z

swarm/swarm.go

remove please

…reum#704

…reate set common time source for mru package

when requesting a resource that does not exist

Cosmetic / comment changes

* swarm/storage/mru: Add embedded publickey and remove ENS dep This commit breaks swarm, swarm/api... but tests in swarm/storage/mru pass * swarm: Refactor swarm, swarm/api to mru changes, make tests pass * swarm/storage/mru: Remove self from recv, remove test ens vldtr * swarm/storage/mru: Remove redundant test, expose ResourceHash mthd * swarm/storage/mru: Make HeaderGetter mandatory + godoc fixes * swarm/storage: Remove validator prefix for metadata chunk * swarm/storage/mru: Use Address instead of PublicKey * swarm/storage/mru: Change index from name to metadata chunk addr * swarm/storage/mru: Refactor swarm/api/... to MRU index changes * swarm/storage/mru: Refactor cleanup * swarm/storage/mru: Rebase cleanup * swarm: Use constructor for GenericSigner MRU in swarm.go * swarm/storage: Change to BMTHash for MRU hashing * swarm/storage: Reduce loglevel on chunk validator logs * swarm/storage/mru: Delint * swarm: MRU Rebase cleanup * swarm/storage/mru: client-side mru signatures Rebase to PR #668 and fix all conflicts * swarm/storage/mru: refactor and documentation * swarm/resource/mru: error-checking tests for parseUpdate/newUpdateChunk * swarm/storage/mru: Added resourcemetadata tests * swarm/storage/mru: Added tests for UpdateRequest * swarm/storage/mru: more test coverage for UpdateRequest and comments * swarm/storage/mru: Avoid fake chunks in parseUpdate() * swarm/storage/mru: Documented resource.go extensively moved some functions where they make most sense * swarm/storage/mru: increase test coverage for UpdateRequest and variable name changes throughout to increase consistency * swarm/storage/mru: moved default timestamp to NewCreateRequest- * swarm/storage/mru: lookup refactor * swarm/storage/mru: added comments and renamed raw flag to rawmru * swarm/storage/mru: fix receiver typo * swarm/storage/mru: refactored update chunk new/create * swarm/storage/mru: refactored signature digest to avoid malleability * swarm/storage/mru: optimize update data serialization * swarm/storage/mru: refactor and cleanup * swarm/storage/mru: add timestamp struct and serialization * swarm/storage/mru: fix lint error and mark some old code for deletion * swarm/storage/mru: remove unnecessary variable * swarm/storage/mru: Added more comments throughout * swarm/storage/mru: Refactored metadata chunk layout + extensive error checking * swarm/storage/mru: refactor cli parser Changed resource info output to JSON * swarm/storage/mru: refactor serialization for extensibility refactored error messages to NewErrorf * swarm/storage/mru: Moved Signature to resource_sign. Check Sign errors in server tests * swarm/storage/mru: Remove isSafeName() checks * swarm/storage/mru: scrubbed off all references to "block" for time * swarm/storage/mru: removed superfluous isSynced() call. * swarm/storage/mru: remove isMultihash() and ToSafeName functions * swarm/storage/mru: various fixes and comments * swarm/storage/mru: decoupled cli for independent create/update * Made resource name optional * Removed unused LookupPrevious * swarm/storage/mru: Decoupled resource create / update & refactor * swarm/storage/mru: Fixed some comments as per issues raised in PR #743 * swarm/storage/mru: Cosmetic changes as per #743 comments * swarm/storage/mru: refct request encoder/decoder > marshal/unmarshal * swarm/storage/mru: Cosmetic changes as per review in #748 * swarm/storage/mru: removed timestamp proof placeholder * swarm/storage/mru: cosmetic/doc/fixes changes as per comments in #704 * swarm/storage/mru: removed unnecessary check in Handler.update * swarm/storage/mru: Implemented Marshaler/Unmarshaler iface in Request * swarm/storage/mru: Fixed linter error * swarm/storage/mru: removed redundant address in signature digest * swarm/storage/mru: fixed bug: LookupLatestVersionInPeriod not working * swarm/storage/mru: Unfold Request creation API for create or update+create set common time source for mru package * swarm/api/http: fix HandleGetResource error variable shadowed when requesting a resource that does not exist * swarm/storage/mru: Add simple check to detect duplicate updates * swarm/storage/mru: moved Multihash() to the right place. * cmd/swarm: remove unneeded clientaccountmanager.go * swarm/storage/mru: Changed some comments as per reviews in #784 * swarm/storage/mru: Made SignedResourceUpdate.GetDigest() public * swarm/storage/mru: cosmetic changes as per comments in #784 * cmd/swarm: Inverted --multihash flag default * swarm/storage/mru: removed Verify from SignedResourceUpdate.fromChunk * swarm/storage/mru: Moved validation code out of serializer Cosmetic / comment changes * swarm/storage/mru: Added unit tests for UpdateLookup * swarm/storage/mru: Increased coverage of metadata serialization * swarm/storage/mru: Increased test coverage of updateHeader serializers * swarm/storage/mru: Add resourceUpdate serializer test

* swarm/storage/mru: Add embedded publickey and remove ENS dep This commit breaks swarm, swarm/api... but tests in swarm/storage/mru pass * swarm: Refactor swarm, swarm/api to mru changes, make tests pass * swarm/storage/mru: Remove self from recv, remove test ens vldtr * swarm/storage/mru: Remove redundant test, expose ResourceHash mthd * swarm/storage/mru: Make HeaderGetter mandatory + godoc fixes * swarm/storage: Remove validator prefix for metadata chunk * swarm/storage/mru: Use Address instead of PublicKey * swarm/storage/mru: Change index from name to metadata chunk addr * swarm/storage/mru: Refactor swarm/api/... to MRU index changes * swarm/storage/mru: Refactor cleanup * swarm/storage/mru: Rebase cleanup * swarm: Use constructor for GenericSigner MRU in swarm.go * swarm/storage: Change to BMTHash for MRU hashing * swarm/storage: Reduce loglevel on chunk validator logs * swarm/storage/mru: Delint * swarm: MRU Rebase cleanup * swarm/storage/mru: client-side mru signatures Rebase to PR #668 and fix all conflicts * swarm/storage/mru: refactor and documentation * swarm/resource/mru: error-checking tests for parseUpdate/newUpdateChunk * swarm/storage/mru: Added resourcemetadata tests * swarm/storage/mru: Added tests for UpdateRequest * swarm/storage/mru: more test coverage for UpdateRequest and comments * swarm/storage/mru: Avoid fake chunks in parseUpdate() * swarm/storage/mru: Documented resource.go extensively moved some functions where they make most sense * swarm/storage/mru: increase test coverage for UpdateRequest and variable name changes throughout to increase consistency * swarm/storage/mru: moved default timestamp to NewCreateRequest- * swarm/storage/mru: lookup refactor * swarm/storage/mru: added comments and renamed raw flag to rawmru * swarm/storage/mru: fix receiver typo * swarm/storage/mru: refactored update chunk new/create * swarm/storage/mru: refactored signature digest to avoid malleability * swarm/storage/mru: optimize update data serialization * swarm/storage/mru: refactor and cleanup * swarm/storage/mru: add timestamp struct and serialization * swarm/storage/mru: fix lint error and mark some old code for deletion * swarm/storage/mru: remove unnecessary variable * swarm/storage/mru: Added more comments throughout * swarm/storage/mru: Refactored metadata chunk layout + extensive error... * swarm/storage/mru: refactor cli parser Changed resource info output to JSON * swarm/storage/mru: refactor serialization for extensibility refactored error messages to NewErrorf * swarm/storage/mru: Moved Signature to resource_sign. Check Sign errors in server tests * swarm/storage/mru: Remove isSafeName() checks * swarm/storage/mru: scrubbed off all references to "block" for time * swarm/storage/mru: removed superfluous isSynced() call. * swarm/storage/mru: remove isMultihash() and ToSafeName functions * swarm/storage/mru: various fixes and comments * swarm/storage/mru: decoupled cli for independent create/update * Made resource name optional * Removed unused LookupPrevious * swarm/storage/mru: Decoupled resource create / update & refactor * swarm/storage/mru: Fixed some comments as per issues raised in PR #743 * swarm/storage/mru: Cosmetic changes as per #743 comments * swarm/storage/mru: refct request encoder/decoder > marshal/unmarshal * swarm/storage/mru: Cosmetic changes as per review in #748 * swarm/storage/mru: removed timestamp proof placeholder * swarm/storage/mru: cosmetic/doc/fixes changes as per comments in #704 * swarm/storage/mru: removed unnecessary check in Handler.update * swarm/storage/mru: Implemented Marshaler/Unmarshaler iface in Request * swarm/storage/mru: Fixed linter error * swarm/storage/mru: removed redundant address in signature digest * swarm/storage/mru: fixed bug: LookupLatestVersionInPeriod not working * swarm/storage/mru: Unfold Request creation API for create or update+create set common time source for mru package * swarm/api/http: fix HandleGetResource error variable shadowed when requesting a resource that does not exist * swarm/storage/mru: Add simple check to detect duplicate updates * swarm/storage/mru: moved Multihash() to the right place. * cmd/swarm: remove unneeded clientaccountmanager.go * swarm/storage/mru: Changed some comments as per reviews in #784 * swarm/storage/mru: Made SignedResourceUpdate.GetDigest() public * swarm/storage/mru: cosmetic changes as per comments in #784 * cmd/swarm: Inverted --multihash flag default * swarm/storage/mru: removed Verify from SignedResourceUpdate.fromChunk * swarm/storage/mru: Moved validation code out of serializer Cosmetic / comment changes * swarm/storage/mru: Added unit tests for UpdateLookup * swarm/storage/mru: Increased coverage of metadata serialization * swarm/storage/mru: Increased test coverage of updateHeader serializers * swarm/storage/mru: Add resourceUpdate serializer test

jpeletier requested review from lmars and nolash as code owners July 11, 2018 16:30

nolash added enhancement ready for review resource update labels Jul 11, 2018

nolash assigned jpeletier Jul 11, 2018

nolash requested review from zelig and removed request for lmars July 11, 2018 21:04

jpeletier changed the title ~~Client side mru~~ Client-side MRU signatures Jul 12, 2018

nolash reviewed Jul 12, 2018

View reviewed changes

nonsense force-pushed the develop branch from 20787f9 to fad379f Compare July 12, 2018 09:44