This repository was archived by the owner on Mar 14, 2019. It is now read-only.

Conversation

@rhyslbw
Contributor

@rhyslbw rhyslbw commented Apr 30, 2015

WIP, submitting now for review

The worker package now observes a Mongo-backed queue for jobs, which is handled by a new package, job-manager. The collection observers have been split out into a separate package so they can be limited to a single instance when scaling. standard-packages has also been updated to include job-manager and collection-observers.

Implementation Issues

  • Event listeners on the FS.Collection within JobManager are firing for all registered collections in the app
  • When a remove is observed on a FS.Collection, I'm stringifying the document so an FS.File instance can be passed into FS.TempStore.removeFile and FS.StorageAdapter.prototype.remove. Are we happy to have this much data being passed around in a job?
  • We need a sensible default and configuration capability to deal with managing the JobCollection's completed jobs
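For orientation, a minimal sketch of the flow described above, assuming vsivsi:job-collection as the Mongo-backed queue (the payload shape and the wiring inside job-manager are illustrative; only the job-collection calls themselves are standard API):

// Enqueue side (collection-observers / job-manager): when an observer sees a stored file,
// push a job describing the work (payload shape is illustrative).
var jc = new JobCollection('cfsJobs');
function enqueueSaveCopy(fileObj, storeName) {
  new Job(jc, 'saveCopy', { fileId: fileObj._id, collectionName: fileObj.collectionName, storeName: storeName }).save();
}

// Worker side (cfs:worker): any instance can process jobs of that type from the shared queue.
jc.processJobs('saveCopy', function (job, callback) {
  // load the FS.File from job.data, save the copy to the named store, then:
  job.done();
  callback();
});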

@rhyslbw rhyslbw changed the title Initial commit of changes to support multi-instance systems Mongo backed queue Apr 30, 2015
@bitomule

This PR would fix the issue with multiple servers trying to upload the same file if they share a db?

@rhyslbw
Contributor Author

rhyslbw commented Apr 30, 2015

@bitomule I believe the end result will, but this is very much a WIP. This issue right? #635

@bitomule

@rhyslbw here's my issue #635

How can I help test your PR?

@rhyslbw
Contributor Author

rhyslbw commented Apr 30, 2015

The current design expects the collection-observers package to be running in a single instance, to ensure only one app has the responsibility of triggering the collection to emit two key events. This means standard-packages cannot be used where you will have multiple instances, or where it is installed in multiple different applications using the same db. In your case, a way to handle this without moving your cfs implementation into a custom package and then creating a small app to handle the observer is to maintain two versions of your existing app, one with standard-packages and the other with the following:

  • cfs:base-package
  • cfs:file
  • cfs:collection
  • cfs:collection-filters
  • cfs:access-point
  • cfs:job-manager
  • cfs:worker
  • cfs:upload-http

I'm thinking, though, that the better design here will be for the job-manager to expect to receive the same event multiple times, and to drop any duplicates by checking whether a job already exists in the queue, as that would reduce the complexity and tighten things up.
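A rough sketch of that idempotent-enqueue idea, assuming vsivsi:job-collection underneath (the helper name and data shape are hypothetical):

// Only enqueue if an equivalent job isn't already waiting, ready or running.
function enqueueOnce(jc, type, data) {
  var duplicate = jc.findOne({
    type: type,
    'data.fileId': data.fileId,
    'data.storeName': data.storeName,
    status: { $in: ['waiting', 'ready', 'running'] }
  });
  if (!duplicate) {
    new Job(jc, type, data).save();
  }
}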

@bitomule

@rhyslbw so, right now I can make those changes and test it?

One app will have standard-packages and the S3 packages, while the other one will have the packages you've listed + S3. That would allow me to upload from both apps, no?

Great work, btw.

@rhyslbw
Contributor Author

rhyslbw commented Apr 30, 2015

@bitomule Yes, that's right, but be aware the worker function isn't handling errors yet to mark the job as failed; I'll look at that within the next couple of days. Actually, in your case you'll want to add cfs:worker to the secondary app too (list updated).

@bitomule

Tested and error is still there. Uploaded in app B and app A crashes. :(

@rhyslbw
Contributor Author

rhyslbw commented Apr 30, 2015

Is the crash happening at the same point, @bitomule? I can't see any collection observers that would be causing this, which should be the only way for the second app to be triggered. Since my use case has been inserting images on the server, I've not put any time into the upload scenario at this point, but I'll do some tests. In the meantime, if you get a better understanding of the problem, let me know.

@rhyslbw
Contributor Author

rhyslbw commented May 1, 2015

@raix I'm stuck with a problem relating to events that I could use your help with.

The event listeners registered by the FS.Collection by calling into FS.JobManager, such as this one, are being triggered for all collections registered in the app, which results in duplicate jobs being queued. Does anything stand out? The events are being emitted correctly.
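For what it's worth, one possible guard (purely a sketch; the event name and enqueue call are placeholders for whatever the branch actually uses) would be to have each listener ignore events emitted for other collections:

// Hypothetical listener inside FS.JobManager: only react to the collection it was registered for.
collection.on('stored', function (fileObj, storeName) {
  if (fileObj.collectionName !== collection.name) return; // drop events from other FS.Collections
  enqueueSaveCopy(fileObj, storeName); // placeholder for the real job-manager call
});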

@bitomule

bitomule commented May 1, 2015

Hi @rhyslbw, this is the error I get:

stream.js:94
throw er; // Unhandled stream error in pipe.
Error: ENOENT, open '/home/action/workspace/myapp/.meteor/local/cfs/files/_tempstore/files-D7801A75-A784-4E05-A32E-BE17D3D6A7E3-0.chunk'

I got this one on the server with standard-packages installed, uploading from the server without it.

@bitomule

bitomule commented May 2, 2015

@rhyslbw do you think it would be possible to fix the multiple instances issue soon, or would it need big changes? I want to help, but CollectionFS is a huge project.

@rhyslbw
Contributor Author

rhyslbw commented May 3, 2015

@bitomule Also outside the scope of this development right now is the use of the Filesystem SA, since it presents its own set of challenges when running the app on multiple servers. If you were using GridFS for the tempstore then this error should go away. I'd suggest looking at a cloud storage solution, S3 being the obvious choice.

…ctions, and formalisation of FS.CollectionObservers

- Observers are now FS.CollectionObservers, registered within the collection if installed.
- Renamed JobManager method from ‘listen’ to ‘register’.
- removeStoredData was not previously being handled, resulting in stored data being left intact after FS.Collection doc removal.
- Workaround implemented to deal with the known issue of event listeners in JobManager firing for each FS.Collection in the app.
@raix

raix commented May 4, 2015

@rhyslbw is it because multiple instances each will add a task to the manager?

@rhyslbw
Contributor Author

rhyslbw commented May 4, 2015

@raix no, it's the same instance. The single event is being heard by the on method of all collections. I've got a small dev app I will push up later today so you can run it locally. It's got two FS.Collections, each with two stores, and on startup I insert an image from the server into the first collection. A single event is fired for each of the stages, but the listeners in FS.JobManager fire twice, causing duplicate jobs. I get the expected outcome when commenting out the second FS.Collection instantiation.
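For anyone wanting to reproduce before the dev app is up, the shape is roughly this (collection and store names are illustrative, and it assumes a sample.png in private/):

// Two FS.Collections, each with two stores; a server-side insert on startup.
var Images = new FS.Collection('images', {
  stores: [new FS.Store.GridFS('imagesOriginal'), new FS.Store.GridFS('imagesThumb')]
});
var Documents = new FS.Collection('documents', {
  stores: [new FS.Store.GridFS('documentsOriginal'), new FS.Store.GridFS('documentsThumb')]
});

Meteor.startup(function () {
  var file = new FS.File();
  file.attachData(Assets.getBinary('sample.png'), { type: 'image/png' }, function (error) {
    if (!error) Images.insert(file); // one event per stage fires, but both collections' listeners react
  });
});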

…r-side beginStorage to fsFile prototype

Also updated jobCollection workers to better track success of job
@rhyslbw
Contributor Author

rhyslbw commented May 6, 2015

I've just pushed another commit to the branch, so please update and review. I needed to implement the TempStore transfer queue as a way to govern the process of inserting a batch of files on the server, as previously it would cause the write streams to all start simultaneously. I used PowerQueue since round-tripping to the db was unnecessary. On the client the HTTP Upload Queue provides the control to govern the process, but I can't see how or where the completion of the upload triggers the transfer to the TempStore. I can see the fileObj emits an uploaded event here, but there doesn't appear to be a .on method to hear it. Can anyone shed some light on this?
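The transfer queue usage is roughly of this shape (a sketch with illustrative option values and a hypothetical transfer helper, not the exact code in the branch):

var filesToInsert = [/* FS.File instances to move into the TempStore */];
var transferQueue = new PowerQueue({ maxProcessing: 1 }); // one write stream at a time

filesToInsert.forEach(function (fileObj) {
  transferQueue.add(function (done) {
    transferToTempStore(fileObj, function (error) { // hypothetical helper piping data into FS.TempStore
      done();
    });
  });
});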

@rhyslbw
Contributor Author

rhyslbw commented Jun 3, 2015

I've just pushed a version that consolidates the worker tasks into a single queue, to enable use of the Job-Collection priority feature. The saveCopy jobs are ranked the highest priority since the others are cleanup operations. The queue is now also running in JobManager, with the worker registering like this:
FS.JobManager && FS.JobManager.registerJob('saveCopy', saveCopy);

I've also put together a small demo app that outputs information from the JobCollection based on a server side insert, and then removal.

https://github.com/rhyslbw/cfs-job-manager-demo
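With job-collection, the priority ranking amounts to something like this (a sketch; the cleanup job type and data payloads are illustrative, and only registerJob is the actual worker-side call from the branch):

var jc = new JobCollection('cfsJobs'); // the shared Mongo-backed queue
// saveCopy jobs outrank the cleanup jobs (job type names other than saveCopy are illustrative).
new Job(jc, 'saveCopy', { fileId: fileObj._id, storeName: 'thumbs' }).priority('high').save();
new Job(jc, 'removeTempFile', { fileId: fileObj._id }).priority('low').save();

// The worker package registers its handlers against the same queue, as shown above:
FS.JobManager.registerJob('saveCopy', saveCopy);
FS.JobManager.registerJob('removeTempFile', removeTempFile);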

@raix

raix commented Jun 3, 2015

@rhyslbw Great work! Will try out the demo this weekend.

@mxab

mxab commented Jun 4, 2015

Hi @rhyslbw,
Nice work, I'm really looking forward to the release.
I tried the packages from your demo setup on my project by linking all the packages in my packages folder; I guess this should work?

I have two cfs collections that contain about 10k images in total. The current problem, that the observers' allDone query "added" callback gets triggered for every stored file, now becomes more problematic as it generates a lot of remove-temp-file jobs every time you restart/redeploy the app.

Maybe an idea would be that the cfs collection entry gets flagged after it has been processed by the job, like this:

{ ...
  "cleanedUp": true
... }

and instead of making the allDone query make an allDoneButNotCleanedUp query.
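In Mongo terms the suggestion is roughly this (a sketch; allDoneSelector stands in for whatever conditions the observer's allDone query already uses, and the helper names are hypothetical):

var allDoneSelector = { /* the observer's existing allDone conditions */ };
var selector = _.extend({ cleanedUp: { $ne: true } }, allDoneSelector);

Images.files.find(selector).observe({
  added: function (doc) {
    enqueueRemoveTempFileJob(doc); // whatever the observer queues today
  }
});

// In the job's completion handler, flag the document so a restart doesn't re-queue it:
function markCleanedUp(doc) {
  Images.files.update(doc._id, { $set: { cleanedUp: true } });
}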

@mxab

mxab commented Jun 23, 2015

Any news on when this will get merged?

@rhyslbw
Contributor Author

rhyslbw commented Jun 23, 2015

It's still early days on this. The design hasn't been reviewed yet, so I'd still treat this as a review PR that's a WIP.

@mxab

mxab commented Jun 23, 2015

anything I can do?

@rhyslbw
Contributor Author

rhyslbw commented Jun 23, 2015

We really need input from @CollectionFS/owners

@rhyslbw
Contributor Author

rhyslbw commented Jun 26, 2015

@mxab I've pushed a critical bug fix that you will want to get into your system.

@mxab

mxab commented Jul 13, 2015

Hi @rhyslbw, in my current setup I have two instances of my app running on different servers. Will this work if I have a filesystem as my tempstore, or might this run into an error if one machine saves it in its tempstore and the other one triggers the processing?

@rhyslbw
Contributor Author

rhyslbw commented Jul 14, 2015

@mxab You'll see errors, as the saveCopy jobs going into the queue are not restricted to the process that created them, and the other processes won't have access to the store. It's possible for this requirement to be included, so that the worker only observes jobs created in the current process, but this is extra complexity and not ideal for scaling. S3 as the TempStore is the best option if GridFS isn't feasible.

@mxab

mxab commented Jul 14, 2015

@rhyslbw it depends on how you see scaling; we have a geo-DNS routing approach, so the GridFS/S3 approach wouldn't scale that much.
A worst-case scenario:

  • A user inserts a file on an app node in Asia
  • The node stores it in the temp store in Europe (S3 or GridFS)
  • The same node reads it back and starts the processing (repeat this step for every store)

It would be nice if we could somehow define a policy for that: one setup where each node has its own FileSystem tempstore and is the only one that can be scheduled for the processing job; another setup might be that you have several nodes and only the one close to the S3/GridFS tempstore is allowed to process it.

But I agree this introduces extra complexity so we might schedule this after the PR is merged :)

@mxab

mxab commented Jul 14, 2015

@rhyslbw I extracted your tempstore fix into a pull request, as I'm not sure when this branch will be merged: #746

@rhyslbw
Contributor Author

rhyslbw commented Jul 14, 2015

@mxab Yes, it's going to be quite variable based on the use case. I actually see the future of TempStore being cloud storage adaptors uploading directly from the client (like edgee:slingshot), and then any server-side queue tasks running in a separate process, or on the current Meteor server for simple setups. If FileSystem is set for the TempStore, then it should be understood that the jobs created can only be completed by the initiating process.

@mxab

mxab commented Jul 16, 2015

Do you think, for now, a way to work around the FileSystem tempstore problem would be to restrict the queue at
https://github.com/rhyslbw/Meteor-CollectionFS/blob/feature/jc-queue/packages/job-manager/server.js#L123

with

if(job.type == registeredJobWorker.type && job.sourceId === FS.JobManager.Config.nodeId)

Of course this only makes sense in my use case, but a more general approach would be to have a callback that checks whether this processor should process it:

if(job.type == registeredJobWorker.type && FS.JobManager.canProcess(job, registeredJobWorker.type))

and somewhere in the application something like:

FS.JobManager.canProcess = function (job, workerType) {
  return calculateMyServerLoad() < 0.3;
};

@rhyslbw
Contributor Author

rhyslbw commented Jul 16, 2015

My thought would be to modify the observer query so the queue doesn't poll the job collection unnecessarily

@mxab

mxab commented Jul 16, 2015

Even better, I just hadn't seen this.

@mxab

mxab commented Jul 16, 2015

@mxab

mxab commented Jul 17, 2015

OK, it looks like it's working. I added this somewhere in the application code:

FS.JobManager.jobQueryRestriction = {
    'data.sourceId' : FS.JobManager.Config.nodeId
};

I think the only important thing when doing that is to have a fixed node id:

FS.JobManager.Config.nodeId = process.env.NODE_ID;

otherwise I guess jobs won't be picked up after a restart, as the node id is otherwise generated randomly.

Any optimisation suggestions, @rhyslbw?

@rhyslbw
Contributor Author

rhyslbw commented Jul 17, 2015

@mxab That looks good, but we should probably put the jobQueryRestriction in the Config object so we can define all options in a single object in the application.
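i.e. something along these lines (a sketch; the key names follow this thread):

// Single configuration object defined by the application:
FS.JobManager.Config = {
  nodeId: process.env.NODE_ID,
  jobQueryRestriction: { 'data.sourceId': process.env.NODE_ID }
};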

@mxab

mxab commented Jul 17, 2015

I moved it to the config. Should I make a PR for this?

@rhyslbw
Contributor Author

rhyslbw commented Jul 18, 2015

Yep, I'll accept that into my branch. Nice work.

@brianlukoff

@rhyslbw Are you using this branch in production now?

@rhyslbw
Contributor Author

rhyslbw commented Aug 27, 2015

Hey @brianlukoff

I'm not actually using cfs anywhere right now, as I decided to implement a custom solution to better suit my specialised needs. I believe @mxab and @bitomule are, though.

@bitomule

@brianlukoff We're using it at mmmelon.com for the web and an iOS app without issues so far. The only issue we had is with traffic and DigitalOcean, as we're using Amazon S3 for temp files. We're moving to EC2 soon to avoid that traffic.

This was referenced Oct 16, 2015
@raix raix merged commit 9752a27 into Meteor-Community-Packages:devel Dec 7, 2016
@luixal

luixal commented Dec 9, 2016

@raix is that merge solving the multiple instances problem?

I understand the problem, but I've got lost with so many forks, issues, and branches about this... is there anything in the docs?

Thanks!

@raix

raix commented Dec 9, 2016

@luixal I have to investigate
