This repository was archived by the owner on Mar 14, 2019. It is now read-only.

Conversation

@rhyslbw
Contributor

@rhyslbw rhyslbw commented Apr 30, 2015

WIP, submitting now for review

The worker package now observes a Mongo-backed queue for jobs, which is handled by a new package, job-manager. The collection observers have been split out into a separate package so they can be limited to a single instance when scaling. standard-packages has also been updated to include job-manager and collection-observers.

Implementation Issues

  • Event listeners on the FS.Collection within JobManager are firing for all registered collections in the app
  • When a remove is observed on a FS.Collection, I'm stringifying the document so an FS.File instance can be passed into FS.TempStore.removeFile and FS.StorageAdapter.prototype.remove. Are we happy to have this much data being passed around in a job?
  • We need a sensible default and configuration capability to deal with managing the JobCollection's completed jobs
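For orientation, a minimal sketch of the flow described above, assuming vsivsi:job-collection as the Mongo-backed queue (the payload shape and the wiring inside job-manager are illustrative; only the job-collection calls themselves are standard API):

// Enqueue side (collection-observers / job-manager): when an observer sees a stored file,
// push a job describing the work (payload shape is illustrative).
var jc = new JobCollection('cfsJobs');
function enqueueSaveCopy(fileObj, storeName) {
  new Job(jc, 'saveCopy', { fileId: fileObj._id, collectionName: fileObj.collectionName, storeName: storeName }).save();
}

// Worker side (cfs:worker): any instance can process jobs of that type from the shared queue.
jc.processJobs('saveCopy', function (job, callback) {
  // load the FS.File from job.data, save the copy to the named store, then:
  job.done();
  callback();
});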

@rhyslbw rhyslbw changed the title Initial commit of changes to support multi-instance systems Mongo backed queue Apr 30, 2015
@bitomule

This PR would fix the issue with multiple servers trying to upload the same file if they share a db?

@rhyslbw
Contributor Author

rhyslbw commented Apr 30, 2015

@bitomule I believe the end result will, but this is very much a WIP. This issue right? #635

@bitomule

@rhyslbw here's my issue #635

How can I help test your PR?

@rhyslbw
Contributor Author

rhyslbw commented Apr 30, 2015

The current design expects the collection-observers package to be running in a single instance, to ensure only one app has the responsibility of triggering the collection to emit two key events. This means standard-packages cannot be used where you will have multiple instances, or where it is installed in multiple different applications using the same db. In your case, a way to handle this without moving your cfs implementation into a custom package and then creating a small app to handle the observer is to maintain two versions of your existing app, one with standard-packages and the other with the following:

  • cfs:base-package
  • cfs:file
  • cfs:collection
  • cfs:collection-filters
  • cfs:access-point
  • cfs:job-manager
  • cfs:worker
  • cfs:upload-http

I'm thinking, though, that the better design here will be for the job-manager to expect to receive the same event multiple times, and to drop any duplicates by checking whether a job already exists in the queue, as that would reduce the complexity and tighten things up.
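A rough sketch of that idempotent-enqueue idea, assuming vsivsi:job-collection underneath (the helper name and data shape are hypothetical):

// Only enqueue if an equivalent job isn't already waiting, ready or running.
function enqueueOnce(jc, type, data) {
  var duplicate = jc.findOne({
    type: type,
    'data.fileId': data.fileId,
    'data.storeName': data.storeName,
    status: { $in: ['waiting', 'ready', 'running'] }
  });
  if (!duplicate) {
    new Job(jc, type, data).save();
  }
}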

@bitomule

@rhyslbw so, right now I can make those changes and test it?

One app will have standard-packages and the S3 packages, while the other one will have the packages you've listed + S3. That would allow me to upload from both apps, no?

Great work, btw.

@rhyslbw
Contributor Author

rhyslbw commented Apr 30, 2015

@bitomule Yes, that's right, but be aware the worker function isn't handling errors yet to mark the job as failed; I'll look at that within the next couple of days. Actually, in your case you'll want to add cfs:worker to the secondary app too (list updated).

@bitomule

Tested and error is still there. Uploaded in app B and app A crashes. :(

@rhyslbw
Contributor Author

rhyslbw commented Apr 30, 2015

Is the crash happening at the same point, @bitomule? I can't see any collection observers that would be causing this, which should be the only way for the second app to be triggered. Since my use case has been inserting images on the server, I've not put any time into the upload scenario at this point, but I'll do some tests. In the meantime, if you get a better understanding of the problem, let me know.

@rhyslbw
Contributor Author

rhyslbw commented May 1, 2015

@raix I'm stuck with a problem relating to events that I could use your help with.

The event listeners registered by the FS.Collection by calling into FS.JobManager, such as this one, are being triggered for all collections registered in the app, which results in duplicate jobs being queued. Does anything stand out? The events are being emitted correctly.
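For what it's worth, one possible guard (purely a sketch; the event name and enqueue call are placeholders for whatever the branch actually uses) would be to have each listener ignore events emitted for other collections:

// Hypothetical listener inside FS.JobManager: only react to the collection it was registered for.
collection.on('stored', function (fileObj, storeName) {
  if (fileObj.collectionName !== collection.name) return; // drop events from other FS.Collections
  enqueueSaveCopy(fileObj, storeName); // placeholder for the real job-manager call
});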

@bitomule

bitomule commented May 1, 2015

Hi @rhyslbw, this is the error I get:

stream.js:94
throw er; // Unhandled stream error in pipe.
Error: ENOENT, open '/home/action/workspace/myapp/.meteor/local/cfs/files/_tempstore/files-D7801A75-A784-4E05-A32E-BE17D3D6A7E3-0.chunk'

I got this one on the server with standard-packages installed, uploading from the server without it.

@bitomule

bitomule commented May 2, 2015

@rhyslbw do you think it would be possible to fix the multiple instances issue soon, or would it need big changes? I want to help, but CollectionFS is a huge project.

@rhyslbw
Contributor Author

rhyslbw commented May 3, 2015

@bitomule Also outside the scope of this development right now is the use of the Filesystem SA, since it presents its own set of challenges when running the app on multiple servers. If you were using GridFS for the tempstore then this error should go away. I'd suggest looking at a cloud storage solution, S3 being the obvious choice.

…ctions, and formalisation of FS.CollectionObservers

- Observers are now FS.CollectionObservers, registered within the collection if installed.
- Renamed JobManager method from ‘listen’ to ‘register’.
- removeStoredData was not previously being handled, resulting in stored data being left intact after FS.Collection doc removal.
- Workaround implemented to deal with the known issue of event listeners in JobManager firing for each FS.Collection in the app.
@raix

raix commented May 4, 2015

@rhyslbw is it because multiple instances each will add a task to the manager?

@rhyslbw
Contributor Author

rhyslbw commented May 4, 2015

@raix no, it's the same instance. The single event is being heard by the on method of all collections. I've got a small dev app I will push up later today so you can run it locally. It's got two FS.Collections, each with two stores, and on startup I insert an image from the server into the first collection. A single event is fired for each of the stages, but the listeners in FS.JobManager fire twice, causing duplicate jobs. I get the expected outcome when commenting out the second FS.Collection instantiation.
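For anyone wanting to reproduce before the dev app is up, the shape is roughly this (collection and store names are illustrative, and it assumes a sample.png in private/):

// Two FS.Collections, each with two stores; a server-side insert on startup.
var Images = new FS.Collection('images', {
  stores: [new FS.Store.GridFS('imagesOriginal'), new FS.Store.GridFS('imagesThumb')]
});
var Documents = new FS.Collection('documents', {
  stores: [new FS.Store.GridFS('documentsOriginal'), new FS.Store.GridFS('documentsThumb')]
});

Meteor.startup(function () {
  var file = new FS.File();
  file.attachData(Assets.getBinary('sample.png'), { type: 'image/png' }, function (error) {
    if (!error) Images.insert(file); // one event per stage fires, but both collections' listeners react
  });
});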

…r-side beginStorage to fsFile prototype

Also updated jobCollection workers to better track success of job
@rhyslbw
Contributor Author

rhyslbw commented May 6, 2015

I've just pushed another commit to the branch, so please update and review. I needed to implement the TempStore transfer queue as a way to govern the process of inserting a batch of files on the server, as previously it would cause the write streams to all start simultaneously. I used PowerQueue since round-tripping to the db was unnecessary. On the client the HTTP Upload Queue provides the control to govern the process, but I can't see how or where the completion of the upload triggers the transfer to the TempStore. I can see the fileObj emits an uploaded event here, but there doesn't appear to be a .on method to hear it. Can anyone shed some light on this?
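The transfer queue usage is roughly of this shape (a sketch with illustrative option values and a hypothetical transfer helper, not the exact code in the branch):

var filesToInsert = [/* FS.File instances to move into the TempStore */];
var transferQueue = new PowerQueue({ maxProcessing: 1 }); // one write stream at a time

filesToInsert.forEach(function (fileObj) {
  transferQueue.add(function (done) {
    transferToTempStore(fileObj, function (error) { // hypothetical helper piping data into FS.TempStore
      done();
    });
  });
});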

@rhyslbw
Contributor Author

rhyslbw commented Jun 3, 2015

I've just pushed a version that consolidates the worker tasks into a single queue, to enable use of the Job-Collection priority feature. The saveCopy jobs are ranked the highest priority since the others are cleanup operations. The queue is now also running in JobManager, with the worker registering like this:
FS.JobManager && FS.JobManager.registerJob('saveCopy', saveCopy);

I've also put together a small demo app that outputs information from the JobCollection based on a server side insert, and then removal.

https://github.com/rhyslbw/cfs-job-manager-demo
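With job-collection, the priority ranking amounts to something like this (a sketch; the cleanup job type and data payloads are illustrative, and only registerJob is the actual worker-side call from the branch):

var jc = new JobCollection('cfsJobs'); // the shared Mongo-backed queue
// saveCopy jobs outrank the cleanup jobs (job type names other than saveCopy are illustrative).
new Job(jc, 'saveCopy', { fileId: fileObj._id, storeName: 'thumbs' }).priority('high').save();
new Job(jc, 'removeTempFile', { fileId: fileObj._id }).priority('low').save();

// The worker package registers its handlers against the same queue, as shown above:
FS.JobManager.registerJob('saveCopy', saveCopy);
FS.JobManager.registerJob('removeTempFile', removeTempFile);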

@raix

raix commented Jun 3, 2015

@rhyslbw Great work! Will try out the demo this weekend.

@mxab

mxab commented Jun 4, 2015

Hi @rhyslbw,
Nice work, I'm really looking forward to the release.
I tried the packages from your demo setup on my project by linking all the packages in my packages folder; I guess this should work?

I have two cfs collections that contain about 10k images in total. The current problem, that the observers' allDone query "added" callback gets triggered for every stored file, now becomes more problematic as it generates a lot of remove-temp-file jobs every time you restart/redeploy the app.

Maybe an idea would be that the cfs collection entry gets flagged after it has been processed by the job, like this:

{ ...
  "cleanedUp": true
... }

and instead of making the allDone query make an allDoneButNotCleanedUp query.
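In Mongo terms the suggestion is roughly this (a sketch; allDoneSelector stands in for whatever conditions the observer's allDone query already uses, and the helper names are hypothetical):

var allDoneSelector = { /* the observer's existing allDone conditions */ };
var selector = _.extend({ cleanedUp: { $ne: true } }, allDoneSelector);

Images.files.find(selector).observe({
  added: function (doc) {
    enqueueRemoveTempFileJob(doc); // whatever the observer queues today
  }
});

// In the job's completion handler, flag the document so a restart doesn't re-queue it:
function markCleanedUp(doc) {
  Images.files.update(doc._id, { $set: { cleanedUp: true } });
}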

@mxab

mxab commented Jun 23, 2015

Any news on when this will get merged?

@rhyslbw
Contributor Author

rhyslbw commented Jun 23, 2015

It's still early days on this. The design hasn't been reviewed yet, so I'd still treat this as a review PR that's a WIP.

@mxab

mxab commented Jun 23, 2015

anything I can do?

@rhyslbw
Contributor Author

rhyslbw commented Jun 23, 2015

We really need input from @CollectionFS/owners

@rhyslbw
Contributor Author

rhyslbw commented Jun 26, 2015

@mxab I've pushed a critical bug fix that you will want to get into your system.

@mxab

mxab commented Jul 13, 2015

Hi @rhyslbw, in my current setup I have two instances of my app running on different servers. Will this work if I have a filesystem as my tempstore, or might this run into an error if one machine saves it in its tempstore and the other one triggers the processing?

@rhyslbw
Contributor Author

rhyslbw commented Jul 14, 2015

@mxab You'll see errors, as the saveCopy jobs going into the queue are not restricted to the process that created them, and the other processes won't have access to the store. It's possible for this requirement to be included, so that the worker only observes jobs created in the current process, but this is extra complexity and not ideal for scaling. S3 as the TempStore is the best option if GridFS isn't feasible.

@mxab

mxab commented Jul 14, 2015

@rhyslbw it depends on how you see scaling; we have a geo-DNS routing approach, so the GridFS/S3 approach wouldn't scale that much.
A worst-case scenario:

  • A user inserts a file on an app node in Asia
  • The node stores it in the temp store in Europe (S3 or GridFS)
  • The same node reads it back and starts the processing (repeat this step for every store)

It would be nice if we could somehow define a policy for that: one setup where each node has its own FileSystem tempstore and is the only one that can be scheduled for the processing job; another setup might be that you have several nodes and only the one close to the S3/GridFS tempstore is allowed to process it.

But I agree this introduces extra complexity so we might schedule this after the PR is merged :)

@mxab

mxab commented Jul 14, 2015

@rhyslbw I extracted your tempstore fix into a pull request, as I'm not sure when this branch will be merged: #746

@rhyslbw
Contributor Author

rhyslbw commented Jul 14, 2015

@mxab Yes, it's going to be quite variable based on the use case. I actually see the future of TempStore being cloud storage adaptors uploading directly from the client (like edgee:slingshot), and then any server-side queue tasks running in a separate process, or on the current Meteor server for simple setups. If FileSystem is set for the TempStore, then it should be understood that the jobs created can only be completed by the initiating process.

@mxab

mxab commented Jul 16, 2015

Do you think, for now, a way to work around the FileSystem tempstore problem would be to restrict the queue at
https://github.com/rhyslbw/Meteor-CollectionFS/blob/feature/jc-queue/packages/job-manager/server.js#L123

with

if(job.type == registeredJobWorker.type && job.sourceId === FS.JobManager.Config.nodeId)

Of course this only makes sense in my use case, but a more general approach would be to have a callback that checks whether this processor should process it:

if(job.type == registeredJobWorker.type && FS.JobManager.canProcess(job, registeredJobWorker.type))

and somewhere in the application something like:

FS.JobManager.canProcess = function (job, workerType) {
  return calculateMyServerLoad() < 0.3;
};

@rhyslbw
Contributor Author

rhyslbw commented Jul 16, 2015

My thought would be to modify the observer query so the queue doesn't poll the job collection unnecessarily

@mxab

mxab commented Jul 16, 2015

Even better, I just hadn't seen this.

@mxab

mxab commented Jul 16, 2015

@mxab

mxab commented Jul 17, 2015

OK, it looks like it's working. I added this somewhere in the application code:

FS.JobManager.jobQueryRestriction = {
    'data.sourceId' : FS.JobManager.Config.nodeId
};

I think the only important thing when doing that is to have a fixed node id:

FS.JobManager.Config.nodeId = process.env.NODE_ID;

otherwise I guess jobs won't be picked up after a restart, as the node id is otherwise generated randomly.

Any optimisation suggestions, @rhyslbw?

@rhyslbw
Contributor Author

rhyslbw commented Jul 17, 2015

@mxab That looks good, but we should probably put the jobQueryRestriction in the Config object so we can define all options in a single object in the application.
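i.e. something along these lines (a sketch; the key names follow this thread):

// Single configuration object defined by the application:
FS.JobManager.Config = {
  nodeId: process.env.NODE_ID,
  jobQueryRestriction: { 'data.sourceId': process.env.NODE_ID }
};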

@mxab

mxab commented Jul 17, 2015

I moved it to the config. Should I make a PR for this?

@rhyslbw
Contributor Author

rhyslbw commented Jul 18, 2015

Yep, I'll accept that into my branch. Nice work.

@brianlukoff

@rhyslbw Are you using this branch in production now?

@rhyslbw
Contributor Author

rhyslbw commented Aug 27, 2015

Hey @brianlukoff

I'm not actually using cfs anywhere right now, as I decided to implement a custom solution to better suit my specialised needs. I believe @mxab and @bitomule are, though.

@bitomule

@brianlukoff We're using it at mmmelon.com for the web and an iOS app without issues so far. The only issue we had is with traffic and DigitalOcean, as we're using Amazon S3 for temp files. We're moving to EC2 soon to avoid that traffic.

This was referenced Oct 16, 2015
@raix raix merged commit 9752a27 into Meteor-Community-Packages:devel Dec 7, 2016
@luixal

luixal commented Dec 9, 2016

@raix is that merge solving the multiple instances problem?

I understand the problem, but I've got lost with so many forks, issues, and branches about this... is there anything in the docs?

Thanks!

@raix

raix commented Dec 9, 2016

@luixal I have to investigate
