Mongo backed queue #667
Conversation
The worker package now observes a Mongo-backed queue for jobs, which is handled by a new package, job-manager. The collection observers have been split out into a separate package so they can be limited to a single instance when scaling. The standard-packages package has also been updated to include job-manager and collection-observers.
Would this PR fix the issue of multiple servers trying to upload the same file if they share the db?
The current design expects the collection-observers package to be running in a single instance, to ensure only one app has the responsibility of triggering the collection to emit two key events. I'm thinking, though, that the better design here would be for the job-manager to expect to receive the same event multiple times, and to drop any duplicates by checking whether a job already exists in the queue, as that would reduce the complexity and tighten things up.
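A minimal sketch of that de-duplication check, assuming the queue is a vsivsi:job-collection instance named `jobs` (the `data.fileId` field layout is illustrative; the statuses are job-collection's standard ones):

```js
// Drop duplicate events by only enqueuing when no equivalent job is pending
function enqueueOnce(jobs, type, fileObj) {
  var pending = jobs.findOne({
    type: type,
    'data.fileId': fileObj._id,
    status: { $in: ['waiting', 'ready', 'running'] }
  });
  if (pending) return; // another emitter already queued this job
  new Job(jobs, type, { fileId: fileObj._id }).save();
}
```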
@rhyslbw So right now I can make those changes and test it? One app will have standard-packages and the S3 packages, while the other one will have the packages you've listed + S3. That would allow me to upload from both apps, no? Great work, btw.
@bitomule Yes, that's right, but be aware the worker function isn't handling errors yet to mark the job failed; I'll look at that within the next couple of days. Actually, in your case you want to add cfs:worker to the secondary app too (list updated).
Tested, and the error is still there. Uploaded in app B and app A crashes. :(
Is the crash happening at the same point, @bitomule? I can't see any collection observers that would be causing this, which should be the only way for the second app to be triggered. Since my use case has been inserting images on the server, I've not put any time into the upload scenario at this point, but I'll do some tests. In the meantime, if you come to better understand the problem, let me know.
@raix I'm stuck with a problem relating to events that I could use your help with. The event listeners registered by FS.Collection by calling into FS.JobManager (such as this) are being triggered for all collections registered in the app, which results in duplicate jobs being queued. Does anything stand out? The events are being emitted correctly.
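For context on the workaround that landed later in this thread, a guard of roughly this shape would drop the cross-collection duplicates (the 'stored' event name and the handler body are illustrative, not the package's actual API):

```js
// Hypothetical guard: a collection's listener ignores events emitted for
// files that belong to a different FS.Collection in the same app
var self = this; // the FS.Collection registering its listeners
FS.JobManager.register('stored', function (fileObj) {
  if (fileObj.collectionName !== self.name) return; // not our file, skip
  // ...queue this collection's job for fileObj only...
});
```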
Hi @rhyslbw, this is the error I get: stream.js:94. I got this one on the server with standard-packages installed, uploading from the server without it.
@rhyslbw Do you think it would be possible to fix the multiple-instances issue soon, or would it need big changes? I want to help, but CollectionFS is a huge project.
@bitomule Also outside the scope of this development right now is the use of the FileSystem SA, since this presents its own set of challenges when running the app on multiple servers. If you were using GridFS for the tempstore, then this error should go away. I'd suggest looking at using a cloud storage solution, S3 being the obvious choice.
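For reference, backing the TempStore with GridFS is a one-line config change; this follows the pattern documented for the cfs GridFS package (worth verifying against the installed version):

```js
// Store temp chunks in GridFS so every app instance can read them,
// rather than on one machine's local filesystem
FS.TempStore.Storage = new FS.Store.GridFS('_tempstore', { internal: true });
```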
…ctions, and formalisation of FS.CollectionObservers
- Now FS.CollectionObservers, registered within the collection if installed.
- Renamed JobManager method from 'listen' to 'register'.
- removeStoredData was not previously being handled, resulting in stored data being left intact after FS.Collection doc removal.
- Workaround implemented to deal with the known issue of event listeners in JobManager firing for each FS.Collection in the app.
@rhyslbw Is it because multiple instances will each add a task to the manager?
@raix No, it's the same instance. The single event is being heard by the listeners registered for every FS.Collection in the app.
…r-side beginStorage to fsFile prototype. Also updated jobCollection workers to better track the success of jobs.
I've just pushed another commit to the branch, so please update and review. I needed to implement the TempStore transfer queue as a way to govern the process of inserting a batch of files on the server, as previously it would cause the write streams to all start simultaneously. I used PowerQueue, since round-tripping to the db was unnecessary. On the client the HTTP Upload Queue provides the control to govern the process, but I can't see how or where the completion of the upload triggers the transfer to the TempStore. I can see the fileObj emits an uploaded event here, but there doesn't appear to be a .on method to hear it. Can anyone shed some light on this?
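The governing idea looks roughly like this (a sketch assuming raix:power-queue's add/done task API; `transferToTempStore` is an invented stand-in for the actual stream-writing step):

```js
// Serialise TempStore writes instead of opening every write stream at once
var transferQueue = new PowerQueue({ maxProcessing: 1 });

function queueTransfer(fileObj) {
  transferQueue.add(function (done) {
    transferToTempStore(fileObj, function (err) {
      // error handling would mark the job failed; elided in this sketch
      done(); // release the slot so the next transfer can start
    });
  });
}
```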
I've just pushed a version that consolidates the worker tasks into a single queue, to enable use of the Job-Collection priority feature. The saveCopy jobs are ranked the highest priority, since the others are cleanup operations. The queue is now also running in JobManager, with the worker registering like this: I've also put together a small demo app that outputs information from the JobCollection based on a server-side insert, and then removal.
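The registration snippet isn't preserved above; as a stand-in, prioritised enqueueing and a consolidated worker with vsivsi:job-collection look roughly like this (`jc` and the job type names follow the discussion; the rest is illustrative):

```js
// saveCopy outranks the cleanup job types in the single consolidated queue
function queueSaveCopy(jc, fileObj) {
  new Job(jc, 'saveCopy', { fileId: fileObj._id })
    .priority('critical') // cleanup jobs would be saved with a lower priority
    .save();
}

// One worker registration covering every job type in the queue
jc.processJobs(['saveCopy', 'removeTempFile', 'removeStoredData'],
  { concurrency: 1 },
  function (job, callback) {
    // ...dispatch on job.type and act on job.data...
    job.done();  // or job.fail(err) when the task errors
    callback();  // signal readiness for the next job
  });
```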
@rhyslbw Great work! Will try out the demo this weekend.
Hi @rhyslbw, I have two cfs collections that contain about 10k images in total. The current problem, that the observers' allDone query "added" callback gets triggered for every stored file, now becomes more serious, as it generates a lot of remove-temp-file jobs every time you restart/redeploy the app. Maybe an idea would be that the cfs collection entry gets flagged after it has been processed by the job, like this: and instead of making the allDone query, make an allDoneButNotCleanedUp query.
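The snippet and query referenced here aren't preserved above; a guess at the shape of the idea, with `tempCleanedUp` as an invented flag name, `Images` as a stand-in FS.Collection, and a simplified "all done" test:

```js
// Hypothetical "allDoneButNotCleanedUp" observer: only fully-stored files that
// are not yet flagged produce a remove-temp-file job, so a restart/redeploy
// doesn't re-queue jobs for all 10k existing images
Images.files.find({
  uploadedAt: { $exists: true },     // simplified stand-in for "all copies stored"
  tempCleanedUp: { $exists: false }  // the proposed flag
}).observe({
  added: function (fileObj) {
    queueRemoveTempFileJob(fileObj); // illustrative call into the job manager
  }
});

// Called from the cleanup worker once the temp files are actually removed
function markCleanedUp(fileObj) {
  Images.files.update(fileObj._id, { $set: { tempCleanedUp: true } });
}
```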
Any news on when this will get merged?
It's still early days on this. The design hasn't been reviewed yet, so I'd still treat this as a review PR that's WIP.
Anything I can do?
We really need input from @CollectionFS/owners
@mxab I've pushed a critical bug fix that you will want to get into your system.
Hi @rhyslbw, in my current setup I have two instances of my app running on different servers. Will this work if I have a filesystem as my tempstore, or may this run into an error if one machine saves a file to its tempstore and the other one triggers the processing?
@mxab You'll see errors, as the saveCopy jobs going into the queue are not restricted to the process that created them, and the other processes won't have access to the store. It's possible for this requirement to be included, resulting in the worker only observing jobs created in the current process, but this is extra complexity and not ideal for scaling. S3 as the TempStore is the best option if GridFS isn't feasible.
@rhyslbw It depends how you see scaling; we have a geo-DNS routing approach, therefore the GridFS/S3 approach wouldn't scale that much.
Would be nice if we could somehow define a policy for that: either a setup where each node has its own FileSystem tempstore and is the only one that can be scheduled for the processing job, or a setup where you have several nodes and only the one close to the S3/GridFS tempstore is allowed to process it. But I agree this introduces extra complexity, so we might schedule this after the PR is merged :)
@mxab Yes, it's going to be quite variable based on use case. I actually see the future of TempStore cloud storage adaptors uploading direct from the client (like edgee:slingshot), and then any server-side queue tasks running in a separate process, or on the current Meteor server for simple setups. If FileSystem is set for the TempStore, then it should be considered that the jobs created can only be completed by the initiating process.
Do you think, for now, a workaround for the FileSystem tempstore problem would be to restrict the queue? Of course this only makes sense in my use case, but in more general terms it would be to have a callback that checks if this processor should process a job, and somewhere in the application something like:
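The code that followed isn't preserved above; purely as an illustration of the proposal, a hypothetical hook along these lines (`shouldProcess`, `nodeId`, and `NODE_ID` are invented names, not part of any cfs package):

```js
// Invented policy hook: a worker only accepts jobs created by its own node
FS.JobManager.shouldProcess = function (job) {
  return job.data.nodeId === process.env.NODE_ID;
};
```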
My thought would be to modify the observer query, so the queue doesn't poll the job collection unnecessarily.
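job-collection supports driving a worker from an observer rather than by polling; roughly like this, following the pattern in the vsivsi:job-collection README (the per-node restriction discussed above could be added to the selector):

```js
// Effectively disable polling and let a collection observer wake the worker
var queue = jc.processJobs('saveCopy', { pollInterval: 1000000000 },
  function (job, callback) {
    // ...process job.data...
    job.done();
    callback();
  });

// Trigger the worker only when a matching job becomes ready
jc.find({ type: 'saveCopy', status: 'ready' }).observe({
  added: function () { queue.trigger(); }
});
```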
Even better, I just hadn't seen this.
OK, it looks like it's working; I added this somewhere in application code. I think it is only important that, when I do that, I have a fixed node id, otherwise I guess jobs won't be picked up after a restart, as the node id is otherwise generated randomly. Any optimisation suggestions, @rhyslbw?
I moved it to config; should I make a PR of this?
Yep, I'll accept that into my branch. Nice work!
@rhyslbw Are you using this branch in production now?
Hey @brianlukoff, I'm not actually using cfs anywhere right now, as I decided to implement a custom solution to better suit my specialised needs. I believe @mxab and @bitomule are, though.
@brianlukoff We're using it at mmmelon.com for a web and iOS app without issues so far. The only issue we had is with traffic and DigitalOcean, as we're using Amazon S3 for temp files. We're moving to EC2 soon to avoid that traffic.
@raix Is that merge solving the multiple-instances problem? I understand the problem, but I've got lost with so many forks, issues, and branches about this... is there anything in the docs? Thanks!
@luixal I have to investigate.