
Processing a large dataset in less than 100 lines of Node.js with async.queue

If you’re more of a skip-to-the-code person, check out the gist here.

caolan’s async.queue to the rescue


To fix the call stack issue, I needed to manage my API calls by pushing them into a queue where they could be processed in parallel.

Pushing items to the queue

My image IDs are in a newline delimited JSON file. First, I read this file with readFileSync and parse it into a JavaScript object. The object contains a list of image IDs, and in my queue I want to send each image to the Vision API. The queue takes a task (in this case my object of image IDs) and a callback function, called when the worker is finished processing:

q.push(imageIds, function (err) {
 if (err) {
  console.log(err)
 }
});
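As a rough sketch of the read-and-parse step (not the gist's code), the snippet below assumes each line of the file holds one JSON object. The `id` field, the sample file name, and the `readNdjson` helper are hypothetical names made up for illustration. Note that when an array is passed to q.push, each element is queued as its own task and the callback runs once per task.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Parse newline-delimited JSON: one JSON value per non-empty line.
function readNdjson(filePath) {
  return fs.readFileSync(filePath, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Hypothetical sample data, written to a temp file so the sketch runs as-is.
const samplePath = path.join(os.tmpdir(), 'image-ids-sample.json');
fs.writeFileSync(samplePath, '{"id":"img-001"}\n{"id":"img-002"}\n');

// An array of task objects, ready to hand to q.push.
const imageIds = readNdjson(samplePath);
console.log(imageIds.length); // 2
```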


Defining the queue

The queue takes a function and a concurrency number as parameters. Let’s start with the function: it receives a task (an image ID from above) and a callback, which it calls when the worker completes the task. Inside the function is where I do my image processing.
This function should return some JSON about the image, which I want to write to a local JSON file. I’ll define that in the next step.

Concurrency tells the queue the maximum number of workers that can process tasks in parallel. I played with the number until I found a balance that wasn’t too slow but also didn’t result in API limits or call stack errors. The number will vary depending on what you’re doing, so it’s definitely OK to fine-tune it by hand until you find your “magic number.” Here’s my queue:

let q = async.queue(callVision, 20);
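To make the concurrency number more concrete, here is a small plain-JavaScript stand-in for a concurrency-limited queue. It is not the async library's implementation, and `makeQueue`, `demoQueue`, `started`, and `finishers` are hypothetical names. The demo worker never finishes on its own, which makes it easy to see that only two tasks start until one of them calls back.

```javascript
// Minimal stand-in for a concurrency-limited queue (NOT the async library's code).
function makeQueue(worker, concurrency) {
  const pending = [];
  let running = 0;

  // Start queued tasks until the concurrency limit is reached.
  function next() {
    while (running < concurrency && pending.length > 0) {
      const { task, done } = pending.shift();
      running += 1;
      worker(task, (err) => {
        running -= 1; // a slot is free again
        done(err);
        next();
      });
    }
  }

  return {
    // Accepts a single task or an array of tasks.
    push(tasks, done) {
      for (const task of [].concat(tasks)) pending.push({ task, done });
      next();
    },
  };
}

const started = [];
const finishers = [];

// Hypothetical worker: records the task and keeps its callback so we can finish it by hand.
const demoQueue = makeQueue((task, callback) => {
  started.push(task);
  finishers.push(callback);
}, 2);

demoQueue.push(['a', 'b', 'c', 'd'], () => {});
console.log(started); // [ 'a', 'b' ]

finishers.shift()(); // finish 'a', which frees a slot
console.log(started); // [ 'a', 'b', 'c' ]
```

With a limit of 20, as in the post, the same idea applies: at most 20 API calls are in flight at once and the rest wait in line.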

Processing images


Last, it’s time to write the callVision() function referenced above. This part isn’t exactly async.queue specific, but it’s still important because it’s the meat of my queue task.
Here I’m using Google’s Cloud Vision API for image analysis, and I use the Google Cloud Node.js module to call it. Once I get a JSON response for each image, I create a JSON string of the response to write to a newline delimited JSON file (I’m using this format because it’s what BigQuery expects, which is where I’ll be storing the data eventually). Once this function completes, the data is sent back to the queue, where it is written to a local JSON file. You can find all of the callVision() code in the gist.
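The real callVision() lives in the gist. The sketch below only illustrates the shape described above: a worker that takes a task and a callback, turns the response into one line of newline delimited JSON, and hands it back to be appended to a file. The Vision API call is replaced by a hypothetical `analyzeImage` stub that returns a canned response, and `callVisionSketch`, the `id` field, and the output path are assumptions as well.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical output file in the temp directory.
const outputPath = path.join(os.tmpdir(), 'vision-results-sample.json');

// Hypothetical stand-in for the Vision API call; returns a canned response.
function analyzeImage(imageId, callback) {
  callback(null, { imageId: imageId, labels: ['placeholder'] });
}

// Worker with the (task, callback) shape the queue expects.
function callVisionSketch(task, callback) {
  analyzeImage(task.id, (err, response) => {
    if (err) return callback(err);
    // One JSON object per line, ending in a newline.
    callback(null, JSON.stringify(response) + '\n');
  });
}

// Usage: start with an empty file, then append the line the worker hands back.
fs.writeFileSync(outputPath, '');
callVisionSketch({ id: 'img-001' }, (err, line) => {
  if (err) return console.log(err);
  fs.appendFileSync(outputPath, line);
});
```

Keeping each result on its own line means the file can be appended to as results arrive, without rewriting what is already there.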

That’s it! You’ve done something interesting with async.queue.



*Sara Robinson





