
Processing a large dataset in less than 100 lines of Node.js with async.queue

If you’re more of a skip-to-the-code person, check out the gist here.

caolan’s async.queue to the rescue


To fix the call stack issue, I needed to manage my API calls by pushing them into a queue where they could be processed in parallel.

Pushing items to the queue

My image IDs are in a newline-delimited JSON file. First, I convert this file into a JSON object using readFileSync. The object contains a list of image IDs, and in my queue I want to send each image to the Vision API. The queue takes a task (in this case my object of image IDs) and a callback function, which is called when the worker is finished processing:

q.push(imageIds, function (err) {
 if (err) {
  console.log(err)
 }
});
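The file-reading step described above could look roughly like the sketch below. It is not the code from the gist: the helper name readNdjson and the file name in the usage comment are hypothetical, and it assumes every non-empty line of the file is one complete JSON value.

```javascript
const fs = require('fs');

// Read a newline-delimited JSON file and return an array with one parsed
// value per non-empty line. (readNdjson is a hypothetical helper name.)
function readNdjson(path) {
  return fs.readFileSync(path, 'utf8')
    .split('\n')
    .map((line) => line.trim())            // also drops a trailing "\r"
    .filter((line) => line.length > 0)     // skip blank lines
    .map((line) => JSON.parse(line));
}

// Usage (the file name is an assumption):
//   const imageIds = readNdjson('image-ids.json');
//   q.push(imageIds, function (err) { if (err) { console.log(err) } });
```

When an array is pushed to an async.queue, each element becomes its own task, and the callback runs once per finished task.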


Defining the queue

The queue takes a function and a concurrency number as parameters. Let’s start with the function: we pass it a task (our image ID from above) and a callback, which will be called when the worker completes a task. Inside the function is where I do my image processing.
This function should return some JSON about the image, which I want to write to a local JSON file. I’ll define that in the next step.

Concurrency tells the queue the maximum number of workers that can process our tasks in parallel. I played with the number until I found a balance: something that wasn’t too slow, but also didn’t result in API limits or call stack errors. The number will vary depending on what you’re doing, so it’s definitely OK to fine-tune it by hand until you find your “magic number.” Here’s my queue:

let q = async.queue(callVision, 20);
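To make the concurrency number more concrete, here is a simplified stand-in for a concurrency-limited queue, written without the async library. It is only a sketch of the idea, not how async.queue is implemented, and the name runQueue is hypothetical. The worker has the same (task, callback) shape as callVision.

```javascript
// Simplified illustration of a concurrency limit (NOT the async library's code).
// worker(task, callback) must call callback exactly once per task.
function runQueue(tasks, worker, concurrency, done) {
  const pending = tasks.slice(); // tasks that have not started yet
  let running = 0;               // workers currently in flight
  let finished = 0;              // tasks whose callback has fired

  if (tasks.length === 0) {
    done();
    return;
  }

  function next() {
    // Start workers until the limit is reached or nothing is left to start.
    while (running < concurrency && pending.length > 0) {
      const task = pending.shift();
      running += 1;
      worker(task, (err) => {
        if (err) console.log(err); // log and keep going, as in the push callback
        running -= 1;
        finished += 1;
        if (finished === tasks.length) {
          done();
        } else {
          // Defer so a synchronous worker cannot grow the call stack.
          setImmediate(next);
        }
      });
    }
  }

  next();
}
```

With a concurrency of 20, at most 20 workers are in flight at any moment; each time one finishes, the next waiting task starts.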

Processing images


Last, it’s time to write the callVision() function referenced above. This part isn’t exactly async.queue specific, but it’s still important because it’s the meat of my queue task.
Here I’m using Google’s Cloud Vision API for image analysis, and I use the Google Cloud Node.js module to call it. Once I get a JSON response for each image, I create a JSON string of the response to write to a newline-delimited JSON file (I’m using this format because it’s what BigQuery expects, which is where I’ll be storing the data eventually). Once this function completes, the data is sent back to the queue, where it is written to a local JSON file. You can find all of the callVision() code in the gist.
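The real callVision() lives in the gist. As a rough sketch of its shape only, the worker below takes an image ID, asks for an annotation, and appends the response as one line of newline-delimited JSON. The names makeWorker and annotate are hypothetical: annotate stands in for the Vision API call and is assumed to take an image ID and a Node-style (err, response) callback. Writing the line inside the worker is one possible arrangement and may differ from the gist.

```javascript
const fs = require('fs');

// Build a worker with the (task, callback) shape the queue expects.
// `annotate` is a hypothetical stand-in for the Vision API call.
function makeWorker(annotate, outputPath) {
  return function worker(imageId, callback) {
    annotate(imageId, (err, response) => {
      if (err) {
        callback(err); // report the failure back to the queue
        return;
      }
      // One JSON object per line is the newline-delimited JSON format.
      const line = JSON.stringify(response) + '\n';
      fs.appendFile(outputPath, line, callback);
    });
  };
}
```

A worker built this way could be passed to async.queue in place of callVision, with the real Vision call supplied as annotate.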

That’s it! You’ve done something interesting with async.queue.



Sara Robinson





