Covering Scientific & Technical AI | Wednesday, November 27, 2024

ADP, Dow Jones Help Mainstream Node.js 

The one thing that all enterprises are always on the hunt for is a hardware or software technology that can significantly speed up an existing process or, in some cases, make a new process possible that was not feasible in the past. By definition, EnterpriseTech is very keen on such technologies and it looks like yet another software technology that was spawned in the hyperscale world, called Node.js, is getting set to go mainstream.

The "js" in the name is short for JavaScript, the popular browser scripting language that glues much of the Web together. And as such, it is not exactly the kind of programming language that you would expect on back-end systems. But this is precisely what is happening and the reason is simple: Companies want a faster means of moving and manipulating data than they have with Java, PHP, Ruby, or Python, and the fact that so many programmers are familiar with JavaScript already makes it relatively easy for them to be productive on Node.js from the get-go.

Not that Node.js coders are not comically sensitive about this. Joyent, the public cloud company that has designed its own variant of Solaris called SmartOS to underpin its system, is the sponsor of the Node.js project. Bryan Cantrill, chief technology officer at Joyent and formerly a Solaris kernel and Java engineer as well as the co-inventor of the DTrace dynamic tracing tool that makes Solaris so useful, went on a bit of a rant about how JavaScript is perceived in the datacenter at a local Node on the Road user meeting in New York City this week.

"I was at Sun for 14 years and when Sun was invaded by the Nazis I began to look around and what I saw in Joyent and with Node.js in particular was something of a Java phenomenon," Cantrill explained as the room erupted into laughter in reference to the Oracle acquisition of that venerable Unix system provider. "Not in terms of the technology in particular, but because it was in exactly the right place at exactly the right time. I had used JavaScript initially on the client side but then started to use it on the server side. I was dividing my time between C and JavaScript and as a kernel developer I can tell you that it is very difficult to accept that you may have fallen in love with JavaScript. This is the forbidden fruit of kernel development and there is a lot of shame. But you know what? I own it, I am proud of it, I love JavaScript, and I am loud an proud. Node was exactly what I was looking for. It was JavaScript with a ripping VM in V8 and with the Unix philosophy around it: Build small tools, do well-defined things, and you can stitch them together into complex systems much more easily than these kind of barcoded up frameworks. What I saw in Joyent was a Java do-over. I didn't like what happen to Java at Sun."

The V8 referenced above is not a vegetable juice accompanied by the flat palm forehead plant or the 485 horsepower 6.4 liter Hemi engine in the Dodge Challenger SRT, but rather the V8 JavaScript engine written in C++ by Google and embedded in its Chrome browser to give it the performance it most definitely shows over Internet Explorer, Mozilla, Safari, Opera, and others. The V8 engine debuted in 2008 and the big new idea in it was to take JavaScript code and compile it right down to machine language to make it run fast and then go back after the code was executing to do dynamic performance tuning. On the Web, everyone wants the performance now, not after the virtual machine's just-in-time compiler warms up.

Soon after the V8 engine was launched as an open source tool, Ryan Dahl, a software engineer at Joyent, decided to use the V8 engine on servers instead of in a browser and to create a set of applications in JavaScript instead of Ruby, and thus Node.js was born. Today, Node.js is a server development runtime environment, and one that just so happens to use a programming language that is very familiar to millions of programmers. That language can now be used on the front end of applications or on the back end, and the interesting bit is that Node.js has an asynchronous communication layer that is scalable and non-blocking. Node.js is admittedly not for every application – you would not use it for jobs where there is a heavy compute or number-crunching component – but you could use Node.js for quickly pulling data from databases, NoSQL data stores, or file systems and passing it around for processing. It is lightning fast for Web application front ends, and companies that had been working with Java, PHP, or Ruby for such work are tearing out that code and replacing it with Node.js. And interestingly, once they get experience with that, they are starting to embed Node.js in back-end applications where Java is by far the dominant language.

Two such customers made presentations at the Node on the Road event in New York. The first was Automatic Data Processing, the $11.3 billion company that does payroll and benefits processing for more than 620,000 organizations in 125 countries. The company manages one in five payrolls in the United States and one in ten worldwide. About three years ago, to get closer to the hottest programming talent in the New York region (ADP is located in Roseland, New Jersey, not exactly where hip young programmers want to live), the payroll processor set up ADP Innovation Labs in the Chelsea section of the Big Apple, not too far from Google's digs, and now has 85 software engineers there plugging away on new technologies.

Roberto Masiero, who is senior vice president of ADP Innovation Labs, said the very first project undertaken three years ago was a mobile front-end to its applications. At the time, explained Masiero, Node.js was not ready, even though Cantrill said that it was ready enough for deployments inside of Joyent. (And there is the difference between a hyperscale shop and an enterprise shop in a nutshell.) This app, said Masiero, was fairly primitive in that it was coded for BlackBerry mobile phones and written in HTML 1.0 and CSS 2.0 and JavaScript was strictly forbidden; it was a very "circa 1985" application, he quipped.

Fast forward a few years and the new application that ADP Innovation Labs is working on is called Semantic Search. "What we wanted to do is create a search engine that spanned all of the objects at ADP," explained Masiero. "We have a lot of objects at ADP. Human capital management and payroll is a complicated thing, and we do benefits and retirement, and so on. We needed a search engine that would go across all of those objects. And we wanted to do even more, such as instead of just looking at nouns, we do verbs, too."

So, for instance, end users can search for people and open positions at their firms and hire them by simply typing into the Semantic Search engine:

[Image: ADP Semantic Search]

"We needed to be very fast. We wanted the same thing that Google gives you, millisecond response – very, very fast. When the user issues a query predicate, we needed to break this into two queries, one against metadata and one against the index data itself. We need to parallelize these things like crazy because we decided to use Instant [the instant search add-on for the Apache Solr search engine], which means that every character that you put on the predicate is a new query firing against the server. We said, I don't think PHP can do that, and so we decided to test this Node thing."

That was about two and a half years ago. "We wrote it in Node and it was awesome. The thing just screams. It is very easy even though you are doing very complicated manipulation of the predicate."
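
The pattern Masiero describes, one predicate fanned out into a metadata query and an index query that run side by side, with a fresh search fired for every character typed, can be sketched in a few lines of TypeScript. This is only an illustration and not ADP's code: the `SearchBackends` interface, the `search` and `searchAsYouType` functions, and the in-memory `fakeBackends` are all hypothetical names invented here, standing in for the real MongoDB and Solr lookups.

```typescript
// Hypothetical interface: the article does not describe ADP's real APIs.
interface SearchBackends {
  // Stand-in for the metadata lookup (MongoDB in ADP's stack).
  queryMetadata(predicate: string): Promise<string[]>;
  // Stand-in for the index lookup (Apache Solr in ADP's stack).
  queryIndex(predicate: string): Promise<string[]>;
}

interface SearchResult {
  predicate: string;
  objectTypes: string[];
  hits: string[];
}

// Start both lookups before awaiting either one, so the total wait is
// roughly that of the slower query rather than the sum of the two.
async function search(
  backends: SearchBackends,
  predicate: string
): Promise<SearchResult> {
  const [objectTypes, hits] = await Promise.all([
    backends.queryMetadata(predicate),
    backends.queryIndex(predicate),
  ]);
  return { predicate, objectTypes, hits };
}

// Instant search: every additional character is a new query against the server.
function searchAsYouType(
  backends: SearchBackends,
  typed: string
): Promise<SearchResult[]> {
  const pending: Promise<SearchResult>[] = [];
  for (let end = 1; end <= typed.length; end++) {
    pending.push(search(backends, typed.slice(0, end)));
  }
  return Promise.all(pending);
}

// In-memory stand-ins for the real services, for illustration only.
const fakeBackends: SearchBackends = {
  queryMetadata: async (p) =>
    ["employee", "position"].filter((t) => t.startsWith(p.toLowerCase())),
  queryIndex: async (p) =>
    ["John Smith", "Jane Doe"].filter((n) =>
      n.toLowerCase().startsWith(p.toLowerCase())
    ),
};
```

Calling `searchAsYouType(fakeBackends, "jo")` issues one combined search for "j" and another for "jo" without waiting for the first to finish. None of the calls block the event loop while a backend is thinking, which is the property that makes firing a query per keystroke affordable.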

The stack for the Semantic Search starts with Linux, of course, and the Nginx high-performance Web server that is used by a slew of Web properties, including Netflix, Hulu, Zappos, Pinterest, Airbnb, and Zynga. The search engine is Apache Solr and metadata describing the objects in the ADP application portfolio is stored in the MongoDB NoSQL data store. Node.js is what glues it all together.

"When we demoed this on the tablet we added voice commands and it's cool. You can literally walk the hallways and say, 'Fire John!'"

ADP created its mobile application for smartphones, which now has a million users, with PHP. But Masiero said that when the team started to think about creating one for tablets, sticking with PHP just "didn't sound right." The reason is that ADP wanted to code its application dashboards for end users as separate little tiles, each portion of it being updated in parallel using the asynchronous back-end of Node.js. This is how modern Web pages work, with elements loading separately rather than all at once. The application is aware of your location and the dashboard changes based on where you are. For instance, it will show you your retirement benefits when it knows you are at home but your daily planner schedule when it knows you are in the office. (The app is able to use proximity sensors installed at employers to see people check in and check out of work automatically.)

Now here's the interesting bit. All of this Node.js front end is running on six servers, three apiece in each of ADP's two datacenters. That is how little iron it takes to grab data from all of those ADP applications and rip it out to those users. This is important. ADP is facing the same issue that most enterprises are wrestling with: What happens when you expose applications to users who have smartphones and tablets and who can flood into the system at any time? There are no predictable patterns to user access anymore – everyone is always working – and that means your systems have to be architected to run as efficiently as possible and be able to handle large peaks. ADP was one of the early adopters of IBM mainframes and still has these systems at its core today, but they are cloaked in layers of systems and machines to make the apps snappy and modern. Node.js is a key part of that, and specifically, it creates what Masiero calls an API Multi Proxy, or APIMP for short. This sits behind some BIG-IP firewalls from F5 Networks and in front of all of the ADP applications.

[Image: ADP API Multi Proxy (APIMP)]

As is common with users of open source technology, ADP is giving back to the community. It has created a tool called PigeonKeeper, an example of what is called a directed acyclic graph engine. The problem this solves is easy enough to say, but hard to do: How do you know when all of the tiles in a disaggregated application coded in Node.js have been updated? (In other words, how do you know when all of the pigeons have come home to roost?) PigeonKeeper doesn't just watch these processes, but actually orchestrates them based on the dependencies between the tiles in the application. ADP has also created an API for querying JSON documents, called JQL, which is based on a prior tool called JSONPath and which allows for updating and deleting items stored in a JSON document. This is important to ADP because the MongoDB data store uses the JSON format for storing data. The company is in the process of working through its legal department to open source PigeonKeeper and JQL; Masiero has no idea how long that will take.
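
To make the directed acyclic graph idea concrete, here is a minimal TypeScript sketch of dependency-based scheduling. It is not PigeonKeeper, whose API the article does not describe: the `DependencyGraph` type, the `planWaves` function, and the tile names in `dashboard` are all hypothetical. The sketch groups tiles into "waves" so that every tile in a wave depends only on tiles in earlier waves. Tiles within one wave could then be refreshed in parallel, and the dashboard is fully updated once the last wave completes.

```typescript
// Map from a tile's name to the names of the tiles it depends on.
type DependencyGraph = Record<string, string[]>;

// Group tiles into waves: a tile becomes ready once all of its
// dependencies have been placed in an earlier wave.
function planWaves(graph: DependencyGraph): string[][] {
  let remaining = Object.keys(graph);
  const done: Record<string, boolean> = {};
  const waves: string[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter((tile) =>
      graph[tile].every((dep) => done[dep] === true)
    );
    if (ready.length === 0) {
      // Nothing can make progress: the graph has a cycle or names a
      // dependency that is not itself a tile, so it is not a valid DAG.
      throw new Error("cycle or unknown dependency in tile graph");
    }
    ready.forEach((tile) => {
      done[tile] = true;
    });
    remaining = remaining.filter((tile) => !done[tile]);
    waves.push(ready);
  }
  return waves;
}

// Hypothetical dashboard tiles, for illustration only.
const dashboard: DependencyGraph = {
  profile: [],
  location: [],
  schedule: ["profile", "location"],
  benefits: ["profile"],
  summary: ["schedule", "benefits"],
};

const waves = planWaves(dashboard);
// waves is [["profile", "location"], ["schedule", "benefits"], ["summary"]]
```

A real orchestrator would start each tile as soon as its own dependencies finish rather than waiting for a whole wave, but the wave plan is the simplest way to see both halves of the problem: what may run concurrently, and when everything has come home to roost.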

Over at Dow Jones, part of Rupert Murdoch's News Corp empire and notably the publisher of the Wall Street Journal, the conversion to Node.js for Web and streaming applications is nearly universal, explained Scott Rahner, an IT manager at the media giant. Dow Jones started out in early 2011 with Node for a Facebook reader for the online version of the paper called WSJ Social, and the same issues of speed and scalability applied. With over 1 billion users on Facebook, it was hard to predict what the traffic might look like. Because Dow Jones had plenty of JavaScript programmers, it was able to bring out WSJ Social, from design to production, in three months.

Since that time, the whole WSJ site as well as WSJ.D and Barron's have all been coded completely in Node.js, and a service called Real-Time, which provides access to the Dow Jones News Service and other market feeds to financial institutions, was also coded in Node.js – again, mainly because of the need for speed and because of the familiarity with JavaScript. The company has built its own application framework to wrap around Node.js applications, called Tesla, to make it even easier to code. Incidentally, when Dow Jones needed more programmers to work on the Real-Time project, it moved 150 C# programmers over and they picked it all up pretty fast, says Rahner. Dow Jones is looking at where Node.js will play into its MarketWatch and Factiva services, and it is already deployed behind its e-commerce and feed management systems.

"We have basically made it the gold standard," says Rahner of Node.js. "If you are going to build a Web app at Dow Jones, it is probably going to be in Node and you have to have a good excuse why it needs to be in another language. We have also created a core team whose job it is to evangelize about Node and Tesla."

Node.js may have got its start on the Google Chrome browser but it is picking up back in the stack, as the Semantic Search service at ADP demonstrates. Node.js is also a key component of Joyent's own Manta object storage service, which is based on the ZFS file system originally created by Sun and includes integrated data analytics and compute. Other applications at Groupon, Wal-Mart, LinkedIn, and PayPal have already been created using Node.js and no doubt plenty more will follow wherever non-blocking I/O and streaming are the key attributes of the application.

"The Streams API is pretty powerful, and is used a lot, especially in the extract, transform, load path," explained TJ Fontaine, one of the software engineers at Joyent and the lead for the Node.js project that the company sponsors. "I have personally done this a number of times. There are people doing big data and Node today. But one thing to keep in mind is that JavaScript is not always the right tool for the job. If you are doing a lot of high-precision, computational, transaction things, probably you don't want to be using JavaScript anyway. But shuttling data from one service to another, performing analysis on that and using Node as the glue for the rest of your infrastructure, this is a perfect fit. Big data and Node are in love."

Node.js needs to mature a bit and see more widespread installation for enterprises to get truly comfortable with it, much as has been the case with all open source tools and indeed any new technology. The V8 engine was created for X86 machines and Google is already working to get it ported to ARM processors. Now that Google and IBM are working together in the OpenPower Foundation, it is very likely that the V8 engine will be ported to the Power architecture at some point, too.

In the meantime, the Node.js community is working to get its 0.12 release out the door and to move towards the 1.0 release that will have all of the bells and whistles necessary for enterprise-grade applications. Fontaine did not provide a timetable for when either Node.js 0.12 or 1.0 would be available. The good bit, said Fontaine, is that to be compatible with Microsoft's IIS web server, Node.js had to disable heartbeat functions, and therefore, it was never susceptible to the Heartbleed security breach that is causing all kinds of grief out there on the Internet. The 0.12 release sports improvements in the transport layer security (TLS) and cryptography module, with PayPal showing client connections running 50 percent faster in early tests, as well as support for dynamic tracing to debug Node.js programs. The update also includes Streams3, another revamp of the streaming I/O capability in the Node stack. Longer term, with the 1.0 release, Node.js will include a C API that will allow programmers coding in C or C++ to target the Node.js backend.
