Running Node.js apps in production

Frederic Hemberger
@fhemberger

Topics I'll talk about today:

  • Deployment
  • Run Node.js (and keep it running)
  • Metrics

Deployment

Popular deployment techniques:

  • Git Hooks
  • GitHub Webhooks
  • Capistrano, Fabric, deploy.sh, et al.

Git Hooks

Push to a Git remote on your server


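#!/bin/sh
# .git/hooks/post-receive (must be executable)
set -e
# Git sets GIT_DIR for hooks; unset it so `git pull` acts on the working copy below
unset GIT_DIR
cd /var/www/myapp.com
git pull
npm install --production
service myapp restart
# ...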

Done.

Git Hooks

Pro:
  • Easy for the developer: Just push to production (aka fire and forget)
  • Hosting platforms like Heroku use this method as well
Con:
  • Little visibility into what actually happens on the server
  • Deployment knowledge is stored separately from code
  • When deploying to multiple servers, the post-receive hooks must be kept in sync
Solution:

Add the deploy script to your repository and symlink it as the post-receive hook.
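A minimal Node.js sketch of that setup, assuming a non-bare repository (hooks live in `.git/hooks`) and a deploy script checked in at a path such as `deploy/post-receive` (a hypothetical location; `installHook` is likewise a made-up helper name):

```javascript
// install-hook.js: symlink a versioned deploy script as the post-receive hook.
// Assumptions: non-bare repository, script checked in at e.g. deploy/post-receive.
const fs   = require('fs');
const path = require('path');

function installHook(repoDir, scriptRelPath) {
    const target = path.resolve(repoDir, scriptRelPath);
    const hook   = path.join(repoDir, '.git', 'hooks', 'post-receive');

    fs.chmodSync(target, 0o755);          // Git skips hooks that are not executable
    try {
        fs.unlinkSync(hook);              // replace an existing hook, if present
    } catch (err) {
        if (err.code !== 'ENOENT') throw err;
    }
    fs.symlinkSync(target, hook);         // .git/hooks/post-receive -> versioned script
    return hook;
}

module.exports = installHook;
```

Run it once on each server, in the repository you push to; from then on, changes to the deploy script are versioned together with the code.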

GitHub Webhooks

Pro:
  • When the rest of your development work already revolves around GitHub, it integrates nicely into the workflow
Con:
  • All hooks run independently and in parallel:
    E.g. if the CI hook fails, the deployment webhook still gets triggered.
    Some CI services like Travis CI offer their own hooks to trigger a deployment once the build has passed.
  • Critical dependency for your deployment:
    Remember, even GitHub goes down or gets DDoS'ed from time to time
  • Requires a server component that runs the update script.
    It must be secured so it doesn't accept fake payloads or mess up the deployment.
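One common way to secure such an endpoint is to check the HMAC signature GitHub attaches to each delivery when a webhook secret is configured. A minimal sketch using only Node's `crypto` module, assuming the `X-Hub-Signature-256` header (format `sha256=<hex digest>`) and a secret shared with GitHub; `verifySignature` is a hypothetical helper name:

```javascript
const crypto = require('crypto');

// Returns true if `signatureHeader` (the X-Hub-Signature-256 value) matches the
// HMAC-SHA256 of the raw request body, keyed with the shared webhook secret.
function verifySignature(secret, rawBody, signatureHeader) {
    const expected = 'sha256=' +
        crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
    const a = Buffer.from(expected);
    const b = Buffer.from(String(signatureHeader || ''));
    // timingSafeEqual throws on length mismatch, so compare lengths first;
    // the constant-time comparison avoids leaking how many characters matched
    return a.length === b.length && crypto.timingSafeEqual(a, b);
}

module.exports = verifySignature;
```

Compute the signature over the raw body bytes before any JSON parsing, and reject the request (e.g. with HTTP 401) whenever the check fails.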

Capistrano, Fabric, deploy.sh, et al.

  • Remotely checks out your code from a repository
  • Names the release directory after the current date and/or revision
  • Points the current symlink at the new release

deploy_directory
├─┬ releases
│ ├── 20140319001122
│ └── ...
├─┬ shared
│ ├── log
│ ├── pids
│ └── system
└── current ⇨ releases/20140319001122
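The mechanics behind this layout can be sketched with Node's `fs` module. This is a simplified illustration, not how Capistrano or Fabric implement it: `releaseName`, `createRelease` and `activate` are hypothetical helpers, timestamps are taken in UTC, and the switch relies on `rename()` replacing the old symlink in one step (POSIX behaviour):

```javascript
const fs   = require('fs');
const path = require('path');

// Release directory name like 20140319001122 (YYYYMMDDHHMMSS, here in UTC).
function releaseName(date) {
    return date.toISOString().replace(/[-:T]/g, '').slice(0, 14);
}

// Create <deployDir>/releases/<timestamp> and return its name.
function createRelease(deployDir, date) {
    const name = releaseName(date || new Date());
    fs.mkdirSync(path.join(deployDir, 'releases', name), { recursive: true });
    return name;
}

// Point `current` at a release: build the new symlink under a temporary name,
// then rename() it over the old one so `current` is never missing.
function activate(deployDir, release) {
    const current = path.join(deployDir, 'current');
    const tmpLink = current + '.tmp-' + process.pid;
    try { fs.unlinkSync(tmpLink); } catch (err) { if (err.code !== 'ENOENT') throw err; }
    fs.symlinkSync(path.join('releases', release), tmpLink);   // relative, as in the tree
    fs.renameSync(tmpLink, current);
}

module.exports = { releaseName, createRelease, activate };
```

The code is checked out into the new release directory between `createRelease` and `activate`, so requests keep hitting the old release until the very last step.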
                    

Capistrano, Fabric, deploy.sh, et al.

Additionally, these tools trigger scripts that can:

  • restart the web server
  • create a database and its schema
  • install/update your app's dependencies

Capistrano, Fabric, deploy.sh, et al.

Pro:
  • Clean server side application structure (including logs, shared files, etc.)
  • Trigger arbitrary scripts before/after the deployment
  • Quickly rewind to previous deployment on error
Con:
  • Introduces another language as an additional dependency
    (Capistrano: Ruby; Fabric: Python)
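The quick rewind from the Pro list boils down to re-pointing `current` at the previous release. A minimal sketch with a hypothetical `rollback` helper, assuming the `releases`/`current` layout above and release names that sort chronologically:

```javascript
const fs   = require('fs');
const path = require('path');

// Re-point `current` at the release preceding the active one; returns its name.
// Assumes release directory names sort chronologically (e.g. YYYYMMDDHHMMSS).
function rollback(deployDir) {
    const releases = fs.readdirSync(path.join(deployDir, 'releases')).sort();
    const current  = path.join(deployDir, 'current');
    const active   = path.basename(fs.readlinkSync(current));
    const index    = releases.indexOf(active);
    if (index < 1) throw new Error('No previous release to roll back to');

    const previous = releases[index - 1];
    const tmpLink  = current + '.tmp-' + process.pid;
    try { fs.unlinkSync(tmpLink); } catch (err) { if (err.code !== 'ENOENT') throw err; }
    fs.symlinkSync(path.join('releases', previous), tmpLink);
    fs.renameSync(tmpLink, current);      // swap the link in one step
    return previous;
}

module.exports = rollback;
```

After the switch, the app still has to be restarted (or its cluster reloaded) so it actually runs the older code.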

Run Node.js
(and keep it running)

Start the script as a daemon:

  • Nodemon/node-forever (written in Node.js)
  • supervise (UNIX daemontools)
  • Upstart (Ubuntu)

Example Upstart script


start on runlevel [2345]
stop on runlevel [06]

respawn
respawn limit 5 60

env NODE_SCRIPT=/var/www/myapp/server.js
env LOGFILE=/var/log/myapp.log

exec start-stop-daemon --start --chuid node \
     --exec /usr/local/bin/node -- \
     $NODE_SCRIPT >> $LOGFILE 2>&1
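To illustrate what `respawn` and `respawn limit 5 60` ask Upstart to do (restart the job when it dies, but give up if it is respawned more than 5 times within 60 seconds), here is a rough Node.js equivalent. It is a sketch only: `RespawnLimiter` and `supervise` are hypothetical names, this version restarts on every exit regardless of exit code, and in production you would rely on Upstart, supervise or forever rather than a hand-written supervisor:

```javascript
const { spawn } = require('child_process');

// Sliding-window limit modelled on Upstart's `respawn limit COUNT INTERVAL`:
// allow at most `count` respawns within `intervalMs`, refuse any further ones.
class RespawnLimiter {
    constructor(count, intervalMs) {
        this.count = count;
        this.intervalMs = intervalMs;
        this.history = [];                 // timestamps of recent respawns
    }
    allow(now) {
        this.history = this.history.filter((t) => now - t < this.intervalMs);
        if (this.history.length >= this.count) return false;
        this.history.push(now);
        return true;
    }
}

// Keep `script` running with the current Node binary until the limit is hit.
function supervise(script, limiter) {
    const child = spawn(process.execPath, [script], { stdio: 'inherit' });
    child.on('exit', (code, signal) => {
        const reason = signal || 'code ' + code;
        if (limiter.allow(Date.now())) {
            console.error('child exited (' + reason + '), respawning');
            supervise(script, limiter);
        } else {
            console.error('respawn limit reached, giving up');
            process.exitCode = 1;
        }
    });
    return child;
}

module.exports = { RespawnLimiter, supervise };
```

Usage would be `supervise('/var/www/myapp/server.js', new RespawnLimiter(5, 60 * 1000));`.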
                    

More elaborate: PM2

Process manager with built-in load-balancer

PM2

Monitor processes

Whatever method you use to run your applications:

Startup scripts should …

  • … be as general as possible (only path, environment, main JS file)
  • … not contain configuration settings for your application
  • … be included alongside your deployment (symlink if necessary)
  • … be kept under version control as well
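Keeping configuration out of the startup script means the application selects its own settings; the script only provides the environment. A minimal sketch, assuming `NODE_ENV` is the only thing the startup script sets (`loadConfig`, the environment names and the values are hypothetical; real settings would typically live in versioned files):

```javascript
// The startup script only sets NODE_ENV; the app picks its own settings.
// In a real app these objects would typically come from versioned files
// such as config/production.json (hypothetical layout).
const SETTINGS = {
    development: { port: 3000, logLevel: 'debug' },
    production:  { port: 8080, logLevel: 'info' }
};

function loadConfig(env) {
    const name = (env || process.env).NODE_ENV || 'development';
    const settings = SETTINGS[name];
    if (!settings) throw new Error('No configuration for NODE_ENV=' + name);
    return Object.assign({ env: name }, settings);
}

module.exports = loadConfig;
```

Failing loudly on an unknown environment is deliberate: a typo in `NODE_ENV` should stop the app instead of silently starting with development settings.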

There are at least two occasions when your app will not be available:

  • While deploying a new version
  • On application errors/exceptions

Deployment

Downtime during deployment should be kept to a minimum:

  • Only deploy tested code to production
  • Automate the entire deployment process
  • Use a cluster to reload workers
    (complete app restart is only needed if the master changes)

recluster

wrapper around Node.js's own cluster module


// cluster.js
var recluster = require('recluster'),
    path      = require('path'),
    cluster   = recluster(path.join(__dirname, 'server.js'));

process.on('SIGUSR2', function() {
    console.log('Got SIGUSR2, reloading cluster ...');
    cluster.reload();
});

cluster.run();
                    

Reload cluster workers: kill -s SIGUSR2 <cluster_pid>

recluster


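// server.js
var http   = require('http'),
    server = http.createServer(function(req, res) { res.end('OK\n'); });

server.listen(8000);
server.on('close', function() {
    // cleanup (e.g. close database connections, flush logs)
});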
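One way to make sure a worker actually reaches that `close` handler is a graceful shutdown: stop accepting connections, let in-flight requests finish, and force an exit after a timeout. A sketch under assumptions: `gracefulShutdown` is a hypothetical helper, and which signal or event triggers it depends on how your process manager stops workers:

```javascript
// Graceful shutdown sketch for a worker: stop accepting new connections,
// let in-flight requests finish, and force an exit after a timeout.
function gracefulShutdown(server, timeoutMs, exit) {
    exit = exit || function (code) { process.exit(code); };

    const timer = setTimeout(function () {
        exit(1);                          // connections did not drain in time
    }, timeoutMs);
    timer.unref();                        // don't keep the process alive for this timer

    server.close(function () {            // called once all connections have ended
        clearTimeout(timer);
        exit(0);
    });
}

module.exports = gracefulShutdown;
```

For example: `process.on('SIGTERM', function () { gracefulShutdown(server, 10000); });` (the signal and the 10-second timeout are illustrative).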
                    

Errors/Exceptions

Different categories of errors:

  • Hardware/network errors:
    You're screwed, can't do much about it.
  • Component errors:
    Database not responding, files missing, wrong access privileges
    Throw an exception, exit application (check your restart script!)
  • Programming errors:
    Testing your code is great, but some bugs will eventually slip through.
    Their impact is hard to assess; try to fail gracefully
  • Usage errors:
    Validate inputs, inform the user and offer guidance
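For component errors, the "exit and let the restart script take over" strategy can be enforced with a last-resort handler. The Node.js documentation advises against resuming normal operation after an `uncaughtException`, so this sketch only logs and exits; `installCrashHandler` and its `log(level, message)` callback are hypothetical:

```javascript
// Last-resort handler: log the uncaught exception, then exit with a non-zero
// code so the supervisor (Upstart, forever, PM2, ...) starts a clean process.
function installCrashHandler(proc, log) {
    proc.on('uncaughtException', function (err) {
        log('fatal', err && err.stack ? err.stack : String(err));
        proc.exit(1);
    });
}

module.exports = installCrashHandler;
```

In the app itself: `installCrashHandler(process, function (level, msg) { console.error(level, msg); });`. Passing `process` in as a parameter is only there to keep the function easy to test.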

Errors/Exceptions

  • Bind error handling to individual parts of your application
  • Those parts may need different error handling, e.g. for request errors, input parsing, external APIs/services
  • Try to resolve errors with minimum impact to the overall application:
    • Unable to connect? => Notify the user, log error, try again
    • Invalid input? => Notify the user, stop processing
  • Aim for focused stack traces: they make debugging easier
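The "Unable to connect? => log error, try again" case could be handled with retries and exponential backoff. A minimal sketch; `retry`, the attempt count and the delays are assumptions, and notifying the user is left to the caller:

```javascript
// Retry an async operation with exponential backoff (base, 2x base, 4x base, ...).
// `operation` must return a Promise; the last error is rethrown if all attempts fail.
function sleep(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

async function retry(operation, attempts, baseDelayMs, onError) {
    let lastError;
    for (let i = 0; i < attempts; i++) {
        try {
            return await operation();
        } catch (err) {
            lastError = err;
            if (onError) onError(err, i + 1);              // e.g. log the failed attempt
            if (i < attempts - 1) await sleep(baseDelayMs * Math.pow(2, i));
        }
    }
    throw lastError;
}

module.exports = retry;
```

Growing delays give a briefly unavailable database or API time to recover instead of hammering it, while the attempt limit keeps a permanent outage from hanging the request forever.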

Metrics

Metrics help you to see

  • What are people really doing?
    How do they use the application?
  • Which errors occur?
  • Where are bottlenecks?
  • Is someone messing with your app?

Metrics: Monitoring

What is going on?

  • CPU load, memory usage, Node.js heap size
  • HTTP requests, response times
  • Database monitoring, CPU/memory profiling, alerts
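The first group of values can be collected with Node's built-in APIs alone. A minimal sketch (`sampleMetrics` and the selection of fields are assumptions); a full monitoring tool gathers far more:

```javascript
const os = require('os');

// Snapshot of basic host/process metrics using only Node's built-in APIs.
function sampleMetrics() {
    const mem = process.memoryUsage();
    return {
        timestamp: new Date().toISOString(),
        loadavg:   os.loadavg(),          // 1, 5, 15 minute load averages ([0, 0, 0] on Windows)
        cpu:       process.cpuUsage(),    // { user, system } CPU time in microseconds
        rss:       mem.rss,               // resident set size, bytes
        heapUsed:  mem.heapUsed,          // V8 heap in use, bytes
        heapTotal: mem.heapTotal,         // V8 heap allocated, bytes
        uptime:    process.uptime()       // seconds
    };
}

module.exports = sampleMetrics;
```

Sampled periodically, e.g. `setInterval(function () { console.log(JSON.stringify(sampleMetrics())); }, 10000).unref();`, these values can be shipped to whatever backend you use.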

Monitoring: look

Pro:
  • Open Source
Con:
  • Older fork of Nodetime (two years old)

Monitoring: Nodetime, New Relic, etc.

(Commercial Products)

Pro:
  • Many different metrics
  • Free tier
Con:
  • Free tiers are very limited:
    Nodetime: Only one process(!), New Relic: Only 24h data retention
  • May not be suitable for smaller or low-traffic projects
  • Smallest plans:
    Nodetime: $99/month, New Relic: $149/month per host

Metrics: Logging

  • Keep your logs in one place, either on application level or in /var/log.
  • Use log levels: Separate debug information from warnings and errors
  • Use a coherent log format (timestamp, level, message, payload)
  • Separate your access logs (e.g. in Express) from your application logs
  • Track your deployments with your analytics tools
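A coherent format with timestamp, level, message and payload can be as simple as one JSON object per line. A minimal sketch, not the API or output format of any particular library (`createLogger` and the level names are assumptions):

```javascript
// Minimal structured logger: one JSON object per line with timestamp, level,
// message and optional payload; entries below the configured level are dropped.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

function createLogger(minLevel, write) {
    if (!(minLevel in LEVELS)) throw new Error('Unknown level: ' + minLevel);
    write = write || function (line) { process.stdout.write(line + '\n'); };

    const logger = {};
    Object.keys(LEVELS).forEach(function (level) {
        logger[level] = function (message, payload) {
            if (LEVELS[level] < LEVELS[minLevel]) return;
            write(JSON.stringify({
                timestamp: new Date().toISOString(),
                level:     level,
                message:   message,
                payload:   payload        // omitted from the JSON when undefined
            }));
        };
    });
    return logger;
}

module.exports = createLogger;
```

Because every line is valid JSON, log processors can parse it without custom patterns.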

Metrics: Logging

One possible solution: Bunyan

  • All logs are stored in JSON format (timestamp, app, message, payload)
  • Uses streams, offers different targets out of the box: File, rotating file, database, etc.

Metrics: Logging

But …

  • Uncaught exceptions are still logged to stderr
  • Other components may still use console.log statements


node app.js >> /var/log/myapp.log 2>&1
Again, multiple logs in different formats.
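One way to reduce that mix is to route `console.*` calls through the same JSON line format as your own log entries. A sketch with a hypothetical `patchConsole` helper; the `write` function must not call the patched console itself, and Node still prints uncaught exceptions to stderr on its own:

```javascript
const util = require('util');

// Route console.log/info/warn/error through one JSON line format so output
// from third-party modules ends up in the same structured log.
function patchConsole(target, write) {
    const mapping = { log: 'info', info: 'info', warn: 'warn', error: 'error' };
    Object.keys(mapping).forEach(function (method) {
        target[method] = function () {
            write(JSON.stringify({
                timestamp: new Date().toISOString(),
                level:     mapping[method],
                message:   util.format.apply(util, arguments)   // same formatting as console
            }));
        };
    });
}

module.exports = patchConsole;
```

For example, at the very top of `app.js`: `patchConsole(console, function (line) { process.stdout.write(line + '\n'); });`.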

Analysis of gathered metrics

Different log formats and destinations make data analysis difficult:

# Apache access log
10.0.1.22 - - [15/Oct/2010:11:46:46 -0700] "GET /favicon.ico HTTP/1.1" 404 209
fe80::6233:4bff:fe29:3173 - - [15/Oct/2010:11:46:58 -0700] "GET / HTTP/1.1" 200 44

# Apache error log
[Fri Oct 15 11:46:46 2010] [error] [client 10.0.1.22] File does not exist: /Library/WebServer/Documents/favicon.ico
[Fri Oct 15 11:46:58 2010] [error] [client fe80::6233:4bff:fe29:3173] File does not exist: /Library/WebServer/Documents/favicon.ico

# typical Express.js log output
[Mon, 21 Nov 2011 20:52:11 GMT] 200 GET /foo (1ms)
Blah, some other unstructured output from a console.log call.
                    

»ELK« stack

  • Elasticsearch (Storage/Search)
  • Logstash (Logfile processor)
  • Kibana (Logfile viewer)

»ELK« stack

Pro:
  • Very powerful and extendable log analysis
  • Parse logs for Squid, Apache, Nginx, Syslog, MySQL, …
  • Feed logs directly to statsd/Graphite
  • Easy querying and visualization
  • Realtime search
  • Open Source
Con:
  • Slightly more complex setup (Java, JRuby, etc.)
  • Thus it might not be a good fit for smaller projects/hosting solutions

Logstash

Turns messy data in different log formats …

# Apache access log
10.0.1.22 - - [15/Oct/2010:11:46:46 -0700] "GET /favicon.ico HTTP/1.1" 404 209
fe80::6233:4bff:fe29:3173 - - [15/Oct/2010:11:46:58 -0700] "GET / HTTP/1.1" 200 44

# Apache error log
[Fri Oct 15 11:46:46 2010] [error] [client 10.0.1.22] File does not exist: /Library/WebServer/Documents/favicon.ico
[Fri Oct 15 11:46:58 2010] [error] [client fe80::6233:4bff:fe29:3173] File does not exist: /Library/WebServer/Documents/favicon.ico

# typical Express.js log output
[Mon, 21 Nov 2011 20:52:11 GMT] 200 GET /foo (1ms)
Blah, some other unstructured output from a console.log call.
                    

Logstash

… into structured output

{
    "message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800…
 "@timestamp" => "2013-12-11T08:01:45.000Z",
   "@version" => "1",
       "host" => "cadenza",
   "clientip" => "127.0.0.1",
  "timestamp" => "11/Dec/2013:00:01:45 -0800",
       "verb" => "GET",
    "request" => "/xampp/status.php",
"httpversion" => "1.1",
   "response" => "200",
      "bytes" => "3891",
   "referrer" => "\"http://cadenza/xampp/navi.php\"",
      "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X…
}
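Logstash does this with its grok filter (for Apache, the `%{COMBINEDAPACHELOG}` pattern). To show what that amounts to, here is a plain JavaScript sketch of the same idea for the Apache access log lines shown earlier; the regular expression and `parseAccessLog` are illustrative assumptions, not Logstash's actual pattern, and unlike the output above this version strips the quotes around referrer and agent:

```javascript
// Parse an Apache access log line (common or combined format) into named
// fields, roughly like the structured output above. Values stay strings.
const ACCESS_LOG = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) HTTP\/(\d\.\d)" (\d{3}) (\d+|-)(?: "([^"]*)" "([^"]*)")?$/;
const FIELDS = ['clientip', 'ident', 'auth', 'timestamp', 'verb', 'request',
                'httpversion', 'response', 'bytes', 'referrer', 'agent'];

function parseAccessLog(line) {
    const match = ACCESS_LOG.exec(line);
    if (!match) return null;                     // not an access log line
    const event = { message: line };
    FIELDS.forEach(function (name, i) {
        if (match[i + 1] !== undefined) event[name] = match[i + 1];
    });
    return event;
}

module.exports = parseAccessLog;
```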
                    

Logstash

  • Easily extendable to custom log formats
  • Read log information from file, Heroku, Redis, RabbitMQ, stdin, syslog, TCP, UDP, XMPP, ZeroMQ, …
  • Output to file, Ganglia, Graphite, IRC, Loggly, MongoDB, Nagios, RabbitMQ, Redis, Riak, S3, Statsd, Syslog, TCP, UDP, WebSocket, XMPP, ZeroMQ, …
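Supporting a custom log format essentially means adding another named pattern. A JavaScript sketch of that idea (not Logstash configuration syntax; `addPattern`, `parseLine` and the field names are hypothetical), registering the Express log format from the earlier example and keeping unparsed lines instead of dropping them:

```javascript
// Try named patterns in order; fall back to an unparsed event.
const patterns = [];

function addPattern(type, regex, fields) {
    patterns.push({ type: type, regex: regex, fields: fields });
}

function parseLine(line) {
    for (const p of patterns) {
        const m = p.regex.exec(line);
        if (!m) continue;
        const event = { type: p.type, message: line };
        p.fields.forEach(function (name, i) { event[name] = m[i + 1]; });
        return event;
    }
    return { type: 'unknown', message: line };   // keep unparsed lines, too
}

// Custom format: Express log output such as
// "[Mon, 21 Nov 2011 20:52:11 GMT] 200 GET /foo (1ms)"
addPattern('express',
    /^\[([^\]]+)\] (\d{3}) (\S+) (\S+) \((\d+)ms\)$/,
    ['timestamp', 'response', 'verb', 'request', 'duration_ms']);

module.exports = { addPattern, parseLine };
```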

Kibana

That's all folks!

Links and further resources

Frederic Hemberger
Twitter
@fhemberger
GitHub
fhemberger
Web
frederic-hemberger.de