Playing with Node.js

Last update: 27-05-2011 10:00pm (Node 0.4.8)

Hi. My name is Łukasz Anwajler (@anwajler). Recently I heard many good things about Node.js and decided to give it a try. I would like to show how to perform some common tasks using Node. There will be no long introduction - I will talk about things as we go through code examples. If you don't want to waste time reading books and so on, this may be a good place for you: just read and see how much fun programming with Node is. This little project is part of my learning Node, so if you see any errors, please correct me!

All you need to know: Node.js...

Let's rock! First, download Node.js. At the time of writing this article, the current version is 0.4.8.

Unpack the archive, go to the extracted directory and type:

$ ./configure
$ make
$ sudo make install

Voilà! To enter the Node.js interactive shell, type:

$ node

We're in the Node interactive shell. Okay, let's play a little.

> 1 + 1
2
> console.log('some shit');
some shit
> var x = function(x1, x2) {
... return x1 + x2 
... };
> x(1,2)
3

You can also write a simple script, save it as node-example1.js and run it with:

$ node node-example1.js
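Such a script could look like this (hypothetical content - any JavaScript will do):

```javascript
// node-example1.js - a tiny first script
var add = function(x1, x2) {
	return x1 + x2;
};
console.log('1 + 1 = ' + add(1, 1));
```

Running it with node node-example1.js prints 1 + 1 = 2.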

Processes

If you want to pass arguments to a script, use process.argv to read them. Put something like this in your test1.js file:

console.log(process.argv[0]);
console.log(process.argv[1]);
console.log(process.argv[2]);
console.log(process.argv[3]);

and run it from the command line:

$ node test1.js arg1
node                 /* name of the application */
/Users/node/test1.js /* path to script */
arg1                 /* passed argument */
undefined            /* I passed only one argument, so the remaining elements of the process.argv array are undefined */
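Usually you only care about the actual arguments, so you slice off the first two entries. A small sketch (the file name sum.js and the helper sumArgs are made up for this example) that adds up every numeric argument:

```javascript
// sum.js - adds up all numeric command-line arguments
var sumArgs = function(argv) {
	var args = argv.slice(2); // skip the 'node' binary and the script path
	var total = 0;
	for (var i = 0; i < args.length; i++) {
		total += parseInt(args[i], 10);
	}
	return total;
};
console.log('Sum: ' + sumArgs(process.argv));
```

$ node sum.js 1 2 3 would print Sum: 6.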

Let's go back to the Node command line. We can easily change the current working directory or travel through our file system.

	> process.cwd();
	'/Users/node'
	> process.chdir('/var');
	> process.chdir('/Users');
	> process.cwd();
	'/Users'
	> process.chdir('/wtf');
	Error: No such file or directory
	    at [object Context]:1:9
	    at Interface.<anonymous> (repl.js:171:22)
	    at Interface.emit (events.js:64:17)
	    at Interface._onLine (readline.js:153:10)
	    at Interface._line (readline.js:408:8)
	    at Interface._ttyWrite (readline.js:585:14)
	    at ReadStream.<anonymous> (readline.js:73:12)
	    at ReadStream.emit (events.js:81:20)
	    at ReadStream._emitKey (tty_posix.js:307:10)
	    at ReadStream.onData (tty_posix.js:70:12)

Ooops! We tried to change into a non-existent directory and Node crashed! Always handle errors properly, because they can bring down your whole Node process. Such is the life of an event-driven developer, with the event loop waiting to attack. Well, we are real men, ready to survive in this hostile environment. We just need to be a little more careful:

> try {
... process.chdir('/wtf');
... } catch (err) {
... console.log('error while chdir: ' + err);
... }
error while chdir: Error: No such file or directory

As I mentioned before, uncaught errors bubble up to the main event loop and crash the Node process.

As long as we're writing simple scripts that we run and forget about a minute later - who cares. The problem of uncaught errors becomes important in a production environment. You really don't want to crash a Node process serving hundreds of users, unless you're eval() evil.

Anyway, I keep picturing all these little kids playing some game in this big field of rye and all. Thousands of little kids, and nobody's around - nobody big, I mean - except me. And I'm standing on the edge of some crazy cliff. What I have to do, I have to catch everybody if they start to go over the cliff - I mean if they're running and they don't look where they're going I have to come out from somewhere and catch them. That's all I do all day. I'd just be the catcher in the rye and all. -- The Catcher in the Rye, Holden Caulfield in Chapter 22

Fortunately, we have our own Holden Caulfield: the uncaughtException event.

	> process.on('uncaughtException', function (err) {
	... console.log('OMG something went really wrong: ' + err);
	... });

Let's play some more with processes.

	> process.pid
	58319
	> process.title
	'node'
	> process.kill(process.pid, 'SIGHUP');
	Hangup

I think that this suicide is a good moment to switch to some more interesting stuff.

File system: I/O, Streams, Buffers

Before we start programming, there is one thing you should know. One of the most important things about Node is that its I/O operations are non-blocking. What does that mean? Let me give you a real-world example:

John wants to go from his home to a shop nearby. With other technologies his little trip would look like this:

  1. John leaves home and starts walking.
  2. John's phone is ringing - he stops and starts talking.
  3. After finishing his conversation he resumes walking.
  4. John's phone is ringing - he stops and starts talking.
  5. After finishing his conversation he resumes walking.
  6. John arrives at his favourite shop.

Node is more like:

  1. John leaves home and starts walking.
  2. John's phone is ringing - he answers and doesn't stop walking -- these operations are not excluding one from another.
  3. John arrives at his favourite shop.

Talking on the phone, in programming land, means I/O operations, querying a database and so on. This is why Node is so different and efficient.
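John's trip translates directly into code. A minimal sketch with setTimeout() standing in for the phone call - note that the synchronous part of the script finishes before the callback ever runs:

```javascript
var log = [];
log.push('John leaves home');
setTimeout(function phoneCall() {
	// fired later by the event loop - the walk was never blocked
	log.push('phone call handled');
}, 0);
log.push('John arrives at the shop');
console.log(log.join(', ')); // the phone call is not in the log yet
```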

Now that you know why non-blocking is so cool, let's proceed to some file system operations.

Let's make a script which exits once the file it lives in is modified.

	var fs = require('fs');
	var command = process.argv.join(' ');
	var file = process.argv[1];
	var fileListener = function(curr, prev) {
		// compare the timestamps by value - mtime is a Date object,
		// and != on two Date objects compares references, not times
		if (curr.mtime.getTime() !== prev.mtime.getTime()) {
			console.log('File changed at ' + curr.mtime);
			console.log('Arrow up & return or..');
			console.log('..re-run: ' + command);
			fs.unwatchFile(file);
			process.exit(0);
		}
	};
	var options = { persistent: true, interval: 0 };
	fs.watchFile(file, options, fileListener);

At the beginning I import the module for file system access, then initialize some variables and finally create the listener. If the file is modified while the script is running, it outputs the modification date and kindly exits. Easy, right?

The interactive shell looks very sexy, so you should practice in it some more. For the sake of readability I will show clean snippets for the bigger pieces of code. Let's do some I/O.

In most cases you will need just the simplest way to write to a file:

var fs = require('fs');
var callback = function (err) {
  if (err) throw err;
  console.log('Data written!');
};
fs.writeFile('/tmp/simplest.txt', 'Yo mamma', callback);

Short, easy, not very flexible. Just remember that it's going to overwrite your file. What if we need something more tailored to our needs?

var fs = require('fs');
var path = '/tmp/write-simple.txt';
var data = 'hack the planet! make web not war! visit Warsaw once in a lifetime!';
var offset = 0;
var buffer = new Buffer(data);
var position = null;
var fd;
var fileClose = function(err) {
	console.log('File closed');
};
var dataWritten = function(err, written, buffer) {
	console.log('Written ' + written + ' bytes into file');
	// close only once the write has finished - calling fs.close()
	// right after fs.write() would race with the pending write
	fs.close(fd, fileClose);
};
var afterOpen = function(err, openedFd) {
	fd = openedFd;
	console.log('File opened');
	fs.write(fd, buffer, offset, buffer.length, position, dataWritten);
};

fs.open(path, 'w+', afterOpen);

In this approach you can specify every single detail. Problems begin when you want to write to the file multiple times without waiting for callbacks and without buffers (they cannot be resized, so they are inconvenient when you don't know the exact size of your data). If you need such things, use streams instead:

	var fs = require('fs');
	var path = '/tmp/test.txt';

	var streamOptions = { flags: 'w+',
	                      encoding: 'utf-8',
	                      mode: 0666 };

	var streamClose = function () {
		console.log('Stream closed');
	};

	var streamCreate = function () {
		console.log('Stream created');
	};

	var streamError = function (err){
		console.log('Achtung achtung: ' + err);
	};

	// no fs.open() needed - the stream opens the file itself
	var writeStream = fs.createWriteStream(path, streamOptions);
	writeStream.on('open', streamCreate);
	writeStream.on('close', streamClose);
	writeStream.on('error', streamError);
	writeStream.write("oh yeah");
	writeStream.write("oh cool");
	writeStream.write("oh super");
	writeStream.write("oh great");
	writeStream.write("oh lulz");
	writeStream.end('kkthxbye');

Okay, so what do we have here... I use a WriteStream to write as much data to the file as I can imagine (you see, my imagination is limited). I attach a few callbacks performing simple tasks (here: logging). I don't need to specify any buffers or sizes.

You can also listen to a write stream's other events. I'm lazy, so I will just refer you to the definitions in the official Node API.

Examples:

	// ...
	var writeStream = fs.createWriteStream(path, streamOptions);
	writeStream.on('drain', function writeOnDrain(){
		// do some stuff
	});
	writeStream.on('error', function writeOnError(exception){
		// do some stuff
		console.log(exception);
	});
	writeStream.on('close', function writeOnClose(){
		// do some stuff
		console.log('file closed');
	});
	writeStream.on('pipe', function writeOnPipe(src){
		// do some stuff
		console.log('pipe');
	});

Let's switch to reading. Similarly to writing, there are corresponding methods for reading. For instance, there is writeFile() and, guess what, there is readFile() too.

var fs = require('fs');
var callback = function (err, data) {
  if (err) throw err;
  console.log('Data read: ', data.toString());
};
fs.readFile('/tmp/simplest.txt', callback);

Note that I use toString() to display Buffer object. Otherwise I would get something like:

Data read:  <Buffer 59 6f 20 6d 61 6d 6d 61>

instead of

Data read:  Yo mamma

Why is that? Let's dive into buffer.js source code.

	// toString(encoding, start=0, end=buffer.length)
	Buffer.prototype.toString = function(encoding, start, end) {
	  encoding = String(encoding || 'utf8').toLowerCase();

	  /* (...) */
	 
	  switch (encoding) {
	    case 'hex':
	      return this.parent.hexSlice(start, end);

	    case 'utf8':
	    case 'utf-8':
	      return this.parent.utf8Slice(start, end);

	    case 'ascii':
	      return this.parent.asciiSlice(start, end);

	    case 'binary':
	      return this.parent.binarySlice(start, end);

	    case 'base64':
	      return this.parent.base64Slice(start, end);

	    case 'ucs2':
	    case 'ucs-2':
	      return this.parent.ucs2Slice(start, end);

	    default:
	      throw new Error('Unknown encoding');
	  }
	};

As you can see, toString() defaults to the 'utf8' encoding when none is given; if we skip toString() altogether, we get just the raw data.
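The encoding argument means the very same bytes can be rendered in several ways. A quick demo (on current Node versions you would create the buffer with Buffer.from()):

```javascript
var buf = new Buffer('Yo mamma'); // Buffer.from('Yo mamma') on modern Node

console.log(buf.toString());         // decoded as utf8, the default
console.log(buf.toString('hex'));    // the same bytes the raw Buffer dump shows
console.log(buf.toString('base64'));
```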

Okay, so why do Buffers need to be handled in their own special way? Well, that's because files may store binary data as well as strings - think about TCP connections, streaming etc.

Summing up Buffers:

  1. they hold raw binary data,
  2. they have a fixed size and cannot be resized,
  3. to get a string out of one, call toString() with an encoding (it defaults to 'utf8').

Enough about Buffers - let's go back to our I/O operations.

Things are getting interesting. Let's do the opposite of the second file-writing example: more flexible file reading.

var fs = require('fs');
var path = '/tmp/read-simple.txt'; /* don't forget to put something here :') */
var offset = 0;
var buffer;
var position = null;
var fd;
var afterStat = function(err, stats) {
	buffer = new Buffer(stats.size);
	fs.open(path, 'r', afterOpen);
};
var fileClose = function(err) {
	console.log('File closed');
	console.log(buffer.toString());
};
var dataRead = function(err, bytesRead, buffer) {
	console.log('Read ' + bytesRead + ' bytes from file');
	// close only once the read has finished
	fs.close(fd, fileClose);
};
var afterOpen = function(err, openedFd) {
	fd = openedFd;
	console.log('File opened');
	fs.read(fd, buffer, offset, buffer.length, position, dataRead);
};
fs.stat(path, afterStat);

At the beginning (yes - start reading from the end of the snippet), I call fs.stat() to get the size of the file. The afterStat callback proceeds with creating the buffer and opening the file. fs.open() triggers the next callback, afterOpen, which executes fs.read(). After the file content has been read, the next callback, dataRead, is fired, and so on.

This is what programming with Node looks like. You execute statements and, once they finish their job, they asynchronously call the functions you provided. Note that I'm using named functions here - I think they're easier to read. The same code with anonymous functions would look like this:

var fs = require('fs');
var path = '/tmp/read-simple.txt';
var offset = 0;
var buffer;
var position = null;

fs.stat(path, function(err, stats) {
	buffer = new Buffer(stats.size);
	fs.open(path, 'r', function(err, fd) {
		console.log('File opened');
		fs.read(fd, buffer, offset, buffer.length, position, function(err, bytesRead, buffer) {
			console.log('Read ' + bytesRead + ' bytes from file');
			fs.close(fd, function(err) {
				console.log('File closed');
				console.log(buffer.toString());
			});
		});
	});
});

As you can see, it's a bit shorter, but more horizontal and harder to read. Anyway, if your callback tree isn't too deep, you may use this style. On the other hand, named functions can be re-used, which is another thing you can benefit from.

One more thing. To make your code better, you should give names even to your anonymous function expressions. Read more about the rules in Felix's Node.js Style Guide.

So the proper way of writing anonymous functions looks like this:

// ...
fs.stat(path, function afterStat(err, stats) {
	buffer = new Buffer(stats.size);
	fs.open(path, 'r', function afterOpen(err, fd) {
		console.log('File opened');
		fs.read(fd, buffer, offset, buffer.length, position, function dataRead(err, bytesRead, buffer) {
			console.log('Read ' + bytesRead + ' bytes from file');
			fs.close(fd, function afterClose(err) {
				console.log('File closed');
				console.log(buffer.toString());
			});
		});
	});
});

This way debugging gets easier - when something crashes, you know exactly which function it was, because it has its own name in the stack trace.
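You can see the name doing its job in a stack trace - a tiny demonstration:

```javascript
var caught;
try {
	(function dataRead() { // a named function expression - the name survives into the trace
		throw new Error('something broke');
	})();
} catch (err) {
	caught = err;
}
console.log(caught.stack.split('\n')[1]); // this frame mentions 'dataRead'
```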

Okay, let's create a read stream.

	var fs = require('fs');
	var path = '/tmp/read-simple.txt';
	var dataReceived = function(data) {
		console.log('Data: ' + data);
	};
	var streamError = function(exception) {
		console.log(exception);
	};
	var streamClosed = function() {
		console.log('Closed');
	};
	var streamEnd = function() {
		console.log('End');
	};
	var streamCreated = function(fd) {
		console.log('Open');
	};
	var options = { flags: 'r',
	                encoding: 'utf-8',
	                mode: 0666,
	                bufferSize: 1024,
	                start: 0,    // read only bytes 0..100
	                end: 100 };
	// no fs.open() needed - createReadStream opens the file itself
	var readStream = fs.createReadStream(path, options);
	readStream.on('open', streamCreated);
	readStream.on('data', dataReceived);
	readStream.on('end', streamEnd);
	readStream.on('error', streamError);
	readStream.on('close', streamClosed);

And this is it!

Network: TCP, UDP, HTTP

Let's make it clear: the majority of people interested in Node.js come from a Web development background. And you will be satisfied. First, let's take a look at the network stuff.

var net = require('net');
var server = net.createServer(function (c) {
	c.write('Stack trace or GTFO');
	c.end('\n');
});
server.listen(1337, 'localhost');  

The code above creates a simple TCP server which outputs a quote and terminates the connection. You can check it out by telnetting to port 1337.

$ telnet localhost 1337

Let's create something very similar.

var net = require('net');
var server = net.createServer(function (c) {
	c.write('Stack trace or GTFO');
	c.end('\n');
});
server.listen('/tmp/yo.sock');

This is the same server, but listening on a Unix socket instead of TCP. To connect, you need to use netcat (nc).

    The nc (or netcat) utility is used for just about anything under the sun involving TCP or UDP.  It can
    open TCP connections, send UDP packets, listen on arbitrary TCP and UDP ports, do port scanning, and
    deal with both IPv4 and IPv6.  Unlike telnet(1), nc scripts nicely, and separates error messages onto
    standard error instead of sending them to standard output, as telnet(1) does with some.

	...
	
     -U      Specifies to use Unix Domain Sockets.	

Here we go:

$ nc -U /tmp/yo.sock

When using TCP connections, you can access the remoteAddress and remotePort attributes and create a simple logger like this:

	var net = require('net');
	var server = net.createServer(function (c) {
		c.write('Stack trace or GTFO');
		console.log(c.remoteAddress);
	  	console.log(c.remotePort);
		c.end('\n');
	});
	server.listen(1337, 'localhost');	

After a few connections, the server output will look similar to this:

$ node test1.js 
127.0.0.1
50041
127.0.0.1
50050
127.0.0.1
50053
127.0.0.1
50056

Let's create a server and make use of all the events and callbacks.

	var net = require('net');
	var server = net.createServer();
	
	var onConnection = function onConnection(c){
		c.write('OMG machine is talking');
		console.log('Wilma, I\'m home.. exactly at: ' + c.remoteAddress + ':' + c.remotePort);
		c.end('\n'); 
		server.close();		
	};	
	var onClose = function onClose(){
		console.log('Goodbye cruel world');				
	}; 
	var onError = function onError(exception){
		console.log('It is right time to panic, because ' + exception);		
	};
	var onListen = function onListen() {
		server.on('connection', onConnection);
		server.on('error', onError);
		server.on('close', onClose);
				
		console.log('Server always listens');		
		server.pause(10000);
		console.log('For the first 10secs I\'m lazy and won\'t answer any connections.');
	};
		
	server.listen(1337, 'localhost', onListen);

I'll explain it a little. After the server starts listening on port 1337, the onListen callback fires and the event handlers get attached. Next, the server pauses - clients can still connect, but onConnection won't be executed as long as the server is paused. After 10 seconds the server resumes and is ready to process connections: when a client connects, onConnection fires. Inside the handler we send some text to the client, close the connection and finally shut the server down.

Servers, servers, servers. Let's give the clients a chance to talk. I'll show you how to write a client in the Node interactive shell - we will try to connect to the server from the previous section.

	$ node
	> var net = require('net');
	> var s = new net.Socket({type: 'tcp4'});
	> s
	{ bufferSize: 0,
	  fd: null,
	  type: 'tcp4',
	  allowHalfOpen: false,
	  _writeImpl: [Function],
	  _readImpl: [Function],
	  _shutdownImpl: [Function] }
	> s.connect(1337, 'localhost', function(){ s.on('data', function(data) { console.log(data.toString()); });  });
	> OMG machine is talking

As you can see, a Socket can be attached to a specific file descriptor, and it has a type ('tcp4', 'tcp6' or 'unix'). Some low-level stuff: a half-open socket is one you can no longer read from but can still write to - and you need to close it manually using end(). The default value of allowHalfOpen is false. You can forget about it for now, I just thought you might be curious.

We can now leave Node interactive shell and play some more with client-server communication. Let's write a simple ping-pong script.

	// client
	
	var net = require('net');
	var s = new net.Socket({type: 'tcp4'});
	var onData = function onData(data) {
		var toSend = (parseInt(data)+1).toString();
		console.log('Received: ' + data.toString()); 
		this.write(toSend);
		console.log('Send: ' + toSend);
	};
	var onConnect = function onConnect() {
		s.write("1"); 
		s.on('data', onData);  
	};
	s.connect(1337, 'localhost', onConnect);	

The client connects to the listening server and begins by sending "1". Whenever the server answers, the client sends back the incremented value of the received data.
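Note the small trick inside onData: parseInt() receives a Buffer and implicitly converts it to a string before parsing. Factored out into a helper (the name increment is made up here), the logic is easy to verify on its own:

```javascript
var increment = function(data) {
	// parseInt() coerces the Buffer to a string first, then parses the digits
	return (parseInt(data, 10) + 1).toString();
};
console.log(increment(new Buffer('41'))); // works the same for Buffers and strings
```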

	// server
	
	var net = require('net');
	var server = net.createServer();
	var onData = function onData(data) {
		console.log('Received: ' + data);
		var toSend = (parseInt(data)+1).toString();
		this.write(toSend);
		console.log('Send: ' + toSend);
	};
	var onConnection = function onConnection(c) {
		c.on('data', onData);
		console.log('Client ' + c.remoteAddress + ':' + c.remotePort + ' wants to play ping-pong');
	};	
	var onClose = function onClose(){
		console.log('Goodbye cruel world');				
	}; 
	var onError = function onError(exception) {
		console.log('It is right time to panic, because ' + exception);		
	};
	var onListen = function onListen() {
		server.on('connection', onConnection);
		server.on('error', onError);
		server.on('close', onClose);				
		console.log('Server always listens');		
	};			
	server.listen(1337, 'localhost', onListen);	

The server does a similar thing to the client: after receiving some data, it increments it and sends it back. Both client and server should output something like:

	...
	Received: 23089
	Send: 23090
	Received: 23091
	Send: 23092
	Received: 23093
	Send: 23094
	Received: 23095
	Send: 23096
	Received: 23097
	Send: 23098
	Received: 23099
	Send: 23100
	Received: 23101
	Send: 23102
	Received: 23103
	Send: 23104

You can launch two or more clients and check if it's working correctly.

That was easy. What if we need to remember the state of the connections? Let's have a look.

	// client
	
	var net = require('net');
	var s = new net.Socket({type: 'tcp4'});
	var onConsoleData = function(chunk) { 
		s.write(chunk);
	};
	var onEnd = function onEnd() {
		console.log('Socket closed');
	};
	var onData = function onData(data) {
		console.log('Received:\n' + data.toString());
	};
	var onConnect = function onConnect() {
		var stdin = process.openStdin();
		stdin.on('data', onConsoleData);
		s.on('data', onData);
		s.on('end', onEnd);
	};
	s.connect(1337, 'localhost', onConnect);	

The client connects to the server and waits for console input. There are two commands: 'LIST' and 'REMOVE'.

	// server
	
	var net = require('net');
	var server = net.createServer();
	var connectionsList = [];
	var findByAddressAndPort = function (address, port) {
		var i;
		for(i = 0; i < connectionsList.length; i++){
			if(connectionsList[i] !== undefined) {
				if(connectionsList[i].address === address && connectionsList[i].port === port) {
					return {'index': i, 'object': connectionsList[i]};
				}
			}
		}		
	};
	var dataHandler = function (connection, data) {
		var answer = '';
		if(data.toString() === 'LIST\n') {
			var current = findByAddressAndPort(connection.remoteAddress, connection.remotePort).object;
			current.lastCommand = data.toString();			
			for(var i = 0; i < connectionsList.length; i++){
				if(connectionsList[i] !== undefined) {
					answer = answer + connectionsList[i].address + ":" + connectionsList[i].port + ', last command: ' + connectionsList[i].lastCommand + '\n';
				}
			}
		} else if(data.toString() === 'REMOVE\n') {
			var index = findByAddressAndPort(connection.remoteAddress, connection.remotePort).index;			
			connection.end();
			answer = null;
		connectionsList.splice(index, 1); // splice() removes the entry for good instead of leaving a hole like delete would
		}
		return answer;
	};
	var onData = function onData(data) {
		var toSend = dataHandler(this, data);
		if(toSend === null) {
			console.log('Connection with client closed');
		} else {
			this.write(toSend);
		}
	};
	var onConnection = function onConnection(c) {
		c.on('data', onData);
		console.log('Client ' + c.remoteAddress + ':' + c.remotePort + ' connected');
		connectionsList.push({'address': c.remoteAddress, 'port': c.remotePort, 'lastCommand': 'none'});
	};	
	var onClose = function onClose(){
		console.log('Goodbye cruel world');				
	}; 
	var onError = function onError(exception) {
		console.log('It is right time to panic, because ' + exception);		
	};
	var onListen = function onListen() {
		server.on('connection', onConnection);
		server.on('error', onError);
		server.on('close', onClose);				
		console.log('Server always listens');		
	};			
	server.listen(1337, 'localhost', onListen);

The server saves information about connections in an array called connectionsList. When a client sends the 'LIST' command, the server sends back a list of all current connections along with the last command from each connected client. After the 'REMOVE' command, the client is disconnected and removed from connectionsList.
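The if/else chain in dataHandler is fine for two commands, but a lookup table of handlers scales better as the protocol grows. A hypothetical refactoring sketch (the state object stands in for connectionsList plus the current connection's index):

```javascript
// hypothetical command-dispatch table - each handler returns the answer string
var commands = {
	'LIST': function(state) {
		return state.connections.map(function(c) {
			return c.address + ':' + c.port + ', last command: ' + c.lastCommand;
		}).join('\n');
	},
	'REMOVE': function(state) {
		state.connections.splice(state.index, 1);
		return null; // null still means: close the connection
	}
};

var dispatch = function(data, state) {
	var name = data.toString().trim();
	var handler = commands[name];
	return handler ? handler(state) : 'unknown command: ' + name;
};
```

Adding a new command is now just one more entry in the table.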

... to be continued soon