Service Discovery and Configuration Management for Node & Express with ZooKeeper

Node is very easy to start with, especially when you are using Express. During development you run your app, your database and anything else you need on the same machine.

In fact, when you start you often have the same setup in production. Everything is on the same machine. To modify your configuration you just log in to the server and quickly change whatever you need.

When your app grows, your needs grow too. You begin to scale. You have multiple front end servers running your application and you also have multiple other servers running your database and many other services.

It is no longer easy. You cannot just log in to one server: you have to make the change everywhere, and it has to happen quickly.

What you are facing are two well-known problems: configuration management and service discovery.

Configuration management tells your Node app how it should be configured, and also how all other services, such as the database, are configured.

Service discovery tells your Express app which services are available and where. For example, it can tell you that there are 3 Redis instances, one used only for writing and two used only for reading.

What is so hard?

Initially, all this information lives in your code. It knows everything it needs to know: which databases you have, their passwords, their IP addresses and anything else you might need.

Security

Even when everything is running on the same machine, there is already a security problem. Your git repository is not a place to keep secrets. Even after you remove them, they are still available in older commits.

This is easily solved by using a separate file or environment variables, both kept outside of source control. That works with one or two servers, but when you have a hundred, it is no longer sufficient.

Each time you have to change something, you need to log in to every server, which is impractical when you have hundreds of servers or more.
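As a minimal sketch of the environment-variable approach, here is how a Node app might read its database settings at startup. The variable names DB_HOST, DB_PORT and DB_PASSWORD are hypothetical, and the 27017 fallback is simply MongoDB's default port.

```javascript
// Reads database settings from environment variables instead of source code.
// DB_HOST, DB_PORT and DB_PASSWORD are hypothetical names chosen for this sketch.
function loadDbConfig(env = process.env) {
  const required = ['DB_HOST', 'DB_PASSWORD'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail fast at startup rather than on the first database call.
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    host: env.DB_HOST,
    port: Number(env.DB_PORT || 27017), // assumed default: MongoDB's standard port
    password: env.DB_PASSWORD,
  };
}
```

Passing the environment in as a parameter keeps the function easy to test; in app.js you would just call loadDbConfig().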

Immediate Configuration Changes

This brings us to the next problem. When you change your configuration, for example by setting a new database password, all front end servers should know about it immediately.

It is just not an option to go and change each and every server manually.

Service Availability & Discovery

Often the bottleneck of a web application is the database. Especially at peak traffic you might be required to add more database replicas for reading.

How do you let your front end servers know about those new replicas? And how do you let them know when the replicas are no longer available?

Again, changing all files manually is slow and error-prone.

Wouldn’t it be nice if your front end servers automatically and immediately became aware of configuration changes, or of a new database server?

Fortunately, there are tools that help with exactly that: ZooKeeper and zkfarmer.

ZooKeeper

ZooKeeper is a simple solution that can help you with all of the problems I mentioned above.

First of all, if you are working with Node & Express, you are probably not a big fan of Java, but ZooKeeper runs on Java.

Don’t worry, it will not hurt your scalability. Not only is ZooKeeper fast, but in this setup it never handles user requests and it holds very little data, so it will not use much memory or processing power.

Actually, we will use ZooKeeper to keep many different files in sync.

What is ZooKeeper?

ZooKeeper is a simple database, with a few very simple features.

First, it stores its data like a file tree. The root node is /, it can have a child node /child, and that child can have deeper children such as /child/deep.

Each node can have children, just like a folder, but each node can also contain data.

However, a node is not intended to hold lots of data: its data is limited to 1MB, and ZooKeeper nodes usually hold much less than that. This is more than enough for, say, a simple JSON structure.

Another critical feature is ephemeral nodes. An ephemeral node exists only while the client session that created it is alive. As soon as that session ends, the node is deleted. Ephemeral nodes cannot have children, but they can have data.

ZooKeeper also makes it easy to set up multiple instances that automatically synchronize among themselves.

These are just some of the great features ZooKeeper provides. There are a few more, which you can read about at zookeeper.apache.org if you are interested.
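To make the data model concrete, here is a toy in-memory model of a node tree with ephemeral nodes. It only illustrates the ideas above; it is not the ZooKeeper client API, and the class and method names are made up for this sketch.

```javascript
// Toy model of ZooKeeper's data model: a tree of paths, each holding some data.
// NOT a ZooKeeper client; ZnodeTree and its methods are hypothetical names.
class ZnodeTree {
  constructor() {
    // path -> { data, ephemeralOwner }; the root "/" always exists.
    this.nodes = new Map([['/', { data: '', ephemeralOwner: null }]]);
  }

  static parentOf(path) {
    const idx = path.lastIndexOf('/');
    return idx === 0 ? '/' : path.slice(0, idx);
  }

  create(path, data = '', { ephemeralOwner = null } = {}) {
    if (this.nodes.has(path)) throw new Error(`Node exists: ${path}`);
    const parent = this.nodes.get(ZnodeTree.parentOf(path));
    if (!parent) throw new Error(`No parent for: ${path}`);
    if (parent.ephemeralOwner !== null) {
      throw new Error('Ephemeral nodes cannot have children');
    }
    // Mirrors the roughly 1MB per-node data limit mentioned above.
    if (Buffer.byteLength(data) > 1024 * 1024) throw new Error('Data exceeds 1MB');
    this.nodes.set(path, { data, ephemeralOwner });
  }

  // Direct children only, like "ls" in the ZooKeeper CLI.
  children(path) {
    const prefix = path === '/' ? '/' : `${path}/`;
    return [...this.nodes.keys()]
      .filter((p) => p !== path && p.startsWith(prefix) && !p.slice(prefix.length).includes('/'))
      .map((p) => p.slice(prefix.length))
      .sort();
  }

  // Simulates a client session ending: every ephemeral node it owned disappears.
  closeSession(sessionId) {
    for (const [path, node] of this.nodes) {
      if (node.ephemeralOwner === sessionId) this.nodes.delete(path);
    }
  }
}
```

For example, creating /services/mongo/10.0.0.1 as an ephemeral node owned by session "s1" makes it show up under /services/mongo until closeSession("s1") is called, which is exactly the behavior zkfarmer relies on later.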

How do I connect to ZooKeeper?

You can connect directly from Node, just like with any other database. The ZooKeeper web site only talks about the Java and C interfaces, but client libraries exist for many more languages and platforms, including Node.

Another option is to use the client utility which ships with ZooKeeper.

zkfarmer

However, today we are not going to do any programming against ZooKeeper. Instead, we are going to use another great tool, developed by the nice people at Dailymotion, called zkfarmer.

It is a Python tool, but again, don’t worry: it will not hurt your scalability, because your user traffic never touches it.

It syncs JSON files and folders with ZooKeeper nodes, without any programming on your side. Isn’t this nice?

Getting your hands dirty

No more talking, let’s do something. Our example project will have the following servers:

  • 1 ZooKeeper server
  • 1 Mongo server for writing
  • 2 Mongo servers for reading
  • 2 Front end servers for running your Node & Express app

It might sound a little complicated, but ZooKeeper really starts to shine when you use it with more machines.

We are not going to look at how to install Mongo or your Node code here. You can find most of this information in my other articles.

Installing ZooKeeper

Let’s install ZooKeeper on your ZooKeeper server.

Java

First you need to install Java. This depends on your machine, so I will not go into detail, but on Linux you will typically run one of the following, the first on Debian or Ubuntu and the second on CentOS:

$ apt-get install openjdk-7-jre
$ yum install java-1.7.0-openjdk

These are both just the runtime without the development kit, which is all you need to run ZooKeeper.

ZooKeeper

Once you have Java, just download ZooKeeper and extract it from its archive.

If you are on a Linux server, here is an example of how to do it:

$ wget http://apache.cbox.biz/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
$ tar zxf zookeeper-3.4.6.tar.gz
$ cd zookeeper-3.4.6

You can see the latest version at ZooKeeper Download.

Once you are inside the ZooKeeper folder, create a file conf/zoo.cfg with the following content:

tickTime=2000
dataDir=/var/zookeeper-data
clientPort=2181

This tells ZooKeeper where to listen for incoming client connections (clientPort) and where to store its data (dataDir). tickTime is the basic time unit in milliseconds that ZooKeeper uses for heartbeats and timeouts; you don’t need to worry about it for now.

Return to the ZooKeeper folder and run the following command:

$ bin/zkServer.sh start

Congratulations, ZooKeeper should be up and running!
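The conf/zoo.cfg file is just key=value lines. The sketch below parses it and computes the session timeout range that tickTime implies: unless you set minSessionTimeout or maxSessionTimeout yourself, ZooKeeper allows sessions between 2 and 20 times tickTime. This matters later, because the session timeout decides how long an ephemeral node survives after its client dies without closing the connection. The function names are hypothetical.

```javascript
// Minimal parser for the key=value format used in conf/zoo.cfg.
function parseZooCfg(text) {
  const cfg = {};
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip blank lines and comments
    const eq = line.indexOf('=');
    if (eq === -1) continue; // ignore malformed lines in this sketch
    cfg[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return cfg;
}

// ZooKeeper defaults: minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime,
// unless they are set explicitly in zoo.cfg.
function sessionTimeoutBounds(cfg) {
  const tickTime = Number(cfg.tickTime);
  return {
    minMs: Number(cfg.minSessionTimeout ?? 2 * tickTime),
    maxMs: Number(cfg.maxSessionTimeout ?? 20 * tickTime),
  };
}
```

With the tickTime=2000 used above, clients can negotiate a session timeout between 4 and 40 seconds.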

Let’s connect to it by issuing the following command from the ZooKeeper folder:

$ bin/zkCli.sh -server 127.0.0.1:2181

Then run the following commands:

[...] ls /
[zookeeper]
[...] create /services just_small_data
[...] ls /
[services, zookeeper]
[...] create /services/mongo {}
[...] ls /
[services, zookeeper]
[...] ls /services
[mongo]

You just played with ZooKeeper. First, you listed all nodes under the base / node. Then you created a /services node with the string just_small_data as its data. Then you created a child of that node called mongo, with the simple string {} as its data.

We’ve just prepared our zkfarmer data for the rest of this article.

Installing zkfarmer

Next we are going to install zkfarmer on the ZooKeeper server and also on each of the three Mongo servers.

Python

Fortunately, Python comes with most Linux distributions, so installing it on your server should not be difficult. You will also need Python setuptools:

$ apt-get install python-setuptools
$ yum install python-setuptools

Then clone the zkfarmer repository and install zkfarmer:

$ git clone https://github.com/rs/zkfarmer
$ cd zkfarmer
$ python setup.py install

You are done. You have everything installed. I know it was not fun, but it is worth it.

Service discovery

Go to the Mongo server for writing and do the following:

$ mkdir zookeeper-mongo
$ echo 1 > zookeeper-mongo/enabled
$ echo 1 > zookeeper-mongo/write
$ echo 0 > zookeeper-mongo/read

The enabled file says that the server is accepting connections, the write file says that it accepts writes, and the read file says that this Mongo server doesn’t accept reads, so it should be used only for writing.

Now run the following

$ zkfarmer --host <zookeeper-server-address>:2181 join /services/mongo zookeeper-mongo/

If you now check your ZooKeeper server, you will see that it has a node /services/mongo/<mongo-write-server-ip>, and its content is:

{
  "enabled": "1",
  "write": "1",
  "read": "0"
}

Cool! Right?

When you stop the running zkfarmer with Ctrl+C, you will see that the node disappears. So when your server dies for any reason, ZooKeeper will know about it: immediately on a clean shutdown, or once the session timeout expires if the server crashes.

Now let’s go to one of the Mongo read servers and do the following:

$ mkdir zookeeper-mongo
$ echo 1 > zookeeper-mongo/enabled
$ echo 0 > zookeeper-mongo/write
$ echo 1 > zookeeper-mongo/read

Then run zkfarmer

$ zkfarmer --host <zookeeper-server-address>:2181 join /services/mongo zookeeper-mongo/

Repeat the exact same commands on the other Mongo read server. As you can see, this time zookeeper-mongo/read contains 1 and zookeeper-mongo/write contains 0, showing that these two servers are only for reading.
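Each zookeeper-mongo/ folder ends up as one JSON object on ZooKeeper, with one key per file. The sketch below approximates that mapping in Node so you can see the shape of the data. It is an illustration under that assumption, not zkfarmer’s actual implementation, and folderToJson is a made-up name.

```javascript
const fs = require('fs');
const path = require('path');

// Illustrative only: turns a folder of one-value files into a single object,
// roughly the shape zkfarmer publishes for zookeeper-mongo/ (file name -> content).
function folderToJson(dir) {
  const result = {};
  for (const name of fs.readdirSync(dir).sort()) {
    const full = path.join(dir, name);
    if (!fs.statSync(full).isFile()) continue; // this sketch ignores sub-folders
    // `echo 1 > file` writes "1\n", so strip the trailing newline.
    result[name] = fs.readFileSync(full, 'utf8').trim();
  }
  return result;
}
```

Called on the read servers’ folder, this yields { enabled: "1", read: "1", write: "0" }, matching the flags you just wrote.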

Front end servers

On the front end servers, let’s assume the following application structure:

/conf
/controllers
/models
/views
app.js

where app.js is the starting point of your application. Now run the following in the root folder of your project. This command does not return; it keeps running and keeps the file in sync with ZooKeeper.

$ zkfarmer --host <zookeeper-server-address>:2181 export /services/mongo conf/mongo.json

Check out the content of conf/mongo.json. It should look something like this:

{
  "<mongo-write-server-ip>": {
    "enabled": "1",
    "write": "1",
    "read": "0",
    "hostname": "..."
  },
  "<mongo-read-1-server-ip>": {
    "enabled": "1",
    "write": "0",
    "read": "1",
    "hostname": "..."
  },
  "<mongo-read-2-server-ip>": {
    "enabled": "1",
    "write": "0",
    "read": "1",
    "hostname": "..."
  }
}

When one of these servers dies, this file is automatically updated. When you change a value, for example enabled, the change is also immediately reflected in this JSON file.
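Once conf/mongo.json is loaded, your app still has to pick the right servers for writes and reads. Here is a minimal sketch, assuming the structure shown above and MongoDB’s default port 27017 (the article does not specify ports). selectMongoHosts is a hypothetical helper, and how you hand the hosts to your Mongo driver depends on the driver you use.

```javascript
// Splits the exported conf/mongo.json into write and read hosts.
// Assumes flags are the strings "0"/"1" as shown above; port 27017 is an assumption.
function selectMongoHosts(conf, port = 27017) {
  // Servers with enabled !== "1" are skipped entirely.
  const enabled = Object.entries(conf).filter(([, s]) => s.enabled === '1');
  const hostsWhere = (flag) =>
    enabled
      .filter(([, s]) => s[flag] === '1')
      .map(([ip]) => `${ip}:${port}`) // keys of mongo.json are the server IPs
      .sort();
  return { write: hostsWhere('write'), read: hostsWhere('read') };
}
```

In app.js you could call it as selectMongoHosts(require('./conf/mongo.json')). Because disabled servers are filtered out, setting enabled to 0 on a Mongo server takes it out of rotation after the next reload.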

To use the file, all you have to do is load it in your app.js with:

var conf_mongo = require('./conf/mongo.json');

It is that simple. However, we are not done yet. When conf/mongo.json is updated, how do you notify your app that something has changed?

Let’s imagine that you are using systemd to run your Node application, just like we did in Hosting & Deploying NodeJS Apps on Ubuntu and Hosting & Deploying NodeJS Apps on CentOS.

Then replace the zkfarmer command from above with:

$ zkfarmer --host <zookeeper-server-address>:2181 export /services/mongo conf/mongo.json --changed-cmd "systemctl restart node-app-1"

Each time conf/mongo.json changes, zkfarmer runs this command, systemd restarts your application, and the new file is loaded.
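Why is a restart needed at all? Node caches every module it loads with require(), including JSON files, so a running process keeps returning the old object even after zkfarmer rewrites the file. The snippet below demonstrates this with a temporary file. readJsonFresh is a hypothetical helper showing the alternative of re-reading the file yourself; the restart approach above remains the simpler option.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Re-reads and parses the file on every call, bypassing require()'s module cache.
function readJsonFresh(file) {
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// A temporary stand-in for conf/mongo.json.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'conf-')), 'mongo.json');
fs.writeFileSync(file, JSON.stringify({ version: 1 }));
const first = require(file);        // loads and caches the file
fs.writeFileSync(file, JSON.stringify({ version: 2 })); // simulate zkfarmer updating it
const second = require(file);       // served from the cache: still version 1
const fresh = readJsonFresh(file);  // reads the new content: version 2
console.log(first.version, second.version, fresh.version); // prints "1 1 2"
```

If restarts ever become too disruptive, a fresh read like this could be triggered by a file watcher instead, at the cost of making every module that holds configuration handle updates itself.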

Adding and removing Mongo servers becomes a breeze, and your app will always be aware of it.

Configuration Management

The same abilities can be used to propagate configuration changes. Imagine that your node application is using S3 to store files. You can use ZooKeeper to provide your credentials for S3.

Go to the server where ZooKeeper is installed and create the following:

$ mkdir zookeeper-node-conf
$ echo "<S3_ID>" > zookeeper-node-conf/s3_id
$ echo "<S3_SECRET_KEY>" > zookeeper-node-conf/s3_key
$ echo "<S3_BUCKET>" > zookeeper-node-conf/s3_bucket
$ zkfarmer --host <zookeeper-server-address>:2181 join /services zookeeper-node-conf/ --common

Notice the --common flag at the end. Unlike before, this time a new node named /services/common is created, which, just like before, contains JSON data:

{
  "s3_id": "<S3_ID>",
  "s3_key": "<S3_SECRET_KEY>",
  "s3_bucket": "<S3_BUCKET>"
}

Again, you can use it on your front end servers with the following:

$ zkfarmer --host <zookeeper-server-address>:2181 export /services/common conf/common.json --changed-cmd "systemctl restart node-app-1"

Each time your configuration changes, it is automatically updated in ZooKeeper and on your front end servers, which are then restarted.
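Since the S3 credentials now arrive through conf/common.json, it is worth failing fast at startup if any of them is missing. Here is a minimal sketch, assuming the key names shown above. loadCommonConf is a hypothetical helper, and its default path is resolved relative to the process working directory, unlike require().

```javascript
const fs = require('fs');

// Keys created earlier in zookeeper-node-conf/ (assumed to match the JSON above).
const REQUIRED_KEYS = ['s3_id', 's3_key', 's3_bucket'];

// Hypothetical startup check; the default path is relative to the working directory.
function loadCommonConf(file = 'conf/common.json') {
  const conf = JSON.parse(fs.readFileSync(file, 'utf8'));
  const missing = REQUIRED_KEYS.filter(
    (k) => typeof conf[k] !== 'string' || conf[k].trim() === ''
  );
  if (missing.length > 0) {
    throw new Error(`${file} is missing: ${missing.join(', ')}`);
  }
  return conf;
}
```

Throwing at startup means a bad configuration push shows up as a failed restart in systemd rather than as S3 errors in the middle of user requests.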

Next

Right now I can see several directions you can take from here.

First, you can use systemd to manage both your ZooKeeper and your zkfarmer instances.

Second, you can try to run multiple instances of ZooKeeper on different servers. It is really easy, and this makes your ZooKeeper installation much more reliable.

Third, you can make ZooKeeper bind to an IP address on a private network. As a result, ZooKeeper and your configuration will be protected from outside access.

Last, ZooKeeper’s simple model can help in many more situations. Try to identify a few in your current Node & Express infrastructure.



We are Stefan Fidanov & Vasil Lyutskanov. We share actionable advice about development with Node, Express, React and other web & mobile technologies.

It is everything that we have learned from years of experience working with customers from all over the world on projects of all sizes.

© 2024 Terlici Ltd