The elephant is running... Where is the Zookeeper?


Highlighting my experience of running Zookeeper on the Hadoop on Azure platform

In the title, the elephant stands for Hadoop.

Hadoop on Azure is an upcoming offering in which Microsoft has joined hands with the open source Hadoop community to provide a robust platform for Big Data analytics. Through this platform, Microsoft wants to unlock opportunities in data analytics for both structured and unstructured data. It opens opportunities for online retailers, storage companies, networking companies, software product companies, the health industry and service companies.

In this blog post we will discuss a specific component that ships with this platform: Zookeeper. Zookeeper is an Apache Software Foundation project that provides naming, configuration management, synchronization and group services for distributed applications. It is designed for high performance and high availability, and it restricts access to data according to the permissions granted by an ACL (access control list).

To keep this blog post short and crisp, I would like you to read the details of Zookeeper at http://zookeeper.apache.org/. Now let me focus on a problem statement: I create a Zookeeper node (znode) and read the information it holds, create a child node under it, look at its stats and limits, and finally delete the node.

Create a Znode:
Connect to the head node of the cluster using the remote desktop connection provided in the portal. Follow this link from Avkash Chauhan’s blog for more help.
The Zookeeper version 3.4.2 jar has been deployed at c:\Apps\dist\zookeeper.


To connect to Zookeeper, the Java executable has to be reachable from the command line. Add the following location to the “PATH” environment variable (create the variable if it does not exist): C:\Apps\java\openjdk7\bin
Now open the Hadoop command shell from the desktop and execute the following command:


All the port-related configuration is in zoo.cfg in the conf directory (C:\Apps\dist\zookeeper\conf). In my config file the client port is 2181, which is why you see Zookeeper connecting on that port. Once the connection is established, you land in the Zookeeper shell and the following prompt is shown. This denotes that we are connected to the shell.
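To check the client port without opening the file by hand, the key=value lines of zoo.cfg can be read with a few lines of Python. This is only a minimal sketch: the function name parse_zoo_cfg and the sample contents are my own assumptions for illustration, and the real file on the head node may hold other settings.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg-style text into a dict of strings."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Illustrative contents only (assumption); not copied from the cluster.
SAMPLE_CFG = "\n".join([
    "# sample zoo.cfg",
    "tickTime=2000",
    "clientPort=2181",
])

client_port = int(parse_zoo_cfg(SAMPLE_CFG)["clientPort"])
print(client_port)
```

To use it against the real file, read C:\Apps\dist\zookeeper\conf\zoo.cfg into a string and pass that string to parse_zoo_cfg.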


Now execute the following command in the shell prompt


Note: Initially there is one system-defined znode called zookeeper. The syntax for creating a node in this shell is: create [-s] [-e] <path> <data> <acl>
Here I have given cdrw (create, delete, read and write) permission to anyone who falls under the “world” ACL scheme. After the node has been created, executing the “ls /” command lists your custom node.
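The permission letters in an ACL such as world:anyone:cdrw map to bit flags inside Zookeeper (read = 1, write = 2, create = 4, delete = 8, admin = 16). The sketch below decodes an ACL string of the form scheme:id:perms into that bitmask. The helper name parse_acl is hypothetical, and the code only illustrates the encoding; it does not talk to a server.

```python
# Permission bits as defined by Zookeeper (ZooDefs.Perms).
PERMS = {"r": 1, "w": 2, "c": 4, "d": 8, "a": 16}


def parse_acl(acl):
    """Split 'scheme:id:perms' and turn the permission letters into a bitmask."""
    scheme, rest = acl.split(":", 1)
    # The id may itself contain ':' (as digest ids do), so split from the right.
    ident, letters = rest.rsplit(":", 1)
    mask = 0
    for letter in letters:
        mask |= PERMS[letter]
    return scheme, ident, mask


scheme, ident, mask = parse_acl("world:anyone:cdrw")
print(scheme, ident, mask)
```

The admin bit (a) is not part of cdrw, so with this ACL nobody is allowed to change the ACL of the node afterwards.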

Read Information / status from the newly created node
To read the configuration of the newly created node, execute the following command:



The command get <path> shows the detailed information of the created node. At this point, make a note of the numChildren parameter, which is 0. We will create a child node and check this parameter again.
Another command that you can try is stat <path>. It is similar to get, with one difference: the data associated with the znode (in our case "description") is not shown in the output of stat. All the other parameters are shown in the same way. For more information on this, please browse to this link.

Create a child node
There are broadly 2 types of nodes that can be created in Zookeeper – (link)
1. Ephemeral node (the znode exists only as long as the session that created it exists)
2. Sequential node (a unique, increasing counter is appended to the name)
To create a sequential znode and an ephemeral znode, the following commands are executed:



Note: you create a sequential node with the -s argument, and -e denotes an ephemeral node. In my case child-0000000000 is the sequential node and “ephermalnode” is the ephemeral one. The get command that follows shows the numChildren parameter as 1, which confirms that we have successfully created 1 child node.
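The name child-0000000000 is the prefix you passed plus a 10-digit, zero-padded counter that Zookeeper appends for sequential nodes. A minimal sketch of that naming rule is shown below; the function name sequential_name is my own, and in a real cluster the counter is chosen by the server, not by the client.

```python
def sequential_name(prefix, counter):
    """Return the name a sequential znode gets: prefix + 10-digit padded counter."""
    return "%s%010d" % (prefix, counter)


print(sequential_name("child-", 0))
print(sequential_name("child-", 1))
```

Because the counter is zero-padded, sorting the child names as plain strings also sorts them in creation order.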

How to set data for the created znode
To set data for the newly created znode, try the set command:



Now when you run the get command, you will notice that the previous data “description”, which was associated with the newly created znode, has been replaced with “junk”.

How to set a quota for a znode
In order to set a quota for your newly created znode, you need to execute the following command



There are 2 options given by setquota:
 1. -b (sets a limit on the number of bytes of data)
 2. -n (sets a limit on the number of znodes, i.e. the namespace count)

How to read the stats and quota for a znode
In order to read the stats or quota of a znode, you need to go into the system-defined znode “zookeeper”, which has a child node called “quota”.



Note that the newly created “myznode” is present at this location. Inside myznode lie two child nodes, zookeeper_limits and zookeeper_stats.
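The location of these two bookkeeping nodes follows directly from the path of the znode that carries the quota. As a small sketch (the helper name quota_paths is hypothetical), the paths can be derived like this:

```python
QUOTA_ROOT = "/zookeeper/quota"


def quota_paths(znode_path):
    """Return the (limits, stats) paths that track the quota of znode_path."""
    base = QUOTA_ROOT + "/" + znode_path.strip("/")
    return base + "/zookeeper_limits", base + "/zookeeper_stats"


limits_path, stats_path = quota_paths("/myznode")
print(limits_path)
print(stats_path)
```

These are the paths you would pass to the get command in the Zookeeper shell.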




On executing the get command against zookeeper_limits we get:



With zookeeper_stats the same output would look like:



The two commands yield different output, each related to its purpose. The limits show the maximum that the znode is allowed to reach, and the stats show its current state.
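To compare the two, the data of both nodes can be parsed and checked against each other. The sketch below assumes the data is stored in the form count=<n>,bytes=<n>, with -1 meaning "no limit"; the sample values are made up for illustration and the function names are my own. As far as I know, quotas in this Zookeeper version are soft, so exceeding one produces a warning in the server log and does not reject the write.

```python
def parse_track(text):
    """Parse 'count=<n>,bytes=<n>' into a (count, bytes) tuple of ints."""
    fields = dict(item.split("=", 1) for item in text.strip().split(","))
    return int(fields["count"]), int(fields["bytes"])


def over_quota(limits_text, stats_text):
    """Return the names of the limits ('count', 'bytes') that are exceeded."""
    exceeded = []
    limits = parse_track(limits_text)
    stats = parse_track(stats_text)
    for name, limit, used in zip(("count", "bytes"), limits, stats):
        # A limit of -1 means that this dimension is not limited.
        if limit >= 0 and used > limit:
            exceeded.append(name)
    return exceeded


# Made-up sample values (assumption), not output from the cluster.
print(over_quota("count=2,bytes=-1", "count=3,bytes=11"))
print(over_quota("count=5,bytes=-1", "count=2,bytes=11"))
```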

Delete a Znode
To delete a znode, first verify that the znode being deleted does not have a child node.



In the above execution an exception was thrown because the newly created znode still had a child node. Once the child node has been deleted, the parent znode can be deleted as well.
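The rule behind that exception can be illustrated with a tiny in-memory model of the znode tree. This is not the Zookeeper API, only a sketch of the behaviour; the class ZnodeTree and the exception NotEmptyError are names I made up for it.

```python
class NotEmptyError(Exception):
    """Raised when a znode that still has children is deleted."""


class ZnodeTree:
    """Toy in-memory model: maps each znode path to the set of its child paths."""

    def __init__(self):
        self.children = {"/": set()}

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path):
        parent = self._parent(path)
        if parent not in self.children:
            raise KeyError("parent does not exist: " + parent)
        if path in self.children:
            raise KeyError("node already exists: " + path)
        self.children[parent].add(path)
        self.children[path] = set()

    def delete(self, path):
        # Deletion is refused while the node still has children.
        if self.children[path]:
            raise NotEmptyError("node not empty: " + path)
        self.children[self._parent(path)].discard(path)
        del self.children[path]


tree = ZnodeTree()
tree.create("/myznode")
tree.create("/myznode/child")
try:
    tree.delete("/myznode")
except NotEmptyError as err:
    print("refused:", err)
tree.delete("/myznode/child")  # remove the child first
tree.delete("/myznode")        # now the parent can go
```

The order matters: children first, then the parent, exactly as in the shell session described above.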

Summary
The only objective of this blog post is to help people understand Zookeeper on the Hadoop on Azure platform. It covered the broad areas of creating, deleting and managing znodes on the cluster. All these commands were tested with Zookeeper in standalone mode. If your build is going into production, it is preferable to run Zookeeper in replicated mode. To change to replicated mode you need to add the addresses of the other servers to the zoo.cfg file.
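As a rough sketch of what those extra zoo.cfg entries look like, the snippet below renders a replicated-mode configuration. Everything in it beyond clientPort 2181 is an assumption: the host names zk1, zk2 and zk3 are hypothetical, the timing values and the 2888:3888 port pair are the sample values commonly shown in the Zookeeper documentation, and dataDir is a placeholder. Each server additionally needs a myid file in its dataDir containing its own server number.

```python
def replicated_cfg(hosts, client_port=2181, data_dir="data"):
    """Render zoo.cfg text for an ensemble; one server.N line per host."""
    lines = [
        "tickTime=2000",   # assumed sample value
        "initLimit=10",    # assumed sample value
        "syncLimit=5",     # assumed sample value
        "dataDir=%s" % data_dir,
        "clientPort=%d" % client_port,
    ]
    for number, host in enumerate(hosts, start=1):
        # 2888 = follower-to-leader port, 3888 = leader election port.
        lines.append("server.%d=%s:2888:3888" % (number, host))
    return "\n".join(lines)


# Hypothetical host names, for illustration only.
print(replicated_cfg(["zk1", "zk2", "zk3"]))
```

An odd number of servers is the usual choice, since the ensemble needs a majority of its servers to be up in order to keep working.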

