Showing posts from June, 2012

Cloud Trend at my location - Pune

Cloud Trend in Pune

Based on the conferences and meetups held by cloud enthusiasts over the past few months
Have you ever realized the importance of the data posted on public forums? How crucial that data can be for a business analyst who wants to know the technology trends in his or her domain? If your answer is "no" or "maybe", then here is a blog post that shows its potential.

Through this blog I want to highlight a study that I have done over the past few days to understand the cloud trend in my location. The study is based on the conferences held over the past few months and the profiles of their authors on the social networking site LinkedIn.
Initially, the captured data looked like bulky gibberish and my Excel file was flooded with huge blocks of text, but over time the graphs began to make more and more sense. Some major cloud trends for my location, Pune, are as follows:
Classification Trend related to Cloud Topics

Note: Cloud section …

Big Data for choosing bride/groom

Want to know how many of your female friends are still unmarried...
This post is about the logistic regression algorithm, which was developed primarily for classification. I see this algorithm as massively useful for cloud-based big data analysis. Let us understand it with a simple example. Problem statement: I want to calculate how many of my friends are married, divorced or single. Solution: Here I have two groups of sets.
The Marital Status group has 3 states: Single, Married and Divorced. The Gender group has 2 states: Male and Female. Each of the 2 states has a score associated with it (a z-score) that represents the normalized weight of the state. The z-score maps the raw distribution into a space where the mean is 0 and the standard deviation is 1.
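To make the z-score idea concrete, here is a minimal sketch in Python. The friend counts are made-up numbers used only for illustration, and the helper name `z_scores` is my own; the post does not say how the scores were actually computed.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw values so the result has mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical raw counts of friends in each marital-status state.
raw_counts = {"Single": 12, "Married": 30, "Divorced": 3}

# Pair each state with its standardized score.
scores = dict(zip(raw_counts, z_scores(list(raw_counts.values()))))

for state, score in scores.items():
    print(f"{state}: {score:+.3f}")
```

A state with a positive score lies above the average count and a state with a negative score lies below it, which is what lets differently scaled groups be compared on one axis.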

Note that graph 1 shows a straight line across the sets of friends: the data has been poorly modeled using linear regression. Graph 2 shows comparatively exact data, hav…

Online Classification for Big Data

Highlighting my experience while coding the Stochastic Gradient Descent algorithm
Stochastic Gradient Descent is an online classification algorithm. It proves very efficient for classification problems on huge data sets. Unlike the batch logistic regression algorithm, which is something of an ancestor, it takes one row of the input data at a time (a row can also be called a tuple) instead of the whole matrix. The data held by the tuple goes through the computation, and the result is added to the previously computed record. This algorithm has great potential in big data analytics. A common use case comes from the medical field, where a system accepted a patient's cholesterol level and other parameters as input and, with the help of this algorithm, gave the probability of that individual having heart disease.
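The row-at-a-time idea can be sketched as follows. This is a minimal illustration in plain Python, not my team's actual implementation: the class name `OnlineLogisticSGD`, the learning rate and the synthetic one-feature data are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticSGD:
    """Logistic regression trained one row (tuple) at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, row):
        z = self.bias + sum(w * x for w, x in zip(self.weights, row))
        return sigmoid(z)

    def update(self, row, label):
        # Gradient of the log loss for a single row: (prediction - label) * x.
        error = self.predict_proba(row) - label
        for i, x in enumerate(row):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


# Synthetic stream (assumption): the label is 1 when the single feature is positive.
random.seed(0)
model = OnlineLogisticSGD(n_features=1)
for _ in range(5000):
    x = random.uniform(-1.0, 1.0)
    model.update([x], 1 if x > 0 else 0)

print(model.predict_proba([0.9]), model.predict_proba([-0.9]))
```

Each call to `update` touches only the current row and folds its contribution into the weights learned so far, so the full data set never has to be held in memory.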

Through this blog I intend to share my team’s experience in coding this feature, and to document our major findings along the way.

It was p…

The elephant is running….Where is the Zookeeper???

Highlighting the experience of running ZooKeeper with Hadoop on the Azure platform
In the title, the elephant stands for Hadoop…
Hadoop on Azure is an upcoming offering in which Microsoft has joined hands with the open source Hadoop community to provide a robust platform for big data analytics. Through this platform, Microsoft wants to unlock opportunities in the data analytics domain for both structured and unstructured data. It opens opportunities for online retailers, storage companies, networking companies, software product companies, the health industry and service companies.
In this blog post we will discuss a specific feature that comes packaged with this platform: ZooKeeper. ZooKeeper is an Apache Software Foundation project that provides naming, configuration management, synchronization and group services for distributed systems. It offers high performance, high availability and strictly controlled access to data according to the permissions given by an ACL (access control list).
To make th…

Big Data - SWOT Analysis

Highlighting the strengths, weaknesses, opportunities and threats in Big Data

Strength:
- Helps research-oriented topics for analytics and inquiry across the domains of science, medicine, history etc.
- Academic excellence, opening a new area of statistical research and BI
- Great support from industry all over the world.
- Microsoft joins hands with the open source community, launching Hadoop on Azure
- The open source community will continue to prevail with Apache Mahout on Hadoop.
- Buzzword created by tech firms
- Moore’s Law: 2 years ago a huge investment was required for 1 TB of data storage; now this is easily achieved through cloud computing.

Weakness:
- Lack of technology to support all formats; the current implementation has complex logic
- Lots of unstructured data present on platforms like social media
- Human conversations are messy, hard to process and currently unpredictable
- Requires excessive human interpretation to process
- Continuous monitoring required.

Opportunity:
- People look adaptive to this paradigm shift
- Customer looking to…