Showing posts from December, 2014

Trace Investigation with HDInsight Hive query

Pre-requisites
1. Prepare an HDInsight cluster
2. Download Azure PowerShell
This article shows how to parallelize trace investigations using Hadoop MapReduce, thereby reducing the time and effort they take.
Current Problem
1. During on-call events, time and effort are spent pinpointing the exact issue. There is no mechanism that can predict probable issues.
2. Traces are big data. Files many gigabytes in size are scanned to filter out the relevant traces. Currently we scan traces that we download locally as a buffer, and running multiple filters in parallel puts more load on the system.
3. We have 10-15 filter strings, but executing all of them in one go is not possible in the current scenario.
Proposed Solution
We propose using HDInsight to run Hive queries, which execute in parallel using MapReduce. MapReduce is a technique that divides large data into multiple chunks and sends them to mappers. Mappers are executors that each work on a small portion of the data and provide out…
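The split, map, and reduce idea described above can be sketched locally in a few lines of Python. This is only an illustration of the pattern, not Hive or HDInsight code: the filter strings, the sample trace lines, and the function names (`chunk`, `mapper`, `reducer`, `count_matches`) are all hypothetical, and a thread pool stands in for the cluster's mappers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical filter strings, for illustration only.
FILTERS = ["Timeout", "AccessDenied", "NullReference"]

# Hypothetical trace lines standing in for a large trace file.
TRACES = [
    "10:01 Timeout calling storage",
    "10:02 request ok",
    "10:03 AccessDenied for user",
    "10:04 Timeout calling queue",
    "10:05 NullReference in handler",
]


def chunk(lines, size):
    """Split the trace lines into fixed-size chunks, one per mapper."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def mapper(lines):
    """For one chunk, count how many lines contain each filter string."""
    counts = Counter()
    for line in lines:
        for flt in FILTERS:
            if flt in line:
                counts[flt] += 1
    return counts


def reducer(partials):
    """Merge the per-chunk counts into one total per filter string."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_matches(lines, chunk_size=2, workers=4):
    """Run every filter over every chunk, then combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(mapper, chunk(lines, chunk_size)))
    return reducer(partials)


if __name__ == "__main__":
    print(dict(count_matches(TRACES)))
```

Each mapper sees only its own chunk and applies all filter strings in a single pass, so adding more filters does not require rescanning the data. On a real cluster the chunks would be blocks of the trace files and the mappers would run on separate nodes.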