I currently work at Hortonworks as part of the Stinger project to improve the performance of Apache Hive. I pursued a Master's in Computer Science and Engineering at The Ohio State University, Columbus, under the guidance of Prof. Arnab Nandi, with a specific focus on large-scale data analytics and big data interaction research. I was also a Google Summer of Code (GSoC) 2012 student, working on implementing the CUBE and ROLLUP operators in Apache Pig. I also implemented the MR-Cube algorithm for distributed cubing on holistic measures, as outlined in this paper. Prior to that, I worked at the Advanced Computing Center for the Arts and Design (ACCAD) as a GRA under Prof. Alan Price on real-time data streaming using VRPN. I also worked as a web application developer at OSUOGG.

Before starting graduate studies, I worked as a Software Engineer at Samsung India Software R&D Center for 3 years in the Game and Connectivity team. I completed a Bachelor of Engineering (Hons) in Computer Science at Birla Institute of Technology and Science, Pilani, Goa. I did my final-year internship at SAP Labs, Bangalore, in the NetWeaver User Interface Foundation team.


Large-scale distributed systems, Big data analysis, Data mining, Web application development


CUBE operation in Apache Pig: I worked on (and still contribute to) this Google Summer of Code project, which provides OLAP cube operation support for Apache Pig. Click here to view my project proposal. The following Pig JIRA issues relate to this project:
  • PIG-2167: CUBE operation in Pig
  • PIG-2710: Naive CUBE operator implementation
  • PIG-2726: Handling legitimate NULL values
  • PIG-2765: Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator
  • PIG-2814: Fix issues with documentation
  • PIG-2831: MR-Cube implementation (Distributed cubing for holistic measures)
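To illustrate what the CUBE and ROLLUP operations named in these issues compute, here is a minimal Python sketch that enumerates the grouping combinations each one aggregates over. It is not the Pig implementation: the function names are hypothetical, and using `None` to mark a dimension that has been aggregated away is an assumption made for this example.

```python
from itertools import combinations


def cube_groupings(dims):
    """Return all 2^n grouping combinations, most detailed first.

    A dimension replaced by None is treated as aggregated away
    (an assumption of this sketch).
    """
    n = len(dims)
    result = []
    for size in range(n, -1, -1):
        for kept in combinations(range(n), size):
            result.append(
                tuple(d if i in kept else None for i, d in enumerate(dims))
            )
    return result


def rollup_groupings(dims):
    """Return the n+1 hierarchical prefixes, most detailed first."""
    n = len(dims)
    return [tuple(dims[:k]) + (None,) * (n - k) for k in range(n, -1, -1)]


if __name__ == "__main__":
    for grouping in cube_groupings(("region", "product")):
        print("CUBE  ", grouping)
    for grouping in rollup_groupings(("year", "month", "day")):
        print("ROLLUP", grouping)
```

CUBE grows exponentially with the number of dimensions, while ROLLUP grows linearly because it only follows the hierarchy from left to right.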

  • Pig Latin mode for CodeMirror: Implemented a Pig mode for CodeMirror (an online JavaScript-based code editor) with support for syntax highlighting, autosuggestion, and autocompletion for the Pig Latin language (used by Apache Pig).

    Map-side fragment replicated join in YSmart: As part of a large-scale distributed systems course project, I implemented map-side fragment replicated join support for YSmart. In this join, the smallest table in the query is distributed to all mappers using Hadoop's distributed cache, and an in-memory map-side join is performed, thereby reducing the amount of intermediate data transferred in the shuffle phase.
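The idea behind a fragment replicated join can be sketched in plain Python, without Hadoop. This is not the YSmart code: the function names and row layout are hypothetical. The assumption is that every mapper receives a full copy of the small table (in Hadoop, through the distributed cache), builds an in-memory lookup from it, and then joins its own split of the large table locally.

```python
from collections import defaultdict


def build_lookup(small_rows, key_index):
    """Index the small (replicated) table by its join key, in memory."""
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key_index]].append(row)
    return lookup


def map_side_join(big_split, lookup, key_index):
    """Join one split of the large table against the in-memory lookup.

    Rows without a match are dropped (inner join). Joined rows are
    produced directly by the mapper, so nothing has to be shuffled to
    reducers for the join itself.
    """
    for row in big_split:
        for match in lookup.get(row[key_index], ()):
            yield row + match


if __name__ == "__main__":
    cities = [(1, "Columbus"), (2, "Goa")]             # small table
    visits = [("a", 1), ("b", 3), ("c", 2), ("d", 1)]  # one split of the large table
    lookup = build_lookup(cities, key_index=0)
    for joined in map_side_join(visits, lookup, key_index=1):
        print(joined)
```

The approach only pays off when the replicated table fits comfortably in each mapper's memory, which is why the smallest table in the query is the one chosen for replication.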


    Research Papers (Bookmarking)

    Bottom-Up Computation of Sparse and Iceberg CUBEs - link
    Data Cube Materialization and Mining over MapReduce - link
    Distributed Cube Materialization on Holistic Measures - link
    Large-Scale Machine Learning at Twitter - link
    Dremel: Interactive Analysis of Web-Scale Datasets - link
    Percolator: Large-scale Incremental Processing Using Distributed Transactions and Notifications - link
    Processing a Trillion Cells per Mouse Click - link


    Programming Collective Intelligence
    Hadoop: The Definitive Guide
    In the Plex: How Google Thinks, Works, and Shapes Our Lives
    The Google Story


    Click here to download.