Jul 28, 2013

Big Data and Data Science are among the hottest terms in the technology world right now. Every company seems to be doing some really complex work with Big Data. In Silicon Valley, a lot of organisations are racing against time to build products around Big Data. Big Data keeps getting bigger, more complex and broader in scope. According to the Gartner Emerging Technology Hype Cycle 2012, Big Data is in the “Peak of Inflated Expectations” phase.

With the rise of Big Data, there is a new breed of highly paid and scarce professionals: Data Scientists. Based on my experience of working in this space for the past 1.5 years, I would like to share my understanding of the current state of the art in terms of technologies. This talk is intended for software professionals interested in gaining an overview of the technology stack in Big Data projects, the skill set required and how to work towards building these competencies. The broad agenda is:

  • What Is a Typical Big Data Project
  • Main Products and Technologies
    • How they fit in at different stages, Competitive Landscape
  • Competencies of a Data Scientist
    • Statistics, Distributed Programming, Machine Learning, Text Mining, Data Visualization, Data Ingestion
  • Toolkit (Demo of some of the tools and libraries)
    • R / Python / Java, Map-Reduce / Storm, R Libraries / Mahout / WEKA, Hive / Pig, Pentaho / Tableau / Excel, Sqoop / Flume / Oozie
  • My Experiences in a Big Data Project at a European Bank

Narinder Kumar is a practicing Technologist, learning Entrepreneur and passionate Product Developer. I have been part of the IT industry professionally since 1996. During this time, I have worked across diverse industries and countries and performed different roles. I am currently working on a Big Data project at ING Bank, Netherlands. My work includes different components of Hadoop, Machine Learning algorithms, NoSQL data stores and Cloud frameworks. I am also a Certified Trainer for Apache Hadoop trainings delivered by Cloudera…