Sangeeta Oak is the founder of Rightrix Solutions & IndicThreads. She was previously with Cognizant Technology Solutions, where she led several projects in Broadvision, J2EE and EAI. She has been a speaker at conferences organised by IndicThreads & JAX. She holds a master’s degree in computer management. She is an enthusiastic painter and a believer in the Yoga way to a good life.
Big Data and Data Science are among the hottest buzzwords in the technology world today. Every company seems to be doing some really complex work with Big Data, and in Silicon Valley many organisations are racing against time to build products around it. Big Data keeps getting bigger, more complex and broader in scope. According to Gartner’s 2012 Hype Cycle for Emerging Technologies, Big Data sits at the “Peak of Inflated Expectations”.
The rise of Big Data has brought a new breed of highly paid and scarce professionals: Data Scientists. Based on my experience working in this space for the past 1.5 years, I would like to share my understanding of the current state of the art in terms of technologies. This talk is intended for software professionals who want an overview of the technology stack used in Big Data projects, the skill set required, and how to work towards building these competencies. The broad agenda is:
- What’s a Typical Big Data Project?
- Main Products and Technologies
  - How They Fit in at Different Stages; Competitive Landscape
- Competencies of a Data Scientist
  - Statistics, Distributed Programming, Machine Learning, Text Mining, Data Visualization, Data Ingestion
- Toolkit (demo of some of the tools and libraries)
  - R / Python / Java, MapReduce / Storm, R Libraries / Mahout / WEKA, Hive / Pig, Pentaho / Tableau / Excel, Sqoop / Flume / Oozie
- My Experiences on a Big Data Project at a European Bank
I am a practicing technologist, a learning entrepreneur and a passionate product developer. I have worked in the IT industry since 1996, across diverse industries and countries and in a variety of roles.
I am currently working on a Big Data project at ING Bank in the Netherlands. My work spans different components of Hadoop, machine learning algorithms, NoSQL data stores and cloud frameworks. I am also a certified trainer for the Apache Hadoop training courses delivered by Cloudera.
I am a strong believer in Agile methodologies and have followed Scrum and XP since 2005. I have trained and coached several software development teams to successfully adopt Scrum. I am a certified trainer for CSD (Certified Scrum Developer) courses through the Scrum Alliance and a contributor to the Distributed Scrum Primer.