Aug 01 2013

Since its early days, the Hadoop community has made several attempts to stretch Hadoop beyond its role as a distributed programming framework. The key strength that Hadoop brings to the table is its ability to scale linearly. Can we combine this advantage with the efficiency of databases? What does it take to run SQL over Hadoop?

Running SQL-on-Hadoop implies accessing data from “within” Hadoop using SQL as the interface. Accomplishing this demands a significant re-architecture of the storage and compute infrastructures.

SQL-on-Hadoop also shifts Hadoop’s role from a technology viewed so far as complementary to databases into one that could compete with them. It is perhaps the single most significant feature that will help Hadoop find its way into more enterprises.

I’ll highlight some of the conceptual ideas behind the different ways SQL processors can be implemented atop Hadoop, drawing on examples from OSS and research-ware products.


Srihari currently heads the technology organization for ThoughtWorks India. He’s been a developer and architect for several enterprise applications, with a focus on building large-scale systems based on service-oriented architectures, domain-specific languages, etc. He is passionate about distributed systems and databases.