To start playing around with what Hadoop does, I decided to use the Hortonworks Sandbox. One of the first things the setup has you do is install Oracle VirtualBox, which is virtualization software for running virtual machines. The Sandbox runs inside one of those virtual machines. One note: the browser address given in the tutorial is wrong; it should be http://127.0.0.1:8000 to open the Sandbox GUI.
I then followed the “Hello World” tutorial, with which I was able to import some actual data from the NYSE and run some Hive and Pig queries. I have a substantial SQL background (though that is not essential), so it was a breeze.
I’m impressed by how easy to follow and well written the tutorial was. It’s a great way to get started!