What the heck is Mahout?

Here is the tutorial I used, by Steve Cook on YouTube.

Links to download the Java libraries:

http://mahout.apache.org/general/downloads.html

http://www.slf4j.org/download.html

Here is the data:

http://grouplens.org/datasets/movielens/

https://code.google.com/p/guava-libraries/

Mahout (an Apache project) is built to accomplish the following:

  • Collaborative Filtering (recommendations)
  • Classification (spam email or not)
  • Clustering (Google News)
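To get a feel for what collaborative filtering means, here is a minimal sketch in plain Python. It does not use Mahout or its API. The ratings are made up for illustration (they are not taken from MovieLens), and the function names `cosine_similarity` and `recommend` are my own. The idea: find users whose ratings look like yours, then score the movies you have not rated by their similarity-weighted ratings.

```python
from math import sqrt

# Hypothetical ratings (user -> {movie: rating}); invented for illustration.
ratings = {
    "alice": {"Toy Story": 5, "Heat": 1, "Fargo": 4},
    "bob":   {"Toy Story": 4, "Heat": 2, "Fargo": 5, "Casino": 4},
    "carol": {"Toy Story": 1, "Heat": 5, "Casino": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Rank movies the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for movie, rating in other_ratings.items():
            if movie in ratings[user]:
                continue  # only score movies the user has not seen
            scores[movie] = scores.get(movie, 0.0) + sim * rating
            weights[movie] = weights.get(movie, 0.0) + sim
    ranked = [(m, scores[m] / weights[m]) for m in scores if weights[m] > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

print(recommend("alice", ratings))
```

Here "alice" rates movies much like "bob" and quite unlike "carol", so bob's opinion of "Casino" counts for more in her predicted score. A real recommender such as Mahout's works on far larger data and offers other similarity measures; this is only the basic idea.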

Getting Started with Python

My background languages are Java, Objective-C, SQL, and HTML... but no Python!! Not a problem. I turned my attention to good ole’ YouTube, and while I was on the elliptical machine at 5:30am, I ran into some great videos from OneStopProgramming.

Summary steps to get started:

  • install Python (the .exe installer)
  • install Notepad++ if you don’t already have it
  • create a simple .py script
  • open a command prompt, navigate to the folder containing the .py file, and run it
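As a rough sketch of what that first script might look like (the file name `hello.py` and the function `greet` are my own choices, not from the videos):

```python
# hello.py - a hypothetical first script; the file name is an assumption.

def greet(name):
    """Build a greeting string for the given name."""
    return "Hello, " + name + "!"

# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    print(greet("world"))
```

Assuming Python is on your PATH, you would run it from the folder containing the file with `python hello.py`.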

Useful functions:

  • len("hello") = 5
  • help(len) = gives info about the len function
  • dir() = lists the names defined in the current scope, including the variables you’ve declared
  • "H" in "Hello" = True
  • "h" in "Hello" = False (the test is case-sensitive)
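The functions above can be tried together in one short script. This is just a sketch; the variable names are my own.

```python
greeting = "hello"

length = len(greeting)          # number of characters in the string
has_upper_h = "H" in "Hello"    # substring test, case-sensitive
has_lower_h = "h" in "Hello"    # lowercase "h" does not appear in "Hello"
names = dir()                   # names defined in the current scope

print(length, has_upper_h, has_lower_h)   # prints: 5 True False
print("greeting" in names)                # the variable shows up in dir()

# help(len) prints the documentation for len; it is most useful when typed
# at the interactive Python prompt, so it is left commented out here.
# help(len)
```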

 

Getting Started with Hadoop

To begin playing around with what Hadoop does, I decided to go down the path of using the Hortonworks Sandbox.  One of the first things the setup has you do is install Oracle VirtualBox, which is virtualization software for running virtual machines.  The Sandbox runs inside one of those virtual machines.  One note: the browser address given in the tutorial is wrong; it should be http://127.0.0.1:8000 to open the Sandbox GUI.

I then followed the “Hello World” tutorial, in which I was able to import some actual data from the NYSE and run some Hive and Pig queries.  I have a substantial SQL background (though that is not essential), so it was a breeze.

I’m impressed by how easy to follow and well written the tutorial was.  Great way to get started!