Object Orientated Design for Data Science

Working with complex datasets often calls for custom code. When writing that code, object-oriented design practices promote reusability and make the functionality easier to update and extend. In this post, I’m going to look at the Titanic data (of the 2k people on board, who survived, etc.), which you can download here: Titanic dataset.

This is a basic dataset; however, the principles can be applied to more complex data sets. The idea is to perform different operations on each row of data depending on the passenger class (1st, 2nd, 3rd, or the ship’s crew). This could be accomplished with a bunch of “if, else” statements, but I’m looking for clean and reusable code here, and that matters even more as the data sets get more complex.
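To make the contrast concrete, here is a rough sketch of what the “if, else” version might look like. The function name `select_fields` is my own invention for illustration, and I'm assuming each row arrives as a list laid out as Passenger, Class, Sex, Age, Survived (the passenger ID in the example call is made up).

```python
def select_fields(fields):
    """Pick the columns to report for one row using an if/elif chain.

    Assumes `fields` is [passenger_id, class, sex, age, survived].
    """
    passenger_class = fields[1]
    if passenger_class == "Crew":
        return [fields[1]]
    elif passenger_class == "1st":
        return [fields[1], fields[2], fields[3], fields[4]]
    elif passenger_class == "2nd":
        return [fields[1], fields[2], fields[4]]
    elif passenger_class == "3rd":
        return [fields[1], fields[4]]
    else:
        raise ValueError(f"unknown passenger class: {passenger_class!r}")


# Hypothetical row; the leading "1" is a made-up passenger ID.
print(select_fields(["1", "3rd", "Male", "Child", "No"]))
```

This works for four categories, but every new category or rule means editing the same function, which is the maintenance cost the class-based design tries to avoid.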

For your reference, here are the first few rows of the data (the file also has a leading Passenger column, which is omitted here):

Class,Sex,Age,Survived
3rd,Male,Child,No
3rd,Female,Child,Yes
2nd,Male,Adult,No
Crew,Male,Adult,No
1st,Male,Adult,No
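As an aside, if you would rather refer to columns by name than by position, Python's standard `csv` module can do the parsing. This is only a sketch of that alternative, not what the code in this post does; it reads the sample rows from an in-memory string so it runs without the file.

```python
import csv
import io

# The sample rows shown above, held in memory so the sketch runs without a file.
sample = """Class,Sex,Age,Survived
3rd,Male,Child,No
3rd,Female,Child,Yes
2nd,Male,Adult,No
Crew,Male,Adult,No
1st,Male,Adult,No
"""

# DictReader uses the header row as keys, so each row is addressed by column name.
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["Class"], row["Survived"])
```

Looking fields up by name means the code keeps working if a column is added or reordered in the source file.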

And here is the code (with comments):

def readData():
    # source data column mapping: Passenger,Class,Sex,Age,Survived
    hf = HandlerFactory()  # create the factory that maps passenger class -> handler
    with open('titanic.csv', 'r') as fl:
        next(fl)  # skip header
        for line in fl:
            fields = line.strip().split(',')  # strip the line ending, then split into columns
            passenger_class = fields[1]
            handler = hf.getHandler(passenger_class)  # look up the handler for this class
            handler.apply(fields)
            
class HandlerFactory():
    def __init__(self):
        self.handlers = {} #dict for each handler class
        self.register(CrewHandler())
        self.register(FirstClassHandler())
        self.register(SecondClassHandler())
        self.register(ThirdClassHandler())
    def register(self,handler):
        self.handlers[handler.getField()] = handler #add each class to dict
    def getHandler(self,fld):
        return self.handlers[fld]  # returns the handler instance for a specific passenger class

class FieldHandler:  # base python class to be inherited
    def __init__(self, field):
        self.field = field
    def getField(self):
        return self.field
    def setField(self, field):
        self.field = field

class CrewHandler(FieldHandler):  # print passenger class only
    def __init__(self):
        super().__init__("Crew")
    def apply(self, fields):
        print([fields[1]])

class FirstClassHandler(FieldHandler):  # print passenger class, gender, age, and survived
    def __init__(self):
        super().__init__("1st")
    def apply(self, fields):
        print([fields[1], fields[2], fields[3], fields[4]])

class SecondClassHandler(FieldHandler):  # print passenger class, gender, and survived
    def __init__(self):
        super().__init__("2nd")
    def apply(self, fields):
        print([fields[1], fields[2], fields[4]])

class ThirdClassHandler(FieldHandler):  # print passenger class and survived
    def __init__(self):
        super().__init__("3rd")
    def apply(self, fields):
        print([fields[1], fields[4]])

if __name__ == '__main__':
    readData()  # run the reader when the file is executed as a script

And the results (notice how each class of passenger has different fields chosen):
['3rd', 'No']
['3rd', 'Yes']
['2nd', 'Male', 'No']
['Crew']
['1st', 'Male', 'Adult', 'No']
...
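
The payoff of this design is that new behaviour can be added without touching the reading loop. Here is a minimal, self-contained sketch of that idea: `SurvivorCounter` is a hypothetical handler I made up for illustration, and it tallies survivors for one passenger class instead of printing. The base class and factory are trimmed-down copies of the ones in the post, and the two rows come from the sample data with made-up passenger IDs.

```python
class FieldHandler:
    """Base class: remembers which passenger class a handler is responsible for."""
    def __init__(self, field):
        self.field = field

    def getField(self):
        return self.field


class SurvivorCounter(FieldHandler):
    """Hypothetical handler: tallies survivors instead of printing the row."""
    def __init__(self, field):
        super().__init__(field)
        self.survived = 0
        self.total = 0

    def apply(self, fields):
        self.total += 1
        if fields[4] == "Yes":
            self.survived += 1


class HandlerFactory:
    """Trimmed-down factory: starts empty and hands back whatever was registered."""
    def __init__(self):
        self.handlers = {}

    def register(self, handler):
        self.handlers[handler.getField()] = handler

    def getHandler(self, fld):
        return self.handlers[fld]


factory = HandlerFactory()
factory.register(SurvivorCounter("3rd"))

# Rows use the layout Passenger,Class,Sex,Age,Survived; the IDs are made up.
rows = [
    ["1", "3rd", "Male", "Child", "No"],
    ["2", "3rd", "Female", "Child", "Yes"],
]
for fields in rows:
    factory.getHandler(fields[1]).apply(fields)

counter = factory.getHandler("3rd")
print(counter.survived, "of", counter.total)
```

The loop that dispatches rows is unchanged from the printing version; only the registered handler differs.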