When working with complex datasets, custom code is often needed to get the intended solution. When designing that code, object-oriented design practices promote reusability and make the functionality easier to update and extend. In this post, I’m going to look at the Titanic data (which of the roughly 2k passengers survived, etc.), which you can download here: Titanic dataset.
This is a basic dataset, but the principles can be applied to more complex ones. The idea is to perform a different operation on each row of data depending on the passenger class (1st, 2nd, 3rd, or the ship’s crew). This could be accomplished with a chain of “if, else” statements, but I’m looking for clean and reusable code here. With complex datasets, giving each class its own handler is a much better approach, because each rule lives in one place and new rules can be added without touching the existing ones.
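For contrast, here is a minimal sketch of what the “if, else” version might look like. The function name `select_fields` is hypothetical, and the sketch assumes each row is a list laid out as passenger index, class, sex, age, survived.

```python
# Hypothetical if/else version, shown for contrast only.
# Assumes each row is a list: [passenger_index, class, sex, age, survived].
def select_fields(fields):
    passenger_class = fields[1]
    if passenger_class == "Crew":
        return [fields[1]]
    elif passenger_class == "1st":
        return [fields[1], fields[2], fields[3], fields[4]]
    elif passenger_class == "2nd":
        return [fields[1], fields[2], fields[4]]
    elif passenger_class == "3rd":
        return [fields[1], fields[4]]
    else:
        raise ValueError(f"unknown passenger class: {passenger_class!r}")


print(select_fields(["1", "2nd", "Male", "Adult", "No"]))  # ['2nd', 'Male', 'No']
```

This works for four classes, but every new rule means editing the same function, and the branches grow together as the per-class logic gets more involved.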
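For your reference, here are the first few rows of the data (the file itself also has a leading passenger index column, not shown here, which is why the code reads the class from the second field):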
Class,Sex,Age,Survived
3rd,Male,Child,No
3rd,Female,Child,Yes
2nd,Male,Adult,No
Crew,Male,Adult,No
1st,Male,Adult,No
And here is the code (with comments):
def readData():
    # source data column mapping: Passenger,Class,Sex,Age,Survived
    hf = HandlerFactory()  # create the factory once, before reading
    with open('titanic.csv', "r") as fl:
        next(fl)  # skip header
        for line in fl:
            line = line.strip('\n')
            fields = line.split(',')
            passenger_class = fields[1]
            handler = hf.getHandler(passenger_class)
            handler.apply(fields)


class HandlerFactory:
    def __init__(self):
        self.handlers = {}  # maps passenger class -> handler instance
        self.register(CrewHandler())
        self.register(FirstClassHandler())
        self.register(SecondClassHandler())
        self.register(ThirdClassHandler())

    def register(self, handler):
        # key each handler by the passenger class it is responsible for
        self.handlers[handler.getField()] = handler

    def getHandler(self, fld):
        # returns the handler instance for a specific passenger class
        return self.handlers[fld]


class FieldHandler:  # base class to be inherited
    def __init__(self, field):
        self.field = field

    def getField(self):
        return self.field

    def setField(self, field):
        self.field = field


class CrewHandler(FieldHandler):  # print passenger class only
    def __init__(self):
        super().__init__("Crew")

    def apply(self, fields):
        print([fields[1]])


class FirstClassHandler(FieldHandler):  # print passenger class, gender, age, and survived
    def __init__(self):
        super().__init__("1st")

    def apply(self, fields):
        print([fields[1], fields[2], fields[3], fields[4]])


class SecondClassHandler(FieldHandler):  # print passenger class, gender, and survived
    def __init__(self):
        super().__init__("2nd")

    def apply(self, fields):
        print([fields[1], fields[2], fields[4]])


class ThirdClassHandler(FieldHandler):  # print passenger class and survived
    def __init__(self):
        super().__init__("3rd")

    def apply(self, fields):
        print([fields[1], fields[4]])


if __name__ == "__main__":
    readData()
And the results (notice how each class of passenger has different fields chosen):
['3rd', 'No']
['3rd', 'Yes']
['2nd', 'Male', 'No']
['Crew']
['1st', 'Male', 'Adult', 'No']
...
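One benefit of this design is that adding a rule means writing one class and registering it. The sketch below illustrates that with trimmed-down copies of the base class and factory. Two things in it are my own assumptions and not part of the code above: the handlers return their fields instead of printing them (to make the result easy to check), and the hypothetical `UnknownClassHandler` acts as a fallback so that an unregistered class does not raise a `KeyError`. The "Staff" row and the index values are made up for illustration.

```python
class FieldHandler:
    """Trimmed-down base class: stores the passenger class it handles."""

    def __init__(self, field):
        self.field = field

    def getField(self):
        return self.field


class ThirdClassHandler(FieldHandler):
    def __init__(self):
        super().__init__("3rd")

    def apply(self, fields):
        # return instead of print (assumption, to make the result checkable)
        return [fields[1], fields[4]]


class UnknownClassHandler(FieldHandler):
    """Hypothetical fallback for classes with no registered handler."""

    def __init__(self):
        super().__init__("unknown")

    def apply(self, fields):
        return ["unknown", fields[1]]


class HandlerFactory:
    def __init__(self):
        self.handlers = {}
        self.default = UnknownClassHandler()  # assumption: not in the original

    def register(self, handler):
        self.handlers[handler.getField()] = handler

    def getHandler(self, fld):
        # dict.get falls back to the default instead of raising KeyError
        return self.handlers.get(fld, self.default)


hf = HandlerFactory()
hf.register(ThirdClassHandler())

# Illustrative rows only: index values and the "Staff" class are made up.
rows = [
    ["1", "3rd", "Male", "Child", "No"],
    ["2", "Staff", "Male", "Adult", "No"],
]
results = [hf.getHandler(row[1]).apply(row) for row in rows]
print(results)  # [['3rd', 'No'], ['unknown', 'Staff']]
```

Neither the reading loop nor the other handlers need to change when a class like this is added, which is the point of routing everything through the factory.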