PVA – FUNDRAISING
IDS 572 – DATA MINING
ASSIGNMENT 2
Arati Singh
Nirav Dedhia
Sachin Pandey

Table of Contents
Question 1: Data Analysis
Question 2: Modelling
Question 3: Classification under Asymmetric Response and Cost

Question 1: Data Analysis

Data Statistics:
Table 1 in the appendix gives the mean, the number of missing values, and the minimum and maximum values for each attribute. The appendix also includes the distribution of the variables.

The following are some of the inferences from the data analysis:
The Target_B variable shows 65% donors and 35% non-donors.
There are no missing values for Target_B, State, DOB and HIT; however, there are missing values in AGE, HOMEOWNR, NUMCHILD, INCOME and GENDER.
The donors consist of 5367 females and 4107 males.
The average age of the population is 61.6.
The minimum age among the donors is 25.
The largest number of donors is from the state of CA.
The average number of children is between 1 and 2.
The number of homeowners is 5478.

Data cleaning and transformation model
Below we describe each step we performed on the attributes:

Generate New Attribute: We used the Generate New Attribute step to create a new variable for every transformed variable, replacing missing values with 0 and non-missing values with 1. The generated variables and the functions applied are detailed in Table 4 of the appendix.

Remove old attributes: For each attribute from which we created a new attribute in the previous step, we removed the original variable. The attributes removed are listed below.
Removed old attributes: CHILD03, CHILD07, CHILD12, CHILD18, DOMAIN, GENDER, HOMEOWNR, MAJOR, PEPSTRFL, PVASTATE, RECINHSE, RECP3, RECPGVG, RECSWEEP

Eliminating less relevant variables:
Remove useless attributes: In this step, we eliminated variables that seemed less relevant for predicting the donor/non-donor target. We deleted variables that had more than 50% null values, variables that were highly skewed, and variables such as past donation dates or donation amounts; because these types of variables did not contribute much to modelling, we decided to delete them. The less relevant attributes removed, and the reason for removing each group, are given below.

Attributes | Reason for removal
ADATE_1-ADATE_24 | These are all historical promotion values, which we found irrelevant for prediction.
MDMAUD, MDMAUD_A, MDMAUD_F, MDMAUD_R | These variables represent the Major Donor Matrix of donors who have given gifts previously, which we think is not necessary for our target prediction. Also, the values of these fields are highly skewed.
WEALTH2 | WEALTH1 is already included, which makes this variable redundant.
ANC1 – ANC15 | The ancestry of a person is of no significance in predicting donors and non-donors.
LSC1 – LSC4 | The language of a person is of no significance in predicting donors and non-donors.
ODATEDW, OSOURCE, TCODE, DOB, NOEXCH, AGEFLAG, DATASOURCE, GEOCODE, LIFESRC, HPHONE_D, MAILCODE | These variables (geocode, zip code, phone number and other basic donor information) are also irrelevant for predicting our target variable.
RECINHSE, RECPGVG, RECP3, RECSWEEP, NUMCHILD, CHILD03-CHILD18, SOLP3, SOLIH, MAJOR, COLLECT1, VETERANS, BIBLE, CATALOG, HOME, PETS, CDPLAY, STEREO, FISHER, GARDEN, BOATS, WALKER, PEPSTRFL | These variables were highly skewed and would over-predict the results, so we eliminated them.
HHAGE1-HHAGE3, DW1-DW9, HV1-HV4, HU1-HU5, HHD1-HHD12, HHAS1-HHAS4, MC1-MC3, TPE1-TPE13, LFC1-LFC10, AFC1-AFC3, HC1-HC21 | We also removed some neighborhood population attributes that had skewed values or that we found redundant for predicting the target variable.
TARGET_D | As suggested in the case, we discarded this variable.

How we handled missing values:
Map: In this step we transformed nominal variables containing "?" by replacing "?" with "N".
Replace Missing Value: In this step we replaced all unknown values of numeric variables with 0.
The table below summarizes the variables and the missing-value replacement techniques used.

Attribute | Original Value | Missing Values Replaced By | Transformation Technique
Domain | 1st byte = U, C, S, T, R; 2nd byte = 1, 2, 3 | N/A | Cut(Domain,0,1)
collect1, cards, kidstuff | Y / N | 0 | if(value=="Y", 1, 0)
CHILD03, CHILD07, CHILD12, CHILD18 | M, F, B | 0 | if(value=="M" || value=="F" || value=="B", 1, 0)
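The transformations in the summary table, together with the Map and Replace Missing Value steps, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the process the report actually ran: records are assumed to be dictionaries keyed by upper-case PVA attribute names (COLLECT1, CARDS, KIDSTUFF for the Y/N flags), missing nominal values are assumed to arrive as "?" and missing numeric values as None, and, for brevity, results overwrite the original attributes instead of being stored under new attribute names. The function and constant names (clean_record, YN_FLAGS, CHILD_CODES) are hypothetical.

```python
from typing import Any, Dict

Record = Dict[str, Any]

# Attribute groups from the summary table; upper-case names are an assumption.
YN_FLAGS = ("COLLECT1", "CARDS", "KIDSTUFF")
CHILD_CODES = ("CHILD03", "CHILD07", "CHILD12", "CHILD18")


def clean_record(raw: Record) -> Record:
    """Apply the summary-table transformations, then the Map and Replace steps.

    Hypothetical sketch: missing nominal values arrive as "?", missing numeric
    values as None, and results overwrite the original attributes.
    """
    rec = dict(raw)  # shallow copy so the input record is left unchanged

    # Cut(Domain,0,1): keep only the first byte of DOMAIN (U, C, S, T or R).
    domain = rec.get("DOMAIN")
    if isinstance(domain, str) and domain != "?":
        rec["DOMAIN"] = domain[:1]

    # if(value=="Y", 1, 0): "Y" becomes 1; "N" or a missing value becomes 0.
    for name in YN_FLAGS:
        if name in rec:
            rec[name] = 1 if rec[name] == "Y" else 0

    # CHILD codes: M, F or B becomes 1; anything else, including missing, becomes 0.
    for name in CHILD_CODES:
        if name in rec:
            rec[name] = 1 if rec[name] in ("M", "F", "B") else 0

    for name, value in list(rec.items()):
        if value == "?":
            rec[name] = "N"  # Map step: nominal "?" placeholder -> "N"
        elif value is None:
            rec[name] = 0  # Replace Missing Value step: numeric unknown -> 0
    return rec


if __name__ == "__main__":
    row = {"DOMAIN": "S2", "CARDS": "Y", "KIDSTUFF": None, "CHILD03": "B",
           "CHILD07": "?", "HOMEOWNR": "?", "INCOME": None, "AGE": 61}
    print(clean_record(row))
    # {'DOMAIN': 'S', 'CARDS': 1, 'KIDSTUFF': 0, 'CHILD03': 1, 'CHILD07': 0, 'HOMEOWNR': 'N', 'INCOME': 0, 'AGE': 61}
```

Plain dictionaries keep the sketch dependency-free; a DataFrame-based version would apply the same rules column by column in the same order.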
June 1, 2021
