DataCollector: August 2016

Sunday, 28 August 2016

Explanations of exercise 3

Coding out missing data

I am going to continue with the variable H1WP13, which stands for the question “How close do you feel to your father/stepfather, etc.?” First, my dataset includes the value 8, which stands for the answer “don’t know”, and I want Python to ignore these values.

To do that, I replace them with NaN. NaN is how pandas marks missing data.

I repeat the process for the variable H1WP18B, which records whether the respondent played a sport with their father in the past 4 weeks. Here I want to ignore the value 6, which is the code for “refused”.
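The recoding step described above can be sketched on a tiny made-up column. The answers below are invented for illustration and are not taken from the Add Health file; only the meaning of code 8 (“don’t know”) comes from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy answers; 8 stands for "don't know", as in H1WP13
answers = pd.Series([5, 4, 8, 5, 3], name="H1WP13")

# Replace the "don't know" code with NaN so pandas treats it as missing
recoded = answers.replace(8, np.nan)

# dropna=False keeps the NaN row visible in the frequency table
counts = recoded.value_counts(sort=False, dropna=False)
print(counts)

# Summary statistics skip NaN by default, so the 8 no longer distorts the mean
print(recoded.mean())
```

Without the recoding, the 8 would be treated as a real answer on the closeness scale and would pull the mean upwards.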

Coding in valid data, recoding variables

Even if someone answered the question “How close do you feel to your father?”, it is rare for them to have given a positive answer to the next question, “Have you played a sport with your father in the past 4 weeks?”

Secondary variables

I want to find the children who, in the past 4 weeks, have both played a sport and gone shopping with their father.

H1WP18B stands for sport

H1WP18A stands for shopping

I assign the result to the variable H1WP18_both.
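A minimal sketch of this secondary variable follows, using made-up rows rather than the real file. It assumes that 1 means “yes”, 0 means “no”, and that the codes 6, 7, 8 and 9 are all non-answers; only 6 (“refused”) is confirmed by the text, the rest is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 1 = yes, 0 = no, 7 = a non-answer code (assumption)
toy = pd.DataFrame({
    "H1WP18A": [1, 0, 1, 7, 0],   # went shopping
    "H1WP18B": [1, 1, 0, 7, 0],   # played a sport
})

# Set the assumed non-answer codes to NaN so they cannot inflate the sum
toy = toy.replace([6, 7, 8, 9], np.nan)

# The sum equals 2 only when both activities were answered "yes"
toy["H1WP18_both"] = toy["H1WP18A"] + toy["H1WP18B"]
did_both = toy[toy["H1WP18_both"] == 2]

print(toy)
print(len(did_both))
```

If the non-answer codes are left in place, two 7s add up to 14, which is not a meaningful count of activities.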

Output of exercise 3

counts for original H1WP13

4 1211

8 1

1 75

5 2467

2 184

6 4

3 610

7 1952

counts for missing data how close do you feel to your father

NaN 1

1.0 75

2.0 184

3.0 610

4.0 1211

5.0 2467

6.0 4

7.0 1952

Name: H1WP13, dtype: int64

counts for original H1WP18B played a sport with your father in the past 4 weeks

0 3170

8 3

1 1372

9 1

6 6

7 1952

counts for missing data how many played a sport with their father in the past 4 weeks

0.0 3170

1.0 1372

NaN 6

7.0 1952

8.0 3

9.0 1

H1WP13 with Blanks recoded as 11 and 99 set to NAN

1.0 75

2.0 184

3.0 610

4.0 1211

5.0 2467

6.0 4

7.0 1952

11.0 1

Name: H1WP13, dtype: int64

percentage %

4 0.186193

8 0.000154

1 0.011531

5 0.379305

2 0.028290

6 0.000615

3 0.093788

7 0.300123

   H1WP18B  H1WP18A  H1WP18_both
0      0.0        0          0.0
1      0.0        0          0.0
2      0.0        0          0.0
3      7.0        7         14.0
4      7.0        7         14.0
5      0.0        0          0.0
6      0.0        0          0.0
7      7.0        7         14.0
8      7.0        7         14.0
9      7.0        7         14.0

4 categories

1=0%tile 2080

2=25%tile 2467

3=50%tile 1956

4=75%tile 1

Name: H1WP13_NEW, dtype: int64

H1WP13      1.0  2.0  3.0   4.0   5.0  6.0   7.0  11.0
H1WP13_NEW
1=0%tile     75  184  610  1211     0    0     0     0
2=25%tile     0    0    0     0  2467    0     0     0
3=50%tile     0    0    0     0     0    4  1952     0
4=75%tile     0    0    0     0     0    0     0     1

Exercise 3: the program

# -*- coding: utf-8 -*-
"""
Created on Sun Aug 28 21:38:12 2016

@author: User
"""

#importing libraries
import pandas
import numpy

#This reads the data into a handy dataframe format.

mydata=pandas.read_csv('addhealth_pds.csv',low_memory=False)

#setting variables you will be working with to numeric
# pandas.to_numeric replaces convert_objects, which newer pandas versions no longer provide
mydata['H1WP13'] = pandas.to_numeric(mydata['H1WP13'], errors='coerce')
mydata['H1WP18B'] = pandas.to_numeric(mydata['H1WP18B'], errors='coerce')
mydata['H1WP18A'] = pandas.to_numeric(mydata['H1WP18A'], errors='coerce')
#counts the original values for how close do you feel to your father
print ('counts for original H1WP13')
c1 = mydata['H1WP13'].value_counts(sort=False, dropna=False)
print(c1)

print('percentage %')
p1 = mydata['H1WP13'].value_counts(sort=False, normalize=True)
print (p1)

# recode missing values to python missing (NaN)
mydata['H1WP13']=mydata['H1WP13'].replace(8, numpy.nan)

# ask for a second frequency distribution

print ('counts for missing data how close do you feel to your father')
c2 = mydata['H1WP13'].value_counts(sort=False, dropna=False)
print(c2)

#I do the same for the variable H1WP18B
print('counts for original H1WP18B')
c3 = mydata['H1WP18B'].value_counts(sort=False, dropna=False)
print(c3)

# recode missing values to python missing (NaN)-refused
mydata['H1WP18B']=mydata['H1WP18B'].replace(6, numpy.nan)
print ('counts for missing data how many played a sport with their father in the past 4 weeks')
c4 = mydata['H1WP18B'].value_counts(sort=False, dropna=False)
print(c4)

#coding in valid data
#recode missing values to numeric value, in this example replace NaN with 11
mydata['H1WP13'] = mydata['H1WP13'].fillna(11)
#recode 8 ("don't know") values as missing; the 8s were already recoded above, so this leaves the data unchanged
mydata['H1WP13']=mydata['H1WP13'].replace(8, numpy.nan)

print ('H1WP13 with Blanks recoded as 11 and 8 set to NAN')
# check coding
chk2 = mydata['H1WP13'].value_counts(sort=False, dropna=False)
print(chk2)
ds2= mydata["H1WP13"].describe()
print(ds2)

#secondary variable
mydata['H1WP18_both']=mydata['H1WP18B'] + mydata['H1WP18A']

# subset variables in new data frame, sub1
sub1=mydata[['H1WP18B','H1WP18A', 'H1WP18_both']]

a = sub1.head (n=10)
print(a)

#-----------------------------------------------------------

print ('4 categories')
mydata['H1WP13_NEW']=pandas.qcut(mydata.H1WP13, 4, labels=["1=0%tile","2=25%tile","3=50%tile","4=75%tile"])
c4 = mydata['H1WP13_NEW'].value_counts(sort=False, dropna=True)
print(c4)

#crosstab evaluating which H1WP13 values were put into which H1WP13_NEW category
print (pandas.crosstab(mydata['H1WP13_NEW'], mydata['H1WP13']))
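The quantile split and the crosstab check at the end of the program can be tried on a small made-up series first. This is only a sketch: the scores and the labels Q1 to Q4 are invented for illustration and are not part of the exercise.

```python
import pandas as pd

# Hypothetical toy scores, not the Add Health data
scores = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], name="score")

# Split into 4 quantile-based groups with descriptive labels
groups = pd.qcut(scores, 4, labels=["Q1", "Q2", "Q3", "Q4"]).rename("group")

# Count how many observations fall in each group
group_counts = groups.value_counts(sort=False)
print(group_counts)

# Cross-tabulate to verify which scores landed in which group
table = pd.crosstab(groups, scores)
print(table)
```

With evenly spread values each group gets the same number of rows. In the exercise output the groups are very unequal because H1WP13 has only a few distinct values with many ties, so whole codes fall into a single group.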

Sunday, 21 August 2016

Explanations about the program

A random sample of 6504 children was asked the following question: “How close do you feel to your father/adoptive father, etc.?” Of the total, 38% answered “very much”, which is a substantial share, and only about 1% answered “not at all”.

Next, the same children were asked whether they had played a sport with their father in the past 4 weeks. The largest group, 49%, had not played any sport with their father.

For the next question, whether they had talked about school work or grades in the past 4 weeks, the majority answered “yes”.
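The percentages for the first question can be recomputed from the H1WP13 counts printed in the exercise 3 output. The sketch below simply types those counts in by hand instead of reading the data file.

```python
import pandas as pd

# Frequencies of H1WP13 as printed in the exercise 3 output (code -> count)
counts = pd.Series({1: 75, 2: 184, 3: 610, 4: 1211, 5: 2467, 6: 4, 7: 1952, 8: 1})

# Total number of respondents
total = counts.sum()

# Share of each answer code, in percent, rounded to one decimal
percent = (counts / total * 100).round(1)

print(total)
print(percent)
```

The total matches the sample size of 6504, and code 5 comes out at roughly 38%.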

Exercise for week 2

My first program in Python

https://drive.google.com/open?id=0BwzT1QOuEXN6QzdPSHdqQWhuc2M

or the code directly:

# -*- coding: utf-8 -*-
"""
Created on Sun Aug 21 00:47:23 2016

@author: User
"""

#importing libraries
import pandas
import numpy

#This reads the data into a handy dataframe format.
mydata=pandas.read_csv('addhealth_pds.csv',low_memory=False)

#upper-case all DataFrame column names
mydata.columns = mydata.columns.str.upper()

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x:'%f'%x)

#So that if we enter the command
mydata
# we get a summary of the data's structure

#the number of rows which stands for the observations
print('the population is:')
print(len(mydata))

# the number of columns which stands for the variables
print('The number of variables is:')
print(len(mydata.columns))

#setting variables you will be working with to numeric
# pandas.to_numeric replaces convert_objects, which newer pandas versions no longer provide
mydata['H1WP13'] = pandas.to_numeric(mydata['H1WP13'], errors='coerce')

#ADDING TITLES
print ('How close do you feel to your father')
print('frequency')
c1 = mydata['H1WP13'].value_counts(sort=False,dropna=False)
print (c1)

print('percentage %')
p1 = mydata['H1WP13'].value_counts(sort=False, normalize=True)
print (p1)

# groupby(...).size() gives the count per category (not a running total)
print('frequency (groupby)')
ct1= mydata.groupby('H1WP13').size()
print (ct1)

print('percentage (groupby)')
pt1 = mydata.groupby('H1WP13').size() * 100 / len(mydata)
print (pt1)

print ('How many have played a sport with their father in the past 4 weeks?')
print('frequency')
c2 = mydata['H1WP18B'].value_counts(sort=False,dropna=False)
print (c2)

print('percentage %')
p2 = mydata['H1WP18B'].value_counts(sort=False, normalize=True)
print (p2)

print('frequency (groupby)')
ct2= mydata.groupby('H1WP18B').size()
print (ct2)

print('percentage (groupby)')
pt2 = mydata.groupby('H1WP18B').size() * 100 / len(mydata)
print (pt2)

print ('How many have talked about school work or grades with their father in the past 4 weeks?')
print('frequency')
c3 = mydata['H1WP17H'].value_counts(sort=False,dropna=False)
print (c3)

print('percentage %')
p3 = mydata['H1WP17H'].value_counts(sort=False, normalize=True)
print (p3)

print('frequency (groupby)')
ct3= mydata.groupby('H1WP17H').size()
print (ct3)

print('percentage (groupby)')
pt3 = mydata.groupby('H1WP17H').size() * 100 / len(mydata)
print (pt3)

#subset data to young adults who answered very much
sub1=mydata[(mydata['H1WP13']==5)]

#make a copy of my new subsetted data
sub2 = sub1.copy()

# frequency distributions on new sub2 data frame
print ('counts for very much')
c5 = sub2['H1WP13'].value_counts(sort=False)
print(c5)

print ('percentages for very much')
p5 = sub2['H1WP13'].value_counts(sort=False, normalize=True)
print (p5)
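Note that groupby(...).size() returns one count per category. If running totals are wanted, cumsum can be applied on top of it. A minimal sketch on made-up answers (not the Add Health data):

```python
import pandas as pd

# Hypothetical toy answers, not the Add Health data
toy = pd.DataFrame({"H1WP13": [5, 4, 5, 3, 4, 5]})

# Per-category counts, sorted by category value
freq = toy.groupby("H1WP13").size()

# Running totals: cumulative frequency and cumulative percentage
cum_freq = freq.cumsum()
cum_pct = cum_freq * 100 / len(toy)

print(cum_freq)
print(cum_pct)
```

The last cumulative percentage is always 100 when no rows are missing, which makes it a quick sanity check.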

Saturday, 13 August 2016

First Assignment: Data Management

Data set

After looking thoroughly at all the data sets, I decided to choose the addhealth one. As I am a high school teacher, I strongly believe that a little research on puberty will help me become a better teacher.

Topic of Interest (code book)- Relations with Parents

I am interested in Relations with Parents.

We all know that parents play the primary role in the growth of their child. Furthermore, there is no doubt that a dad has a tremendous impact on his child, and every child needs a dad to count on.

In particular, I want to see how the relationship between a father and his children can be influenced by the quality time the father has with them.

First question

How close do you feel to your {FATHER/ADOPTIVE FATHER/STEPFATHER/FOSTER FATHER/etc.}, associated with: Which of these things (played a sport or talked about your school work or grades) have you done with your {FATHER/ADOPTIVE FATHER/STEPFATHER/FOSTER FATHER/etc.} in the past 4 weeks?

The variables I am going to use are:

How close do you feel to your FATHER H1WP13

played a sport H1WP18B

talked about your school work or grades H1WP18H

A second topic is how excessive expectations on the father's side can also have a negative influence on how close a child feels to his father.

The variables I will use are:

On a scale of 1 to 5, where 1 is low and 5 is high, how disappointed would she be if you did not graduate from high school? H1WP12

Literature review

The two topics I am going to discuss are:

First

How the quality time that a father spends with his children proves crucial for a strong relationship between them.

According to a Gallup Poll, 90.3 percent of Americans agree that fathers make a unique contribution to their children’s lives.

Source:

Gallup Poll, 1996. National Center for Fathering. “Father Figures.”

Adolescence is a turbulent period for every child. During this period children try to find themselves and become more independent so that they can overcome the later difficulties of everyday life. They have to make decisions about their studies, etc. This situation makes them anxious, and children may often become aggressive or get bad grades at school. In my opinion, children with involved fathers will have fewer difficulties in their adolescence, as they feel close to their father and feel that they have someone to support them. We can build a strong relationship during day-to-day life, for example by playing a sport together or by taking a few free minutes to talk with them about their problems, such as having a conversation about school.

Source:

National Center for Fathering

(http://www.fathers.com/)

The second topic is

How do parental, and particularly paternal, expectations affect the relationship with their kids?

We live in a competitive society, and for our children to survive and find a dream job they should have at least a degree. As a result, parents often pressure their children to excel. On the other hand, because children want their parents' approval, they feel that it is difficult to meet their parents' high expectations and end up isolated from them.

Source:

https://mom.me/