Generalized Linear Models for Binary Data [Categorical Data Analysis]

May 06, 2019

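Data - checking whether snoring is a risk factor for heart disease

-π(x) denotes the probability of heart disease at snoring level x

-Generalized linear model - a model-based way to examine the effects of explanatory variables on a categorical response variable

-The goal is to build a model that fits the data well!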



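Creating the data

> y=c(24,35,21,30) #yes: number of subjects with heart disease
> n=c(1355,603,192,224) #no: number of subjects without heart disease
> x=c(0,2,4,5) #snoring level converted to a numeric score
> snoring=data.frame(y=y,n=n,x=x) #combine the three vectors into a data frame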

 

Checking the data

 

> snoring
   y    n x
1 24 1355 0
2 35  603 2
3 21  192 4
4 30  224 5
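
Before fitting any model, it can help to look at the raw proportion of heart disease at each snoring level. The lines below are a minimal sketch using the same counts as above; the names total and p_hat are introduced here only for illustration.

```r
# Counts from the snoring data above
y <- c(24, 35, 21, 30)        # heart disease: yes
n <- c(1355, 603, 192, 224)   # heart disease: no
x <- c(0, 2, 4, 5)            # snoring score

total <- y + n                # number of subjects at each snoring level
p_hat <- y / total            # sample proportion with heart disease

print(data.frame(x = x, total = total, p_hat = round(p_hat, 4)))
```

These sample proportions are what the fitted values from each model further down can be compared against.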



linear probability model(using identity link function)


> fit.lm=glm(cbind(y,n)~x,family=binomial(link="identity")) #run generalized linear model
> summary(fit.lm)


Call:
glm(formula = cbind(y, n) ~ x, family = binomial(link = "identity"))

Deviance Residuals: 
       1         2         3         4  
 0.04478  -0.21322   0.11010   0.09798  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.017247   0.003451   4.998 5.80e-07 ***
x           0.019778   0.002805   7.051 1.77e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 65.904481  on 3  degrees of freedom
Residual deviance:  0.069191  on 2  degrees of freedom
AIC: 24.322

Number of Fisher Scoring iterations: 3

so, the fitted model is π(x) = 0.0172 + 0.0198x
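
With the identity link, the slope reads directly as a change in probability: each one-unit increase in the snoring score adds about 0.0198 to the estimated probability of heart disease. As a rough check, the sketch below plugs the coefficients copied from the summary(fit.lm) output into the fitted equation by hand; b0, b1 and pi_hat are illustrative names, not objects from the original session.

```r
# Coefficients copied from the summary(fit.lm) output above
b0 <- 0.017247
b1 <- 0.019778
x  <- c(0, 2, 4, 5)

# Identity link: the linear predictor is already a probability
pi_hat <- b0 + b1 * x
print(round(pi_hat, 4))

# Estimated difference between the highest (x=5) and lowest (x=0) snoring scores
print(pi_hat[4] - pi_hat[1])   # same as 5 * b1
```

One thing to keep in mind is that a linear probability model is not constrained to the range [0, 1], so it can give impossible values for x far outside the observed scores.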


logistic regression model(using logit link function)

> fit.logit=glm(cbind(y,n)~x,family=binomial)
> summary(fit.logit)

Call:
glm(formula = cbind(y, n) ~ x, family = binomial)

Deviance Residuals: 
      1        2        3        4  
-0.8346   1.2521   0.2758  -0.6845  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.86625    0.16621 -23.261  < 2e-16 ***
x            0.39734    0.05001   7.945 1.94e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 65.9045  on 3  degrees of freedom
Residual deviance:  2.8089  on 2  degrees of freedom
AIC: 27.061

Number of Fisher Scoring iterations: 4

so, fitted model is logit(π(x))=-3.86625 + 0.39734x
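
In the logit model the slope is on the log-odds scale, so exp(0.39734) gives the multiplicative change in the odds of heart disease per one-unit increase in the snoring score. The sketch below does that calculation and converts the linear predictor back to probabilities with plogis(), the inverse logit in base R. The coefficients are copied from the summary(fit.logit) output; b0, b1, odds_ratio and pi_hat are illustrative names.

```r
# Coefficients copied from the summary(fit.logit) output above
b0 <- -3.86625
b1 <- 0.39734
x  <- c(0, 2, 4, 5)

# Odds ratio for a one-unit increase in the snoring score
odds_ratio <- exp(b1)
print(round(odds_ratio, 3))

# Inverse logit: pi(x) = exp(eta) / (1 + exp(eta))
eta    <- b0 + b1 * x
pi_hat <- plogis(eta)
print(round(pi_hat, 4))
```

The probabilities obtained this way should agree, up to rounding of the coefficients, with predict(fit.logit,type="response").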

probit regression model(using probit link function)

> fit.probit=glm(cbind(y,n)~x,family=binomial(link="probit"))
> summary(fit.probit)

Call:
glm(formula = cbind(y, n) ~ x, family = binomial(link = "probit"))

Deviance Residuals: 
      1        2        3        4  
-0.6188   1.0388   0.1684  -0.6175  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.06055    0.07017 -29.367  < 2e-16 ***
x            0.18777    0.02348   7.997 1.28e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 65.9045  on 3  degrees of freedom
Residual deviance:  1.8716  on 2  degrees of freedom
AIC: 26.124

Number of Fisher Scoring iterations: 4

so, fitted model is probit(π(x))=-2.06055 + 0.18777x
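
The probit link is the inverse of the standard normal CDF, so the estimated probability is recovered by applying pnorm() to the linear predictor. A minimal sketch, with coefficients copied from the summary(fit.probit) output and b0, b1, eta and pi_hat as illustrative names:

```r
# Coefficients copied from the summary(fit.probit) output above
b0 <- -2.06055
b1 <- 0.18777
x  <- c(0, 2, 4, 5)

eta    <- b0 + b1 * x   # linear predictor on the probit (z-score) scale
pi_hat <- pnorm(eta)    # standard normal CDF maps it back to a probability
print(round(pi_hat, 4))

# qnorm() is the probit function itself, so it returns the linear predictor
print(qnorm(pi_hat))
```

Because the probit and logit links use different scales, their coefficients are not directly comparable, even though the fitted probabilities turn out to be close.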

fitted values

> predict(fit.lm) #linear model
         1          2          3          4 
0.01724668 0.05680231 0.09635793 0.11613574 

> predict(fit.logit,type="response") #logit model
         1          2          3          4 
0.02050742 0.04429511 0.09305411 0.13243885 

> predict(fit.probit,type="response") #probit model
         1          2          3          4 
0.01967292 0.04599325 0.09518763 0.13099515 

All three models lead to the same conclusion: the more severe the snoring, the higher the estimated probability of heart disease.
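
To see the three fits side by side, one option is to put the observed proportions and the fitted probabilities in a single table, together with the residual deviance and AIC of each model. The sketch below refits the models so that it runs on its own. Passing data=snoring and the names comparison and fit_stats are choices made here, not part of the original session.

```r
# Rebuild the data so this block is self-contained
snoring <- data.frame(y = c(24, 35, 21, 30),
                      n = c(1355, 603, 192, 224),
                      x = c(0, 2, 4, 5))

# Same three models as above, with the data frame passed explicitly
fit.lm     <- glm(cbind(y, n) ~ x, family = binomial(link = "identity"), data = snoring)
fit.logit  <- glm(cbind(y, n) ~ x, family = binomial, data = snoring)
fit.probit <- glm(cbind(y, n) ~ x, family = binomial(link = "probit"), data = snoring)

# Observed proportions next to fitted probabilities (fitted() is on the response scale)
comparison <- data.frame(
  x        = snoring$x,
  observed = snoring$y / (snoring$y + snoring$n),
  linear   = fitted(fit.lm),
  logit    = fitted(fit.logit),
  probit   = fitted(fit.probit)
)
print(round(comparison, 4))

# Residual deviance and AIC for each link function
fit_stats <- data.frame(
  model    = c("linear", "logit", "probit"),
  deviance = c(deviance(fit.lm), deviance(fit.logit), deviance(fit.probit)),
  AIC      = c(AIC(fit.lm), AIC(fit.logit), AIC(fit.probit))
)
print(fit_stats)
```

In the summaries above, the identity-link model has the smallest residual deviance (0.069 on 2 degrees of freedom) and the lowest AIC (24.322), so for these four data points the linear probability model tracks the observed proportions most closely.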

Data source: P. G. Norton and E. V. Dunn, Br. Med. J., 291:630-632, 1985, published by BMJ Publishing Group
