Generalized Linear Models for Binary Data [Categorical Data Analysis]
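-π(x) denotes the probability of heart disease at snoring level x
-Generalized linear model - a way to use a model to examine the effects of explanatory variables on a categorical response variable
-The goal is to build a model that fits the data well!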
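For reference, the three link functions used in the fits below can be written out as follows. These are the standard textbook definitions, not notation taken from the post: α and β stand for a generic intercept and slope, and Φ is the standard normal CDF.

```latex
\begin{aligned}
\text{identity:}\quad & \pi(x) = \alpha + \beta x \\
\text{logit:}\quad    & \log\frac{\pi(x)}{1-\pi(x)} = \alpha + \beta x \\
\text{probit:}\quad   & \Phi^{-1}\bigl(\pi(x)\bigr) = \alpha + \beta x
\end{aligned}
```

All three share the same linear predictor α + βx and differ only in how it is mapped to the probability π(x).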
> y=c(24,35,21,30) #yes: heart disease cases
> n=c(1355,603,192,224) #no: cases without heart disease
> x=c(0,2,4,5) #snoring level converted to a numeric score
> snoring=data.frame(y=y,n=n,x=x) #collect the data in a data frame
> snoring
   y    n x
1 24 1355 0
2 35  603 2
3 21  192 4
4 30  224 5
> fit.lm=glm(cbind(y,n)~x,family=binomial(link="identity")) #fit the GLM with an identity link
> summary(fit.lm)

Call:
glm(formula = cbind(y, n) ~ x, family = binomial(link = "identity"))

Deviance Residuals:
       1         2         3         4
 0.04478  -0.21322   0.11010   0.09798

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.017247   0.003451   4.998 5.80e-07 ***
x           0.019778   0.002805   7.051 1.77e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 65.904481  on 3  degrees of freedom
Residual deviance:  0.069191  on 2  degrees of freedom
AIC: 24.322

Number of Fisher Scoring iterations: 3

So the fitted model is π(x) = 0.0172 + 0.0198x
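With an identity link the coefficients act directly on the probability scale, so the fitted values can be reproduced by hand. The following is a minimal sketch that hard-codes the estimates from the summary output above; the names `a`, `b`, and `pi_hat` are only illustrative.

```r
# Estimates copied from the identity-link summary output
a <- 0.017247   # intercept: estimated probability at snoring score 0
b <- 0.019778   # slope: change in probability per unit of snoring score

x <- c(0, 2, 4, 5)

# Under the identity link, pi(x) = a + b * x with no back-transformation
pi_hat <- a + b * x
print(round(pi_hat, 4))
```

Read this way, each one-unit increase in the snoring score adds about 0.02 to the estimated probability of heart disease.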
> fit.logit=glm(cbind(y,n)~x,family=binomial)
> summary(fit.logit)

Call:
glm(formula = cbind(y, n) ~ x, family = binomial)

Deviance Residuals:
      1        2        3        4
-0.8346   1.2521   0.2758  -0.6845

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.86625    0.16621 -23.261  < 2e-16 ***
x            0.39734    0.05001   7.945 1.94e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 65.9045  on 3  degrees of freedom
Residual deviance:  2.8089  on 2  degrees of freedom
AIC: 27.061

Number of Fisher Scoring iterations: 4

So the fitted model is logit(π(x)) = -3.86625 + 0.39734x
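The logit coefficients live on the log-odds scale, so they need a transformation before they can be read as odds or probabilities. A small sketch, again hard-coding the estimates from the output above (the variable names are illustrative):

```r
# Estimates copied from the logit summary output
a <- -3.86625
b <- 0.39734

# exp(slope) is the multiplicative change in the odds of heart disease
# for a one-unit increase in the snoring score
odds_ratio <- exp(b)
print(odds_ratio)

# plogis() is the inverse logit, 1 / (1 + exp(-eta)), which maps the
# linear predictor back to a probability
x <- c(0, 2, 4, 5)
pi_hat <- plogis(a + b * x)
print(round(pi_hat, 4))
```

The odds ratio comes out at roughly 1.49, meaning the estimated odds of heart disease rise by about 49% per unit of snoring score.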
> fit.probit=glm(cbind(y,n)~x,family=binomial(link="probit"))
> summary(fit.probit)

Call:
glm(formula = cbind(y, n) ~ x, family = binomial(link = "probit"))

Deviance Residuals:
      1        2        3        4
-0.6188   1.0388   0.1684  -0.6175

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.06055    0.07017 -29.367  < 2e-16 ***
x            0.18777    0.02348   7.997 1.28e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 65.9045  on 3  degrees of freedom
Residual deviance:  1.8716  on 2  degrees of freedom
AIC: 26.124

Number of Fisher Scoring iterations: 4

So the fitted model is probit(π(x)) = -2.06055 + 0.18777x
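For the probit fit, the linear predictor is a z-score, and the standard normal CDF turns it into a probability. A brief sketch using the estimates printed above (variable names are illustrative):

```r
# Estimates copied from the probit summary output
a <- -2.06055
b <- 0.18777

x <- c(0, 2, 4, 5)

# pnorm() is the standard normal CDF, i.e. the inverse of the probit link
z <- a + b * x
pi_hat <- pnorm(z)
print(round(pi_hat, 4))
```

Unlike the logit slope, the probit slope has no odds-ratio reading; it is the shift in the z-score per unit of snoring score.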
> predict(fit.lm) #linear model
         1          2          3          4
0.01724668 0.05680231 0.09635793 0.11613574
> predict(fit.logit,type="response") #logit model
         1          2          3          4
0.02050742 0.04429511 0.09305411 0.13243885
> predict(fit.probit,type="response") #probit model
         1          2          3          4
0.01967292 0.04599325 0.09518763 0.13099515
All three models lead to the same conclusion: the more severe the snoring, the higher the rate of heart disease.
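One way to check this conclusion against the raw data is to put the observed proportions next to the fitted probabilities of each model. The sketch below refits the three models from the post so that it runs on its own; the `comparison` and `fit_stats` data frames are illustrative additions, not part of the original analysis.

```r
# Data from the post: y = heart disease cases, n = cases without, x = snoring score
y <- c(24, 35, 21, 30)
n <- c(1355, 603, 192, 224)
x <- c(0, 2, 4, 5)

# Same three fits as in the post
fit.lm     <- glm(cbind(y, n) ~ x, family = binomial(link = "identity"))
fit.logit  <- glm(cbind(y, n) ~ x, family = binomial)
fit.probit <- glm(cbind(y, n) ~ x, family = binomial(link = "probit"))

# Observed proportion of heart disease versus fitted probabilities
comparison <- data.frame(
  x        = x,
  observed = y / (y + n),
  identity = fitted(fit.lm),
  logit    = fitted(fit.logit),
  probit   = fitted(fit.probit)
)
print(round(comparison, 4))

# Residual deviance and AIC for each link
fit_stats <- data.frame(
  model    = c("identity", "logit", "probit"),
  deviance = c(deviance(fit.lm), deviance(fit.logit), deviance(fit.probit)),
  AIC      = c(AIC(fit.lm), AIC(fit.logit), AIC(fit.probit))
)
print(fit_stats)
```

In the summaries quoted earlier, the identity-link fit has the smallest residual deviance (0.069 against 2.809 and 1.872) and the smallest AIC, so on these four data points it tracks the observed proportions most closely, although all three agree on the direction of the effect.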
Data source: P. G. Norton and E. V. Dunn, Br. Med. J., 291:630-632, 1985, published by BMJ Publishing Group