
The purpose of multiple regression analysis is to model the relationship between a dependent variable and multiple independent variables. In Chapters 12 and 13, we will study multiple regression and the general linear model as well as further regression topics. Search for a video, news item, or article (include the link in your discussion post) that gives you a better understanding of multiple regression or is an application in your field of study. Explain in your post why you chose this item and how your linked item corresponds to our Chapters 12 and 13 course objectives (attached). Then describe how you could use multiple regression in a life situation.

Chapter 12
Multiple Regression and the General Linear Model
STAT 441/541 Statistical Methods II
Sections Covered
• 12.1 Introduction
• 12.2 The General Linear Model
• 12.3 Estimating Multiple Regression Coefficients
• 12.4 Inferences in Multiple Regression
• 12.5 Testing a Subset of Regression Coefficients
• 12.6 Forecasting Using Multiple Regression
• 12.7 Comparing the Slopes of Several Regression Lines
Section 12.1 Introduction
• The multiple regression model relates a dependent
variable to a set of independent variables
y = β0 + β1x1 + β2x2 + ⋯ + βkxk + ε
• The only restriction is that no independent variable
is a perfect linear function of any other
independent variables
• The parameter β0 is the y-intercept and is the
expected value of y when each xj = 0. Only
meaningful if it makes sense to have each xj = 0
• The other parameters β1, ⋯, βk are partial slope
parameters and represent the expected change in y
for a unit increase in xj when all other x's are held
constant.
Note: Expected value is the same as average value
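As a quick illustration of the partial-slope interpretation, the Python sketch below uses a made-up fitted equation, ŷ = 2 + 3x1 + 5x2 (the coefficients are invented for illustration): increasing x1 by one unit while holding x2 fixed changes the prediction by exactly the partial slope β1.

```python
# Hypothetical fitted equation for illustration: y-hat = 2 + 3*x1 + 5*x2
def predict(x1, x2):
    return 2 + 3 * x1 + 5 * x2

# Increase x1 by one unit while holding x2 constant:
before = predict(x1=4, x2=10)
after = predict(x1=5, x2=10)
change = after - before   # equals the partial slope beta1 = 3
print(change)
```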
Examples of Multiple Regression
Models
• First-order model
y = β0 + β1x1 + β2x2 + β3x3 + ε
• Model with an interaction term
y = β0 + β1x1 + β2x2 + β3x1x2 + ε
• Polynomial model
y = β0 + β1x1 + β2x1² + β3x1³ + ε
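To make these model forms concrete, here is a small numpy sketch (the data values are invented for illustration) that builds the corresponding design matrices: the interaction model adds the cross-product column x1·x2, and the polynomial model adds powers of x1.

```python
import numpy as np

# Invented values for two predictors, purely for illustration
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([5.0, 4.0, 6.0, 8.0])
ones = np.ones_like(x1)

# First-order model columns: 1, x1, x2
X_first = np.column_stack([ones, x1, x2])

# Interaction model adds the cross-product column x1*x2
X_inter = np.column_stack([ones, x1, x2, x1 * x2])

# Polynomial model in x1 alone: 1, x1, x1^2, x1^3
X_poly = np.column_stack([ones, x1, x1**2, x1**3])

print(X_first.shape, X_inter.shape, X_poly.shape)
```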
Assumptions for Multiple
Regression
• The model has been properly specified
• The variance of the errors is σ² for all observations
• The errors are independent
• The error terms are normally distributed and there
are no outliers
Some Limitations of Regression
Analysis
• The existence of a relationship does not imply that
changes in the independent variables cause
changes in the dependent variable (cause and
effect)
• Do not use an estimated regression equation for
extrapolation outside the range of values for all
independent variables
Section 12.2 The General Linear
Model
• The general linear model has the form
y = β0 + β1x1 + β2x2 + ⋯ + βkxk + ε
• The xj's represent
• Quantitative independent variables (this may include
polynomial and cross-product terms)
y = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2² + ε
• Qualitative independent variables (dummy variables)
• Both quantitative and qualitative independent variables
• The least squares prediction equation is
ŷ = β̂0 + β̂1x1 + β̂2x2 + ⋯ + β̂kxk
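The slide mentions dummy variables for qualitative predictors. A minimal Python sketch, assuming a hypothetical three-level factor: with one level chosen as the baseline, two 0/1 indicator columns are enough to encode it.

```python
# Hypothetical qualitative variable with three levels; "low" is the baseline,
# so only two dummy (0/1) columns are needed.
data = ["low", "high", "medium", "low", "high"]

d_medium = [1 if v == "medium" else 0 for v in data]
d_high   = [1 if v == "high" else 0 for v in data]

print(d_medium)
print(d_high)
```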
Why is this called a general linear
model?
• The word “linear” in the general linear model refers
to how the β's are entered in the model
• The word “linear” does not refer to how the
independent variables appear in the model (since
there are polynomial and interaction terms)
• A general linear model is linear in the β's
• The β's do not appear as an exponent or as the
argument of a nonlinear function
• Nonlinear example: y = x1^(β1) x2^(β2) + ε
Note: The general linear model is used in Chapters 12
through 18
Section 12.3 Estimating Multiple
Regression Coefficients
• The multiple regression model relates a dependent
variable to a set of quantitative independent variables.
• For a random sample of n measurements, the ith
observation is
yi = β0 + β1xi1 + β2xi2 + ⋯ + βkxik + εi
for i = 1, 2, …, n; and n > k + 1,
where n = number of observations and
k = number of partial slope parameters in the model for
the x's
• The method of least squares is used to estimate all
coefficients in the model β0, β1, …, βk
• Each coefficient refers to the effect of changing that
variable while other independent variables stay
constant
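The least-squares computation itself can be sketched in a few lines of numpy. The data below are fabricated so that the true coefficients β0 = 1, β1 = 2, β2 = 3 are known and recovered exactly (no error term, for a clean check):

```python
import numpy as np

# Fabricated data generated exactly from y = 1 + 2*x1 + 3*x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```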
Model Standard Deviation
• It is important to estimate the model standard
deviation σε
• Residuals, ei, are used to estimate σε
• The estimate of the model standard deviation is
the square root of MS(Residual), also called
MS(Error)
Residual: ei = yi − ŷi
MS(Residual) = Σei² / (n − (k + 1))
sε = √MS(Residual)
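A numpy sketch of these formulas on invented data (k = 2 predictors, so the residual degrees of freedom are n − (k + 1) = n − 3):

```python
import numpy as np

# Invented data, for illustration only
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = np.array([4.1, 2.9, 9.2, 7.8, 14.1, 12.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta_hat                     # e_i = y_i - y-hat_i
n, k = len(y), 2                             # k = number of predictors
ms_residual = np.sum(resid**2) / (n - (k + 1))
s_eps = np.sqrt(ms_residual)                 # model standard deviation estimate
print(s_eps)
```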
Section 12.4 Inferences in
Multiple Regression
• Inferences about any of the parameters βj in the
general linear model are the same as for the simple
linear regression model
• The coefficient of determination, R², is the
proportion of the variation in the dependent
variable, y, that is explained by the model relating
y to x1, x2, …, xk
• Multicollinearity is present when the independent
variables are themselves highly correlated
Overall Model Test
• Hypotheses
• H0: β1 = β2 = ⋯ = βk = 0
• Ha: At least one βj ≠ 0
• Test Statistic: F = MS(Regression) / MS(Residual)
• Use p-value from output and compare to α
• Check assumptions and draw conclusions
According to the null hypothesis, none of the
variables included in the model has any predictive
value
If the null hypothesis is rejected, there is good
evidence of some degree of predictive value
somewhere among the independent variables
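A sketch of the overall F statistic in numpy, using invented data with k = 2 predictors. With an intercept in the model, SS(Total) splits into SS(Regression) + SS(Residual), so R² falls out of the same quantities:

```python
import numpy as np

# Invented data, for illustration only
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = np.array([4.1, 2.9, 9.2, 7.8, 14.1, 12.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

n, k = len(y), 2
ss_reg = np.sum((y_hat - y.mean())**2)   # regression sum of squares
ss_res = np.sum((y - y_hat)**2)          # residual sum of squares
F = (ss_reg / k) / (ss_res / (n - (k + 1)))
r_squared = ss_reg / (ss_reg + ss_res)
print(F, r_squared)
```

The p-value would then come from an F distribution with k and n − (k + 1) degrees of freedom, which statistical software reports directly.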
Effect of Multicollinearity
• If the independent variable xj is highly correlated
with one or more other independent variables,
then the parameter estimates are inaccurate and
have large standard errors
• The variance inflation factor (VIF) measures how
much the variance of a coefficient is increased
because of multicollinearity
• If VIF = 1, there is no multicollinearity
• If VIF > 10, there may be a serious problem
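The VIF for predictor xj can be computed as 1/(1 − Rj²), where Rj² comes from regressing xj on the remaining independent variables. A sketch with two deliberately near-collinear invented predictors:

```python
import numpy as np

# Two invented, nearly identical predictors (deliberate multicollinearity)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.1, 2.0, 3.2, 3.9, 5.1, 6.2])

# Regress x1 on x2 (with intercept) and compute R^2 for that regression
Z = np.column_stack([np.ones_like(x2), x2])
coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
fitted = Z @ coef
r2 = 1 - np.sum((x1 - fitted)**2) / np.sum((x1 - x1.mean())**2)

vif = 1 / (1 - r2)
print(vif)   # well above 10, signaling a serious problem
```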
Hypothesis Test for βj = 0
• Hypotheses
• Case 1: H0: βj ≤ 0 versus Ha: βj > 0
• Case 2: H0: βj ≥ 0 versus Ha: βj < 0
• Case 3: H0: βj = 0 versus Ha: βj ≠ 0
• Test Statistic: t value from software output
• Compare p-value from output to significance level α
• Reject the null hypothesis H0 if p-value ≤ α
(If p-value is low, H0 must go)
• Fail to reject the null hypothesis H0 if p-value > α
(If p-value is high, with H0 we must comply)
• Check assumptions and draw conclusions
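A numpy sketch of these t statistics on invented data: the standard error of β̂j is sε·√[(XᵀX)⁻¹]jj, and each t = β̂j / SE(β̂j) would be compared against a t distribution with n − (k + 1) degrees of freedom.

```python
import numpy as np

# Invented data, for illustration only
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = np.array([4.1, 2.9, 9.2, 7.8, 14.1, 12.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y            # normal-equations solution

n, k = len(y), 2
resid = y - X @ beta_hat
s2 = np.sum(resid**2) / (n - (k + 1))   # MS(Residual)
se = np.sqrt(s2 * np.diag(XtX_inv))     # standard error of each beta-hat

t_stats = beta_hat / se                 # one t per coefficient
print(t_stats)
```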
Hypothesis Test for βj = 0 (continued)
• The null hypothesis does not assert that the
independent variable xj has no predictive value by
itself
• It asserts that it has no additional predictive value
over and above that contributed by the other
independent variables in the model
• When two or more independent variables are
highly correlated among themselves, it often
happens that no xj can be shown to have unique
predictive value, even though the x's together have
been shown to be useful
Section 12.5 Testing a Subset of
Regression Coefficients
• The F test for a subset of regression coefficients
tests simultaneously whether several of the true
coefficients are zero
• If several of the independent variables have no
predictive value then they can be dropped from the
model
Test of a Subset of Independent
Variables
• Hypotheses
• H0: β(g+1) = β(g+2) = ⋯ = βk = 0
• Ha: The null hypothesis is not true
• Test Statistic: F test comparing complete and
reduced models
• Use p-value from output and compare to α
• Check assumptions and draw conclusions
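The complete-versus-reduced F statistic can be sketched directly: fit both models, take the drop in SSE per parameter tested, and divide by MS(Residual) from the complete model. The data are invented; the complete model has k = 2 predictors and the reduced model keeps g = 1.

```python
import numpy as np

# Invented data, for illustration only
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = np.array([4.1, 2.9, 9.2, 7.8, 14.1, 12.0])

def sse(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return np.sum(r**2)

ones = np.ones_like(x1)
sse_complete = sse(np.column_stack([ones, x1, x2]), y)  # x1 and x2
sse_reduced = sse(np.column_stack([ones, x1]), y)       # x1 only

n, k, g = len(y), 2, 1
F = ((sse_reduced - sse_complete) / (k - g)) / (sse_complete / (n - (k + 1)))
print(F)
```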
Section 12.6 Forecasting using
Multiple Regression
• One of the major uses for multiple regression
models is in forecasting a y-value given certain
values of the independent variables
• The best forecast is obtained by substituting the
specified values into the estimated regression equation
• The standard error of a forecast depends on the
interpretation of the forecast
Two Interpretations for Forecasts
• The forecast of y for given x-values can be
interpreted two ways
1. As the estimate for E(y), the long-run average
value from averaging many observations of y when
the x's have the specified values
2. As the predicted value for one individual case having
the given x-values
• We will use software to calculate confidence and
prediction intervals
Section 12.7 Comparing the
Slopes of Several Regression Lines
• This topic represents a special case of the general
problem of constructing a multiple regression
equation for both qualitative and quantitative
independent variables
• See Example 12.20 for a comparison of two drug
products (the qualitative variable with levels A & B)
and three doses (the quantitative variable with
levels of 5, 10, and 20 mg)
Chapter 13
Further Regression
Topics
STAT 441/541 Statistical Methods II
Sections Covered
• 13.1 Introduction
• 13.2 Selecting the Variables (Step 1)
• 13.3 Formulating the Model (Step 2)
• 13.4 Checking Model Assumptions (Step 3)
Section 13.1 Introduction
• This chapter is devoted to putting multiple
regression into practice.
• First, decide on the dependent variable and
candidate independent variables for the regression
equation
• Second, consideration is given to selecting the form
of the multiple regression equation
• Third, check for violation of the underlying
assumptions
Note: This is an iterative process
Section 13.2 Selecting the
Variables (Step 1)
• Perhaps the most critical decision in constructing a
multiple regression model is the initial selection of
independent variables
• Knowledge of the problem area is critically important in
the initial selection of data
• Identify the dependent variable
• Work with experts to determine what independent variables
affect the dependent variable
• A major consideration in selecting independent
variables is multicollinearity
• Use a scatterplot matrix and (Pearson) correlation matrix to
examine relationships among independent variables
• Use variance inflation factors (VIF) to diagnose
multicollinearity
Selection Procedures for
Independent Variables
• All possible regressions: all one variable models, all two
variable models, etc.
• Best subset regression: best one-variable model, best
two-variable model, etc.
• Backward elimination starts with all candidate
variables in the model and removes one variable at a time
until a reasonable regression model is found
• Stepwise regression starts by adding one variable at a time,
checks to see if any variables should be removed, and
continues until a reasonable regression model is found
Note: One procedure is not universally accepted as better
than the others
Criteria for selecting the best-fitting model
• Estimated error variance s²ε = MS(Residual), where
smaller is better
• Coefficient of determination R², where larger is better
• Adjusted R², which provides a penalty for each regression
coefficient included in the model, where larger is better
• Mallows' Cp statistic, where the best-fitting model should
have Cp ≈ p. For k explanatory variables, p = k + 1
• Akaike's information criterion (AIC), where smallest is best
• Bayesian information criterion (BIC), where smallest is
best
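As one example of these criteria, adjusted R² = 1 − (1 − R²)(n − 1)/(n − (k + 1)). The sketch below (with invented R² and sample-size values) shows the penalty at work: at the same raw R², the model with more coefficients scores lower.

```python
# Adjusted R^2 penalizes each added regression coefficient:
# adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - (k + 1))
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

# Same raw R^2 = 0.90 and n = 30, but different numbers of predictors:
print(adjusted_r2(0.90, n=30, k=2))    # fewer predictors, higher adj R^2
print(adjusted_r2(0.90, n=30, k=10))   # more predictors, lower adj R^2
```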
Section 13.3 Formulating the
Model (Step 2)
• This step refines the information gleaned from step 1
to develop a useful multiple regression model
• One technique to determine the form of each
independent variable is to examine the scatterplots
of residuals versus each independent variable
• Consider various transformations of the data
• Logarithmic transformations (usually the natural log)
• Inverse transformation of the dependent variable (1/y)
Section 13.4 Checking Model
Assumptions (Step 3)
• Use diagnostic plots as previously used
• Shapiro-Wilk test for normality of residuals
• Use hat values and Cook’s distance to detect data
points having high leverage and/or high influence
• Hat values greater than 2(k+1)/n are considered high
leverage (where k = number of independent variables
and n = number of observations)
• Cook's distances greater than one identify observations
that have high influence
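A numpy sketch of hat values on invented data containing one deliberately extreme x1 point: the hat values are the diagonal of H = X(XᵀX)⁻¹Xᵀ, they sum to k + 1, and the extreme point exceeds the common 2(k + 1)/n rule-of-thumb cutoff.

```python
import numpy as np

# Invented data; the last x1 value is deliberately extreme (high leverage)
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 20.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 1.0, 0.0, 3.0, 1.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
H = X @ np.linalg.inv(X.T @ X) @ X.T
hat = np.diag(H)                      # leverage of each observation

n, k = len(x1), 2
cutoff = 2 * (k + 1) / n              # rule-of-thumb threshold
print(hat.round(3), cutoff)
```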
