I'm brand new to statistics and taking my first class in almost 40 years, so I'm quite a bit behind the times. On top of that, I'm not very computer savvy and have very little experience doing anything technical on a computer beyond checking email. I have no programming experience, let alone any knowledge of R, so I'm having quite a bit of difficulty.

The class starts in 2 weeks, but the instructor sent out some questions that she feels we need to be able to answer in order to be successful in the class.

I would appreciate anyone taking the time to go over this with me, explain the answers, and show me how to go about completing it.

--------------

Here's what I have to do:

These are the URLs for the R function files and the .dat files needed to answer this (also posted at the bottom as embedded code from File Dropper):

http://www.filedropper.com/prefrontalgjt0z

http://www.filedropper.com/postfrontalgjt0z

http://www.filedropper.com/plotci1

http://www.filedropper.com/stepwise

These files are a set of variables from mountain windstorm events in Boulder, Colorado. The file predictor_names.txt is a list of the 28 variables, which correspond to the 28 columns of the 0z.dat files. The variables are described in mercer et al.pdf, which you have. The first step you must conduct is to read in the .dat file. One of the columns is labeled garbage data in the predictor_names.txt. This one should be removed. You should also grab the 0-12 hr peak wind speed (in m/s) column as a y. All variables in the list beginning with temperature advection, minus the garbage vector, should be used as predictors. Use them to answer the questions below.
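A minimal sketch of this first step in base R, for orientation only. The file name and every column position below are placeholders, not values taken from predictor_names.txt, and `split_windstorm` is a hypothetical helper. The stand-in matrix of random numbers is there only so the code runs without the real files.

```r
# Hypothetical helper: split a windstorm table into the response y and the
# predictor matrix X. All column positions are placeholders; look up the
# real ones in predictor_names.txt.
split_windstorm <- function(dat, y_col, garbage_col, first_pred_col) {
  dat <- as.matrix(dat)
  # Predictors: everything from the first predictor column to the end,
  # with the garbage column dropped.
  pred_cols <- setdiff(first_pred_col:ncol(dat), garbage_col)
  list(y = dat[, y_col], X = dat[, pred_cols, drop = FALSE])
}

# With the real data this would be something like (file name assumed):
# pre <- read.table("prefrontal_0z.dat")
# Stand-in data with 28 columns so the sketch runs on its own:
set.seed(1)
pre <- matrix(rnorm(50 * 28), nrow = 50, ncol = 28)

parts <- split_windstorm(pre, y_col = 1, garbage_col = 10, first_pred_col = 5)
dim(parts$X)     # rows x predictor columns
length(parts$y)  # one wind speed per event
```

The same call would then be repeated for the postfrontal file.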

a) Regarding the peak wind gust data (0-12 hour peak gusts)

i) How are the wind data distributed? Use a boxplot and five number summary on the prefrontal and postfrontal data to support your answer.
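For part i), base R has `fivenum()` and `boxplot()`. The sketch below uses made-up gust values (the `rgamma` draws are an assumption, not the Boulder data) in place of the real y vectors.

```r
# Made-up peak gusts in m/s, standing in for the real y columns.
set.seed(2)
y_pre  <- rgamma(60, shape = 9, scale = 2)
y_post <- rgamma(80, shape = 9, scale = 2.5)

# Five number summary: minimum, lower hinge, median, upper hinge, maximum.
five_pre  <- fivenum(y_pre)
five_post <- fivenum(y_post)
five_pre
five_post

# Side-by-side boxplots; a long upper whisker or points plotted above it
# would suggest a right-skewed distribution.
boxplot(list(prefrontal = y_pre, postfrontal = y_post),
        ylab = "0-12 hr peak gust (m/s)")
```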

ii) What is the frequency of a “severe” windstorm (peak wind gusts greater than 25 m/s)? Use both datasets combined to answer this.
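For part ii), one way to read "frequency" is the share of all events whose peak gust exceeds 25 m/s. A small sketch with invented numbers (not the real data):

```r
# Invented peak gusts in m/s for illustration only.
y_pre  <- c(12, 31, 18, 27, 22)
y_post <- c(26, 14, 35, 19, 24)

y_all  <- c(y_pre, y_post)   # both datasets combined
severe <- y_all > 25         # TRUE where the event counts as severe

sum(severe)    # number of severe events
mean(severe)   # proportion of severe events
```

Taking `mean()` of a TRUE/FALSE vector works because R treats TRUE as 1 and FALSE as 0.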

b) Use bootstrap confidence intervals on the means of each predictor to compare prefrontal with postfrontal environments. Which, if any, are statistically significantly different?
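The instructor's own files (plotci1 and so on) are presumably meant for this part; since their contents are not shown here, the sketch below swaps in a plain percentile bootstrap written in base R. `boot_ci_mean` is a hypothetical helper, the predictor matrices are simulated, and treating non-overlapping intervals as "significantly different" is a simple, conservative rule rather than the only valid test.

```r
# Hypothetical helper: percentile bootstrap confidence interval for a mean.
boot_ci_mean <- function(x, B = 2000, conf = 0.95) {
  # Resample x with replacement B times and record each resample's mean.
  boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
  alpha <- 1 - conf
  quantile(boot_means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Simulated predictors; only the third column truly differs between groups.
set.seed(3)
X_pre  <- matrix(rnorm(60 * 3), ncol = 3)
X_post <- matrix(rnorm(80 * 3, mean = rep(c(0, 0, 2), each = 80)), ncol = 3)

# One interval per column: row 1 is the lower bound, row 2 the upper bound.
ci_pre  <- apply(X_pre,  2, boot_ci_mean)
ci_post <- apply(X_post, 2, boot_ci_mean)

# Flag predictors whose two intervals do not overlap.
no_overlap <- ci_pre[2, ] < ci_post[1, ] | ci_post[2, ] < ci_pre[1, ]
which(no_overlap)
```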

c) Create a matrix that combines the predictors for prefrontal and postfrontal storms (rbind them together, prefrontal then postfrontal). Compute a correlation matrix on this new matrix (use the cor function to do this, see ?cor for details). Roughly how many predictors are highly correlated (greater than 0.5)?
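A sketch of the `rbind` and `cor` step on simulated matrices. The data are invented, and one correlated pair is forced in so there is something to find. Counting on the absolute value (so strong negative correlations also count) is an assumption about what "highly correlated" means here.

```r
# Simulated predictor matrices with the same columns in each.
set.seed(4)
X_pre  <- matrix(rnorm(60 * 4), ncol = 4)
X_post <- matrix(rnorm(80 * 4), ncol = 4)

# Stack the rows: prefrontal first, then postfrontal.
X_all <- rbind(X_pre, X_post)

# Force columns 1 and 2 to be strongly related, purely for the demo.
X_all[, 2] <- X_all[, 1] + rnorm(nrow(X_all), sd = 0.3)

R <- cor(X_all)   # square matrix of pairwise correlations

# Count each pair once by looking only above the diagonal.
high <- abs(R) > 0.5 & upper.tri(R)
sum(high)                     # number of highly correlated pairs
which(high, arr.ind = TRUE)   # which pairs they are (row, col)
```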

Now we will regress on those data vectors. Note I do not want you to do any cross-validation here.

a) Perform two separate stepwise regressions, one on the prefrontal data and one on the postfrontal data. Are the variables you kept the same for both? Which seems to be better based on the summary and anova statistics?
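The instructor supplied a stepwise file whose contents are not shown here, so the sketch below uses base R's `step()` (AIC-based selection from the stats package) as a stand-in; it may not match the course's function. The data frame is simulated, with only `x1` actually related to `y`.

```r
# Simulated "prefrontal" data: y depends on x1 only.
set.seed(5)
n <- 80
pre <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
pre$y <- 20 + 3 * pre$x1 + rnorm(n)

# Start from the model with every predictor, then let step() add or drop terms.
full    <- lm(y ~ ., data = pre)
fit_pre <- step(full, direction = "both", trace = 0)

formula(fit_pre)   # which predictors were kept
summary(fit_pre)   # coefficients, R-squared, residual standard error
anova(fit_pre)     # sum-of-squares table for the kept terms
```

Repeating the same lines on the postfrontal data gives a second model whose kept terms and summary statistics can be compared with the first.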

b) Using the predictor matrix you created in 1d, do a classification to determine the predictors’ ability to identify severe windstorms (wind speeds greater than 25 m/s). Create a logistic regression model using these data. Describe the quality of your regression using contingency statistics.
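A sketch of the logistic regression and a 2x2 contingency table using base R's `glm()`. The data are simulated, the 0.5 probability cutoff is an assumption, and the scores shown (probability of detection, false alarm ratio, critical success index, accuracy) are common contingency statistics that may or may not be the ones the course expects.

```r
# Simulated predictors and wind speeds; severe means wind above 25 m/s.
set.seed(6)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
wind   <- 22 + 4 * X$x1 + rnorm(n, sd = 2)
severe <- as.integer(wind > 25)

dat <- data.frame(severe = severe, X)
fit <- glm(severe ~ ., data = dat, family = binomial)

# Turn fitted probabilities into yes/no forecasts (cutoff of 0.5 assumed).
p_hat <- predict(fit, type = "response")
pred  <- as.integer(p_hat > 0.5)

# Contingency table with "1" (severe) first in both rows and columns.
tab <- table(forecast = factor(pred,   levels = c(1, 0)),
             observed = factor(severe, levels = c(1, 0)))
tab

hits         <- tab[1, 1]
false_alarms <- tab[1, 2]
misses       <- tab[2, 1]
correct_neg  <- tab[2, 2]

pod      <- hits / (hits + misses)                  # probability of detection
far      <- false_alarms / (hits + false_alarms)    # false alarm ratio
csi      <- hits / (hits + misses + false_alarms)   # critical success index
accuracy <- (hits + correct_neg) / n
c(pod = pod, far = far, csi = csi, accuracy = accuracy)
```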
