Posted: September 16th, 2017
Question 1 – Canadian Transport Accidents
Statistics Canada records the number of transport accidents involving dangerous goods
that occur in Canada every year. Such accidents must be reported to the government, and
Statistics Canada compiles the data both for Canada as a whole and for each of the
provinces.
The spreadsheet Transport_Accidents has data on transport accidents nationally and by
province for the period from 1987 to 2011. Data are provided for all transport modes and
separately for road, rail and air. There is also a category “Facility”, but we will not use it
in this assignment.
a) Using data in the spreadsheet, construct a contingency table of Transport Mode
(columns) by Province/Territory (rows) that shows how accidents in 2010 are distributed
across the three modes of transport and across provinces and territories. (Hint: include
only the three modes of transport (road, rail and air), and don’t forget to include row and
column totals.)
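As a cross-check on the Excel pivot table described in Note 2, the same aggregation can be sketched in Python. This is a minimal sketch that assumes the raw sheet has been exported as long-format rows of (year, province, mode, count); that layout, the function name `contingency_table` and the sample rows are hypothetical and are not taken from the Transport_Accidents spreadsheet.

```python
from collections import defaultdict

MODES = ("Road", "Rail", "Air")  # "Facility" is deliberately excluded


def contingency_table(records, year=2010, modes=MODES):
    """Build a province-by-mode table of counts with row and column totals.

    records: iterable of (year, province, mode, count) tuples (assumed layout).
    Returns (table, col_totals, grand_total); table[province] maps each mode
    and "Total" to a count.
    """
    table = defaultdict(lambda: {m: 0 for m in modes})
    for rec_year, province, mode, count in records:
        if rec_year == year and mode in modes:  # keep only the chosen year and modes
            table[province][mode] += count
    for row in table.values():
        row["Total"] = sum(row[m] for m in modes)  # row total
    col_totals = {m: sum(row[m] for row in table.values()) for m in modes}
    grand_total = sum(col_totals.values())
    return dict(table), col_totals, grand_total


# Hypothetical rows for illustration only (not real Statistics Canada counts).
rows = [
    (2010, "Province A", "Road", 12),
    (2010, "Province A", "Rail", 3),
    (2010, "Province A", "Facility", 7),  # dropped: Facility is excluded
    (2009, "Province A", "Road", 40),     # dropped: wrong year
    (2010, "Province B", "Air", 2),
    (2010, "Province B", "Road", 5),
]
table, col_totals, grand_total = contingency_table(rows)
for province, cells in table.items():
    print(province, cells)
print("Column totals:", col_totals, "Grand total:", grand_total)
```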
b) Create another contingency table of Transport Mode by Province with row and column
percentages. Be sure to add appropriate labels for each of the rows and columns, not just
percentages.
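Row and column percentages divide each cell by a different total: row percentages use the province's row total, and column percentages use the transport mode's column total. A minimal sketch with hypothetical counts (not real data); the function names are illustrative, and the code assumes no row or column total is zero.

```python
def row_percentages(table):
    """Express each cell as a percentage of its row total (each row sums to 100)."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Express each cell as a percentage of its column total (each column sums to 100)."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
    return {
        row_label: {c: 100 * cells[c] / col_totals[c] for c in columns}
        for row_label, cells in table.items()
    }


# Hypothetical counts (rows = provinces, columns = modes); not real data.
counts = {
    "Province A": {"Road": 12, "Rail": 3, "Air": 5},
    "Province B": {"Road": 30, "Rail": 6, "Air": 4},
}
rows_pct = row_percentages(counts)
cols_pct = column_percentages(counts)
for province in counts:
    print(province, "row %:", {m: round(p, 1) for m, p in rows_pct[province].items()})
    print(province, "col %:", {m: round(p, 1) for m, p in cols_pct[province].items()})
```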
c) Explain in words what the row percentages and column percentages mean. You can use
specific percentages as examples. Is it more appropriate to use row or column
percentages?
d) Does it appear that the proportions of accidents by transport mode differ across
provinces? Are Transport Mode and Province independent variables? Explain
your answer.
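One descriptive way to approach part d) is to compare each province's row percentages with the overall split across modes (column totals divided by the grand total). If the two variables were independent, every province would show roughly the same split, so large gaps suggest an association. The sketch below is a descriptive comparison rather than a formal test; the function name and the counts are hypothetical.

```python
def compare_to_marginal(table):
    """Return the overall column percentages and each row's gap from them.

    gaps[row][col] = (row percentage) - (overall percentage for that column),
    in percentage points. Under independence these gaps would be close to 0.
    """
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
    grand_total = sum(col_totals.values())
    marginal = {c: 100 * col_totals[c] / grand_total for c in columns}
    gaps = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        gaps[row_label] = {c: 100 * cells[c] / row_total - marginal[c] for c in columns}
    return marginal, gaps


# Hypothetical counts in which both rows have the same 2:1 Road:Rail split.
counts = {
    "Province A": {"Road": 20, "Rail": 10},
    "Province B": {"Road": 40, "Rail": 20},
}
marginal, gaps = compare_to_marginal(counts)
largest_gap = max(abs(g) for row in gaps.values() for g in row.values())
print(f"largest gap: {largest_gap:.2f} percentage points")
```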
Notes:
1) The spreadsheet has more data than you need for this question. Only data for 2010
and only for three transport modes (road, rail and air) should be used. Do not use data
for “Facility”.
2) Use Excel’s pivot tables to extract data for 2010.
3) This problem is designed to show students how to work with “raw” data downloaded
from a statistical database.
Question 2 – Canadian Exports (modified from Mini Case Study, p. 133)
Statistics on Canadian exports are used in a variety of applications from forecasting
Canada’s gross domestic product to foreign exchange earnings to planning capacity at
Canadian ports. Monthly data on exports for the period from January 1999 to
December 2008 are contained in the spreadsheet Canadian_Exports.
These data are sourced from Statistics Canada for four selected products: wheat, zinc,
fertilizer and industrial machinery. Exports are computed on both a “customs” basis and a
“balance of payments” basis. Customs data are based on the physical movement of goods
out of Canada, while balance of payments data are based on currency exchange for goods
and services exported by Canada.
a) Using monthly data, construct four separate graphs, one for each product category,
each showing both the customs and the balance of payments data series. Explain what you
observe and comment on any interesting features of the four time series plots.
b) Explain which basis of calculation (customs or balance of payments) would be
appropriate for planning capacity in Canadian ports.
c) Compute annual averages for Wheat based on customs and balance of payments
monthly data. Construct a graph of exports for Wheat based on annual data (the graph
should display customs and balance of payments time series data) and comment on how
the time series plot based on monthly data differs from the plot based on annual data.
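An annual average is the mean of the twelve monthly values in a year, computed separately for the customs and the balance of payments series. A minimal sketch, assuming the monthly data are available as (year, month, value) tuples; the layout, the function name and the sample values are hypothetical. Averaging within each year removes month-to-month variation, including any seasonal pattern, so the annual series will typically look smoother than the monthly one.

```python
from collections import defaultdict


def annual_averages(monthly):
    """Average the monthly values within each calendar year.

    monthly: iterable of (year, month, value) tuples (assumed layout).
    Returns {year: mean of that year's monthly values}, ordered by year.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, _month, value in monthly:
        sums[year] += value
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}


# Hypothetical monthly values (not from Canadian_Exports); run once per series.
monthly = [(1999, m, 100.0 + m) for m in range(1, 13)] + [(2000, m, 200.0) for m in range(1, 13)]
averages = annual_averages(monthly)
print(averages)
```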
d) Using monthly data, construct two histograms, one for Wheat and one for Industrial
Machinery exports, based on customs data. Label your charts clearly (title, x-axis, y-axis)
and choose appropriate intervals for the histograms. Describe the resulting distributions.
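Choosing histogram intervals amounts to choosing a starting point and a bin width, then counting the observations in each bin. The sketch below (hypothetical function name and values) counts values in half-open bins [lower, upper); Excel's histogram tools may assign a value that falls exactly on a bin edge to the other neighbouring bin, so counts at the edges can differ slightly.

```python
import math


def frequency_table(values, start, width):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    Returns a list of (lower, upper, count) tuples covering all values.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if start > min(values):
        raise ValueError("start must be at or below the smallest value")
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1  # index of the bin containing v
    return [(start + k * width, start + (k + 1) * width, c) for k, c in enumerate(counts)]


# Hypothetical values for illustration only.
for lower, upper, count in frequency_table([3, 7, 12, 15, 19], start=0, width=5):
    print(f"[{lower}, {upper}): {'*' * count}")
```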
e) Using monthly data, compute the mean, standard deviation, median and five-number
summary for each of Wheat and Industrial Machinery exports based on customs data.
Report your results in a properly labeled table.
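Python's standard `statistics` module can cross-check these summaries. In the minimal sketch below (hypothetical values, illustrative function name), `statistics.stdev` gives the sample standard deviation with divisor n - 1, and `statistics.quantiles(..., method="inclusive")` interpolates quartiles in the way Excel's QUARTILE.INC is generally understood to work. Hand methods from a textbook can give slightly different quartiles.

```python
import statistics


def summary(values):
    """Mean, sample standard deviation and five-number summary of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, divisor n - 1
        "min": min(values),
        "Q1": q1,
        "median": statistics.median(values),
        "Q3": q3,
        "max": max(values),
    }


# Hypothetical values for illustration only (not from Canadian_Exports).
example = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in summary(example).items():
    print(f"{name:>6}: {value:.3f}")
```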
f) Create a scatterplot of exports of Wheat versus Industrial Machinery based on customs
data and describe the scatterplot (shape, direction, strength and outliers). Compute the
correlation coefficient and comment on whether it is consistent with the scatterplot. Is
there an association between Wheat and Industrial Machinery exports? Can you say that one
causes the other or vice versa?
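The Pearson correlation coefficient (the value Excel's CORREL returns) can be computed directly from deviations about the means. A minimal sketch; the function name and the example values are illustrative. Keep in mind that r measures only the strength and direction of a linear association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # cross-products
    s_xx = sum((x - x_bar) ** 2 for x in xs)                       # spread of x
    s_yy = sum((y - y_bar) ** 2 for y in ys)                       # spread of y
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative values only: a perfect increasing and a perfect decreasing line.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3], [3, 2, 1]))
```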
Question 3 – Crime in Canada (modified from Mini Case Study, p. 171)
Is crime worse in larger cities compared to smaller ones? Many people tend to believe
this, but what do the data actually say? There are many types of crime, some worse than
others. We need a way of combining all types of crime, weighted according to how
severe the crime is.
Statistics Canada has developed a “crime severity index” to measure the seriousness of
crime. More serious crimes are assigned higher weights and less serious offences are
assigned lower weights, so the index reflects the overall severity of crime in a given city.
(If you are interested, read the report referred to in the mini case study to understand
how the index is computed.)
The spreadsheet Crime_in_Canada has the crime severity index and the population size
(in thousands) for selected cities in Canada. Use the data in the spreadsheet to answer the
following questions.
a) Construct a scatterplot with Crime Severity Index on the vertical axis and Population on
the horizontal axis. Label the axes and add a trendline.
b) Before constructing the scatterplot, state what relationship between Crime Severity
Index and Population you expected to see. Then describe the relationship shown in the
scatterplot. Summarize in one or two sentences why you think this type of relationship is
observed.
c) Compute the mean and standard deviation for both variables. Are the mean and the
standard deviation appropriate summaries of the two variables? (Hint: use a histogram
to check the overall shape of the distribution of each variable.)
d) Compute the correlation coefficient for the two variables. Is the correlation coefficient
consistent with the scatterplot?
e) Compute the slope and the intercept of the least-squares regression line by hand and
write the resulting regression equation. Compute the regression coefficients (slope and
intercept) using Excel and check that your results computed by hand are consistent with
the Excel output.
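By hand, the least-squares slope is b1 = Sxy / Sxx, where Sxy is the sum of cross-products of deviations from the means and Sxx is the sum of squared deviations of x; the intercept is b0 = y_bar - b1 * x_bar. The sketch below mirrors those steps so the intermediate sums can be compared with a hand calculation; Excel's SLOPE and INTERCEPT functions should give the same values. The function name and the data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept b0, slope b1) of the least-squares line y = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # cross-products
    s_xx = sum((x - x_bar) ** 2 for x in xs)                       # squared x deviations
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = s_xy / s_xx         # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data for illustration only (not from Crime_in_Canada).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```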
Question 4 – Association, Correlation and Simple Linear Regression
a) State whether each of the following statements is true or false. Explain your answer.
i. The correlation of -0.78 shows that there is almost no association between a
country’s GDP and Infant Mortality Rate.
ii. The correlation of -0.78 between GDP and Infant Mortality Rate implies that the
correlation between Infant Mortality Rates and GDP is 0.78.
iii. The correlation between GDP and Country is 0.44, showing a positive linear
relationship between the two variables.
iv. A very high correlation (r = 1.5) is observed between a country’s per capita GDP
and Living Standard Index.
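For reference, the sample correlation coefficient has the standard textbook form below; the symbols x_i, y_i, x-bar and y-bar are introduced here for illustration.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Swapping the roles of x and y leaves both the numerator and the denominator unchanged, and the Cauchy-Schwarz inequality bounds the absolute value of the numerator by the denominator, so r always lies between -1 and 1. The formula also requires numeric values for both variables.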
b) Data on fuel consumption (y) of a car at various speeds (x) were collected. Fuel
consumption is measured in litres of gasoline and speed is measured in kilometres per
hour. A simple linear regression was fitted to the data; the residuals of the model were
computed and appear in the table below.
Speed (x, km/h)   Residual
       65           10.09
       70            2.24
       75           -0.62
       80           -2.47
       85           -3.33
       90           -4.28
       95           -3.73
      100           -2.94
      105           -2.17
      110           -1.32
      115           -0.42
      120            0.57
      125            1.64
      130            2.76
      135            3.97
i. Make a scatterplot of the residuals versus speed. Describe the scatterplot.
ii. Compute the mean of the residuals. Explain why you get this result.
iii. Would you use the estimated linear regression line to predict fuel consumption
based on speed? Explain your answer.
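A short sketch for parts i) and ii), using the speeds and residuals exactly as listed in the table above. It computes the mean residual and counts runs of consecutive positive or negative residuals, which helps describe the pattern in the residual plot. Because the residuals are printed to two decimals, the computed mean is close to zero rather than exactly zero. The helper name `sign_runs` is illustrative.

```python
import statistics

# Speeds (km/h) and residuals as listed in the table above.
speeds = list(range(65, 140, 5))  # 65, 70, ..., 135
residuals = [10.09, 2.24, -0.62, -2.47, -3.33, -4.28, -3.73, -2.94,
             -2.17, -1.32, -0.42, 0.57, 1.64, 2.76, 3.97]
assert len(speeds) == len(residuals)

mean_residual = statistics.mean(residuals)


def sign_runs(values):
    """Count runs of consecutive values with the same sign (zero counts as non-positive)."""
    positive = [v > 0 for v in values]
    runs = 1
    for prev, cur in zip(positive, positive[1:]):
        if cur != prev:
            runs += 1
    return runs


# Text version of the residual plot: sign and size of each residual by speed.
for speed, resid in zip(speeds, residuals):
    print(f"{speed:>4} km/h  {'+' if resid > 0 else '-'}  {resid:6.2f}")
print(f"mean residual = {mean_residual:.4f}")
print(f"sign runs = {sign_runs(residuals)}")
```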