
Math3022

Question 1. [12 marks]

Which university campus tends to have warmer maximum temperatures in September? To test this, a random sample of 23 September days was taken from 2016–2020 and the maximum temperatures were recorded at the Newcastle University and Gosford weather stations, the nearest stations to the Callaghan and Central Coast campuses, respectively. The file campustemp.omv contains the data, which includes the following variables:

•   Year: the year of the recording

•   Month: the month of the recording

•   Day: the day of the recording

•   Newcastle: the reported daily maximum temperature at the Newcastle University weather station

•   Gosford: the reported daily maximum temperature at the Gosford weather station

(a)   [1 mark] Are the September daily maximum temperature observations at the Newcastle University and Gosford stations paired or independent? Write a sentence justifying your choice.

(b)   [6 marks] Is there evidence that the average daily maximum temperature in September differs between the Newcastle University and Gosford weather stations? Conduct the appropriate test in jamovi and include all relevant output. Be sure to define any parameters you use, state the null and alternative hypotheses, observed test statistic, null distribution, p-value, decision and provide an appropriate conclusion in plain language.

(c)   [2 marks] Report the 95% confidence interval for the average difference in the daily maximum temperature in September between the Newcastle University and Gosford weather stations. Write a sentence interpreting this interval in plain language.

(d)   [1 mark] Does your confidence interval from part (c) support the decision you made in part (b)?

(e)   [2 marks] What are the assumptions of your analyses in parts (b) and (c)? Are these assumptions met? Justify why or why not for each assumption, with appropriate references to jamovi output where needed.

Question 2. [14 marks]

Over time, the way that music has been consumed has changed, shifting from promotion via physical products, radio play, and music videos to digital downloads and streaming through audio and video platforms such as Spotify, Apple Music, and YouTube. To account for changing methods of consumption, various national charts have adapted their methodologies accordingly. Is there a shift in the typical lengths of songs that become popular which is associated with the changing ways of music consumption and chart accounting? To test this, a random sample of 50 songs that reached the Top 40 of the U.S. Billboard Hot 100 was taken from each of the 1980s, 1990s, 2000s, 2010s, and 2020s, with the data recorded in the file billboard.omv.

The file includes the following variables:

•   Artist: the name of the artist(s)

•   Song: the song title

•   Peak: the peak position on the Hot 100 chart

•   Decade: the decade in which the song reached the Top 40

•   Length: the length of the song, in seconds

(a)   [2 marks] Produce a side-by-side boxplot and descriptive statistics to explore the relationship between decade and song length of U.S. Top 40 hits. Describe the relationship between the length of the songs and the decade in which they charted on the Hot 100.

(b)   [6 marks] Is there evidence of a difference in average length of songs that charted in the U.S. Top 40 among the five decades assessed? Be sure to state the null and alternative hypotheses, test statistic, null distribution, p-value, decision and an appropriate conclusion in plain language.

(c)   [3 marks] If appropriate, perform post-hoc tests to determine which decades have significantly different average song lengths. If post-hoc tests are not appropriate, explain the purpose of a post-hoc test and why it’s not appropriate in this example.

(d)   [3 marks] What are the assumptions of the analysis performed in part (b)? State whether each assumption is reasonable with reference to appropriate jamovi output.
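For readers who want to cross-check the jamovi output by hand, the following is a minimal sketch of how a one-way ANOVA F statistic is assembled from between-group and within-group sums of squares. It assumes a one-way ANOVA is the analysis chosen for part (b). The function name `one_way_anova_f` and the song lengths are hypothetical placeholders, not values from billboard.omv, and only three groups are shown for brevity.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder song lengths in seconds (NOT the billboard.omv data)
lengths_by_decade = {
    "1980s": [240, 250, 260],
    "1990s": [230, 240, 250],
    "2020s": [180, 190, 200],
}

f_stat, df_between, df_within = one_way_anova_f(list(lengths_by_decade.values()))
print(f_stat, df_between, df_within)
```

Under the null hypothesis the statistic is compared with an F distribution on `df_between` and `df_within` degrees of freedom; the p-value itself is best read from the jamovi output.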

Question 3. [20 marks]

The file downloads.omv contains observations of the amount of time someone spent online and the amount of memory they downloaded for 40 randomly sampled clients. The variables included in the dataset are:

•   Time: the amount of time the client was online (in minutes)

•   Memory: the amount of memory downloaded while online, in megabytes

(a)   [2 marks] Generate a scatterplot with the amount of memory downloaded on the y-axis and the time online on the x-axis and add the fitted regression line. Briefly describe the relationship.

(b)   [3 marks] Write down the equation for the estimated regression line and provide an interpretation of the intercept and the slope coefficient.

(c)   [1 mark] Predict the amount of memory downloaded for a client who spends 20 minutes online.

(d)   [6 marks] Is there a statistically significant linear relationship between the amount of memory downloaded and the time someone spends online? Be sure to state the null and alternative hypotheses, test statistic, null distribution, p-value, decision and an appropriate conclusion in plain language.

(e)   [4 marks] State the assumptions necessary for your regression analysis in part (d) to be appropriate. State whether each of them is satisfied with a brief justification. This justification may refer to appropriate output from jamovi.

(f)   [2 marks] Provide a 95% confidence interval for the slope of the population regression line of the amount of memory downloaded on the amount of time someone spends online. Write an interpretation of this interval.

(g)   [2 marks] Write down the R² value for this regression and give an interpretation.
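As a way of checking the jamovi regression output by hand, the following is a minimal sketch of a least-squares fit that returns the intercept, slope, and R², then uses the fitted line for a prediction. The function name `fit_line` and the data values are hypothetical placeholders, not observations from downloads.omv.

```python
def fit_line(x, y):
    """Least-squares fit of y on x; returns (intercept, slope, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Placeholder values (NOT the downloads.omv data)
time_min = [10, 20, 30, 40, 50]      # time online, in minutes
memory_mb = [25, 45, 70, 85, 110]    # memory downloaded, in megabytes

intercept, slope, r_squared = fit_line(time_min, memory_mb)
predicted = intercept + slope * 20   # predicted memory for 20 minutes online
print(intercept, slope, r_squared, predicted)
```

The same quantities appear in jamovi's linear regression output, which also reports the standard errors needed for the test and confidence interval in parts (d) and (f).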
