If any of my solutions look wrong, please refer to the mark scheme.
Edexcel A-Level Statistics (Paper 31) – Oct 2020
Mark Scheme Legend
- M1: Method mark – knowing a method and attempting to apply it.
- A1: Accuracy mark – can only be awarded if the M mark is achieved.
- B1: Independent accuracy mark.
- ft: Follow through – marks awarded for correct working based on previous errors.
- (1): Total marks for that part.
Question 1 (8 marks)
The Venn diagram shows the probabilities associated with four events, \( A, B, C \) and \( D \).
(a) Write down any pair of mutually exclusive events from \( A, B, C \) and \( D \). (1)
Given that \( P(B) = 0.4 \)
(b) Find the value of \( p \). (1)
Given also that \( A \) and \( B \) are independent
(c) Find the value of \( q \). (2)
Given further that \( P(B' \mid C) = 0.64 \)
(d) Find
(i) the value of \( r \)
(ii) the value of \( s \)
(4)
Worked Solution
Step 1: Identifying Mutually Exclusive Events
What are we looking for? Events that cannot happen at the same time. Visually, their circles do not overlap.
Looking at the diagram:
- Circle \( A \) and Circle \( C \) do not overlap.
- Circle \( D \) is inside \( A \), and \( B \) is separate from \( D \).
- Circle \( D \) is inside \( A \), so it cannot touch \( C \) (since \( A \) doesn’t touch \( C \)).
Answer: \( A \) and \( C \) (or \( D \) and \( B \), or \( D \) and \( C \))
✓ (B1)
Step 2: Finding \( p \)
Why do this? We are given the total probability of event \( B \). The circle \( B \) contains three regions with probabilities \( 0.24 \), \( 0.07 \), and \( p \), which must sum to \( P(B) = 0.4 \).
\[ 0.24 + 0.07 + p = 0.4 \] \[ p = 0.4 - 0.31 \]
Final Answer: \( p = 0.09 \)
✓ (B1)
Step 3: Finding \( q \) using Independence
Why do this? Independence between \( A \) and \( B \) means \( P(A \cap B) = P(A) \times P(B) \). We know \( P(A \cap B) \) from the intersection and \( P(B) \).
From the diagram, \( P(A \cap B) = 0.24 \).
Using independence:
\[ P(A \cap B) = P(A) \times P(B) \] \[ 0.24 = P(A) \times 0.4 \] \[ P(A) = \frac{0.24}{0.4} = 0.6 \]
Now, \( P(A) \) is the sum of all regions in circle \( A \): \( q \), \( 0.16 \), and \( 0.24 \).
\[ P(A) = q + 0.16 + 0.24 \] \[ 0.6 = q + 0.40 \] \[ q = 0.6 - 0.40 \]
Final Answer: \( q = 0.20 \)
✓ (M1 A1)
Step 4: Finding \( r \) using Conditional Probability
Why do this? We are given \( P(B' \mid C) = 0.64 \). This is the probability of “not \( B \)” given that we are inside \( C \).
Inside circle \( C \), there are two regions: \( p \) (which is in \( B \)) and \( r \) (which is in \( B' \)).
\[ P(B' \mid C) = \frac{P(B' \cap C)}{P(C)} = \frac{r}{r + p} \]
We know \( p = 0.09 \). Substitute into the equation:
\[ \frac{r}{r + 0.09} = 0.64 \] \[ r = 0.64(r + 0.09) \] \[ r = 0.64r + 0.0576 \] \[ r - 0.64r = 0.0576 \] \[ 0.36r = 0.0576 \] \[ r = \frac{0.0576}{0.36} \] \[ r = 0.16 \]
Final Answer (i): \( r = 0.16 \)
✓ (M1 A1)
Step 5: Finding \( s \) (Sum of Probabilities)
Why do this? The sum of all probabilities in the Venn diagram must equal 1.
Summing all regions:
\[ q + 0.16 + 0.24 + 0.07 + p + r + s = 1 \]
The regions are disjoint, so each one is counted exactly once. Note that \( p \) lies in \( B \cap C \), not in \( A \), because \( A \) and \( C \) do not overlap. Grouping the regions:
- Regions in \( A \): \( 0.20 + 0.16 + 0.24 = 0.6 \)
- Region in \( B \) only: \( 0.07 \)
- Regions in \( C \): \( p + r = 0.09 + 0.16 = 0.25 \)
\[ 0.6 + 0.07 + 0.25 + s = 1 \] \[ 0.92 + s = 1 \] \[ s = 1 - 0.92 \]
Final Answer (ii): \( s = 0.08 \)
✓ (M1 A1)
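As a quick check of the arithmetic in Question 1, here is a minimal Python sketch that rebuilds \( p, q, r \) and \( s \) with exact fractions. The variable names (`a_rest`, `a_and_b`, `b_only`) are my own labels for the regions marked 0.16, 0.24 and 0.07; they do not appear in the question.

```python
from fractions import Fraction as F

# Region probabilities read from the Venn diagram (labels are illustrative).
a_rest, a_and_b, b_only = F(16, 100), F(24, 100), F(7, 100)
p_b = F(4, 10)  # given P(B) = 0.4

# (b) The three regions inside B sum to P(B).
p = p_b - a_and_b - b_only

# (c) Independence gives P(A) = P(A and B) / P(B); q is what remains of A.
p_a = a_and_b / p_b
q = p_a - a_rest - a_and_b

# (d)(i) P(B' | C) = r / (r + p) = 0.64, rearranged for r.
cond = F(64, 100)
r = cond * p / (1 - cond)

# (d)(ii) All disjoint regions sum to 1.
s = 1 - (q + a_rest + a_and_b + b_only + p + r)

print(float(p), float(q), float(r), float(s))
```

Using `Fraction` avoids floating-point rounding, so the results can be compared exactly with the answers above.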
Question 2 (7 marks)
A random sample of 15 days is taken from the large data set for Perth in June and July 1987.
The scatter diagram below displays the values of two of the variables for these 15 days.
Figure 1
(a) Describe the correlation. (1)
The variable on the \( x \)-axis is Daily Mean Temperature measured in °C.
(b) Using your knowledge of the large data set,
(i) suggest which variable is on the \( y \)-axis,
(ii) state the units that are used in the large data set for this variable. (2)
Stav believes that there is a correlation between Daily Total Sunshine and Daily Maximum Relative Humidity at Heathrow. He calculates the product moment correlation coefficient between these two variables for a random sample of 30 days and obtains \( r = -0.377 \).
(c) Carry out a suitable test to investigate Stav’s belief at a 5% level of significance. State clearly your hypotheses and your critical value. (3)
On a random day at Heathrow the Daily Maximum Relative Humidity was 97%.
(d) Comment on the number of hours of sunshine you would expect on that day, giving a reason for your answer. (1)
Worked Solution
Step 1: Interpreting the Scatter Diagram
What does the graph tell us? As the \( x \) values (Temperature) increase from 10 to 20, the \( y \) values generally decrease.
Answer: Negative correlation.
✓ (B1)
Step 2: Large Data Set Knowledge (Perth)
Context: Perth is in the Southern Hemisphere, so June and July are winter months. We are looking for a large data set variable that tends to decrease as Daily Mean Temperature increases.
Candidate variables in the large data set include rainfall, pressure and wind speed.
Reasoning: Of these, rainfall and pressure are plausible candidates for a negative relationship with temperature during a Perth winter.
Accepted Answers: Rainfall or Pressure.
(i) Variable: Rainfall (or Pressure)
(ii) Units: mm (if Rainfall) or hPa/mb (if Pressure)
✓ (B1)
✓ (B1)
Step 3: Hypothesis Testing for Correlation
Why do this? We want to test if the calculated correlation \( r = -0.377 \) is statistically significant. “Stav believes there is a correlation” implies a two-tailed test (he didn’t specify positive or negative).
Sample size: \( n = 30 \).
Level: 5%.
Hypotheses:
\[ H_0: \rho = 0 \quad (\text{No correlation}) \] \[ H_1: \rho \neq 0 \quad (\text{There is a correlation}) \]
Critical Value: From the Product Moment Correlation Coefficient table for \( n=30 \) at 5% significance (two-tailed):
\[ \text{Critical Value} = \pm 0.3610 \]
Comparison:
Our test statistic is \( r = -0.377 \).
Since \( |-0.377| > 0.3610 \) (or \( -0.377 < -0.3610 \)), the result is in the critical region.
Conclusion: Reject \( H_0 \).
There is sufficient evidence at the 5% level to support Stav’s belief that there is a correlation between Sunshine and Humidity.
✓ (B1 M1 A1)
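The decision rule for this two-tailed test can be sketched in a few lines of Python. The function name `pmcc_two_tailed_test` is hypothetical, and the critical value is typed in from the table rather than computed.

```python
def pmcc_two_tailed_test(r: float, critical_value: float) -> bool:
    """Return True when H0: rho = 0 is rejected in a two-tailed test."""
    # The critical region is r < -cv or r > +cv, i.e. |r| > cv.
    return abs(r) > critical_value

# Values from the worked solution: r = -0.377, table value 0.3610 for n = 30.
reject_h0 = pmcc_two_tailed_test(-0.377, 0.3610)
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Taking the absolute value is what makes the test two-tailed; a one-tailed test would compare the signed value of \( r \) with a different table entry.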
Step 4: Interpreting Correlation in Context
Scenario: Humidity is very high (97%).
Relationship: We found a significant negative correlation (\( r = -0.377 \)). This means as Humidity increases, Sunshine tends to decrease.
Comment: Expect a low amount of sunshine.
Reason: Because there is a negative correlation between humidity and sunshine (or simply, high humidity often implies cloud/fog).
✓ (B1)
Question 3 (10 marks)
Each member of a group of 27 people was timed when completing a puzzle. The time taken, \( x \) minutes, for each member of the group was recorded. These times are summarised in the following box and whisker plot.
(a) Find the range of the times. (1)
(b) Find the interquartile range of the times. (1)
For these 27 people \( \sum x = 607.5 \) and \( \sum x^2 = 17623.25 \)
(c) Calculate the mean time taken to complete the puzzle. (1)
(d) Calculate the standard deviation of the times taken to complete the puzzle. (2)
Taruni defines an outlier as a value more than 3 standard deviations above the mean.
(e) State how many outliers Taruni would say there are in these data, giving a reason for your answer. (1)
Adam and Beth also completed the puzzle in \( a \) minutes and \( b \) minutes respectively, where \( a > b \). When their times are included with the data of the other 27 people:
- the median time increases
- the mean time does not change
(f) Suggest a possible value for \( a \) and a possible value for \( b \), explaining how your values satisfy the above conditions. (3)
(g) Without carrying out any further calculations, explain why the standard deviation of all 29 times will be lower than your answer to part (d). (1)
Worked Solution
Step 1: Reading the Box Plot
We read the key values from the graph:
- Minimum value: 7
- Lower Quartile (Q1): 14
- Median: 20
- Upper Quartile (Q3): 25
- Maximum value (outlier): 68
(a) Range:
\[ \text{Range} = \text{Max} - \text{Min} = 68 - 7 = 61 \]
(b) Interquartile Range (IQR):
\[ \text{IQR} = Q3 - Q1 = 25 - 14 = 11 \]
(a) 61
(b) 11
Step 2: Calculating Mean and Standard Deviation
(c) Mean (\(\bar{x}\)):
\[ \bar{x} = \frac{\sum x}{n} = \frac{607.5}{27} = 22.5 \]
(d) Standard Deviation (\(\sigma\)):
\[ \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \] \[ \sigma = \sqrt{\frac{17623.25}{27} - 22.5^2} \] \[ \sigma = \sqrt{652.7129\ldots - 506.25} \] \[ \sigma = \sqrt{146.46\ldots} \approx 12.1 \]
(c) 22.5
(d) 12.1 (3 s.f.)
Step 3: Checking for Outliers
Definition: \( \text{Mean} + 3\sigma \)
\[ 22.5 + 3(12.1) = 22.5 + 36.3 = 58.8 \]
Any value greater than 58.8 is an outlier.
Looking at the box plot, there is one outlier at approx 68, and another at approx 48.
48 is less than 58.8.
68 is greater than 58.8.
Answer: 1 outlier.
Reason: Only the value at 68 is greater than \( \mu + 3\sigma \) (58.8).
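The mean, standard deviation and outlier threshold can be reproduced from the summary statistics with a short Python sketch. The list `high_points` holds my approximate readings of the two high points on the box plot (48 and 68), so treat those as assumptions rather than exact data.

```python
from math import sqrt

# Summary statistics given in the question.
n, sum_x, sum_x2 = 27, 607.5, 17623.25

mean = sum_x / n
sd = sqrt(sum_x2 / n - mean ** 2)

# Taruni's rule: an outlier is more than 3 standard deviations above the mean.
threshold = mean + 3 * sd

# Approximate values read from the box plot (assumed readings).
high_points = [48, 68]
outliers = [x for x in high_points if x > threshold]

print(round(mean, 1), round(sd, 1), round(threshold, 1), outliers)
```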
Step 4: New Values \( a \) and \( b \)
Constraints:
- \( a > b \)
- Mean stays the same.
- Median increases.
Mean Condition: For the mean to stay the same when adding two numbers, the mean of the two new numbers must equal the original mean.
\[ \frac{a + b}{2} = 22.5 \implies a + b = 45 \]
Median Condition: The original median is 20, which is the 14th of the 27 ordered values. Adding one value above and one value below would leave the median unchanged, so both new values must lie above it.
Currently, 13 below, 1 at 20, 13 above. With 29 values the median becomes the 15th ordered value, so if we add two values both greater than 20, the median will shift upwards.
We need \( a > 20 \) and \( b > 20 \) so that the median moves up.
We also need \( a + b = 45 \).
Example: Let \( b = 21 \). Then \( a = 24 \).
Check: \( 21 > 20 \) and \( 24 > 20 \). Sum = 45. \( a > b \). Correct.
Answer: e.g., \( a = 24, b = 21 \)
Step 5: Effect on Standard Deviation
The standard deviation measures the average spread from the mean.
The original SD is approx 12.1.
The new values \( a \) and \( b \) (e.g., 24 and 21) are very close to the mean (22.5). Their distance from the mean is small (e.g., \( |24-22.5|=1.5 \)).
Adding values that are closer to the mean than the current standard deviation will reduce the overall spread.
Answer: The new values are closer to the mean than the standard deviation (values are within 1 SD of the mean), so the overall standard deviation will decrease.
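The question asks for an explanation without calculation, but a numerical check can still be reassuring. This is a minimal sketch using the example values \( a = 24 \) and \( b = 21 \) from Step 4; other valid choices of \( a \) and \( b \) would give a different new standard deviation.

```python
from math import sqrt

# Summary statistics for the original 27 people.
n, sum_x, sum_x2 = 27, 607.5, 17623.25
a, b = 24, 21  # example values chosen in the worked solution

old_mean = sum_x / n
old_sd = sqrt(sum_x2 / n - old_mean ** 2)

# Update the sums to include Adam and Beth.
new_n = n + 2
new_mean = (sum_x + a + b) / new_n
new_sd = sqrt((sum_x2 + a ** 2 + b ** 2) / new_n - new_mean ** 2)

print(old_mean, new_mean)
print(round(old_sd, 2), round(new_sd, 2))
```

The mean is unchanged because \( a + b = 45 \), and the standard deviation falls because both new values sit well within one standard deviation of the mean.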
Question 4 (10 marks)
The discrete random variable \( D \) has the following probability distribution:
| \( d \) | 10 | 20 | 30 | 40 | 50 |
| \( P(D=d) \) | \( \frac{k}{10} \) | \( \frac{k}{20} \) | \( \frac{k}{30} \) | \( \frac{k}{40} \) | \( \frac{k}{50} \) |
where \( k \) is a constant.
(a) Show that the value of \( k \) is \( \frac{600}{137} \). (2)
The random variables \( D_1 \) and \( D_2 \) are independent and each have the same distribution as \( D \).
(b) Find \( P(D_1 + D_2 = 80) \). Give your answer to 3 significant figures. (3)
A single observation of \( D \) is made. The value obtained, \( d \), is the common difference of an arithmetic sequence. The first 4 terms of this arithmetic sequence are the angles, measured in degrees, of quadrilateral \( Q \).
(c) Find the exact probability that the smallest angle of \( Q \) is more than \( 50^\circ \). (5)
Worked Solution
Step 1: Finding \( k \)
The sum of probabilities for a discrete random variable must equal 1.
\[ \frac{k}{10} + \frac{k}{20} + \frac{k}{30} + \frac{k}{40} + \frac{k}{50} = 1 \]
Factor out \( k \) and find a common denominator (600):
\[ k \left( \frac{60}{600} + \frac{30}{600} + \frac{20}{600} + \frac{15}{600} + \frac{12}{600} \right) = 1 \] \[ k \left( \frac{60+30+20+15+12}{600} \right) = 1 \] \[ k \left( \frac{137}{600} \right) = 1 \] \[ k = \frac{600}{137} \]
Shown.
✓ (M1 A1)
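The value of \( k \) can be confirmed exactly with Python's `fractions` module. This is only a check on the arithmetic, not a substitute for the "show that" working above.

```python
from fractions import Fraction

values = [10, 20, 30, 40, 50]

# P(D = d) = k / d, so the probabilities sum to k * sum(1/d) = 1.
k = 1 / sum(Fraction(1, d) for d in values)

print(k)
```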
Step 2: Finding \( P(D_1 + D_2 = 80) \)
We need pairs of values \((d_1, d_2)\) from the table that sum to 80.
Possible values for \( D \): 10, 20, 30, 40, 50.
Pairs summing to 80:
- 30 + 50
- 40 + 40
- 50 + 30
Calculate probabilities for each pair:
\[ P(30, 50) = P(D=30) \times P(D=50) = \frac{k}{30} \times \frac{k}{50} = \frac{k^2}{1500} \] \[ P(40, 40) = \frac{k}{40} \times \frac{k}{40} = \frac{k^2}{1600} \] \[ P(50, 30) = \frac{k}{50} \times \frac{k}{30} = \frac{k^2}{1500} \]
Total Probability:
\[ P = k^2 \left( \frac{1}{1500} + \frac{1}{1600} + \frac{1}{1500} \right) \] \[ P = \left( \frac{600}{137} \right)^2 \left( \frac{2}{1500} + \frac{1}{1600} \right) \] \[ P \approx 19.181 \times (0.001333 + 0.000625) \] \[ P \approx 0.03756\ldots \]
Answer: 0.0376 (3 s.f.)
✓ (M1 M1 A1)
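To make sure no pair has been missed, the ordered pairs can be enumerated by brute force. This sketch assumes \( k = \frac{600}{137} \) from part (a); the dictionary name `pmf` is my own.

```python
from fractions import Fraction
from itertools import product

values = [10, 20, 30, 40, 50]
k = Fraction(600, 137)

# Probability mass function P(D = d) = k / d.
pmf = {d: k / d for d in values}

# Sum P(D1 = d1) * P(D2 = d2) over every ordered pair with d1 + d2 = 80.
prob = sum(
    pmf[d1] * pmf[d2]
    for d1, d2 in product(values, repeat=2)
    if d1 + d2 == 80
)

print(prob, round(float(prob), 4))
```

Iterating over ordered pairs counts (30, 50) and (50, 30) separately, which matches the three cases listed above.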
Step 3: Quadrilateral Angles
Problem: 4 angles in Arithmetic Sequence. Common difference \( d \). Sum of angles in a quadrilateral is \( 360^\circ \).
Let the angles be \( a, a+d, a+2d, a+3d \).
\[ \text{Sum} = 4a + 6d = 360 \] \[ 2a + 3d = 180 \implies 2a = 180 - 3d \implies a = 90 - 1.5d \]
We need the smallest angle (\( a \)) to be \( > 50^\circ \).
\[ 90 - 1.5d > 50 \] \[ 40 > 1.5d \] \[ d < \frac{40}{1.5} \] \[ d < 26.67 \]
The possible values for \( d \) (from the random variable \( D \)) are 10, 20, 30, 40, 50.
We require \( d < 26.67 \).
So, the possible values for \( d \) are 10 and 20.
We need the probability that \( D = 10 \) or \( D = 20 \).
\[ P(D=10) + P(D=20) = \frac{k}{10} + \frac{k}{20} \] \[ = k \left( \frac{2}{20} + \frac{1}{20} \right) = k \frac{3}{20} \] \[ = \frac{600}{137} \times \frac{3}{20} \] \[ = \frac{30}{137} \times 3 = \frac{90}{137} \]
Answer: \( \frac{90}{137} \)
✓ (M1 M1 A1 M1 A1)
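The same reasoning can be checked by testing each possible common difference directly. The helper `smallest_angle` is a hypothetical name for the formula \( a = 90 - 1.5d \) derived above.

```python
from fractions import Fraction

values = [10, 20, 30, 40, 50]
k = Fraction(600, 137)

def smallest_angle(d: int) -> Fraction:
    # Angles a, a+d, a+2d, a+3d sum to 360, so a = 90 - 1.5d.
    return 90 - Fraction(3, 2) * d

# Keep the common differences for which the smallest angle exceeds 50 degrees.
favourable = [d for d in values if smallest_angle(d) > 50]
prob = sum(k / d for d in favourable)

print(favourable, prob)
```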
Question 5 (15 marks)
A health centre claims that the time a doctor spends with a patient can be modelled by a normal distribution with a mean of 10 minutes and a standard deviation of 4 minutes.
(a) Using this model, find the probability that the time spent with a randomly selected patient is more than 15 minutes. (1)
Some patients complain that the mean time the doctor spends with a patient is more than 10 minutes.
The receptionist takes a random sample of 20 patients and finds that the mean time the doctor spends with a patient is 11.5 minutes.
(b) Stating your hypotheses clearly and using a 5% significance level, test whether or not there is evidence to support the patients’ complaint. (4)
The health centre also claims that the time a dentist spends with a patient during a routine appointment, \( T \) minutes, can be modelled by the normal distribution where \( T \sim N(5, 3.5^2) \).
(c) Using this model,
(i) find the probability that a routine appointment with the dentist takes less than 2 minutes (1)
(ii) find \( P(T < 2 | T > 0) \) (3)
(iii) hence explain why this normal distribution may not be a good model for \( T \). (1)
The dentist believes that she cannot complete a routine appointment in less than 2 minutes. She suggests that the health centre should use a refined model only including values of \( T > 2 \).
(d) Find the median time for a routine appointment using this new model, giving your answer correct to one decimal place. (5)
Worked Solution
Step 1: Normal Probability Calculation
Let \( X \) be the time spent.
\[ X \sim N(10, 4^2) \]
We want \( P(X > 15) \).
Using calculator (Normal CD):
- Lower: 15
- Upper: 10000
- \(\sigma\): 4
- \(\mu\): 10
Result: 0.105649…
Answer: 0.106 (3 s.f.)
✓ (B1)
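If you do not have a calculator to hand, the same probability can be found with `statistics.NormalDist` from the Python standard library. The variable name `doctor` is my own.

```python
from statistics import NormalDist

# Model for the time a doctor spends with a patient: N(10, 4^2).
doctor = NormalDist(mu=10, sigma=4)

# P(X > 15) is the complement of the cumulative probability at 15.
p_over_15 = 1 - doctor.cdf(15)

print(round(p_over_15, 3))
```

Using the complement avoids having to pick an arbitrary large upper limit such as 10000.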
Step 2: Hypothesis Test for Mean
Null Hypothesis: \( H_0: \mu = 10 \) (the mean is as claimed).
Alternative Hypothesis: \( H_1: \mu > 10 \) (the mean is greater, as the patients complain).
Distribution of Sample Mean \(\bar{X}\): \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\). Under \( H_0 \), with \( \sigma = 4 \) and \( n = 20 \), this gives \(\bar{X} \sim N(10, \frac{4^2}{20}) = N(10, 0.8)\).
We test the probability of getting the observed mean (11.5) or higher:
\[ P(\bar{X} > 11.5) \]
Using calculator: Lower: 11.5, Upper: 10000, \(\sigma\): \(\sqrt{0.8} \approx 0.8944\), \(\mu\): 10.
\[ P(\bar{X} > 11.5) = 0.04676\ldots \]
Compare to 5% (0.05):
\[ 0.0468 < 0.05 \]
The result is significant.
Conclusion: Reject \( H_0 \). There is sufficient evidence to support the patients’ complaint.
✓ (B1 M1 A1 A1)
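The one-tailed test can be sketched in Python as follows. The variable names are my own, and the sketch assumes the population standard deviation of 4 minutes from the health centre's model.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 10, 4, 20  # H0 mean, model standard deviation, sample size
sample_mean = 11.5
alpha = 0.05

# Under H0 the sample mean follows N(mu0, sigma^2 / n).
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# One-tailed p-value: probability of a sample mean at least this large.
p_value = 1 - sampling_dist.cdf(sample_mean)

print(round(p_value, 4), "reject H0" if p_value < alpha else "do not reject H0")
```

Note that `NormalDist` takes the standard deviation, not the variance, so the second argument is \( \sqrt{0.8} \) rather than 0.8.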
Step 3: Dentist Model Calculations
(i) \( P(T < 2) \):
Calc: Lower: -1000, Upper: 2, \(\sigma\): 3.5, \(\mu\): 5.
Result: 0.1956…
Answer: 0.196
(ii) \( P(T < 2 | T > 0) \):
Formula: \( \frac{P(0 < T < 2)}{P(T > 0)} \)
Numerator \( P(0 < T < 2) \): Calc (0 to 2) = \( 0.1191\ldots \)
Denominator \( P(T > 0) \): Calc (0 to 1000) = \( 0.9234\ldots \)
\[ \frac{0.1191\ldots}{0.9234\ldots} = 0.12899\ldots \]
Answer: 0.129
(iii) Validity:
The model gives \( P(T < 0) = 1 - 0.9234 = 0.0766 \), which is not negligible and is the reason the answers to (i) and (ii) differ. A negative time for an appointment is impossible, so the model is not good.
(i) 0.196
(ii) 0.129
(iii) Impossible negative times.
✓ (B1 M1 A1 A1 B1)
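All three parts of (c) can be checked with one short Python sketch. The variable names are my own.

```python
from statistics import NormalDist

# Model for a routine dentist appointment: T ~ N(5, 3.5^2).
dentist = NormalDist(mu=5, sigma=3.5)

# (i) P(T < 2)
p_less_2 = dentist.cdf(2)

# (ii) P(T < 2 | T > 0) = P(0 < T < 2) / P(T > 0)
p_positive = 1 - dentist.cdf(0)
p_cond = (dentist.cdf(2) - dentist.cdf(0)) / p_positive

# (iii) Probability the model assigns to impossible negative times.
p_negative = dentist.cdf(0)

print(round(p_less_2, 3), round(p_cond, 3), round(p_negative, 4))
```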
Step 4: Refined Model Median
Goal: Find median \( m \) for the truncated distribution \( T > 2 \).
The median splits the remaining probability in half.
Total probability for \( T > 2 \) is \( P(T > 2) \).
We calculate \( P(T > 2) = 1 - P(T < 2) = 1 - 0.1956 = 0.8044 \).
We need \( m \) such that \( P(2 < T < m) = 0.5 \times P(T > 2) = 0.5 \times 0.8044 = 0.4022 \).
So, the cumulative probability from \( -\infty \) to \( m \) in the original normal distribution must be:
\[ P(T < m) = P(T < 2) + 0.4022 \] \[ P(T < m) = 0.1956 + 0.4022 = 0.5978 \]
Using Inverse Normal on calculator:
- Area: 0.5978
- \(\sigma\): 3.5
- \(\mu\): 5
Result: \( m = 5.867… \)
Answer: 5.9 minutes (1 d.p.)
✓ (M1 M1 A1 M1 A1)
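The median of the refined model can be found the same way in Python, using `inv_cdf` in place of the calculator's Inverse Normal function. This is a sketch of the method above; the name `target` for the required cumulative probability is my own.

```python
from statistics import NormalDist

dentist = NormalDist(mu=5, sigma=3.5)

# Probability removed by restricting the model to T > 2.
p_below_2 = dentist.cdf(2)

# The median m splits the remaining probability in half:
# P(T < m) = P(T < 2) + 0.5 * P(T > 2).
target = p_below_2 + 0.5 * (1 - p_below_2)
median = dentist.inv_cdf(target)

print(round(median, 1))
```

As a sanity check, the median of the refined model is larger than 5, the median of the original model, because the lower tail has been removed.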