If any of my solutions look wrong, please refer to the mark scheme.
A-Level 2022 Edexcel Statistics Paper 31
Exam Guide
- Paper: 9MA0/31 (Statistics)
- Calculator: Allowed
- Total Marks: 50
- Format: Worked solutions with pedagogical explanations
Question 1 (6 marks)
George throws a ball at a target 15 times.
Each time George throws the ball, the probability of the ball hitting the target is 0.48.
The random variable \( X \) represents the number of times George hits the target in 15 throws.
(a) Find
(i) \( P(X = 3) \)
(ii) \( P(X \ge 5) \)
(3)
George now throws the ball at the target 250 times.
(b) Use a normal approximation to calculate the probability that he will hit the target more than 110 times.
(3)
Worked Solution
Step 1: Part (a) – Binomial Distribution
What are we asking?
We need to calculate probabilities for a Binomial distribution where \( n=15 \) trials and probability of success \( p=0.48 \). So, \( X \sim B(15, 0.48) \).
(i) Find \( P(X = 3) \):
Using the binomial formula \( P(X=r) = \binom{n}{r} p^r (1-p)^{n-r} \):
\[ P(X=3) = \binom{15}{3} (0.48)^3 (1-0.48)^{12} \]
Calculator method: Binomial PDF with \( x=3, n=15, p=0.48 \).
\[ P(X=3) = 0.019668… \]
\[ P(X=3) \approx 0.0197 \text{ (3 s.f.)} \]
(ii) Find \( P(X \ge 5) \):
Cumulative probabilities on calculators usually find \( P(X \le x) \).
\[ P(X \ge 5) = 1 - P(X \le 4) \]
Using calculator Binomial CDF with \( x=4, n=15, p=0.48 \):
\[ P(X \le 4) = 0.07986… \]
\[ P(X \ge 5) = 1 - 0.07986… = 0.92013… \]
\[ P(X \ge 5) \approx 0.920 \text{ (3 s.f.)} \]
Why do we do \( 1 - P(X \le 4) \)?
Because the total probability is 1. To find the probability of getting 5 or more, we remove the probability of getting 4 or fewer.
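As a cross-check on the calculator values in part (a), the binomial probabilities can be computed directly from the formula. This is a minimal Python sketch using only the standard library; the helper names `binom_pmf` and `binom_cdf` are illustrative and not part of the question or mark scheme.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p), straight from the binomial formula."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(r, n, p):
    """P(X <= r) for X ~ B(n, p), by summing the individual probabilities."""
    return sum(binom_pmf(k, n, p) for k in range(r + 1))

n, p = 15, 0.48
p_eq_3 = binom_pmf(3, n, p)       # part (a)(i)
p_ge_5 = 1 - binom_cdf(4, n, p)   # part (a)(ii): complement of "4 or fewer"

print(round(p_eq_3, 4), round(p_ge_5, 3))
```

The two printed values should agree with the calculator answers above to 3 significant figures.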
Step 2: Part (b) – Normal Approximation
Method:
When \( n \) is large and \( p \) is close to 0.5, we can approximate the Binomial distribution with a Normal distribution \( Y \sim N(\mu, \sigma^2) \).
We need to find the mean \( \mu \) and variance \( \sigma^2 \).
Given \( n = 250 \) and \( p = 0.48 \):
\[ \mu = np = 250 \times 0.48 = 120 \]
\[ \sigma^2 = np(1-p) = 250 \times 0.48 \times 0.52 = 62.4 \]
\[ \sigma = \sqrt{62.4} \approx 7.899 \]
So, \( Y \sim N(120, 62.4) \).
We want \( P(X > 110) \).
Continuity Correction: Since the Binomial is discrete and Normal is continuous, “more than 110” (meaning 111, 112…) starts at the boundary 110.5.
\[ P(X > 110) \approx P(Y > 110.5) \]
Standardise to find \( Z \):
\[ Z = \frac{110.5 - 120}{\sqrt{62.4}} = \frac{-9.5}{7.899…} \approx -1.2026 \]
Using calculator Normal CDF (Lower: 110.5, Upper: 9999, \( \mu: 120 \), \( \sigma: \sqrt{62.4} \)):
\[ P(Y > 110.5) = 0.88544… \]
Final Answer:
(a)(i) 0.0197
(a)(ii) 0.920
(b) 0.885
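The normal approximation in part (b) can be reproduced in a few lines. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the calculator's Normal CDF; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 250, 0.48
mu = n * p                      # mean of the approximating normal
sigma = sqrt(n * p * (1 - p))   # standard deviation, sqrt(62.4)

Y = NormalDist(mu, sigma)

# "More than 110" hits means 111, 112, ... so the continuity-corrected
# boundary is 110.5.
p_more_than_110 = 1 - Y.cdf(110.5)

print(round(p_more_than_110, 3))
```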
Question 2 (12 marks)
A manufacturer uses a machine to make metal rods.
The length of a metal rod, \( L \) cm, is normally distributed with:
- a mean of 8 cm
- a standard deviation of \( x \) cm
Given that the proportion of metal rods less than 7.902 cm in length is 2.5%
(a) show that \( x = 0.05 \) to 2 decimal places.
(2)
(b) Calculate the proportion of metal rods that are between 7.94 cm and 8.09 cm in length.
(1)
The cost of producing a single metal rod is 20p.
A metal rod
- where \( L < 7.94 \) is sold for scrap for 5p
- where \( 7.94 \le L \le 8.09 \) is sold for 50p
- where \( L > 8.09 \) is shortened for an extra cost of 10p and then sold for 50p
(c) Calculate the expected profit per 500 of the metal rods.
Give your answer to the nearest pound.
(5)
The same manufacturer makes metal hinges in large batches.
The hinges each have a probability of 0.015 of having a fault.
A random sample of 200 hinges is taken from each batch and the batch is accepted if fewer than 6 hinges are faulty.
The manufacturer’s aim is for 95% of batches to be accepted.
(d) Explain whether the manufacturer is likely to achieve its aim.
(4)
Worked Solution
Step 1: Part (a) – Finding Standard Deviation
We are given a Normal distribution \( L \sim N(8, x^2) \) and a probability \( P(L < 7.902) = 0.025 \).
We need to use the inverse normal function to find the Z-value corresponding to the bottom 2.5%.
Using Inverse Normal on calculator (Area = 0.025, \( \mu = 0, \sigma = 1 \)):
\[ z = -1.96 \]
Using the standardisation formula \( Z = \frac{X - \mu}{\sigma} \):
\[ -1.96 = \frac{7.902 - 8}{x} \]
\[ -1.96 = \frac{-0.098}{x} \]
\[ x = \frac{-0.098}{-1.96} = 0.05 \]
Therefore, \( x = 0.05 \) cm.
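The inverse normal step can be checked without rounding \( z \) first. A minimal sketch, assuming `statistics.NormalDist().inv_cdf` plays the role of the calculator's Inverse Normal function:

```python
from statistics import NormalDist

# z-value with 2.5% of the standard normal distribution below it
z = NormalDist().inv_cdf(0.025)

# Rearranging z = (7.902 - 8) / x for the standard deviation x
x = (7.902 - 8) / z

print(round(z, 2), round(x, 2))
```

Using the unrounded \( z \) gives a value of \( x \) very slightly different from 0.05, which is why the question asks for the result to 2 decimal places.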
Step 2: Part (b) – Finding Probability
We want \( P(7.94 \le L \le 8.09) \) with \( L \sim N(8, 0.05^2) \).
Using Normal CD on calculator:
- Lower: 7.94
- Upper: 8.09
- \( \sigma \): 0.05
- \( \mu \): 8
\[ P(7.94 \le L \le 8.09) = 0.8490… \approx 0.849 \]
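The calculator settings listed for part (b) correspond to a difference of two cumulative probabilities. A short sketch, again assuming `statistics.NormalDist` in place of the calculator:

```python
from statistics import NormalDist

L = NormalDist(mu=8, sigma=0.05)   # rod length in cm

# P(7.94 <= L <= 8.09) = P(L <= 8.09) - P(L <= 7.94)
p_between = L.cdf(8.09) - L.cdf(7.94)

print(round(p_between, 3))
```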
Step 3: Part (c) – Expected Profit
We need to find the probability for each outcome category and then calculate the profit (Selling Price – Cost) for each.
Base Cost: 20p per rod.
Category 1: Scrap (\( L < 7.94 \))
Probability \( p_1 = P(L < 7.94) \). Using calculator: \( p_1 = 0.11507... \)
Profit = Selling Price - Cost = \( 5\text{p} - 20\text{p} = -15\text{p} \) (Loss)
Category 2: Good Rods (\( 7.94 \le L \le 8.09 \))
Probability \( p_2 = 0.8490… \) (from part b)
Profit = \( 50\text{p} - 20\text{p} = 30\text{p} \)
Category 3: Shortened (\( L > 8.09 \))
Probability \( p_3 = P(L > 8.09) \). Using calculator: \( p_3 = 0.03593… \)
Profit = Selling Price - Base Cost - Extra Cost = \( 50\text{p} - 20\text{p} - 10\text{p} = 20\text{p} \)
Expected Profit per Rod:
\[ E(\text{Profit}) = (p_1 \times -15) + (p_2 \times 30) + (p_3 \times 20) \]
\[ E = (0.11507 \times -15) + (0.8490 \times 30) + (0.03593 \times 20) \]
\[ E = -1.726 + 25.47 + 0.7186 = 24.46 \text{ pence (2 d.p.)} \]
Total Profit for 500 rods (using the unrounded value of \( E \)):
\[ 500 \times 24.4626…\text{p} = 12231 \text{ pence} \]
\[ = \text{£}122.31 \]
To the nearest pound: £122
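The expected-profit calculation is a probability-weighted sum, which is easy to verify numerically. The sketch below recomputes the three probabilities rather than copying rounded values; the variable names are illustrative.

```python
from statistics import NormalDist

L = NormalDist(mu=8, sigma=0.05)   # rod length in cm

p_scrap = L.cdf(7.94)              # L < 7.94
p_short = 1 - L.cdf(8.09)          # L > 8.09
p_good = 1 - p_scrap - p_short     # 7.94 <= L <= 8.09

# Profit per rod in pence: selling price minus production cost
# (and minus the extra 10p for rods that need shortening).
profit_scrap = 5 - 20
profit_good = 50 - 20
profit_short = 50 - 20 - 10

expected_per_rod = (p_scrap * profit_scrap
                    + p_good * profit_good
                    + p_short * profit_short)
total_pounds = 500 * expected_per_rod / 100

print(round(expected_per_rod, 2), round(total_pounds))
```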
Step 4: Part (d) – Binomial Hypothesis Check
We are dealing with a sample of hinges where \( X \) is the number of faulty hinges.
\( X \sim B(200, 0.015) \)
A batch is accepted if \( X < 6 \) (meaning \( X \le 5 \)).
Calculate probability of acceptance:
\[ P(\text{Accepted}) = P(X \le 5) \]
Using calculator Binomial CD (\( x=5, n=200, p=0.015 \)):
\[ P(X \le 5) = 0.9176… \]
Comparison:
The manufacturer wants 95% (0.95) acceptance.
\[ 0.9176 < 0.95 \]
The probability of acceptance is 0.918, which is less than the target of 0.95.
Therefore, the manufacturer is unlikely to achieve its aim.
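The acceptance probability in part (d) can be confirmed by summing binomial probabilities directly. A minimal sketch with an illustrative helper `binom_cdf`:

```python
from math import comb

def binom_cdf(r, n, p):
    """P(X <= r) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

# A batch is accepted when fewer than 6 of the 200 sampled hinges are faulty.
p_accept = binom_cdf(5, 200, 0.015)
meets_aim = p_accept >= 0.95

print(round(p_accept, 4), meets_aim)
```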
Question 3 (7 marks)
Dian uses the large data set to investigate the Daily Total Rainfall, \( r \) mm, for Camborne.
(a) Write down how a value of \( 0 < r \le 0.05 \) is recorded in the large data set.
(1)
Dian uses the data for the 31 days of August 2015 for Camborne and calculates the following statistics:
\[ n = 31 \quad \sum r = 174.9 \quad \sum r^2 = 3523.283 \]
(b) Use these statistics to calculate
(i) the mean of the Daily Total Rainfall in Camborne for August 2015,
(ii) the standard deviation of the Daily Total Rainfall in Camborne for August 2015.
(3)
Dian believes that the mean Daily Total Rainfall in August is less in the South of the UK than in the North of the UK.
The mean Daily Total Rainfall in Leuchars for August 2015 is 1.72 mm to 2 decimal places.
(c) State, giving a reason, whether this provides evidence to support Dian’s belief.
(2)
Dian uses the large data set to estimate the proportion of days with no rain in Camborne for 1987 to be 0.27 to 2 decimal places.
(d) Explain why the distribution \( B(14, 0.27) \) might not be a reasonable model for the number of days without rain for a 14-day summer event.
(1)
Worked Solution
Step 1: Part (a) – Large Data Set Knowledge
In the Edexcel Large Data Set, very small amounts of rainfall (more than 0 mm but no more than 0.05 mm) are recorded as a “trace”.
It is recorded as tr (or trace).
Step 2: Part (b) – Mean and Standard Deviation
(i) Mean (\( \bar{r} \)):
\[ \bar{r} = \frac{\sum r}{n} = \frac{174.9}{31} \]
\[ \bar{r} = 5.6419… \approx 5.64 \text{ mm} \]
(ii) Standard Deviation (\( \sigma \)):
Formula: \( \sigma = \sqrt{\frac{\sum r^2}{n} - (\bar{r})^2} \)
\[ \sigma = \sqrt{\frac{3523.283}{31} - (5.6419…)^2} \]
\[ \sigma = \sqrt{113.654… - 31.831…} \]
\[ \sigma = \sqrt{81.82…} \]
\[ \sigma = 9.045… \approx 9.05 \text{ mm} \]
Step 3: Part (c) – Comparison
Dian believes South mean < North mean.
We need to know the locations: Camborne is in the South (Cornwall) and Leuchars is in the North (Scotland).
Camborne Mean (South) = 5.64 mm
Leuchars Mean (North) = 1.72 mm
The mean for Camborne (5.64) is greater than the mean for Leuchars (1.72).
This contradicts Dian’s belief, so there is no evidence to support it.
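The summary-statistic calculations from part (b) and the comparison in part (c) can be checked together. This sketch uses only the figures quoted in the question; the variable names are illustrative.

```python
from math import sqrt

n, sum_r, sum_r2 = 31, 174.9, 3523.283

mean_camborne = sum_r / n
# Standard deviation from summary statistics: sqrt(mean of squares - square of mean)
sd_camborne = sqrt(sum_r2 / n - mean_camborne**2)

mean_leuchars = 1.72   # given in the question, to 2 decimal places

# Dian's belief is that the South (Camborne) mean is less than the North (Leuchars) mean.
supports_belief = mean_camborne < mean_leuchars

print(round(mean_camborne, 2), round(sd_camborne, 2), supports_belief)
```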
Step 4: Part (d) – Binomial Assumptions
The Binomial distribution requires independent trials.
Weather patterns usually persist for several days.
The days are consecutive, so the weather on one day is not independent of the weather on the next (e.g., a dry spell).
Therefore, the independence assumption for the Binomial model is not valid.
Question 4 (6 marks)
A dentist knows from past records that 10% of customers arrive late for their appointment.
A new manager believes that there has been a change in the proportion of customers who arrive late for their appointment.
A random sample of 50 of the dentist’s customers is taken.
(a) Write down
- a null hypothesis corresponding to no change in the proportion of customers who arrive late
- an alternative hypothesis corresponding to the manager’s belief
(1)
(b) Using a 5% level of significance, find the critical region for a two-tailed test of the null hypothesis in (a).
You should state the probability of rejection in each tail, which should be less than 0.025.
(3)
(c) Find the actual level of significance of the test based on your critical region from part (b).
(1)
The manager observes that 15 of the 50 customers arrived late for their appointment.
(d) With reference to part (b), comment on the manager’s belief.
(1)
Worked Solution
Step 1: Part (a) – Hypotheses
\( H_0 \) is the default position (no change). \( H_1 \) reflects the change.
\( H_0: p = 0.1 \)
\( H_1: p \neq 0.1 \)
Step 2: Part (b) – Critical Region
We assume \( X \sim B(50, 0.1) \).
We need a two-tailed test at 5% significance, so we look for tails with probability \( < 0.025 \) (2.5%) at each end.
Lower Tail: Look for \( P(X \le L) < 0.025 \)
Using cumulative binomial tables or calculator:
\[ P(X=0) = 0.00515… \]
\[ P(X \le 1) = 0.0337… \]
Since \( 0.0337 > 0.025 \), the critical region is only \( X = 0 \).
Upper Tail: Look for \( P(X \ge U) < 0.025 \)
This means \( 1 - P(X \le U-1) < 0.025 \) or \( P(X \le U-1) > 0.975 \).
\[ P(X \le 8) = 0.9421… \]
\[ P(X \le 9) = 0.9754… \]
So \( U-1 = 9 \Rightarrow U = 10 \).
Check: \( P(X \ge 10) = 1 - 0.9754 = 0.0245… \), which is \( < 0.025 \).
Critical Region: \( X = 0 \) or \( X \ge 10 \)
Step 3: Part (c) – Actual Significance Level
The actual significance level is the sum of the tail probabilities of the critical region found in part (b).
\[ P(X = 0) + P(X \ge 10) = 0.00515… + 0.0245… = 0.0297 \]
Actual Significance Level = 0.0297 (or 2.97%)
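The search for the critical region and the actual significance level can be automated. The following is a sketch under the assumption \( X \sim B(50, 0.1) \) from part (b); `binom_pmf`, `lower`, and `upper` are illustrative names.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p, tail = 50, 0.1, 0.025
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Lower tail: the largest c with P(X <= c) < 0.025
lower = max(c for c in range(n + 1) if sum(pmf[:c + 1]) < tail)

# Upper tail: the smallest c with P(X >= c) < 0.025
upper = min(c for c in range(n + 1) if sum(pmf[c:]) < tail)

# Actual significance level: total probability of landing in either tail
actual_level = sum(pmf[:lower + 1]) + sum(pmf[upper:])

print(lower, upper, round(actual_level, 4))
```

The observed value in part (d) can then be tested with `15 <= lower or 15 >= upper`.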
Step 4: Part (d) – Conclusion
Observed value is 15. We check if 15 lies in the critical region.
The critical region is \( X = 0 \) or \( X \ge 10 \).
15 is in the critical region (\( 15 \ge 10 \)).
Therefore, we reject \( H_0 \).
There is evidence to support the manager’s belief that there has been a change in the proportion.
Question 5 (10 marks)
A company has 1825 employees. The employees are classified as professional, skilled or elementary.
The following table shows the number of employees in each classification and the two areas, A or B, where they live.
| | A | B |
|---|---|---|
| Professional | 740 | 380 |
| Skilled | 275 | 90 |
| Elementary | 260 | 80 |
An employee is chosen at random. Find the probability that this employee
(a) is skilled,
(1)
(b) lives in area B and is not a professional.
(1)
Some classifications of employees are more likely to work from home.
- 65% of professional employees in both area A and area B work from home
- 40% of skilled employees in both area A and area B work from home
- 5% of elementary employees in both area A and area B work from home
Event \( F \) is that the employee is a professional.
Event \( H \) is that the employee works from home.
Event \( R \) is that the employee is from area A.
(c) Using this information, complete the Venn diagram below.
(4)
(d) Find \( P(R' \cap F) \)
(1)
(e) Find \( P([H \cup R]') \)
(1)
(f) Find \( P(F | H) \)
(2)
Worked Solution
Step 1: Parts (a) and (b) – Table Probabilities
Total employees = 1825.
(a) P(Skilled):
Total Skilled = \( 275 (A) + 90 (B) = 365 \).
\[ P(\text{Skilled}) = \frac{365}{1825} = \frac{1}{5} = 0.2 \]
(b) P(Area B and Not Professional):
Area B Not Prof = Skilled in B + Elementary in B = \( 90 + 80 = 170 \).
\[ P = \frac{170}{1825} = \frac{34}{365} \approx 0.0932 \]
Step 2: Part (c) – Completing the Venn Diagram
We need to calculate the missing values for the intersection of sets H (Home), R (Area A), and F (Professional).
Given logic:
- F is Professional.
- R is Area A (so R’ is Area B).
- H is Works from Home.
Center Region (\( H \cap R \cap F \)): Professional, Area A, Works from Home.
Total Prof in A = 740. Rate = 65%.
\[ 0.65 \times 740 = 481 \]
Region \( R \cap F \) outside H: Professional, Area A, Not Home.
\[ 740 - 481 = 259 \]
Region \( H \) only (outside R and F): Not Prof (Skilled/Elem), Area B, Works from Home.
Skilled in B (90) home rate 40%: \( 0.4 \times 90 = 36 \)
Elem in B (80) home rate 5%: \( 0.05 \times 80 = 4 \)
\[ \text{Total} = 36 + 4 = 40 \]
Region Outside All (\( [H \cup R \cup F]' \)): Not Prof, Area B, Not Home.
Skilled in B not home: \( 90 - 36 = 54 \)
Elem in B not home: \( 80 - 4 = 76 \)
\[ \text{Total} = 54 + 76 = 130 \]
Step 3: Part (d) – \( P(R' \cap F) \)
\( R' \) is Area B. \( F \) is Professional. So we want Professionals in Area B.
Looking at the diagram, this corresponds to the regions in F but outside R.
These are \( H \cap F \cap R' \) (247) and \( F \text{ only} \) (133).
\[ 247 + 133 = 380 \]
Or directly from table: Professionals in B = 380.
\[ P = \frac{380}{1825} \approx 0.208 \]
Step 4: Part (e) – \( P([H \cup R]') \)
This is everything outside circles H and R.
From the diagram, these are the regions: F only (133) and Outside everything (130).
\[ 133 + 130 = 263 \]
\[ P = \frac{263}{1825} \approx 0.144 \]
Step 5: Part (f) – \( P(F | H) \)
Conditional probability: \( P(F | H) = \frac{n(F \cap H)}{n(H)} \).
We are restricting our total to just circle H.
Total in H: \( 40 + 123 + 481 + 247 = 891 \)
In intersection \( F \cap H \): \( 481 + 247 = 728 \)
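The two counts above can be rebuilt from the table and the work-from-home rates, which also gives a check on the Venn diagram regions. This is a sketch only; the dictionary layout and names are illustrative choices, not part of the question.

```python
# Employee counts from the table, by classification and area
counts = {
    "Professional": {"A": 740, "B": 380},
    "Skilled": {"A": 275, "B": 90},
    "Elementary": {"A": 260, "B": 80},
}
# Proportion of each classification working from home (same in both areas)
home_rate = {"Professional": 0.65, "Skilled": 0.40, "Elementary": 0.05}

# Number working from home in each classification/area cell
home = {
    job: {area: round(n * home_rate[job]) for area, n in areas.items()}
    for job, areas in counts.items()
}

n_H = sum(sum(areas.values()) for areas in home.values())   # everyone in H
n_F_and_H = sum(home["Professional"].values())              # professionals in H
p_F_given_H = n_F_and_H / n_H

print(n_H, n_F_and_H, round(p_F_given_H, 3))
```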
\[ P(F | H) = \frac{728}{891} \approx 0.817 \]
Question 6 (9 marks)
Anna is investigating the relationship between exercise and resting heart rate.
She takes a random sample of 19 people in her year at school and records for each person:
- their resting heart rate, \( h \) beats per minute
- the number of minutes, \( m \), spent exercising each week
Her results are shown on the scatter diagram.
(a) Interpret the nature of the relationship between \( h \) and \( m \).
(1)
Anna codes the data using the formulae:
\[ x = \log_{10} m \]
\[ y = \log_{10} h \]
The product moment correlation coefficient between \( x \) and \( y \) is -0.897.
(b) Test whether or not there is significant evidence of a negative correlation between \( x \) and \( y \).
You should
- state your hypotheses clearly
- use a 5% level of significance
- state the critical value used
(3)
The equation of the line of best fit of \( y \) on \( x \) is
\[ y = -0.05x + 1.92 \]
(c) Use the equation of the line of best fit of \( y \) on \( x \) to find a model for \( h \) on \( m \) in the form
\[ h = am^k \]
where \( a \) and \( k \) are constants to be found.
(5)
Worked Solution
Step 1: Part (a) – Interpretation
We look at the trend in the scatter diagram.
As the minutes of exercise (\( m \)) increase, the resting heart rate (\( h \)) decreases.
It suggests a negative correlation, but the curve flattens out, suggesting the effect diminishes as exercise increases.
Step 2: Part (b) – Hypothesis Test
We are testing for negative correlation using the PMCC (\( \rho \)).
Sample size \( n = 19 \). Significance level 5%.
Hypotheses:
\[ H_0: \rho = 0 \]
\[ H_1: \rho < 0 \]
Critical Value:
Using PMCC tables for \( n=19 \) at 5% (one-tailed):
Critical Value = -0.3887
Conclusion:
Test statistic \( r = -0.897 \).
Since \( -0.897 < -0.3887 \), the result is in the critical region.
Reject \( H_0 \). There is significant evidence of a negative correlation.
Step 3: Part (c) – Power Model
We are given a linear equation in logs: \( y = -0.05x + 1.92 \).
We need to convert this back to the form \( h = am^k \).
Recall: \( y = \log_{10} h \) and \( x = \log_{10} m \).
Substitute the log definitions into the equation:
\[ \log_{10} h = -0.05 \log_{10} m + 1.92 \]
Use log laws to rewrite \( -0.05 \log_{10} m \):
\[ \log_{10} h = \log_{10} (m^{-0.05}) + 1.92 \]
To combine the 1.92, write it as a log:
\[ 1.92 = \log_{10} (10^{1.92}) \]
So:
\[ \log_{10} h = \log_{10} (m^{-0.05}) + \log_{10} (10^{1.92}) \]
\[ \log_{10} h = \log_{10} (10^{1.92} \times m^{-0.05}) \]
Remove logs:
\[ h = 10^{1.92} \times m^{-0.05} \]
Calculate \( a = 10^{1.92} \):
\[ a \approx 83.176… \]
Comparing to \( h = am^k \):
\[ a \approx 83.2 \quad \text{and} \quad k = -0.05 \]
(Accept \( a \) in range 83.17 – 83.2)
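A quick numerical check confirms that the model \( h = am^k \) agrees with the original line in the coded variables. In this sketch the function name `h_model` and the test value \( m = 100 \) are hypothetical, chosen only for illustration.

```python
from math import log10, isclose

a = 10 ** 1.92   # from the intercept of the line of best fit
k = -0.05        # from the gradient of the line of best fit

def h_model(m):
    """Resting heart rate predicted by the model h = a * m**k."""
    return a * m ** k

m = 100  # hypothetical number of minutes of exercise per week
x = log10(m)
y_from_line = -0.05 * x + 1.92      # y = log10(h) from the line of best fit
y_from_model = log10(h_model(m))    # the same quantity via h = a * m**k

print(round(a, 1), isclose(y_from_line, y_from_model))
```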