If any of my solutions look wrong, please refer to the mark scheme.

A Level 2022 Edexcel Statistics Paper 31

Exam Guide

  • Paper: 9MA0/31 (Statistics)
  • Calculator: Allowed
  • Total Marks: 50
  • Format: Worked solutions with pedagogical explanations

Question 1 (6 marks)

George throws a ball at a target 15 times.

Each time George throws the ball, the probability of the ball hitting the target is 0.48.

The random variable \( X \) represents the number of times George hits the target in 15 throws.

(a) Find

(i) \( P(X = 3) \)

(ii) \( P(X \ge 5) \)

(3)


George now throws the ball at the target 250 times.

(b) Use a normal approximation to calculate the probability that he will hit the target more than 110 times.

(3)

Worked Solution

Step 1: Part (a) – Binomial Distribution

What are we asking?

We need to calculate probabilities for a Binomial distribution where \( n=15 \) trials and probability of success \( p=0.48 \). So, \( X \sim B(15, 0.48) \).

(i) Find \( P(X = 3) \):

Using the binomial formula \( P(X=r) = \binom{n}{r} p^r (1-p)^{n-r} \):

\[ P(X=3) = \binom{15}{3} (0.48)^3 (1-0.48)^{12} \]

Calculator method: Binomial PDF with \( x=3, n=15, p=0.48 \).

\[ P(X=3) = 0.019668… \] \[ P(X=3) \approx 0.0197 \text{ (3 s.f.)} \]

(ii) Find \( P(X \ge 5) \):

Cumulative probabilities on calculators usually find \( P(X \le x) \).

\[ P(X \ge 5) = 1 - P(X \le 4) \]

Using calculator Binomial CDF with \( x=4, n=15, p=0.48 \):

\[ P(X \le 4) = 0.07986… \] \[ P(X \ge 5) = 1 - 0.07986… = 0.92013… \] \[ P(X \ge 5) \approx 0.920 \text{ (3 s.f.)} \]

Why do we do \( 1 - P(X \le 4) \)?

Because the total probability is 1. To find the probability of getting 5 or more, we remove the probability of getting 4 or fewer.
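
The two calculator results above can be cross-checked with a short script. This is only a sketch: it uses Python's standard library in place of the calculator's Binomial PD/CD functions, and the helper names `binom_pmf` and `binom_cdf` are illustrative, not part of any exam formula booklet.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), straight from the binomial formula."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p), summing the individual probabilities."""
    return sum(binom_pmf(r, n, p) for r in range(x + 1))


n, p = 15, 0.48
p_exactly_3 = binom_pmf(3, n, p)
p_at_least_5 = 1 - binom_cdf(4, n, p)  # complement of "4 or fewer"

print(f"P(X = 3)  = {p_exactly_3:.4f}")
print(f"P(X >= 5) = {p_at_least_5:.4f}")
```

The printed values should agree with 0.0197 and 0.920 to 3 significant figures.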

Step 2: Part (b) – Normal Approximation

Method:

When \( n \) is large and \( p \) is close to 0.5, we can approximate the Binomial distribution with a Normal distribution \( Y \sim N(\mu, \sigma^2) \).

We need to find the mean \( \mu \) and variance \( \sigma^2 \).

Given \( n = 250 \) and \( p = 0.48 \):

\[ \mu = np = 250 \times 0.48 = 120 \] \[ \sigma^2 = np(1-p) = 250 \times 0.48 \times 0.52 = 62.4 \] \[ \sigma = \sqrt{62.4} \approx 7.899 \]

So, \( Y \sim N(120, 62.4) \).

We want \( P(X > 110) \).

Continuity Correction: Since the Binomial is discrete and Normal is continuous, “more than 110” (meaning 111, 112…) starts at the boundary 110.5.

\[ P(X > 110) \approx P(Y > 110.5) \]

Standardise to find \( Z \):

\[ Z = \frac{110.5 - 120}{\sqrt{62.4}} = \frac{-9.5}{7.899…} \approx -1.2026 \]

Using calculator Normal CDF (Lower: 110.5, Upper: 9999, \( \mu: 120 \), \( \sigma: \sqrt{62.4} \)):

\[ P(Y > 110.5) = 0.88544… \]
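
As a rough check of the normal approximation, the same steps can be reproduced with `statistics.NormalDist` from Python's standard library. The variable names are illustrative, and the script stands in for the calculator's Normal CD function rather than describing any required exam method.

```python
from math import sqrt
from statistics import NormalDist

n, p = 250, 0.48
mu = n * p                     # mean of the approximating normal
sigma = sqrt(n * p * (1 - p))  # standard deviation, sqrt(62.4)

approx = NormalDist(mu, sigma)

# "More than 110" means 111, 112, ..., so the continuity-corrected boundary is 110.5.
z = (110.5 - mu) / sigma
p_more_than_110 = 1 - approx.cdf(110.5)

print(f"mu = {mu}, sigma = {sigma:.3f}, z = {z:.4f}")
print(f"P(Y > 110.5) = {p_more_than_110:.4f}")
```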

Final Answer:

(a)(i) 0.0197

(a)(ii) 0.920

(b) 0.885

Question 2 (12 marks)

A manufacturer uses a machine to make metal rods.

The length of a metal rod, \( L \) cm, is normally distributed with:

  • a mean of 8 cm
  • a standard deviation of \( x \) cm

Given that the proportion of metal rods less than 7.902 cm in length is 2.5%

(a) show that \( x = 0.05 \) to 2 decimal places.

(2)


(b) Calculate the proportion of metal rods that are between 7.94 cm and 8.09 cm in length.

(1)


The cost of producing a single metal rod is 20p.

A metal rod

  • where \( L < 7.94 \) is sold for scrap for 5p
  • where \( 7.94 \le L \le 8.09 \) is sold for 50p
  • where \( L > 8.09 \) is shortened for an extra cost of 10p and then sold for 50p

(c) Calculate the expected profit per 500 of the metal rods.

Give your answer to the nearest pound.

(5)


The same manufacturer makes metal hinges in large batches.

The hinges each have a probability of 0.015 of having a fault.

A random sample of 200 hinges is taken from each batch and the batch is accepted if fewer than 6 hinges are faulty.

The manufacturer’s aim is for 95% of batches to be accepted.

(d) Explain whether the manufacturer is likely to achieve its aim.

(4)

Worked Solution

Step 1: Part (a) – Finding Standard Deviation

We are given a Normal distribution \( L \sim N(8, x^2) \) and a probability \( P(L < 7.902) = 0.025 \).

We need to use the inverse normal function to find the Z-value corresponding to the bottom 2.5%.

Using Inverse Normal on calculator (Area = 0.025, \( \mu = 0, \sigma = 1 \)):

\[ z = -1.96 \]

Using the standardisation formula \( Z = \frac{X - \mu}{\sigma} \):

\[ -1.96 = \frac{7.902 - 8}{x} \] \[ -1.96 = \frac{-0.098}{x} \] \[ x = \frac{-0.098}{-1.96} = 0.05 \]

Therefore, \( x = 0.05 \) cm.
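
The inverse normal step can be sketched in Python as follows. `NormalDist().inv_cdf` plays the role of the calculator's Inverse Normal function here; that substitution, and the variable names, are assumptions made for illustration.

```python
from statistics import NormalDist

# z-value with 2.5% of the standard normal distribution below it.
z = NormalDist().inv_cdf(0.025)

# Rearranging z = (7.902 - 8) / x for the standard deviation x.
x = (7.902 - 8) / z

print(f"z = {z:.4f}")
print(f"x = {x:.4f}")
```

Using the unrounded z-value gives a result that still rounds to 0.05 at 2 decimal places, which is what the "show that" asks for.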

Step 2: Part (b) – Finding Probability

We want \( P(7.94 \le L \le 8.09) \) with \( L \sim N(8, 0.05^2) \).

Using Normal CD on calculator:

  • Lower: 7.94
  • Upper: 8.09
  • \( \sigma \): 0.05
  • \( \mu \): 8

\[ P = 0.8490… \] \[ P \approx 0.849 \text{ (3 s.f.)} \]

Step 3: Part (c) – Expected Profit

We need to find the probability for each outcome category and then calculate the profit (Selling Price – Cost) for each.

Base Cost: 20p per rod.

Category 1: Scrap (\( L < 7.94 \))

Probability \( p_1 = P(L < 7.94) \). Using calculator: \( p_1 = 0.11507… \)

Profit = Selling Price - Cost = \( 5p - 20p = -15p \) (Loss)

Category 2: Good Rods (\( 7.94 \le L \le 8.09 \))

Probability \( p_2 = 0.8490… \) (from part b)

Profit = \( 50p - 20p = 30p \)

Category 3: Shortened (\( L > 8.09 \))

Probability \( p_3 = P(L > 8.09) \). Using calculator: \( p_3 = 0.03593… \)

Profit = Selling Price - Base Cost - Extra Cost = \( 50p - 20p - 10p = 20p \)

Expected Profit per Rod:

\[ E(\text{Profit}) = (p_1 \times -15) + (p_2 \times 30) + (p_3 \times 20) \] \[ E = (0.11507 \times -15) + (0.8490 \times 30) + (0.03593 \times 20) \] \[ E = -1.726 + 25.47 + 0.7186 = 24.46 \text{ pence} \]

Total Profit for 500 rods:

\[ 500 \times 24.46 \text{p} = 12231 \text{ pence} \] \[ = £122.31 \]

To the nearest pound: £122
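
The expected-profit calculation can be reproduced with a few lines of Python. This is a sketch under the assumption that \( L \sim N(8, 0.05^2) \) from part (a); the variable names are made up for this example.

```python
from statistics import NormalDist

rods = NormalDist(mu=8, sigma=0.05)

# Probabilities of the three outcomes.
p_scrap = rods.cdf(7.94)           # L < 7.94
p_long = 1 - rods.cdf(8.09)        # L > 8.09
p_good = 1 - p_scrap - p_long      # 7.94 <= L <= 8.09

# Profit per rod in pence: selling price minus 20p cost (and 10p extra if shortened).
profit_scrap = 5 - 20
profit_good = 50 - 20
profit_long = 50 - 20 - 10

expected_per_rod = (p_scrap * profit_scrap
                    + p_good * profit_good
                    + p_long * profit_long)
total_pounds = 500 * expected_per_rod / 100

print(f"Expected profit per rod: {expected_per_rod:.2f}p")
print(f"Expected profit per 500 rods: £{total_pounds:.2f}")
```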

Step 4: Part (d) – Binomial Probability Check

We are dealing with a sample of hinges where \( X \) is the number of faulty hinges.

\( X \sim B(200, 0.015) \)

A batch is accepted if \( X < 6 \) (meaning \( X \le 5 \)).

Calculate probability of acceptance:

\[ P(\text{Accepted}) = P(X \le 5) \]

Using calculator Binomial CD (\( x=5, n=200, p=0.015 \)):

\[ P(X \le 5) = 0.9176… \]

Comparison:

The manufacturer wants 95% (0.95) acceptance.

\[ 0.9176 < 0.95 \]

The probability of acceptance is 0.918, which is less than the target of 0.95.

Therefore, the manufacturer is unlikely to achieve its aim.
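
For readers who want to verify the acceptance probability without a calculator, here is a minimal sketch that sums the binomial probabilities directly. The helper `binom_cdf` is an illustrative name, not a library function.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(x + 1))


# A batch is accepted when fewer than 6 of the 200 sampled hinges are faulty.
p_accept = binom_cdf(5, 200, 0.015)
target = 0.95

print(f"P(accepted) = {p_accept:.4f}")
print("Meets the 95% aim" if p_accept >= target else "Falls short of the 95% aim")
```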

Question 3 (7 marks)

Dian uses the large data set to investigate the Daily Total Rainfall, \( r \) mm, for Camborne.

(a) Write down how a value of \( 0 < r \le 0.05 \) is recorded in the large data set.

(1)


Dian uses the data for the 31 days of August 2015 for Camborne and calculates the following statistics:

\[ n = 31 \quad \sum r = 174.9 \quad \sum r^2 = 3523.283 \]

(b) Use these statistics to calculate

(i) the mean of the Daily Total Rainfall in Camborne for August 2015,

(ii) the standard deviation of the Daily Total Rainfall in Camborne for August 2015.

(3)


Dian believes that the mean Daily Total Rainfall in August is less in the South of the UK than in the North of the UK.

The mean Daily Total Rainfall in Leuchars for August 2015 is 1.72 mm to 2 decimal places.

(c) State, giving a reason, whether this provides evidence to support Dian’s belief.

(2)


Dian uses the large data set to estimate the proportion of days with no rain in Camborne for 1987 to be 0.27 to 2 decimal places.

(d) Explain why the distribution \( B(14, 0.27) \) might not be a reasonable model for the number of days without rain for a 14-day summer event.

(1)

Worked Solution

Step 1: Part (a) – Large Data Set Knowledge

In the Edexcel Large Data Set, very small amounts of rainfall (above 0 mm but no more than 0.05 mm) are recorded as a “Trace”.

It is recorded as tr (or trace).

Step 2: Part (b) – Mean and Standard Deviation

(i) Mean (\( \bar{r} \)):

\[ \bar{r} = \frac{\sum r}{n} = \frac{174.9}{31} \] \[ \bar{r} = 5.6419… \approx 5.64 \text{ mm} \]

(ii) Standard Deviation (\( \sigma \)):

Formula: \( \sigma = \sqrt{\frac{\sum r^2}{n} - (\bar{r})^2} \)

\[ \sigma = \sqrt{\frac{3523.283}{31} - (5.6419…)^2} \] \[ \sigma = \sqrt{113.654… - 31.831…} \] \[ \sigma = \sqrt{81.82…} \] \[ \sigma = 9.045… \approx 9.05 \text{ mm} \]
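
The summary-statistics formulas above translate directly into code. The sketch below uses only the three totals given in the question; the variable names are illustrative.

```python
from math import sqrt

n = 31
sum_r = 174.9
sum_r_sq = 3523.283

mean = sum_r / n
variance = sum_r_sq / n - mean**2  # mean of the squares minus the square of the mean
sd = sqrt(variance)

print(f"mean = {mean:.2f} mm")
print(f"standard deviation = {sd:.2f} mm")
```

Keeping the unrounded mean in the variance step matters: squaring a rounded mean can shift the final decimal place.
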
Step 3: Part (c) – Comparison

Dian believes South mean < North mean.

We need to know the locations: Camborne is in the South (Cornwall), Leuchars is in the North (Scotland).

Camborne Mean (South) = 5.64 mm

Leuchars Mean (North) = 1.72 mm

Camborne is in the South and Leuchars is in the North.

The mean for Camborne (5.64) is greater than the mean for Leuchars (1.72).

This contradicts Dian’s belief, so there is no evidence to support it.

Step 4: Part (d) – Binomial Assumptions

The Binomial distribution requires independent trials.

Weather patterns usually persist for several days.

The days are consecutive, so the weather on one day is not independent of the weather on the next (e.g., a dry spell).

Therefore, the independence assumption for the Binomial model is not valid.

Question 4 (6 marks)

A dentist knows from past records that 10% of customers arrive late for their appointment.

A new manager believes that there has been a change in the proportion of customers who arrive late for their appointment.

A random sample of 50 of the dentist’s customers is taken.

(a) Write down

  • a null hypothesis corresponding to no change in the proportion of customers who arrive late
  • an alternative hypothesis corresponding to the manager’s belief

(1)


(b) Using a 5% level of significance, find the critical region for a two-tailed test of the null hypothesis in (a).

You should state the probability of rejection in each tail, which should be less than 0.025.

(3)


(c) Find the actual level of significance of the test based on your critical region from part (b).

(1)


The manager observes that 15 of the 50 customers arrived late for their appointment.

(d) With reference to part (b), comment on the manager’s belief.

(1)

Worked Solution

Step 1: Part (a) – Hypotheses

\( H_0 \) is the default position (no change). \( H_1 \) reflects the change.

\( H_0: p = 0.1 \)

\( H_1: p \neq 0.1 \)

Step 2: Part (b) – Critical Region

We assume \( X \sim B(50, 0.1) \).

We need a two-tailed test at 5% significance, so we look for tails with probability \( < 0.025 \) (2.5%) at each end.

Lower Tail: Look for \( P(X \le L) < 0.025 \)

Using cumulative binomial tables or calculator:

\[ P(X=0) = 0.00515… \] \[ P(X \le 1) = 0.0337… \]

Since \( 0.0337 > 0.025 \), the critical region is only \( X = 0 \).

Upper Tail: Look for \( P(X \ge U) < 0.025 \)

This means \( 1 - P(X \le U-1) < 0.025 \) or \( P(X \le U-1) > 0.975 \).

\[ P(X \le 8) = 0.9421… \] \[ P(X \le 9) = 0.9754… \]

So \( U-1 = 9 \Rightarrow U = 10 \).

Check: \( P(X \ge 10) = 1 - 0.9754 = 0.0245… \), which is \( < 0.025 \).

Critical Region: \( X = 0 \) or \( X \ge 10 \)

Step 3: Part (c) – Actual Significance Level

The actual significance level is the sum of the actual probabilities of the critical regions found in part (b).

\[ \text{Lower probability} = 0.00515 \] \[ \text{Upper probability} = 0.02453 \] \[ \text{Total} = 0.00515 + 0.02453 = 0.02968 \]

Actual Significance Level = 0.0297 (or 2.97%)
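
The search for the critical region and the actual significance level can be sketched as a pair of loops. This is one possible way to automate the table lookup, not a required exam method; all names are illustrative.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p0, tail_limit = 50, 0.1, 0.025

# Lower tail: grow {0, ..., c} while P(X <= c) stays below the limit.
lower_c, lower_prob, running = None, 0.0, 0.0
for c in range(n + 1):
    running += binom_pmf(c, n, p0)
    if running >= tail_limit:
        break
    lower_c, lower_prob = c, running

# Upper tail: grow {c, ..., n} downwards while P(X >= c) stays below the limit.
upper_c, upper_prob, running = None, 0.0, 0.0
for c in range(n, -1, -1):
    running += binom_pmf(c, n, p0)
    if running >= tail_limit:
        break
    upper_c, upper_prob = c, running

actual_level = lower_prob + upper_prob

print(f"Critical region: X <= {lower_c} or X >= {upper_c}")
print(f"Tail probabilities: {lower_prob:.5f} and {upper_prob:.5f}")
print(f"Actual significance level: {actual_level:.4f}")
```

Since \( X \) cannot be negative, "X <= 0" is the same region as \( X = 0 \).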

Step 4: Part (d) – Conclusion

Observed value is 15. We check if 15 lies in the critical region.

The critical region is \( X = 0 \) or \( X \ge 10 \).

15 is in the critical region (\( 15 \ge 10 \)).

Therefore, we reject \( H_0 \).

There is evidence to support the manager’s belief that there has been a change in the proportion.

Question 5 (10 marks)

A company has 1825 employees. The employees are classified as professional, skilled or elementary.

The following table shows the number of employees in each classification and the two areas, A or B, where they live.

               A     B
Professional   740   380
Skilled        275    90
Elementary     260    80

An employee is chosen at random. Find the probability that this employee

(a) is skilled,

(1)

(b) lives in area B and is not a professional.

(1)


Some classifications of employees are more likely to work from home.

  • 65% of professional employees in both area A and area B work from home
  • 40% of skilled employees in both area A and area B work from home
  • 5% of elementary employees in both area A and area B work from home

Event \( F \) is that the employee is a professional.

Event \( H \) is that the employee works from home.

Event \( R \) is that the employee is from area A.

(c) Using this information, complete the Venn diagram below.

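[Venn diagram: three overlapping circles labelled H, R and F, with the values 123, 247, 412 and 133 already entered and the remaining regions left blank.]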

(4)


(d) Find \( P(R’ \cap F) \)

(1)

(e) Find \( P([H \cup R]’) \)

(1)

(f) Find \( P(F | H) \)

(2)

Worked Solution

Step 1: Parts (a) and (b) – Table Probabilities

Total employees = 1825.

(a) P(Skilled):

Total Skilled = \( 275 (A) + 90 (B) = 365 \).

\[ P(\text{Skilled}) = \frac{365}{1825} = \frac{1}{5} = 0.2 \]

(b) P(Area B and Not Professional):

Area B Not Prof = Skilled in B + Elementary in B = \( 90 + 80 = 170 \).

\[ P = \frac{170}{1825} = \frac{34}{365} \approx 0.0932 \]

Step 2: Part (c) – Completing the Venn Diagram

We need to calculate the missing values for the intersection of sets H (Home), R (Area A), and F (Professional).

Given logic:

  • F is Professional.
  • R is Area A (so R’ is Area B).
  • H is Works from Home.

Center Region (\( H \cap R \cap F \)): Professional, Area A, Works from Home.

Total Prof in A = 740. Rate = 65%.

\[ 0.65 \times 740 = 481 \]

Region \( R \cap F \) outside H: Professional, Area A, Not Home.

\[ 740 - 481 = 259 \]

Region \( H \) only (outside R and F): Not Prof (Skilled/Elem), Area B, Works from Home.

Skilled in B (90) home rate 40%: \( 0.4 \times 90 = 36 \)

Elem in B (80) home rate 5%: \( 0.05 \times 80 = 4 \)

\[ \text{Total} = 36 + 4 = 40 \]

Region Outside All (\( [H \cup R \cup F]’ \)): Not Prof, Area B, Not Home.

Skilled in B not home: \( 90 - 36 = 54 \)

Elem in B not home: \( 80 - 4 = 76 \)

\[ \text{Total} = 54 + 76 = 130 \]
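
All eight regions of the Venn diagram, including the four already printed on it, can be recomputed from the table and the work-from-home rates. The sketch below is one way to do that; the dictionary layout and the key order (in H, in R, in F) are choices made for this example.

```python
# Head counts from the table: (classification, area) -> number of employees.
counts = {
    ("professional", "A"): 740, ("professional", "B"): 380,
    ("skilled", "A"): 275, ("skilled", "B"): 90,
    ("elementary", "A"): 260, ("elementary", "B"): 80,
}
home_rate = {"professional": 0.65, "skilled": 0.40, "elementary": 0.05}

# Each Venn region is keyed by (in H, in R, in F).
regions = {}
for (job, area), total in counts.items():
    at_home = round(total * home_rate[job])  # every product here is a whole number
    in_r, in_f = area == "A", job == "professional"
    regions[(True, in_r, in_f)] = regions.get((True, in_r, in_f), 0) + at_home
    regions[(False, in_r, in_f)] = regions.get((False, in_r, in_f), 0) + total - at_home

for key in sorted(regions, reverse=True):
    print(key, regions[key])
```

The eight counts should add up to the 1825 employees, which is a quick sanity check on the completed diagram.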
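
Completed Venn diagram (circles H, R and F):

  • H only: 40
  • H and R only: 123
  • R only: 412
  • H and F only: 247
  • H, R and F: 481
  • R and F only: 259
  • F only: 133
  • Outside all three circles: 130

Step 3: Part (d) – \( P(R’ \cap F) \)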

\( R’ \) is Area B. \( F \) is Professional. So we want Professionals in Area B.

Looking at the diagram, this corresponds to the regions in F but outside R.

These are \( H \cap F \cap R’ \) (247) and \( F \text{ only} \) (133).

\[ 247 + 133 = 380 \]

Or directly from table: Professionals in B = 380.

\[ P = \frac{380}{1825} \approx 0.208 \]

Step 4: Part (e) – \( P([H \cup R]’) \)

This is everything outside circles H and R.

From the diagram, these are the regions: F only (133) and Outside everything (130).

\[ 133 + 130 = 263 \] \[ P = \frac{263}{1825} \approx 0.144 \]

Step 5: Part (f) – \( P(F | H) \)

Conditional probability: \( P(F | H) = \frac{n(F \cap H)}{n(H)} \).

We are restricting our total to just circle H.

Total in H: \( 40 + 123 + 481 + 247 = 891 \)

In intersection \( F \cap H \): \( 481 + 247 = 728 \)

\[ P(F | H) = \frac{728}{891} \approx 0.817 \]
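
Parts (d), (e) and (f) all amount to adding up the right Venn regions, which the following sketch makes explicit. The region counts are the ones worked out in part (c); the `count` helper and the key order (in H, in R, in F) are assumptions made for this example.

```python
from fractions import Fraction

# Region counts from the completed Venn diagram, keyed by (in H, in R, in F).
regions = {
    (True, True, True): 481, (False, True, True): 259,
    (True, False, True): 247, (False, False, True): 133,
    (True, True, False): 123, (False, True, False): 412,
    (True, False, False): 40, (False, False, False): 130,
}
total = sum(regions.values())


def count(condition) -> int:
    """Number of employees in the regions satisfying condition(h, r, f)."""
    return sum(n for key, n in regions.items() if condition(*key))


p_d = Fraction(count(lambda h, r, f: f and not r), total)       # P(R' and F)
p_e = Fraction(count(lambda h, r, f: not (h or r)), total)      # P([H or R]')
p_f = Fraction(count(lambda h, r, f: f and h),
               count(lambda h, r, f: h))                        # P(F | H)

print(f"(d) {float(p_d):.3f}  (e) {float(p_e):.3f}  (f) {float(p_f):.3f}")
```

For the conditional probability the denominator is the count inside H, not the full 1825.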

Question 6 (9 marks)

Anna is investigating the relationship between exercise and resting heart rate.

She takes a random sample of 19 people in her year at school and records for each person:

  • their resting heart rate, \( h \) beats per minute
  • the number of minutes, \( m \), spent exercising each week

Her results are shown on the scatter diagram.

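[Scatter diagram: \( m \) on the horizontal axis (ticks at 0, 200, 400) and \( h \) on the vertical axis (ticks at 60, 70, 80).]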

(a) Interpret the nature of the relationship between \( h \) and \( m \).

(1)


Anna codes the data using the formulae:

\[ x = \log_{10} m \] \[ y = \log_{10} h \]

The product moment correlation coefficient between \( x \) and \( y \) is -0.897.

(b) Test whether or not there is significant evidence of a negative correlation between \( x \) and \( y \).

You should

  • state your hypotheses clearly
  • use a 5% level of significance
  • state the critical value used

(3)


The equation of the line of best fit of \( y \) on \( x \) is

\[ y = -0.05x + 1.92 \]

(c) Use the equation of the line of best fit of \( y \) on \( x \) to find a model for \( h \) on \( m \) in the form

\[ h = am^k \]

where \( a \) and \( k \) are constants to be found.

(5)

Worked Solution

Step 1: Part (a) – Interpretation

We look at the trend in the scatter diagram.

As the minutes of exercise (\( m \)) increase, the resting heart rate (\( h \)) decreases.

It suggests a negative correlation, but the curve flattens out, suggesting the effect diminishes as exercise increases.

Step 2: Part (b) – Hypothesis Test

We are testing for negative correlation using the PMCC (\( \rho \)).

Sample size \( n = 19 \). Significance level 5%.

Hypotheses:

\[ H_0: \rho = 0 \] \[ H_1: \rho < 0 \]

Critical Value:

Using PMCC tables for \( n=19 \) at 5% (one-tailed):

Critical Value = -0.3887

Conclusion:

Test statistic \( r = -0.897 \).

Since \( -0.897 < -0.3887 \), the result is in the critical region.

Reject \( H_0 \). There is significant evidence of a negative correlation.

Step 3: Part (c) – Power Model

We are given a linear equation in logs: \( y = -0.05x + 1.92 \).

We need to convert this back to the form \( h = am^k \).

Recall: \( y = \log_{10} h \) and \( x = \log_{10} m \).

Substitute the log definitions into the equation:

\[ \log_{10} h = -0.05 \log_{10} m + 1.92 \]

Use log laws to rewrite \( -0.05 \log_{10} m \):

\[ \log_{10} h = \log_{10} (m^{-0.05}) + 1.92 \]

To combine the 1.92, write it as a log:

\[ 1.92 = \log_{10} (10^{1.92}) \]

So:

\[ \log_{10} h = \log_{10} (m^{-0.05}) + \log_{10} (10^{1.92}) \] \[ \log_{10} h = \log_{10} (10^{1.92} \times m^{-0.05}) \]

Remove logs:

\[ h = 10^{1.92} \times m^{-0.05} \]

Calculate \( a = 10^{1.92} \):

\[ a \approx 83.176… \]

Comparing to \( h = am^k \):

\[ a \approx 83.2 \quad \text{and} \quad k = -0.05 \]
\[ h = 83.2 m^{-0.05} \]

(Accept \( a \) in range 83.17 – 83.2)
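
A quick numerical check confirms that the model \( h = am^k \) agrees with the line of best fit once logs are taken. This is a sketch only; the function name `h_model` and the test value \( m = 200 \) are arbitrary choices for illustration.

```python
from math import log10

a = 10 ** 1.92  # intercept of the log-log line converted back to the original scale
k = -0.05       # gradient of the log-log line becomes the power of m


def h_model(m: float) -> float:
    """Resting heart rate predicted by h = a * m**k."""
    return a * m**k


m = 200
lhs = log10(h_model(m))          # y = log10(h) from the model
rhs = -0.05 * log10(m) + 1.92    # y from the line of best fit

print(f"a = {a:.3f}, k = {k}")
print(f"log10(h) = {lhs:.6f}, line of best fit gives {rhs:.6f}")
```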
