A Level - Edexcel - 2022 - Statistics

Question 1 (6 marks)

George throws a ball at a target 15 times.

Each time George throws the ball, the probability of the ball hitting the target is 0.48.

The random variable \( X \) represents the number of times George hits the target in 15 throws.

(a) Find

(i) \( P(X = 3) \)

(ii) \( P(X \ge 5) \)

(3)

George now throws the ball at the target 250 times.

(b) Use a normal approximation to calculate the probability that he will hit the target more than 110 times.

(3)

Worked Solution

Step 1: Part (a) – Binomial Distribution

What are we asking?

We need to calculate probabilities for a Binomial distribution where \( n=15 \) trials and probability of success \( p=0.48 \). So, \( X \sim B(15, 0.48) \).

(i) Find \( P(X = 3) \):

Using the binomial formula \( P(X=r) = \binom{n}{r} p^r (1-p)^{n-r} \):

\[ P(X=3) = \binom{15}{3} (0.48)^3 (1-0.48)^{12} \]

Calculator method: Binomial PDF with \( x=3, n=15, p=0.48 \).

\[ P(X=3) = 0.019668… \] \[ P(X=3) \approx 0.0197 \text{ (3 s.f.)} \]
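To check the calculator value, you can evaluate the binomial formula directly. This is a minimal Python sketch that uses only the standard library (`math.comb` needs Python 3.8+). The helper name `binom_pmf` is our own illustration, not a calculator or library function.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), using nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


p3 = binom_pmf(3, 15, 0.48)
print(f"P(X = 3) = {p3:.4f}")  # 0.0197 to 3 s.f.
```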

(ii) Find \( P(X \ge 5) \):

Cumulative probabilities on calculators usually find \( P(X \le x) \).

\[ P(X \ge 5) = 1 - P(X \le 4) \]

Using calculator Binomial CDF with \( x=4, n=15, p=0.48 \):

\[ P(X \le 4) = 0.07986… \] \[ P(X \ge 5) = 1 - 0.07986… = 0.92013… \] \[ P(X \ge 5) \approx 0.920 \text{ (3 s.f.)} \]

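Why do we use \( 1 - P(X \le 4) \)?

The events \( X \le 4 \) and \( X \ge 5 \) cover every possible outcome without overlapping, so their probabilities add up to 1. To find the probability of getting 5 or more, we subtract the probability of getting 4 or fewer from 1.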
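You can check the complement step in the same way. In this sketch, `binom_cdf` is a hypothetical helper that adds up the binomial probabilities for 0 to x successes, which is the value a calculator's Binomial CDF returns.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p): sum of the binomial probabilities for 0..x."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(x + 1))


p_at_most_4 = binom_cdf(4, 15, 0.48)
p_at_least_5 = 1 - p_at_most_4  # complement: P(X >= 5) = 1 - P(X <= 4)
print(f"P(X <= 4) = {p_at_most_4:.5f}")
print(f"P(X >= 5) = {p_at_least_5:.3f}")  # 0.920
```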

Step 2: Part (b) – Normal Approximation

Method:

When \( n \) is large and \( p \) is close to 0.5, we can approximate the Binomial distribution with a Normal distribution \( Y \sim N(\mu, \sigma^2) \).

We need to find the mean \( \mu \) and variance \( \sigma^2 \).

Given \( n = 250 \) and \( p = 0.48 \):

\[ \mu = np = 250 \times 0.48 = 120 \] \[ \sigma^2 = np(1-p) = 250 \times 0.48 \times 0.52 = 62.4 \] \[ \sigma = \sqrt{62.4} \approx 7.899 \]

So, \( Y \sim N(120, 62.4) \).

We want \( P(X > 110) \).

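Continuity Correction: the Binomial distribution is discrete and the Normal distribution is continuous. “More than 110” means 111, 112, …, and on the continuous scale the value 111 covers the interval from 110.5 to 111.5. The boundary is therefore 110.5.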

\[ P(X > 110) \approx P(Y > 110.5) \]

Standardise to find \( Z \):

\[ Z = \frac{110.5 - 120}{\sqrt{62.4}} = \frac{-9.5}{7.899…} \approx -1.2026 \]

Using calculator Normal CDF (Lower: 110.5, Upper: 9999, \( \mu: 120 \), \( \sigma: \sqrt{62.4} \)):

\[ P(Y > 110.5) = 0.88544… \]
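The sketch below repeats part (b) in Python using the standard library's `statistics.NormalDist` (Python 3.8+). As an extra check that the question does not ask for, it also compares the approximation with the exact binomial probability \( P(X \ge 111) \) for \( X \sim B(250, 0.48) \).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 250, 0.48
mu = n * p                     # 120
sigma = sqrt(n * p * (1 - p))  # sqrt(62.4), about 7.899

# Continuity correction: X > 110 means X >= 111, whose lower boundary is 110.5
approx = 1 - NormalDist(mu, sigma).cdf(110.5)

# Exact binomial P(X >= 111), for comparison only
exact = sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(111, n + 1))

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The two values should agree closely. That agreement is why the normal approximation is reasonable for this large \( n \) with \( p \) near 0.5.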

Final Answer:

(a)(i) 0.0197

(a)(ii) 0.920

(b) 0.885

Question 2 (12 marks)

A manufacturer uses a machine to make metal rods.

The length of a metal rod, \( L \) cm, is normally distributed with:

a mean of 8 cm
a standard deviation of \( x \) cm

Given that the proportion of metal rods less than 7.902 cm in length is 2.5%

(a) show that \( x = 0.05 \) to 2 decimal places.

(2)

(b) Calculate the proportion of metal rods that are between 7.94 cm and 8.09 cm in length.

(1)

The cost of producing a single metal rod is 20p.

A metal rod

where \( L < 7.94 \) is sold for scrap for 5p
where \( 7.94 \le L \le 8.09 \) is sold for 50p
where \( L > 8.09 \) is shortened for an extra cost of 10p and then sold for 50p

(c) Calculate the expected profit per 500 of the metal rods.

Give your answer to the nearest pound.

(5)

The same manufacturer makes metal hinges in large batches.

The hinges each have a probability of 0.015 of having a fault.

A random sample of 200 hinges is taken from each batch and the batch is accepted if fewer than 6 hinges are faulty.

The manufacturer’s aim is for 95% of batches to be accepted.

(d) Explain whether the manufacturer is likely to achieve its aim.

(4)

Worked Solution

Step 1: Part (a) – Finding Standard Deviation

We are given a Normal distribution \( L \sim N(8, x^2) \) and a probability \( P(L < 7.902) = 0.025 \).

We need to use the inverse normal function to find the Z-value corresponding to the bottom 2.5%.

Using Inverse Normal on calculator (Area = 0.025, \( \mu = 0, \sigma = 1 \)):

\[ z = -1.96 \]

Using the standardisation formula \( Z = \frac{L - \mu}{\sigma} \), with \( \mu = 8 \) and \( \sigma = x \):

\[ -1.96 = \frac{7.902 - 8}{x} \] \[ -1.96 = \frac{-0.098}{x} \] \[ x = \frac{-0.098}{-1.96} = 0.05 \]

Therefore, \( x = 0.05 \) cm.
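You can reproduce the inverse normal step with `NormalDist().inv_cdf`, which returns the z-value that has a given area to its left. This is a sketch of the check, assuming Python 3.8+. The unrounded z-value is about -1.95996 rather than exactly -1.96, so x comes out very slightly above 0.05. It still rounds to 0.05 to 2 decimal places.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.025)  # z-value with 2.5% of N(0, 1) below it
x = (7.902 - 8) / z              # rearranged from z = (7.902 - 8) / x

print(f"z = {z:.4f}")
print(f"x = {x:.2f}")            # 0.05 to 2 d.p.
```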

Step 2: Part (b) – Finding Probability

We want \( P(7.94 \le L \le 8.09) \) with \( L \sim N(8, 0.05^2) \).

Using Normal CD on calculator:

Lower: 7.94
Upper: 8.09
\( \sigma \): 0.05
\( \mu \): 8

\[ P = 0.8490… \] \[ P \approx 0.849 \text{ (3 s.f.)} \]
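Here is a short check of part (b), again assuming Python's `statistics.NormalDist`. The probability between two values is the difference of their cumulative probabilities.

```python
from statistics import NormalDist

L = NormalDist(mu=8, sigma=0.05)
p_between = L.cdf(8.09) - L.cdf(7.94)  # P(7.94 <= L <= 8.09)
print(f"{p_between:.4f}")
```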

Step 3: Part (c) – Expected Profit

We need to find the probability for each outcome category and then calculate the profit (Selling Price – Cost) for each.

Base Cost: 20p per rod.

Category 1: Scrap (\( L < 7.94 \))

Probability \( p_1 = P(L < 7.94) \). Using calculator: \( p_1 = 0.11507... \)

Profit = Selling Price - Cost = \( 5p - 20p = -15p \) (Loss)

Category 2: Good Rods (\( 7.94 \le L \le 8.09 \))

Probability \( p_2 = 0.8490… \) (from part b)

Profit = \( 50p - 20p = 30p \)

Category 3: Shortened (\( L > 8.09 \))

Probability \( p_3 = P(L > 8.09) \). Using calculator: \( p_3 = 0.03593… \)

Profit = Selling Price - Base Cost - Extra Cost = \( 50p - 20p - 10p = 20p \)

Expected Profit per Rod:

\[ E(\text{Profit}) = (p_1 \times -15) + (p_2 \times 30) + (p_3 \times 20) \] \[ E = (0.11507 \times -15) + (0.8490 \times 30) + (0.03593 \times 20) \] \[ E = -1.726 + 25.47 + 0.7186 = 24.46 \text{ pence} \]

Total Profit for 500 rods:

\[ 500 \times 24.4626… \text{p} = 12231.3… \text{ pence} \] \[ = £122.31 \]

To the nearest pound: £122
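You can write the expected-profit calculation as a small table of (probability, selling price, extra cost) for each category. This is only a sketch, and the category labels and variable names are our own. Working at full precision instead of with rounded probabilities gives the same answer to the nearest pound.

```python
from statistics import NormalDist

L = NormalDist(mu=8, sigma=0.05)
COST = 20  # production cost per rod, in pence

# category: (probability, selling price, extra cost), prices in pence
categories = {
    "scrap, L < 7.94": (L.cdf(7.94), 5, 0),
    "good, 7.94 <= L <= 8.09": (L.cdf(8.09) - L.cdf(7.94), 50, 0),
    "shortened, L > 8.09": (1 - L.cdf(8.09), 50, 10),
}

expected_per_rod = sum(
    prob * (price - COST - extra) for prob, price, extra in categories.values()
)
total_pounds = 500 * expected_per_rod / 100  # 500 rods, pence -> pounds

print(f"E(profit per rod) = {expected_per_rod:.2f}p")
print(f"Expected profit per 500 rods = £{total_pounds:.2f}")
print(f"To the nearest pound: £{round(total_pounds)}")
```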

Step 4: Part (d) – Binomial Hypothesis Check

We are dealing with a sample of hinges where \( X \) is the number of faulty hinges.

\( X \sim B(200, 0.015) \)

A batch is accepted if \( X < 6 \) (meaning \( X \le 5 \)).

Calculate probability of acceptance:

\[ P(\text{Accepted}) = P(X \le 5) \]

Using calculator Binomial CD (\( x=5, n=200, p=0.015 \)):

\[ P(X \le 5) = 0.9176… \]

Comparison:

The manufacturer wants 95% (0.95) acceptance.

\[ 0.9176 < 0.95 \]

The probability of acceptance is 0.918, which is less than the target of 0.95.

Therefore, the manufacturer is unlikely to achieve its aim.
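This sketch computes the acceptance probability in Python by adding the binomial probabilities for 0 to 5 faulty hinges. The comparison with 0.95 mirrors the manufacturer's aim.

```python
from math import comb

n, p = 200, 0.015
AIM = 0.95

# Batch accepted if fewer than 6 faulty hinges, i.e. X <= 5
p_accept = sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(6))

print(f"P(accepted) = {p_accept:.4f}")
print("Aim achieved" if p_accept >= AIM else "Aim not achieved")
```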

Question 3 (7 marks)

Dian uses the large data set to investigate the Daily Total Rainfall, \( r \) mm, for Camborne.

(a) Write down how a value of \( 0 < r \le 0.05 \) is recorded in the large data set.

(1)

Dian uses the data for the 31 days of August 2015 for Camborne and calculates the following statistics:

\[ n = 31 \quad \sum r = 174.9 \quad \sum r^2 = 3523.283 \]

(b) Use these statistics to calculate

(i) the mean of the Daily Total Rainfall in Camborne for August 2015,

(ii) the standard deviation of the Daily Total Rainfall in Camborne for August 2015.

(3)

Dian believes that the mean Daily Total Rainfall in August is less in the South of the UK than in the North of the UK.

The mean Daily Total Rainfall in Leuchars for August 2015 is 1.72 mm to 2 decimal places.

(c) State, giving a reason, whether this provides evidence to support Dian’s belief.

(2)

Dian uses the large data set to estimate the proportion of days with no rain in Camborne for 1987 to be 0.27 to 2 decimal places.

(d) Explain why the distribution \( B(14, 0.27) \) might not be a reasonable model for the number of days without rain for a 14-day summer event.

(1)

Worked Solution

Step 1: Part (a) – Large Data Set Knowledge

In the Edexcel Large Data Set, very small amounts of rainfall (more than 0 mm but no more than 0.05 mm) are recorded as a “trace”.

It is recorded as tr (trace).

Step 2: Part (b) – Mean and Standard Deviation

(i) Mean (\( \bar{r} \)):

\[ \bar{r} = \frac{\sum r}{n} = \frac{174.9}{31} \] \[ \bar{r} = 5.6419… \approx 5.64 \text{ mm} \]

(ii) Standard Deviation (\( \sigma \)):

Formula: \( \sigma = \sqrt{\frac{\sum r^2}{n} - (\bar{r})^2} \)

\[ \sigma = \sqrt{\frac{3523.283}{31} - (5.6419…)^2} \] \[ \sigma = \sqrt{113.654… - 31.831…} \] \[ \sigma = \sqrt{81.82…} \] \[ \sigma = 9.045… \approx 9.05 \text{ mm} \]
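As a quick check, here are the same summary-statistic formulas in Python. The sketch uses the divide-by-\( n \) form shown above.

```python
from math import sqrt

n, sum_r, sum_r2 = 31, 174.9, 3523.283

mean = sum_r / n
# Standard deviation using the divide-by-n form: sqrt(sum r^2 / n - mean^2)
sd = sqrt(sum_r2 / n - mean**2)

print(f"mean = {mean:.2f} mm, standard deviation = {sd:.2f} mm")
```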

Step 3: Part (c) – Comparison

Dian believes South mean < North mean.

We need to know the locations: Camborne is in the South (Cornwall), Leuchars is in the North (Scotland).

Camborne Mean (South) = 5.64 mm

Leuchars Mean (North) = 1.72 mm

The mean for Camborne (5.64) is greater than the mean for Leuchars (1.72).

This contradicts Dian’s belief, so there is no evidence to support it.

Step 4: Part (d) – Binomial Assumptions

The Binomial distribution requires independent trials.

Weather patterns usually persist for several days.

The days are consecutive, so the weather on one day is not independent of the weather on the next (e.g., a dry spell).

Therefore, the independence assumption for the Binomial model is not valid.

Question 4 (6 marks)

A dentist knows from past records that 10% of customers arrive late for their appointment.

A new manager believes that there has been a change in the proportion of customers who arrive late for their appointment.

A random sample of 50 of the dentist’s customers is taken.

(a) Write down

a null hypothesis corresponding to no change in the proportion of customers who arrive late
an alternative hypothesis corresponding to the manager’s belief

(1)

(b) Using a 5% level of significance, find the critical region for a two-tailed test of the null hypothesis in (a).

You should state the probability of rejection in each tail, which should be less than 0.025.

(3)

(c) Find the actual level of significance of the test based on your critical region from part (b).

(1)

The manager observes that 15 of the 50 customers arrived late for their appointment.

(d) With reference to part (b), comment on the manager’s belief.

(1)

Worked Solution

Step 1: Part (a) – Hypotheses

\( H_0 \) is the default position (no change). \( H_1 \) reflects the change.

\( H_0: p = 0.1 \)

\( H_1: p \neq 0.1 \)

Step 2: Part (b) – Critical Region

We assume \( X \sim B(50, 0.1) \).

We need a two-tailed test at 5% significance, so we look for tails with probability \( < 0.025 \) (2.5%) at each end.

Lower Tail: Look for \( P(X \le L) < 0.025 \)

Using cumulative binomial tables or calculator:

\[ P(X=0) = 0.00515… \] \[ P(X \le 1) = 0.0337… \]

Since \( 0.0337 > 0.025 \), the lower critical region is only \( X = 0 \).

Upper Tail: Look for \( P(X \ge U) < 0.025 \)

This means \( 1 - P(X \le U-1) < 0.025 \), or equivalently \( P(X \le U-1) > 0.975 \).

\[ P(X \le 8) = 0.9421… \] \[ P(X \le 9) = 0.9754… \]

So \( U-1 = 9 \Rightarrow U = 10 \).

Check: \( P(X \ge 10) = 1 - 0.97547… = 0.02453… \), which is \( < 0.025 \).

Critical Region: \( X = 0 \) or \( X \ge 10 \)
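You can automate the tail search from part (b). The code scans for the largest lower value and the smallest upper value whose tail probability is below 0.025. In this sketch, `cdf` is a helper defined for \( X \sim B(50, 0.1) \). It is an illustration, not a library function.

```python
from math import comb

n, p, tail = 50, 0.1, 0.025


def cdf(x: int) -> float:
    """P(X <= x) for X ~ B(50, 0.1)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(x + 1))


# Lower tail: largest c with P(X <= c) < 0.025
lower = max(c for c in range(n + 1) if cdf(c) < tail)

# Upper tail: smallest U with P(X >= U) = 1 - P(X <= U - 1) < 0.025
upper = min(u for u in range(1, n + 1) if 1 - cdf(u - 1) < tail)

print(f"Critical region: X <= {lower} or X >= {upper}")
```

The search relies on `cdf` increasing with x, so the first value that passes each tail condition gives the boundary of that tail.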

Step 3: Part (c) – Actual Significance Level

The actual significance level is the probability of landing in the critical region when \( H_0 \) is true. This is the sum of the probabilities of the two tails found in part (b).

\[ \text{Lower probability} = 0.00515 \] \[ \text{Upper probability} = 0.02453 \] \[ \text{Total} = 0.00515 + 0.02453 = 0.02968 \]

Actual Significance Level = 0.0297 (or 2.97%)
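The actual significance level is just the two tail probabilities added together. This sketch recomputes them from the binomial formula, with `pmf` as a helper defined for \( X \sim B(50, 0.1) \).

```python
from math import comb

n, p = 50, 0.1


def pmf(r: int) -> float:
    """P(X = r) for X ~ B(50, 0.1)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


lower_tail = pmf(0)                                  # P(X = 0)
upper_tail = sum(pmf(r) for r in range(10, n + 1))   # P(X >= 10)
actual_sig = lower_tail + upper_tail

print(f"{lower_tail:.5f} + {upper_tail:.5f} = {actual_sig:.4f}")
```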

Step 4: Part (d) – Conclusion

Observed value is 15. We check if 15 lies in the critical region.

The critical region is \( X = 0 \) or \( X \ge 10 \).

15 is in the critical region (\( 15 \ge 10 \)).

Therefore, we reject \( H_0 \).

There is evidence to support the manager’s belief that there has been a change in the proportion.

Question 5 (10 marks)

A company has 1825 employees. The employees are classified as professional, skilled or elementary.

The following table shows the number of employees in each classification and the two areas, A or B, where they live.

	A	B
Professional	740	380
Skilled	275	90
Elementary	260	80

An employee is chosen at random. Find the probability that this employee

(a) is skilled,

(1)

(b) lives in area B and is not a professional.

(1)

Some classifications of employees are more likely to work from home.

65% of professional employees in both area A and area B work from home
40% of skilled employees in both area A and area B work from home
5% of elementary employees in both area A and area B work from home

Event \( F \) is that the employee is a professional.

Event \( H \) is that the employee works from home.

Event \( R \) is that the employee is from area A.

(c) Using this information, complete the Venn diagram below.

(4)

(d) Find \( P(R' \cap F) \)

(1)

(e) Find \( P([H \cup R]') \)

(1)

(f) Find \( P(F | H) \)

(2)

Worked Solution

Step 1: Parts (a) and (b) – Table Probabilities

Total employees = 1825.

(a) P(Skilled):

Total Skilled = \( 275 (A) + 90 (B) = 365 \).

\[ P(\text{Skilled}) = \frac{365}{1825} = \frac{1}{5} = 0.2 \]

(b) P(Area B and Not Professional):

Area B Not Prof = Skilled in B + Elementary in B = \( 90 + 80 = 170 \).

\[ P = \frac{170}{1825} = \frac{34}{365} \approx 0.0932 \]

Step 2: Part (c) – Completing the Venn Diagram

We need to find the number of employees in each region of the Venn diagram formed by the sets H (Home), R (Area A) and F (Professional).

Given logic:

F is Professional.
R is Area A (so R' is Area B).
H is Works from Home.

Centre Region (\( H \cap R \cap F \)): Professional, Area A, Works from Home.

Total Prof in A = 740. Rate = 65%.

\[ 0.65 \times 740 = 481 \]

Region \( R \cap F \) outside H: Professional, Area A, Not Home.

\[ 740 – 481 = 259 \]

Region \( H \) only (outside R and F): Not Prof (Skilled/Elem), Area B, Works from Home.

Skilled in B (90) home rate 40%: \( 0.4 \times 90 = 36 \)

Elem in B (80) home rate 5%: \( 0.05 \times 80 = 4 \)

\[ \text{Total} = 36 + 4 = 40 \]

Region Outside All (\( [H \cup R \cup F]' \)): Not Prof, Area B, Not Home.

Skilled in B not home: \( 90 – 36 = 54 \)

Elem in B not home: \( 80 – 4 = 76 \)

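\[ \text{Total} = 54 + 76 = 130 \]

The remaining four regions follow in the same way.

Region \( H \cap F \cap R' \): Professional, Area B, Works from Home.

\[ 0.65 \times 380 = 247 \]

Region \( F \) only (outside H and R): Professional, Area B, Not Home.

\[ 380 - 247 = 133 \]

Region \( H \cap R \cap F' \): Not Prof, Area A, Works from Home.

\[ 0.4 \times 275 + 0.05 \times 260 = 110 + 13 = 123 \]

Region \( R \) only (outside H and F): Not Prof, Area A, Not Home.

\[ (275 - 110) + (260 - 13) = 165 + 247 = 412 \]

Check: \( 481 + 259 + 247 + 133 + 123 + 412 + 40 + 130 = 1825 \), which is the total number of employees.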
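You can generate the eight region counts systematically from the table and the home-working rates. In this sketch, each region is keyed by whether an employee is in F, R and H. The dictionary and function names are illustrative only.

```python
# Employee counts by classification and area, copied from the table
counts = {
    "professional": {"A": 740, "B": 380},
    "skilled": {"A": 275, "B": 90},
    "elementary": {"A": 260, "B": 80},
}
# Proportion of each classification working from home (same in areas A and B)
home_rate = {"professional": 0.65, "skilled": 0.40, "elementary": 0.05}

# Key each Venn region by membership flags (in F, in R, in H)
regions = {}
for cls, by_area in counts.items():
    in_f = cls == "professional"          # F: professional
    for area, count in by_area.items():
        in_r = area == "A"                # R: lives in area A
        at_home = home_rate[cls] * count  # H: works from home
        for in_h, value in ((True, at_home), (False, count - at_home)):
            key = (in_f, in_r, in_h)
            regions[key] = regions.get(key, 0) + value


def label(key):
    """Name a region as an intersection, e.g. F ∩ R' ∩ H."""
    names = ("F", "R", "H")
    return " ∩ ".join(n if inside else n + "'" for n, inside in zip(names, key))


for key in sorted(regions, reverse=True):
    print(f"{label(key):12} {round(regions[key])}")
print("total:", round(sum(regions.values())))
```

Printing the regions is an easy way to check a hand-completed diagram, because the eight counts must add up to the 1825 employees.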

Step 3: Part (d) – \( P(R' \cap F) \)

\( R' \) is Area B. \( F \) is Professional. So we want Professionals in Area B.

Looking at the diagram, this corresponds to the regions in F but outside R.

These are \( H \cap F \cap R' \) (247) and \( F \text{ only} \) (133).

\[ 247 + 133 = 380 \]

Or directly from table: Professionals in B = 380.

\[ P = \frac{380}{1825} \approx 0.208 \]

Step 4: Part (e) – \( P([H \cup R]') \)

This is everything outside circles H and R.

From the diagram, these are the regions: F only (133) and Outside everything (130).

\[ 133 + 130 = 263 \] \[ P = \frac{263}{1825} \approx 0.144 \]

Step 5: Part (f) – \( P(F | H) \)

Conditional probability: \( P(F | H) = \frac{n(F \cap H)}{n(H)} \).

We are restricting our total to just circle H.

Total in H: \( 40 + 123 + 481 + 247 = 891 \)

In intersection \( F \cap H \): \( 481 + 247 = 728 \)

\[ P(F | H) = \frac{728}{891} \approx 0.817 \]
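Parts (d) to (f) only need region counts from the diagram. This sketch uses Python's `fractions.Fraction` so that the exact fractions appear alongside the decimals. The variable names for the regions are our own.

```python
from fractions import Fraction

TOTAL = 1825

# Region counts from the completed Venn diagram
h_only = 40        # H only
h_r_not_f = 123    # H ∩ R ∩ F'
h_r_f = 481        # H ∩ R ∩ F
h_f_not_r = 247    # H ∩ F ∩ R'
f_only = 133       # F only
outside_all = 130  # outside H, R and F

p_d = Fraction(h_f_not_r + f_only, TOTAL)       # P(R' ∩ F)
p_e = Fraction(f_only + outside_all, TOTAL)     # P([H ∪ R]')
n_h = h_only + h_r_not_f + h_r_f + h_f_not_r    # n(H) = 891
p_f = Fraction(h_r_f + h_f_not_r, n_h)          # P(F | H) = n(F ∩ H) / n(H)

for part, value in (("(d)", p_d), ("(e)", p_e), ("(f)", p_f)):
    print(part, value, f"≈ {float(value):.3f}")
```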

Question 6 (9 marks)

Anna is investigating the relationship between exercise and resting heart rate.

She takes a random sample of 19 people in her year at school and records for each person:

their resting heart rate, \( h \) beats per minute
the number of minutes, \( m \), spent exercising each week

Her results are shown on the scatter diagram.

(a) Interpret the nature of the relationship between \( h \) and \( m \).

(1)

Anna codes the data using the formulae:

\[ x = \log_{10} m \] \[ y = \log_{10} h \]

The product moment correlation coefficient between \( x \) and \( y \) is \( -0.897 \).

(b) Test whether or not there is significant evidence of a negative correlation between \( x \) and \( y \).

You should

state your hypotheses clearly
use a 5% level of significance
state the critical value used

(3)

The equation of the line of best fit of \( y \) on \( x \) is

\[ y = -0.05x + 1.92 \]

(c) Use the equation of the line of best fit of \( y \) on \( x \) to find a model for \( h \) on \( m \) in the form

\[ h = am^k \]

where \( a \) and \( k \) are constants to be found.

(5)

Worked Solution

Step 1: Part (a) – Interpretation

We look at the trend in the scatter diagram.

As the minutes of exercise (\( m \)) increase, the resting heart rate (\( h \)) decreases.

This indicates a negative correlation. The points also level off as \( m \) increases, so the effect of extra exercise on resting heart rate appears to diminish.

Step 2: Part (b) – Hypothesis Test

We are testing for negative correlation using the PMCC (\( \rho \)).

Sample size \( n = 19 \). Significance level 5%.

Hypotheses:

\[ H_0: \rho = 0 \] \[ H_1: \rho < 0 \]

Critical Value:

Using PMCC tables for \( n=19 \) at 5% (one-tailed):

Critical Value = -0.3887

Conclusion:

Test statistic \( r = -0.897 \).

Since \( -0.897 < -0.3887 \), the result is in the critical region.

Reject \( H_0 \). There is significant evidence of a negative correlation.

Step 3: Part (c) – Power Model

We are given a linear equation in logs: \( y = -0.05x + 1.92 \).

We need to convert this back to the form \( h = am^k \).

Recall: \( y = \log_{10} h \) and \( x = \log_{10} m \).

Substitute the log definitions into the equation:

\[ \log_{10} h = -0.05 \log_{10} m + 1.92 \]

Use log laws to rewrite \( -0.05 \log_{10} m \):

\[ \log_{10} h = \log_{10} (m^{-0.05}) + 1.92 \]

To combine the 1.92, write it as a log:

\[ 1.92 = \log_{10} (10^{1.92}) \]

So:

\[ \log_{10} h = \log_{10} (m^{-0.05}) + \log_{10} (10^{1.92}) \] \[ \log_{10} h = \log_{10} (10^{1.92} \times m^{-0.05}) \]

Since the logs are equal, the quantities inside them are equal:

\[ h = 10^{1.92} \times m^{-0.05} \]

Calculate \( a = 10^{1.92} \):

\[ a \approx 83.176… \]

Comparing to \( h = am^k \):

\[ a \approx 83.2 \quad \text{and} \quad k = -0.05 \]

\[ h = 83.2 m^{-0.05} \]
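Here is a quick numerical check of the conversion. With \( a = 10^{1.92} \) and \( k = -0.05 \), taking \( \log_{10} \) of \( h = am^k \) should give back the line \( y = -0.05x + 1.92 \) for any positive \( m \). The values of \( m \) below are arbitrary test values, not data from Anna's sample, and `h_model` is a hypothetical helper name.

```python
from math import log10

a = 10 ** 1.92  # intercept 1.92 = log10(a)
k = -0.05       # gradient = power of m


def h_model(m: float) -> float:
    """Resting heart rate predicted by the power model h = a * m**k."""
    return a * m**k


# Taking log10 of the model should reproduce the line y = -0.05x + 1.92
for m in (10, 100, 300):  # arbitrary positive test values, not Anna's data
    x, y = log10(m), log10(h_model(m))
    assert abs(y - (-0.05 * x + 1.92)) < 1e-9

print(f"a = {a:.2f}, k = {k}")
```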

(Accept \( a \) in range 83.17 – 83.2)

A Level 2022 Edexcel Statistics Paper 31
