If any of my solutions look wrong, please refer to the mark scheme.

Edexcel A-Level Statistics Paper 31 (June 2024)

Edexcel A-Level Mathematics Paper 31: Statistics (June 2024)

💡 How to use this page

Try the question first: Write your answer on paper before checking.
Show Solution: Click the green button to reveal the step-by-step worked solution.
Check your reasoning: The solutions explain why we do each step, not just how.

Question 1 (Binomial & Normal Approximation)
Question 2 (Regression & Correlation)
Question 3 (Large Data Set)
Question 4 (Hypothesis Testing)
Question 5 (Normal Distribution)
Question 6 (Venn Diagrams)

Question 1 (11 marks)

Xian rolls a fair die 10 times.

The random variable \( X \) represents the number of times the die lands on a six.

(a) Using a suitable distribution for \( X \), find

(i) \( P(X = 3) \)

(ii) \( P(X < 3) \)

(3)

Xian repeats this experiment each day for 60 days and records the number of days when \( X = 3 \).

(b) Find the probability that there were at least 12 days when \( X = 3 \).

(3)

(1)

(d) Use a normal approximation to estimate the probability that Xian rolls a total of more than 95 sixes during these 60 days.

(4)

📝 Worked Solution

Part (a): Binomial Probabilities

💡 Step 1: Identify the Distribution

The die is fair, so the probability of rolling a six is \( p = \frac{1}{6} \). The number of trials is fixed at \( n = 10 \). The rolls are independent. This fits the Binomial distribution.

✏️ Model:

\[ X \sim B(10, \frac{1}{6}) \]

(i) Find \( P(X = 3) \)

Using the formula \( P(X=x) = \binom{n}{x} p^x (1-p)^{n-x} \) or the calculator's Binomial PD function:

\[ P(X = 3) = \binom{10}{3} \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^7 \]

Calculator:

Menu -> Distribution -> Binomial CD/PD -> PD
x: 3, N: 10, p: 0.1666…

\[ P(X = 3) = 0.155045… \]

Answer: 0.155 (3 s.f.)

(ii) Find \( P(X < 3) \)

Because the binomial distribution is discrete, “less than 3” means 0, 1, or 2. It does NOT include 3.

\[ P(X < 3) = P(X \leq 2) \]

Using Calculator (Binomial CD):

x: 2, N: 10, p: 1/6

\[ P(X \leq 2) = 0.775226… \]

Answer: 0.775 (3 s.f.)
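The two calculator results in part (a) can be cross-checked with a short script. This is only a sketch using Python's standard library; the helper names `binom_pmf` and `binom_cdf` are illustrative and not part of the question or mark scheme.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), straight from the binomial formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p), summing the individual probabilities."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


p_exactly_3 = binom_pmf(3, 10, 1 / 6)    # part (a)(i)
p_less_than_3 = binom_cdf(2, 10, 1 / 6)  # part (a)(ii): P(X < 3) = P(X <= 2)

print(round(p_exactly_3, 3), round(p_less_than_3, 3))
```

Note that the cumulative call uses 2, not 3, for the same reason given above: "less than 3" excludes 3.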

Part (b): Probability over 60 Days

💡 Step 1: Define the new variable

We are looking at the number of days out of 60 where \( X = 3 \). Let \( D \) be the number of days \( X=3 \).

The probability of “success” (getting \( X=3 \)) on any given day is the answer from (a)(i), which is \( 0.155045… \).

✏️ Model:

\[ D \sim B(60, 0.155045…) \]

We need “at least 12 days”, which is \( P(D \geq 12) \).

Since calculators compute \( P(D \leq x) \), we use the complement rule:

\[ P(D \geq 12) = 1 - P(D \leq 11) \]

Using Calculator (Binomial CD):

x: 11, N: 60, p: 0.155045…
\( P(D \leq 11) = 0.78819… \)

\[ P(D \geq 12) = 1 - 0.78819… = 0.2118… \]

Answer:

0.212 (3 s.f.)
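As a rough check of part (b), the same complement-rule calculation can be sketched in Python. The daily success probability is recomputed in full rather than typed in rounded, which mirrors keeping the unrounded value on the calculator. The names `binom_pmf`, `p_day` and `p_at_least_12` are illustrative only.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability that a single day gives exactly 3 sixes in 10 rolls (part (a)(i)).
p_day = binom_pmf(3, 10, 1 / 6)

# D ~ B(60, p_day); P(D >= 12) = 1 - P(D <= 11).
p_at_most_11 = sum(binom_pmf(d, 60, p_day) for d in range(12))
p_at_least_12 = 1 - p_at_most_11

print(round(p_at_least_12, 3))
```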

Part (c): Estimate Total Sixes

💡 Step 1: Calculate Total Trials

Xian rolls the die 10 times each day for 60 days.

Total rolls \( N = 10 \times 60 = 600 \).

The expected total number of sixes is \( N \times p \).

\[ \text{Estimate} = 600 \times \frac{1}{6} = 100 \]

100

Part (d): Normal Approximation

💡 Step 1: Setup Normal Approximation parameters

Let \( S \) be the total number of sixes in 600 rolls.

\( S \sim B(600, \frac{1}{6}) \).

Since \( n \) is large and \( p \) is not too close to 0 or 1, we can approximate with \( N(\mu, \sigma^2) \).

Mean \( \mu = np \)

Variance \( \sigma^2 = np(1-p) \)

\[ \mu = 600 \times \frac{1}{6} = 100 \] \[ \sigma^2 = 600 \times \frac{1}{6} \times \frac{5}{6} = \frac{500}{6} = 83.33… \] \[ \sigma = \sqrt{83.33…} \approx 9.1287… \]

So, \( S \approx N(100, 83.33…) \).

💡 Step 2: Apply Continuity Correction

We want \( P(S > 95) \) for a discrete variable.

On a continuous scale, “greater than 95” starts at 95.5 (the upper bound of 95). Remember, > 95 means 96, 97… so we start collecting area from 95.5 upwards.

\[ P(S > 95) \approx P(Y > 95.5) \text{ where } Y \sim N(100, 83.33…) \]

💡 Step 3: Calculate Normal Probability

Using Calculator (Normal CD):

Lower: 95.5
Upper: 10000 (a large number)
\( \sigma \): 9.1287…
\( \mu \): 100

\[ P(Y > 95.5) = 0.68897… \]

Answer:

0.689 (3 s.f.)
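The normal approximation in part (d) can be sketched with `statistics.NormalDist` from Python's standard library. This is a check of the arithmetic under the same assumptions as above (mean \( np \), variance \( np(1-p) \), continuity correction at 95.5), not a replacement for the written method.

```python
from math import sqrt
from statistics import NormalDist

n, p = 600, 1 / 6

mu = n * p                       # mean of B(600, 1/6)
sigma = sqrt(n * p * (1 - p))    # standard deviation, sqrt(np(1 - p))

approx = NormalDist(mu, sigma)

# Continuity correction: P(S > 95) for the discrete S becomes P(Y > 95.5).
p_more_than_95 = 1 - approx.cdf(95.5)

print(round(sigma, 4), round(p_more_than_95, 3))
```

Using 95 or 96 instead of 95.5 as the boundary would give a noticeably different answer, which is why the continuity correction matters here.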

Question 2 (6 marks)

Amar is studying the flight of a bird from its nest.

He measures the bird’s height above the ground, \( h \) metres, at time \( t \) seconds for 10 values of \( t \).

Amar finds the equation of the regression line for the data to be \( h = 38.6 - 1.28t \).

(a) Interpret the gradient of this line.

(1)

The product moment correlation coefficient between \( h \) and \( t \) is -0.510.

(b) Test whether or not there is evidence of a negative correlation between the height above the ground and the time during the flight.

You should

state your hypotheses clearly
use a 5% level of significance
state the critical value used

(3)

Jane draws the following scatter diagram for Amar’s data.

(c) With reference to the scatter diagram, state, giving a reason, whether or not the regression line \( h = 38.6 - 1.28t \) is an appropriate model for these data.

(1)

Jane suggests an improved model using the variable \( u = (t - k)^2 \) where \( k \) is a constant.

She obtains the equation \( h = 38.1 - 0.78u \).

(d) Choose a suitable value for \( k \) to write Jane’s improved model for \( h \) in terms of \( t \) only.

(1)

📝 Worked Solution

Part (a): Interpretation

The gradient is -1.28. This represents the rate of change of height with respect to time.

In context, it means the height decreases by 1.28 metres for every second of the flight.

The height of the bird decreases by approximately 1.28 m per second.

Part (b): Hypothesis Test for Correlation

💡 Step 1: Hypotheses

We are testing for negative correlation. The population correlation coefficient is \( \rho \).

\( H_0: \rho = 0 \) (No correlation)
\( H_1: \rho < 0 \) (Negative correlation)

💡 Step 2: Critical Value

Sample size \( n = 10 \). Significance level 5% (0.05). One-tailed test.

Using the Product Moment Correlation Coefficient table for \( n=10, p=0.05 \):

Critical Value (r) = 0.5494

Since we are looking for negative correlation, the Critical Value is \( -0.5494 \).

💡 Step 3: Comparison and Conclusion

Observed value \( r = -0.510 \).

We compare -0.510 with -0.5494. Since -0.510 is closer to 0 than -0.5494 (i.e., \( -0.510 > -0.5494 \)), it falls in the acceptance region.

Since \( -0.510 > -0.5494 \), we do not reject \( H_0 \).

There is insufficient evidence of a negative correlation between height and time.

Part (c): Model Appropriateness

Look at the scatter diagram. The points follow a curved path (likely quadratic), rising and then falling. They do not follow a straight line.

No, the regression line is not appropriate because the points follow a non-linear (curved) pattern.

Part (d): Improved Model

The model is \( h = 38.1 - 0.78(t-k)^2 \).

This is the equation of a parabola (quadratic) opening downwards.

The term \( (t-k)^2 \) suggests the vertex (maximum height) occurs when \( t = k \).

Looking at the scatter graph, the maximum height seems to occur around \( t = 3.5 \) to \( t = 4 \).

By visual inspection of the peak of the curve in the scatter diagram, the axis of symmetry is between 3 and 4.

A value like \( k = 3.5 \) or \( k = 4 \) is suitable.

Acceptable range: \( 3 \leq k \leq 4.5 \). Example: \( k = 3.5 \).

Model: \( h = 38.1 - 0.78(t - 3.5)^2 \)
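If an expanded form is preferred, the bracket can be multiplied out. The working below assumes the example value \( k = 3.5 \) chosen above; a different acceptable \( k \) would give different coefficients.

\[ h = 38.1 - 0.78(t^2 - 7t + 12.25) \] \[ h = 38.1 - 0.78t^2 + 5.46t - 9.555 \] \[ h = -0.78t^2 + 5.46t + 28.545 \]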

Question 3 (6 marks)

Ming is studying the large data set for Perth in 2015.

He intended to use all the data available to find summary statistics for the Daily Mean Air Temperature, \( x \) °C.

Unfortunately, Ming selected an incorrect variable on the spreadsheet.

This incorrect variable gave a mean of 5.3 and a standard deviation of 12.4.

(a) Using your knowledge of the large data set, suggest which variable Ming selected.

(1)

The correct values for the Daily Mean Air Temperature are summarised as

\[ n = 184 \quad \sum x = 2801.2 \quad \sum x^2 = 44695.4 \]

(b) Calculate the mean and standard deviation for these data.

(3)

One of the months from the large data set for Perth in 2015 has

mean \( \bar{x} = 19.4 \)
standard deviation \( \sigma_x = 2.83 \)

for Daily Mean Air Temperature.

(2)

📝 Worked Solution

Part (a): Identifying the Variable

We need a variable that could have a mean of 5.3 and a large standard deviation of 12.4. It is unlikely to be a temperature, because temperatures in Perth (Australia) are usually higher than 5.3.

The large data set variables include Daily Mean Temperature, Daily Total Rainfall, Daily Mean Pressure and Daily Mean Wind Speed.

A standard deviation of 12.4 is more than twice the mean of 5.3. This suggests a variable with many zero or small values and a few large values, which is typical of Daily Total Rainfall.

Daily Total Rainfall

Part (b): Mean and Standard Deviation

Mean (\( \bar{x} \)):

\[ \bar{x} = \frac{\sum x}{n} = \frac{2801.2}{184} = 15.2239… \]

Mean = 15.2 (3 s.f.)

Standard Deviation (\( \sigma \)):

Formula: \( \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \)

\[ \sigma = \sqrt{\frac{44695.4}{184} - (15.2239…)^2} \] \[ \sigma = \sqrt{242.909… - 231.767…} \] \[ \sigma = \sqrt{11.142…} = 3.338… \]

Standard Deviation = 3.34 (3 s.f.)
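The summary-statistics calculation can be checked with a few lines of Python. This sketch uses the same formula as above (dividing by \( n \), not \( n - 1 \)); the variable names are illustrative.

```python
from math import sqrt

n = 184
sum_x = 2801.2
sum_x2 = 44695.4

mean = sum_x / n
# Standard deviation from summary statistics: sqrt(sum(x^2)/n - mean^2).
sd = sqrt(sum_x2 / n - mean**2)

print(round(mean, 1), round(sd, 2))
```

Keeping the unrounded mean in the variance step matters: substituting 15.2 instead of 15.2239… changes the value under the square root noticeably.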

Part (c): Identifying the Month

The overall mean for the 2015 data in part (b) is 15.2°C.

The specific month has a mean of 19.4°C.

This is noticeably warmer than the overall average.

Perth is in the Southern Hemisphere.

Summer: December, January, February (Hot)
Winter: June, July, August (Cool)
Spring: September, October, November (Warming up)

The large data set covers May to October at every location, including Perth, which is consistent with the \( n = 184 \) days of data in part (b).

In Perth, May to October is late Autumn/Winter/Spring.

Since 19.4°C is warmer than the average (15.2°C), we need a month close to summer.

October is spring (getting warmer). July is winter (coldest). May is autumn (cooling down).

October would be the warmest month in the May-Oct dataset for the Southern Hemisphere.

October. Because Perth is in the southern hemisphere, October is spring and would be warmer than the winter months (June/July).

Question 4 (6 marks)

The proportion of left-handed adults in a country is 10%.

Freya believes that the proportion of left-handed adults under the age of 25 in this country is different from 10%.

She takes a random sample of 40 adults under the age of 25 from this country to investigate her belief.

(a) Find the critical region for a suitable test to assess Freya’s belief.

You should

state your hypotheses clearly
use a 5% level of significance
state the probability of rejection in each tail

(4)

(b) Write down the actual significance level of your test in part (a).

(1)

In Freya’s sample 7 adults were left-handed.

(1)

📝 Worked Solution

Part (a): Critical Region

💡 Step 1: Hypotheses

Let \( p \) be the probability of an adult under 25 being left-handed.

\( H_0: p = 0.10 \)
\( H_1: p \neq 0.10 \) (Different from, so Two-tailed test)

💡 Step 2: Distribution

Let \( X \) be the number of left-handed adults in the sample.

\[ X \sim B(40, 0.1) \]

💡 Step 3: Critical Region Calculation

Significance level = 5% total. Since it’s two-tailed, we want 2.5% (0.025) in each tail.

Lower Tail: Find largest \( x \) such that \( P(X \leq x) \leq 0.025 \).

\( P(X \leq 0) = 0.0148 \) ( < 0.025 )
\( P(X \leq 1) = 0.0805 \) ( > 0.025 )

So lower critical region is \( X = 0 \).

Upper Tail: Find smallest \( x \) such that \( P(X \geq x) \leq 0.025 \).

This is equivalent to \( 1 - P(X \leq x-1) \leq 0.025 \), or \( P(X \leq x-1) \geq 0.975 \).

\( P(X \leq 7) = 0.9581 \)
\( P(X \leq 8) = 0.9845 \)

So \( x-1 = 8 \implies x = 9 \).

Check: \( P(X \geq 9) = 1 - P(X \leq 8) = 1 - 0.9845 = 0.0155 \) ( < 0.025 ).

Check previous: \( P(X \geq 8) = 1 - P(X \leq 7) = 1 - 0.9581 = 0.0419 \) ( > 0.025 ).

So upper critical region is \( X \geq 9 \).

Critical Region: \( \{0\} \cup \{9, 10, …, 40\} \)

Rejection probabilities:

Lower tail: 0.0148
Upper tail: 0.0155

Part (b): Actual Significance Level

Sum of probabilities in the critical region:

\[ 0.0148 + 0.0155 = 0.0303 \]

0.0303 (or 3.03%)
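The search for the critical region and the actual significance level can be sketched in Python. This follows the rule used above (each tail probability must be at most 0.025); the names `binom_cdf`, `lower`, `upper` and `actual_level` are illustrative, and the script works with unrounded probabilities rather than 4 d.p. table values.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p0, tail = 40, 0.1, 0.025

# Lower tail: largest x with P(X <= x) <= 0.025.
lower = max(x for x in range(n + 1) if binom_cdf(x, n, p0) <= tail)

# Upper tail: smallest x with P(X >= x) = 1 - P(X <= x - 1) <= 0.025.
upper = min(x for x in range(1, n + 1) if 1 - binom_cdf(x - 1, n, p0) <= tail)

lower_prob = binom_cdf(lower, n, p0)
upper_prob = 1 - binom_cdf(upper - 1, n, p0)
actual_level = lower_prob + upper_prob

print(lower, upper, round(lower_prob, 4), round(upper_prob, 4), round(actual_level, 4))
```

The observed value 7 from part (c) lies strictly between `lower` and `upper`, so it is not in the critical region.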

Part (c): Conclusion

Observed value: 7 left-handed adults.

Is 7 in the critical region (\( X=0 \) or \( X \geq 9 \))?

No, 7 is in the acceptance region.

7 is not in the critical region. There is insufficient evidence to support Freya’s belief that the proportion is different from 10%.

Question 5 (10 marks)

The records for a school athletics club show that the height, \( H \) metres, achieved by students in the high jump is normally distributed with mean 1.4 metres and standard deviation 0.15 metres.

(a) Find the proportion of these students achieving a height of more than 1.6 metres.

(1)

The records also show that the time, \( T \) seconds, to run 1500 metres is normally distributed with mean 330 seconds and standard deviation 26 seconds.

The school’s Head would like to use these distributions to estimate the proportion of students from the school athletics club who can jump higher than 1.6 metres and can run 1500 metres in less than 5 minutes.

(b) State a necessary assumption about \( H \) and \( T \) for the Head to calculate an estimate of this proportion.

(1)

(3)

Students in the school athletics club also throw the discus.

The random variable \( D \sim N(\mu, \sigma^2) \) represents the distance, in metres, that a student can throw the discus.

Given that \( P(D < 16.3) = 0.30 \) and \( P(D > 29.0) = 0.10 \)

(d) calculate the value of \( \mu \) and the value of \( \sigma \).

(5)

📝 Worked Solution

Part (a): Normal Probability

Model: \( H \sim N(1.4, 0.15^2) \)

Find \( P(H > 1.6) \).

Calculator (Normal CD):

Lower: 1.6
Upper: 1000
\( \sigma \): 0.15
\( \mu \): 1.4

\[ P(H > 1.6) = 0.09121… \]

0.0912 (3 s.f.)

Part (b): Assumption

To multiply probabilities of two events occurring together (“and”), the events must be independent.

The height achieved in high jump and the time taken to run 1500m are independent.

Part (c): Combined Probability

First, convert “5 minutes” to seconds: \( 5 \times 60 = 300 \) seconds.

Model: \( T \sim N(330, 26^2) \)

Find \( P(T < 300) \).

Calculator (Normal CD):

Lower: -1000
Upper: 300
\( \sigma \): 26
\( \mu \): 330

\[ P(T < 300) = 0.12428... \]

Total Probability = \( P(H > 1.6) \times P(T < 300) \)

\[ 0.09121… \times 0.12428… = 0.01133… \]

0.0113 (3 s.f.)
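Parts (a) and (c) can be checked together with `statistics.NormalDist`. This is a sketch that relies on the independence assumption stated in part (b); without it, multiplying the two probabilities would not be justified.

```python
from statistics import NormalDist

height = NormalDist(mu=1.4, sigma=0.15)   # H, in metres
time = NormalDist(mu=330, sigma=26)       # T, in seconds

p_jump = 1 - height.cdf(1.6)   # P(H > 1.6), part (a)
p_run = time.cdf(5 * 60)       # P(T < 300), with 5 minutes converted to seconds

# Assuming H and T are independent, multiply the two probabilities.
p_both = p_jump * p_run

print(round(p_jump, 4), round(p_run, 4), round(p_both, 4))
```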

Part (d): Finding Mean and Standard Deviation

We are given:

\( P(D < 16.3) = 0.30 \)
\( P(D > 29.0) = 0.10 \implies P(D < 29.0) = 0.90 \)

We need to standardize using \( Z = \frac{D - \mu}{\sigma} \).

Equation 1:

Inverse Normal for area 0.30:

\[ Z_1 = -0.5244… \] \[ \frac{16.3 - \mu}{\sigma} = -0.5244 \] \[ 16.3 - \mu = -0.5244\sigma \quad \text{(1)} \]

Equation 2:

Inverse Normal for area 0.90:

\[ Z_2 = 1.28155… \approx 1.2816 \] \[ \frac{29.0 - \mu}{\sigma} = 1.2816 \] \[ 29.0 - \mu = 1.2816\sigma \quad \text{(2)} \]

Solve Simultaneous Equations:

Subtract (1) from (2):

\[ (29.0 - \mu) - (16.3 - \mu) = 1.2816\sigma - (-0.5244\sigma) \] \[ 12.7 = 1.806\sigma \] \[ \sigma = \frac{12.7}{1.806} = 7.032… \]

Substitute \( \sigma \) back into (2):

\[ \mu = 29.0 - 1.2816(7.032…) \] \[ \mu = 29.0 - 9.012… \] \[ \mu = 19.98… \]

\( \mu = 20.0 \) (3 s.f.)

\( \sigma = 7.03 \) (3 s.f.)
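The simultaneous equations can also be solved numerically. The sketch below uses `NormalDist().inv_cdf` for the inverse normal values, so it works with unrounded z-values rather than the 4 d.p. values used in the written working; the final answers agree to 3 s.f.

```python
from statistics import NormalDist

standard = NormalDist()  # Z ~ N(0, 1)

z1 = standard.inv_cdf(0.30)  # from P(D < 16.3) = 0.30
z2 = standard.inv_cdf(0.90)  # from P(D > 29.0) = 0.10, i.e. P(D < 29.0) = 0.90

# 16.3 - mu = z1 * sigma and 29.0 - mu = z2 * sigma; subtracting eliminates mu.
sigma = (29.0 - 16.3) / (z2 - z1)
mu = 29.0 - z2 * sigma

print(round(z1, 4), round(z2, 4), round(mu, 1), round(sigma, 2))
```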

Question 6 (11 marks)

The Venn diagram, where \( p, q \) and \( r \) are probabilities, shows the events \( A, B, C \) and \( D \) and associated probabilities.

(a) State any pair of mutually exclusive events from \( A, B, C \) and \( D \).

(1)

The events \( B \) and \( C \) are independent.

(b) Find the value of \( p \).

(2)

(3)

Given that \( P(B | A’) = 0.5 \)

(d) find the value of \( q \) and the value of \( r \).

(3)

(e) Find \( P([A \cup B]’ \cap C) \).

(1)

(f) Use set notation to write an expression for the event with probability \( p \).

(1)

📝 Worked Solution

Part (a): Mutually Exclusive Events

Mutually exclusive means the events have no intersection (overlap). In the Venn diagram, circles that do not touch or overlap are mutually exclusive.

Look at \( A \) and \( C \). They do not overlap.

\( A \) and \( D \) also do not overlap. \( B \) and \( D \) do not overlap either: \( B \) overlaps \( C \), but \( D \) is a small region inside the part of \( C \) that lies outside \( B \).

Safest bet: \( A \) and \( C \).

\( A \) and \( C \) (or \( A \) and \( D \), or \( B \) and \( D \))

Part (b): Independence

If \( B \) and \( C \) are independent, then:

\[ P(B \cap C) = P(B) \times P(C) \]

From the diagram:

\( P(B \cap C) = 0.27 \)
\( P(B) = 0.05 + p + 0.27 = p + 0.32 \)
\( P(C) = 0.27 + 0.08 + 0.25 = 0.60 \)

\[ 0.27 = (p + 0.32) \times 0.60 \] \[ 0.27 = 0.6p + 0.192 \] \[ 0.078 = 0.6p \] \[ p = \frac{0.078}{0.6} = 0.13 \]

\( p = 0.13 \)

Part (c): Greatest value of P(A | B’)

Formula for conditional probability:

\[ P(A | B’) = \frac{P(A \cap B’)}{P(B’)} \]

From the diagram:

\( P(A \cap B’) = q \) (The part of A not in B)
\( P(B’) = 1 - P(B) = 1 - (0.13 + 0.32) = 1 - 0.45 = 0.55 \)

So, \( P(A | B’) = \frac{q}{0.55} \).

To maximize this, we need the maximum possible value of \( q \).

We know the sum of all probabilities is 1:

\[ q + 0.05 + p + 0.27 + 0.25 + 0.08 + r = 1 \] \[ q + 0.05 + 0.13 + 0.27 + 0.25 + 0.08 + r = 1 \] \[ q + 0.78 + r = 1 \] \[ q + r = 0.22 \]

Since \( r \geq 0 \), the maximum value of \( q \) occurs when \( r = 0 \).

Max \( q = 0.22 \).

\[ \text{Max } P(A | B’) = \frac{0.22}{0.55} \] \[ \frac{0.22}{0.55} = \frac{2}{5} = 0.4 \]

0.4

Part (d): Finding q and r

We are given \( P(B | A’) = 0.5 \).

\[ \frac{P(B \cap A’)}{P(A’)} = 0.5 \]

From the diagram:

\( P(B \cap A’) \) is the part of B not in A. That is \( p + 0.27 \).
\( p = 0.13 \), so \( P(B \cap A’) = 0.13 + 0.27 = 0.40 \).

So:

\[ \frac{0.40}{P(A’)} = 0.5 \] \[ P(A’) = \frac{0.40}{0.5} = 0.80 \]

Since \( P(A) = 1 - P(A’) \), we have \( P(A) = 1 - 0.80 = 0.20 \).

From the diagram, \( P(A) = q + 0.05 \).

Find q:

\[ q + 0.05 = 0.20 \] \[ q = 0.15 \]

Find r:

Using \( q + r = 0.22 \) from part (c):

\[ 0.15 + r = 0.22 \] \[ r = 0.07 \]

\( q = 0.15 \)

\( r = 0.07 \)
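The arithmetic in parts (b) to (d) can be checked with exact fractions, which avoids any rounding. The region values (0.05, 0.27, 0.25, 0.08) are the ones read from the Venn diagram in the working above, so this sketch is only as reliable as that reading; the variable names are illustrative.

```python
from fractions import Fraction as F

# Region probabilities as read from the Venn diagram in the working above.
a_and_b = F(5, 100)
b_and_c = F(27, 100)
c_only = F(25, 100)
d_region = F(8, 100)

# Part (b): independence gives P(B n C) = P(B) * P(C).
p_c = b_and_c + d_region + c_only
p_b = b_and_c / p_c
p = p_b - a_and_b - b_and_c

# Part (c): all regions sum to 1, so q + r is fixed; q is largest when r = 0.
q_plus_r = 1 - (a_and_b + p + b_and_c + c_only + d_region)
max_a_given_not_b = q_plus_r / (1 - p_b)

# Part (d): P(B | A') = 0.5 gives P(A') = P(B n A') / 0.5.
p_not_a = (p + b_and_c) / F(1, 2)
q = (1 - p_not_a) - a_and_b
r = q_plus_r - q

print(p, max_a_given_not_b, q, r)
```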

Part (e): Set Notation Probability

We need \( P([A \cup B]’ \cap C) \).

\( [A \cup B]’ \) is everything OUTSIDE A and B.

Intersecting this with C means “Everything inside C that is NOT in A or B”.

Looking at the diagram, C overlaps with B. The part of C that is not in B is the region with \( 0.08 \) and \( 0.25 \). A does not overlap C.

\[ 0.08 + 0.25 = 0.33 \]

0.33

Part (f): Expression for p

\( p \) represents the region inside B that is NOT in A and NOT in C.

\( B \cap A’ \cap C’ \) or \( B \cap (A \cup C)’ \)
