If any of my solutions look wrong, please refer to the mark scheme.

A-Level Mathematics: Statistics (Paper 31) June 2023

Legend

  • (M1) Method Mark: For knowing the method and attempting to apply it.
  • (A1) Accuracy Mark: For correct answers derived from correct methods.
  • (B1) Independent Mark: For correct answers independent of method.
  • (ft) Follow Through: Marks awarded for correct working that follows on from an earlier error.

Question 1 (6 marks)

The Venn diagram, where \( p \) and \( q \) are probabilities, shows the three events \( A \), \( B \) and \( C \) and their associated probabilities.

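[Venn diagram of events \( A \), \( B \) and \( C \). Region probabilities: \( A \) only 0.13; \( A \cap B \) 0.25; \( B \) only 0.05; \( B \cap C \) 0.3; \( C \) only \( p \); outside all three events \( q \). \( A \) and \( C \) do not overlap.]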

(a) Find \( P(A) \).

(1)

The events \( B \) and \( C \) are independent.

(b) Find the value of \( p \) and the value of \( q \).

(3)

(c) Find \( P(A|B') \).

(2)

Worked Solution

Step 1: Finding P(A)

💡 What are we looking for?

We need to find the total probability of event \( A \) by summing all the mutually exclusive regions inside circle \( A \).

✏ Working:

\[ P(A) = 0.13 + 0.25 \] \[ P(A) = 0.38 \]

✓ (B1)

Step 2: Using Independence to find p

💡 Strategy:

We are told \( B \) and \( C \) are independent. The definition of independence is \( P(B \cap C) = P(B) \times P(C) \).

From the diagram:

  • \( P(B \cap C) = 0.3 \) (the region where B and C overlap)
  • \( P(B) = 0.25 + 0.05 + 0.3 \) (sum of all regions in B)
  • \( P(C) = 0.3 + p \) (sum of all regions in C)

✏ Working:

\[ P(B) = 0.25 + 0.05 + 0.3 = 0.6 \] \[ P(C) = 0.3 + p \]

Using the independence formula \( P(B \cap C) = P(B) \times P(C) \):

\[ 0.3 = 0.6 \times (0.3 + p) \] \[ 0.3 = 0.18 + 0.6p \] \[ 0.12 = 0.6p \] \[ p = \frac{0.12}{0.6} = 0.2 \]

✓ (M1, A1)

Step 3: Finding q

💡 Strategy:

The sum of all probabilities in a Venn diagram must equal 1. We sum all distinct regions and set the total to 1.

✏ Working:

\[ 0.13 + 0.25 + 0.05 + 0.3 + p + q = 1 \]

Substitute \( p = 0.2 \):

\[ 0.13 + 0.25 + 0.05 + 0.3 + 0.2 + q = 1 \] \[ 0.93 + q = 1 \] \[ q = 1 - 0.93 = 0.07 \]

✓ (B1)
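As a quick check on the values of \( p \) and \( q \), the same steps can be sketched in Python with exact fractions. This is only an illustrative sketch: the variable names are my own, and the region values are those used in the working above.

```python
from fractions import Fraction as F

# Region probabilities used in the working (names are illustrative).
a_only, a_and_b, b_only, b_and_c = F("0.13"), F("0.25"), F("0.05"), F("0.3")

p_b = a_and_b + b_only + b_and_c   # P(B) = 0.6

# Independence: P(B and C) = P(B) * P(C), where P(C) = 0.3 + p.
p_c = b_and_c / p_b                # P(C) = 0.5
p = p_c - b_and_c                  # p = 0.2

# All regions of the diagram sum to 1, which fixes q.
q = 1 - (a_only + a_and_b + b_only + b_and_c + p)

print(float(p), float(q))  # 0.2 0.07
```

Using `Fraction` avoids floating-point rounding, so the results match the hand calculation exactly.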

Step 4: Finding P(A|B’)

💡 What this asks:

We need the probability of \( A \) given that \( B \) has not happened. The formula is:

\[ P(A|B') = \frac{P(A \cap B')}{P(B')} \]

✏ Working:

Region \( A \cap B' \) is the part of A that is NOT in B: \( 0.13 \).

Region \( B' \) is everything outside B: \( 1 - P(B) = 1 - 0.6 = 0.4 \).

\[ P(A|B') = \frac{0.13}{0.4} \] \[ P(A|B') = 0.325 \]

✓ (M1, A1)
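The conditional probability can be checked in the same way. A minimal sketch, assuming the region values used in the working above (variable names are illustrative):

```python
from fractions import Fraction as F

p_a_not_b = F("0.13")                   # part of A that lies outside B
p_b = F("0.25") + F("0.05") + F("0.3")  # P(B) = 0.6
p_not_b = 1 - p_b                       # P(B') = 0.4

# Conditional probability: P(A | B') = P(A and B') / P(B').
p_a_given_not_b = p_a_not_b / p_not_b

print(p_a_given_not_b, float(p_a_given_not_b))  # 13/40 0.325
```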

Final Answers:

(a) \( P(A) = 0.38 \)

(b) \( p = 0.2, \quad q = 0.07 \)

(c) \( 0.325 \) (or \( \frac{13}{40} \))


Question 2 (9 marks)

A machine fills packets with sweets and \( \frac{1}{7} \) of the packets also contain a prize.

The packets of sweets are placed in boxes before being delivered to shops. There are 40 packets of sweets in each box.

The random variable \( T \) represents the number of packets of sweets that contain a prize in each box.

(a) State a condition needed for \( T \) to be modelled by \( B(40, \frac{1}{7}) \).

(1)

A box is selected at random.

(b) Using \( T \sim B(40, \frac{1}{7}) \) find

(i) the probability that the box has exactly 6 packets containing a prize,

(ii) the probability that the box has fewer than 3 packets containing a prize.

(2)

Kamil’s sweet shop buys 5 boxes of these sweets.

(c) Find the probability that exactly 2 of these 5 boxes have fewer than 3 packets containing a prize.

(2)

Kamil claims that the proportion of packets containing a prize is less than \( \frac{1}{7} \).

A random sample of 110 packets is taken and 9 packets contain a prize.

(d) Use a suitable test to assess Kamil’s claim. You should

  • state your hypotheses clearly
  • use a 5% level of significance

(4)

Worked Solution

Step 1: Conditions for Binomial Distribution

💡 Understanding:

For a binomial model to be valid, trials must be independent and the probability of success must remain constant.

✏ Answer:

Any one of:

  • The packets are filled/packed independently.
  • The probability of a packet containing a prize is constant for each packet.
  • Prizes are placed in packets at random.

✓ (B1)

Step 2: Calculating Probabilities

💡 Strategy:

We use the distribution \( T \sim B(40, \frac{1}{7}) \).

(i) We need \( P(T = 6) \). Use binomial formula or calculator (Binomial PD).

(ii) “Fewer than 3” means \( T < 3 \), which is \( T \leq 2 \). Use calculator (Binomial CD).

✏ Working:

(i) \( P(T = 6) \):

\[ P(T=6) = \binom{40}{6} \left(\frac{1}{7}\right)^6 \left(\frac{6}{7}\right)^{34} \approx 0.173 \]

✓ (B1)

(ii) \( P(T < 3) = P(T \leq 2) \):

\[ P(T \leq 2) = P(T=0) + P(T=1) + P(T=2) \]

Using cumulative distribution function on calculator:

\[ P(T \leq 2) \approx 0.0616 \]

✓ (B1)
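If you want to reproduce the Binomial PD and CD values without a calculator, both probabilities follow directly from the binomial formula. The sketch below uses only Python's standard library; the helper names `binom_pmf` and `binom_cdf` are my own, not calculator or library functions.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p), summing the individual terms."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 40, 1 / 7
p_exactly_6 = binom_pmf(6, n, p)     # part (b)(i)
p_fewer_than_3 = binom_cdf(2, n, p)  # part (b)(ii): T < 3 means T <= 2

print(round(p_exactly_6, 3), round(p_fewer_than_3, 4))  # 0.173 0.0616
```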

Step 3: Distribution of Boxes

💡 Strategy:

We are now looking at 5 boxes. Let \( K \) be the number of boxes with fewer than 3 prizes.

The probability of “success” (a box having fewer than 3 prizes) is the answer from (b)(ii), \( p = 0.0616 \).

We model this as a new binomial: \( K \sim B(5, 0.0616) \).

We need \( P(K = 2) \).

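✏ Working:

\[ P(K = 2) = \binom{5}{2} (0.0616)^2 (1 - 0.0616)^3 \] \[ P(K = 2) = 10 \times (0.0616)^2 \times (0.9384)^3 \] \[ P(K = 2) \approx 0.0313 \]

(The value 0.0313 uses the unrounded probability from (b)(ii); working with the rounded 0.0616 gives 0.0314.)

✓ (M1, A1)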
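A short sketch of this two-stage calculation, assuming the same \( B(40, \frac{1}{7}) \) model for every box and independence between boxes. The helper `binom_pmf` is an illustrative name, and the unrounded box probability is carried through to the second stage.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Stage 1: probability that one box has fewer than 3 prizes, P(T <= 2).
p_box = sum(binom_pmf(i, 40, 1 / 7) for i in range(3))

# Stage 2: K ~ B(5, p_box); we want exactly 2 such boxes out of 5.
p_two_boxes = binom_pmf(2, 5, p_box)

print(round(p_two_boxes, 4))  # 0.0313
```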

Step 4: Hypothesis Testing

💡 Strategy:

We are testing if the proportion \( p \) is less than \( 1/7 \). This is a one-tailed test.

Sample size \( n = 110 \). Observed value \( x = 9 \).

Under \( H_0 \), \( X \sim B(110, \frac{1}{7}) \).

We calculate the p-value: \( P(X \leq 9) \).

✏ Working:

Hypotheses:

\[ H_0: p = \frac{1}{7} \] \[ H_1: p < \frac{1}{7} \]

✓ (B1)

Test Statistic:

Using \( X \sim B(110, \frac{1}{7}) \), calculate \( P(X \leq 9) \):

\[ P(X \leq 9) \approx 0.0383 \]

✓ (M1, A1)

Conclusion:

Compare p-value to significance level \( \alpha = 0.05 \):

\[ 0.0383 < 0.05 \]

The result is significant. We reject \( H_0 \).

There is sufficient evidence to support Kamil’s claim that the proportion of packets containing a prize is less than \( 1/7 \).

✓ (A1)
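The p-value can be reproduced by summing the binomial probabilities from 0 to 9 under \( H_0 \). This is a sketch of the calculation only, not a replacement for the written conclusion; the variable names are illustrative.

```python
from math import comb

n, p0 = 110, 1 / 7   # sample size and the proportion under H0
observed = 9         # packets containing a prize in the sample
alpha = 0.05         # significance level (one-tailed, lower tail)

# p-value = P(X <= 9) where X ~ B(110, 1/7).
p_value = sum(
    comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed + 1)
)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)  # 0.0383 True
```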

Final Answers:

(a) Packets filled independently / constant probability.

(b)(i) 0.173

(b)(ii) 0.0616

(c) 0.0313

(d) Reject \( H_0 \). Evidence supports claim.


Question 3 (7 marks)

Ben is studying the Daily Total Rainfall, \( x \) mm, in Leeming for 1987. He used all the data from the large data set and summarised the information in the following table.

x (mm) | 0 | 0.1–0.5 | 0.6–1.0 | 1.1–1.9 | 2.0–4.0 | 4.1–6.9 | 7.0–12.0 | 12.1–20.9 | 21.0–32.0 | tr
Frequency | 55 | 18 | 18 | 21 | 17 | 9 | 9 | 6 | 2 | 29

(a) Explain how the data will need to be cleaned before Ben can start to calculate statistics such as the mean and standard deviation.

(2)

Using all 184 of these values, Ben estimates \( \sum x = 390 \) and \( \sum x^2 = 4336 \).

(b) Calculate estimates for

(i) the mean Daily Total Rainfall,

(ii) the standard deviation of the Daily Total Rainfall.

(3)

Ben suggests using the statistic calculated in part (b)(i) to estimate the annual mean Daily Total Rainfall in Leeming for 1987.

(c) Using your knowledge of the large data set,

(i) give a reason why these data would not be suitable,

(ii) state, giving a reason, how you would expect the estimate in part (b)(i) to differ from the actual annual mean Daily Total Rainfall in Leeming for 1987.

(2)

Worked Solution

Step 1: Data Cleaning

💡 Knowledge:

In the Large Data Set, “tr” stands for “trace”, which means a small amount of rain (less than 0.05 mm). To perform calculations, we must replace this non-numerical value with a number.

✏ Answer:

The “tr” (trace) values need to be replaced by a numerical value. A suitable value would be between 0 and 0.05, such as 0.025.

✓ (M1, A1)

Step 2: Calculating Mean and Standard Deviation

💡 Strategy:

We are given the summary statistics \( \sum x = 390 \) and \( \sum x^2 = 4336 \) and \( n = 184 \).

Mean \( \bar{x} = \frac{\sum x}{n} \).

Standard Deviation \( \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \).

✏ Working:

(i) Mean:

\[ \bar{x} = \frac{390}{184} \approx 2.1195… \]

Rounding to 3 s.f.: 2.12

✓ (B1)

(ii) Standard Deviation:

\[ \sigma = \sqrt{\frac{4336}{184} - \left(\frac{390}{184}\right)^2} \] \[ \sigma = \sqrt{23.565… - 4.492…} \] \[ \sigma = \sqrt{19.07…} \approx 4.37 \]

✓ (M1, A1)
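Both estimates come straight from the summary statistics, so they are easy to check. A minimal sketch using the values given in the question (variable names are my own):

```python
from math import sqrt

n, sum_x, sum_x2 = 184, 390, 4336  # summary statistics from the question

mean = sum_x / n                   # x-bar = (sum of x) / n
variance = sum_x2 / n - mean**2    # mean of the squares minus square of the mean
sd = sqrt(variance)

print(round(mean, 2), round(sd, 2))  # 2.12 4.37
```

Note that this is the divide-by-\( n \) form of the standard deviation, matching the formula in the strategy above.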

Step 3: Large Data Set Knowledge

💡 Understanding:

The Large Data Set typically covers the months May to October only. A full year includes winter months (November to April).

✏ Answer:

(i) The data only covers the months from May to October, so it is not a random sample of the whole year.

✓ (B1)

(ii) Winter months usually have higher rainfall than summer months. Excluding them therefore means the calculated mean (2.12) is likely to be an underestimate of the true annual mean.

✓ (B1)

Final Answers:

(a) Replace ‘tr’ with a numerical value (e.g., 0.025).

(b)(i) 2.12

(b)(ii) 4.37

(c)(i) Data only covers May-Oct.

(c)(ii) Underestimate (Winter months have more rain).


Question 4 (6 marks)

A study was made of adult men from region A of a country.

It was found that their heights were normally distributed with a mean of 175.4 cm and standard deviation 6.8 cm.

(a) Find the proportion of these men that are taller than 180 cm.

(1)

A student claimed that the mean height of adult men from region B of this country was different from the mean height of adult men from region A.

A random sample of 52 adult men from region B had a mean height of 177.2 cm.

The student assumed that the standard deviation of heights of adult men was 6.8 cm both for region A and region B.

(b) Use a suitable test to assess the student’s claim. You should

  • state your hypotheses clearly
  • use a 5% level of significance

(4)

(c) Find the p-value for the test in part (b).

(1)

Worked Solution

Step 1: Normal Probability

💡 Strategy:

We use the Normal Distribution calculator function.

\( X \sim N(175.4, 6.8^2) \).

Find \( P(X > 180) \).

✏ Working:

Lower: 180, Upper: 10000, \( \sigma: 6.8 \), \( \mu: 175.4 \)

\[ P(X > 180) \approx 0.249 \]

✓ (B1)
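The calculator step can be mirrored with `statistics.NormalDist` from Python's standard library. This is a sketch for checking the value; the variable names are illustrative.

```python
from statistics import NormalDist

# Heights of men in region A: X ~ N(175.4, 6.8^2).
heights = NormalDist(mu=175.4, sigma=6.8)

# Upper tail: P(X > 180) = 1 - P(X <= 180).
p_taller = 1 - heights.cdf(180)

print(round(p_taller, 3))  # 0.249
```

Using `1 - cdf` removes the need for an arbitrary large upper limit such as 10000.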

Step 2: Hypothesis Test

💡 Strategy:

We are testing the sample mean \( \bar{X} \). The claim is “different from”, so this is a two-tailed test.

The sample mean distribution follows \( \bar{X} \sim N(\mu, \frac{\sigma^2}{n}) \).

\( \mu = 175.4 \), \( \sigma = 6.8 \), \( n = 52 \).

✏ Working:

Hypotheses:

\[ H_0: \mu = 175.4 \] \[ H_1: \mu \neq 175.4 \]

✓ (B1)

Distribution of Sample Mean:

\[ \bar{X} \sim N\left(175.4, \frac{6.8^2}{52}\right) \] \[ \text{Standard deviation of } \bar{X} = \frac{6.8}{\sqrt{52}} \approx 0.943 \]

✓ (M1)

Test Statistic / Probability:

Find \( P(\bar{X} > 177.2) \) (as 177.2 > 175.4):

Lower: 177.2, Upper: 10000, \( \sigma: 0.943 \), \( \mu: 175.4 \)

\[ P(\bar{X} > 177.2) \approx 0.0281 \]

✓ (A1)

Conclusion:

Since it is a two-tailed test, compare with \( \alpha/2 = 0.025 \).

\[ 0.0281 > 0.025 \]

(Or compare the two-tailed p-value \( 2 \times 0.02814… \approx 0.0563 \) with \( 0.05 \).)

The result is not significant. Do not reject \( H_0 \).

There is insufficient evidence to support the student’s claim that the mean height is different.

✓ (A1)

Step 3: Calculating p-value

💡 Strategy:

For a two-tailed test, the p-value is double the single-tail probability we calculated in part (b).

✏ Working:

\[ \text{p-value} = 2 \times 0.02814… \] \[ \text{p-value} \approx 0.0563 \]

✓ (B1ft)
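Parts (b) and (c) can be checked together. The sketch below assumes, as the student does, that \( \sigma = 6.8 \) applies to region B; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 175.4, 6.8, 52   # mean under H0, assumed sd, sample size
sample_mean, alpha = 177.2, 0.05

# Under H0 the sample mean is distributed N(mu0, sigma^2 / n).
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

upper_tail = 1 - sampling_dist.cdf(sample_mean)  # P(X-bar > 177.2)
p_value = 2 * upper_tail                         # two-tailed test
reject_h0 = p_value < alpha

print(round(upper_tail, 4), round(p_value, 4), reject_h0)  # 0.0281 0.0563 False
```

Comparing the doubled tail with 0.05 is equivalent to comparing the single tail with 0.025.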

Final Answers:

(a) 0.249

(b) Do not reject \( H_0 \). Insufficient evidence.

(c) 0.0563 (or 5.6%)


Question 5 (8 marks)

Tisam is playing a game. She uses a ball, a cup and a spinner.

The random variable \( X \) represents the number the spinner lands on when it is spun. The probability distribution of \( X \) is given in the following table:

\( x \) | 20 | 50 | 80 | 100
\( P(X = x) \) | \( a \) | \( b \) | \( c \) | \( d \)

where \( a, b, c \) and \( d \) are probabilities.

To play the game:

  • the spinner is spun to obtain a value of \( x \)
  • Tisam then stands \( x \) cm from the cup and tries to throw the ball into the cup

The event \( S \) represents the event that Tisam successfully throws the ball into the cup.

To model this game Tisam assumes that:

  • \( P(S | \{X = x\}) = \frac{k}{x} \) where \( k \) is a constant
  • \( P(S \cap \{X = x\}) \) should be the same whatever value of \( x \) is obtained from the spinner

(a) Using Tisam’s model, show that \( c = \frac{8}{5}b \).

(2)

(b) Find the probability distribution of \( X \).

(5)

Nav tries, a large number of times, to throw the ball into the cup from a distance of 100 cm. He successfully gets the ball in the cup 30% of the time.

(c) State, giving a reason, why Tisam’s model of this game is not suitable to describe Nav playing the game for all values of \( X \).

(1)

Worked Solution

Step 1: Understanding the Condition

💡 Strategy:

The condition says \( P(S \cap \{X = x\}) \) is constant. Let’s call this constant \( V \).

We know \( P(S \cap X=x) = P(S | X=x) \times P(X=x) \).

Given \( P(S | X=x) = \frac{k}{x} \), we have:

\[ P(S \cap X=x) = \frac{k}{x} \times P(X=x) = V \]

This means \( P(X=x) = \frac{Vx}{k} \), so \( P(X=x) \) is directly proportional to \( x \).

✏ Working:

For \( x=50 \), \( P(X=50) = b \). So \( b \times \frac{k}{50} = V \).

For \( x=80 \), \( P(X=80) = c \). So \( c \times \frac{k}{80} = V \).

Since both equal \( V \):

\[ \frac{bk}{50} = \frac{ck}{80} \]

Cancel \( k \) (assuming \( k \neq 0 \)):

\[ \frac{b}{50} = \frac{c}{80} \] \[ c = \frac{80}{50}b = \frac{8}{5}b \]

✓ (M1, A1)

Step 2: Finding a, b, c, d

💡 Strategy:

Since \( P(X=x) \) is proportional to \( x \), we can express all probabilities in terms of one variable (e.g., \( a \)) or use the relationship derived.

From step 1, \( P(X=x) = \text{const} \times x \).

So \( \frac{a}{20} = \frac{b}{50} = \frac{c}{80} = \frac{d}{100} \).

✏ Working:

Express everything in terms of \( a \):

\[ b = \frac{50}{20}a = 2.5a \] \[ c = \frac{80}{20}a = 4a \] \[ d = \frac{100}{20}a = 5a \]

The sum of probabilities is 1:

\[ a + b + c + d = 1 \] \[ a + 2.5a + 4a + 5a = 1 \] \[ 12.5a = 1 \] \[ a = \frac{1}{12.5} = \frac{2}{25} = 0.08 \]

✓ (M1, A1)

Now find the others:

\[ b = 2.5 \times 0.08 = 0.20 \] \[ c = 4 \times 0.08 = 0.32 \] \[ d = 5 \times 0.08 = 0.40 \]

✓ (A1)
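Because \( P(X = x) \) is proportional to \( x \), each probability is just \( x \) divided by the sum of the spinner values. A minimal sketch of that shortcut, using exact fractions and illustrative names:

```python
from fractions import Fraction as F

values = [20, 50, 80, 100]
total = sum(values)                       # 250

# P(X = x) is proportional to x, so normalise by the total.
dist = {x: F(x, total) for x in values}

print({x: float(prob) for x, prob in dist.items()})
# {20: 0.08, 50: 0.2, 80: 0.32, 100: 0.4}
```

This also confirms part (a), since `dist[80]` equals \( \frac{8}{5} \) of `dist[50]`.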

Step 3: Evaluating the Model

💡 Strategy:

Check if the model holds for the new data point. For Nav, \( P(S|X=100) = 0.3 \).

Using Tisam’s model \( P(S|X=x) = \frac{k}{x} \):

\[ \frac{k}{100} = 0.3 \implies k = 30 \]

Now check what this implies for smaller distances, e.g., \( x = 20 \).

✏ Working:

If \( k = 30 \), then for \( x = 20 \):

\[ P(S|X=20) = \frac{30}{20} = 1.5 \]

A probability cannot be greater than 1. Therefore, the model is not suitable.

✓ (B1)
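The same check can be run for every spinner value at once. This sketch assumes Nav's 30% success rate at 100 cm is used to fix \( k \); the names are illustrative.

```python
from fractions import Fraction as F

# Nav's success rate at 100 cm fixes k in the model P(S | X = x) = k / x.
k = F(3, 10) * 100                     # k = 30

distances = [20, 50, 80, 100]
model = {x: k / x for x in distances}  # model value at each distance

# Any value above 1 cannot be a probability.
invalid = [x for x, prob in model.items() if prob > 1]

print(float(model[20]), invalid)  # 1.5 [20]
```

Only \( x = 20 \) breaks the model, which is enough to show it cannot hold for all values of \( X \).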

Final Answers:

(a) Shown.

(b) \( a=0.08, b=0.20, c=0.32, d=0.40 \).

(c) Model gives probability > 1 for small distances (e.g., \( x=20 \)).


Question 6 (14 marks)

A medical researcher is studying the number of hours, \( T \), a patient stays in hospital following a particular operation.

The histogram below summarises the results for a random sample of 90 patients.

[Histogram: horizontal axis "Time in hours", marked from 0 to 40 in steps of 5; vertical axis "Frequency density", marked from 0 to 5.]

(a) Use the histogram to estimate \( P(10 < T < 30) \).

(2)

For these 90 patients the time spent in hospital following the operation had:

  • a mean of 14.9 hours
  • a standard deviation of 9.3 hours

Tomas suggests that \( T \) can be modelled by \( N(14.9, 9.3^2) \).

(b) With reference to the histogram, state, giving a reason, whether or not Tomas’ model could be suitable.

(1)

Xiang suggests that the frequency polygon based on this histogram could be modelled by a curve with equation

\[ y = kxe^{-x}, \quad 0 \leq x \leq 4 \]

where \( x \) is measured in tens of hours and \( k \) is a constant.

(c) Use algebraic integration to show that

\[ \int_0^n xe^{-x} dx = 1 - (n+1)e^{-n} \]

(4)

(d) Show that, for Xiang’s model, \( k = 99 \) to the nearest integer.

(3)

(e) Estimate \( P(10 < T < 30) \) using

(i) Tomas’ model,

(ii) Xiang’s curve.

(3)

(f) State one limitation of Xiang’s model.

(1)

Worked Solution

Step 1: Reading Histogram Areas

💡 Strategy:

In a histogram, Area is proportional to Frequency. We assume Total Area represents the 90 patients.

We need the area for \( 10 < T < 30 \).

Reading from the graph:

  • 10-15: Width 5, Height 4.2. Area = \( 5 \times 4.2 = 21 \)
  • 15-20: Width 5, Height 4.0. Area = \( 5 \times 4.0 = 20 \)
  • 20-30: This is half the 20-40 bar. Width 10, Height 1.0. Area = \( 10 \times 1.0 = 10 \)

✏ Working:

\[ \text{Frequency} = 21 + 20 + 10 = 51 \] \[ \text{Total Frequency} = 90 \] \[ P(10 < T < 30) \approx \frac{51}{90} \approx 0.567 \]

(Readings of the bar heights vary slightly, so answers between about 0.53 and 0.57 are possible; check the mark scheme for the accepted value.)

✓ (M1, A1)

Step 2: Assessing Normal Model

💡 Understanding:

The Normal distribution is symmetric and bell-shaped.

✏ Answer:

The histogram is positively skewed (not symmetric), so a Normal model is not suitable.

✓ (B1)

Step 3: Integration by Parts

💡 Strategy:

We need to integrate \( \int xe^{-x} dx \). Use Integration by Parts: \( \int u dv = uv - \int v du \).

Let \( u = x \Rightarrow du = dx \)

Let \( dv = e^{-x} dx \Rightarrow v = -e^{-x} \)

✏ Working:

\[ \int xe^{-x} dx = -xe^{-x} - \int (-e^{-x}) dx \] \[ = -xe^{-x} + \int e^{-x} dx \] \[ = -xe^{-x} - e^{-x} \] \[ = -e^{-x}(x+1) \]

Now apply limits \( 0 \) to \( n \):

\[ \left[ -e^{-x}(x+1) \right]_0^n \] \[ = (-e^{-n}(n+1)) - (-e^{-0}(0+1)) \] \[ = -(n+1)e^{-n} - (-1 \times 1) \] \[ = 1 - (n+1)e^{-n} \]

✓ (M1, A1, A1)
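The question requires algebraic integration, so a numerical check earns no marks, but it is a useful way to catch sign errors. The sketch below compares the closed form with Simpson's rule; `simpson` and `closed_form` are my own helper names.

```python
from math import exp

def simpson(f, a, b, steps=1000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def closed_form(n):
    """Result of part (c): 1 - (n + 1) e^(-n)."""
    return 1 - (n + 1) * exp(-n)

for n in (1, 2, 4):
    numeric = simpson(lambda x: x * exp(-x), 0, n)
    print(n, round(numeric, 6), round(closed_form(n), 6))
```

The two columns should agree for each \( n \), which supports the algebra without replacing it.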

Step 4: Finding Constant k

💡 Strategy:

The curve models the frequency polygon rather than a probability density, so the area under it represents a number of patients, not a probability.

The variable \( x \) is in “tens of hours”, so range \( 0 \leq x \leq 4 \) covers 0 to 40 hours.

The total area corresponds to 90 patients.

We integrate from 0 to 4 (since max time is 40 hours, \( x=4 \)).

✏ Working:

\[ \text{Total Area} = \int_0^4 k x e^{-x} dx = 90 \] \[ k \int_0^4 x e^{-x} dx = 90 \]

Using part (c) with \( n=4 \):

\[ k \left( 1 - (4+1)e^{-4} \right) = 90 \] \[ k (1 - 5e^{-4}) = 90 \] \[ k (1 - 0.09157…) = 90 \] \[ k (0.9084…) = 90 \] \[ k \approx 99.07 \]

To the nearest integer, \( k = 99 \).

✓ (M1, A1)
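The arithmetic for \( k \) is short enough to check in a few lines. A minimal sketch, assuming (as in the working above) that the total area under the curve equals the 90 patients:

```python
from math import exp

total_patients = 90

# From part (c) with n = 4: integral of x e^(-x) from 0 to 4.
unit_area = 1 - 5 * exp(-4)

k = total_patients / unit_area

print(round(k, 2), round(k))  # 99.07 99
```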

Step 5: Estimating Probabilities

(i) Tomas’ Model (Normal):

\( T \sim N(14.9, 9.3^2) \). Find \( P(10 < T < 30) \).

Using calculator:

\[ P(10 < T < 30) \approx 0.649 \]

✓ (B1)

(ii) Xiang’s Model:

Variable \( x \) is in tens of hours. So \( T=10 \) means \( x=1 \), and \( T=30 \) means \( x=3 \).

We need to integrate from 1 to 3:

\[ \text{Count} = k \int_1^3 x e^{-x} dx \] \[ = 99 \left( [1-(3+1)e^{-3}] - [1-(1+1)e^{-1}] \right) \] \[ = 99 \left( (1 - 4e^{-3}) - (1 - 2e^{-1}) \right) \] \[ = 99 \left( 2e^{-1} - 4e^{-3} \right) \] \[ = 99 \left( 0.7357… - 0.1991… \right) \] \[ \approx 53.1 \]

Probability \( = \frac{53.1}{90} \approx 0.590 \).

✓ (M1, A1)
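Both estimates can be reproduced side by side. This sketch assumes \( k = 99 \) and a total of 90 patients, as in the working above; `area` is an illustrative helper for the result of part (c).

```python
from math import exp
from statistics import NormalDist

# (i) Tomas' model: T ~ N(14.9, 9.3^2).
tomas = NormalDist(mu=14.9, sigma=9.3)
p_tomas = tomas.cdf(30) - tomas.cdf(10)

# (ii) Xiang's curve: x is in tens of hours, so 10 h -> x = 1 and 30 h -> x = 3.
def area(n):
    """Integral of x e^(-x) from 0 to n, from part (c)."""
    return 1 - (n + 1) * exp(-n)

k, total_patients = 99, 90
p_xiang = k * (area(3) - area(1)) / total_patients

print(round(p_tomas, 3), round(p_xiang, 3))  # 0.649 0.59
```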

Step 6: Limitations

✏ Answer:

The model is restricted to \( x \leq 4 \) (40 hours), but it is possible for patients to stay longer than 40 hours.

✓ (B1)

Final Answers:

(a) \( 0.53 – 0.57 \)

(b) Not suitable (skewed)

(c) Proof shown

(d) \( k=99 \)

(e)(i) 0.649

(e)(ii) 0.590

(f) Max time limit / patients stay longer than 40h
