Publication, Part of Health Survey England Additional Analyses
Health Survey England Additional Analyses, Ethnicity and Health, 2011-2019
Experimental statistics, Official statistics in development
Technical appendix
Sampling and data collection
Sample design
The sample for Health Survey for England (HSE) is designed to be representative of the population living in private households in England. It uses a multi-stage stratified probability design. Those living in institutions such as student hostels and care homes are outside the scope of the survey.
The sampling frame is the small user Postcode Address File (PAF). The very small proportion of households living at addresses not on PAF (estimated to be less than 1%) is not covered. In the years between 2011 and 2019, the sample size varied between 9,000 and 10,000 addresses selected at random in between 500 and 600 postcode sectors.
All HSE surveys cover the adult population aged 16 and over living in private households in England (up to a maximum of ten adults per household). From 1995, the survey has included children aged 2 to 15, and from 2001, infants aged under 2 have also been included.
Sampled addresses for each year of the HSE are issued across the year in twelve monthly batches, with fieldwork completed by March of the following year.
Further information about the HSE sample can be found in the Health Survey for England 2019: Methods report
Data collection
Data collection uses both interviewing and self-completion methods. The household interview includes questions on household size, composition and relationships; type of dwelling, tenure, and the number of bedrooms; car ownership; smoking within the home; the economic status and occupation of the household reference person; and household income.
Adults are asked to participate in a face-to-face interview which includes a self-completion questionnaire. The interview collects core information each year, including general health, social care (for adults aged 65 and over), drinking alcohol and smoking. Additional questions are included each year that focus on other topics.
The content of the self-completion booklets varies with age. Young adults aged 16 to 17 are asked about smoking and drinking behaviour in a self-completion booklet. Interviewers also have the option of using this booklet for those aged 18 to 24 if they feel that it would be difficult for a participant in this age group to give honest answers face-to-face with other household members present.
Where possible, interviewers also measure the weight of all participants and the height of everyone aged 2 and over.
A follow-up nurse visit is also carried out. Until 2017, all participating households were offered a nurse visit; from 2018, 89% of households were randomly selected to take part in this stage of the interview. In these households, nurse visits were offered to all participants who were interviewed.
At the nurse visit, questions are asked about prescribed medicines, and adults are asked about folic acid. Nurses take waist and hip measurements for those aged 11 and over and measure the blood pressure of those aged 5 and over.
Adults are also asked to provide blood samples for the analysis of total cholesterol, HDL cholesterol and glycated haemoglobin (a marker of untreated diabetes). In some years, samples of saliva are taken from adults and children aged 4 and over for the analysis of cotinine (a derivative of nicotine that shows recent exposure to tobacco or tobacco smoke). Written consent is obtained for these samples.
Participants aged 16 and over give informed consent for all stages of the interview and nurse visit process. For some elements of the survey, verbal consent is asked for: taking part in the survey at all, answering modules of questions (and any individual question), completing the self-completion booklet, and measurements such as height, weight, blood pressure and waist and hip circumference. Verbal consent is not recorded; it is assumed that those who take part in the survey, and answer individual questions or provide physical measurements, have consented to do so.
Written consent is obtained for the following during the interview or nurse visit:
- Taking biological measurements (blood samples)
- Passing on information to others, for instance sending biological sample results to the participant’s GP
- Storing blood samples for future use
- Using personal details for matching to administrative data.
Further information about HSE data collection can be found in the Health Survey for England 2019: Methods report
The questionnaires and data collection protocols can be found here: HSE2019 Survey documentation
Questions on fruit and vegetable consumption and physical activity, which were not asked in 2019, can be found here: HSE 2018 Survey Documentation
Survey response
Overall survey response
The response to the Health Survey for England between 2011 and 2019 varied between 59% and 66% of eligible households.
| | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|
| % co-operating households | 66% | 64% | 64% | 62% | 60% | 59% | 60% | 59% | 60% |
| Number of co-operating households | 5,338 | 5,219 | 5,416 | 5,051 | 5,111 | 5,096 | 5,137 | 5,129 | 5,226 |
All adults aged 16 and over within productive households were eligible to take part. Within these households, between 84% and 87% of adults took part each year.
| | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|
| % participating adults | 87% | 85% | 87% | 85% | 85% | 85% | 84% | 86% | 84% |
| Number of participating adults | 8,610 | 8,291 | 8,795 | 8,077 | 8,034 | 8,011 | 7,997 | 8,178 | 8,205 |
Response was generally higher among women and older adults. The HSE does not collect information on the ethnicity of non-participating households and so it was not possible to calculate response rates by ethnicity.
For more detail on survey response and how it is calculated, see the Methods reports for each survey year available from the Health Survey for England series page.
Response to survey elements by ethnicity
Response to the HSE varies across the different survey elements; among participants in the interview, not everyone has their height and weight measured, or is interviewed by the nurse, or gives a usable blood sample.
Table A3 shows participation in each stage of the survey within each ethnic group, as a percentage of those who were interviewed (unweighted data).
| Ethnic group | Interview (n) | Interview (%) | Valid height and weight measure (n) | Valid height and weight measure (%) | Nurse interview (n) | Nurse interview (%) | Valid blood sample (n) | Valid blood sample (%) |
|---|---|---|---|---|---|---|---|---|
| White British | 60,669 | 100% | 50,357 | 83% | 40,438 | 67% | 30,642 | 51% |
| White Irish | 592 | 100% | 501 | 85% | 397 | 67% | 314 | 53% |
| Other white | 4,015 | 100% | 3,335 | 83% | 2,474 | 62% | 1,870 | 47% |
| Mixed/multiple | 1,043 | 100% | 873 | 84% | 602 | 58% | 413 | 40% |
| Indian | 1,933 | 100% | 1,631 | 84% | 1,122 | 58% | 803 | 42% |
| Pakistani | 1,289 | 100% | 1,042 | 81% | 646 | 50% | 418 | 32% |
| Bangladeshi | 518 | 100% | 419 | 81% | 259 | 50% | 153 | 30% |
| Chinese | 358 | 100% | 304 | 85% | 206 | 58% | 148 | 41% |
| Black African | 1,135 | 100% | 926 | 82% | 656 | 58% | 392 | 35% |
| Black Caribbean | 708 | 100% | 552 | 78% | 425 | 60% | 270 | 38% |
| Any other background | 1,687 | 100% | 1,429 | 85% | 994 | 59% | 698 | 41% |
| All groups | 73,947 | 100% | 61,369 | 83% | 48,219 | 65% | 36,121 | 49% |
Response to all stages of the interview was generally highest among white British and white Irish participants. Pakistani and black Caribbean participants were less likely than other groups to have had their height and weight measured.
Response to the nurse interview was lowest among Pakistani and Bangladeshi adults, and consequently these groups were least likely to provide a valid blood sample, along with those from black African backgrounds.
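The percentages in Table A3 can be reproduced from the counts. The sketch below is illustrative only: it copies the counts for three of the rows and expresses each stage as a whole-number percentage of those interviewed. The names `TABLE_A3` and `stage_percentages` are introduced here for illustration and are not part of any HSE dataset or tool.

```python
# Counts copied from Table A3 (unweighted) for three of the rows.
# Tuple order: interview, valid height and weight, nurse interview, valid blood sample.
TABLE_A3 = {
    "White British": (60669, 50357, 40438, 30642),
    "Pakistani": (1289, 1042, 646, 418),
    "All groups": (73947, 61369, 48219, 36121),
}


def stage_percentages(counts):
    """Return each stage count as a whole-number percentage of those interviewed."""
    interviewed = counts[0]  # the interview count is the base for every stage
    return [round(100 * n / interviewed) for n in counts]


for group, counts in TABLE_A3.items():
    print(group, stage_percentages(counts))
```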
Analysis and reporting methodology
Weighting the data
All data in this report are weighted to ensure that the estimates are representative of the population. Both weighted and unweighted bases are given in each table. The weighted numbers show the relative size of each group in the population, so that data from different columns can be combined in their correct proportions. The unweighted bases show the actual number of participants in each group.
Weighting is applied to HSE data to correct for probabilities of selection and to minimise bias from non-response. Each year’s data is weighted individually and these weights have been used for this report.
Selection weights have been applied to HSE samples to correct for the probability of selection in two situations:
- If there were multiple dwelling units or households at a selected address, in which case only one was selected at random
- If there were more than two children aged between 0 and 12 and/or between 13 and 15 at the selected address, in which case two in each age band were selected at random.
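As an illustration of the two situations above, a selection weight can be sketched as the inverse of the probability of selection. This is an assumption about the general approach (standard inverse-probability weighting); the exact HSE calculation is set out in the Methods reports, and the function names below are hypothetical.

```python
from fractions import Fraction


def selection_weight(n_eligible, n_selected):
    """Inverse of the selection probability (assumed inverse-probability weighting)."""
    probability = Fraction(n_selected, n_eligible)
    return 1 / probability


def dwelling_weight(n_units_at_address):
    """One dwelling unit or household is selected at random at each address."""
    return selection_weight(n_units_at_address, 1)


def child_weight(n_children_in_age_band):
    """At most two children are selected at random within each age band."""
    n_selected = min(n_children_in_age_band, 2)
    return selection_weight(n_children_in_age_band, n_selected)


print(dwelling_weight(3))  # three households at the address, one selected
print(child_weight(4))     # four children in the age band, two selected
print(child_weight(2))     # both children selected, so no adjustment is needed
```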
From 2003 a non-response adjustment was also incorporated into the weighting strategy. Both selection and non-response weights are applied to HSE data, and an interview weight is calculated. To account for sample attrition, further separate weights are calculated for data from different stages of the survey:
- Interview
- Nurse visit
- Blood sample (adults)
Age standardisation
Data have been age-standardised throughout this report to allow comparisons between groups after adjusting for the effects of any differences in their age distributions. This is because the prevalence of many health conditions and health-related behaviours varies with age, and so observed differences between groups may be due to their different age profiles.
All age standardisation was undertaken separately for men and women. The standard population to which the age distribution was adjusted was the mid-year 2019 population estimates for England. Age standardisation was carried out using the age groups 16-34, 35-54, 55 and over.
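The calculation can be sketched as follows, assuming direct standardisation: the age-specific prevalences for a group are averaged using the standard population's age distribution as weights. The population shares and prevalences below are made-up numbers for illustration, not the 2019 mid-year estimates or any HSE result, and `age_standardise` is a hypothetical helper. In practice the calculation would be run once for men and once for women.

```python
AGE_GROUPS = ("16-34", "35-54", "55+")


def age_standardise(prevalence_by_age, standard_population):
    """Weight age-specific prevalences by the standard population's age shares."""
    total = sum(standard_population[g] for g in AGE_GROUPS)
    return sum(
        prevalence_by_age[g] * standard_population[g] / total for g in AGE_GROUPS
    )


# Hypothetical standard population (any units; only the proportions matter).
standard = {"16-34": 30, "35-54": 33, "55+": 37}

# Hypothetical prevalence (%) in each age group for one ethnic group and one sex.
observed = {"16-34": 10.0, "35-54": 20.0, "55+": 30.0}

print(round(age_standardise(observed, standard), 1))
```

Two groups with identical age-specific prevalences receive the same standardised estimate, however different their own age profiles are.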
Statistical significance
Significance tests have been used in this report to determine whether differences between prevalence estimates are genuine differences (statistically significant) or the result of random natural variation.
The significance testing methodology used in HSE reports, including this one, tests the relationship between variables in a cross-tabulation, usually an outcome variable nested within sex, cross-tabulated with an explanatory variable, in this case ethnic group. The test is for the main effects only, using a Wald test. The Wald test is a statistical test used to assess the significance of parameters in a statistical model. For example, the test might examine whether there is a statistically significant relationship between smoking prevalence and ethnicity (after controlling for sex) and between smoking prevalence and sex (after controlling for ethnicity).
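To give a sense of the mechanics, the sketch below shows the simplest form of a Wald test, for a single model parameter. It is a simplification: the test used in this report assesses a main effect across several categories at once and takes the complex survey design into account. The coefficient and standard error are hypothetical, and `wald_test` is a name introduced for illustration.

```python
import math


def wald_test(estimate, standard_error):
    """Wald chi-square statistic and p-value for one parameter (1 degree of freedom)."""
    statistic = (estimate / standard_error) ** 2
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(W / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical model coefficient and its standard error.
statistic, p_value = wald_test(0.30, 0.12)
print(f"Wald statistic = {statistic:.2f}, p = {p_value:.3f}")
```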
It is worth noting that the test does not establish whether there is a statistically significant difference between any particular pair of subgroups (e.g. the highest and lowest subgroups). Rather it seeks to establish whether the variation in the outcome between groups that is observed could have happened by chance or whether it is likely to reflect some 'real' differences in the population.
P-values for comparisons are shown as footnotes. A p-value is the probability of obtaining a result at least as extreme as the one observed if chance alone were at work, that is, if there were no real difference in the population. A p-value of less than 5% (p<0.05) is conventionally taken to indicate a statistically significant result. It should be noted that the p-value depends on the sample size, so with large samples very small differences or associations may still be statistically significant.
Using this method of statistical testing, differences which are significant at the 5% level indicate that there is sufficient evidence in the data to suggest that the differences in the sample reflect a true difference in the population.
A second test of significance looks at the interaction between sex and the variable under consideration. If the interaction is statistically significant (p<0.05) this indicates that there is likely to be an underlying difference in the pattern of results for men and women, and this will normally be commented on in the report text.
Accuracy and reliability of survey estimates and confidence intervals
HSE, in common with other surveys, collects information from a sample of the population. The sample is designed to represent the whole population as accurately as possible within practical constraints, such as time and cost. Consequently, statistics based on the survey are estimates, rather than precise figures, and are subject to a margin of error, expressed here as a 95% confidence interval. For example, the survey estimate might be 24% with a 95% confidence interval of 22% to 26%. A different sample might have given a different estimate, but we expect the 95% confidence interval to contain the true value of the statistic in the population in 95 cases out of 100.
Where the text states that estimates of prevalence vary between groups, this means that there is variation by ethnicity overall, and that this is statistically significant at the 95% confidence level. In other words, variation of this size would be expected less than 5% of the time if it were due to sampling error alone. This does not describe differences between any two individual groups. Note that statistical significance does not imply substantive importance; differences that are statistically significant are not necessarily meaningful or relevant.
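The example interval can be reproduced with a short sketch, assuming the usual normal approximation (estimate plus or minus 1.96 standard errors). The standard error of 1.0 percentage point is a hypothetical value chosen to match the example, and `confidence_interval_95` is a name introduced for illustration.

```python
def confidence_interval_95(estimate, standard_error):
    """95% confidence interval under a normal approximation (an assumption here)."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin


# Estimate of 24% with a hypothetical standard error of 1.0 percentage point.
lower, upper = confidence_interval_95(24.0, 1.0)
print(f"{lower:.2f}% to {upper:.2f}%")
```

Rounded to whole numbers, the limits are 22% and 26%.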
Confidence intervals are affected by the size of the sample on which the estimate is based. Generally, the larger the sample, the smaller the confidence interval, and hence the more precise the estimate.
Where individual groups are identified within the text as having high or low levels of prevalence, this reflects the overall pattern of difference, but the estimate for a single group may not be significantly different from estimates for other groups if the confidence intervals overlap.
Design effects and true standard errors
The HSE uses a complex survey and weighting design. One of the effects of this is that standard errors and confidence intervals for survey estimates are generally larger than those that would be derived from an unweighted simple random sample of the same size.
The ratio of the standard error of the complex sample to that of a simple random sample of the same size is known as the design factor. It is the factor by which the standard error of an estimate from a simple random sample has to be multiplied to give the true standard error of the complex design.
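The relationship can be sketched as below. The simple random sample standard error of a proportion is taken as the square root of p(1 - p)/n, and the true standard error is that value multiplied by the design factor. The prevalence, sample size and design factor are hypothetical values, not figures from the accompanying tables, and the function names are introduced for illustration.

```python
import math


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


def true_standard_error(p, n, design_factor):
    """Standard error under the complex design: design factor times the SRS value."""
    return design_factor * srs_standard_error(p, n)


# Hypothetical values: 24% prevalence, 1,000 participants, design factor of 1.2.
srs = srs_standard_error(0.24, 1000)
true = true_standard_error(0.24, 1000, 1.2)
print(f"SRS standard error = {srs:.4f}, true standard error = {true:.4f}")
```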
For all comparisons discussed in the text, true standard errors and design factors for age-standardised estimates are shown in Excel tables accompanying the report.
Rounding of estimates
Estimates presented in the text are rounded to the nearest whole number. Where categories are combined the sum of two estimates may sometimes appear to be greater or less than expected. This reflects the effect of rounding; for example, estimates of 10.6% and 12.7% would round respectively to 11% and 13%, but the sum (23.3%) will round to 23% rather than 24%.
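The rounding example can be checked with a few lines of code. This sketch assumes that halves round up; the text says only that estimates are rounded to the nearest whole number. `round_whole` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_whole(value):
    """Round to the nearest whole number, with halves rounding up (an assumption)."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


first, second = Decimal("10.6"), Decimal("12.7")

sum_of_rounded = round_whole(first) + round_whole(second)  # 11 + 13
rounded_sum = round_whole(first + second)                  # 23.3 rounded

print(sum_of_rounded, rounded_sum)
```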
The charts are based on unrounded estimates. Consequently, values given in the text may appear different in the corresponding chart. For example, an estimate of 10% in the text may represent a value between 9.5% and 10.4%, and it is the unrounded value that would be reflected in the chart data points.
Last edited: 30 June 2022 9:33 am