Part of Rare Condition Registration Statistics
Rare Condition Registration Statistics updated to 2022
What is prevalence?
Prevalence is the total number of people in a specific population who have a particular condition at a given point in time. It works like a snapshot, showing how widespread the condition is within that group.
Limited duration prevalence
This publication shows observed, limited-duration prevalence estimates. This means that rather than counting everyone who has ever been diagnosed with a condition, we count only those diagnosed within a specific window of time (for example, 20 years). The observation period chosen is the longest period over which individuals can be reliably counted in the available data. It varies by condition because of differing data sources and methods of collection; the observation period used for each condition is listed in the data downloads.
For rare conditions other than cancers and congenital anomalies, we report maximum-duration prevalence: the total number of people ever diagnosed who are alive at a given point in time (the index date), to the extent that we are aware of the cases. A cut-off date of 1900-01-01 is therefore used so that no cases are excluded.
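The counting rule described above can be sketched as follows. This is a minimal illustration, not the production method: it assumes a hypothetical `Patient` record holding only a diagnosis date and an optional death date. A window length of `None` stands for maximum-duration prevalence and uses the 1900-01-01 cut-off from the text. Treating someone who died on the index date itself as not alive is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional

# Cut-off used for maximum-duration prevalence (from the text above).
MAX_DURATION_START = date(1900, 1, 1)


@dataclass
class Patient:
    """Hypothetical minimal patient record, for illustration only."""
    diagnosis_date: date
    death_date: Optional[date] = None  # None means no recorded death


def window_start(index_date: date, years: Optional[int]) -> date:
    """First day of an observation window of `years` years ending on index_date.

    years=None means maximum-duration prevalence (window starts 1900-01-01).
    """
    if years is None:
        return MAX_DURATION_START
    try:
        shifted = index_date.replace(year=index_date.year - years)
    except ValueError:  # 29 February index date, target year not a leap year
        shifted = index_date.replace(year=index_date.year - years, day=28)
    return shifted + timedelta(days=1)


def observed_prevalence(patients: Iterable[Patient], index_date: date,
                        years: Optional[int]) -> int:
    """Count patients diagnosed inside the window and still alive on the index date."""
    start = window_start(index_date, years)
    return sum(
        1
        for p in patients
        if start <= p.diagnosis_date <= index_date
        # Assumption: a death on the index date itself means "not alive".
        and (p.death_date is None or p.death_date > index_date)
    )


patients = [
    Patient(date(2005, 3, 1)),
    Patient(date(1999, 6, 1)),                               # outside a 20-year window
    Patient(date(2010, 1, 1), death_date=date(2015, 1, 1)),  # died before the index date
    Patient(date(2021, 12, 31)),
]
index = date(2021, 12, 31)
print(observed_prevalence(patients, index, 20))    # 2 (20-year limited duration)
print(observed_prevalence(patients, index, None))  # 3 (maximum duration)
```

Counting the window back from the index date keeps every condition on the same rule; only the window length changes between conditions.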
Index date
An index date is a specific point in time used as a reference for measuring how many people have a condition. For example, to find out how many people diagnosed with cancer of the gallbladder were alive on 31 December 2021, that date would be the index date. The prevalence estimate then describes the condition at that exact moment.
The index date chosen is the date of the latest available data for that condition – this varies according to the data sources used. For further details, see the data downloads.
Calculation of prevalence rates
Numerator definition
Patients were included in the cohort if they met the following criteria:
- diagnosed with the condition of interest, or had the condition of interest coded while receiving care, during the observation period; or born alive during the period and diagnosed with a congenital anomaly (in which case the date of birth is used in place of the date of diagnosis).
- could be traced on the NHS Spine. Because rare disease records are not currently acquired to a consistent specification, patient tracing on the NHS Spine is needed to validate records.
- resident in England at the time of diagnosis (rare cancers), at the time of tracing (other rare diseases) or at birth (congenital anomalies).
- alive on the last day of follow-up (the index date).
- had a finalised cancer registration, a confirmed or probable diagnosis of a (non-cancer) rare disease, or a confirmed or probable diagnosis of a congenital anomaly (not necessarily prior to the index date).
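The inclusion criteria listed above can be read as a single filter. The sketch below is hypothetical: it assumes each record has already been reduced to one flag per criterion (the `Record` fields and the status strings are illustrative names, not the fields of any real dataset) and simply requires all of them to hold.

```python
from dataclasses import dataclass

# Hypothetical mapping of condition group to accepted diagnostic status.
ACCEPTED_STATUS = {
    "rare_cancer": {"finalised"},
    "rare_disease": {"confirmed", "probable"},
    "congenital_anomaly": {"confirmed", "probable"},
}


@dataclass
class Record:
    """Hypothetical pre-processed record; each flag summarises one criterion."""
    condition_group: str         # "rare_cancer", "rare_disease" or "congenital_anomaly"
    in_observation_period: bool  # diagnosis/coding date (birth date for anomalies) in window
    live_born: bool              # only checked for congenital anomalies
    traced_on_spine: bool
    resident_in_england: bool    # at diagnosis, tracing or birth, depending on the group
    alive_at_index_date: bool
    status: str                  # e.g. "finalised", "confirmed", "probable", "possible"


def meets_numerator_criteria(r: Record) -> bool:
    """True only if the record satisfies every inclusion criterion."""
    if r.condition_group == "congenital_anomaly" and not r.live_born:
        return False
    return (
        r.in_observation_period
        and r.traced_on_spine
        and r.resident_in_england
        and r.alive_at_index_date
        and r.status in ACCEPTED_STATUS.get(r.condition_group, set())
    )


records = [
    Record("rare_disease", True, True, True, True, True, "probable"),
    Record("rare_disease", True, True, True, True, True, "possible"),   # status not accepted
    Record("rare_cancer", True, True, False, True, True, "finalised"),  # not traced
]
numerator = sum(meets_numerator_criteria(r) for r in records)
print(numerator)  # 1
```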
Denominator definition
Prevalence rates are calculated overall using the mid-year population estimates for the index year, and are expressed per 1,000,000 population.
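As a minimal sketch, the rate is the prevalent count divided by the mid-year population, scaled to 1,000,000. The function name is hypothetical and the figures in the usage line are illustrative, not published values.

```python
def rate_per_million(count: int, population: int) -> float:
    """Crude prevalence rate per 1,000,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that simple cases stay exact in floating point.
    return count * 1_000_000 / population


print(rate_per_million(50, 1_000_000))  # 50.0
```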
The rare disease and rare cancer rates use the all-age total population for England, while the congenital anomaly rates use only the population whose age is less than the length of the observation period (for example, where the index date is 31 December 2021, this would be children up to the age of 4 years). This is because national congenital anomaly registration has only been in place for babies born since 1 January 2018. The point prevalence estimates presented therefore cover all individuals born between 1 January 2018 and the index date who were still alive on the index date.
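The congenital anomaly denominator can be sketched as a sum over single years of age. This assumes, hypothetically, that mid-year estimates are available as a dictionary keyed by single year of age, and that the index date is 31 December as in the example above; it reads "age less than the observation period length" literally, so a 4-year period keeps ages 0 to 3.

```python
from datetime import date
from typing import Dict

# National congenital anomaly registration start (from the text above).
CA_REGISTRATION_START = date(2018, 1, 1)


def ca_observation_years(index_date: date) -> int:
    """Length of the congenital anomaly observation period in whole years.

    Assumes the index date is 31 December, as in the example in the text.
    """
    if (index_date.month, index_date.day) != (12, 31):
        raise ValueError("this sketch only handles 31 December index dates")
    return index_date.year - CA_REGISTRATION_START.year + 1


def ca_denominator(pop_by_age: Dict[int, int], index_date: date) -> int:
    """Sum the mid-year population for ages below the observation period length."""
    years = ca_observation_years(index_date)
    return sum(n for age, n in pop_by_age.items() if age < years)


# Hypothetical population counts by single year of age.
pop_by_age = {0: 100, 1: 100, 2: 100, 3: 100, 4: 100, 5: 100}
print(ca_observation_years(date(2021, 12, 31)))      # 4
print(ca_denominator(pop_by_age, date(2021, 12, 31)))  # 400
```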
Confidence intervals
A confidence interval is a range of values that is used to quantify the imprecision in the estimate of a particular value. Specifically, it quantifies the imprecision that results from random variation in the estimation of the value; it does not include imprecision resulting from systematic error (bias).
In public health many indicators, such as this, are based on what can be considered to be complete data sets and not samples. In these instances, the imprecision arises not as a result of sampling variation but of ‘natural’ variation. The indicator is considered to be the outcome of a stochastic process, i.e., one which can be influenced by the random occurrences that are inherent in the world around us. In such instances the value actually observed is only one of the set that could occur under the same circumstances. Generally, in public health, it is the underlying circumstances or process that is of interest and the actual value observed gives only an imprecise estimate of this ‘underlying risk’.
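The idea that an observed count is only one of the values that could occur can be illustrated with a short simulation. Purely for illustration, the sketch assumes counts follow a Poisson distribution with a fixed, hypothetical underlying mean of 20; repeated draws vary even though the underlying risk never changes. Sampling uses Knuth's multiplication method, which is adequate for small means.

```python
import math
import random


def poisson_draw(mu: float, rng: random.Random) -> int:
    """Draw one Poisson(mu) value using Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return k
        k += 1


rng = random.Random(42)   # fixed seed so the run is reproducible
underlying_mean = 20.0    # hypothetical "underlying risk" expressed as a count
draws = [poisson_draw(underlying_mean, rng) for _ in range(10_000)]
print(min(draws), max(draws))   # individual observed counts spread widely around 20
print(sum(draws) / len(draws))  # while their average stays close to 20
```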
For further information, see APHO Technical Briefing 3 – Commonly used public health statistics and their confidence intervals.
In this output, confidence intervals (CIs) are calculated at the 95% confidence level. For rare cancers, the phe_rates function is used: where the numerator is 10 or more, Byar's method is used, while for smaller numerators an exact method based on the Poisson distribution is used. For rare congenital anomalies and other rare diseases, CIs are based on the Poisson distribution (Bégaud et al., 2005). Different methods have been used to align with pre-existing related statistical publications, including those in the ‘Related Statistics’ section above. Future releases will consider aligning the approach to calculating confidence intervals.
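The switching rule described for rare cancers can be sketched as below. This is a hedged re-implementation for illustration, not the phe_rates function itself: Byar's approximation is applied when the count is 10 or more, and exact Poisson limits (found here by bisection on the Poisson cumulative distribution, using only the Python standard library) are applied otherwise. The Bégaud et al. approximation used for congenital anomalies and other rare diseases is not reproduced. Limits are computed for the count and then scaled per 1,000,000 population.

```python
import math
from statistics import NormalDist
from typing import Callable, Tuple


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _solve_decreasing(f: Callable[[float], float], target: float) -> float:
    """Bisection for mu with f(mu) == target, where f decreases in mu and f(0) > target."""
    lo, hi = 0.0, 1.0
    while f(hi) > target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(o: int, conf: float = 0.95) -> Tuple[float, float]:
    """Exact Poisson confidence limits for an observed count o."""
    alpha = 1.0 - conf
    lower = 0.0 if o == 0 else _solve_decreasing(lambda mu: poisson_cdf(o - 1, mu), 1 - alpha / 2)
    upper = _solve_decreasing(lambda mu: poisson_cdf(o, mu), alpha / 2)
    return lower, upper


def byars_ci(o: int, conf: float = 0.95) -> Tuple[float, float]:
    """Byar's approximation to the Poisson limits (intended for o >= 10)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
    upper = (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * math.sqrt(o + 1))) ** 3
    return lower, upper


def rate_ci_per_million(o: int, population: int, conf: float = 0.95) -> Tuple[float, float]:
    """Switch methods at a count of 10, then scale the limits to a rate per 1,000,000."""
    lower, upper = byars_ci(o, conf) if o >= 10 else exact_poisson_ci(o, conf)
    return lower * 1_000_000 / population, upper * 1_000_000 / population
```

At a count of 10 the two methods give limits within a few hundredths of each other, which is why switching at that point changes the published intervals very little.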
Last edited: 16 June 2025 11:56 am