Submitting your cohort
Details of account and folder requirements for the cohort submitter, file size limits and minimisation information.
Requirements for submitting your cohort file
The following requirements apply to the cohort file(s). You must follow them so that the cohort can progress successfully through the cohort validation process.
- include as many identifiers as possible (as agreed in the DSA), one of which must be date of birth. You may also be required to provide a minimum and maximum date:
- PERSON_MIN_DATE specifies the earliest date from which you would like data about the participant
- PERSON_MAX_DATE specifies the latest date up to which you would like data about the participant
- there is a maximum cohort size of 1 million participants or rows in the cohort file. A cohort file will be rejected if more than 1 million rows are submitted. If your cohort is over 1 million participants, split your cohort and upload multiple submissions
- to upload your cohort, you must access your Cohort Submitter SEFT Account. The User ID for this account ends with your initials, for example NIC-12345-ABCDE_XX. If you have multiple agreements, check that you are accessing the account for the correct NIC number. Our system will only accept files dropped into this location.
- file must contain a ‘UNIQUE REFERENCE’ for each row of data (this is also referred to as ‘Study ID’)
- file must contain information within the ‘STATUS’ column which confirms whether the participant should be added to or deleted from the cohort. For the first submission, the file must contain ‘add’ in the status column for every row. You must not submit the same participant with a delete and add status in the same file
- file must contain ‘DATE OF BIRTH’ for each row of data. Check the table below for formats which are accepted
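The row limit and the first-submission rule above can be checked before upload. The following is a minimal sketch, assuming the cohort has already been read into a sequence of dictionaries keyed by field heading. The function names `split_cohort` and `all_rows_are_add` are hypothetical, and treating the status value as case-insensitive is an assumption.

```python
from itertools import islice

MAX_ROWS = 1_000_000  # maximum rows per cohort file, as stated above


def split_cohort(rows, max_rows=MAX_ROWS):
    """Yield successive lists of at most max_rows rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, max_rows))
        if not chunk:
            return
        yield chunk


def all_rows_are_add(rows):
    """Return True if every row has a STATUS of ADD.

    Assumption: the comparison ignores case and surrounding spaces.
    """
    return all((row.get("STATUS") or "").strip().upper() == "ADD" for row in rows)
```

Each list yielded by `split_cohort` would then be written to its own file and uploaded as a separate submission.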
Date minimisation required
If your DSA requires your extracts to be minimised by date, each participant needs at least one date filled in to be included in the extract. You can either:
- fill in just PERSON_MIN_DATE
- fill in both PERSON_MIN_DATE and PERSON_MAX_DATE – note that the date entered for PERSON_MIN_DATE cannot be later than PERSON_MAX_DATE
Date minimisation not required
If you do not require date minimisation, do not include a PERSON_MIN_DATE or PERSON_MAX_DATE for any participant in the cohort submission file.
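The two date minimisation cases can be expressed as a per-row check. This is an illustrative sketch only: `parse_date` and `date_minimisation_problems` are hypothetical names, each row is assumed to be a dictionary keyed by field heading, and flagging a row that has only PERSON_MAX_DATE is an assumption based on the two options listed above.

```python
import re
from datetime import date

# Accepts YYYYMMDD, YYYY/MM/DD or YYYY.MM.DD (the same separator must be used twice).
_DATE_RE = re.compile(r"([0-9]{4})([/.]?)([0-9]{2})\2([0-9]{2})")


def parse_date(text):
    """Convert an accepted date string to a datetime.date, or raise ValueError."""
    match = _DATE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised date format: {text!r}")
    return date(int(match.group(1)), int(match.group(3)), int(match.group(4)))


def date_minimisation_problems(row, minimisation_required):
    """Return a list of problems with the min/max dates on one row."""
    min_text = (row.get("PERSON_MIN_DATE") or "").strip()
    max_text = (row.get("PERSON_MAX_DATE") or "").strip()
    if not minimisation_required:
        if min_text or max_text:
            return ["dates supplied but date minimisation is not required"]
        return []
    if not min_text and not max_text:
        return ["no date supplied"]
    if not min_text:
        # Assumption: the guidance lists MIN alone or MIN with MAX, not MAX alone.
        return ["only PERSON_MAX_DATE supplied"]
    if max_text and parse_date(min_text) > parse_date(max_text):
        return ["PERSON_MIN_DATE is later than PERSON_MAX_DATE"]
    return []
```

For example, a row with PERSON_MIN_DATE of `2021/01/01` and PERSON_MAX_DATE of `2020.12.31` would be reported as having a minimum date later than the maximum.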
Formatting requirements for the cohort submission file
If mandatory data items are not provided, then the file will be rejected.
| Column | Field heading | Mandatory or optional | Description | Format |
|---|---|---|---|---|
| A | UNIQUE_REFERENCE | Mandatory | This is the Unique Reference (Study ID) that you use to identify a particular record in your extracts. You must supply a value for each row of data. The file must not contain the same unique reference more than once. | This value must be unique within this field. Any combination of letters or numbers is allowed but must not include identifiable data. 100-character limit. Special characters are not allowed, with the exception of a hyphen ( - ) |
| B | NHS_NO | Optional | NHS Number | 10 numeric digits |
| C | FAMILY_NAME | Optional | Surname, or family name | Text – 40 character limit |
| D | GIVEN_NAME | Optional | Forename, or given name | Text – 40 character limit |
| E | OTHER_GIVEN_NAME | Optional | Other given names or middle names | Text – 100 character limit |
| F | Gender | Optional | Gender of the participant | Use one of the following values: 1 for Male, 2 for Female, 0 for Unknown, 9 for Not Specified |
| G | DATE_OF_BIRTH | Mandatory | Participants’ date of birth. This is required in every minimum set of details used in each of the tracing or matching steps and must be provided | YYYYMMDD or YYYY/MM/DD or YYYY.MM.DD |
| H | POSTCODE | Optional | Standard UK postcode of participants’ address | Alphanumeric, with or without a space in between. 8-character limit |
| I | ADDRESS_LINE1 | Optional | First line of participants’ address | Text |
| J | ADDRESS_LINE2 | Optional | Second line of participants’ address | Text |
| K | ADDRESS_LINE3 | Optional | Third line of participants’ address | Text |
| L | ADDRESS_LINE4 | Optional | Fourth line of participants’ address | Text |
| M | ADDRESS_LINE5 | Optional | Fifth line of participants’ address | Text |
| N | STATUS | Mandatory | A status of ADD or DELETE must be included for each row of data | Use one of the following text values: ADD or DELETE |
| O | PERSON_MIN_DATE | Optional | The earliest date from which you require data about the participant | YYYYMMDD or YYYY/MM/DD or YYYY.MM.DD |
| P | PERSON_MAX_DATE | Optional | The latest date up to which you require data about the participant | YYYYMMDD or YYYY/MM/DD or YYYY.MM.DD |
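The formatting rules in the table can be turned into a simple pre-submission check. The sketch below is not the validation that the service itself runs. It assumes each row is a dictionary keyed by the field headings in the table, that "letters or numbers" means unaccented A to Z and 0 to 9, and that STATUS is compared without regard to case. The name `validate_rows` is hypothetical.

```python
import re

# Shape checks only: these patterns do not confirm that a date exists in the calendar.
DATE_RE = re.compile(r"[0-9]{4}([/.]?)[0-9]{2}\1[0-9]{2}")
REFERENCE_RE = re.compile(r"[A-Za-z0-9-]{1,100}")
NHS_NO_RE = re.compile(r"[0-9]{10}")
POSTCODE_RE = re.compile(r"[A-Za-z0-9]+( [A-Za-z0-9]+)?")
TEXT_LIMITS = {"FAMILY_NAME": 40, "GIVEN_NAME": 40, "OTHER_GIVEN_NAME": 100}


def _value(row, field):
    """Return the field as stripped text, or an empty string if absent."""
    return (row.get(field) or "").strip()


def validate_rows(rows):
    """Return a list of (row_number, message) pairs; an empty list means no problems found."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        reference = _value(row, "UNIQUE_REFERENCE")
        if not REFERENCE_RE.fullmatch(reference):
            problems.append((number, "UNIQUE_REFERENCE missing or malformed"))
        elif reference in seen:
            problems.append((number, "UNIQUE_REFERENCE duplicated"))
        seen.add(reference)
        if not DATE_RE.fullmatch(_value(row, "DATE_OF_BIRTH")):
            problems.append((number, "DATE_OF_BIRTH missing or malformed"))
        if _value(row, "STATUS").upper() not in ("ADD", "DELETE"):
            problems.append((number, "STATUS must be ADD or DELETE"))
        nhs_no = _value(row, "NHS_NO")
        if nhs_no and not NHS_NO_RE.fullmatch(nhs_no):
            problems.append((number, "NHS_NO must be 10 numeric digits"))
        gender = _value(row, "Gender")
        if gender and gender not in ("0", "1", "2", "9"):
            problems.append((number, "Gender must be 0, 1, 2 or 9"))
        postcode = _value(row, "POSTCODE")
        if postcode and (len(postcode) > 8 or not POSTCODE_RE.fullmatch(postcode)):
            problems.append((number, "POSTCODE malformed"))
        for field, limit in TEXT_LIMITS.items():
            if len(_value(row, field)) > limit:
                problems.append((number, f"{field} exceeds {limit} characters"))
        for field in ("PERSON_MIN_DATE", "PERSON_MAX_DATE"):
            date_text = _value(row, field)
            if date_text and not DATE_RE.fullmatch(date_text):
                problems.append((number, f"{field} malformed"))
    return problems
```

Row numbers in the result start at 1 for the first data row, so they need adjusting by one if the file has a header line.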
Ensuring your submission receives the best match rate
The best quality tracing is achieved when the NHS number and date of birth are both provided, shown as option 1 in the table below.
| Options | NHS Number | Date of birth | Family name | Given name | Gender | Postcode |
|---|---|---|---|---|---|---|
| Option 1 | ✔ | ✔ | ||||
| Option 2 | ✔ | ✔ | ✔ | |||
| Option 3 | ✔ | ✔ | ✔ | |||
| Option 4 | ✔ | ✔ | ✔ | ✔ | ||
| Option 5 | ✔ | ✔ | ✔ | ✔ |
Other combinations of identifiers will also generate a trace, but the quality will depend on the data. If you provide too little identifier information, your file submission may be accepted but your cohort may not be traced.
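Before submitting, it may help to see how well populated the identifiers are. The following sketch counts how many rows have each identifier filled in and how many meet option 1 (NHS number and date of birth). The function name `identifier_coverage` is hypothetical, and the rows are assumed to be dictionaries keyed by the field headings in the formatting table.

```python
IDENTIFIER_FIELDS = (
    "NHS_NO", "DATE_OF_BIRTH", "FAMILY_NAME", "GIVEN_NAME", "Gender", "POSTCODE",
)


def identifier_coverage(rows):
    """Summarise how many rows have each identifier and how many meet option 1."""
    filled_counts = {field: 0 for field in IDENTIFIER_FIELDS}
    option_1_rows = 0
    total = 0
    for row in rows:
        total += 1
        filled = {f for f in IDENTIFIER_FIELDS if (row.get(f) or "").strip()}
        for field in filled:
            filled_counts[field] += 1
        # Option 1: NHS number and date of birth are both present.
        if {"NHS_NO", "DATE_OF_BIRTH"} <= filled:
            option_1_rows += 1
    return {"rows": total, "option_1_rows": option_1_rows, "filled": filled_counts}
```

A low `option_1_rows` count relative to `rows` suggests that more of the cohort will rely on the other identifier combinations for tracing.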
Last edited: 26 January 2026 9:52 am