Submitting your cohort

Details of account and folder requirements for the cohort submitter, file size limits and minimisation information. 

Requirements for submitting your cohort file

The cohort file(s) must meet the requirements below. Following them ensures the cohort can progress successfully through the cohort validation process.

  • include as many identifiers as possible (as agreed in the DSA), one of which must be date of birth. You may also be required to provide a minimum and maximum date:
  • PERSON_MIN_DATE specifies the earliest date from which you would like data about the participant
  • PERSON_MAX_DATE specifies the latest date up to which you would like data about the participant
  • a cohort file can contain at most 1 million rows (participants). A cohort file will be rejected if more than 1 million rows are submitted. If your cohort is over 1 million participants, split your cohort and upload multiple submissions
  • to upload your cohort, you must access your Cohort Submitter SEFT Account. The User ID for this account ends with your initials, such as NIC-12345-ABCDE_XX. If you have multiple agreements, check that you are accessing the account for the correct NIC number. Our system will only accept files dropped into this location
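
If your cohort is over the 1 million row limit described above, the split can be scripted. The following Python sketch is illustrative only and is not part of the official process. It assumes the cohort rows are already loaded into a list of dictionaries; `split_cohort` and `MAX_ROWS_PER_FILE` are hypothetical names chosen for this example.

```python
from typing import Dict, Iterator, List, Sequence

# Limit stated in the guidance: at most 1 million rows per cohort file.
MAX_ROWS_PER_FILE = 1_000_000


def split_cohort(
    rows: Sequence[Dict[str, str]], max_rows: int = MAX_ROWS_PER_FILE
) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive chunks of rows, each small enough for one submission."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    for start in range(0, len(rows), max_rows):
        yield list(rows[start:start + max_rows])


# Small demonstration with a reduced limit so it runs instantly.
demo_rows = [{"UNIQUE_REFERENCE": f"STUDY-{i}"} for i in range(5)]
demo_chunks = list(split_cohort(demo_rows, max_rows=2))
print([len(chunk) for chunk in demo_chunks])
```

Each chunk would then be written out and uploaded as a separate submission.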

Mandatory fields

  • file must contain a ‘UNIQUE REFERENCE’ for each row of data (this is also referred to as ‘Study ID’)
  • file must contain a value in the ‘STATUS’ column which confirms whether the participant should be added to or deleted from the cohort. For the first submission, the file must contain ‘ADD’ in the status column for every row. You must not submit the same participant with both an ADD and a DELETE status in the same file
  • file must contain ‘DATE OF BIRTH’ for each row of data. Check the table below for formats which are accepted
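
The mandatory field rules above can be checked before upload. This is a minimal sketch, not the validation the service itself runs. It makes two assumptions that the guidance does not state: STATUS values are compared case-insensitively, and rows that share an NHS_NO are treated as "the same participant". `check_mandatory_fields` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List

MANDATORY_FIELDS = ("UNIQUE_REFERENCE", "STATUS", "DATE_OF_BIRTH")
VALID_STATUSES = {"ADD", "DELETE"}


def check_mandatory_fields(
    rows: List[Dict[str, str]], first_submission: bool = False
) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: List[str] = []
    # NHS_NO -> {status: first row number where that status was seen}
    first_row_by_status: Dict[str, Dict[str, int]] = defaultdict(dict)

    for number, row in enumerate(rows, start=1):
        for field in MANDATORY_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append(f"row {number}: {field} is empty")

        # Assumption: STATUS is compared case-insensitively.
        status = (row.get("STATUS") or "").strip().upper()
        if status and status not in VALID_STATUSES:
            problems.append(f"row {number}: STATUS must be ADD or DELETE")
        if first_submission and status == "DELETE":
            problems.append(f"row {number}: a first submission must use ADD only")

        # Assumption: a shared NHS_NO identifies the same participant.
        nhs_no = (row.get("NHS_NO") or "").strip()
        if nhs_no and status in VALID_STATUSES:
            seen = first_row_by_status[nhs_no]
            other = "DELETE" if status == "ADD" else "ADD"
            if other in seen and status not in seen:
                problems.append(
                    f"rows {seen[other]} and {number}: "
                    "same participant has both ADD and DELETE"
                )
            seen.setdefault(status, number)

    return problems
```

Problems are reported by row number rather than by identifier, so that the output does not repeat identifiable data.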

Date minimisation required

If your DSA requires your extracts to be minimised by date, each participant needs at least one date filled in to be included in the extract. You can either:

  • fill in just PERSON_MIN_DATE
  • fill in both PERSON_MIN_DATE and PERSON_MAX_DATE – note that the date entered for PERSON_MIN_DATE cannot be later than PERSON_MAX_DATE

Date minimisation not required

If you do not require date minimisation, do not include a PERSON_MIN_DATE or PERSON_MAX_DATE for any participant in the cohort submission file.
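
The two cases above can be expressed as a per-row check. The sketch below is an illustration under stated assumptions rather than the service's own logic: it reads the guidance as requiring PERSON_MIN_DATE whenever minimisation applies (both listed options include it), and it accepts the three date formats listed for these fields in this guidance. `parse_cohort_date` and `check_date_window` are hypothetical names.

```python
import re
from datetime import date
from typing import Dict, List, Optional

# YYYYMMDD, YYYY/MM/DD or YYYY.MM.DD; the separator must be used consistently.
DATE_RE = re.compile(r"([0-9]{4})([/.]?)([0-9]{2})\2([0-9]{2})")


def parse_cohort_date(value: str) -> Optional[date]:
    """Return a date, or None for an empty value; raise ValueError if malformed."""
    value = value.strip()
    if not value:
        return None
    match = DATE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised date format: {value!r}")
    year, _, month, day = match.groups()
    return date(int(year), int(month), int(day))  # rejects e.g. month 13


def check_date_window(row: Dict[str, str], minimisation_required: bool) -> List[str]:
    """Check PERSON_MIN_DATE and PERSON_MAX_DATE for one row."""
    earliest = parse_cohort_date(row.get("PERSON_MIN_DATE") or "")
    latest = parse_cohort_date(row.get("PERSON_MAX_DATE") or "")

    if not minimisation_required:
        if earliest or latest:
            return ["leave PERSON_MIN_DATE and PERSON_MAX_DATE empty"]
        return []
    if earliest is None:
        # Assumption: PERSON_MAX_DATE on its own is not one of the listed options.
        return ["PERSON_MIN_DATE is required when extracts are minimised by date"]
    if latest is not None and earliest > latest:
        return ["PERSON_MIN_DATE cannot be later than PERSON_MAX_DATE"]
    return []
```

For example, a row with PERSON_MIN_DATE `2015/01/01` and PERSON_MAX_DATE `20141231` would be reported, because the minimum date falls after the maximum date.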


Formatting requirements for the cohort submission file

If mandatory data items are not provided, then the file will be rejected.

Column | Field heading | Mandatory or optional | Description | Format
A | UNIQUE_REFERENCE | Mandatory | This is the Unique Reference (Study ID) that you use to identify a particular record in your extracts. You must supply a value for each row of data. The file must not contain the same unique reference more than once. This value must be unique within this field. Any combination of letters or numbers is allowed but must not include identifiable data. | 100-character limit. Special characters are not allowed with exception of a hyphen ( - )
B | NHS_NO | Optional | NHS Number | 10 numeric digits
C | FAMILY_NAME | Optional | Surname, or family name | Text – 40 character limit
D | GIVEN_NAME | Optional | Forename, or given name | Text – 40 character limit
E | OTHER_GIVEN_NAME | Optional | Other given names or middle names | Text – 100 character limit
F | Gender | Optional | Gender of the participant | Use one of the following values: 1 for Male, 2 for Female, 0 for Unknown, 9 for Not Specified
G | DATE_OF_BIRTH | Mandatory | Participants’ date of birth. This is required in every minimum set of details used in each of the tracing or matching steps and must be provided | YYYYMMDD or YYYY/MM/DD or YYYY.MM.DD
H | POSTCODE | Optional | Standard UK postcode of participants’ address | Alpha numeric, with or without a space in between, 8 character limit
I | ADDRESS_LINE1 | Optional | First line of participants’ address | Text
J | ADDRESS_LINE2 | Optional | Second line of participants’ address | Text
K | ADDRESS_LINE3 | Optional | Third line of participants’ address | Text
L | ADDRESS_LINE4 | Optional | Fourth line of participants’ address | Text
M | ADDRESS_LINE5 | Optional | Fifth line of participants’ address | Text
N | STATUS | Mandatory | A status of ADD or DELETE must be included for each row of data | Use one of the following text values: ADD or DELETE
O | PERSON_MIN_DATE | Optional | The earliest date from which you require data about the participant | YYYYMMDD or YYYY/MM/DD or YYYY.MM.DD
P | PERSON_MAX_DATE | Optional | The latest date up to which you require data about the participant | YYYYMMDD or YYYY/MM/DD or YYYY.MM.DD
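
Several of the format rules in the table can be checked with simple patterns before submission. The following is a rough sketch under assumptions, not the official validator: it treats "letters or numbers" as ASCII characters, counts the optional space towards the 8 character postcode limit, and uses the field headings from the table as dictionary keys. `check_formats` and `duplicate_references` are hypothetical names, and the sketch does not cover the date fields or the address lines.

```python
import re
from collections import Counter
from typing import Dict, List

UNIQUE_REFERENCE_RE = re.compile(r"[A-Za-z0-9-]{1,100}")  # letters, digits, hyphen
NHS_NO_RE = re.compile(r"[0-9]{10}")                      # 10 numeric digits
POSTCODE_RE = re.compile(r"[A-Za-z0-9]+ ?[A-Za-z0-9]+")   # optional single space
GENDER_CODES = {"1", "2", "0", "9"}
TEXT_LIMITS = {"FAMILY_NAME": 40, "GIVEN_NAME": 40, "OTHER_GIVEN_NAME": 100}


def check_formats(row: Dict[str, str]) -> List[str]:
    """Return format problems for one row; optional fields may be empty."""
    problems: List[str] = []
    if not UNIQUE_REFERENCE_RE.fullmatch(row.get("UNIQUE_REFERENCE") or ""):
        problems.append("UNIQUE_REFERENCE: 1 to 100 letters, digits or hyphens")
    nhs_no = row.get("NHS_NO") or ""
    if nhs_no and not NHS_NO_RE.fullmatch(nhs_no):
        problems.append("NHS_NO: expected 10 numeric digits")
    for field, limit in TEXT_LIMITS.items():
        if len(row.get(field) or "") > limit:
            problems.append(f"{field}: longer than {limit} characters")
    gender = row.get("Gender") or ""
    if gender and gender not in GENDER_CODES:
        problems.append("Gender: expected 1, 2, 0 or 9")
    postcode = row.get("POSTCODE") or ""
    # Assumption: the 8 character limit includes the space.
    if postcode and (len(postcode) > 8 or not POSTCODE_RE.fullmatch(postcode)):
        problems.append("POSTCODE: alphanumeric, at most 8 characters")
    return problems


def duplicate_references(rows: List[Dict[str, str]]) -> List[str]:
    """Return every UNIQUE_REFERENCE value that appears more than once."""
    counts = Counter(row.get("UNIQUE_REFERENCE") or "" for row in rows)
    return sorted(ref for ref, count in counts.items() if ref and count > 1)
```

A row with UNIQUE_REFERENCE `STUDY_0001` would be flagged, since an underscore is a special character other than a hyphen.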

 


Ensuring your submission receives the best match rate

Tracing quality is best when both the NHS number and date of birth are provided, shown by option 1 in the table below.

Options NHS Number Date of birth Family name Given name  Gender Postcode
Options 1 to 5 (the marks showing which identifiers each option combines were not preserved in this copy of the table)

Other combinations of identifiers will also generate a trace, but the quality will depend on the data. If you provide too little identifier information, your file submission may be accepted but your cohort may not be traced.


Last edited: 26 January 2026 9:52 am