Methodology

LEO Graduate outcomes provider level data


Introduction

Background to the Longitudinal Educational Outcomes (LEO) dataset  

The Small Business, Enterprise and Employment Act 2015 enabled government, for the first time, to link higher education and tax data together to chart the transition of graduates from higher education into the workplace (for more information on the legal powers governing the dataset please see section 78 of the Small Business, Enterprise and Employment Act 2015 and sections 87-91 of the Education and Skills Act 2008).

One of the advantages of linking data from existing administrative sources is that it provides a unique insight into the destinations of graduates without imposing any additional data collection burdens on universities, employers or members of the public. Compared to existing sources of graduate outcomes data, it is also based on a considerably larger sample, does not rely on survey methodology, and can track outcomes across time to a greater extent than was previously possible.  

The LEO dataset links information about students, including:

  • personal characteristics such as sex, ethnic group and age
  • education, including schools, colleges and higher education institution attended, courses taken, and qualifications achieved
  • employment and income
  • benefits claimed

It is created by combining data from the following sources: 

  • the National Pupil Database (NPD), held by the Department for Education (DfE)
  • Higher Education Statistics Agency (HESA) data on students at UK publicly funded higher education institutions and some alternative providers, held by DfE
  • Individualised Learner Record data (ILR) on students at further education institutions, held by DfE
  • employment data from the Real Time Information (RTI) system, held by His Majesty’s Revenue and Customs (HMRC); RTI contains information formerly collected on the P45 and P14 forms
  • data from the Self-Assessment tax return, held by HMRC
  • the National Benefit Database, Labour Market System and Juvos data, held by the Department for Work and Pensions (DWP)

By combining these sources, we can look at the progress of higher education leavers into the labour market.   

The privacy notice explaining how personal data in this project is shared and used can be found at Longitudinal education outcomes study: how we use and share data - GOV.UK (www.gov.uk).

Data quality

Employment and earnings data

The employment data covers those with P45 and P14 records submitted through the Pay As You Earn (PAYE) system. These figures have been derived from administrative IT systems that, as with any large-scale recording system, are subject to possible errors in data entry and processing. While some data cleaning was necessary, the resulting data appears to provide a good reflection of an individual’s employment and earnings for the year.

For the purposes of collecting taxes only the tax year of employment is needed; accurate start and end dates within the tax year are not required. For this reason, issues encountered with the employment data included records with duplicate dates and records with dates which were invalid for our intended use (for example, where an employment start date occurred after the end date).

Additionally, a number of returns had missing start or end dates due, for example, to the employer not forwarding a timely P45. The default start dates recorded in the dataset are either 6 April (the first day of the tax year) or, where an end date is known, the day before that end date. Similarly, for records where the employment is known to have come to an end within a tax year but the end date is not known, the record is given a default 5 April end date, the last day of the tax year.

Individuals can also have overlapping spells of employment. Before carrying out analysis, the P45 and P14 records for each individual were cleaned and then merged into a single record to give a longitudinal picture of their employment and a total sum of their earnings in each tax year.

Before cleaning, the dataset contained just under 73 million P45 records. Of these, just over 6.5 million invalid records were removed (the majority were duplicate records). Of the remaining records, around 20% had an uncertain start date and around 20% had an uncertain end date. For each uncertain date, we used dates from other employment or benefits records for that individual to create a merged employment spell with a known start and end date.

Example 1: Two employment spells

Spell A                                          Start |---------| End 

Spell B              Unknown start |-----------------|--------------| End

Merged result                           Start |---------------------------| End

In example 1, the start date of spell B is uncertain with its possible range shown in bold. In this instance we can merge the two records resulting in an employment spell with the start date of spell A and an end date from spell B.
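The merge in example 1 can be illustrated with a standard interval merge. This is a minimal sketch, not the cleaning code used for the publication: the function name `merge_spells`, the tuple representation of a spell, the dates, and the assumption that an uncertain start has already been replaced by the latest date it could take are all hypothetical.

```python
from datetime import date

def merge_spells(spells):
    """Merge overlapping (start, end) employment spells into continuous spells."""
    merged = []
    for start, end in sorted(spells):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous spell: keep its start, extend its end if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged

# Illustrative dates only. Spell B's start is uncertain, so its latest
# possible start date is used here (an assumption of this sketch).
spell_a = (date(2021, 6, 1), date(2021, 9, 30))
spell_b = (date(2021, 8, 15), date(2022, 1, 31))
print(merge_spells([spell_a, spell_b]))
```

Under these assumptions the merged spell takes its start date from spell A and its end date from spell B, as in example 1.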

Any remaining uncertain dates were imputed through random sampling of gap lengths from a frequency distribution that was constructed from gaps with a known length.
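As an illustration of this kind of imputation, the sketch below draws a gap length at random from the empirical frequency distribution of known gaps. The function name `impute_gap`, the gap values and the seed are hypothetical; how the sampled gap is then converted into a start or end date is not covered here.

```python
import random
from collections import Counter

def impute_gap(known_gaps, rng):
    """Draw one gap length (in days) from the empirical distribution of known gaps."""
    freq = Counter(known_gaps)              # frequency of each observed gap length
    lengths = list(freq)
    weights = [freq[length] for length in lengths]
    return rng.choices(lengths, weights=weights, k=1)[0]

known_gaps = [0, 0, 0, 7, 7, 30, 90]        # illustrative values only
rng = random.Random(0)                      # seeded so the sketch is repeatable
imputed = [impute_gap(known_gaps, rng) for _ in range(5)]
print(imputed)
```

Gap lengths that occur more often among the known gaps are proportionally more likely to be drawn.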

DWP/HMRC Coverage:

Beginning in April 2013, the P45 reporting system was phased out in favour of the Real Time Information (RTI) system, which requires employers to submit information to HMRC each time an employee is paid. This system is now fully deployed. RTI offers substantial improvements to the P45 system in terms of data coverage, since employers must now provide information on all their employees even if only one employee of the company is paid above the Lower Earnings Limit. The move to RTI means that data coverage is high for the 2014/15 to 2021/22 tax years used in this publication.  

We cannot currently distinguish between part-time and full-time work in the LEO data. This is further discussed in “Methodology - Annualised earnings”.

As well as employment data for those who pay tax through PAYE, the employment data additionally includes those who pay tax through self-assessment.  

Self-assessment forms are completed by a range of people including those who are self-employed, those who have received income from investments, savings or shares, and people who have complicated tax affairs. A list of people who are required to complete a self-assessment return can be found at www.gov.uk/self-assessment-tax-returns/who-must-send-a-tax-return. We have a self-assessment earnings dataset from HMRC, which contains variables about:

  • Earnings received through employment (PAYE)
  • Income from partnership enterprises
  • Income from sole-trader enterprises
  • Total earnings for the tax year from the self-assessment form.

We have used the income from partnership enterprises and income from sole-trader enterprises to identify graduates who are self-employed and their earnings from self-employment enterprises. We have summed earnings from partnership and sole enterprises, and where this sum is greater than £0, it is the value used as earnings from self-employment and graduates are classified as self-employed.  
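The classification rule can be written out as a short sketch. The function and argument names are hypothetical and the amounts are illustrative.

```python
def self_employment_earnings(partnership_income, sole_trader_income):
    """Return (is_self_employed, earnings) from the two self-assessment fields.

    Earnings from partnership and sole-trader enterprises are summed; only a
    sum above zero counts as self-employment earnings.
    """
    total = partnership_income + sole_trader_income
    if total > 0:
        return True, total
    return False, None

print(self_employment_earnings(5000, 2000))   # profit from both sources
print(self_employment_earnings(-1000, 500))   # overall loss
```

In the second call the summed earnings are below £0, so the graduate is not classified as self-employed under this rule.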

The data received from DWP includes an overseas flag, which identifies individuals who are known to be living overseas. Details about when an individual must inform HMRC that they are leaving the UK and living abroad can be found at Tax if you leave the UK to live abroad - GOV.UK (www.gov.uk). For this analysis, individuals who are known to be overseas are excluded as their earnings and outcomes data is likely to be incomplete.

Graduate and provider coverage  

This publication provides information about first degree, UK domiciled graduates attending Higher Education Providers (Higher Education Institutions (HEIs), Further Education Colleges (FECs) and Alternative Providers (APs)) in Great Britain.

First degree graduates are identified using the “XPQUAL01” (see XPQUAL01_2.20.1 | HESA) and “XQLEV501” (see XQLEV501_1.14.1 | HESA) HESA variables. We filter for XPQUAL01 to be ‘1’, and XQLEV501 to be ‘3’ (first degree). Integrated Masters degrees are included.

More details about the qualifications that are in each XQLEV501 grouping can be found from HESA's “QUAL” variable, from which XQLEV501 is derived (Student 2020/21 - Qualification awarded | HESA).

Full cycle movement only covers young graduates (under 21 at the start of their course). This is because the full cycle movements of mature students and young graduates are not directly comparable. For young graduates, ‘home region’ is calculated using information about where graduates lived prior to study. Mature graduates are less likely to have lived in their home region before starting their course because they are more likely to have geographically relocated between leaving school and starting their course.  

NPD data coverage 

For prior attainment, LEO data is linked to the KS5 attainment datasets in the National Pupil Database (NPD). More information on the NPD can be found at Find and explore data in the National Pupil Database - GOV.UK (education.gov.uk). Not all graduates will have been matched to the NPD, either through an inability to match the HESA and NPD records or because they are not present in the NPD, for example students who went to school in Scotland or Wales. In the underlying data, the number and proportion of graduates included in the prior attainment calculations is provided.

Methodology

Time period  

The earliest time period for which employment and earnings data is reported is one year after graduation. This refers to the first full tax year after graduation (YAG). Hence, for the 2019/20 graduation cohort, the figures one year after graduation refer to employment and earnings outcomes in the 2021/22 tax year. This time period was used because the previous tax year overlaps with the graduation date, and so graduates are unlikely to have been engaged in economic activity for the whole tax year.  

Academic year 2018/19   |------------------|     

Tax year 2019/20                                    |------------------| 

Tax year 2020/21                                                            |------------------|   

In this publication, we include outcomes one, three and five years after graduation, focussing on the 2021/22 tax year but with some comparisons with tax years between 2015/16 and 2020/21. Thus, we look at employment and earnings outcomes during the 2021/22 tax year for graduates from the 2015/16, 2017/18 and 2019/20 academic years. For earlier tax years, we calculate years after graduation in the same way. For example, for outcomes during the 2015/16 tax year, we include graduates from the 2009/10, 2011/12 and 2013/14 academic years. We do not include outcomes ten years after graduation because prior attainment data is unavailable for cohorts who graduated in the 2009/10 academic year (ten years before the 2021/22 tax year), or before.

The table below shows this for all tax years and academic years. The cells represent years after graduation (YAG). Bold indicates it is a cohort available in this publication: 

                             Tax year
Academic year of graduation  2015/16  2016/17  2017/18  2018/19  2019/20  2020/21  2021/22
2009/10                      5 YAG    6 YAG    7 YAG    8 YAG    9 YAG    10 YAG   11 YAG
2010/11                      4 YAG    5 YAG    6 YAG    7 YAG    8 YAG    9 YAG    10 YAG
2011/12                      3 YAG    4 YAG    5 YAG    6 YAG    7 YAG    8 YAG    9 YAG
2012/13                      2 YAG    3 YAG    4 YAG    5 YAG    6 YAG    7 YAG    8 YAG
2013/14                      1 YAG    2 YAG    3 YAG    4 YAG    5 YAG    6 YAG    7 YAG
2014/15                      -        1 YAG    2 YAG    3 YAG    4 YAG    5 YAG    6 YAG
2015/16                      -        -        1 YAG    2 YAG    3 YAG    4 YAG    5 YAG
2016/17                      -        -        -        1 YAG    2 YAG    3 YAG    4 YAG
2017/18                      -        -        -        -        1 YAG    2 YAG    3 YAG
2018/19                      -        -        -        -        -        1 YAG    2 YAG
2019/20                      -        -        -        -        -        -        1 YAG
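The relationship between academic year of graduation, tax year and years after graduation shown in the table can be expressed as a short calculation. The function name and the 'YYYY/YY' label format are assumptions of this sketch.

```python
def years_after_graduation(academic_year, tax_year):
    """Years after graduation (YAG) for labels such as '2019/20' and '2021/22'.

    Returns None where the tax year is not a full tax year after graduation.
    """
    graduation_start = int(academic_year.split("/")[0])
    tax_start = int(tax_year.split("/")[0])
    # The tax year that overlaps the graduation date is skipped, so the first
    # full tax year after graduation starts two calendar years after the
    # academic year began.
    yag = tax_start - graduation_start - 1
    return yag if yag >= 1 else None

print(years_after_graduation("2019/20", "2021/22"))
print(years_after_graduation("2015/16", "2021/22"))
```

For the 2019/20 graduation cohort this gives one year after graduation in the 2021/22 tax year, and for the 2015/16 cohort it gives five.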

 

Employment outcomes 

We refer to a graduate as matched if they have been successfully matched to the Department for Work and Pensions’ Customer Information System (CIS) or to a further study instance on the HESA Student Record. Graduates who have not been matched to CIS or a further study record are referred to as unmatched. These graduates were not found on CIS, either because they had never been issued with a National Insurance number or because the personal details provided from the HESA data did not fulfil the matching criteria. These graduates are excluded from calculations performed for UK domiciled populations.

UK domiciled graduates who have been matched are then placed in one of five outcomes categories. These are: 

  • Activity not captured
  • No sustained destination
  • Sustained employment only
  • Sustained employment with or without further study
  • Sustained employment, further study or both.

Unmatched graduates are included in the denominator when calculating employment outcomes for non-UK domiciled graduates and are placed in a separate ‘unmatched’ outcome category. For these populations the match rates are much lower and non-UK graduates are much more likely to leave the UK after graduation. Including these graduates in the calculations means we get a better indication of the proportion of graduates who have stayed in the UK to work or study after graduation, making it easier to compare countries with very different match rates.  

More information on match rates is given in the Data matching and match rates section. If a graduate is unmatched on the CIS but has a further study record for the tax year in question, then they are counted as being in further study, and hence are not in the unmatched category. 

Activity not captured 

Graduates in this category have been successfully matched to CIS but do not have any employment, out-of-work benefits or further study records in the tax year of interest. Reasons for appearing in this category include moving out of the UK after graduation for either work or study, voluntarily leaving the labour force or death. 

No sustained destination 

Graduates who have an employment or out-of-work benefits record in the tax year in question but were not classified as being in ‘sustained employment’ and do not have a further study record. 

Sustained employment defined by P45 data 

The ‘sustained employment’ measure aims to count the proportion of graduates in sustained employment in the UK following the completion of their course. The definition of sustained employment is consistent with the definition used for 16-19 accountability and the outcome based success measures published for adult further education (https://www.gov.uk/government/statistics/adult-further-education-outcome-based-success-measures). This definition looks mainly at employment activity in the six-month October to March period of each tax year. A graduate needs to be in paid employment for at least one day in five out of six months between October and March of a given tax year to be classified as being in ‘sustained employment’ in the given tax year. If they are not employed in March, they must additionally have at least one day in employment in the April of the same calendar year to be counted as being in sustained employment.

For instance, a graduate employed from 1st October 2021 to 5th January 2022 and then again from 30th March 2022 onwards would be classed as being in sustained employment in 2021/22 because, although they are not employed in February 2022, they are employed in the other five months in the period from October 2021 to March 2022. However, a graduate employed from 1st October 2021 to 28th February 2022 but not employed in March 2022 would not be considered as being in sustained employment unless they had a day in employment in April 2022.
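The PAYE-based rule and the two worked examples above can be sketched as follows. This is an illustration only, not the code used to produce the statistics: the function names, the representation of a spell as a (start, end) pair and the use of `None` for an ongoing spell are assumptions.

```python
from datetime import date, timedelta

def employed_in_month(spells, year, month):
    """True if any spell covers at least one day of the given calendar month."""
    first = date(year, month, 1)
    # Last day of the month: first day of the next month minus one day.
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    last = next_first - timedelta(days=1)
    return any(start <= last and (end is None or end >= first)
               for start, end in spells)

def in_sustained_employment(spells, tax_year_start):
    """Apply the October-to-March rule for the tax year starting in `tax_year_start`."""
    months = [(tax_year_start, m) for m in (10, 11, 12)]
    months += [(tax_year_start + 1, m) for m in (1, 2, 3)]
    months_employed = sum(employed_in_month(spells, y, m) for y, m in months)
    if months_employed < 5:
        return False
    if employed_in_month(spells, tax_year_start + 1, 3):
        return True
    # Not employed in March: require at least one day of employment in April.
    return employed_in_month(spells, tax_year_start + 1, 4)

# First worked example: a gap in February only (end=None means ongoing).
example_1 = [(date(2021, 10, 1), date(2022, 1, 5)), (date(2022, 3, 30), None)]
# Second worked example: employed October to February, nothing afterwards.
example_2 = [(date(2021, 10, 1), date(2022, 2, 28))]
print(in_sustained_employment(example_1, 2021))
print(in_sustained_employment(example_2, 2021))
```

Adding a single day of employment in April 2022 to the second example would change its result, in line with the text above.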

Sustained employment defined by self-assessment data 

This publication incorporates self-assessment data into measures of sustained employment. Self-assessment data captures the activity of individuals with income that is not taxed through PAYE, such as income from self-employment, savings and investments, property rental, and shares. A full list of income sources that must be declared through a self-assessment return can be found here: https://www.gov.uk/self-assessment-tax-returns/who-must-send-a-tax-return

For the purposes of this publication, individuals are classed as being in sustained employment in the tax year if they meet our definition of sustained employment based on PAYE, or if they have returned a self-assessment form stating that they have received income from self-employment and their earnings from a Partnership or Sole-Trader enterprise are more than £0 (profit from self-employment). These individuals may or may not have an additional PAYE record. Individuals who have received income through self-assessed means other than self-employment, such as through rental of property, and do not have a PAYE record, are not classed as being in employment (either sustained or unsustained). Those who have made a loss from self-employment are currently excluded from sustained employment as we are at present unable to distinguish between those who made a loss and those who submitted self-assessment returns for other reasons.

Further study 

A graduate is defined as being in further study if they have a valid higher education study record at any UK HEI on the HESA Student Record or designated English Alternative Provider (AP) on the AP HESA Student Record that overlaps with the relevant tax year. Further study undertaken at further education colleges is not currently reflected in these figures but we will review this in future publications. The further study does not have to be at postgraduate level to be counted. The purpose of this category is to identify how students spent their time in the relevant tax year and as such cannot be used to calculate the proportion of graduates who go on to postgraduate study. We have not counted instances lasting 14 days or less, which is a change from publications up to the 2019/20 tax year. Additionally, students enrolled on further education courses, on some initial teacher training enhancement, booster and extension courses, whose study status is dormant or who were on sabbatical are excluded from this indicator in line with our previous methodology. 

Designated alternative providers were not required to return student level data to HESA prior to the 2014/15 academic year. From the 2014/15 academic year all alternative providers covered by HESA did submit student level data, and these are included in this publication where applicable. The University of Buckingham has historically returned HESA data every year and so is included in all cohorts. 

As a tax year overlaps with two academic years, some students would be coming to the end of their further study in the tax year in question and some would be starting their further study. For example, those who graduated in the 2015/16 academic year and went straight on to a one-year masters course would be counted as being in further study in the 2017/18 tax year (one year after graduation) as their course would finish in July 2017. If a graduate from 2015/16 waited a year before starting their one-year masters course then they would typically be counted as being in further study in the 2017/18 tax year (one year after graduation) if their course started in September 2017 for instance. 
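The overlap test behind these examples can be sketched as below. The function name and course dates are hypothetical, and the sketch assumes that the 14-day exclusion applies to the length of the study instance as a whole, counted inclusively, which the text does not state explicitly. The other exclusions described above (for example dormant or sabbatical status) are not modelled.

```python
from datetime import date

def in_further_study(course_start, course_end, tax_year_start, min_days=15):
    """True if a study instance overlaps the tax year starting 6 April `tax_year_start`."""
    tax_start = date(tax_year_start, 4, 6)
    tax_end = date(tax_year_start + 1, 4, 5)
    # Assumption: instance length is counted inclusively of both end dates.
    instance_days = (course_end - course_start).days + 1
    if instance_days < min_days:
        return False  # instances lasting 14 days or less are not counted
    return course_start <= tax_end and course_end >= tax_start

# A one-year masters started straight after graduating in 2015/16 (illustrative dates).
print(in_further_study(date(2016, 9, 19), date(2017, 7, 31), 2017))
# The same course started a year later.
print(in_further_study(date(2017, 9, 18), date(2018, 8, 31), 2017))
```

Both courses overlap the 2017/18 tax year, so both graduates would be counted as being in further study one year after graduation.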

Sustained employment only 

Graduates are considered to be in sustained employment only if they have a record of sustained employment (as defined either via the P45 or self-assessment data) but no record of further study (as defined above).

Sustained employment with or without further study 

Sustained employment with or without further study includes all graduates with a record of sustained employment (defined either via the P45 or self-assessment data), regardless of whether they also have a record of further study (as defined above).

Sustained employment, further study or both 

Sustained employment, further study or both includes all graduates with a record of sustained employment or further study. This category includes all graduates in the ‘sustained employment with or without further study’ category as well as those with a further study record only.

It is important to note that our definition of sustained employment does not distinguish between the different types of work that graduates are engaged in and so cannot provide an indication of the proportion of graduates who are employed in graduate occupations. Furthermore, we cannot distinguish between full-time and part-time employment. 
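The category definitions above can be combined into one illustrative function. The function name and boolean flags are hypothetical. Because the three sustained employment categories are nested, the sketch returns a set of category names rather than a single value.

```python
def outcome_categories(matched_to_cis, sustained_employment, any_employment,
                       out_of_work_benefits, further_study):
    """Return the set of outcome categories a graduate falls into for a tax year."""
    # A further study record counts as a match even without a CIS match.
    if not matched_to_cis and not further_study:
        return {"Unmatched"}
    categories = set()
    if sustained_employment:
        categories.add("Sustained employment with or without further study")
        if not further_study:
            categories.add("Sustained employment only")
    if sustained_employment or further_study:
        categories.add("Sustained employment, further study or both")
    elif any_employment or out_of_work_benefits:
        categories.add("No sustained destination")
    else:
        categories.add("Activity not captured")
    return categories

# A matched graduate in sustained employment with no further study record.
print(outcome_categories(True, True, True, False, False))
```

A graduate in sustained employment with no further study record therefore appears in all three sustained employment categories.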

The table below summarises the types of activity that lead to a graduate being unmatched or falling into one of the five outcomes categories.

LEO category  Further study  Sustained employment  Any employment  Out-of-work benefits
Unmatched Unmatched to CIS Unmatched to CIS Unmatched to CIS 
Activity not captured  
No sustained destination 
Sustained employment only 
Sustained employment (with or without further study) 
Sustained employment, further study or both Unmatched to CIS Unmatched to CIS Unmatched to CIS 
Further study, with or without sustained employment Unmatched to CIS Unmatched to CIS Unmatched to CIS 

Annualised earnings 

Earnings figures are only reported for those classified as being in sustained employment via PAYE and where we have a valid earnings record from the P14, or where they are self-employed and have reported income of over £0 for that tax year. Those in further study are excluded, as their earnings would be more likely to relate to part-time jobs. Note that our publications prior to December 2017 did not include earnings from self-assessment. Under the new methodology, some graduates will have increased earnings if they have PAYE earnings as well as self-employment earnings. However, there are also more graduates included in the earnings calculations: those who have self-employment earnings but do not have qualifying PAYE earnings. This group typically has lower earnings than graduates with PAYE earnings. Thus, median earnings reported under the new methodology are not necessarily higher than under the old methodology. See our December 2017 publication for more details on the effect of this methodology change.

Under our new methodology, PAYE and earnings from self-employment are treated differently.  

For each graduate who has been paid through the PAYE system, the earnings reported for them for a given tax year are divided by the number of days recorded in the employment spell in that same tax year. This provides an average daily wage, which is then multiplied by the number of days in the tax year to create their annualised earnings.  

This calculation has been used to maintain consistency with figures reported for further education learners after study (Further education outcomes, Academic year 2020/21 – Explore education statistics – GOV.UK (explore-education-statistics.service.gov.uk)). It provides students with an indication of the earnings they might receive once in stable and sustained employment. 

For earnings from self-employment, raw earnings are used. Due to the nature of the Self-Assessment tax return, dates of self-employment are not required and therefore are not available to annualise the self-employment earnings in the same way that PAYE earnings are annualised. We are therefore assuming that earnings reported in the Self-Assessment tax return relate to a spell of (self-)employment covering at least the whole  tax year.  

Where a graduate has income from both sustained employment paid through PAYE and through self-employment, the earnings used for this graduate are the sum of their annualised PAYE earnings and their raw earnings from self-employment. It should be noted that a graduate with a PAYE record (that does not reach the ‘sustained’ criteria) and a self-employment earnings record will be counted as being in ‘sustained employment’ but we do not include their earnings in the earnings calculation. This is to avoid the risk of annualising PAYE data that could be based on a very short earnings spell.
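The treatment of PAYE and self-employment earnings described in this section can be sketched as follows. The function names, argument names and the default of 365 days in the tax year are assumptions of this illustration, and the exclusion of graduates in further study is not modelled.

```python
def annualise_paye(paye_earnings, days_employed, days_in_tax_year=365):
    """Average daily wage multiplied by the number of days in the tax year."""
    return paye_earnings / days_employed * days_in_tax_year

def reported_earnings(paye_earnings=None, days_employed=0, paye_sustained=False,
                      self_employment_earnings=0.0, days_in_tax_year=365):
    """Earnings used in the calculations, or None if the graduate is excluded."""
    has_paye_record = paye_earnings is not None and days_employed > 0
    has_self_employment = self_employment_earnings > 0
    if has_paye_record and not paye_sustained:
        # Non-sustained PAYE record: excluded from the earnings calculation,
        # even where there are also self-employment earnings.
        return None
    if not has_paye_record and not has_self_employment:
        return None
    total = 0.0
    if has_paye_record:
        total += annualise_paye(paye_earnings, days_employed, days_in_tax_year)
    if has_self_employment:
        total += self_employment_earnings  # raw, not annualised
    return total

# £7,300 earned over 73 days is £100 a day, or £36,500 annualised.
print(reported_earnings(7300, 73, paye_sustained=True))
print(reported_earnings(7300, 73, paye_sustained=True, self_employment_earnings=5000))
```

The first call annualises a short PAYE spell; the second adds raw self-employment earnings on top of the annualised PAYE figure.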

The annualised earnings calculated are slightly higher than the raw earnings reported in the tax year. This is because the earnings of those who did not work for the entire tax year will be higher when annualised. The difference between the annualised and raw figures decreases as time elapses after graduation. That is, the difference between median annualised earnings and median raw earnings is greater one year after graduation than five years after graduation. The trend follows for both graduates who are in PAYE employment only and graduates who earnt income from both PAYE employment and self-employment.  

Information provided on the Self-Assessment tax return includes a field on earnings through PAYE employment, which we have used only where P14 earnings are not present.

All earnings presented are nominal. They represent the cash amount an individual was paid and are not adjusted for inflation (the general increase in the price of goods and services). The exception to this is the figure and table showing the nominal earnings compared to real-term earnings using Consumer Prices Index Including Owner Occupiers’ Housing Costs (CPIH) to account for inflation.  

It should be noted that LEO does not currently include data on the average number of hours worked per week. Therefore, we cannot distinguish between part-time and full-time employment/earnings. We appreciate that this is likely to impact some demographics more than others and are working towards having this data in future iterations of LEO so that it can be accounted for.

Median earnings adjusted for region 

The adjusted median is calculated by weighting each university’s graduates so that the regional distribution of graduates from that HE institution matches the national regional distribution. For example, if university A has 3% of its graduates living in the East Midlands compared to 12% nationally then graduates in the East Midlands at that university will be given a weighting of 4.0 (12% divided by 3%). This adjusted median can then be compared to the actual median of the institution to give an indication of how the regional distribution of their graduates influences their institution level figures.
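A minimal sketch of the weighting is shown below. The function names are hypothetical, and the weighted median convention used (the smallest value at which the cumulative weight reaches half the total weight) is an assumption, since the exact convention is not specified here.

```python
def regional_weights(provider_shares, national_shares):
    """Weight per region: national share divided by the provider's own share."""
    return {region: national_shares[region] / share
            for region, share in provider_shares.items() if share > 0}

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return None  # only reached when there are no values

# Shares expressed as percentages: 3% at the provider, 12% nationally.
weights = regional_weights({"East Midlands": 3}, {"East Midlands": 12})
print(weights["East Midlands"])

# Illustrative earnings with per-graduate weights.
print(weighted_median([20000, 30000, 40000], [1, 1, 4]))
```

In the second example the heavily weighted graduate pulls the weighted median above the unweighted median of the same three values.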

For this calculation to produce meaningful results, each HE institution should have a reasonable number of graduates in each of the regions. Some HE institutions do not have sufficient graduates in each region, and so for these institutions, the adjusted medians have been suppressed. Consequently, we have only been able to produce adjusted medians for HE institutions as the majority of Further Education Colleges have graduates distributed across a small number of regions. We also include data for subject and sex where possible to further aid comparisons.  

To maximise the number of institutions for which we are able to publish data we combined the regions into 6 larger groups based on the similarity of their overall earnings figures. This resulted in the following groups that formed the basis of our weighting calculations:    

Government Office Region  Group  
London  1  
South East  2  
East of England  3  
Scotland  3  
West Midlands  4  
South West  4  
East Midlands  4  
North West  5  
Yorkshire and the Humber  5  
North East  6  
Wales  6  
Northern Ireland  6  

 

It should be noted that whilst this method can produce adjusted median earnings that are either higher or lower than the actual raw median earnings for each institution, this methodology is likely to result in more institutions having higher adjusted median earnings than raw median earnings. This is because the national distribution of graduates is concentrated in London and the South East and when applying this weighting to all institutions, graduates in these areas (who on average will be higher earners) are likely to account for a higher proportion of the institution’s adjusted earnings data.

In this release, the weighted earnings have been calculated by provider, subject and sex where possible and are available in Table 4 of the ‘Excel table – Provider tables’ file in ‘All supporting files’ under the ‘Explore data and files’ section.   

Calculating earnings difference between sexes 

Previously, the percentage used to compare male and female earnings was calculated as the difference between the medians divided by the female median earnings. Since the 2018/19 publication, we have altered the calculation and use male median earnings as the denominator. This is in line with the calculation used by the ONS in their gender pay gap publication: Gender pay gap in the UK - Office for National Statistics (ons.gov.uk).
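The current calculation can be illustrated as follows. The function name and the earnings figures are hypothetical.

```python
def earnings_difference_pct(male_median, female_median):
    """Difference between median earnings as a percentage of the male median."""
    return (male_median - female_median) / male_median * 100

# Illustrative medians only.
print(earnings_difference_pct(40000, 30000))
```

Under the earlier approach the same difference would have been divided by the female median instead, which gives a larger percentage whenever the male median is the higher of the two.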

Rounding and suppression rules  

We apply rounding and suppression rules to help minimise the risk of someone being identifiable from our data (also known as Statistical Disclosure Control).  All calculations in this publication use rounded figures.  

The following rounding rules have been applied to this publication: 

  • All monetary values have been rounded to the nearest £100.
  • All population counts have been rounded to the nearest 5.
  • All percentages have been rounded to 1 decimal place.

The following suppression rules have been applied to this publication:

  • Earnings and outcomes based on fewer than 11 FPE have been suppressed.
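These rules can be sketched as below. The function names are hypothetical, and rounding exact halves upwards is an assumption, since the treatment of ties is not stated.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_nearest(value, base):
    """Round to the nearest multiple of `base` (assumption: halves round up)."""
    quotient = (Decimal(str(value)) / Decimal(base)).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP)
    return int(quotient * base)

def round_percentage(value):
    """Round a percentage to 1 decimal place (assumption: halves round up)."""
    return float(Decimal(str(value)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

def publish_earnings(median_earnings, graduate_count, min_count=11):
    """Return rounded earnings, or None where the figure is suppressed."""
    if graduate_count < min_count:
        return None  # based on fewer than 11 graduates: suppressed
    return round_to_nearest(median_earnings, 100)

print(round_to_nearest(24350, 100))   # monetary value
print(round_to_nearest(12, 5))        # population count
print(round_percentage(67.25))        # percentage
print(publish_earnings(25049, 10))    # suppressed
```

The `decimal` module is used rather than the built-in `round` so that the tie-breaking rule is explicit.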

The rules for earnings outcomes are the same as other LEO Graduate outcomes publications. For employment outcomes, the rules are stricter here as provider-level is already granular data. 

Contextual information

There are several factors that can influence the employment and earnings outcomes of graduates. To aid comparisons between similar universities, we have provided additional data about the prior attainment and POLAR quintile of the students graduating (see “Prior attainment banding” and “POLAR” in the “Definitions” section for further details).

The contextual data provides useful information where universities have a reasonable proportion of their students included in the measures. For some universities, the contextual data only covers a small proportion of their graduates. We have therefore provided the following coverage indicators alongside the contextual measures. 

Included in prior attainment band: this column shows the proportion of matched graduates who are included in our calculation of the prior attainment band. As the NPD only contains data on the key stage 5 qualifications obtained by 16–18-year-olds in England since 2002 not all graduates will be included in the prior attainment band for each university. 

Included in POLAR4 quintile: graduates for whom we have POLAR4 information on the HESA student record and who were non-mature when entering higher education, as a proportion of matched graduates. 

We are continuing to investigate how best to compare employment and earnings outcomes for universities that have a low proportion of students covered by the contextual data (mainly universities with a high proportion of mature students).  

In 2018 the Institute for Fiscal Studies published DfE-funded research into how the relative returns of an undergraduate degree vary by subject, provider and student characteristics. This sought to identify differences in earnings five years after graduation, controlling for factors such as prior attainment, gender, ethnicity and social background. The report can be found at https://www.gov.uk/government/publications/undergraduate-degrees-relative-labour-market-returns

Definitions

Sex 

For graduates from HEIs and APs, this field is collected by HESA and more detail can be found on Student 2021/22 - Sex identifier | HESA. We filter our data to only include individuals who are recorded as ‘Male’ or ‘Female’ to avoid the risk of disclosure for individuals who are recorded as ‘Other’.

For graduates from FECs, the field is collected in the ILR and more detail can be found on ILR Specification 2024 to 2025: Field: Sex (submit-learner-data.service.gov.uk). For these individuals, ‘Male’ and ‘Female’ are the only possible entries in the field.

Subject areas

In 2019-20, the Higher Education Statistics Agency (HESA) changed the way that it classifies subjects: the Higher Education Classification of Subjects (HECoS) replaced the Joint Academic Coding System (JACS). HESA uses the Common Aggregation Hierarchy (CAH) and, to maintain consistency across years, we are using levels 2 and 3 of the CAH to report breakdowns by subject area.

There are 35 subject categories in CAH2. These level 2 CAH categories map exactly to a level 3 CAH category. More information on HECoS and CAH can be found here: HESA Collections | HESA

CAH code  Subject
CAH01-01  Medicine and dentistry
CAH02-02  Pharmacology, toxicology and pharmacy
CAH02-04  Nursing and midwifery
CAH02-05  Medical sciences
CAH02-06  Allied health
CAH03-01  Biosciences
CAH03-02  Sport and exercise sciences
CAH04-01  Psychology
CAH05-01  Veterinary sciences
CAH06-01  Agriculture, food and related studies
CAH07-01  Physics and astronomy
CAH07-02  Chemistry
CAH07-04  General, applied and forensic sciences
CAH09-01  Mathematical sciences
CAH10-01  Engineering
CAH10-03  Materials and technology
CAH11-01  Computing
CAH13-01  Architecture, building and planning
CAH15-01  Sociology, social policy and anthropology
CAH15-02  Economics
CAH15-03  Politics
CAH15-04  Health and social care
CAH16-01  Law
CAH17-01  Business and management
CAH19-01  English studies
CAH19-02  Celtic studies
CAH19-04  Languages and area studies
CAH20-01  History and archaeology
CAH20-02  Philosophy and religious studies
CAH22-01  Education and teaching
CAH23-01  Combined and general studies
CAH24-01  Media, journalism and communications
CAH25-01  Creative arts and design
CAH25-02  Performing arts
CAH26-01  Geography, earth and environmental studies

It is important to note that, even with these additional splits, each CAH3 subject area can still include a diverse range of subjects, some of which will lead to significantly different employment and earnings outcomes. 

Since the 2018/19 publication, we have provided an additional table that shows further subject breakdowns within the CAH2 categories. Until last year, we used JACS 4-digit subject breakdowns, but this year HESA provided only the CAH3 breakdown, which is included in this year's publication (see HESA Collections | HESA). The suppression rules mean that not all subject and provider combinations have employment and earnings outcomes available.

Current Region

The current region geographical location data is based on the latest address that DWP has recorded for each individual on its Customer Information System (CIS). The LEO dataset does not contain the actual address or postcode for each individual; instead, we currently have data on the Government Office Region (GOR), Local Authority District and Lower Layer Super Output Area (LSOA) where the individual lives at the end of each tax year.

The CIS is primarily updated when an individual notifies DWP or HMRC of a change of address or through the individual interacting with a tax or benefit system. Individuals who have not been matched to the CIS will not have geographical information. This does not have an adverse effect on the data analysis as ‘unmatched’ graduates are excluded from employment and earnings outcomes.  

For those matched to the CIS, address data is available in nearly all cases (over 99.8%). However, for those who are neither in receipt of benefits nor contributing to the tax system, this information could be out of date. Even for those contributing to the tax system, employee address is not a mandatory field in the data submitted to HMRC via employers’ HR systems. It is also possible that, in the years soon after leaving university, graduates may still use their parents’ address if they are moving frequently between rented accommodation. More work is needed to understand how big an impact this has on the address data held on the CIS.

Home Region 

A graduate's home region is derived from their permanent or home postcode prior to starting the course, as recorded by HESA (see Student 2021/22 - Postcode | HESA).

Full Cycle Movement (home region, provider region and current region)

Data for this section is only available in the Excel tables in ‘All supporting files’ under the ‘Explore data and files’ section.

Full cycle graduate movement uses three variables (home region, provider region and current region) to indicate the movement patterns for a provider. The provider region and home region variables are both from the HESA student record, and the current region geographical location data is from DWP, as explained in more detail in the ‘current region’ section above.

It should be noted that if a graduate's home or current region is different to the provider’s region then this doesn’t guarantee that the graduate has moved. It is possible for a graduate to be commuting across regions either for study or work (e.g. after studying in London, living in London but working in the South East).  

If a graduate has an unknown home or current region, they are filtered out of this analysis, meaning that the cohort numbers are smaller than in other breakdowns. An individual may have an unknown home region if their home postcode is not provided by their HE provider, although this affects only a very small proportion of graduates. Reasons for an individual having an unknown current region are explained in the ‘current region’ section above. We also filter out mature graduates from this analysis because the home region data is unreliable for mature students: the region they lived in prior to starting their course is less likely to be their ‘true’ home region, as they are more likely to have relocated in the years between school and higher education.
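
The way the three region variables combine, together with the filters described above, can be sketched as follows. This is an illustration only: the function name and the pattern labels are hypothetical and are not the wording used in the published tables. The labels deliberately describe regions rather than moves, since a difference in region does not guarantee that a graduate has relocated.

```python
from typing import Optional


def movement_pattern(
    home: Optional[str], provider: str, current: Optional[str], mature: bool
) -> Optional[str]:
    """Label a graduate's full cycle pattern, or return None if filtered out."""
    # Graduates with an unknown home or current region, and mature
    # graduates, are excluded from the analysis.
    if home is None or current is None or mature:
        return None
    studied_in_home_region = home == provider
    lives_in_provider_region = current == provider
    # The labels below are illustrative, not the publication's own wording.
    if studied_in_home_region and lives_in_provider_region:
        return "same region throughout"
    if studied_in_home_region:
        return "studied in home region, current region differs"
    if lives_in_provider_region:
        return "studied outside home region, current region is provider region"
    if current == home:
        return "studied outside home region, current region is home region"
    return "three different regions"


pattern = movement_pattern("North East", "London", "North East", mature=False)
```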

Prior attainment

Prior attainment is the attainment of students prior to commencing their higher education course. We have calculated prior attainment based on key stage 5 qualifications recorded in the National Pupil Database (NPD), which contains data about pupils in schools and colleges in England. Due to the coverage of the NPD, we are unable to provide prior attainment breakdowns for mature students. Note also that coverage for graduates domiciled in Scotland, Wales and Northern Ireland is significantly lower than for those domiciled in England, since only those who took their KS5 qualifications in England are included. 

A level grade  Point score
A or A*        120
B              100
C              80
D              60
E              40
N/U            Not counted as one of top 3 A levels

Prior attainment banding

We calculate each graduate’s point score from their top three A levels and use this to compute a weighted median point score for each institution, split by subject, gender and year after graduation. As weights, we use Full Person Equivalent (FPE), which takes account of graduates who spent, for example, 50% of their studying time studying French and 50% on philosophy. FPE does not, however, distinguish between full-time and part-time study. 
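
The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: the assignment of 100, 80, 60 and 40 points to grades B to E is inferred from the descending sequence in the point score table; the publication does not say which weighted-median convention is used, so the sketch uses the lower weighted median; and the treatment of graduates with fewer than three A levels is not specified in the text.

```python
# Points per A level grade. The values for B-E are an assumption inferred
# from the descending sequence in the point score table.
POINTS = {"A*": 120, "A": 120, "B": 100, "C": 80, "D": 60, "E": 40}


def top_three_score(grades: list[str]) -> int:
    """Sum the points of the three best A levels; N and U grades do not count."""
    points = sorted((POINTS[g] for g in grades if g in POINTS), reverse=True)
    # Assumption: with fewer than three counted grades, sum what is available.
    return sum(points[:3])


def weighted_median(values: list[float], weights: list[float]) -> float:
    """Lower weighted median: the smallest value at which the cumulative
    weight reaches half of the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


# Three graduates in one subject at one provider. The second graduate spent
# half of their time on another subject, so their FPE weight here is 0.5.
scores = [
    top_three_score(["A", "A", "B", "C"]),
    top_three_score(["B", "C", "C"]),
    top_three_score(["A*", "A", "A", "U"]),
]
fpe = [1.0, 0.5, 1.0]
median_score = weighted_median(scores, fpe)
```

Weighting by FPE means that a graduate on a joint course contributes to each subject's median in proportion to the time spent on that subject.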

We then rank providers by their median point scores and place them into one of three bands: 

Band 1: top 25%, greater than 75th percentile 
Band 2: middle 50%, between 25th and 75th percentiles 
Band 3: bottom 25%, less than 25th percentile 

The intention of this method is to allow for comparison of institutions within the same subject area. Since the rankings used are based on single subjects only, it could be misleading to compare an institution’s prior-attainment bands between different subjects. 

It is recognised that the prior attainment bandings could be expanded further to include the points for those who took other key stage 5 qualifications.
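
The banding step can be illustrated with the short sketch below, which takes the median point scores for all providers within a single subject area and assigns each to a band. It is an assumption-laden example: the provider names and scores are invented, the choice of the "inclusive" percentile method is ours, and providers falling exactly on the 25th or 75th percentile are placed in band 2, which the text does not specify.

```python
from statistics import quantiles


def assign_bands(provider_medians: dict[str, float]) -> dict[str, int]:
    """Map each provider to band 1, 2 or 3 within one subject area."""
    # quantiles() returns the three quartile cut points. The "inclusive"
    # method treats the data as the full population of providers.
    q25, _, q75 = quantiles(list(provider_medians.values()), n=4, method="inclusive")
    bands = {}
    for provider, median in provider_medians.items():
        if median > q75:
            bands[provider] = 1   # top 25%
        elif median < q25:
            bands[provider] = 3   # bottom 25%
        else:
            bands[provider] = 2   # middle 50% (boundary values included here)
    return bands


# Invented median point scores for five providers in one subject area.
medians = {"P1": 360, "P2": 340, "P3": 300, "P4": 280, "P5": 240}
bands = assign_bands(medians)
```

Because the percentiles are computed separately for each subject area, a provider's band in one subject is not comparable with its band in another.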

POLAR (Participation of Local Areas)

The Participation of Local Areas (POLAR) classification places local areas into five groups, based on the proportion of 18-year-olds who enter higher education at age 18 or 19. POLAR4 is the iteration used in this publication. Detailed information about the POLAR methodology is available from the OfS at www.officeforstudents.org.uk/data-and-analysis/polar-participation-of-local-areas/

Here, we publish the proportion of non-mature matched graduates whose postcode on the student record placed them in quintile 1 (the most disadvantaged group) of POLAR4 before applying for or entering higher education. This information is split by subject studied, institution, gender and year after graduation.  

For mature students, their postcode immediately before entering higher education is less likely to be indicative of the environment they grew up in, and hence their POLAR classification would have to be interpreted differently from that of non-mature students. We therefore exclude mature students from our POLAR measure. 

HESA do not publish POLAR figures for Scotland, as Scotland’s relatively high participation rate and the high proportion of higher education students in further education colleges could misrepresent Scottish contributions to widening participation. Following that line of reasoning, this publication does not include POLAR figures for Scottish HEIs either. 

Data matching and match rates

The HESA student records are matched to DWP’s Customer Information System (CIS; see Customer Information System – an explanation of the information held about you - GOV.UK (www.gov.uk)) using an established matching algorithm based on the following personal characteristics: National Insurance Number (NINO), forename, surname, date of birth, postcode, and sex. Some of these characteristics are simplified to make the matching process less time-intensive and to allow more matches. For instance, only the first initial of the forename is used; the surname is encoded using an English sound-based algorithm called SOUNDEX, a function that turns a surname into a code representing what it sounds like, which allows some flexibility for different spellings (for example, if a surname is misspelt in one of the datasets, Wilson and Willson receive the same code); and for most matches only the sector of the postcode is used.
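
The simplifications described above can be illustrated with the following sketch. It implements the standard American Soundex code; the text does not say which Soundex variant the production matching uses, so this is an assumption. The helper names `postcode_sector` and `match_key` are hypothetical, and the postcode sector is taken to be the outward code plus the first character of the inward code.

```python
# Standard American Soundex letter groups.
_CODES = {}
for _digit, _letters in (
    ("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
    ("4", "L"), ("5", "MN"), ("6", "R"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not separate repeated codes
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit              # vowels reset prev, so repeats are kept
    return (code + "000")[:4]


def postcode_sector(postcode: str) -> str:
    """Outward code plus the first character of the inward code."""
    compact = postcode.replace(" ", "").upper()
    return f"{compact[:-3]} {compact[-3]}"


def match_key(forename: str, surname: str, postcode: str) -> tuple[str, str, str]:
    """Simplified (hypothetical) comparison key: initial, Soundex, sector."""
    return (forename.strip()[:1].upper(), soundex(surname), postcode_sector(postcode))


# Two records that differ in spelling and formatting yield the same key.
key_a = match_key("Jane", "Willson", "SW1P 3BT")
key_b = match_key("J.", "Wilson", "sw1p3bt")
```

Coarser keys of this kind allow more true matches to be found, at the cost of a higher chance of false matches, which is why the match strengths are graded.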

The NINO is not present on the HESA student record itself and has been added where possible by fuzzy matching with personal data from the Student Loans Company. This process increases the likelihood of finding a match with the CIS. Accordingly, groups less likely to take a student loan, for example international students who are not eligible for one, are likely to have lower match rates.

All records accessed for analysis are anonymous so that individuals cannot be identified. The personal identifying records used in the actual matching process are accessed under strict security controls.  

There are five match processes carried out, ranging from the highest quality and most likely to be accurate (Green) to the lowest quality and most likely to be a false match (Red-Amber). The table below shows the criteria for each match type.   

Once the HESA records have been matched to the CIS the corresponding tax and benefits records for that individual can then be linked to their HESA record.  

All match rate analysis in this chapter is restricted to the HESA population covered in this publication, that is, UK domiciled, first degree graduates from UK Higher Education Providers.  

Table: Criteria for each match strength (Y indicates a match, - indicates no match)  

Match quality  NINO (National Insurance number)  Forename (initial)  Surname (soundex)  Date of birth  Sex  Postcode (sector)
1. Green  At least four of forename, surname, DOB, sex and postcode. 
2. Amber  Y  Any three of forename, surname, DOB, sex and postcode. 
3. Green-Amber  -  Y  Y  
4. Amber-Red -  Y  One of sex or postcode. 
5. Red-Amber -  -  -  Y  Y (full postcode)  

Overall match rates  

In this section we consider match rates to the CIS spine. This differs slightly from the match rates displayed in the main tables of this publication, which also include those without a CIS match but with a record of further study in the given year.  

The table below shows the overall CIS match rates for graduates who studied full-time as well as the proportion with a tax or benefit record. Potential reasons for not being able to find a P14 record, despite having a match to the CIS spine, include earning below the Lower Earnings Limit (LEL), self-employment, moving abroad and death.  

Table: Match rates for UK domiciled first degree graduates at English HEIs, by year of graduation  
Coverage: UK domiciled male and female first-degree graduates from English HEIs.  
Cohorts: 2004/05, 2005/06, 2006/07, 2007/08, 2008/09, 2009/10, 2010/11, 2011/12, 2012/13, 2013/14, 2014/15, 2015/16, 2016/17, 2017/18, 2018/19, 2019/20

Academic year  Matched to CIS spine (%)  Matched to tax/benefit record (%)
2004/05        95                        95
2005/06        96                        95
2006/07        97                        96
2007/08        97                        97
2008/09        97                        97
2009/10        98                        97
2010/11        98                        97
2011/12        98                        98
2012/13        99                        98
2013/14        99                        99
2014/15        99                        99
2015/16        99                        99
2016/17        99                        99
2017/18        99                        99
2018/19        99                        98
2019/20        99                        97

The table above shows that the match rate was high for the most recent cohorts: 99% of full-time graduates in 2019/20 were matched to the CIS, as were 99% of those in 2015/16 (our 5YAG breakdown for the 2021/22 tax year), and almost all of these had at least one tax record or out-of-work benefit record. This compares to a match rate of 95% for graduates in 2004/05. The higher match rates for more recent cohorts are at least partly explained by the fact that the CIS holds the most recent names and addresses for individuals, so if someone's details change after they graduate there is less chance that they will be matched.

Match rate by graduate characteristic  

The table below shows match rates by sex. The match rate for females is slightly lower in the earlier years than for males, but this difference is negligible or non-existent in recent cohorts. As the CIS holds the latest information about an individual, anyone that has changed their name since graduation will have a different name on the CIS compared to their HESA record. This particularly affects females, due to a higher likelihood than males of changing their name upon marriage.

Table: CIS match rate by sex  
Coverage: UK domiciled male and female first degree graduates from English HEIs.  

Academic year  Female (%)  Male (%)
2004/05        94          98
2005/06        94          98
2006/07        95          98
2007/08        96          98
2008/09        97          98
2009/10        97          98
2010/11        97          98
2011/12        98          99
2012/13        99          99
2013/14        99          99
2014/15        99          99
2015/16        99          99
2016/17        99          99
2017/18        99          99
2018/19        99          99
2019/20        99          99

Match rates were also compared across ethnic groups among UK-domiciled students. Match rates in 2019/20 ranged from 95% to 99%, with the lowest match rate being for the Chinese ethnic group.

The number of forenames or surnames an individual has can affect the match rate, because with multiple names it is more likely that they will not all be recorded, or there may be forenames recorded as surnames or vice versa. Analysis of the match rates showed that those with at least two surnames had a slightly lower match rate than those with only one.  

Match rates are noticeably lower for non-UK domiciled graduates. The main reason for this is that LEO relies on graduates having been issued with a National Insurance number to match them to an employment record. However, international students who have no intention of working or claiming benefits in this country are less likely to apply for a National Insurance number and so would not appear in the LEO data.  

It may be that international graduates remain in the UK but not in work or receiving any type of benefit, and so do not require a National Insurance number. However, our expectation is that international graduates are likely to have moved abroad, with the majority returning to their home country. Recent Home Office reports confirm that the vast majority of non-EEA international students who were granted a visa to study in the UK left in time (97.5%; see Fifth report on statistics relating to exit checks: 2019 to 2020 - GOV.UK (www.gov.uk)).

Some international students may have been issued with a National Insurance number but will not appear in the UK tax or benefit system for the tax years included in this release. These graduates are recorded as ‘activity not captured’, even if they are in employment in another country.   

This publication uses the current region of residence data supplied by the Department for Work and Pensions (DWP) to identify graduates who were not living in the UK for the majority of the tax year. These graduates are removed from the denominator to help improve the accuracy of the employment outcomes calculations.   
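
The effect of removing graduates living abroad from the denominator can be sketched as follows. This is an illustrative example only: the field names `abroad_most_of_tax_year` and `in_employment` are hypothetical, and the sketch does not attempt to reproduce the published outcome definitions.

```python
def outcome_rate(records: list[dict]) -> float:
    """Share of graduates with the outcome, excluding those living abroad."""
    # Graduates not living in the UK for most of the tax year are dropped
    # from the denominator before the rate is calculated.
    in_scope = [r for r in records if not r["abroad_most_of_tax_year"]]
    if not in_scope:
        raise ValueError("no graduates left in the denominator")
    return sum(r["in_employment"] for r in in_scope) / len(in_scope)


# Four matched graduates, one of whom lived abroad for most of the tax year.
records = [
    {"abroad_most_of_tax_year": False, "in_employment": True},
    {"abroad_most_of_tax_year": False, "in_employment": True},
    {"abroad_most_of_tax_year": False, "in_employment": False},
    {"abroad_most_of_tax_year": True, "in_employment": False},
]
rate = outcome_rate(records)
naive_rate = sum(r["in_employment"] for r in records) / len(records)
```

In this made-up example, two of the three graduates remaining in scope are in employment, whereas leaving the graduate abroad in the denominator would understate the rate as two in four.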

Other reasons for lower match rates among non-UK domiciled graduates include a higher likelihood of names being misspelt and lower take-up of, or eligibility for, student loans, meaning we would not be able to attach a NINO to the HESA data to aid the matching process.

Get in touch

Media enquiries  

Press Office News Desk, Department for Education, Sanctuary Buildings, Great Smith Street, London SW1P 3BT.  

Tel: 020 7783 8300  

Other enquiries/feedback  

Email: HE.LEO@education.gov.uk 

Help and support

Contact us

If you have a specific enquiry about LEO Graduate outcomes provider level data statistics and data:

Higher Education Graduate Outcomes Analysis

Email: he.leo@education.gov.uk
Contact name: Cathie Hammond

Press office

If you have a media enquiry:

Telephone: 020 7783 8300

Public enquiries

If you have a general enquiry about the Department for Education (DfE) or education:

Telephone: 037 0000 2288

Opening times:
Monday to Friday from 9.30am to 5pm (excluding bank holidays)