This section describes the five stages that the data goes through and how the Department supports schools and local authorities in their task of providing high-quality data. The first four stages are all internal processes. The census data is not used publicly until the fifth and final stage, when it is formally published in “School Workforce in England”.
The school workforce data required from both schools and local authorities is determined in advance of the census, so that schools and LAs can engage the suppliers of their management information systems (MIS) with sufficient time to incorporate any new data items (or changes to existing data items) into their local systems. The software suppliers build data extraction routines based on the data requirements set out in the technical specification published by the Department. Typically, a near-final version of the technical specification is shared with software suppliers around a year before the next census date. This gives software suppliers the opportunity to see and comment on the data requirement and any changes from the previous year. Their comments and views are taken into account to ensure the Department is asking for data in a way that is straightforward to deliver. A final version of the technical specification is then published for local authorities and software suppliers to see and use. The current technical specification and archived versions are published here.
Stage 1
By census day, schools and local authorities should have ensured that their management information systems hold accurate details for all their staff in scope of the census. They then ensure that the information required by the Department (as set out in the published School Workforce Census data requirement) is extracted and uploaded to the Department’s data collection system, COLLECT. Schools and LAs will have had the opportunity to test the quality of their data, and the data extraction routines provided by their software suppliers, by using the familiarisation version of COLLECT. The full list of data items collected by the census can be found in the guides provided to schools and LAs.
Stage 2
Once schools and local authorities have successfully loaded their data onto COLLECT they can review and inspect their data. The COLLECT system has a range of checks that it runs on the data: e.g. simple formatting checks, arithmetic checks and validation rules that specific data items must meet. The checks made within COLLECT are contained within the published guides and specifications.
Software suppliers often build these checks into their data extraction routines and/or management information system (MIS) upgrades. The checks within COLLECT flag where the data provided has either failed to meet the standards required (an error) or does not conform to what was expected (a warning). For example, an error would result if no contract information was provided, and a warning would result if the date of birth placed the teacher’s age as less than 21 or over 90. The validation checks are reviewed and improved each year. For example, checks were introduced to identify schools with very large changes in teacher numbers, which gave schools an opportunity to say whether the change was real or to resubmit if there had been a data input error. This process helps identify where schools’ initial returns need attention. Checks and guidance have been continually updated and improved to help schools provide better quality data on the pay of teachers (especially part-time teachers) and on whether the data submitted includes a pay award for the current year.
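To illustrate the distinction between an error and a warning, the following is a minimal Python sketch of the two example checks described above. It is not the COLLECT implementation: the record layout, the field names `contracts` and `date_of_birth`, the function names and the census date used in the example are all hypothetical.

```python
from datetime import date


def age_on(dob: date, census_day: date) -> int:
    """Age in whole years on census day."""
    had_birthday = (census_day.month, census_day.day) >= (dob.month, dob.day)
    return census_day.year - dob.year - (0 if had_birthday else 1)


def validate_record(record: dict, census_day: date) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for one hypothetical staff record."""
    issues = []

    # Error: the data fails a required standard (no contract information).
    if not record.get("contracts"):
        issues.append(("error", "no contract information provided"))

    # Warning: the value is possible but unexpected (age under 21 or over 90).
    dob = record.get("date_of_birth")
    if dob is not None:
        age = age_on(dob, census_day)
        if age < 21 or age > 90:
            issues.append(("warning", f"age {age} is outside the range 21 to 90"))

    return issues
```

Under these assumptions, a record with no contracts and a date of birth of 1 January 2005, checked against an illustrative census day of 2 November 2023, would produce one error and one warning. Keeping the two severities separate mirrors the text: an error has to be fixed, whereas a warning can be confirmed as genuine.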
Schools and local authorities then check their data, especially the errors and warnings, to ensure the data is correct and accurately reflects the staffing levels at their school at the time of the census. Changes and/or corrections to the data can be made either online in COLLECT or within the local MIS (preferred). If the changes are made locally then the data has to be resubmitted to the Department. Once schools and local authorities have resolved their errors and warnings they approve their data, which signals to the Department that the data can move to the next stage. The COLLECT system also includes credibility reports, which have been developed by the Department to help ensure that the overall return looks right and is complete. Schools and local authorities are encouraged to run these reports to ensure that the data being submitted is a true reflection of staffing levels and characteristics.
Stage 3
Once the data has been approved for use by schools and local authorities, the Department runs a further set of checks on the data. These checks look within the data to spot any problem areas, for example where schools have provided substantial numbers of records that are missing particular data items (e.g. staff with no contract information) or staff whose pay rate is not credible. The results of these checks are fed back to data providers so that they can amend the data and improve its quality.
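A check of this kind can be sketched in Python as follows. This is an illustration only and not the Department's actual procedure: the field name, the function names and the 20% threshold used to define a "substantial" share of records are assumptions made for the example.

```python
def missing_share(records: list[dict], field: str) -> float:
    """Share of records in which a data item is absent or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) in (None, "", []))
    return missing / len(records)


def schools_to_query(
    returns: dict[str, list[dict]], field: str, threshold: float = 0.2
) -> list[str]:
    """List schools whose missing share exceeds the (assumed) threshold."""
    return sorted(
        school
        for school, records in returns.items()
        if missing_share(records, field) > threshold
    )
```

For example, a school where one of two records lacks contract information has a missing share of 0.5 and would be listed for follow-up, while a school with complete records would not.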
Throughout the first three stages of the collection, the Department operates a helpdesk which staff at schools and local authorities can contact if they are unsure about any aspect of the School Workforce Census. This is the primary route that academy schools use to discuss their queries regarding the data they are submitting. The helpdesk operates throughout the census period, November to December, and the period immediately afterwards when data credibility checking takes place, typically December into January. When this process is completed and both the data supplier and the Department are content, the Department authorises the data for use.
Stage 4
Once all the school and local authority data has been authorised, a final database for the year is created, which allows the Department’s statisticians to prepare the information that is to be published. At this stage any data that has been provided and deemed to be out of scope is removed from the dataset, for example teaching staff on zero-hours contracts (likely to be a pool of supply teachers that are regularly used by the school but were not actually in service at the time of the census). Any data that has been received but is not required, such as contracts before the date that a school converted to an academy, is also removed.
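The two removal rules described above can be sketched in Python as a single filter. This is a hedged illustration rather than the Department's code. It assumes a hypothetical contract record with `hours_per_week` and `end_date` fields, treats a zero-hours contract as one recording zero hours, and interprets "contracts before the date that a school converted" as contracts that ended before the conversion date.

```python
from datetime import date
from typing import Optional


def in_scope(contract: dict, conversion_date: Optional[date] = None) -> bool:
    """Decide whether a hypothetical contract record stays in the dataset."""
    # Assumption: a zero-hours contract is recorded with zero hours per week.
    if contract.get("hours_per_week", 0) == 0:
        return False

    # Assumption: a contract that ended before the school converted to an
    # academy is received but not required, so it is dropped.
    end = contract.get("end_date")
    if conversion_date is not None and end is not None and end < conversion_date:
        return False

    return True
```

A list of contracts for a school could then be filtered with `[c for c in contracts if in_scope(c, conversion_date)]`, passing `None` as the conversion date for schools that have not converted.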
This results in a database containing all contracts for all staff working in all schools for the year. This is added to a database of all contracts in all years. The combined dataset is used to calculate the number of teachers and other school support staff working in each school since the School Workforce Census began.
At this point, linking of teacher records across the years is added using the individual identification information supplied. This is based on the Teacher Reference Number (TRN), supplemented by National Insurance (NI) number, names and date of birth. In addition, all-year databases are also produced for the other modules of the census: absences, curriculum taught, qualifications and vacancies.
To support the process of linking teacher data across years, the census data is matched to other administrative data held by the Department. This helps overcome occasional gaps in the data, for example where schools have not supplied the TRN for a teacher. The teacher data from the School Workforce Census is matched to the Database of Teacher Records, which holds data collected as part of the administration of the Teachers’ Pension Scheme. This dataset should have an accurate TRN for all teachers and is used to fill gaps in the School Workforce Census.
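The gap-filling step can be illustrated with a short Python sketch. It is not the Department's matching routine: the field names, the function name and the choice of NI number, surname and date of birth as an exact match key are assumptions made for the example, and a real match would need to handle near-matches and conflicting records.

```python
def fill_missing_trn(census_records: list[dict], reference_records: list[dict]) -> list[dict]:
    """Return copies of census records with missing TRNs filled where possible."""
    # Build a lookup from the (assumed) match key to the reference TRN.
    lookup = {
        (ref["ni_number"], ref["surname"].lower(), ref["date_of_birth"]): ref["trn"]
        for ref in reference_records
    }

    filled = []
    for rec in census_records:
        rec = dict(rec)  # copy, so the original census record is left unchanged
        if not rec.get("trn"):
            key = (rec["ni_number"], rec["surname"].lower(), rec["date_of_birth"])
            rec["trn"] = lookup.get(key)  # stays None if no match is found
        filled.append(rec)
    return filled
```

Records that already carry a TRN are passed through untouched, so the reference data is used only to fill gaps, not to overwrite what schools supplied.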
A second match is made to the Database of Qualified Teachers, which contains details of the year that teachers finished their teacher training and gained Qualified Teacher Status (QTS). Where teacher records have been supplied without QTS in error, this field can be completed. In addition, the School Workforce Census does not collect the year in which a teacher qualified; this information is added to the linked teacher dataset so that Newly Qualified Teachers can be identified. This is important for the analysis of teacher entrants.
The all-year contract file is then used as the basis of an aggregated teacher-level dataset. This selects a main contract for the small percentage of teachers who have more than one open contract on census day, e.g. where they have two distinct roles within a school or work in more than one school. The aggregated teacher dataset is the main source of data used for publishing statistics on teacher numbers and teacher characteristics, as well as entrants to teaching, teachers leaving state-funded schools and teacher retention.
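The selection of a main contract can be sketched in Python as below. The text does not state the rule the Department uses to choose the main contract, so this example assumes, purely for illustration, that the contract with the most hours is chosen. The field names `trn` and `hours` and the function name are likewise hypothetical.

```python
def main_contracts(contracts: list[dict]) -> dict[str, dict]:
    """Pick one contract per teacher, keyed by TRN."""
    best: dict[str, dict] = {}
    for contract in contracts:
        current = best.get(contract["trn"])
        # Assumed rule: the contract with the most hours is the main one.
        # On a tie, the first contract encountered is kept.
        if current is None or contract["hours"] > current["hours"]:
            best[contract["trn"]] = contract
    return best
```

The result holds exactly one record per teacher, which is what allows teacher numbers to be counted without double-counting those working in more than one role or school.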
The linking of teacher data across years allows for the better identification of poor quality or inconsistent school data. The linked dataset has helped with the identification and removal of duplicate teacher contracts, where they were provided by both a school and a local authority. It has also supported the identification of individual teachers working in multiple schools.
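As an illustration of removing duplicate contracts, the Python sketch below treats two contracts as duplicates when they share the same teacher, school and start date. That definition, the preference for the school-supplied copy over the local authority copy, and the field names (`trn`, `school_id`, `start_date`, `source`) are all assumptions made for the example and are not taken from the Department's method.

```python
def drop_duplicate_contracts(contracts: list[dict]) -> list[dict]:
    """Keep one copy of each contract, identified by an assumed key."""
    kept: dict[tuple, dict] = {}
    for contract in contracts:
        key = (contract["trn"], contract["school_id"], contract["start_date"])
        existing = kept.get(key)
        # Keep the first copy seen, but let a school-supplied copy replace
        # a local authority copy of the same contract (assumed preference).
        if existing is None or (
            existing["source"] == "la" and contract["source"] == "school"
        ):
            kept[key] = contract
    return list(kept.values())
```

Because the key includes the school, a teacher who genuinely works in two schools keeps both contracts; only repeated submissions of the same contract are collapsed.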
Stage 5
The publication “School Workforce in England” is the first part of the dissemination process. The release of the publication signals the availability of the data for use by the Department (e.g. Teacher Supply Modelling), by the general public (e.g. in reply to Freedom of Information (FoI) requests) and by independent analysts and researchers, who can request specific information.
The second main output produced from the latest data is a large set of school-level data, which is released as part of the Department’s commitment to release the underlying data used to create all national statistics. The Department releases school-level school workforce statistics showing teacher and support staff numbers, staff characteristics, teacher pay, sickness absence and the number of vacant posts. The school data also includes school type and phase and various geographical data, e.g. the local authority.