Lung cancer in United Kingdom general practice and the possibility of developing an early warning score

Iyen-Omofoman, Barbara (2012) Lung cancer in United Kingdom general practice and the possibility of developing an early warning score. PhD thesis, University of Nottingham.

PDF (Thesis - as examined)



Lung cancer has a very poor prognosis and is the leading cause of cancer deaths both worldwide and in the UK. UK survival rates are particularly poor compared with those in other European countries. More than two-thirds of people with lung cancer in the UK are diagnosed at a late stage, when curative treatment is no longer possible. Because survival is higher when lung cancer is diagnosed earlier, there is a need to diagnose cases earlier. This suggests scope to examine and, where possible, modify the care pathway for people with lung cancer to achieve earlier diagnosis.


The overall aim of this thesis was to explore patient characteristics and interactions with primary care before the diagnosis of lung cancer, in order to identify the features that predict lung cancer and to assess the potential for earlier diagnosis. To achieve this aim, it was first necessary to investigate and validate the use of lung cancer data from The Health Improvement Network.


The Health Improvement Network (THIN), a database of United Kingdom general practice records, was used to identify and study the characteristics of cases of lung cancer in the UK. To ensure that THIN was a valid source of lung cancer information for research, the completeness and representativeness of its lung cancer data were assessed by comparing patient characteristics, incidence and survival in THIN with the UK National Cancer Registry and the National Lung Cancer Audit Database. Experian's Mosaic Public Sector variable, linked to the THIN database, was then used to build detailed profiles of the sectors of UK society in which lung cancer incidence was highest, as a means of exploring the potential of this geo-demographic tool to facilitate disease ascertainment.
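Survival comparisons between data sources usually rest on a Kaplan-Meier estimate of median survival. The following is a minimal pure-Python sketch, assuming each patient contributes a follow-up time from diagnosis and a flag marking death (1) or censoring (0); the function name and example data are hypothetical and are not taken from the thesis.

```python
from itertools import groupby


def km_median_survival(times, events):
    """Median survival time from the Kaplan-Meier estimator.

    times:  follow-up time per patient (e.g. days from diagnosis)
    events: 1 if the patient died at that time, 0 if censored
    Returns the earliest time at which estimated survival S(t) is <= 0.5,
    or None if it never falls that low.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        deaths = sum(event for _, event in group)
        if deaths:
            # multiply by the conditional probability of surviving past t
            survival *= 1 - deaths / at_risk
            if survival <= 0.5:
                return t
        # everyone observed at t (deaths and censorings) leaves the risk set
        at_risk -= len(group)
    return None
```

For example, with hypothetical times [2, 3, 3, 5, 8, 9] and events [1, 0, 1, 1, 0, 1], the estimated survival first falls to one half or below at time 5.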

Two case-control datasets were developed from the database using the identified cases of lung cancer. The first dataset was matched on age, sex and general practice, and was used for three studies in this thesis. The first was a pilot study of methods to identify the socio-demographic and clinical features independently associated with lung cancer, and the timing of these clinical features before lung cancer was diagnosed. The second and third studies tested separate hypotheses about variation in lung cancer risk among smokers: first between smokers of different socioeconomic status, then between smokers with and without a recorded history of depression. Both socioeconomic deprivation and depression are associated with a higher prevalence of cigarette smoking.

The second case-control dataset was matched only on general practice. It was used to extend the methods of the pilot study to identify the socio-demographic factors, including age and sex, and the early clinical features that predict lung cancer. A subsequent study used these predictors to develop and validate a risk-prediction model for lung cancer. The model was validated on a separate dataset of patients from a more recent version of THIN, with records spanning a period after the last record date for the patients used in the earlier studies in the thesis.


A study population of 12,135 patients with incident lung cancer between 1 January 2000 and 28 July 2009 was identified. The overall incidence of lung cancer, median survival and general lung cancer patient characteristics in THIN were similar to those in other national lung cancer databases: the National Lung Cancer Audit data and the UK National Lung Cancer Registry data from the Office for National Statistics. Mosaic™ classifications identified wider variation in lung cancer incidence than existing markers of socioeconomic deprivation, and therefore allowed more detailed classification of the sectors of UK society where lung cancer incidence was highest. For example, the incidence rate in Mosaic Public Sector™ type I50 (Cared-for pensioners) was 31.2 times that in Mosaic Public Sector™ type B10 (Upscale new owners) (IRR 31.2; 95% CI 21.9-44.5).
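An incidence rate ratio (IRR) compares the number of new cases per person-year in two groups. The abstract does not say how its interval was derived, so the sketch below assumes the simplest approach: a crude two-group IRR with a Wald 95% confidence interval on the log scale. The function name and counts are hypothetical and unrelated to the Mosaic results; adjusted estimates would normally come from Poisson regression instead.

```python
import math


def incidence_rate_ratio(cases_a, person_years_a, cases_b, person_years_b, z=1.96):
    """Crude incidence rate ratio of group A vs group B with a Wald CI.

    The interval is built on the log scale, where the standard error of
    log(IRR) is approximately sqrt(1/cases_a + 1/cases_b).
    """
    rate_a = cases_a / person_years_a
    rate_b = cases_b / person_years_b
    irr = rate_a / rate_b
    se_log = math.sqrt(1 / cases_a + 1 / cases_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper
```

With 100 cases in 10,000 person-years against 50 cases in 20,000 person-years, this returns an IRR of 4.0 with an interval of roughly 2.85 to 5.62.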

With regard to lung cancer risk among smokers from different socioeconomic groups, analyses of the association between smoking and lung cancer stratified by Townsend deprivation quintile showed that the risk of lung cancer was similar in smokers of different socioeconomic status. Depression was associated with a 30% increase in the odds of lung cancer (odds ratio 1.30; 95% CI 1.24-1.38), an association that was completely explained by smoking. Cigarette smoking was more common, and consumption levels were higher, among depressed than among non-depressed individuals. Analyses of the association between smoking and lung cancer stratified by depression showed no difference in lung cancer risk between depressed and non-depressed smokers.
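The stratified analyses ask whether the odds ratio for smoking differs between strata (Townsend quintiles, or depressed versus non-depressed patients). A minimal sketch, assuming each stratum is summarised as a 2x2 table of cases and controls by smoking status: it returns the odds ratio within each stratum and a Mantel-Haenszel pooled estimate. This is an illustrative substitute rather than the thesis's method; a matched case-control design would more typically be analysed with conditional logistic regression. The function names and tables are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over a list of (a, b, c, d) strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical strata (e.g. non-depressed, then depressed patients),
# with smoking as the exposure.
strata = [(40, 20, 10, 30), (60, 30, 15, 45)]
per_stratum = [odds_ratio(*s) for s in strata]
pooled = mantel_haenszel_or(strata)
```

Here both hypothetical strata give an odds ratio of 6.0, and so does the pooled estimate; similar stratum-specific odds ratios are the pattern that indicates no effect modification.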

Socio-demographic features (age, sex, socioeconomic status and smoking), an increase in the frequency of general practice consultations, and early records of presentation with cough, haemoptysis, dyspnoea, weight loss, lower respiratory tract infections, non-specific chest infections, chest pain, hoarseness, upper respiratory tract infections and chronic obstructive pulmonary disease (COPD) were found to be independently associated with lung cancer 4 to 12 months before diagnosis. A risk-prediction model was developed from these variables; on validation in an independent THIN dataset of 1,826,293 patients, the model performed well, with an area under the receiver operating characteristic curve (AUC) of 0.88.
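The AUC can be read as the probability that the model assigns a higher risk score to a randomly chosen patient with lung cancer than to one without. A minimal sketch of that calculation, assuming predicted scores and 0/1 outcome labels are available, uses the rank-based Mann-Whitney formulation with average ranks for ties, which scales to large validation sets; the function name and inputs are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney rank statistic.

    scores: predicted risk for each patient
    labels: 1 for a lung cancer case, 0 otherwise
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied scores
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For scores [0.1, 0.4, 0.35, 0.8] with labels [0, 0, 1, 1], it returns 0.75.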


Routine electronic data in THIN are a valid source of lung cancer information for research. Mosaic™ identifies greater differences in incidence than standard area-level measures, and could therefore be used as a tool in public health programmes to ascertain future cases more effectively.

Neither socioeconomic deprivation nor a history of depression increases an individual's vulnerability to the carcinogenic effects of cigarette smoke. The higher lung cancer risk among more deprived individuals and those with depression is largely explained by their greater cigarette consumption. Smoking cessation interventions targeted at these groups are needed to reduce the lung cancer-related health inequalities associated with deprivation and depression.

Combining patients' age, sex, socioeconomic characteristics, smoking status and early-stage symptoms recorded in general practice aids earlier identification of patients at increased risk of lung cancer. The model developed from these variables performed substantially better than the current NICE referral guidelines and all comparable models. By allowing general practitioners to better risk-stratify their patients, it can predict lung cancer early enough to make detection at a potentially curable stage feasible.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Hubbard, R.; Tata, L.
Keywords: Lung cancer, Primary care, Early diagnosis, Early-warning score
Subjects: W Medicine and related subjects (NLM Classification) > WF Respiratory system
Faculties/Schools: UK Campuses > Faculty of Medicine and Health Sciences > School of Community Health Sciences
Item ID: 42999
Depositing User: Iyen, Barbara
Date Deposited: 11 Oct 2017 12:25
Last Modified: 14 Oct 2017 20:51
