Independent Component Analysis
We applied an ICA to questionnaire data from the UKB. ICA is a statistical decomposition method that separates multivariate data into a predetermined number of statistically independent components. For the ICA, we used each participant's answers to a self-report mental health questionnaire included in the UKB data collection. This questionnaire covers various topics relevant to mental health, such as depressive symptoms, anxiety, psychological trauma, and substance use. We excluded individuals with a diagnosed psychiatric or neurological disorder (any ICD-10 F or G diagnosis, except for nerve, nerve root, and plexus disorders, categories G50 to G59) and removed individuals with more than 10% missing answers. This resulted in a sample of 136,678 individuals.
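The participant-level exclusion rule can be illustrated with a minimal sketch. The function names `is_exclusion_code` and `keep_participant`, the representation of diagnoses as a list of ICD-10 code strings, and the use of `None` for a missing answer are assumptions made for illustration only and do not reflect the actual UKB data format or the code used in this study.

```python
import re

def is_exclusion_code(code: str) -> bool:
    """Return True if an ICD-10 code is an F or G diagnosis outside G50-G59.

    Hypothetical helper: assumes codes are strings such as "F320" or "G560".
    """
    match = re.match(r"^([FG])(\d{2})", code.strip().upper())
    if match is None:
        return False  # not an F or G chapter code
    chapter, block = match.group(1), int(match.group(2))
    if chapter == "F":
        return True
    # G50 to G59 (nerve, nerve root, and plexus disorders) do not trigger exclusion.
    return not (50 <= block <= 59)

def keep_participant(icd10_codes, answers, max_missing=0.10):
    """Apply both participant-level filters described in the text.

    `answers` is a list of responses in which None marks a missing answer.
    """
    if any(is_exclusion_code(code) for code in icd10_codes):
        return False
    if not answers:
        return False
    missing_fraction = sum(a is None for a in answers) / len(answers)
    # Participants with MORE than 10% missing answers are removed.
    return missing_fraction <= max_missing
```

In this sketch, a participant whose only G-chapter diagnosis is, for example, G56 is retained, whereas any F diagnosis leads to exclusion.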
In the feature domain, we used only questionnaire items that were answered by all subjects in the dataset. This excluded follow-up questions on the original items and ensured a robust dataset with minimal missing data. We also excluded questions that referred to symptoms only in the two weeks prior to assessment to avoid unwanted temporal effects in the analyses. Finally, we excluded items with more than 10% missing answers. This reduced the number of questions in the dataset from about 140 to 43. We imputed the remaining missing responses using k-nearest neighbors (KNN) imputation with k = 3 and subsequently z-score standardized the data.
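The item filtering, imputation, and standardization steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this study: it assumes scikit-learn's `KNNImputer` and `StandardScaler` as the implementation, and it runs on a small synthetic response matrix rather than UKB data.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the questionnaire matrix (participants x items).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 43)).astype(float)
X[rng.random(X.shape) < 0.03] = np.nan  # simulate sporadic missing answers

# Drop items with more than 10% missing answers.
item_missing = np.isnan(X).mean(axis=0)
X = X[:, item_missing <= 0.10]

# Impute each missing answer from the 3 most similar participants (k = 3).
X_imputed = KNNImputer(n_neighbors=3).fit_transform(X)

# z-score standardize each item to mean 0 and standard deviation 1.
X_z = StandardScaler().fit_transform(X_imputed)
```

Standardizing after imputation, as described in the text, means that the imputed values contribute to the item means and standard deviations used for scaling.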
This dataset of 136,678 individuals and 43 items was then used as input to the ICA. Since ICA requires an a priori decision on the number of components, we used a combination of expert knowledge and a data-driven approach implemented in the icasso toolbox [1,2]. This analysis indicated that 13 was the ideal number of independent components. Each of these components captures a separate domain of mental health contained within the original dataset. The individual loadings of each participant on the 13 independent components later served as the input vectors for the GWAS described below.
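A minimal sketch of the decomposition step is given below. It assumes scikit-learn's `FastICA` as the ICA implementation, which may differ from the algorithm used in this study, and it uses a synthetic matrix in place of the standardized questionnaire data. The stability analysis performed with the icasso toolbox to select the number of components is not reproduced here; the value 13 is simply taken from the text.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic stand-in: 1,000 participants, 43 items, 13 non-Gaussian sources.
rng = np.random.default_rng(0)
n_participants, n_items, n_components = 1000, 43, 13
sources = rng.laplace(size=(n_participants, n_components))
mixing = rng.normal(size=(n_components, n_items))
X = sources @ mixing + 0.1 * rng.normal(size=(n_participants, n_items))
X_z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each item

# Decompose the item responses into 13 independent components.
ica = FastICA(
    n_components=n_components,
    whiten="unit-variance",
    random_state=0,
    max_iter=1000,
)

# One row per participant, one column per component. These per-participant
# values correspond to what the text calls individual loadings.
participant_scores = ica.fit_transform(X_z)

# Item-by-component weights, useful for interpreting each component.
item_weights = ica.mixing_
```

Each column of `participant_scores` could then be treated as one quantitative phenotype in a downstream association analysis, while `item_weights` shows which questionnaire items drive each component.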