
This research investigates the application of Large Language Models (LLMs) to measuring and analyzing loneliness in caregiver and non-caregiver populations, with the goal of building diverse social media datasets for studying loneliness across the two populations and better understanding their experiences of it.
First, this research applies GPT-4o, GPT-5-nano, and GPT-5 to evaluate and detect high-quality Reddit posts from 15 subreddits. We built an expert-developed framework to measure loneliness and an expert-informed typology of causes of loneliness to identify and categorize those causes across populations. The complete data processing pipeline is validated against human annotation; it judges a given post’s relevance, measures the author’s loneliness, extracts and categorizes the author’s cause of loneliness, and extracts demographic information.
We find that LLMs can be successfully applied to measure loneliness via a psychologically grounded framework in the caregiver and non-caregiver populations, achieving 76.09% and 79.78% average accuracy, respectively. Additionally, we find that LLMs can effectively apply the cause of loneliness categorization framework to high-quality Reddit posts, achieving micro-F1 scores of 0.825 and 0.8 in the caregiver and non-caregiver populations, respectively. We find that the distribution of cause categories differs strongly across the two populations, suggesting that our dataset and framework capture differences between them. We also find that the perceived causes of loneliness differ considerably between the two populations, with caregivers’ loneliness predominantly originating from their role as caregivers, demonstrating that the loneliness experiences of the two populations are distinct. By applying these validated frameworks, we created a dataset of high-quality posts for both populations. Through demographic data extraction, we find that Reddit data is viable for building a diverse dataset across 6 demographic categories in the caregiver population. This work contributes to understanding caregiver and non-caregiver loneliness by establishing an LLM-based data processing pipeline for sourcing high-quality and diverse social media data and by demonstrating the successful application of LLMs to analyzing differences in loneliness between the two populations.
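For readers unfamiliar with the micro-F1 metric reported above, the following is a minimal sketch of how it can be computed when each post may carry one or more cause categories. It pools true positives, false positives, and false negatives over all posts before computing F1. The function name `micro_f1` and the category labels are hypothetical and chosen for illustration only; they are not the typology or the evaluation code used in this work, and the multi-label setup is an assumption.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over per-post label sets.

    Counts are pooled across all posts and categories, so frequent
    categories weigh more than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of posts")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # categories both the annotator and the model assigned
        fp += len(p - g)  # categories only the model assigned
        fn += len(g - p)  # categories only the annotator assigned

    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels, NOT the categories from this work.
gold = [{"caregiving_role"}, {"social_isolation", "bereavement"}, {"relationship_loss"}]
pred = [{"caregiving_role"}, {"social_isolation"}, {"caregiving_role"}]

score = micro_f1(gold, pred)  # tp=2, fp=1, fn=2, so F1 = 4/7
print(f"micro-F1 = {score:.3f}")
```

In the single-label case, where every post has exactly one gold and one predicted category, this quantity reduces to plain accuracy.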
Computer Science / Emory University
BS / Fall 2025
Jinho D. Choi, Computer Science, Emory University (Chair)
Joyce C. Ho, Computer Science, Emory University
Jane Chung, School of Nursing, Emory University

Jane Chung, Michelle Kim, Joyce Ho, Jinho D. Choi