Enterprise AI Analysis: Identifying Chronic Disease Themes Research on Social Networks Based on Artificial intelligence and Big Data

With the rapid advancement of technology, social networks have gradually become a core platform for people to exchange health insights and share health information. Chronic diseases, among the most prevalent health challenges, have a significant negative impact on individuals' quality of life and on their physical and mental health. In this context, social networks have become an important channel through which chronic disease patients access health information. To better tap into and use the vast health information resources that users generate online, this paper explores and identifies chronic disease themes on social networks. It takes social network texts about four chronic diseases (diabetes, hypertension, depression and autism) as its objects and studies their themes and emotions through network text mining. An LDA-based topic model is constructed to analyze the topics in the four disease datasets, and social support theory is introduced on this basis. Finally, SentiStrength is used to compute the sentiment of tweets about the four diseases, and SPSS is used to analyze differences in sentiment across diseases and social support topics.

Executive Impact: Why This Matters for Healthcare AI

The insights from this research are crucial for developing targeted AI solutions in chronic disease management. Understanding social network dynamics allows for personalized support systems and early intervention strategies.

88.5%: share of total deaths in China due to chronic diseases (2019)

Deep Analysis & Enterprise Applications

1 Introduction

With the continuous development of China's social economy, the acceleration of urbanization, and the profound changes in residents' lifestyles, the prevalence of chronic non-communicable diseases (hereinafter referred to as chronic diseases), represented by hypertension and diabetes, has been increasing year by year, as shown in Figure 1, posing a severe threat to the health of Chinese residents. According to statistics, in 2019, deaths from chronic diseases accounted for 88.5% of the total deaths in China. Globally, chronic diseases have also become a significant public health issue that cannot be ignored, with deaths from chronic diseases accounting for more than 60% of all deaths worldwide.

In recent years, with the rapid development of information technology and the Internet, China's healthcare sector has accumulated vast health "big data" resources in many forms and types. The advent of the "big data era" has provided new ideas and methods for tackling the challenges of chronic disease prevention and control [1]. Driven by the wave of informatization, how to use these resources to innovate prevention and control strategies, meet residents' growing need for chronic disease management, and improve the currently severe situation of chronic disease prevention and control has become a research hotspot in chronic disease management.

This article systematically reviews the application of "big data" in chronic disease prevention and control in China and abroad, and applies artificial intelligence to assist the related data analysis. It aims to provide a reference and basis for developing and researching new models of chronic disease prevention and control in China, and hopes to contribute to the scientific prevention and control of chronic diseases in the era of artificial intelligence and big data.

2 Overview of Relevant Theories and Methods

The LDA (Latent Dirichlet Allocation) topic model is based on the intuition that a document exhibits multiple topics, each of which is defined as a fixed distribution over words [2]. LDA is an unsupervised generative probabilistic model that discovers a given number of latent topics in a collection of texts. It is a three-level hierarchical Bayesian model that has been widely used in text analysis to uncover hidden underlying themes. The three-layer structure of the LDA topic model is shown in Figure 2.
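The three levels can be read as a generative story: corpus-level topics, document-level topic mixtures, and word-level topic assignments. The sketch below simulates that story with the Python standard library. It illustrates the standard LDA generative process and is not code from this paper; the vocabulary, the topic count and the symmetric Dirichlet priors `alpha` (document-topic) and `beta` (topic-word) are hypothetical toy values.

```python
import random

def sample_dirichlet(concentration, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(vocab, num_topics, num_docs, doc_len, alpha, beta, seed=0):
    """Simulate LDA's generative process (toy hyperparameters, illustration only)."""
    rng = random.Random(seed)
    # Corpus level: each topic is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Document level: each document mixes the topics in its own proportions.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Word level: choose a topic for this word, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topics[z])[0])
        docs.append(words)
    return topics, docs

# Hypothetical toy vocabulary mixing diabetes- and hypertension-like words.
vocab = ["insulin", "glucose", "diet", "exercise", "pressure", "salt"]
topics, docs = generate_corpus(vocab, num_topics=2, num_docs=3, doc_len=8, alpha=0.5, beta=0.1)
```

Inference in LDA runs this story in reverse: given only `docs`, it tries to recover plausible `topics` and per-document mixtures.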

In theory, the posterior distribution of the hidden topic structure requires the marginal probability of the observed corpus, which could be computed by summing the joint probability of the corpus over every possible instantiation of the hidden topic structure. However, the number of such instantiations grows exponentially, so the sum is intractable. Topic modeling algorithms therefore approximate this denominator of the posterior, known in Bayesian statistics as the evidence. There are two main families of topic modeling algorithms: sampling-based algorithms and variational algorithms [3]. The most commonly used sampling-based algorithm is Gibbs sampling, which constructs a Markov chain over the hidden variables whose limiting distribution is the posterior; the chain is run for a period of time, samples are collected from it, and these samples approximate the posterior. In contrast to sampling methods, variational methods are deterministic. Rather than approximating the posterior with samples, a variational method posits a parameterized family of distributions over the hidden structure and then finds the member of that family closest to the posterior, which turns the inference problem into an optimization problem.
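The sampling route can be made concrete with a collapsed Gibbs sampler, in which the topic mixtures and topic-word distributions are integrated out and only the per-word topic assignments are resampled. This is a minimal textbook sketch for illustration: it is not what Gensim does (Gensim's `LdaModel` uses online variational Bayes, i.e. the variational family), and the toy corpus, the priors `alpha` and `beta`, and the iteration count are assumptions. It also skips burn-in and sample averaging, estimating the mixtures from the final sample only.

```python
import random

def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: lists of integer word ids."""
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Start from a random topic assignment for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # exclude the current token from the counts
                # Full conditional p(z_i = t | other assignments, words), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)  # add it back under the newly sampled topic

    # Estimate document-topic mixtures (theta) from the final sample only.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, n_kw

# Toy corpus: word ids 0-2 tend to co-occur, as do word ids 3-5.
docs = [[0, 1, 0, 1, 2], [0, 0, 1, 2, 1], [3, 4, 3, 4, 5], [4, 3, 5, 3, 4]]
theta, n_kw = gibbs_lda(docs, num_topics=2)
```

Each row of `theta` is one document's estimated topic mixture, and `n_kw` holds the topic-word counts from which the top words of each topic can be read.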

2.2 Construction of topic model based on LDA

To build the LDA topic model, this paper uses the LDA implementation provided by the Gensim library in Python with the following parameters: random_state=100, update_every=1, chunksize=100, passes=10, alpha='auto'. Here, random_state is the random seed used for training; update_every is the number of documents to iterate through for each update; chunksize is the number of documents in each training chunk; passes is the number of times the algorithm passes through the corpus during training; and alpha expresses the a priori belief about each topic's probability (it can be given as a one-dimensional array with one entry per expected topic) and can be set to 'auto' so that it is learned from the corpus.

2.3 Determination of the optimal number of topics

Another task is to determine the optimal number of topics, which is the main difficulty in using the LDA topic model: the chosen number must avoid missing valuable topics while keeping each topic interpretable. Although there is no "perfect result" in LDA, scholars commonly use the following three methods to help select a suitable number of topics:

(1) Manual judgment based on experience. Based on the data category and the output of the topic model, researchers analyze the results manually and keep trying different numbers of topics until they find the most appropriate one. This method is generally cumbersome [].

(2) Perplexity. The optimal number of topics can be determined by computing the model's perplexity, defined as follows:

$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{i=1}^{M}\log P(w_i)}{\sum_{i=1}^{M} N_i}\right\}$$

where M is the number of test texts, N_i is the length of text i, and P(w_i) is the probability that the LDA model generates text w_i [4]. Perplexity measures how well a trained LDA model generalizes to unseen text, and therefore how stable and predictive the model is; a lower perplexity indicates better generalization. However, this method tends to favor too many topics, which leads to overfitting. This paper calculates the perplexity of the LDA topic model for each of the four diseases and plots the results as a line chart in Excel, as shown in Figure 3 (blue: diabetes; red: hypertension; yellow: depression; green: autism). Figure 3 shows that the perplexity of all four disease models decreases as the number of topics increases, and once the number of topics exceeds 10 it starts to decrease significantly and continuously. Therefore, to prevent overfitting, this paper selects the best number of topics within the range 2 to 10 and makes the final judgment using the model's visualizations.

(3) Visual judgment with LDAvis. Sievert et al. [] developed a visualization method for the LDA topic model that displays topic classification, distances between topics, topic feature words and other information intuitively as graphs, making the model's results easier to inspect. In Python, the LDAvis tool (available as the pyLDAvis package) visualizes topics by showing their distribution in a two-dimensional space: each bubble represents a topic, and the size of a bubble represents how much of the text that topic accounts for. For example, with five topics (left) the bubbles overlap considerably, whereas with four topics (right) the bubbles are scattered and do not overlap. This indicates that four topics is the better choice, with better topic recognition performance.
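The perplexity formula in method (2) translates directly into code. The sketch below assumes you already have, for each held-out text, its log-likelihood log P(w_i) under a trained model and its length N_i; how those values are obtained depends on the toolkit.

```python
import math

def perplexity(log_likelihoods, lengths):
    """exp(-sum(log P(w_i)) / sum(N_i)) over the M held-out texts."""
    if len(log_likelihoods) != len(lengths):
        raise ValueError("need exactly one log-likelihood per text")
    return math.exp(-sum(log_likelihoods) / sum(lengths))

# Sanity check: a model that gives every word probability 1/50
# has perplexity 50, whatever the text lengths are.
lengths = [3, 2]
log_p = [n * math.log(1 / 50) for n in lengths]
print(perplexity(log_p, lengths))  # close to 50
```

Note that Gensim's `LdaModel.log_perplexity()` returns a per-word likelihood bound rather than the perplexity itself (Gensim's own log message reports the perplexity estimate as 2 raised to the negative bound), so its value must be converted before being compared with this formula.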

As shown in Figure 4, this paper visualizes the topic models of the four diseases for 2 to 10 topics and finds that the optimal number of topics is 4 for each disease: diabetes, hypertension, depression and autism.
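The bubble overlap judged visually above can also be checked numerically. By default, pyLDAvis places topics on its map by computing the Jensen-Shannon divergence between topic-word distributions and projecting those distances to two dimensions with principal coordinate analysis. The sketch below computes only the pairwise divergence; the toy distributions and the 0.1 threshold for "overlapping" topics are hypothetical choices, not values from the paper.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0.
    Only called against the mixture m below, which is positive wherever p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint support."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def overlapping_pairs(topics, threshold=0.1):
    """Index pairs of topics whose JS divergence is below a (hypothetical) threshold."""
    pairs = []
    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            if js_divergence(topics[i], topics[j]) < threshold:
                pairs.append((i, j))
    return pairs

# Toy topic-word distributions over a 4-word vocabulary.
topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],  # nearly the same as topic 0
    [0.05, 0.05, 0.20, 0.70],
]
print(overlapping_pairs(topics))  # [(0, 1)]
```

A candidate topic count whose model produces such near-duplicate pairs corresponds to the overlapping bubbles that argue for choosing fewer topics.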

3 Topic Model Results for the Four Major Chronic Diseases

Having determined the optimal number of topics for the four chronic diseases from the perplexity calculation and the topic visualizations, this paper further analyzes and interprets the topic model results for each disease. Based on the keywords produced by the topic model, the four topics of diabetes are summarized as: symptom description, prevention and control, drug treatment and daily life. The keywords and the share of each topic are shown in Table 1.

Building on the keyword analysis and a closer reading of the tweets, this article further explains what each diabetes topic contains. Topic 1, "symptom description", mainly covers diabetes patients' descriptions of the causes, symptoms and related complications of their own disease; it is the largest topic in the diabetes tweets, at 43.00%. Topic 2, "prevention and control", mainly covers prevention and screening measures among people without diabetes and the control measures people with diabetes take to avoid deterioration; its share is 22.91%. Topic 3, "drug treatment", mainly covers diabetes treatment plans and the type, timing and dosage of the drugs diabetic patients use; its share is 18.78%. Topic 4, "daily life", mainly covers the diet, exercise and work of diabetic patients; its share is 15.31%.
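The paper does not state exactly how each topic's share is computed. One common approach, assumed here purely for illustration, is to assign every tweet to its highest-probability topic and report the percentage of tweets per topic. The document-topic vectors below are made-up toy values.

```python
from collections import Counter

def dominant_topic_shares(doc_topics, num_topics):
    """Percentage of documents whose most probable topic is each topic id."""
    dominant = [max(range(num_topics), key=lambda k: dist[k]) for dist in doc_topics]
    counts = Counter(dominant)
    n = len(doc_topics)
    return {k: 100.0 * counts.get(k, 0) / n for k in range(num_topics)}

# Toy document-topic mixtures for five tweets and four topics.
doc_topics = [
    [0.60, 0.20, 0.10, 0.10],
    [0.50, 0.30, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.20, 0.10, 0.60, 0.10],
    [0.40, 0.20, 0.20, 0.20],
]
print(dominant_topic_shares(doc_topics, 4))  # {0: 60.0, 1: 20.0, 2: 20.0, 3: 0.0}
```

Under this convention the shares always sum to 100%, which is consistent with the four diabetes shares above (43.00 + 22.91 + 18.78 + 15.31 = 100.00). Averaging each topic's probability across all tweets is an equally plausible alternative.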

Based on the keywords produced by the topic model, the four topics of hypertension are summarized as: symptom description, drug treatment, daily life and medical diagnosis. The keywords and the share of each topic are shown in Table 2.

Building on the keyword analysis and an in-depth reading of the related tweets, this article further explains what each hypertension topic contains. Topic 1, "symptom description", mainly covers hypertensive patients' descriptions of the causes, symptoms and related complications of their own disease; it is the largest topic in the hypertension tweets, at 44.07%. Topic 2, "drug treatment", covers hypertension treatment plans and the type, timing and dosage of the drugs hypertensive patients use; its share is 21.50%. Topic 3, "daily life", mainly covers the diet, exercise and work of hypertensive patients; its share is 18.00%. Topic 4, "medical diagnosis", mainly covers patients seeking a doctor's diagnosis after related symptoms appear, or routine check-ups to prevent high blood pressure; its share is 16.43%.

Based on the keywords produced by the topic model, the four topics of depression are summarized as: seeking help, emotional expression, diagnosis and treatment, and daily life. The keywords and the share of each topic are shown in Table 3.

Building on the keyword analysis and an in-depth reading of the related tweets, this article further explains what each depression topic contains. Topic 1, "seeking help", mainly covers people with depression and their families seeking help and support from others, social institutions and so on; its share is 40.30%. Topic 2, "emotional expression", covers the negative and positive emotions patients with depression express during illness and treatment, which is also one way the symptoms of depression manifest; its share is 23.96%. Topic 3, "diagnosis and treatment", mainly covers patients with depression seeking a doctor's diagnosis and a treatment plan after related symptoms appear; its share is 20.71%. Topic 4, "daily life", mainly covers the diet, exercise and work of depressed patients; its share is 15.03%.

Based on the keywords produced by the topic model, the four topics of autism are summarized as: symptom description, seeking help, diagnosis and treatment, and daily life. The keywords and the share of each topic are shown in Table 4.

Building on the keyword analysis and an in-depth reading of the related tweets, this article further explains what each autism topic contains [5]. Topic 1, "symptom description", mainly covers autistic patients' descriptions of the causes, symptoms and related complications of their condition; it is the largest topic in the autism tweets, at 32.52%. Topic 2, "seeking help", mainly covers people with autism and their families seeking help and support from others, social institutions and so on; its share is 29.53%. Topic 3, "diagnosis and treatment", mainly covers autistic patients seeking a doctor's diagnosis and a treatment plan after related symptoms appear; its share is 21.88%. Topic 4, "daily life", mainly covers the diet, exercise and work of autistic patients; its share is 16.09%.

4 Conclusion

The topic classification results for the four chronic diseases show that patients with different diseases pay different degrees of attention to the various topics. Overall, patients with all four diseases discuss and interact widely on Twitter about symptoms, diagnosis and treatment, which shows that most social network users are chiefly concerned with how to identify their own disease and how to treat it effectively.

A closer look reveals that the four diseases differ in the shares of the symptom description, diagnosis and drug treatment topics: diabetes and hypertension account for significantly higher proportions of these topics than depression and autism [6]. The reason is that diabetes and hypertension are both physiological chronic diseases, which generally have obvious physical symptoms that are easier to describe. At the same time, physiological diseases usually call for more urgent medical diagnosis and treatment and require specific drugs and therapies, so treatment is a more important concern for diabetes and hypertension on social networks. In addition, hypertension and diabetes depend more on medication than depression and autism do: medication is their main treatment, and patients need to take drugs for a long time. The drug treatment topic therefore appears mainly in diabetes and hypertension tweets, where it accounts for a large share. By contrast, the diagnosis and treatment of psychological conditions such as depression and autism generally come mainly from psychologists, and drugs play only an auxiliary role.

On the other hand, people with depression and autism focus more on seeking help and support on social networks. This support does not necessarily come from doctors [7]; it can also be comfort and encouragement from other patients, volunteers or even strangers. At the same time, patients with mental illness, especially depression, often express their personal emotions on social networks, because the anonymity of social media protects their privacy and frees them from social fear, allowing them to express themselves fully online. In addition, whatever their disease, people can share their daily lives anytime and anywhere without worrying about the stigma attached to it.

88.5% of total deaths in China (2019) were due to chronic diseases, highlighting the urgency for AI-driven prevention and control.

LDA Topic Model Generation Process

  • Data Preprocessing
  • Text to Word Model
  • LDA Training
  • Topic Recognition
  • Theme Evaluation
  • Social Support Integration
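The first two stages of this pipeline, preprocessing and conversion of text into a bag-of-words representation, can be sketched with the standard library alone. The tokenizer rules and the small stop-word list are simplified assumptions; a real pipeline for tweets would also handle hashtags, emoji, lemmatization and a full stop-word list.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "is", "my", "to", "of", "i", "it", "with"}  # toy list

def preprocess(text):
    """Lowercase, strip URLs and @mentions, keep alphabetic tokens, drop stop/short words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def build_vocab(token_lists):
    """Map each distinct token to an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab))
    return vocab

def to_bow(tokens, vocab):
    """Sorted (word_id, count) pairs, the same shape Gensim's doc2bow produces."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())

# Hypothetical example tweets.
tweets = [
    "My glucose is high again, checking insulin dose https://example.com",
    "@friend blood pressure and glucose both high today",
]
tokens = [preprocess(t) for t in tweets]
vocab = build_vocab(tokens)
corpus = [to_bow(t, vocab) for t in tokens]
```

The resulting `corpus` is the input the LDA training stage consumes; the later stages (topic recognition, evaluation, social support integration) work on the trained model's output.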

Chronic Disease Topic Distribution Comparison

Topics by disease (share of each disease's topic distribution, in descending order)
Diabetes
  • Symptom description (43.00%)
  • Prevention and control (22.91%)
  • Drug treatment (18.78%)
  • Daily life (15.31%)
Hypertension
  • Symptom description (44.07%)
  • Drug treatment (21.50%)
  • Daily life (18.00%)
  • Medical diagnosis (16.43%)
Depression
  • Seeking help (40.30%)
  • Emotional expression (23.96%)
  • Diagnosis and treatment (20.71%)
  • Daily life (15.03%)
Autism
  • Symptom description (32.52%)
  • Seeking help (29.53%)
  • Diagnosis and treatment (21.88%)
  • Daily life (16.09%)

Case Study: AI-Powered Social Network Analysis for Chronic Disease

Challenge: Traditional healthcare struggles to keep pace with chronic disease management, often lacking real-time patient insights and personalized support beyond clinical settings.

AI Solution: An AI-driven platform that monitors social network discussions related to chronic diseases (diabetes, hypertension, depression, autism). It uses LDA topic modeling to identify prevalent themes (symptoms, treatment, daily life, emotional support) and sentiment analysis to capture emotional context.

Impact: The platform proactively identifies patients in need of support, tailors educational content based on trending topics, and facilitates connection to relevant resources. For example, recognizing a surge in "seeking help" posts for depression triggers AI to suggest mental health resources, while "medication" discussions for diabetes lead to pharmacist-led Q&A sessions. This direct application of social network data, processed by AI, creates a dynamic, responsive healthcare ecosystem.

Result: Improved patient engagement, timely interventions, and a more holistic understanding of the patient journey outside of clinical visits, leading to better disease management outcomes and reduced healthcare burden.
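The "surge" trigger described in the Impact paragraph could be implemented in many ways. One simple, hypothetical rule: flag a day when the number of posts assigned to a topic exceeds the mean of the preceding window by more than k standard deviations. The window size, k, the spread floor and the daily counts below are illustrative assumptions, not values from the research.

```python
import statistics

def surge_days(daily_counts, window=7, k=3.0):
    """Indices of days whose count exceeds mean + k * stdev of the previous `window` days."""
    flagged = []
    for day in range(window, len(daily_counts)):
        history = daily_counts[day - window:day]
        mean = statistics.mean(history)
        spread = statistics.pstdev(history)
        # Floor the spread at 1 so a perfectly flat history does not flag tiny bumps.
        if daily_counts[day] > mean + k * max(spread, 1.0):
            flagged.append(day)
    return flagged

# Hypothetical daily counts of "seeking help" posts about depression.
counts = [20, 22, 19, 21, 20, 23, 21, 22, 60, 24]
print(surge_days(counts))  # [8]
```

A production system would more likely smooth for weekly seasonality and compare topic shares rather than raw counts, but the trailing-window rule keeps the idea easy to audit.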

Your AI Implementation Roadmap

A typical journey from initial strategy to full-scale AI deployment. We tailor every phase to your unique business needs.

Discovery & Strategy

In-depth analysis of your current operations, data infrastructure, and business objectives to define clear AI use cases and strategic alignment.

Data Preparation & Modeling

Collecting, cleaning, and preparing data. Developing and training custom AI models tailored to your specific problems and datasets.

Pilot & Iteration

Deploying AI solutions in a controlled pilot environment, gathering feedback, and iteratively refining models for optimal performance and integration.

Full-Scale Deployment & Monitoring

Seamless integration of AI systems into your existing workflows. Continuous monitoring, maintenance, and performance optimization to ensure long-term value.

Scaling & Future Innovation

Expanding AI capabilities across your organization, identifying new opportunities, and fostering a culture of continuous AI-driven innovation.

Ready to Transform Your Enterprise with AI?

Our experts are ready to help you navigate the complexities of AI adoption and unlock unparalleled efficiency and growth. Book a personalized consultation today.
