Enterprise AI Analysis
Artificial intelligence, data sharing, and privacy for retinal imaging under Brazilian Data Protection Law
The integration of artificial intelligence (AI) in healthcare has revolutionized various medical domains, including radiology, intensive care, and ophthalmology. However, the increasing reliance on AI-driven systems raises concerns about bias, particularly when models are trained on non-representative data, leading to skewed outcomes that disproportionately affect minority groups. Addressing bias is essential for ensuring equitable healthcare, necessitating the development and validation of AI models within specific populations. This viewpoint paper explores the critical role of data in AI development, emphasizing the importance of creating representative datasets to mitigate disparities. It discusses the challenges of data bias, the need for local validation of AI algorithms, and the misconceptions surrounding retinal imaging in ophthalmology. Additionally, it highlights the significance of publicly available datasets in research and education, particularly the underrepresentation of low- and middle-income countries in such datasets. The Brazilian General Data Protection Law is also examined, focusing on its implications for research and data sharing, including the legal and ethical measures required to safeguard data integrity and privacy. Finally, the manuscript underscores the importance of adhering to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) to enhance data usability and support responsible AI development in healthcare.
Executive Impact: Navigating AI in Healthcare
The integration of AI in healthcare presents both immense opportunities and significant challenges, particularly concerning data governance and equitable outcomes. Understanding these key metrics provides a glimpse into the strategic considerations for successful and responsible AI deployment.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Understanding AI Bias in Healthcare
Over recent years, artificial intelligence (AI) algorithms have been increasingly designed and implemented across many facets of daily life. Notably, the healthcare sector has emerged as one of the main focuses of research and investment, where AI-enabled systems have made significant strides across diverse domains such as radiology and intensive care, with ophthalmology emerging as a promising field [1-4]. This paper explores the issue of bias in AI within healthcare, emphasizing the importance of representative datasets, ethical data sharing, and global collaboration to ensure the development of responsible and equitable AI technologies.
Impact of Biased AI Algorithms
In healthcare, the utilization of AI models trained on non-representative data can lead to dangerous outcomes, with potentially skewed and prejudiced results that disproportionately affect minority groups [5-7]. Bias in healthcare AI is an intricate, multifaceted problem, demanding a concerted effort to mitigate these risks effectively [8]. Data bias, arising from algorithms trained on non-representative datasets, stands out as a critical concern in AI development. Data is the cornerstone upon which AI advancements are built, and describing the data used for algorithm training, as well as the datasets themselves, is critical for improving reproducibility and identifying possible biases in AI development [9, 10]. The creation and use of representative, well-balanced datasets play a pivotal role in facilitating algorithm development, fostering reproducibility in studies, and enabling local validation processes, all of which are important for reducing disparities among different sub-populations [8, 11].
The Challenge of Generalizability and Local Validation
In evaluating real-world performance, it is important to validate algorithms within the specific target population [12]. High training and validation metrics do not inherently guarantee generalizability. Therefore, before deploying any AI-enabled system, engaging in rigorous local validation efforts is paramount. As an illustrative example, pneumonia screening and diabetic retinopathy algorithms have exhibited high accuracy during development but performed poorly when applied to data from external institutions, even when those institutions are within the same country [13, 14]. More examples of AI bias in healthcare highlight the risks of relying on non-representative data. For instance, cardiovascular disease prediction models trained predominantly on populations from high-income countries often fail to account for the genetic, lifestyle, and environmental differences present in Latin American populations. These biases can lead to inaccurate risk assessments, resulting in inappropriate treatments or delayed care. Likewise, AI-driven diagnostic tools for skin conditions frequently struggle to accurately diagnose diseases in patients with darker skin tones [15, 16]. Furthermore, AI algorithms developed for radiology may overlook disease markers that manifest differently in Latin American patients, reinforcing the importance of using local data to ensure these systems work equitably across diverse populations [13].
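A toy illustration of why aggregate metrics can mask exactly this kind of subgroup failure; the (prediction, label) pairs and group names below are entirely hypothetical:

```python
# Hypothetical illustration: the same classifier evaluated per subgroup,
# showing how an acceptable overall accuracy can hide a disparity.
def accuracy(pairs):
    """Fraction of (prediction, label) pairs where prediction == label."""
    return sum(pred == label for pred, label in pairs) / len(pairs)

# (prediction, label) pairs for two hypothetical subgroups
group_a = [(1, 1), (0, 0), (1, 1), (0, 0)]  # well represented in training
group_b = [(1, 0), (0, 0), (1, 1), (0, 1)]  # underrepresented

overall = accuracy(group_a + group_b)                         # 0.75 overall
per_group = {"A": accuracy(group_a), "B": accuracy(group_b)}  # A: 1.0, B: 0.5
```

Reporting only the pooled figure (0.75) would hide that group B receives coin-flip performance, which is why local, per-population validation matters.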
Retinal Imaging Misconceptions
In ophthalmology, there is a popular misconception that retinal images, like the iris scans depicted in movies and TV series, can readily identify individuals [13]. While retinal images are unique to each individual, a specialized camera is needed to capture them, and the absence of datasets linking these scans to personal information significantly reduces the risk of reidentification. As a result, the likelihood of uncovering new information about an individual from shared retinal scans is minimal [17].
Data Sharing and Open Data for Research
Publicly available datasets play a central role in facilitating research, education, and bias assessment, offering a valuable alternative to the considerable costs and challenges of developing new databases. However, the landscape of publicly accessible datasets is dominated by high-income countries (HICs), with a scarcity of datasets originating from low- and middle-income countries (LMICs) [18-20]. Latin America and Brazil remain underrepresented in publicly available datasets. In Brazil's specific case, most datasets are sourced from governmental entities such as DATASUS and the Brazilian Institute of Geography and Statistics [18]. Most ophthalmological datasets likewise originate from HICs [19]. Within Brazil, the available retinal datasets come from São Paulo and Bahia [21-23] and include only the retinal fundus photography modality. There is a need for greater diversity and inclusivity in dataset availability. Furthermore, international initiatives such as the European Data Governance Act, the United States National Institutes of Health, and the Dutch ZonMw are actively promoting open science [11, 24, 25]. These efforts hold the promise of fostering collaboration and knowledge exchange, leading to a more comprehensive and inclusive landscape of accessible datasets.
Brazilian General Data Protection Law (LGPD)
In Brazil, the General Data Protection Law (LGPD) protects the privacy and personal data rights of individuals and regulates the processing of personal data by organizations, both domestic and foreign, operating in the country [26]. The legislative process started in 2010, the law was enacted in August 2018, and it came into effect in September 2021. Throughout this period, there were consultations, debates, and revisions to the draft legislation, involving various government agencies, experts, and stakeholders, to ensure that it aligned with international standards and addressed the specific privacy concerns of the Brazilian population. The LGPD bears similarities to the European General Data Protection Regulation (GDPR) in its objectives and principles. Article 5 defines personal data as any information related to an identified or identifiable natural person, encompassing identifying information, contact details, biometric data, financial information, and sensitive personal data. Sensitive personal data refers to information related to an individual's racial or ethnic origin, religious beliefs, political opinions, health data, genetic or biometric data, sexual orientation, or criminal records. The LGPD establishes clear guidelines regarding the use of personal data for research purposes, emphasizing that such data should exclusively serve the original purpose for which it was collected, particularly in the context of scientific research. Obtaining written consent represents the primary mechanism for the lawful collection and processing of research data.
Research and Data Sharing under LGPD
Under the LGPD, research groups are organizations devoid of financial interests and dedicated to conducting research with historical, scientific, technological, or statistical objectives. These groups are driven by a commitment to both fundamental and applied research, aligning with the overarching goals of advancing knowledge and understanding. Articles 7 and 11 of the LGPD establish that, in instances related to public health research, researchers may access personal data without direct patient communication and consent, provided that the data is securely stored and pseudonymized. Pseudonymization ensures that individuals' identities remain safeguarded while allowing valuable research to occur; it consists of data protection techniques that replace identifying information with coded or encrypted values, allowing for more secure data analysis. It is important to note that de-identified data falls outside the LGPD's scope, which facilitates compliant data sharing. De-identified data pertains to information that has undergone rigorous processing techniques, rendering it incapable of identifying specific individuals. This practice aligns with the LGPD's provisions and fosters responsible data utilization for research purposes. However, it is essential to implement appropriate legal and ethical measures to protect the confidentiality and security of data, regardless of whether it is considered personal or de-identified [11]. Safeguarding the integrity and privacy of data ensures compliance with the LGPD and promotes responsible data handling practices.
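As an illustrative sketch of pseudonymization, not a statement of what the LGPD technically mandates, the snippet below replaces a direct patient identifier with a keyed hash; the record fields and key handling are hypothetical:

```python
import hmac
import hashlib

def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (pseudonym).

    The key is held separately by the data controller, so the link back
    to the individual can only be re-derived under the controller's
    governance, while the research copy stays analyzable.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key and record for illustration; in practice the key would
# live in a secure store, never alongside the data.
key = b"keep-this-key-in-a-separate-secure-store"
record = {"patient_id": "BR-12345", "icd10": "H36.0", "age_band": "60-69"}

# The shared research copy carries the pseudonym, never the raw identifier.
shared_record = {**record, "patient_id": pseudonymize(record["patient_id"], key)}
```

Because the hash is keyed and deterministic, records for the same patient link consistently across tables without exposing the original identifier.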
Data Usability and FAIR Principles
Beyond data availability, it is essential to establish a robust infrastructure that fosters the reusability of data, guided by the FAIR principles: Findability, Accessibility, Interoperability, and Reusability [27]. Findability ensures that data can be located swiftly and effectively by both humans and computer systems. Accessibility guarantees that data is readily retrievable and available for download or use. Interoperability ensures that data is formatted in a manner that facilitates seamless integration with other datasets. Reusability requires that data be meticulously documented and prepared to support reuse in research, complete with comprehensive metadata and detailed descriptions.
Brazilian Publicly Available Datasets: FAIR Concepts
| Feature | Sistema de Internação Hospitalar | Pesquisa Nacional por Amostra de Domicílio | BRAX | mBRSET |
|---|---|---|---|---|
| Findability | datasus.saude.gov.br | www.ibge.gov.br | https://physionet.org/content/brax/1.1.0/ | https://physionet.org/content/mbrset/1.0/ |
| Accessibility | Publicly available | Publicly available | Credentialed access | Credentialed access |
| Interoperability | Comma-separated values file with tabular data | HTML, JSON, ODS, and XML files with tabular data | Comma-separated values file with tabular labels and DICOM images | Comma-separated values file with JPEG files |
| Reusability | Metadata dictionary | Metadata dictionary | Metadata dictionary, GitHub repository | Metadata dictionary, GitHub repository |
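FAIR attributes like those in the table can be captured as machine-readable metadata and checked programmatically. The sketch below uses illustrative field names (not a formal metadata standard) and a simple completeness check:

```python
# Hypothetical minimal metadata record covering the FAIR checklist;
# the field names and values are illustrative, not a formal schema.
dataset_metadata = {
    "identifier": "doi:10.0000/example-retina-dataset",   # Findability: persistent ID
    "title": "Example Brazilian retinal fundus dataset",
    "access_url": "https://example.org/datasets/retina",  # Accessibility: retrieval point
    "format": "text/csv",                                 # Interoperability: open format
    "license": "CC-BY-4.0",                               # Reusability: clear terms of use
    "description": "Fundus photographs with diabetic retinopathy labels.",
    "variables": {"dr_grade": "ICDR severity scale (0-4)"},
}

REQUIRED_FIELDS = {"identifier", "access_url", "format", "license", "description"}

def fair_gaps(metadata: dict) -> set:
    """Return the required FAIR-oriented fields missing from a metadata record."""
    return REQUIRED_FIELDS - metadata.keys()
```

A dataset publisher could run `fair_gaps` over each record before release to flag, for example, a missing license, which would block lawful reuse even if the data itself is accessible.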
Data Standardization for Healthcare AI
Data standardization is crucial for information exchange, enabling effective communication between various devices and information systems, such as electronic health records (EHRs), imaging devices, and AI systems [28, 29]. This standardization is not only vital for ensuring consistent patient care but also plays a critical role in the development and integration of AI systems. In ophthalmology, for retinal fundus photography and optical coherence tomography (OCT) exams, the American Academy of Ophthalmology recommends adopting the Digital Imaging and Communications in Medicine (DICOM) format [30]. DICOM is the standard for handling, storing, printing, and transmitting information in medical imaging, which ensures uniformity across different devices and platforms. However, the DICOM format presents several challenges. It lacks compatibility across different devices, making it difficult to exchange data between systems. Extracting DICOM files from storage systems can be cumbersome, and the files themselves need to be deidentified to protect patient privacy, as they contain sensitive metadata. Moreover, DICOM is a complex format, which can pose implementation challenges, particularly for smaller healthcare providers or systems with limited technical resources [31]. As an alternative, lossless compression formats have been suggested to encode the same data contained in DICOM files. These formats maintain the integrity of the data while potentially offering more straightforward implementation and compatibility across different platforms [30]. Fast Healthcare Interoperability Resources (FHIR) is a standard for healthcare data exchange developed by Health Level Seven International [32]. It is designed to enable interoperability between different healthcare systems by providing a framework for structuring and sharing data in a way that is both easy to implement and scalable.
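The deidentification step can be sketched as follows. The snippet models a DICOM header as a plain dictionary: the attribute names mirror standard DICOM attributes, but real files would be read and written with a dedicated DICOM library such as pydicom, and a production workflow would follow a full deidentification profile rather than this short list:

```python
# Direct identifiers to strip; a real profile covers many more attributes.
IDENTIFYING_TAGS = {
    "PatientName", "PatientID", "PatientBirthDate",
    "InstitutionName", "ReferringPhysicianName",
}

def deidentify(header: dict) -> dict:
    """Drop direct identifiers while keeping acquisition metadata."""
    return {tag: value for tag, value in header.items()
            if tag not in IDENTIFYING_TAGS}

# Simplified stand-in for a DICOM header (illustrative values).
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "BR-12345",
    "PatientBirthDate": "19600101",
    "Modality": "OP",  # ophthalmic photography
    "Rows": 1536,
    "Columns": 2048,
}
clean = deidentify(header)  # keeps Modality/Rows/Columns, drops identifiers
```

The design point is that technical metadata needed to interpret the image (modality, dimensions) survives, while everything that names a person or institution is removed before sharing.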
FHIR plays a critical role in healthcare AI development, as it facilitates data integration from sources such as EHRs, imaging systems, and clinical databases, which are essential for AI models. While FHIR helps address interoperability challenges, its adoption in LMICs faces hurdles. It requires digital maturity, including established EHR systems and technical expertise, which are often lacking in rural areas. Additionally, FHIR does not automatically ensure data quality, and poor-quality data can still impact AI model performance. Strong data governance, including encryption and authentication, is also necessary to protect sensitive health information when using FHIR.
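In practice, data exchanged over FHIR travels as JSON (or XML) resources. The sketch below hand-builds a minimal FHIR R4 Observation; the patient reference and the chosen LOINC code are illustrative, and a production system would use a dedicated FHIR library with schema validation rather than raw dictionaries:

```python
import json

# Minimal FHIR R4 Observation resource assembled by hand for illustration.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "4548-4",  # hemoglobin A1c (illustrative code choice)
            "display": "Hemoglobin A1c/Hemoglobin.total in Blood",
        }]
    },
    "subject": {"reference": "Patient/example-123"},  # hypothetical patient ID
    "valueQuantity": {"value": 7.2, "unit": "%"},
}

# This JSON payload is what one system would send to another's FHIR endpoint.
payload = json.dumps(observation)
```

Because every conformant system agrees on this structure, an AI pipeline can consume lab values from different hospitals' FHIR endpoints without per-vendor parsing code.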
Federated Learning for Secure Collaboration
Federated learning (FL) is a decentralized approach to machine learning that allows institutions to train AI models collaboratively without sharing data. Instead, only model updates are sent to a central server, addressing privacy concerns in fields like healthcare [33, 34]. One of the major promises of federated learning lies in its ability to leverage diverse datasets across regions and institutions, enabling the development of more generalizable and robust AI models. This could significantly benefit LMICs, where data is often scarce and access to high-quality healthcare datasets is limited. By allowing institutions in LMICs to collaborate without compromising patient privacy, FL opens new possibilities for creating AI systems that reflect the diverse health profiles of these populations. However, FL faces challenges in LMICs, such as limited infrastructure, including high-speed internet and computational resources [34]. The continuous communication required for FL can strain network bandwidth, and inconsistent data quality may affect model performance. Additionally, although data stays local, FL is still vulnerable to adversarial attacks that could extract sensitive information from model updates [35].
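The federated averaging (FedAvg) scheme underlying most FL systems can be sketched in a few lines: each site takes a local training step on its own data, and the coordinator averages the returned parameters weighted by local sample counts. The one-parameter model and the two "hospital" datasets below are toy assumptions for illustration:

```python
# Federated averaging in miniature: each site trains locally and only
# model parameters -- never patient records -- reach the coordinator.
def local_update(weights, site_data, lr=0.1):
    """One local gradient step for a one-parameter least-squares model y = w*x."""
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in site_data) / len(site_data)
    return [w - lr * grad]

def federated_average(site_weights, site_sizes):
    """Average each site's parameters, weighted by its number of local examples."""
    total = sum(site_sizes)
    return [sum(w[0] * n for w, n in zip(site_weights, site_sizes)) / total]

# Two hypothetical hospitals whose local data follow y = 2x.
site_a = [(1.0, 2.0), (2.0, 4.0)]
site_b = [(3.0, 6.0)]

weights = [0.0]
for _ in range(50):  # each round: local training at every site, then averaging
    updated = [local_update(weights, site_a), local_update(weights, site_b)]
    weights = federated_average(updated, [len(site_a), len(site_b)])
# weights[0] converges toward the shared slope 2.0
```

The raw (x, y) pairs never leave their site; only the single learned parameter does, which is the privacy property, and also the attack surface, that the adversarial concerns above refer to.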
Discussion: Towards Responsible AI in Healthcare
The rising integration of artificial intelligence in healthcare has brought about significant advancements across various medical domains, yet it has also highlighted the critical issue of bias in AI algorithms. Addressing bias is a complex challenge, but one essential solution lies in the sharing of representative datasets. These datasets enable the development and validation of AI models within the specific populations they aim to serve, reducing the risk of biased outcomes. However, the landscape of publicly available datasets is heavily skewed toward high-income countries, leaving a significant gap in representation, particularly for LMICs [19, 20]. Latin America, including Brazil, faces underrepresentation in available datasets, further emphasizing the need for comprehensive data-sharing efforts [18]. The underrepresentation of LMIC datasets has a direct impact on healthcare outcomes. AI models trained predominantly on data from HICs, where healthcare infrastructure, disease prevalence, and patient demographics differ, may not perform effectively for LMIC populations. For instance, AI models developed for diabetic retinopathy screening using data from North America or Europe may not account for the distinct disease progression patterns observed in Brazilian patients, leading to misdiagnoses or reduced screening efficacy. In Brazil, these challenges are further compounded by healthcare disparities. Publicly available datasets are mostly derived from urban centers, leaving rural and underserved regions underrepresented. This imbalance exacerbates health inequities, as AI tools deployed across the country may not be adequately validated for use in all regions, leading to inaccurate diagnoses and treatment in remote or low-income areas. Smaller healthcare providers, particularly those serving rural populations, would benefit most from AI models trained on data that reflect their unique patient demographics. 
These underserved regions, where healthcare systems face significant challenges such as limited access to specialized care and medical infrastructure, present an opportunity for AI to bridge gaps in healthcare delivery. However, the absence of representative data from these regions diminishes AI’s potential in such settings. To address these issues, improving dataset diversity is essential. Establishing partnerships among government agencies, academic institutions, and healthcare providers could facilitate the development of more inclusive datasets that better represent the full spectrum of patient demographics in Brazil and other LMICs. These collaborations would ensure that data from rural and underserved regions are incorporated, enhancing the generalizability and fairness of AI models. In addition to diverse datasets, robust legal frameworks are crucial. The enactment of data protection laws like the Brazilian General Data Protection Law aligns with global efforts to safeguard individual privacy rights. The LGPD defines personal data, outlines guidelines for research data usage, and emphasizes the importance of obtaining written consent for lawful data collection. It also recognizes the significance of pseudonymization in enabling secure access to data for public health research while protecting individuals’ identities. Improving data-sharing practices is equally important. Establishing secure, ethical systems for data exchange will accelerate AI advancements in healthcare. Brazil should participate in international open science initiatives, adopting global standards for data sharing while adhering to local privacy regulations. Such collaboration would enable broader access to knowledge and contribute to the development of fairer AI technologies. Future research should focus on validating AI algorithms within the populations they are designed to serve [12]. 
Assessing the real-world performance of AI models across different Latin American regions is crucial for identifying and mitigating bias. Furthermore, interdisciplinary collaboration among data scientists, healthcare professionals, and legal experts is necessary to address the ethical and technical challenges of AI development, fostering fairness, transparency, and accountability in healthcare systems. Innovative methods for collecting diverse data in resource-limited settings should also be explored. Mobile health technologies and telemedicine offer opportunities to gather high-quality data from underrepresented regions, such as rural Brazil, improving the inclusivity and performance of AI models. Finally, the distinction between pseudonymized and de-identified data is crucial and facilitates compliant data sharing. Nonetheless, irrespective of data type, implementing robust legal and ethical measures to safeguard data integrity and privacy remains paramount. These practices ensure alignment with the LGPD and support responsible data use in research and beyond. A coordinated effort by governments, academia, and private entities will be vital to ensure equitable AI development, ultimately leading to improved healthcare outcomes in Brazil and globally.
Calculate Your Potential AI Impact
Estimate the significant efficiency gains and cost savings AI can bring to your healthcare operations. Adjust the parameters to see a customized ROI projection.
Strategic AI Implementation Roadmap
A phased approach ensures responsible and effective AI deployment, focusing on ethical data handling, robust validation, and continuous improvement within healthcare.
Data Strategy & Compliance Assessment
Evaluate existing data infrastructure, assess compliance with regulations like LGPD, and define a comprehensive data strategy for AI development.
Representative Dataset Curation & Preprocessing
Identify and curate diverse, representative datasets, focusing on ethical acquisition, pseudonymization, and robust preprocessing to mitigate bias.
AI Model Development & Local Validation
Develop and train AI models using curated data, emphasizing rigorous local validation within target populations to ensure generalizability and equitable performance.
Integration with Healthcare Systems & Monitoring
Seamlessly integrate validated AI models into clinical workflows, establishing robust monitoring mechanisms for performance, bias, and patient outcomes.
Continuous Bias Mitigation & Governance Review
Implement ongoing processes for identifying and mitigating emerging biases, regularly reviewing data governance policies, and adapting models for sustained ethical and effective use.
Ready to Build Equitable AI in Healthcare?
Navigate the complexities of AI implementation with expert guidance, ensuring your solutions are compliant, effective, and ethically sound.