Enterprise AI Analysis: Evaluating Chat GPT-4o's Comparative Performance over GPT-4 in Japanese Medical Licensing Examination and Its Clinical Partnership Potential


Recent advances in artificial intelligence (AI) have produced ChatGPT-4o, a multimodal large language model (LLM) capable of processing both text and image inputs. Although ChatGPT has demonstrated usefulness in medical examinations, few studies have evaluated its image analysis performance. This study compared GPT-4o and GPT-4 using public questions from the 116th–118th Japanese National Medical Licensing Examinations (JNMLE), each consisting of 400 questions. Both models answered in Japanese using simple prompts, including screenshots for image-based questions. Accuracy was analyzed across essential, general, and clinical questions, with statistical comparisons by chi-square tests. Results show GPT-4o consistently outperformed GPT-4, achieving passing scores in all three examinations. In the 118th JNMLE, GPT-4o scored 457 points versus 425 for GPT-4. GPT-4o demonstrated higher accuracy for image-based questions in the 117th and 116th exams, though the difference in the 118th was not significant. For text-based questions, GPT-4o showed superior medical knowledge, clinical reasoning, and ethical response behavior, notably avoiding prohibited options. Overall, GPT-4o exceeded GPT-4 in both text and image domains, suggesting strong potential as a diagnostic aid and educational resource. Its balanced performance across modalities highlights its promise for integration into future medical education and clinical decision support.

Executive Impact & Key Findings

In this study, GPT-4o outperformed GPT-4 on both text-based and image-based JNMLE questions and selected no prohibited options, supporting its potential as a diagnostic aid and educational resource.

90.8% GPT-4o Combined Overall Accuracy
0 GPT-4o Prohibited Choices Selected
84.3% (117th) and 89.4% (116th) GPT-4o Image-Based Accuracy
92.7% GPT-4o Text-Based Accuracy (Combined)

Deep Analysis & Enterprise Applications

This section explores the implications of GPT-4o's performance specifically within the realm of medical education and clinical decision support, highlighting its enhanced capabilities and safety.

90.8% GPT-4o Combined Overall Accuracy Across JNMLE

GPT-4o vs. GPT-4 Performance in JNMLE

Feature | GPT-4o Performance | GPT-4 Performance
Overall Accuracy | Consistently higher (e.g., 90.8% combined) | Lower (e.g., 83.3% combined)
Image-Based Questions | Significantly higher accuracy in 117th (84.3%) and 116th (89.4%) | Lower accuracy in 117th (65.4%) and 116th (75.5%)
Text-Based Questions | Superior medical knowledge and clinical reasoning (e.g., 92.7% combined) | Lower accuracy (e.g., 86.0% combined)
Prohibited Choices | Zero selections across all exams (0.0%) | Selected 1–2 per exam, indicating higher risk (up to 22.2%)
Passing Score (All 3 Exams) | Achieved passing scores (e.g., 457 points in 118th) | Achieved passing scores (e.g., 425 points in 118th)

Enterprise Process Flow

Question Download
Simple Japanese Prompts
Text & Image Input (Screenshots)
GPT-4 & GPT-4o Response Generation
Accuracy & Prohibited Choice Analysis
Statistical Comparison
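
The final step of the flow above, a chi-square comparison of two models' accuracy, can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function name `chi_square_2x2` and the counts in the example are hypothetical, and the sketch assumes a Pearson chi-square test without continuity correction on a 2x2 table (the source does not say which variant was used).

```python
import math


def chi_square_2x2(correct_a, wrong_a, correct_b, wrong_b):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Rows are the two models, columns are correct/incorrect answer counts.
    Returns (chi2, p_value) with one degree of freedom.
    """
    n = correct_a + wrong_a + correct_b + wrong_b
    row_a = correct_a + wrong_a
    row_b = correct_b + wrong_b
    col_correct = correct_a + correct_b
    col_wrong = wrong_a + wrong_b
    denom = row_a * row_b * col_correct * col_wrong
    if denom == 0:
        raise ValueError("every row and column must contain at least one answer")
    chi2 = n * (correct_a * wrong_b - wrong_a * correct_b) ** 2 / denom
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only (NOT taken from the study):
# model A answers 360 of 400 correctly, model B answers 330 of 400.
chi2, p = chi_square_2x2(360, 40, 330, 70)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

In practice, a library routine such as SciPy's `chi2_contingency` would usually be preferred; the hand-written version only makes the arithmetic behind the comparison visible.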

Enhanced Safety in Clinical Decision-Making

A particularly important finding was GPT-4o's ability to avoid prohibited choices, which represent clinically dangerous decisions. In scenarios where GPT-4 failed to recognize critical pathological findings from images or selected dangerous treatment options, GPT-4o consistently identified abnormal findings and incorporated them into its reasoning, demonstrating improved risk-aware clinical decision support when integrating textual and visual information. This highlights GPT-4o's potential for patient safety-oriented decision-making in high-stakes medical contexts.

Your AI Implementation Roadmap

A typical phased approach to integrating advanced AI into your enterprise, ensuring a smooth and successful transition.

Phase 1: Discovery & Strategy

Comprehensive assessment of current workflows, identification of AI opportunities, and development of a tailored implementation strategy.

Phase 2: Pilot & Proof of Concept

Deployment of AI solution in a controlled environment to validate performance, gather feedback, and demonstrate initial ROI.

Phase 3: Integration & Scaling

Seamless integration of AI across target departments, user training, and gradual scaling to maximize enterprise-wide benefits.

Phase 4: Optimization & Future-Proofing

Continuous monitoring, performance optimization, and strategic planning for future AI advancements and expanded applications.
