SHORT-PAPER
CelebCaption: A Benchmark Dataset for Identity-Sensitive Unlearning in Image Captioning
Machine unlearning seeks to remove the influence of selected training examples without retraining the model from scratch. Recent work has extended this goal to vision-language models, yet existing datasets are not suited for judging whether a sample's influence has truly been erased from learned image-text pairs. Current algorithms often tend to introduce false information into sentences generated after unlearning, which compromises utility. We first establish three criteria that an image-caption unlearning method should meet: Specificity Reduction, Identity Removal, and Performance Preservation. Guided by these criteria, we present CelebCaption, an image-text dataset of 15,000 photographs covering 150 well-known individuals, each linked to four captions that vary in level of detail (detailed vs. summary) and in whether the subject's name is present. This design enables controlled, quantitative assessment of the proposed unlearning objectives. We benchmark several representative unlearning algorithms on CelebCaption, using caption quality scores together with membership inference attack (MIA) accuracy as quantitative unlearning metrics, and observe that current methods fail to achieve their privacy objectives. Our unlearning criteria and dataset provide a focused, reproducible testbed for advancing privacy-aware image captioning. Our CelebCaption dataset is publicly available at https://github.com/DASH-Lab/CelebCaption.
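As a rough illustration of the caption design described above, the four captions per image can be pictured as a 2x2 grid: detail level (detailed or summary) crossed with name presence. The sketch below is only an assumption about how such a record might be represented; the class names, field names, and `make_record` helper are hypothetical and do not come from the released dataset.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Tuple

# Hypothetical axes of variation; the labels are illustrative only.
DETAIL_LEVELS = ("detailed", "summary")
NAME_FLAGS = (True, False)


@dataclass(frozen=True)
class CaptionVariant:
    detail: str      # "detailed" or "summary"
    has_name: bool   # whether the subject's name appears in the caption
    text: str


@dataclass(frozen=True)
class CaptionRecord:
    image_path: str
    identity: str
    captions: Tuple[CaptionVariant, ...]


def make_record(image_path: str, identity: str,
                texts: Dict[Tuple[str, bool], str]) -> CaptionRecord:
    """Build a record, requiring one caption for each (detail, has_name) pair."""
    keys = list(product(DETAIL_LEVELS, NAME_FLAGS))
    missing = [k for k in keys if k not in texts]
    if missing:
        raise ValueError(f"missing caption variants: {missing}")
    variants = tuple(CaptionVariant(d, n, texts[(d, n)]) for d, n in keys)
    return CaptionRecord(image_path, identity, variants)
```

Keeping the two axes explicit makes it straightforward to select, for example, only the detailed captions without names when probing whether fine-grained details survive unlearning.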
Executive Impact Summary
The key takeaway: machine unlearning for image captioning needs purpose-built benchmarks such as CelebCaption to address privacy concerns, because current methods fall short of removing identity traces while preserving model utility.
Deep Analysis & Enterprise Applications
| Objective | Description |
|---|---|
| Specificity Reduction (SR) | Captions for forgotten data should lose fine-grained visual details. |
| Identity Removal (IR) | Direct or indirect identity references for forgotten subjects must be eliminated. |
| Performance Preservation (PP) | Model's ability to generate accurate, fluent captions for retain data must be maintained. |
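To make the Identity Removal objective concrete, one simple proxy is the fraction of captions generated for forgotten images that still mention the subject by name. The sketch below is an illustrative check only, not the metric used in the paper; the function name `identity_leak_rate` and the alias-list input are assumptions. It detects direct name mentions only, so indirect references (roles, film titles, and so on) would need a separate check.

```python
import re
from typing import Iterable, List


def identity_leak_rate(captions: List[str], name_aliases: Iterable[str]) -> float:
    """Fraction of captions that mention any alias (case-insensitive, whole words)."""
    if not captions:
        return 0.0
    # Escape each alias so punctuation in names is matched literally.
    patterns = [
        re.compile(r"\b" + re.escape(alias) + r"\b", re.IGNORECASE)
        for alias in name_aliases
    ]
    leaks = sum(1 for c in captions if any(p.search(c) for p in patterns))
    return leaks / len(captions)
```

A rate near zero on the forget set, combined with unchanged caption quality on the retain set, would be consistent with the IR and PP objectives in the table; a low rate alone says nothing about fluency.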
Current Unlearning Methods: Failure Points
Qualitative analysis reveals that methods like Finetune, MultiDelete, and SCRUB often leak identities and vivid details from forgotten images. Conversely, Gradient Ascent (GA) and GA + Mismatch, while attempting identity removal, frequently produce repetitive and ungrammatical captions, severely degrading utility. This critical trade-off between privacy and utility highlights the unsolved challenges in achieving effective identity-sensitive unlearning in image captioning.
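The utility loss attributed to Gradient Ascent can be seen even in a toy setting. The sketch below applies gradient ascent on the forget-set loss of a one-parameter linear model with squared loss; this is a deliberately simplified stand-in, not the captioning setup or the implementation benchmarked in the paper, and all function names and values are illustrative assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def mean_loss(w: float, data: List[Sample]) -> float:
    """Mean squared-error loss 0.5 * (w*x - y)^2 over the samples."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in data) / len(data)


def mean_grad(w: float, data: List[Sample]) -> float:
    """Gradient of mean_loss with respect to w."""
    return sum((w * x - y) * x for x, y in data) / len(data)


def gradient_ascent_unlearn(w: float, forget: List[Sample],
                            lr: float = 0.1, steps: int = 5) -> float:
    """Move w *up* the forget-set loss (the opposite of a training step)."""
    for _ in range(steps):
        w = w + lr * mean_grad(w, forget)
    return w


# Toy data: both sets follow y = 2x, and the model starts near the optimum.
forget_set = [(1.0, 2.0)]
retain_set = [(2.0, 4.0)]
w_before = 1.8
w_after = gradient_ascent_unlearn(w_before, forget_set)
```

Because the forget and retain samples share the same underlying relationship in this toy example, pushing the forget loss up also pushes the retain loss up, which mirrors the degradation described above. Nothing bounds the ascent, so more steps move the parameter further from the shared optimum.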
Ethical Design of CelebCaption
The CelebCaption dataset is meticulously designed to enhance user privacy. It employs a multi-stage filtering process to ensure legal compliance and research suitability, avoiding unsafe content. By addressing the risk of incomplete unlearning and including Performance Preservation as a key objective, the benchmark actively works to prevent model degradation and ensure practical utility while advancing privacy-aware AI.