SHORT-PAPER

CelebCaption: A Benchmark Dataset for Identity-Sensitive Unlearning in Image Captioning

Machine unlearning seeks to remove the influence of selected training examples without retraining the model from scratch. Recent work has extended this goal to vision-language models, yet existing datasets are not suited to judging whether a sample's influence has truly been erased from learned image-text pairs. Current algorithms also tend to introduce false information into the captions generated after unlearning, which compromises utility. We first establish three criteria that an image-caption unlearning method should meet: Specificity Reduction, Identity Removal, and Performance Preservation. Guided by these criteria, we present CelebCaption, an image-text dataset of 15,000 photographs covering 150 well-known individuals, each photograph linked to four captions that vary in level of detail (detailed vs. summary) and in whether the subject's name is present. This design enables controlled, quantitative assessment of the proposed unlearning objectives. We benchmark several representative unlearning algorithms on CelebCaption, using both caption quality scores and membership inference attack (MIA) accuracy as quantitative unlearning metrics, and observe that current methods fail to achieve their privacy objectives. Our unlearning criteria and dataset provide a focused, reproducible testbed for advancing privacy-aware image captioning. The CelebCaption dataset is publicly available at https://github.com/DASH-Lab/CelebCaption.

Executive Impact Summary

Key takeaway: machine unlearning for image captioning needs purpose-built benchmarks such as CelebCaption. An effective method must remove identity traces while preserving model utility, and current methods fall short.

106 Total Downloads

0 Total Citations

15,000 Images in Dataset

150 Individuals Covered

Deep Analysis & Enterprise Applications

Enterprise Process Flow

Input Image → Process & Extract Entities (PIE) → Apply Unlearning Mechanism → Generate Output Caption

Unlearning Objectives Breakdown
Objective	Description	Key Goal
Specificity Reduction (SR)	Captions for forgotten data should lose fine-grained visual details.	Avoid residual cues exposing memorized information.
Identity Removal (IR)	Direct or indirect identity references to forgotten subjects must be eliminated.	Ensure the person in a forgotten image cannot be identified.
Performance Preservation (PP)	The model's ability to generate accurate, fluent captions for the retain set must be maintained.	Preserve overall caption quality and factuality on the retain set.
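
As a rough illustration of how the Identity Removal objective could be checked automatically, the sketch below measures how often a set of generated captions still contains a subject's name. It is not the evaluation protocol of the paper: the function names, the placeholder name "Jane Doe", and the string-matching approach are all assumptions made for this example, and plain matching only catches direct references, not the indirect ones the objective also covers.

```python
import re


def mentions_identity(caption: str, identity_terms: list[str]) -> bool:
    """Return True if any identity term occurs as a whole word or phrase.

    Matching is case-insensitive. This is a naive proxy (assumption):
    it detects direct name mentions only.
    """
    for term in identity_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, caption, flags=re.IGNORECASE):
            return True
    return False


def identity_leak_rate(captions: list[str], identity_terms: list[str]) -> float:
    """Fraction of captions that still mention the forgotten subject."""
    if not captions:
        return 0.0
    leaks = sum(mentions_identity(c, identity_terms) for c in captions)
    return leaks / len(captions)


# Hypothetical captions produced after unlearning a placeholder subject.
captions_after_unlearning = [
    "Jane Doe smiles at a film premiere.",
    "A woman smiles at a film premiere.",
]
leak_rate = identity_leak_rate(captions_after_unlearning, ["Jane Doe", "Doe"])
print(f"identity leak rate: {leak_rate:.2f}")
```

A lower rate on the forget set is better for Identity Removal, but it says nothing about Performance Preservation, which has to be measured separately on the retain set.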

70%: the highest MIA accuracy, recorded for MultiDelete, which indicates significant membership-signal leakage after unlearning.
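
To make the MIA accuracy figure easier to interpret, here is a minimal sketch of a loss-threshold membership inference attack. The page does not say which attack the paper uses, so the thresholding rule, the function name `mia_accuracy`, and the toy loss values are assumptions for illustration only. With equally sized member and non-member sets, an accuracy near 0.5 means the attacker does no better than chance.

```python
def mia_accuracy(member_losses, nonmember_losses, threshold):
    """Accuracy of a simple loss-threshold membership inference attack.

    Assumption: a sample is predicted to be a training member when the
    model's loss on it is below `threshold`.
    """
    total = len(member_losses) + len(nonmember_losses)
    if total == 0:
        raise ValueError("need at least one sample")
    true_positives = sum(loss < threshold for loss in member_losses)
    true_negatives = sum(loss >= threshold for loss in nonmember_losses)
    return (true_positives + true_negatives) / total


# Toy per-sample caption losses (made up for this example).
forget_set_losses = [0.2, 0.4, 0.9, 0.3, 0.1]    # samples that were trained on
unseen_set_losses = [1.1, 0.8, 0.35, 1.4, 0.95]  # samples never trained on

toy_accuracy = mia_accuracy(forget_set_losses, unseen_set_losses, threshold=0.5)
print(f"toy MIA accuracy: {toy_accuracy:.2f}")
```

Under this reading, an unlearning method succeeds on the forget set when the attack accuracy moves toward chance, so a value well above 0.5 suggests that membership information still leaks.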

Current Unlearning Methods: Failure Points

Qualitative analysis reveals that methods like Finetune, MultiDelete, and SCRUB often leak identities and vivid details from forgotten images. Conversely, Gradient Ascent (GA) and GA + Mismatch, while attempting identity removal, frequently produce repetitive and ungrammatical captions, severely degrading utility. This critical trade-off between privacy and utility highlights the unsolved challenges in achieving effective identity-sensitive unlearning in image captioning.
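
One simple way to flag the repetitive captions described above is a distinct n-gram ratio, sketched below. This is an illustrative proxy rather than a metric reported by the paper; the function name, the whitespace tokenization, and the example captions are assumptions.

```python
def distinct_ngram_ratio(caption: str, n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams in a caption.

    Values near 1.0 indicate little repetition; values near 0.0 indicate
    a degenerate, looping caption. Tokenization is plain whitespace
    splitting (assumption).
    """
    tokens = caption.lower().split()
    if len(tokens) < n:
        return 1.0  # too short to contain any repeated n-gram
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


fluent = "a man in a suit waves"
degenerate = "a man a man a man a man"
print(distinct_ngram_ratio(fluent), distinct_ngram_ratio(degenerate))
```

A score like this captures repetition only. Grammaticality and factual accuracy would still need standard caption quality metrics or human review.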

60,000 total captions (four per image), crafted to isolate identity cues and specificity levels.
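
The caption design can be read as a 2 × 2 grid (level of detail × presence of the name), which is where the four captions per image come from. The sketch below enumerates that grid and checks the arithmetic against the stated totals. The class and constant names are hypothetical, and the per-individual figure is only an average, since the page does not say that images are evenly distributed across individuals.

```python
from dataclasses import dataclass
from itertools import product

DETAIL_LEVELS = ("detailed", "summary")
NAME_PRESENCE = (True, False)


@dataclass(frozen=True)
class CaptionVariant:
    """One of the four caption types attached to each image (hypothetical schema)."""
    detail: str
    includes_name: bool


def caption_variants() -> list[CaptionVariant]:
    """Enumerate every combination of detail level and name presence."""
    return [CaptionVariant(d, n) for d, n in product(DETAIL_LEVELS, NAME_PRESENCE)]


NUM_IMAGES = 15_000
NUM_INDIVIDUALS = 150

variants = caption_variants()
total_captions = NUM_IMAGES * len(variants)
avg_images_per_individual = NUM_IMAGES / NUM_INDIVIDUALS

print(len(variants), total_captions, avg_images_per_individual)
```

Pairing a named and an unnamed caption at the same level of detail isolates the identity cue, while pairing a detailed and a summary caption with the same name setting isolates specificity.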

Ethical Design of CelebCaption

The CelebCaption dataset is designed with user privacy in mind. It uses a multi-stage filtering process to ensure legal compliance and research suitability and to exclude unsafe content. The benchmark addresses the risk of incomplete unlearning and, by including Performance Preservation as a core objective, also guards against model degradation, so that practical utility is retained while privacy-aware AI advances.
