
Where is the Watermark? Interpretable Watermark Detection at the Block Level

Maria Bulychev, Neil G. Marchant, Benjamin I. P. Rubinstein

University of Melbourne


Abstract

Recent advances in generative AI have enabled the creation of highly realistic digital content, raising concerns around authenticity, ownership, and misuse. While watermarking has become an increasingly important mechanism to trace and protect digital media, most existing image watermarking schemes operate as black boxes, producing global detection scores without offering any insight into how or where the watermark is present. This lack of transparency impacts user trust and makes it difficult to interpret the impact of tampering. In this paper, we present a post-hoc image watermarking method that combines localised embedding with region-level interpretability. Our approach embeds watermark signals in the discrete wavelet transform domain using a statistical block-wise strategy. This allows us to generate detection maps that reveal which regions of an image are likely watermarked or altered. We show that our method achieves strong robustness against common image transformations while remaining sensitive to semantic manipulations. At the same time, the watermark remains highly imperceptible. Compared to prior post-hoc methods, our approach offers more interpretable detection while retaining competitive robustness. For example, our watermarks are robust to cropping up to half the image.

Introduction

Artificial intelligence can generate realistic and high-quality digital content, making it increasingly challenging to distinguish between human-created and machine-generated content. Modern diffusion models like Midjourney, Stable Diffusion, and DALL-E have revolutionised content creation with their ability to create an almost limitless range of novel images, producing everything from photo-realistic imagery to artwork of any style and aesthetic [27]. The spread of AI-generated content poses significant risks, including the potential for misinformation, intellectual property theft, and the blurring of lines between authentic and fabricated media [37]. For example, the European Union's AI Act imposes a transparency obligation for generative AI: providers must ensure that AI-generated text, images, video or audio are marked in a machine-readable way and identifiable as artificially generated [35].

Numerous security mechanisms have been developed to provide a means to assert ownership, verify authenticity, and protect intellectual property rights [30, 33]. However, most users share personal photos on social media without protection, unaware of the risks this poses. Unprotected images can be easily downloaded and manipulated, allowing malicious actors to create misleading or defamatory content and deepfakes. To address these issues, the research community has been exploring various solutions, with watermarking emerging as a prominent technique.

Watermarking involves embedding a unique signal into the original source, which, while subtle enough to remain invisible in normal use, can be extracted to validate multimedia content [30]. The process consists of two main phases: embedding and detection. In the embedding phase the watermark information is incorporated into the host data using techniques like modifying pixel values [2], frequency coefficients [8], or other signal properties [12]. Detection involves analysing the potentially watermarked content to verify the presence of the watermark and extract any hidden information, which can be done through statistical analysis [17], correlation [8], or pattern-matching methods [5].

Most image watermarking schemes are treated as black-box systems that output either a binary decision (whether an image is watermarked or not) or a score for the entire input without any explanation [31]. This lack of interpretability can be problematic for users who may wish to understand why a particular input is classified as non-watermarked. For instance, an image that was originally generated with a watermark may no longer be classified as such after certain regions are edited. However, existing schemes do not indicate whether the absence of a watermark is due to localised modifications of a watermarked image or due to the whole image not being watermarked in the first place. Yet, many modern watermarking schemes [34, 39, 42, 43] rely on deep neural networks (DNNs), which behave like black boxes and offer no clear reasoning for their decisions. Although explainability is widely seen as essential for trustworthy AI [9], current DNN-based detectors lack reliable methods to provide human-understandable explanations primarily because deep neural networks themselves are not inherently interpretable [32].

This context highlights the need for watermarking schemes that offer region-level interpretability, not only detecting whether tampering has occurred, but also where. Targeting this gap, we propose MELB (Multidimensional Embedding via Localised Blocking), a watermarking method that combines localised embedding, interpretability, and robustness to common image manipulations. Our main contributions are as follows:

  • We introduce a block-wise image watermarking method that embeds watermark signals into the discrete wavelet transform (DWT) domain using a statistical partitioning approach. This design enables robustness against common manipulations such as compression, noise, and blur, while remaining sensitive to semantically meaningful changes in the image structure (e.g., warping or shape modification).
  • In contrast to traditional schemes that produce only a global score, our block-wise design allows for the generation of a detection map that highlights the distribution of watermark presence across the image. This enables users to identify which parts of an image have likely been altered and which remain authentic, providing transparency and localised tamper assessment.
  • Our watermarks are highly imperceptible, with consistently strong results across standard similarity metrics, mostly exceeding prior approaches. Additionally, our embedding process supports explicit control over the trade-off between invisibility and robustness.
  • Our method demonstrates robustness to common image modifications comparable to state-of-the-art post-hoc watermarking methods and shows notably stronger performance under cropping attacks. Even when up to 50% of the image is removed, detection remains reliable, with observed true positive rates of up to 91%.

Related Work

Digital Image Watermarking. Traditional watermarking methods [8, 42] embed watermarks post-hoc, working with existing images after their creation. With the development of diffusion models [14], research has expanded to include in-processing watermarking methods [11, 39] that embed the watermark during the generation process. These integrative approaches show promise in embedding signals deep within the image's semantic structure. However, their model-specific nature limits generalisability and applicability to existing images. Our post-hoc approach offers greater flexibility, accommodating AI-generated images from any model as well as natural images.

Post-hoc watermarking techniques [10, 19, 34, 42, 43] can be further divided into spatial domain and transform domain methods [3, 13]. Additionally, deep learning-based methods extend beyond these domains by leveraging latent feature spaces for watermark embedding [11, 20]. Spatial domain methods [41] operate directly on pixel values (the representation of an image as an array of pixels), offering low computational complexity but generally lower robustness to image processing operations [3, 30]. By contrast, transform domain methods, such as those using the Discrete Wavelet Transform (DWT) [40], typically exhibit higher robustness to image alterations and compression.

Recent advances in deep neural networks (DNNs) have led to learning-based watermarks that push the boundaries of robustness and imperceptibility. StegaStamp [34] embeds invisible messages into photographs by training with differentiable image perturbations for noise robustness and using a spatial transformer to account for geometric distortions from printing and re-capturing. RivaGAN [42] is a video watermarking method that uses an attention-guided dual-stream architecture to learn optimal watermark placement. The method's robustness is enhanced via adversarial training, where a critic ensures visual quality and a removal network continuously challenges the watermark's integrity. Other recent work [23, 29] explores interpretable and robust watermarking for ownership verification. While these DNN-based approaches offer strong robustness, they lack the ability to provide localised detection, which is a key feature of our proposed method.

Localised Watermarking. Despite the potential for enhanced interpretability and forensic utility, localised image watermarking—where a watermark's presence can be verified for specific image regions—has seen limited research to date. An early example is the hierarchical block-based approach of Celik et al. [36], which introduced fragile watermarks with block-level tamper localisation using least significant bit embedding. More recently, EditGuard [44] introduced a dual watermarking strategy that embeds both localisation and copyright watermarks. While effective for tamper detection, this method is not robust to geometric augmentations such as cropping. Building on this, in concurrent work, Sander et al. [25] proposed WAM, a localised watermarking method achieving strong robustness. However, WAM is designed to preserve the watermark even after severe alterations that significantly degrade the image, a feature that can be undesirable for image forensics. Our watermark, in contrast, is intentionally fragile and designed to break under such strong editing, providing a clearer signal that the image has been heavily manipulated. Furthermore, while pre-trained models for WAM are available, its detector is a transformer model with approximately 100 million parameters. Our method is training-free, using a lightweight detector based on projections in DWT space. In contrast to both EditGuard and WAM, our work thus offers a computationally efficient approach to localised detection that is robust to common manipulations but fragile to severe, perceptually obvious ones. The concept of localised detection has seen limited exploration in other domains as well. For instance, AudioSeal [24] tackles localised watermarking for AI-generated speech, providing sample-level resolution in longer recordings.

Statistical Detection of Randomised Partitions. Kirchenbauer et al. [17] introduced a simple but innovative technique that embeds a watermark into the output of large language models by subtly modifying token probabilities during generation. This is achieved by partitioning the vocabulary at each step evenly into two distinct groups ("Green List" and "Red List"), where the likelihood of Red List tokens appearing in the generated text is reduced, while Green List tokens are boosted. A similar watermark was proposed for tabular data [46] where the watermark is embedded by perturbing specific cells ("key cells"), selecting the perturbation values from the "Green Domains". Our work builds upon these statistical partitioning concepts, adapting them to the image domain. This approach enables us to achieve localised detection, a feature notably absent in most existing image watermarking schemes, while maintaining a balance between imperceptibility and robustness.

Preliminaries

Our watermarking framework embeds a robust, yet imperceptible signal into images, allowing owners to identify their intellectual property without compromising visual quality. It also provides an interpretable detection map that visually highlights areas of potential modification (such as through insertions, deletions, or warping). This addresses scenarios on image-sharing platforms like Facebook or Instagram, where users post content without protection, risking unintended alterations or redistribution [45].

The framework is designed to maintain watermark integrity under common image editing operations. This robustness ensures that the connection to the original creator is preserved even when the image undergoes benign modifications or is subject to attempts at removal, aiding in authentication and traceability.

Our treatment of digital image watermarking introduces an additional aspect: localised detection. Unlike traditional watermarking schemes that provide only a global decision for an entire image, our formulation allows for fine-grained, interpretable watermark detection at the local level. We focus on post-hoc watermarking, where watermarks are embedded into any image, regardless of its origin—traditional, photographic, or AI-generated—offering broad utility in diverse scenarios.

3.1. Problem Formulation

An image is a three-dimensional array I ∈ V^{h×w×c} where h is the height, w is the width, c is the number of channels, and V is the range of intensity values. For 24-bit colour images, V = {0, 1, ..., 255} and c = 3 (e.g., corresponding to RGB). We write I_{i,j,k} to refer to the intensity of the k-th channel at spatial location (i, j), where i indexes rows (top to bottom) and j indexes columns (left to right).

An image watermarking scheme consists of two fundamental algorithms: an embedding algorithm and a detection algorithm. These algorithms are typically operated by a single entity, which we will call the watermarker.

Definition 1. The embedding algorithm maps an original image I ∈ V^{h×w×c} to a watermarked image I* = embed(I, sk) using a secret key sk known only to the watermarker.

Definition 2. The detection algorithm detect(Î, sk) takes a possibly watermarked image Î ∈ V^{h×w×c} and a secret key sk. This yields two outputs: a global detection score d ∈ [0, 1], which indicates whether the entire image is watermarked, and a localised detection map in [0, 1]^{h×w} quantifying the likelihood that localised regions are watermarked. These scores can be thresholded, so that a positive detection is represented by 1, while a negative detection is represented by 0.

We now summarise the design desiderata for our image watermarking scheme:

  • Imperceptibility: The watermarked image I* should be visually indistinguishable from the original image I. Formally, for some perceptual distance function d and threshold ε, we require d(I, I*) < ε.
  • Blindness: The watermark should be detectable without access to the original image.
  • Localised detection: Watermark detection should highlight localised factors that influence detection decisions.
  • Soundness: The false positive rate δ of the detection algorithm should be negligible. For D a distribution of non-watermarked images, Pr_{I~D}[detect(I, sk) ≠ 0] < δ, where we are implicitly applying a threshold to global detection scores.
  • Robustness: The watermark should be detectable even after common image transformations T: detect(T(I*), sk) ≈ detect(I*, sk).

Threat Model. To define the scope of our threat model, we describe the abilities of attackers as follows: 1) we assume that attackers have full access to watermarked images and 2) are aware of the general process of the watermark embedding and detection algorithms, but 3) do not know secret information, such as the secret key (see Sec. 4.1). Moreover, we assume that potential attackers 4) aim to modify images while preserving their visual quality. For example, when adding noise, the modifications are expected to remain subtle and imperceptible to the human eye. In cases of blurring or compression, the objective is to maintain the image's core content and semantics. An observer should still be able to recognise and understand the primary subject.

MELB: Multidimensional Embedding via Localised Blocking

In this section, we present the algorithms of our watermarking method in detail. The process is illustrated in Fig. 1 and involves two stages: embedding and detection.

4.1. Watermark Embedding

Our watermark embedding method operates in the transform domain, specifically using the Discrete Wavelet Transform (DWT). The key idea is based on the concept of statistical partitioning of DWT coefficients into allowed ("green") and disallowed ("red") regions. Further background on the DWT is provided in App. A.1, and pseudocode is provided in Algorithm 1 of App. B.1.

We first divide the image into non-overlapping blocks of size m×m, ignoring incomplete edge blocks. For each block, we generate a seed using a generic get_seed function (e.g., in our experiments, this function returns the rounded mean intensity of the block per channel). This seed, along with a random secret key sk, is passed as input to a pseudorandom function (PRF), which returns a pseudo-random binary sequence. The resulting binary sequence determines the partitioning of DWT coefficients for the block into red (0) or green (1).
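The seeding and partitioning step can be sketched as follows. This is a minimal illustration, not the paper's implementation: a keyed BLAKE2b hash stands in for the AES/SHA3 PRF, and the 32×32 block size and 64-bit output length are illustrative assumptions.

```python
import hashlib
import numpy as np

def get_seed(block: np.ndarray) -> bytes:
    """Illustrative seed: rounded mean intensity per channel (the choice used
    in the paper's experiments)."""
    means = np.round(block.reshape(-1, block.shape[-1]).mean(axis=0)).astype(np.uint8)
    return means.tobytes()

def prf_bits(seed: bytes, secret_key: bytes, n_bits: int) -> np.ndarray:
    """PRF sketch: keyed BLAKE2b in counter mode returns n_bits pseudo-random
    bits (1 = green/allowed interval, 0 = red/disallowed interval)."""
    out = bytearray()
    counter = 0
    while len(out) * 8 < n_bits:
        h = hashlib.blake2b(seed + counter.to_bytes(4, "big"), key=secret_key).digest()
        out.extend(h)
        counter += 1
    return np.unpackbits(np.frombuffer(bytes(out), dtype=np.uint8))[:n_bits]

# Example: derive a 64-interval red/green colouring for one 32x32x3 block.
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
bits = prf_bits(get_seed(block), b"secret-key", 64)
```

Because the seed depends only on the (rounded) block means, mild perturbations of the block reproduce the same colouring at detection time, while the secret key keeps the colouring unpredictable to attackers.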

The actual embedding occurs at the sub-block level, using k×k sub-blocks (k < m) within each larger block. This reduces the visibility of block-wise artefacts in the watermarked image, as larger blocks could introduce more noticeable discontinuities at boundaries. For each sub-block:

  1. We map the image to the transform domain by applying d levels of DWT to the LL band.
  2. Using the parent block's red/green partitioning, we project the DWT coefficients onto the allowed (green) region by perturbing any coefficient in a red interval to the centre of the nearest green interval. This perturbation is applied only to the LL (low-frequency) band at level d, as low-frequency components are expected to be more robust to common image manipulations.
  3. Finally, we apply the inverse DWT d times to obtain the watermarked sub-block.

This process ensures that in the watermarked image, nearly all DWT coefficients fall within green regions, while in natural, non-watermarked images, approximately 50% of the coefficients would fall in each region. App. B.2 provides a detailed discussion of the parameters, their control mechanisms, and our design choices. The key parameters impacting visibility and robustness are the interval length l and the number of DWT levels d. Increasing l generally improves robustness but may introduce visual artefacts. Higher values of d are typically more advantageous as they enhance both imperceptibility and robustness. Optionally, our method can operate in a mode that skips watermark embedding in low-entropy blocks, where perturbations might be more perceptible (see App. C.1).
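A minimal sketch of the embedding step, assuming a single orthonormal Haar DWT level (d = 1), interval length l = 8, and a toy 16-interval cyclic colouring; the paper's wavelet configuration and parameters may differ.

```python
import numpy as np

def haar2d(x):
    """One level of an orthonormal 2D Haar DWT; returns (LL, LH, HL, HH) bands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (a + b + c + d) / 2, (a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def nearest_green(i, green_bits):
    """Index of the green interval nearest to interval i (colours repeat cyclically)."""
    for off in range(len(green_bits)):
        for j in (i - off, i + off):
            if green_bits[j % len(green_bits)]:
                return j
    raise ValueError("key has no green intervals")

def watermark_subblock(sub, green_bits, l=8.0):
    """Embed into one sub-block: move every red LL coefficient to the centre
    of the nearest green interval, leaving green coefficients untouched."""
    ll, lh, hl, hh = haar2d(np.asarray(sub, dtype=float))
    out = ll.copy()
    for pos, i in np.ndenumerate(np.floor(ll / l).astype(int)):
        if not green_bits[i % len(green_bits)]:
            out[pos] = (nearest_green(i, green_bits) + 0.5) * l
    return ihaar2d(out, lh, hl, hh)

# After embedding, every LL coefficient of the sub-block lies in a green interval.
rng = np.random.default_rng(1)
green = np.array([0, 1] * 8)  # toy colouring of 16 cyclic intervals
wm = watermark_subblock(rng.integers(0, 256, size=(8, 8)), green)
```

A real implementation would additionally clip the watermarked sub-block back to the valid pixel range and apply d > 1 transform levels for better imperceptibility, as discussed above.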

Our block-based approach directly supports our goal of localised detection. As we shall see in the next section, the detection algorithm outputs scores for each block, which are statistically significant due to the multiple sub-blocks contained within. The resolution of our detection map is controlled by the block size m, allowing for fine-grained localisation of potential modifications.

Remark 1. The use of a PRF with a random secret key sk mitigates forgery risks by preventing unauthorised creation of watermarked images. This approach necessitates a more centralised setup for detection—e.g., where the detection algorithm is made accessible to users through a public API. The PRF can be instantiated with a block cipher like AES or a cryptographic hash function such as SHA3 [17]. While our experiments use a fixed secret key, employing multiple keys can further strengthen the method against brute-force attacks (see Sec. B.4).

4.2. Watermark Detection

To check for the presence of a watermark in an image, we essentially reverse the embedding process. The complete procedure is detailed in Algorithm 4 of App. B.1. Here we describe the key steps. For each colour channel of each block, we:

  1. Compute the seed using the get_seed function. This function should produce a stable seed even when the block has undergone minor modification, so that we can recover the seed that was used to embed the potential watermark without access to the original image.
  2. Use the seed, along with the PRF and secret key sk, to determine the red/green partitioning of DWT coefficients for the block.
  3. Run detection for each k×k sub-block: we apply d levels of DWT to the LL sub-band, then examine the LL sub-band at the last level, counting how many coefficients fall in the green (allowed) region.
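The per-sub-block counting in step 3 might look like the following sketch, again assuming a one-level Haar DWT, interval length l = 8, and a cyclic interval colouring rather than the paper's exact configuration.

```python
import numpy as np

def haar_ll(x):
    """LL band of a one-level orthonormal Haar DWT (all that detection needs)."""
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 2

def green_fraction(sub, green_bits, l=8.0):
    """Fraction of the sub-block's LL coefficients that fall in green intervals.
    Near 1.0 suggests a watermark; near 0.5 is what natural images exhibit."""
    idx = np.floor(haar_ll(np.asarray(sub, dtype=float)) / l).astype(int)
    return float(np.mean([green_bits[i % len(green_bits)] for i in idx.flatten()]))
```

Summing these counts over all sub-blocks of an m×m block yields the aggregate statistic used by the block-level hypothesis test described next.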

From this block-level analysis, we can proceed to compute either a localised detection map or a global score for the entire image. We describe each of these processes in more detail below.

Localised Detection Map. For each m×m block, we aggregate the count of coefficients in the green region across all its sub-blocks. We set a detection threshold on this aggregate count based on a statistical hypothesis test with a prescribed significance level (see App. B.3 for details). The resulting detection map is generated at the block level: each m×m block is coloured green (indicating it is watermarked) if its aggregate green count exceeds the threshold, otherwise it is coloured red, as shown in Figs. 2 and 3.
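Under the null hypothesis that a block is not watermarked, each coefficient falls in a green interval with probability 1/2, so the per-block threshold can be derived from a one-sided binomial test. The sketch below makes that assumption explicit; the exact test in App. B.3 may differ in detail.

```python
from math import comb

def block_threshold(n, alpha=0.05):
    """Smallest count t with P[Bin(n, 1/2) >= t] <= alpha: the one-sided
    detection threshold on the number of green coefficients among n, under
    H0 'block is not watermarked'."""
    tail = 0.0
    for t in range(n, -1, -1):           # accumulate the upper tail downwards
        tail += comb(n, t) / 2 ** n
        if tail > alpha:
            return t + 1                 # last t whose tail was still <= alpha
    return 0
```

For example, a block contributing n = 100 coefficients needs clearly more than the 50 green coefficients expected by chance before it is coloured green at the 5% level.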

Global Detection. While the localised detection map provides detailed information about the distribution of the watermark, a global measure is important for making a definitive assessment about whether the entire image is watermarked. To achieve this, we aggregate over the localised detection map, counting the fraction of blocks classified as watermarked (green). If the total count exceeds a predetermined threshold (calibrated to achieve a prescribed false positive rate), we predict that the entire image is watermarked; otherwise, we conclude it is not watermarked.
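The global calibration can likewise be framed as a binomial test: under the null, each block is falsely flagged green with probability equal to the per-block significance level. A sketch under that independence assumption:

```python
from math import comb

def global_threshold(n_blocks, alpha_block=0.05, fpr=0.01):
    """Smallest t with P[Bin(n_blocks, alpha_block) >= t] <= fpr: under H0,
    each block is (falsely) green with probability alpha_block."""
    tail = 0.0
    for t in range(n_blocks, -1, -1):
        tail += comb(n_blocks, t) * alpha_block**t * (1 - alpha_block) ** (n_blocks - t)
        if tail > fpr:
            return t + 1
    return 0

def global_detect(block_flags, alpha_block=0.05, fpr=0.01):
    """Declare the whole image watermarked iff enough blocks are green."""
    return sum(block_flags) >= global_threshold(len(block_flags), alpha_block, fpr)
```

With 100 blocks and a 5% per-block level, roughly 5 green blocks are expected by chance, so the calibrated global threshold sits comfortably above that.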

Remark 2. Since we embed the watermark into blocks throughout the image, we can implement a detection method that identifies watermarks even in cropped images. The method is based on a brute-force strategy to systematically search for the correct starting point of the embedded blocks, as described in Sec. B.6.

4.2.1. Interpreting the Localised Detection Map

The localised detection map provides valuable insights beyond global detection. It can be a helpful aid for image forensics, allowing users to identify specific locations where a watermarked image may have been altered or where protected (watermarked) content has been inserted into a non-watermarked image. It is important, however, to acknowledge that false positives and false negatives can occur, since the hypothesis test suffers some error depending on the chosen significance level. To enhance the visual clarity of our map, we apply an optional post-processing step, akin to a low-pass filter: if a green block is entirely surrounded by red blocks, we consider it a likely false positive and recolour it red. Similarly, red blocks completely surrounded by green blocks are recoloured green. It is important to note that this post-processing is only done to improve the utility of the detection map as a visual aid; we do not apply it when computing the global score for the image.

Example: Non-watermarked Image. Fig. 2 displays a non-watermarked image alongside its localised detection map and an analysis of its DWT coefficients. The analysis (zoomed-in panel) shows the frequency of rule violations averaged across the three colour channels. We observe that most blocks (over 91%) have balanced distributions of red and green domain coefficients. This balance is expected in natural non-watermarked images, where approximately half of the blocks exhibit more than 50% violating coefficients, while the other half show less than 50%. The raw localised detection map (to the right of the image) visually represents the result of our detection process. We see that most of the image's area is red, with only small, scattered green clusters. These green blocks, which make up only 8.3% of the total and are scattered rather randomly across the image, represent natural statistical variations rather than the presence of a watermark.

It is worth noting that small clusters require careful interpretation. Dense clusters may naturally form in highly monochromatic areas with minimal variation, such as the black shadow in the top right corner of our example. In these regions, blocks tend to have similar block seeds (mean colours) and therefore similar DWT coefficients. If these coefficients happen to fall in the "watermarked" regions of our detector, a green cluster may appear even in non-watermarked images.

Example: Inserted Non-watermarked Content. Conversely, Fig. 3 demonstrates a scenario where non-watermarked content has been incorporated into a watermarked image. The detection map shows a prominent red cluster (corresponding to the inserted couple) within a predominantly green area. This demonstrates the utility of our localised approach for digital forensics and helps highlight the impact of manipulations on the distributed watermark.
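The isolated-block recolouring used to clean up the map might be implemented as follows; treating "entirely surrounded" as the 8-neighbourhood and leaving border blocks untouched are our assumptions.

```python
import numpy as np

def smooth_map(m):
    """Visual-aid post-process on a 0/1 block map: a green (1) block whose eight
    neighbours are all red (0) becomes red, and vice versa. Interior blocks only;
    this filter is never applied before computing the global score."""
    out = m.copy()
    for i in range(1, m.shape[0] - 1):
        for j in range(1, m.shape[1] - 1):
            nb = np.concatenate(
                [m[i - 1, j - 1:j + 2], m[i, [j - 1, j + 1]], m[i + 1, j - 1:j + 2]]
            )
            if m[i, j] == 1 and nb.sum() == 0:
                out[i, j] = 0          # lone green block: likely false positive
            elif m[i, j] == 0 and nb.sum() == nb.size:
                out[i, j] = 1          # lone red block: likely false negative
    return out
```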

Experiments and Analysis

In this section, we evaluate our watermarking scheme's performance, focusing on its detectability, interpretability for image forensics, and robustness against common image manipulations. We begin by identifying a set of image manipulations that preserve visual quality. We then assess watermark detectability across multiple datasets, examine our scheme's interpretability for image forensics, and evaluate its robustness against common image corruptions. The parameter settings used for our method are detailed in App. C.1.

5.1. Experimental Setup

5.1.1. Datasets

We employ two real-world datasets: MS-COCO [18], from which we select the first 5,000 images of the test set, and WikiArt [21], with the first 1,000 images. Additionally, we utilise two AI-generated datasets: 3,310 images generated by DALL-E 3 [22], accessed through Hugging Face Datasets [15], and 5,000 images from the DiffusionDB dataset [38]. This selection ensures that experiments cover a broad range of image resolutions. A distribution of image sizes for each dataset is provided in Fig. 11 in App. D.2.

5.1.2. Evaluation Metrics

For image quality assessment, we utilise three common metrics to compare watermarked images with their original counterparts: PSNR, where higher values indicate better image quality; SSIM, for which higher scores reflect greater structural similarity to the original image; and LPIPS, where lower scores indicate closer perceptual similarity.
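For reference, PSNR between an original and a watermarked image can be computed directly from the mean squared error:

```python
import numpy as np

def psnr(orig, wm, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the watermarked image
    is closer to the original. Infinite for identical images."""
    mse = np.mean((np.asarray(orig, dtype=float) - np.asarray(wm, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(peak**2 / mse)
```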

To evaluate the robustness of watermarking schemes, we employ two key metrics. Watermark detection rate (WDR), equivalent to the true positive rate (TPR), measures the proportion of watermarked images correctly identified as watermarked. It ranges from 0 to 1, with higher values indicating better performance. False positive rate (FPR) assesses the proportion of non-watermarked images incorrectly identified as watermarked. It also ranges from 0 to 1, with lower values indicating better performance.

5.1.3. Image Manipulations

To assess the robustness of our method, we consider a range of image manipulations, building on the comprehensive evaluation provided by [43]. These include:

  • Brightness and contrast modifications with factor 0.5
  • JPEG compression, using a quality setting of 50
  • Adding Gaussian noise with a standard deviation of 0.05
  • Gaussian blur with a kernel size of 5 and σ = 1
  • Rotating the image by 90 degrees
  • VAE-based image compression models, Bmshj18 [1] and Cheng20 [6], with a quality level of 3.

Our primary focus is on transformations that preserve visual quality. Some of these modifications, particularly the VAE-based compression models (Cheng20 and Bmshj18) and extreme brightness adjustments, can significantly alter the appearance of the image as shown in Fig. 10 of App. D.1. While we include these in our experiments for completeness and comparison with prior work, we consider them outside our primary scope of evaluation.
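Two of the listed manipulations, sketched with NumPy; the clipping behaviour and the [0, 1] scale for the noise standard deviation are our assumptions about the evaluation setup.

```python
import numpy as np

rng = np.random.default_rng(42)

def adjust_brightness(img, factor=0.5):
    """Scale pixel intensities; factor 0.5 halves brightness, as in the evaluation."""
    return np.clip(img.astype(float) * factor, 0, 255).astype(np.uint8)

def add_gaussian_noise(img, sigma=0.05):
    """Additive Gaussian noise with standard deviation 0.05 on the [0, 1] scale."""
    noisy = img.astype(float) / 255.0 + rng.normal(0.0, sigma, img.shape)
    return (np.clip(noisy, 0, 1) * 255).astype(np.uint8)
```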

5.2. Detectability

To demonstrate the effectiveness of our watermark detection across different image types, we analyse the scores generated for both original and watermarked images. In this experiment, we embed the watermark with an interval length of l = 8 and conduct hypothesis tests using a 5% significance level for each block. We then calculate the percentage of watermarked blocks per image. The results are presented in Tab. 1. Our findings show a clear distinction between watermarked and non-watermarked images. For watermarked images, the percentage of detected watermarked area exceeds 90%, while for non-watermarked images, this percentage remains below 20%. This provides strong evidence that our watermark is reliably detectable. We additionally compute FPR and ROC AUC, where we observe a perfect FPR of 0% and ROC AUC of 1 across all datasets.

5.3. Image Forensics

Our method has potential applications in image forensics owing to its ability to produce a localised detection map. As shown in Sec. 4.2.1, this map can highlight areas that may have been modified or inserted post-watermarking by analysing the distribution and clustering of red and green blocks. The same principle applies to subtle image manipulations, such as shape changes using common warp tools.

In Fig. 4, we present an example image of a woman with applied face modifications such as making the eyes and lips bigger, and slightly changing the shape of the nose. Our localised detection map clearly highlights the correct areas of modification. While our watermark is robust to noise or compression, which apply uniformly across the image and primarily affect higher frequency components, even small warping transformations can break it. This sensitivity occurs because we embed our watermark in the LL band of the DWT transform. The DWT decomposes the image into different frequency subbands, with the LL band representing the large-scale approximation of the image. This band contains the most significant information about the edges and shapes, and is sensitive to geometric transformations.

5.4. Robustness

5.4.1. Cropping

We evaluate our method for different cropping strengths and present the results in Fig. 8. Since we are using the brute-force detection algorithm presented in Sec. B.6, we work with a subsample of 1000 images per dataset. We set the stopping criterion to p = 0.8, which yields a FPR of ≈ 0.01 for each dataset. A cropping strength of, for example, 70% means that we randomly remove 30% of the image area. Our results indicate that our method is highly robust against cropping, maintaining a TPR of > 90% for all datasets when up to 30% of the image area is cropped away. Even under radical cropping conditions where half of the image is removed, our method still achieves high TPR values ranging between 76% and 91%. While the performance on all datasets is strong, the results for WikiArt and DALL-E stand out as particularly robust. This can be attributed to their larger image sizes, providing more blocks for evaluation. A higher number of blocks leads to more stable detection statistics and reduces the influence of random variation.
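The brute-force alignment search of Sec. B.6 can be sketched as below; score_fn is a hypothetical stand-in for the block-grid detector (returning the fraction of blocks detected at a given alignment), and p mirrors the stopping criterion above.

```python
import numpy as np

def crop_search(img, score_fn, m=32, p=0.8):
    """Try every possible block-grid offset of a cropped image, score each
    candidate alignment, and stop early once a score reaches the criterion p."""
    best = 0.0
    for dy in range(m):
        for dx in range(m):
            s = score_fn(img[dy:, dx:])   # re-run detection on the shifted grid
            if s >= p:
                return True, (dy, dx), s  # watermark found at this alignment
            best = max(best, s)
    return False, None, best
```

Since only m² offsets exist for an m×m block grid, the search is bounded and benefits from early stopping as soon as the correct alignment is hit.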

5.4.2. Other Image Manipulations

In Tab. 2, we compare the performance of our method with other post-hoc watermarking approaches, based on results reported in [43]. Across all similarity metrics, our method achieves high imperceptibility—producing watermarked images that closely resemble the originals.

In terms of robustness, our method performs comparably to existing techniques when evaluated against common image transformations such as JPEG compression, Gaussian noise, and blur. While certain watermarks outperform ours in overall robustness, particularly under extreme perturbations, our method remains sufficiently resilient to modifications that do not substantially degrade image quality. Our approach is less robust to contrast and brightness adjustments. However, this aligns with our threat model, which assumes a malicious actor is unlikely to apply transformations that visibly distort the image. For example, a brightness factor of 0.5 darkens the image by 50%, resulting in a perceptually significant alteration. Since our method is designed to detect tampering that preserves visual plausibility, we consider this trade-off acceptable within our intended use case.

Conclusion

We propose a simple yet effective post-hoc image watermarking method, MELB, that combines localised embedding with region-level interpretability. When embedding, our method operates in the discrete wavelet transform domain, using a block-wise approach that randomly divides the transform coefficients into allowed and disallowed regions, then projects the image blocks onto the nearest allowed regions. For detection, we adopt a one-sided hypothesis test to determine whether an image is watermarked. Our method outputs not only an overall detection score for the whole image but also a detection map, highlighting the areas where the watermark is found, providing users more transparency. Compared to prior post-hoc methods, MELB offers more interpretable detection while retaining competitive robustness against rotation, JPEG compression, blur, and noise, all while introducing minimal visible artefacts. Notably, our method shows strong robustness to cropping, maintaining detectability even when up to half of the image is cropped away. Future work could explore alternative approaches to seed computation beyond our current method of using the rounded mean colour of a block, e.g., investigating various hashing functions.

Societal Impact

Watermarking improves the traceability and authenticity of digital content, including AI-generated media. This capability offers important societal benefits, such as protecting intellectual property and helping detect misinformation or manipulated content. This traceability can also have unintended consequences, including privacy concerns and potential misuse, such as unauthorised watermarking of unwatermarked content to falsely claim ownership. Our approach mitigates forgery risks through secret-key-based pseudorandom embedding (see Remark 1), and the localised detection map provides transparency to aid forensic verification while recognising limitations such as possible false detection errors (Sec. 4.2.1). Overall, watermarking plays a crucial role in supporting digital content authenticity and traceability, making it a valuable tool. We believe the societal benefits of watermarking in maintaining trust and authenticity outweigh potential risks.

Appendix

A. Further Background

A.1. Discrete Wavelet Transform

The Discrete Wavelet Transform (DWT) is a method that converts an image from its spatial representation to a frequency-based representation. When applied to an image, the DWT creates four frequency subbands: low-low (LL), low-high (LH), high-low (HL), and high-high (HH). To achieve additional levels of DWT, each subband can be further decomposed into four new subbands. The LL subband, containing the lowest frequency components, holds most of the image's energy and important structural information. While it is resilient to subtle, image-wide changes such as noise and compression, it demands precise editing to preserve image quality, as alterations in this band are more likely to create visible artefacts [30]. Higher-frequency subbands (LH, HL, and HH) capture fine details and edges of the image. These components are more easily affected by typical image modifications, such as JPEG compression. To embed a watermark, we carefully alter the coefficients in the LL subband. After embedding, we can reconstruct the original image using the inverse DWT.
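To make the decomposition concrete, the following is a minimal single-level 2-D Haar DWT in NumPy. This is an illustrative sketch only; the paper does not specify which wavelet family it uses. Applying the forward transform recursively to the LL band yields additional decomposition levels.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT: split `img` (even dimensions) into the
    LL (approximation), LH, HL, and HH (detail) subbands, each half-size."""
    a = img[0::2, 0::2].astype(float)  # top-left pixel of each 2x2 cell
    b = img[0::2, 1::2].astype(float)  # top-right
    c = img[1::2, 0::2].astype(float)  # bottom-left
    d = img[1::2, 1::2].astype(float)  # bottom-right
    ll = (a + b + c + d) / 2           # low-pass in both directions
    lh = (a - b + c - d) / 2           # horizontal detail
    hl = (a + b - c - d) / 2           # vertical detail
    hh = (a - b - c + d) / 2           # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: exactly reconstructs the input of haar_dwt2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out
```

After d levels on a k × k sub-block, the LL band has size k/2^d in each direction, which is why the sub-block size must satisfy k ≥ 2^d (Sec. B.2).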

A.2. Entropy of Visual Data

Image entropy, specifically Shannon entropy, quantifies the randomness or information content of an image [28]. It is calculated from the empirical probability distribution of pixel intensities. For a grayscale image, the entropy H is calculated as

H = - Σᵢ p(i) log p(i), (1)

where p(i) represents the relative frequency of intensity level i in the image. Intuitively, regions with high entropy signify areas in the image with a lot of variation in pixel intensities, such as edges, textures, and details. These complex regions are often more suitable for embedding watermarks because the changes introduced by the watermark are less likely to be noticeable to the human eye [26].
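A direct NumPy implementation of Eq. (1) for an 8-bit grayscale block might look as follows (a sketch; we assume a base-2 logarithm, giving entropy in bits):

```python
import numpy as np

def shannon_entropy(block):
    """Shannon entropy (Eq. 1) of an 8-bit grayscale block, in bits."""
    counts = np.bincount(block.ravel(), minlength=256)
    p = counts[counts > 0] / block.size  # empirical intensity distribution
    return float(-np.sum(p * np.log2(p)))
```

A flat block has entropy 0, while a block using all 256 intensity levels equally attains the maximum of 8 bits.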

B. Advanced Considerations and Variants of MELB

This section provides the core algorithmic components behind our watermarking method. We provide pseudocode and formal descriptions of the embedding and detection procedures and formulate the statistical hypothesis test used for detection. We also present an optional extension of our method that strengthens the system against brute-force attacks and elaborate on the details of two further refinements that were used in our experiments in Sec. 5.4.

B.1. Algorithms

In this section, we present additional algorithms for the detection process, as described in Sec. 4.1 and Sec. 4.2. Algorithm 1 and Algorithm 4 represent the main detection steps, while Algorithm 2 summarises the evaluation procedure. The statistical hypothesis test refers to the procedure described in Sec. 4.2, where we determine whether a block is watermarked by assessing the number of its DWT coefficients that fall within the allowed domain. The optional post_process step, introduced in Sec. 4.2.1, applies a low-pass filter to the detection map to enhance its utility as a visual aid. This post-processing step does not influence the final image-level score.

B.2. Parameter Configuration and Trade-offs

The effectiveness of our watermarking method depends on multiple parameters. The most crucial is the interval length l, which controls the trade-off between watermark robustness and visual imperceptibility: shorter interval lengths result in less visible watermarks but decreased robustness, while longer intervals enhance robustness at the potential cost of visual perceptibility (see Fig. 6). This is because larger interval lengths provide greater tolerance for variations in the DWT coefficients, allowing them to undergo minor modifications while staying in the green domain. When choosing very large interval lengths, we suggest embedding the watermark only in high-entropy areas to reduce the visual impact.

It is practical to define a consistent range [-r, r] for the DWT coefficients that applies to all watermarked images. This ensures that the intervals are uniformly divided and always start from the same reference point, allowing reliable reconstruction of the interval colouring during detection. While we could compute the theoretical bounds for DWT coefficients and work with the whole range, most coefficients in natural images fall significantly below this bound. To reduce computational complexity and storage, we define the coefficient range as a parameter, chosen to cover the region where most DWT coefficients are expected to lie, and treat each value outside this range as allowed.

We propose to define the get_seed function as the rounded mean colour of each image block. Generating distinct seeds per block is crucial, as it prevents uniform allowed value ranges across blocks, thereby strengthening security by avoiding repetitive patterns exploitable by attackers. Additionally, deriving each seed from the block's content enhances robustness to cropping, since the seed can be independently recovered even if parts of the image are removed. Using randomly generated seeds for each block would instead necessitate storing all seeds for detection, which is impractical at scale due to high storage overhead. The rounded mean colour therefore strikes a practical balance between security, efficiency, and storage complexity. Exploring more sophisticated hash functions for seed generation is left as future work.

When the seed is derived from the rounded mean colour of the block, the rounding value presents a trade-off between robustness against image manipulations and forgeability. In general, a smaller rounding value creates more distinct seed values across the image, making it harder for malicious actors to gather sufficient samples of allowed and disallowed modifications for any particular seed value. This is critical because if too many blocks share the same seed value, an attacker could analyse the pattern of allowed and disallowed modifications across these blocks to approximate the watermarking rules. For instance, if multiple blocks with the same mean colour exhibit consistent patterns of allowed colour modifications, an attacker could learn and exploit these patterns to forge or remove the watermark. However, the rounding value cannot be too small, as it must also provide sufficient robustness against simple mean-shifting attacks. In Sec. D.3.1 we demonstrate experimentally that a rounding value of 30 achieves this balance effectively: any attempt to shift block mean values enough to change their rounding behaviour results in severe degradation of visual quality.

When employing DWT in image watermarking, it is common practice to apply two or three levels of DWT to enhance robustness [7, 16]. On top of this, we observe that using more DWT levels helps to make the watermark more imperceptible, as illustrated in Fig. 5, since the embedding is spread more subtly across the transformed coefficients. In Tab. 3, we report the WDR for different levels of DWT. We observe that the level does not affect the WDR, as all results are at ≈ 100%. For this reason, it is generally preferable to choose a larger number of levels d. The number of DWT levels d that can be applied to each sub-block is constrained by the sub-block size k: since each level of DWT decomposition halves the dimensions of the image (or subband) in both directions, k must be greater than or equal to 2^d. Additionally, the block size m must be a multiple of k to ensure proper application of the DWT decomposition to the whole image area.
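As a sketch of the seed idea (the exact function in the paper may differ, and the channel-combination step here is our own illustrative choice), a seed derived from the rounded mean colour could look like:

```python
import numpy as np

ROUNDING = 30  # rounding value; Sec. D.3.1 finds 30 to be a good balance

def get_seed(block):
    """Per-block seed from the block's mean colour, rounded to a multiple
    of ROUNDING so that mild manipulations leave the seed unchanged."""
    mean = block.reshape(-1, block.shape[-1]).mean(axis=0)       # per channel
    rounded = (np.round(mean / ROUNDING) * ROUNDING).astype(int)
    # Fold the rounded channel means into one integer (illustrative choice).
    return int(rounded[0]) * 1_000_000 + int(rounded[1]) * 1_000 + int(rounded[2])
```

Because the seed depends only on the block's own content, it can be recomputed from a cropped image, and an attacker must shift the block mean past a rounding boundary to change it.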

B.3. Hypothesis Test for Watermark Detection

To reduce false positives, we use a statistical test per block instead of a fixed 50% threshold, making the method more robust to natural variation. Our one-sided hypothesis test for each block (conducted at a 5% significance level in our experiments; the level can be adjusted by the user) is formulated as follows:

H0: The block is not watermarked; it violates the red-green rule in at least 50% of its DWT coefficients.
H1: The block is watermarked.

The significance level of the hypothesis test can be used to control the false positive rate. Increasing the significance threshold reduces the likelihood of misclassifying unwatermarked blocks as watermarked, at the potential cost of slightly reduced robustness against manipulations. If the null hypothesis is true, the number of green-list coefficients C_G has an expected value of N/2 and a variance of N/4, where N is the total number of DWT coefficients after applying d levels of DWT to the block. We reject the null hypothesis when C_G meets the threshold

C_G ≥ z_{0.95} · √N/2 + N/2, (2)

where z_{0.95} denotes the 95th percentile of the standard normal distribution.

Example 1. To better understand the impact of scheme parameters on detection, consider example parameter settings. For a block size of m = 96 and sub-block size of k = 8 with d = 3 levels of DWT, Eq. (2) yields a threshold of c ≈ 82. This means that for a block to reject the null hypothesis and be considered watermarked, at least 82 out of 144 DWT coefficients (> 56.94%) must be from the green domain.
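The threshold in Eq. (2) can be computed with the standard library alone. This is a sketch under the stated assumptions; in particular, we infer N = (m/k)² · (k/2^d)² coefficients per block from the numbers in Example 1.

```python
import math
from statistics import NormalDist

def detection_threshold(m, k, d, alpha=0.05):
    """Smallest green-coefficient count that rejects H0 at level alpha,
    per Eq. (2): C_G >= z_{1-alpha} * sqrt(N)/2 + N/2."""
    n = (m // k) ** 2 * (k // 2 ** d) ** 2  # DWT coefficients per block
    z = NormalDist().inv_cdf(1 - alpha)     # z_{0.95} ~ 1.645 for alpha=0.05
    return math.ceil(z * math.sqrt(n) / 2 + n / 2)
```

For m = 96, k = 8, d = 3 this gives N = 144 and a threshold of 82, matching Example 1.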

B.4. Strengthening against Brute-Force Attacks

In Sec. 4.1 we assume that the user generates a single secret key sk which we keep fixed for the whole image. Our system can be strengthened against brute-force attacks by employing K > 1 distinct secret keys for watermark embedding. Instead of using a single secret key for the entire image, the algorithm randomly selects a key from a pre-defined set for each m × m block. During detection, we evaluate the image using all K possible keys and correct for multiple hypotheses via Bonferroni correction. When partitioning DWT coefficients, half of the intervals are defined as allowed regions, meaning that each coefficient will be in the allowed domain for K/2 keys and in the disallowed domain for the other K/2 [17]. This approach helps prevent statistical attacks since no consistent patterns can be easily discovered when analysing large numbers of watermarked images, even when an attacker attempts to aggregate statistics across multiple images.
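The multiple-testing correction can be sketched as follows (illustrative only; `p_values` would come from running detection once per candidate key):

```python
def multikey_detect(p_values, alpha=0.05):
    """Declare a watermark detected if any of the K per-key detection
    p-values is significant after Bonferroni correction (alpha / K)."""
    corrected = alpha / len(p_values)
    return min(p_values) < corrected
```

Dividing the significance level by K keeps the overall false positive rate at most alpha even though K hypotheses are tested per image.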

B.5. Adaptive Embedding Based on Block Entropy

The interval length l of our watermark presents a trade-off between visibility and robustness. Higher values of l result in watermarks that are more robust to image manipulations but may become visible in low-entropy regions (see Fig. 6). To preserve this robustness while maintaining imperceptibility, we propose to embed the watermark only in high-entropy areas. For the experiments in Sec. 5.4 we implement the following procedure: before executing line 5 of Algorithm 1 (and, for detection, before line 12 of Algorithm 4), we calculate the entropy H(b) of each block b according to Eq. (1). Subsequently, we determine the entropy threshold τ = median{H(b₁), ..., H(bₙ)}, where n is the total number of blocks in the image. Only blocks with entropy above the threshold (H(b) > τ) are processed for watermark embedding or detection. Note that while we use the median as our threshold, alternative percentile thresholds could be selected by the user depending on the desired embedding capacity and imperceptibility. Excluding low-entropy regions from watermarking is generally reasonable, as these areas typically correspond to monochromatic or background regions with minimal information rather than the actual content we seek to protect. This approach aligns with similar entropy-based filtering strategies employed in previous work [17].
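The selection step can be sketched directly (assuming 8-bit grayscale blocks and the entropy of Eq. (1)):

```python
import numpy as np

def high_entropy_blocks(blocks):
    """Return indices of blocks whose entropy exceeds the median threshold
    tau = median{H(b_1), ..., H(b_n)}, as used for adaptive embedding."""
    def entropy(b):
        counts = np.bincount(b.ravel(), minlength=256)
        p = counts[counts > 0] / b.size
        return float(-np.sum(p * np.log2(p)))
    ents = [entropy(b) for b in blocks]
    tau = float(np.median(ents))
    return [i for i, h in enumerate(ents) if h > tau]
```

Swapping `np.median` for another percentile (e.g. `np.percentile(ents, 75)`) adjusts the embedding capacity as described above.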

B.6. Crop-Resilient Detection

Since we embed the watermark into blocks throughout the image, we can implement a detection method that identifies watermarks even in cropped images. As visualised in Fig. 7, this detection is based on a brute-force strategy that systematically searches for the correct starting point of the embedded blocks. To detect a watermark in a potentially cropped image, we iterate through the image in m × m pixel blocks. Since the original starting point of the watermark grid is unknown in a cropped image, we exhaustively test all possible starting positions for the first m × m block. For the detection process, we propose three different strategies:
1. Score-based selection: We collect the detection scores for all possible grid alignments and output the highest identified score.
2. Early stopping via fixed threshold: We choose a detection threshold calibrated to our desired FPR and stop the search once this threshold is met.
3. Early stopping via hypothesis testing: We evaluate the percentage of watermarked blocks for each iteration, performing a one-sided hypothesis test to determine whether the proportion of watermarked blocks exceeds 50% at a specified significance level. If so, we conclude that a watermark has been detected and stop the search.
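The brute-force search over grid origins can be sketched as below. Here `score_fn` is our placeholder for the per-alignment block evaluation, and the optional fixed threshold corresponds to the early-stopping variant.

```python
def crop_resilient_detect(m, score_fn, threshold=None):
    """Test every possible origin (dy, dx) of the m x m block grid.

    Returns the best detection score found (score-based selection), or
    stops early as soon as `threshold` is reached, if one is given."""
    best = 0.0
    for dy in range(m):
        for dx in range(m):
            score = score_fn(dy, dx)
            if threshold is not None and score >= threshold:
                return score                 # early stopping
            best = max(best, score)
    return best                              # highest score over all grids
```

The search space is m² candidate origins, which motivates the subsampling to 1000 images per dataset in the cropping experiments.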

C. Further Experimental Details

C.1. Embedding Settings

We apply d = 3 levels of DWT, and accordingly set the sub-block size to the smallest allowed value k = 2^d = 8. We partition the domain of the DWT coefficients into intervals of size l over the range [-3000, 3000] and treat values outside of this range as allowed (green). To minimise the visible impact of the introduced perturbations after reversing the transform, we restrict the maximum perturbation to at most ±3l. If no green intervals are found within this range, the coefficient remains unchanged. This restriction is justified as the probability of all six candidate intervals (the three nearest on each side) being red is very low, approximately (1/2)^6 ≈ 0.016.
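To illustrate the embedding constraint, the following sketch projects a coefficient onto the nearest allowed (green) interval within ±3l. The keyed colouring via a seeded generator is our own stand-in for the paper's pseudorandom partition, not its actual construction.

```python
import numpy as np

R, L_INT = 3000, 8  # coefficient range [-R, R] and interval length l

def is_green(idx, key):
    """Keyed pseudorandom colouring: interval `idx` is green w.p. 1/2."""
    return int(np.random.default_rng((key, idx)).integers(0, 2)) == 1

def project_to_green(x, key):
    """Return x moved to the centre of the nearest green interval within
    +/- 3 intervals, or x unchanged if it is already green, lies outside
    [-R, R] (always allowed), or all six neighbouring intervals are red."""
    if abs(x) > R:
        return x
    idx = int((x + R) // L_INT)
    if is_green(idx, key):
        return x
    n_intervals = (2 * R) // L_INT
    for step in (1, 2, 3):
        for j in (idx - step, idx + step):
            if 0 <= j < n_intervals and is_green(j, key):
                return -R + (j + 0.5) * L_INT  # centre of interval j
    return x  # probability of this fallback is (1/2)^6 ~ 0.016
```

Since only intervals within three steps are considered, the introduced perturbation is bounded by roughly 3.5l per coefficient.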

For the detectability, image forensics, and cropping experiments (Sections 5.2, 5.3, and 5.4.1), we set the interval length to l = 8. To assess the method's behaviour under geometric transformations (Sec. 5.4.2), we increase the interval length to l = 14 and selectively embed the watermark in high-entropy regions. Specifically, we calculate the median entropy across all blocks and embed the watermark only in blocks with entropy values above this median threshold, thereby excluding regions with very low entropy values.

C.2. Choosing the Detection Threshold

For a direct comparison with the results presented in [43], we follow their methodology: for watermarking techniques that embed bitstrings, detection thresholds are defined to reject the null hypothesis (i.e., the absence of a watermark) at p < 0.01. This requires the correct detection of at least 24 out of 32 bits for 32-bit methods, and 61 out of 96 bits for StegaStamp. For score-based watermarking methods, the threshold parameter p is calibrated to achieve reasonable FPRs. Following this approach, our score-based method is calibrated to match similar FPR values as in [43] across different datasets. Through empirical evaluation of various threshold settings (Tab. 4), we determined the optimal threshold values: p = 0.37 for the COCO dataset, p = 0.36 for DiffusionDB, and p = 0.46 for WikiArt.
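The bitstring thresholds can be reproduced with an exact binomial tail computation (a sketch; under the null hypothesis each bit matches with probability 1/2):

```python
from math import comb

def min_bits_to_detect(n_bits, p_max=0.01):
    """Smallest t with P(X >= t) < p_max for X ~ Binomial(n_bits, 1/2),
    i.e. the fewest correct bits needed to reject 'no watermark'."""
    total = 2 ** n_bits
    tail = 0
    for t in range(n_bits, -1, -1):
        tail += comb(n_bits, t)       # add P(X = t) numerator to the tail
        if tail / total >= p_max:     # tail no longer below p_max at t
            return t + 1
    return 0
```

For 32-bit methods this recovers the 24-of-32 requirement quoted above.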

D. Additional Figures and Analysis

In this section, we provide more details on our evaluations and additional figures.

D.1. Image Quality of Manipulated Images

We first illustrate two example images to demonstrate how the same set of modifications can have different visual effects depending on the input image. One of the images is more robust to the changes, while the other reacts more sensitively, as shown in Fig. 10. Based on this observation, we further quantify the variability in modification effects by evaluating each manipulation using a set of similarity metrics. Fig. 12 presents the results for each similarity metric individually. It is evident that some manipulations, such as JPEG compression, are consistently less perceptible across all metrics, while others (e.g., Bmshj18) cause significant disruptions to image similarity. As each metric captures a different aspect of perceptual similarity, we cannot observe a consistent ranking of manipulations based on image quality degradation. To provide a more comprehensive overview, we normalise all metric scores to the range [0, 1] and combine them into a unified scale, shown in Fig. 9. As the transformations Cheng20, Bmshj18 and Brightness failed to achieve 50% of the average quality metric, we deemed these insufficient to achieve the attacker's goals and did not consider them further.

D.2. Distribution of Image Sizes

To ensure diversity in our evaluation, we include datasets with varying image sizes. WikiArt images are the largest, with a median size over 9× larger than MS-COCO and nearly 2.5× larger than DALL-E. DALL-E and DiffusionDB represent mid-sized images, with DALL-E being notably consistent in resolution, while DiffusionDB shows broader variability. An overview of the pixel count distributions is shown in Fig. 11. When it comes to image forensics and analysing the detection map, our watermarking method performs best when applied to large images. In these cases, it becomes easier to analyse the patterns of red squares in the localised detection map. The larger scale helps us distinguish between actual modifications and random false positives, which can occur in small numbers even in unaltered images.

D.3. Additional Robustness Evaluations

In this section, we provide additional evaluation of our watermarking method's robustness against various image manipulations. To complement the cropping analysis presented in Sec. 5.4.1, we include Fig. 8 for completeness, which visualises the robustness of our method to different levels of cropping. In addition to the results presented in Sec. 5.4.2, we assess the detection performance of our method under various image modifications:
• JPEG compression with quality factors 30, 50, 70 and 90.
• Gaussian blur with standard deviation 1 and kernel sizes of 3, 5 and 7.
• Additive Gaussian noise with standard deviation σ ∈ {0.02, 0.04, 0.05, 0.06, 0.08}.
• Contrast and brightness adjustment with factors ∈ {0.5, 0.8, 1.0, 1.2, 1.6, 2.0}.

We present ROC curves for each type of distortion in Fig. 13. Our watermarking scheme exhibits high robustness against JPEG compression: detection performance is nearly perfect for quality factors 90, 70, and 50, with AUC values close to 1.00, and even under strong compression with a quality factor of 30 the method maintains good detectability, achieving an AUC of 0.83. Similarly, the method shows excellent robustness to Gaussian blurring, as depicted in Fig. 13b. For a kernel size of 3, the AUC is 1.00, indicating perfect detection, and increasing the kernel size results in only a marginal decrease in performance (AUC of 0.99). The performance under additive Gaussian noise is evaluated in Fig. 13c: watermark detection remains high as the noise strength increases to σ = 0.06 (AUC = 0.91) and σ = 0.08 (AUC = 0.81). Contrast and brightness adjustments, however, pose a greater challenge (see Fig. 13d and Fig. 13e). The detection performance is weaker across all tested factors. These modifications directly alter the pixel intensity values and the overall dynamic range of the image. Given that the DWT coefficients are computed from these pixel values, substantial alterations to the input data necessarily lead to significant changes in the calculated coefficient magnitudes and distributions.

Overall, this additional evaluation indicates that our watermarking method is highly robust to common distortions like JPEG compression, Gaussian blur, and additive Gaussian noise. While detection performance can be impacted by strong contrast and brightness adjustments, such significant modifications typically result in substantial and perceptible visual degradation of the image content. Consequently, the practical relevance of this sensitivity is reduced in scenarios where maintaining the visual authenticity and quality of the content is important.

D.3.1. Attacking the Seed

To evaluate the robustness of the detection process of our watermarking scheme, we design a straightforward attack targeting the sensitivity of the block seed (the get_seed function in Algorithms 1 and 4; see Algorithm 3 for a definition). The attack aims to modify the mean colour of each block, which is crucial for watermark detection. Specifically, we explore three variations of this attack:
1) Adjusting the mean colour to the nearest value that would round to a different integer under our rounding-to-30 scheme.
2) Increasing the mean colour to the next higher value that would result in a different rounded integer.
3) Decreasing the mean colour to the next lower value that would yield a different rounded integer.
In Fig. 14 we demonstrate example outputs from this attack. All three variations necessitated substantial alterations to the mean colour values, producing visual artefacts that significantly compromise image quality. These results demonstrate the difficulty of removing the watermark by manipulating the block seeds.

References

[1] Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. In International Conference on Learning Representations, 2018. 7

[2] Abdullah Bamatraf, Rosziati Ibrahim, Mohd Salleh, and Mohd Najib. A new digital watermarking algorithm using combination of least significant bit (LSB) and inverse bit. Journal of Computing, 3(4):1–8, 2011. 1

[3] Mahbuba Begum and Mohammad Shorif Uddin. Digital image watermarking techniques: A review. Information, 11(2), 2020. 2

[4] Christopher Campbell. Picture of a woman with red hair. Unsplash Image, 2015. Retrieved January 18, 2025. 8

[5] Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, and Furu Wei. WavMark: Watermarking for audio generation, 2024. 1

[6] Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. Learned image compression with discretized gaussian mixture likelihoods and attention modules. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7939–7948, 2020. 7

[7] Rita Choudhary and Girish Parmar. A robust image watermarking technique using 2-level discrete wavelet transform (DWT). In 2016 2nd International Conference on Communication Control and Intelligent Systems (CCIS), pages 120-124, 2016. 13

[8] Ingemar Cox, Matthew Miller, Jeffrey Bloom, Jessica Fridrich, and Ton Kalker. Digital Watermarking and Steganography. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2 edition, 2007. 1, 2, 8

[9] Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning, 2017. 1

[10] Pierre Fernandez, Alexandre Sablayrolles, Teddy Furon, Hervé Jégou, and Matthijs Douze. Watermarking images in self-supervised latent spaces. In ICASSP 2022-IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3054-3058, 2022. 2, 8

[11] Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22466-22477, 2023. 2

[12] Daniel Gruhl, Anthony Lu, and Walter Bender. Echo hiding. In Information Hiding, pages 295-315, Berlin, Heidelberg, 1996. Springer Berlin Heidelberg. 1

[13] Adil Haouzia and Rita Noumeir. Methods for image authentication: a survey. Multimedia Tools Appl., 39(1):1-46, 2008. 2

[14] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, pages 6840-6851, 2020. 2

[15] Evgeniy Hristoforu. DALL-E 3 images. https://huggingface.co/datasets/ehristoforu/dalle-3-images, 2023. Accessed: January 17, 2025. 6

[16] Nikita Kashyap and G. R. Sinha. Image watermarking using 3-level discrete wavelet transform (DWT). International Journal of Modern Education and Computer Science, 4:50-56, 2012. 13

[17] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In Proceedings of the 40th International Conference on Machine Learning, pages 17061-17084. PMLR, 2023. 1, 3, 5, 13, 14

[18] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014, pages 740-755, Cham, 2014. Springer International Publishing. 6

[19] Rui Ma, Mengxi Guo, Yi Hou, Fan Yang, Yuan Li, Huizhu Jia, and Xiaodong Xie. Towards blind watermarking: Combining invertible and non-invertible mechanisms. In Proceedings of the 30th ACM International Conference on Multimedia, page 1532-1542. ACM, 2022. 2, 8

[20] Minzhou Pan, Yi Zeng, Xue Lin, Ning Yu, Cho-Jui Hsieh, Peter Henderson, and Ruoxi Jia. JIGMARK: A black-box approach for enhancing image watermarks against diffusion model edits, 2024. 2

[21] Fred Phillips and Brandy Mackintosh. Wiki Art Gallery, Inc.: A case for critical thinking. Issues in Accounting Education, 26(3):593-608, 2011. 6

[22] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In Proceedings of the 38th International Conference on Machine Learning, pages 8821-8831. PMLR, 2021. 6

[23] Vinu Sankar Sadasivan, Mehrdad Saberi, and Soheil Feizi. IConMark: Robust interpretable concept-based watermark for AI images. arXiv preprint arXiv:2507.13407, 2025. 2

[24] Robin San Roman, Pierre Fernandez, Hady Elsahar, Alexandre Défossez, Teddy Furon, and Tuan Tran. Proactive detection of voice cloning with localized watermarking. In Proceedings of the 41st International Conference on Machine Learning, pages 43180-43196. PMLR, 2024. 3

[25] Tom Sander, Pierre Fernandez, Alain Oliviero Durmus, Teddy Furon, and Matthijs Douze. Watermark anything with localized messages. In The Thirteenth International Conference on Learning Representations, 2025. 2

[26] Jordi Serra-Ruiz, Amna Qureshi, and David Megías. Entropy-based semi-fragile watermarking of remote sensing images in the wavelet domain. Entropy, 21(9), 2019. 11

[27] Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, and Ben Y. Zhao. Glaze: Protecting artists from style mimicry by Text-to-Image models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 2187-2204. USENIX Association, 2023. 1

[28] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379-423, 1948. 11

[29] Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, and Kui Ren. Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution. In Proceedings 2025 Network and Distributed System Security Symposium. Internet Society, 2025. 2

[30] Sunpreet Sharma, Ju Zou, Gu Fang, Pancham Shukla, and Weidong Cai. A review of image watermarking for identity protection and verification. Multimedia Tools and Applications, 83:31829-31891, 2024. 1, 2, 11

[31] Roop Singh, Mukesh Saraswat, Alaknanda Ashok, Himanshu Mittal, Ashish Tripathi, Avinash Chandra Pandey, and Raju Pal. From classical to soft computing based watermarking techniques: A comprehensive review. Future Generation Computer Systems, 141:738–754, 2023. 1

[32] Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 180-186. Association for Computing Machinery, 2020. 2

[33] Nandhini Subramanian, Omar Elharrouss, Somaya AlMaadeed, and Ahmed Bouridane. Image steganography: A review of the recent advances. IEEE Access, 9:23409-23423, 2021. 1

[34] Matthew Tancik, Ben Mildenhall, and Ren Ng. StegaStamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2117-2126, 2020. 1, 2, 8

[35] European Union. Regulation (EU) 2024/1689: Artificial Intelligence Act. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689, 2024. Accessed: 2025-04-04. 1

[36] M. Utku Celik, G. Sharma, E. Saber, and A. Murat Tekalp. Hierarchical watermarking for secure image authentication with localization. IEEE Transactions on Image Processing, 11(6):585-595, 2002. 2

[37] Cristian Vaccari and Andrew Chadwick. Deepfakes and disinformation: Exploring the impact of synthetic political video on deception, uncertainty, and trust in news. Social Media + Society, 6(1), 2020. 1

[38] Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, and Duen Horng Chau. DiffusionDB: A large-scale prompt gallery dataset for text-toimage generative models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 893-911. Association for Computational Linguistics, 2023. 6

[39] Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. Tree-rings watermarks: Invisible fingerprints for diffusion images. In Advances in Neural Information Processing Systems, pages 58047–58063, 2023. 1, 2

[40] Xiang-Gen Xia, Charles G. Boncelet, and Gonzalo R. Arce. Wavelet transform based watermark for digital images. Opt. Express, 3(12):497-511, 1998. 2

[41] Heng Zhang, Chengyou Wang, and Xiao Zhou. A robust image watermarking scheme based on svd in the spatial domain. Future Internet, 9(3), 2017. 2

[42] Kevin Alex Zhang, Lei Xu, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Robust invisible video watermarking with attention, 2019. 1, 2, 8

[43] Lijun Zhang, Xiao Liu, Antoni Martin, Cindy Bearfield, Yuriy Brun, and Hui Guan. Attack-resilient image watermarking using stable diffusion. Advances in Neural Information Processing Systems, 37:38480-38507, 2024. 1, 2, 7, 8, 15

[44] Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, and Jian Zhang. Editguard: Versatile image watermarking for tamper localization and copyright protection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11964-11974, 2024. 2

[45] Y Zhao, B Liu, T Zhu, M Ding, X Yu, and W Zhou. Proactive image manipulation detection via deep semi-fragile watermark. Neurocomputing, 585:127593, 2024. 3

[46] Yihao Zheng, Haocheng Xia, Junyuan Pang, Jinfei Liu, Kui Ren, Lingyang Chu, Yang Cao, and Li Xiong. TabularMark: Watermarking tabular datasets for machine learning. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, pages 3570-3584. Association for Computing Machinery, 2024. 3

Enterprise Process Flow: MELB Watermark Embedding

Original Image
Separate Colour Channels
Divide into Blocks
Compute Mean Colour & Generate Seed
Partition DWT Coeffs (Red/Green)
Embed Watermark (Perturb Coeffs)
Apply Inverse DWT
Watermarked Image
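The pipeline above can be condensed into a runnable sketch. This is a minimal illustration, not the paper's implementation: it assumes an orthonormal Haar wavelet at a single decomposition level, grayscale blocks, an additive red/green coefficient perturbation, and a toy seed function (`key * 1_000_003 + rounded mean`) standing in for whatever keyed derivation MELB actually uses.

```python
import numpy as np

def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform: returns (LL, (LH, HL, HH))."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (a + b + c + d) / 2, ((a + b - c - d) / 2,
                                 (a - b + c - d) / 2,
                                 (a - b - c + d) / 2)

def haar_idwt2(ll, bands):
    """Inverse of haar_dwt2."""
    lh, hl, hh = bands
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def block_seed(block, key):
    """Toy per-block seed from the secret key and the block's rounded mean."""
    return (key * 1_000_003 + int(round(float(block.mean())))) % 2**32

def embed_block(block, key, strength=2.0):
    """Shift a pseudo-random 'green' half of the LL coefficients up and the
    'red' half down, then invert the transform."""
    ll, bands = haar_dwt2(block)
    rng = np.random.default_rng(block_seed(block, key))
    green = rng.random(ll.shape) < 0.5
    return haar_idwt2(ll + np.where(green, strength, -strength), bands)

def detect_block(block, key):
    """Statistic mean(green LL) - mean(red LL): ~2*strength if watermarked,
    ~0 for unmarked or tampered blocks."""
    ll, _ = haar_dwt2(block)
    rng = np.random.default_rng(block_seed(block, key))
    green = rng.random(ll.shape) < 0.5
    return ll[green].mean() - ll[~green].mean()
```

Running `detect_block` over every block of an image yields the kind of block-level map the method's interpretable detection is built on: intact watermarked blocks score near `2 * strength`, while unmarked or altered blocks score near zero.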

MELB's Localised Detection vs. Traditional Global Scores

MELB provides granular insights into watermark presence and tampering, unlike traditional methods that offer only a binary or global assessment.

| Feature | MELB (Our Method) | Traditional Schemes |
| --- | --- | --- |
| Output | Localised detection map | Binary (yes/no) or global score |
| Transparency | Highlights specific altered/watermarked regions | Black box; no explanation |
| Tamper assessment | Identifies where tampering occurred | Only identifies whether tampering occurred |
| Robustness to cropping | High (brute-force search for the block grid) | Often limited; watermark fragile to spatial changes |
| Interpretability | Aids image forensics and localised tamper assessment | Lacks reasoning for decisions |
| Embedding domain | DWT (LL band) | Spatial, frequency, or latent feature spaces |
91% True Positive Rate (TPR) for 50% Cropped Images

MELB demonstrates strong robustness, maintaining over 90% TPR even when half of the image is cropped, showcasing its resilience to partial content removal.
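The table above attributes this cropping robustness to a brute-force search for the block grid. The 1D toy below sketches that idea under stated assumptions — the fixed aperiodic ±1 pattern, block size, and zero host signal are illustrative stand-ins for the keyed coefficient partition: cropping shifts the block grid by an unknown offset, and the detector recovers alignment by scoring every candidate offset.

```python
import numpy as np

B = 8                                        # block size
# Stand-in keyed per-block pattern (chosen aperiodic so the offset is unique).
pattern = np.array([1., 1., 1., -1., -1., 1., -1., -1.])

signal = np.tile(pattern, 10)                # watermark tiled over 10 blocks
cropped = signal[5:]                         # cropping shifts the grid by 5

def grid_score(x, off):
    """Mean correlation of x, read in B-sized blocks from offset off,
    against the keyed pattern."""
    n = (len(x) - off) // B * B
    return float((x[off:off + n].reshape(-1, B) * pattern).mean())

# Brute-force search over all possible grid offsets.
best = max(range(B), key=lambda off: grid_score(cropped, off))
# best == (B - 5) % B == 3 re-aligns the detector with the surviving blocks
```

Only the surviving, correctly re-aligned blocks contribute to the score, which is why detection degrades gracefully rather than failing outright as more of the image is removed.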

Case Study: Impact of DWT Levels on Imperceptibility

Challenge: Achieving both strong robustness and high imperceptibility is a critical trade-off in watermarking. Spreading the watermark too widely can reduce robustness, while concentrating it too much can increase visibility.

Solution: Our method leverages higher DWT levels (d) to embed the watermark more subtly across transformed coefficients. This strategy diffuses the embedding changes, making them less perceptible to the human eye while maintaining robustness.

Outcome: As illustrated in Figure 5, increasing the DWT level from d=1 to d=3 markedly improves the imperceptibility of the embedded watermark without degrading the Watermark Detection Rate (WDR), which remains close to 100% across levels (Table 3).
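The diffusion effect behind this case study can be reproduced with a toy computation. Assuming an orthonormal Haar wavelet (the paper's wavelet family may differ), a fixed-size perturbation of a single LL coefficient at level d spreads over a 2^d × 2^d pixel region with amplitude eps/2^d — larger d, fainter per-pixel change:

```python
import numpy as np

def haar_dwt2(x):
    """One-level orthonormal 2D Haar analysis."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (a + b + c + d) / 2, ((a + b - c - d) / 2,
                                 (a - b + c - d) / 2,
                                 (a - b - c + d) / 2)

def haar_idwt2(ll, bands):
    """One-level synthesis (inverse of haar_dwt2)."""
    lh, hl, hh = bands
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def perturbation_footprint(d, eps=8.0, size=32):
    """Add eps to one LL coefficient at level d of a zero image and return
    (max per-pixel change, number of pixels affected)."""
    x, stack = np.zeros((size, size)), []
    for _ in range(d):
        x, bands = haar_dwt2(x)
        stack.append(bands)
    x[0, 0] += eps
    for bands in reversed(stack):
        x = haar_idwt2(x, bands)
    return x.max(), int((x != 0).sum())

for d in (1, 2, 3):
    # amplitude eps / 2**d spread over 4**d pixels
    print(d, perturbation_footprint(d))
```

The same total embedding energy lands more softly on each pixel as d grows, which matches the imperceptibility gain reported from d=1 to d=3.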

Addressing Seed Manipulation Attacks

Each block's embedding seed is derived from its rounded mean colour, so an attacker who tries to disrupt detection by modifying the mean colour must shift the block's intensity enough to change the rounded value — which introduces visible artifacts:

Mean Colour Rounding (Seed Generation)
Attack: Modify Mean Colour
Disrupt Seed Reconstruction
Result: Visual Artifacts
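The defence can be illustrated with the seed derivation itself. A minimal sketch, assuming a toy mix (`key * 1_000_003 + rounded mean`) in place of whatever keyed derivation the paper uses: rounding makes the seed tolerant to sub-unit noise, so the only way to change the seed is to shift the block's mean colour by a visible amount.

```python
import numpy as np

def block_seed(block, key):
    """Hypothetical seed derivation: mix a secret key with the block's
    rounded mean intensity (the 'Mean Colour Rounding' step above)."""
    mean = int(round(float(np.mean(block))))
    return (key * 1_000_003 + mean) % 2**32

block = np.full((16, 16), 100.2)

# Benign sub-rounding changes (mild noise, compression error) keep the seed...
assert block_seed(block + 0.1, key=7) == block_seed(block, key=7)

# ...while an attack that shifts the mean colour changes the seed, breaking
# the detector's pseudo-random partition -- at the cost of visibly
# brightening or darkening the block.
assert block_seed(block + 1.0, key=7) != block_seed(block, key=7)
```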


Your Enterprise AI Adoption Roadmap

A phased approach to integrating MELB into your existing content pipelines, ensuring a smooth transition and maximum impact.

Phase 1: Pilot & Proof of Concept (Weeks 1-4)

Integrate MELB with a small, representative dataset. Evaluate imperceptibility, robustness, and localised detection capabilities against your specific content types and manipulation scenarios. Establish baseline metrics and success criteria.

Phase 2: Customization & Integration Planning (Weeks 5-8)

Based on pilot results, fine-tune MELB parameters for optimal performance within your infrastructure. Design API integrations with existing content management systems, generative AI platforms, and forensic tools. Develop a secure key management strategy.

Phase 3: Scaled Deployment & Training (Months 3-6)

Roll out MELB across selected departmental workflows. Provide comprehensive training for content creators, legal teams, and security personnel on watermark embedding, detection, and interpretation of localised detection maps. Begin monitoring system performance and user adoption.

Phase 4: Optimization & Expansion (Ongoing)

Continuously monitor and refine MELB's performance, adapting to new AI models and evolving threat landscapes. Explore advanced features like adaptive embedding based on content entropy. Expand deployment across all relevant enterprise content pipelines, ensuring comprehensive digital authenticity.

Ready to Safeguard Your Digital Content?

Connect with our AI specialists to explore how interpretable watermarking can enhance authenticity, ownership, and trust across your enterprise.
