Radu Sion - Purdue University Computer Science
On Watermarking Semi-Structures.XML
Nov 14, 2001
Abstract
Watermarking semi-structured data presents a whole new set of challenges and associated trade-offs. The main distinguishing difference can be summed up as a "lack of bandwidth", which derives from the inherent lack of a major noise component in that domain.
We present some of the issues encountered in our ongoing work on watermarking XML and numeric database content. We give a preliminary model-level analysis of the new domain and its corresponding transforms. We design a method for watermarking semi-structures based on a novel canonical labeling algorithm that self-adjusts to the specifics of the content. The labeling tolerates a significant number of graph attacks ("surgeries") and relies on a complex "training" phase at watermarking time, in which it reaches an optimal stability point with respect to the expected attacks. Watermark detection does not require the original unmarked object. We analyse how to perform efficient and useful generic summarisation and hashing of node content. We treat graph partitioning in the framework of hierarchical watermarking and show how hierarchical watermarking effectively amplifies the power of weak marking algorithms, leading to an ultimately more powerful and robust watermark. We report experiments that exercise some of the introduced algorithms (e.g. labeling) under different attack conditions and present some of the conclusions. Envisioned medium- and long-term research issues are outlined.