Abstract
Watermarking, in the traditional sense, is the technique of
embedding undetectable (imperceptible) hidden information
into multimedia objects (e.g. images, audio, video, text), mainly to protect
the data from unauthorized duplication and distribution by enabling
provable ownership over the content. Whereas considerable work has been
invested in this topic, little has been done (with the notable exception of
attempts in software watermarking and recent
progress in the area of natural language processing)
to apply the same concept to semi-structured non-media data such
as XML, databases and other non-multimedia repositories.
We believe that there is much to be
gained from the ability to embed non-destructive hidden information
in this kind of content, particularly given the current mainstream
migration of business interactions towards distributed computing
technologies that use markup languages such as XML and underlying
database storage.
Watermarking in the area of semi-structured data presents a whole new
set of challenges and associated trade-offs. One main distinguishing
characteristic can be expressed simply as "lack of bandwidth", deriving
from the inherent lack of a major noise component in that domain.
We present some of the issues encountered in
the course of our ongoing work in watermarking XML and numeric database
content. We give a preliminary model-level analysis of
the new domain and corresponding transforms.
We design a method for watermarking semistructures
based on a novel canonical labeling algorithm that self-adjusts
to the specifics of the content. Labeling is tolerant to a
significant number of graph attacks ("surgeries") and relies
on a complex "training" phase at watermarking time in which
it reaches an optimal stability point with respect to the
expected attacks.
Watermark detection does not require
the original unmarked object.
We analyse how to perform efficient and useful generic
node content summarisation and hashing. We treat the
issue of graph partitioning in the framework of hierarchical
watermarking and show how hierarchical watermarking effectively
amplifies the power of weak marking algorithms, ultimately leading to a
more powerful and robust watermark.
We perform experiments
evaluating some of the introduced algorithms (e.g. labeling) under
different attack conditions and present some of the conclusions.
We outline envisioned medium- and long-term research
issues.