On Watermarking Semistructures

Download

PDF

Author

Radu Sion and Mikhail Atallah and Sunil Prabhakar

Tech report number

CERIAS TR 2001-54

Entry type

techreport

Abstract

Watermarking, in the traditional sense, is the technique of embedding un-detectable (un-perceivable) hidden information into multimedia objects (i.e. images, audio, video, text), mainly to protect the data from unauthorized duplication and distribution by enabling provable ownership over the content. Whereas considerable work has been invested in this topic, little has been done (with the notable exception of attempts in software watermarking and recent progress in the area of natural language processing) to enable the same concept in the area of semi-structured non-media data such as XML, databases and non-multimedia repositories.

We believe that there is much to be gained from the ability to embed non-destructive hidden information in this kind of content, in particular considering the current mainstream migration of business interactions towards distributed computing technologies using markup languages such as XML and underlying database storage.

Watermarking in the area of semi-structured data presents a whole new set of challenges and associated trade-offs. One main characterizing difference can be expressed simply as "lack of bandwidth", deriving from the inherent lack of a major noise component in that domain. We present some of the issues encountered in the course of our ongoing work in watermarking XML and numeric database content. We define a preliminary model-level analysis of the new domain and corresponding transforms. We design a method for watermarking semistructures based on a novel canonical labeling algorithm that self-adjusts to the specifics of the content. Labeling is tolerant to a significant number of graph attacks ("surgeries") and relies on a complex "training" phase at watermarking time in which it reaches an optimal stability point with respect to the expected attacks. Watermark detection works without requiring the original un-marked object. We analyse how to perform efficient and useful generic node content summarisation and hashing. We treat the issue of graph partitioning in the framework of hierarchical watermarking and show how hierarchical watermarking effectively amplifies the power of weak marking algorithms, leading to an ultimately more powerful and robust watermark. We perform experiments enforcing some of the introduced algorithms (e.g. labeling) under different attack conditions and present some of the conclusions. Future envisioned medium- and long-term research issues are outlined.

Booktitle

(submission)

Institution

Purdue University

Key alpha

sion2002wmsemistructures

Affiliation

CERIAS and Computer Sciences

Publication Date

1900-01-01

Language

English

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.

@Techreport{ sion2002wmsemistructures,
	title = "On Watermarking Semistructures",
	author = "Radu Sion and Mikhail Atallah and Sunil Prabhakar",
	booktitle = "(submission)",
	institution = "Purdue University",
	number = "CERIAS TR 2001-54",
	abstract = "Watermarking, in the traditional sense, is the technique of
embedding un-detectable (un-perceivable) hidden information
into multimedia objects (i.e. images, audio, video, text) mainly to protect
the data from unauthorized duplication and distribution by enabling
provable ownership over the content. Whereas considerable work has been
invested in this topic, little has been done (with the notable exception of
attempts in software watermarking and recent
progress in the area of natural language processing)
to enable the same concept in the area of semi-structured non-media data such
as XML, databases and non-multimedia repositories.
 
We believe that there is much to be
gained from the ability to embed non-destructive hidden information
in this kind of content, in particular considering current mainstream
migration of business interactions towards distributed computing
technologies using markup languages such as XML and underlying database
storage.
 
Watermarking in the area of semi-structured data presents a whole new
set of challenges and associated trade-offs. One characterizing main
difference can be expressed simply as ``lack of bandwidth'', deriving
from the inherent lack of a major noise component in that domain.
We present some of the issues encountered in
the course of our ongoing work in watermarking XML and numeric database
content.  We define a preliminary model-level analysis of
the new domain and corresponding transforms.
We design a method for watermarking semistructures
based on a novel canonical labeling algorithm that self-adjusts
to the specifics of the content. Labeling is tolerant to a
significant number of graph attacks (``surgeries'') and relies
on a complex ``training'' phase at watermarking time in which
it reaches an optimal stability point with respect to the
expected attacks.
Watermark detection works without requiring
the original un-marked object.
We analyse how to perform efficient and useful generic
node content summarisation and hashing. We treat the
issue of graph partitioning in the framework of hierarchical
watermarking and show how hierarchical watermarking effectively
amplifies the power of weak marking algorithms leading to an
ultimately more powerful and robust watermark.
We perform experiments
enforcing some of the introduced algorithms (e.g. labeling) under
different attack conditions and present some of the conclusions.
Future envisioned medium- and long-term research
issues are outlined.
",
	affiliation = "CERIAS and Computer Sciences",
	language = "English",
}