Abstract
Today's world of increasingly dynamic computing environments
naturally results in more and more data being available as fast
streams. Applications such as stock market analysis, environmental
sensing, web clicks and intrusion detection are just a few of the
examples where valuable data is streamed to its consumer. Often,
streaming information is offered on the basis of a non-exclusive,
single-use customer license. One major concern, especially given the
digital nature of the valuable stream, is the ability to easily record
and potentially "re-play" parts of it in the future. If there is
value associated with such future re-plays, it could constitute enough
incentive for a malicious customer (Mallory) to duplicate segments of
such recorded data, subsequently re-selling them for profit. Being able
to protect against such infringements becomes a necessity.
In this paper we introduce the issue of rights protection for streaming
data through watermarking. This is a novel problem with many associated
challenges, including the inability to perform multiple-pass random
accesses to the entire data set, the requirement to be fast enough to
keep up with the incoming stream rate, and the need to survive instances
of extreme sparse sampling and summarization, all while keeping
data alterations within allowable bounds. We propose a solution and
analyze its resilience to various types of attacks as well as some of
the important expected domain-specific transforms, such as sampling and
summarization. We implement proof-of-concept software (wms.*) for the
proposed solution and perform experiments on real sensor data to assess
these resilience levels in practice. Our method proves to be well suited
for this new domain. For example, we can recover a watermark with over
97% confidence from a sampled stream (e.g., sampled at less than 8%).
Similarly, our encoding survives stream summarization (e.g., 20%) and
random alteration attacks with very high confidence levels, often above 99%.