The production of closed captions is an important but expensive process in video broadcasting. We propose a method to generate highly accurate off-line captions efficiently. Our system uses text alignment to synchronize the program transcript obtained for a video program with text produced by an automatic speech recognition (ASR) system. We also describe the accuracy of both the closed-caption text and the ASR output for a number of news programs and provide a detailed analysis of the errors that occur.
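As an illustration of the alignment step, the sketch below aligns the transcript word sequence with the ASR word sequence using a standard dynamic-programming sequence matcher and carries ASR word timestamps onto matched transcript words. The function names and the timestamp-transfer convention are assumptions for illustration, not the system's actual implementation.

import difflib

def align_transcript_to_asr(transcript_words, asr_words, asr_times):
    # transcript_words: list of words from the program transcript
    # asr_words: list of words recognized by the ASR system
    # asr_times: list of (start, end) times, one per ASR word
    matcher = difflib.SequenceMatcher(a=transcript_words, b=asr_words, autojunk=False)
    aligned = []  # (transcript_word, (start, end)) for transcript words matched by ASR
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            for i, j in zip(range(i1, i2), range(j1, j2)):
                aligned.append((transcript_words[i], asr_times[j]))
    return aligned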
In this paper, we describe an approach that uses a combination of visual and audio features to cluster shots belonging to the same person in video programs. We use color histograms extracted from keyframes and faces, as well as cepstral coefficients derived from the audio, to calculate pairwise shot distances. These distances are then normalized and combined into a single confidence value that reflects our certainty that two shots contain the same person. We then use an agglomerative clustering algorithm to cluster shots based on these confidence values. We report the results of our system on a data set of approximately 8 hours of programming.
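A minimal sketch of the distance combination and clustering stages, assuming per-shot keyframe color histograms and cepstral feature vectors are already available; the equal weighting, the min-max normalization, and the clustering threshold are illustrative assumptions, not the parameters used in the paper.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def normalized_distances(features):
    # pairwise Euclidean distances, scaled to [0, 1] so modalities are comparable
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    return (d - d.min()) / (np.ptp(d) + 1e-9)

def shot_confidences(color_hists, cepstral_feats, w_visual=0.5):
    # confidence that two shots show the same person: high when both distances are small
    dv = normalized_distances(color_hists)
    da = normalized_distances(cepstral_feats)
    return 1.0 - (w_visual * dv + (1.0 - w_visual) * da)

def cluster_shots(confidence, threshold=0.7):
    # agglomerative (average-linkage) clustering on 1 - confidence
    dist = 1.0 - confidence
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method='average')
    return fcluster(Z, t=1.0 - threshold, criterion='distance')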
We address the problem of detecting shots of subjects who are interviewed in news sequences. This is useful because such scenes usually contain important, reusable information that can be used in other news programs. In a previous paper, we presented a technique based on a priori knowledge of the editing techniques used in news sequences, which allowed a fast search of news stories. We present a new shot descriptor technique that improves the previous search results using a simple yet efficient algorithm based on the information contained in consecutive frames. Results are provided that demonstrate the validity of the approach.
Rate scalable video compression is appealing for low bit rate applications, such as video telephony and wireless communication, where the bandwidth available to an application cannot be guaranteed. In this paper, we investigate a set of strategies to increase the performance of SAMCoW, a rate scalable encoder. These techniques are based on wavelet decomposition, spatial orientation trees, and motion compensation.
In this paper, we present an analysis of the efficiency of three signal-to-noise ratio (SNR) scalable strategies for motion compensated video coders and of their non-scalable counterpart. After assuming some models and hypotheses with respect to the signals and systems involved, we obtain the SNR of each coding strategy as a function of the decoding rate. To validate our analysis, we compare our theoretical results with data from encodings of real video sequences. The results show that our analysis qualitatively describes the performance of each scalable strategy and can therefore be useful for understanding the main features of each scalable technique and the factors that influence its efficiency.
Recently, we proposed a method for constructing a template for efficient temporal synchronization in video watermarking. Our temporal synchronization method uses a state machine key generator to produce the watermark embedded in successive frames of video. A feature extractor allows the watermark key schedule to be content dependent, increasing the difficulty of copy and ownership attacks. It was shown that efficient synchronization can be achieved by adding temporal redundancy to the key schedule. In this paper, we explore and extend the concepts of our temporal synchronization method to spatial synchronization. The key generator is used to construct the embedded watermark of non-overlapping blocks of the video, creating a tiled structure. The autocorrelation of the tiled watermark contains local maxima, or peaks, with a grid-like structure, where the distance between the peaks indicates the scale of the watermark and the orientation of the peaks indicates the watermark rotation. Experimental results are obtained using digital image watermarks. Scaling and rotation attacks are investigated.
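The sketch below illustrates the spatial-synchronization idea: compute the autocorrelation of a tiled watermark estimate and read scale and rotation off the nearest off-center peak. The tile size, the peak-exclusion radius, and the use of a single peak are simplifying assumptions, not the paper's detector.

import numpy as np

def autocorrelation(watermark):
    # periodic autocorrelation via the FFT (Wiener-Khinchin), shifted so zero lag is centered
    f = np.fft.fft2(watermark - watermark.mean())
    ac = np.fft.ifft2(f * np.conj(f)).real
    return np.fft.fftshift(ac)

def estimate_scale_rotation(ac, tile_size=64, exclude=8):
    # peaks of the autocorrelation form a grid: peak spacing ~ watermark scale,
    # peak direction ~ watermark rotation
    h, w = ac.shape
    cy, cx = h // 2, w // 2
    a = ac.copy()
    a[cy - exclude:cy + exclude + 1, cx - exclude:cx + exclude + 1] = -np.inf  # mask the zero-lag peak
    py, px = np.unravel_index(np.argmax(a), a.shape)
    dy, dx = py - cy, px - cx
    scale = np.hypot(dy, dx) / tile_size
    rotation_deg = np.degrees(np.arctan2(dy, dx))
    return scale, rotation_deg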
We present a new method for full-field mammogram analysis. A mammogram is analyzed region by region and is classified as normal or abnormal. We present methods for extracting features that can be used to distinguish normal and abnormal regions of a mammogram. We describe our classifier technique that uses a unique reclassification method to boost the classification performance. We have tested this technique on a set of ground-truth full-field mammograms.
Text forms the largest bulk of the digital data that people encounter and exchange daily. For this reason, the potential use of text as a covert channel for secret communication is a pressing concern. Even though information hiding in natural language text has started to attract great interest, there has been no study of attacks against these applications. In this paper, we examine the robustness of lexical steganography systems. We use a universal steganalysis method based on language models and support vector machines to differentiate sentences modified by a lexical steganography algorithm from unmodified sentences. The experimental accuracy of our method on classification of steganographically modified sentences was 84.9%. On classification of isolated sentences, we obtained a high recall rate, whereas the precision was low.
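A hedged sketch of the steganalysis classifier: word n-gram features with a linear SVM stand in for the language-model features described above, so the vectorizer settings, function names, and data variables are assumptions for illustration only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_steganalyzer(train_sentences, train_labels):
    # train_labels: 1 = steganographically modified sentence, 0 = unmodified sentence
    clf = make_pipeline(
        TfidfVectorizer(analyzer='word', ngram_range=(1, 3)),  # n-gram statistics approximate a language model
        LinearSVC(),
    )
    clf.fit(train_sentences, train_labels)
    return clf

def classify(clf, sentences):
    # predict whether each isolated sentence carries a lexical-steganography modification
    return clf.predict(sentences)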
Line or linear structure detection is a basic yet important problem in image processing and computer vision. Many line detection algorithms are based on edge detection and consider lines as extended or contiguous edges. Most techniques require that a binary edge map first be extracted from the image before line detection is performed. In this paper, we propose a new line detection technique based on a model that describes the spatial characteristics of line structures in an image. This line model uses simple properties of lines that include both graylevel and geometric features. The performance of the line detector is demonstrated on natural scenes and medical images. The technique is shown to be capable of detecting lines of different widths, lines of varying width, and curves.
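For illustration, a generic graylevel ridge detector (Hessian-based) is sketched below; it responds to line structures directly rather than to a binary edge map, and the scale parameter controls the line width it picks up. This is a standard technique used as a stand-in, not the line model proposed in the paper.

import numpy as np
from scipy.ndimage import gaussian_filter

def ridge_strength(image, sigma=2.0):
    # second-derivative (Hessian) responses at scale sigma; lines appear as graylevel ridges
    img = image.astype(np.float64)
    Ixx = gaussian_filter(img, sigma, order=(0, 2))
    Iyy = gaussian_filter(img, sigma, order=(2, 0))
    Ixy = gaussian_filter(img, sigma, order=(1, 1))
    # eigenvalues of the Hessian; the larger-magnitude one measures line strength,
    # and its sign distinguishes bright lines from dark lines
    root = np.sqrt((Ixx - Iyy) ** 2 + 4.0 * Ixy ** 2)
    lam1 = 0.5 * (Ixx + Iyy + root)
    lam2 = 0.5 * (Ixx + Iyy - root)
    return np.where(np.abs(lam1) > np.abs(lam2), lam1, lam2)

# running the detector over several values of sigma handles lines of different and varying width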
Leaky prediction layered video coding (LPLC) partially includes the enhancement layer in the motion compensated prediction loop, using a leaky factor between 0 and 1 to balance coding efficiency against error resilience. In this paper, rate distortion functions are derived for LPLC from rate distortion theory. Closed form expressions are obtained for two scenarios of LPLC: one where the enhancement layer stays intact and one where the enhancement layer suffers from data rate truncation. The rate distortion performance of LPLC is then evaluated with respect to different choices of the leaky factor, demonstrating that the theoretical analysis conforms well with the operational results.
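For reference, a standard building block in this kind of analysis is the rate distortion function of a memoryless Gaussian source under squared-error distortion, which gives the distortion attainable at a coding rate of R bits per sample; the paper's closed-form expressions for the two LPLC scenarios build on such source models but are not reproduced here.

D(R) = \sigma_X^2 \, 2^{-2R}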
The block DCT (BDCT) is one of the most popular transforms used in image and video coding. However, it introduces noticeable blocking artifacts at low data rates. A great deal of work has been done to remove these artifacts using information extracted from the spatial and frequency domains. In this paper, we formulate the video sequence restoration problem as a 3D Huber-Markov random field model and derive the temporal extension of traditional maximum a posteriori (MAP) methods. Two schemes, which we call temporal MAP (TMAP) and motion compensated TMAP (MC-TMAP), are presented. We test our methods on MPEG-2 compressed sequences and compare their performance with traditional MAP restoration. Experimental results confirm that our schemes can significantly improve the visual quality of the reconstructed sequences.
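A minimal sketch of MAP restoration under a Huber-Markov prior over spatial and temporal pixel differences, optimized by gradient descent; motion compensation (the MC-TMAP variant) and any coder-specific constraints are omitted, and the threshold, weight, and step-size values are illustrative assumptions.

import numpy as np

def huber_grad(d, T=1.0):
    # derivative of the Huber penalty: quadratic below the threshold T, linear above it
    return np.where(np.abs(d) <= T, 2.0 * d, 2.0 * T * np.sign(d))

def tmap_restore(frames, lam=0.1, step=0.1, iters=20, T=1.0):
    # frames: (time, height, width) decoded sequence; x is the restored estimate
    y = frames.astype(np.float64)
    x = y.copy()
    for _ in range(iters):
        grad = 2.0 * (x - y)  # data-fidelity term keeps x near the decoded frames
        for axis in range(3):  # temporal, vertical, and horizontal neighbor differences
            d = huber_grad(np.diff(x, axis=axis), T)
            lo = [slice(None)] * 3; hi = [slice(None)] * 3
            lo[axis] = slice(0, -1); hi[axis] = slice(1, None)
            g = np.zeros_like(x)
            g[tuple(lo)] -= d  # each difference contributes to both of its pixels
            g[tuple(hi)] += d
            grad += lam * g
        x -= step * grad
    return x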
Trust - “reliance on the integrity, ability, or character of a person or thing” - is pervasive in social systems. We constantly apply it in interactions between people, organizations, animals, and even artifacts. We use it instinctively and implicitly in closed and static systems, or consciously and explicitly in open or dynamic systems. An epitome for the former case is a small village, where everybody knows everybody, and the villagers instinctively use their knowledge or stereotypes to trust or distrust their neighbors. A big city exemplifies the latter case, where people use explicit rules of behavior in diverse trust relationships. We already use trust in computing systems extensively, although usually subconsciously. The challenge for exploiting trust in computing lies in extending the use of trust-based solutions, first to artificial entities such as software agents or subsystems, then to human users’ subconscious choices.
Digital libraries involve various types of data, such as text, audio, images, and video. The data objects are typically very large, on the order of hundreds or thousands of kilobytes. In a digital library, these data objects are distributed across a wide area network. Retrieving large data objects over a wide area network incurs a high response time. We have conducted experiments to measure the communication overhead in the response time. We have studied the correlation between communication delay and the size of the data, between communication delay and the type of data, and the delay to various sites in local and wide area networks. We present different strategies for reducing delay while communicating multimedia data. Images can tolerate some loss of data without losing the semantics of the image. Lossy compression techniques reduce the quality of the image but also reduce its size, leading to a lower communication delay. We compare the communication delay for compressed and uncompressed images and study the overhead due to compression and decompression. We present issues in providing digital library service to mobile users and discuss a question: what if communication were free? Finally, we present a framework for efficient communication of digital library data.
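As a rough illustration of the compressed-versus-uncompressed trade-off discussed above, the sketch below estimates total delivery delay as compression time plus transmission time at an assumed bandwidth; the bandwidth value, the JPEG quality setting, and the assumption that the source image is stored uncompressed are all illustrative, not the paper's experimental setup.

import io
import time
from PIL import Image

def delivery_delay(image_path, bandwidth_bytes_per_s=125_000, quality=75):
    raw_bytes = open(image_path, 'rb').read()
    # delay for sending the image as stored (assumed uncompressed here)
    uncompressed_delay = len(raw_bytes) / bandwidth_bytes_per_s

    # delay for lossy (JPEG) compression plus sending the smaller file
    img = Image.open(io.BytesIO(raw_bytes)).convert('RGB')
    t0 = time.perf_counter()
    buf = io.BytesIO()
    img.save(buf, format='JPEG', quality=quality)
    compress_time = time.perf_counter() - t0
    compressed_delay = compress_time + len(buf.getvalue()) / bandwidth_bytes_per_s
    return uncompressed_delay, compressed_delay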
Multiple versions of data are used in database systems to increase concurrency and to provide efficient recovery. Data versions improve concurrency by allowing the concurrent execution of “non-conflicting” read-write lock requests on different versions of data in an arbitrary fashion. A transaction that accesses a data item version which is later diagnosed as leading to an incorrect execution is aborted. This act is reminiscent of the validation phase in optimistic concurrency control schemes. Various performance studies suggest that these schemes perform poorly in high data contention environments, where excessive transaction aborts result from failed validation. We propose an adaptable constrained two-version two-phase locking (C2V2PL) scheme in which these “non-conflicting” requests are allowed only in a constrained manner. The C2V2PL scheme assumes that a lock request failing to satisfy the specific constraints will lead to an incorrect execution and hence must be either rejected or blocked. This eliminates the need for a separate validation phase. When contention for data among concurrent transactions is high, the C2V2PL scheduler, in its aggressive state, rejects such lock requests. The deadlock-free C2V2PL scheduler adapts to low data contention environments by accepting lock requests that have failed the specific constraints but, contrary to the assumption, will not lead to an incorrect execution. Thus, in this conservative state, the C2V2PL scheme improves potential concurrency through reduced transaction aborts. We have compared the performance of our scheme with other concurrency control schemes, such as two-phase locking, wait-depth locking, and optimistic locking. Our results show increased throughput and a reduced transaction abort ratio for the C2V2PL scheme.
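The sketch below shows the general shape of a two-version lock manager in which readers always see the committed version while at most one writer prepares a new version per item; the acceptance rule shown is a generic simplification and does not reproduce C2V2PL's specific constraints or its adaptive aggressive/conservative states.

from dataclasses import dataclass, field

@dataclass
class Item:
    committed: object = None       # version visible to readers
    uncommitted: object = None     # new version being prepared by the writer
    readers: set = field(default_factory=set)
    writer: object = None          # at most one writer per item

class TwoVersionLockManager:
    def __init__(self):
        self.items = {}

    def read_lock(self, txn, key):
        # reads are never blocked by a writer: they see the committed version
        item = self.items.setdefault(key, Item())
        item.readers.add(txn)
        return item.committed

    def write_lock(self, txn, key, value):
        # only one uncommitted version may exist; a real scheduler would block,
        # reject, or apply scheme-specific constraints here
        item = self.items.setdefault(key, Item())
        if item.writer not in (None, txn):
            return False
        item.writer = txn
        item.uncommitted = value
        return True

    def commit(self, txn, key):
        # install the new version; C2V2PL's constraint checks replace the separate
        # validation phase that optimistic schemes would perform at this point
        item = self.items[key]
        if item.writer == txn:
            item.committed = item.uncommitted
            item.uncommitted, item.writer = None, None
        item.readers.discard(txn)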