Reports and Papers Archive

An automatic face detection and recognition system for video indexing applications

E Acosta, L Torres, A Albiol, E Delp

Download: PDF

The objective of this work is the integration and optimization of an automatic face detection and recognition system for video indexing applications. The system is composed of a face detection stage presented previously which provides good results while maintaining a low computational cost (see Albiol, A. et al., Proc. IEEE Int. Conf. on Image Proc., vol.2, p.239-42, 2000). The recognition stage is based on the principal component analysis (PCA) approach which has been modified to cope with the video indexing application. After the integration of the two stages, several improvements are proposed which increase the face detection and recognition rate and the overall performance of the system. Good results have been obtained using the MPEG-7 video content set used in the MPEG-7 evaluation group.

Added 2008-04-03

The indexing of persons in news sequences using audio-visual data

A Albiol, L Torres, EJ Delp

Download: PDF

We describe a video indexing system that automatically searches for a specific person in a news sequence. The proposed approach combines audio and video confidence values extracted from speaker and face recognition analysis. The system also incorporates a shot selection module that seeks anchors, where the person on the scene is likely speaking. The system has been extensively tested on several news sequences with very good recognition rates.

Added 2008-04-03

Client-server computing in mobile environments

J Jing, AS Helal, A Elmagarmid

Added 2008-04-03

Automated video summarization using speech transcripts

CM Taskiran, M Cuneyt, A Amir, DB Ponceleon, EJ Delp

Compact representations of video data can enable efficient video browsing. Such representations provide the user with information about the content of the particular sequence being examined while preserving the essential message. We propose a method to automatically generate video summaries for long videos. Our video summarization approach involves two main tasks: first, segmenting the video into small, coherent segments, and second, ranking the resulting segments. Our proposed algorithm scores segments based on word frequency analysis of speech transcripts. A summary is then generated by selecting the segments with the highest score-to-duration ratios and concatenating them. We have designed and performed a user study to evaluate the quality of the summaries generated. Our proposed algorithm is compared with a random segment selection scheme based on statistical analysis of the user study results. Finally, we discuss various issues that arise in the evaluation of automatically generated video summaries.
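The ranking step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the scoring function (sum of relative word frequencies), the whitespace tokenizer, the greedy duration budget, and all function names are assumptions made for the example.

```python
from collections import Counter


def score_segment(words, counts, total):
    # Assumed score: sum of the relative transcript frequencies of the words.
    return sum(counts[w] / total for w in words)


def summarize(segments, budget):
    """Pick segments by score-to-duration ratio until the budget is used.

    segments: list of (transcript_text, duration_seconds) pairs.
    budget:   maximum total summary duration in seconds (hypothetical knob).
    Returns the chosen segment indices in their original temporal order.
    """
    tokenized = [text.lower().split() for text, _ in segments]
    counts = Counter(w for words in tokenized for w in words)
    total = sum(counts.values())

    # Rank segments by score divided by duration, highest ratio first.
    ranked = sorted(
        range(len(segments)),
        key=lambda i: score_segment(tokenized[i], counts, total) / segments[i][1],
        reverse=True,
    )

    chosen, used = [], 0.0
    for i in ranked:
        duration = segments[i][1]
        if used + duration <= budget:
            chosen.append(i)
            used += duration
    # Concatenate in temporal order so the summary follows the source video.
    return sorted(chosen)


segments = [
    ("budget vote budget vote", 10.0),
    ("weather", 10.0),
    ("budget vote today", 5.0),
]
print(summarize(segments, budget=15.0))
```

Dividing the score by the duration favors short segments that are dense in frequent words, which is one plausible reading of "highest score-to-duration ratios".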

Added 2008-04-03

Video preprocessing for audiovisual indexing

A Albiol, L Torres, EJ Delp

Download: PDF

We address the problem of detecting shots of subjects that are interviewed in news sequences. This is useful since usually these kinds of scenes contain important and reusable information that can be used for other news programs. In a previous paper, we presented a technique based on a priori knowledge of the editing techniques used in news sequences which allowed a fast search of news stories (see Albiol, A. et al., 3rd Int. Conf. on Audio and Video-based Biometric Person Authentication, p.366-71, 2001). We now present a new shot descriptor technique which improves the previous search results by using a simple, yet efficient, algorithm, based on the information contained in consecutive frames. Results are provided which prove the validity of the approach.

Added 2008-04-03

A discussion of leaky prediction based scalable coding

Y Liu, Z Li, P Salama, EJ Delp

Download: PDF

In this paper, we focus on the leaky prediction based scalable coding (LPSC) structure and present a general framework for LPSC. We demonstrate the similarity between LPSC and a motion compensation based multiple description coding scheme. We show that, since the information contained in the enhancement layer in LPSC is actually a mismatch between two descriptions for each frame, it cannot be guaranteed that the enhancement layer always achieves reconstruction quality superior to that achieved by the base layer. We derive three reconstructions for each frame under the LPSC framework, and propose a maximum-likelihood (ML) estimation scheme for LPSC video reconstruction at the decoder. This scheme generally achieves higher decoded video quality than both the enhancement layer and the base layer.

Added 2008-04-03

Supporting top-k join queries in relational databases

Ihab F. Ilyas, Walid G. Aref, and Ahmed K. Elmagarmid

Added 2008-04-03

Nile: A query processing engine for data streams

WG Aref, AC Catlin, AK Elmagarmid, M Eltabakh, MG

Added 2008-04-03

Automated closed-captioning using text alignment

A Martone, C Taskiran, EJ Delp

The production of closed captions is an important but expensive process in video broadcasting. We propose a method to generate highly accurate off-line captions efficiently. Our system uses text alignment to synchronize program transcripts obtained for a video program with text produced by an automatic speech recognition (ASR) system. We also describe the accuracy of both the closed-caption text and the ASR output for a number of news programs and provide a detailed analysis of the errors that occur.
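The synchronization idea in this abstract can be illustrated with a word-level alignment. The sketch below is not the authors' system: it assumes the ASR output carries one timestamp per word and uses Python's `difflib.SequenceMatcher` as a stand-in alignment method; the function name and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def align_transcript(transcript_words, asr_words, asr_times):
    """Transfer ASR word timestamps onto an accurate program transcript.

    Words that fall inside a matching block receive the ASR timestamp;
    unmatched transcript words (likely ASR errors) keep None.
    """
    times = [None] * len(transcript_words)
    matcher = SequenceMatcher(a=transcript_words, b=asr_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.a + k] = asr_times[block.b + k]
    return list(zip(transcript_words, times))


transcript = "the president spoke in paris today".split()
asr = "the resident spoke paris today".split()
asr_times = [0.0, 0.4, 1.0, 1.5, 2.0]  # seconds, one per ASR word (assumed)

for word, t in align_transcript(transcript, asr, asr_times):
    print(word, t)
```

Transcript words left without a timestamp could be filled in by interpolating between their timed neighbors; that step is omitted here.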

Added 2008-04-03

Detection of unique people in news programs using multimodal shot clustering

CM Taskiran, A Albiol, L Torres, EJ Delp

Download: PDF

In this paper, we describe an approach that uses a combination of visual and audio features to cluster shots belonging to the same person in video programs. We use color histograms extracted from keyframes and faces, as well as cepstral coefficients derived from audio, to calculate pairwise shot distances. These distances are then normalized and combined into a single confidence value which reflects our certainty that two shots contain the same person. We then use an agglomerative clustering algorithm to cluster shots based on these confidence values. We report the results of our system on a data set of approximately 8 hours of programming.
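The pipeline outlined in this abstract (per-modality distances, normalization, fusion, agglomerative clustering) can be sketched as below. The details are assumptions for illustration only: L1 distance between histograms, Euclidean distance between audio feature vectors, max-normalization, an equal-weight average, average linkage, and a stopping threshold. A confidence value could be taken as one minus the fused distance.

```python
import math
from itertools import combinations


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(items, dist):
    return {(i, j): dist(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}


def normalize(d):
    # Scale distances to [0, 1] so the two modalities are comparable.
    m = max(d.values(), default=0.0) or 1.0
    return {k: v / m for k, v in d.items()}


def fused_distance(hists, audio, w=0.5):
    dv = normalize(pairwise(hists, l1))
    da = normalize(pairwise(audio, euclid))
    return {k: w * dv[k] + (1 - w) * da[k] for k in dv}


def agglomerate(n, dist, threshold):
    """Average-linkage clustering; stops when the closest pair exceeds threshold."""
    clusters = [{i} for i in range(n)]

    def link(a, b):
        pairs = [dist[(min(i, j), max(i, j))] for i in a for j in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]))
        if link(clusters[p], clusters[q]) > threshold:
            break
        clusters[p] |= clusters[q]
        del clusters[q]  # q > p, so index p stays valid
    return sorted(sorted(c) for c in clusters)


# Four toy shots: two-bin color histograms and two-dimensional audio features.
hists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
audio = [[1.0, 0.0], [1.1, 0.0], [5.0, 2.0], [5.1, 2.1]]
print(agglomerate(4, fused_distance(hists, audio), threshold=0.3))
```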

Added 2008-04-03

Combining audio and video for video sequence indexing applications

A Albiol, L Torres, EJ Delp

Download: PDF

We address the problem of detecting shots of subjects that are interviewed in news sequences. This is useful since usually these kinds of scenes contain important and reusable information that can be used for other news programs. In a previous paper, we presented a technique based on a priori knowledge of the editing techniques used in news sequences which allowed a fast search of news stories. We present a new shot descriptor technique which improves the previous search results by using a simple, yet efficient algorithm, based on the information contained in consecutive frames. Results are provided which prove the validity of the approach.

Added 2008-04-03

Encoding of predictive error frames in rate scalable video codecs using wavelet shrinkage

E Asbun, P Salama, EJ Delp

Download: PDF

Rate scalable video compression is appealing for low bit rate applications, such as video telephony and wireless communication, where bandwidth available to an application cannot be guaranteed. In this paper, we investigate a set of strategies to increase the performance of SAMCoW, a rate scalable encoder. These techniques are based on wavelet decomposition, spatial orientation trees, and motion compensation.
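The title of this entry refers to wavelet shrinkage. The abstract does not give the procedure used in SAMCoW, so the sketch below shows only the generic textbook form: a one-level Haar transform of a 1-D signal followed by soft thresholding of the detail coefficients. The choice of the Haar wavelet, the threshold values, and the function names are assumptions.

```python
import math


def haar_forward(x):
    # One-level orthonormal Haar transform; len(x) must be even.
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / s, (a - d) / s]
    return out


def soft_threshold(c, t):
    # Shrink the coefficient toward zero by t; small coefficients become zero.
    return math.copysign(max(abs(c) - t, 0.0), c)


def shrink(x, t):
    approx, detail = haar_forward(x)
    return haar_inverse(approx, [soft_threshold(d, t) for d in detail])


print(shrink([4.0, 4.0, 1.0, 3.0], t=0.5))
```

With a threshold of zero the signal is reconstructed unchanged; with a large threshold every detail coefficient is removed and each pair of samples collapses to its mean.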

Added 2008-04-03

Analysis of the efficiency of SNR-scalable strategies for motion compensated video coders

J Prades-Nebot, GW Cook, EJ Delp

Download: PDF

In this paper, an analysis of the efficiency of three signal-to-noise ratio (SNR) scalable strategies for motion compensated video coders and their non-scalable counterpart is presented. After assuming some models and hypotheses with respect to the signals and systems involved, we have obtained the SNR of each coding strategy as a function of the decoding rate. To validate our analysis, we have compared our theoretical results with data from encodings of real video sequences. Results show that our analysis qualitatively describes the performance of each scalable strategy and can therefore be useful for understanding the main features of each scalable technique and the factors that influence its efficiency.

Added 2008-04-03

Spatial synchronization using watermark key structure

ET Lin, EJ Delp

Recently, we proposed a method for constructing a template for efficient temporal synchronization in video watermarking. Our temporal synchronization method uses a state machine key generator for producing the watermark embedded in successive frames of video. A feature extractor allows the watermark key schedule to be content dependent, increasing the difficulty of copy and ownership attacks. It was shown that efficient synchronization can be achieved by adding temporal redundancy into the key schedule. In this paper, we explore and extend the concepts of our temporal synchronization method to spatial synchronization. The key generator is used to construct the embedded watermark of non-overlapping blocks of the video, creating a tiled structure. The autocorrelation of the tiled watermark contains local maxima or peaks with a grid-like structure, where the distance between the peaks indicates the scale of the watermark and the orientation of the peaks indicates the watermark rotation. Experimental results are obtained using digital image watermarks. Scaling and rotation attacks are investigated.
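The grid of autocorrelation peaks mentioned in this abstract can be reproduced on a toy example. The sketch below tiles a fixed 4x4 pattern of +1/-1 values and computes a circular autocorrelation directly. It covers only the peak-spacing (scale) observation, not rotation, and it is not the paper's detector; the tile values, the circular boundary handling, and the function names are assumptions.

```python
def tile_watermark(tile, reps):
    # Repeat a square tile reps times in each direction.
    t = len(tile)
    n = t * reps
    return [[tile[r % t][c % t] for c in range(n)] for r in range(n)]


def circular_autocorrelation(img):
    # Direct O(n^4) computation; fine for a toy-sized image.
    n = len(img)
    return [
        [
            sum(img[r][c] * img[(r + dr) % n][(c + dc) % n]
                for r in range(n) for c in range(n))
            for dc in range(n)
        ]
        for dr in range(n)
    ]


def peak_shifts(acorr):
    # Shifts whose correlation equals the zero-shift maximum.
    top = acorr[0][0]
    return [(dr, dc)
            for dr, row in enumerate(acorr)
            for dc, v in enumerate(row) if v == top]


tile = [
    [1, -1, 1, 1],
    [-1, -1, 1, -1],
    [1, 1, -1, -1],
    [-1, 1, 1, 1],
]
watermark = tile_watermark(tile, reps=3)
peaks = peak_shifts(circular_autocorrelation(watermark))
print(peaks)
```

The peaks fall on a regular grid whose spacing equals the tile size, so a rescaled watermark would show a correspondingly rescaled grid.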

Added 2008-04-03

Full-field mammogram analysis based on the identification of normal regions

Y Sun, C Babbs, EJ Delp

Download: PDF

We present a new method for full-field mammogram analysis. A mammogram is analyzed region by region and is classified as normal or abnormal. We present methods for extracting features that can be used to distinguish normal and abnormal regions of a mammogram. We describe our classifier technique that uses a unique reclassification method to boost the classification performance. We have tested this technique on a set of ground-truth full-field mammograms.

Added 2008-04-03