Multi-media Analysis of Broadcast Television Data
Benchmarks of state-of-the-art automatic radio and TV monitoring systems against the world’s best human analysis team have shown that some important information is conveyed solely in the displayed text. This K-project Vision+ activity, carried out by AIT and eMedia Monitor GmbH, has demonstrated for the first time a proof of concept for a fully automated end-to-end (from image to text) text detection and recognition system. This enables a combined audio- and visual-cue-based content analysis framework, capturing spoken and displayed concepts and their relations, to consistently exceed the performance of the best human analysts.
Reliable end-to-end text recognition in broadcast video streams
Within the K-project Vision+, researchers from AIT, together with the industrial partner eMedia Monitor, a leading provider of automated media monitoring solutions and services, have laid the foundation for an automated end-to-end (from image to text) text recognition system. The algorithms have been validated on large real-world data sets, and the recognition results obtained suggest that all image frames containing text can be efficiently retrieved and analyzed with high accuracy.
The scientific challenges encountered concern representation and segmentation, centered on two questions: (i) which visual features can informatively describe text along with its variations (size, font, spacing, color), and (ii) how to accomplish reliable segmentation in the presence of clutter while maintaining high computational speed, even for high-resolution images. The results achieved so far are of promising quality, implying that the fusion of visual and audio/speech information (to be investigated in a later phase of the Vision+ project) can be successfully integrated into a large-scale, real-time broadcast multimedia analysis system.
Parts of the scientific concepts developed have been described in the book chapter "Real-Time Multimedia Policy Analysis of Using Video and Audio Recognition from Radio, TV and User-Generated Content" in "Advanced ICT Integration for Governance and Policy Modeling", which was published by IGI Global in 2014.
Fig. 1: A TV frame example with running text, which is delineated by tracking. The top image shows the segmented running text region.
Fig. 2: A real-world TV frame sample with detected text regions indicated by blue rectangles.
Impact and effects
Automated multimedia content analysis is an application field of rapidly growing importance, owing to the steadily increasing amounts of digital data and the significance of the extracted content for a wide range of customers such as telecommunication organizations, financial services, the information management industry, non-profit organizations and governing bodies. Automated monitoring provides data around the clock (24×7) and in real time, giving human decision makers the situational awareness they need to take action. Applications include analysis of the competitive market situation, reputation management, crisis management and many others.
Fig. 3: A real-world TV frame sample with detected text regions and recognized textual content.
Capturing relevant displayed or spoken pieces of information within multimedia data opens up the possibility of large-scale search and data exploration. Such an analysis has the potential to reveal correlations across space, time and topics (e.g. frequently expressed terms, awareness of recurring incidents), which ultimately generates meaningful information from unstructured data.
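As a rough illustration of this kind of exploration, the following minimal Python sketch counts terms per day across displayed (recognized on-screen) and spoken (transcribed) text and lists the terms that recur on several days. It is not the project's implementation: the record layout, the sample data, the stopword list and the function names `term_counts_by_day` and `recurring_terms` are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical records: (timestamp, channel, source, text).
# "ocr" marks displayed text, "asr" marks spoken text. All values are invented.
RECORDS = [
    ("2014-03-01T08:00:00", "Channel A", "ocr", "flood warning issued"),
    ("2014-03-01T08:05:00", "Channel B", "asr", "flood reported near the river"),
    ("2014-03-01T20:00:00", "Channel A", "asr", "markets close higher"),
    ("2014-03-02T08:00:00", "Channel B", "ocr", "flood damage assessed"),
]

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "near", "a", "of"}


def term_counts_by_day(records):
    """Count how often each term occurs per calendar day, across both sources."""
    counts = defaultdict(Counter)
    for timestamp, _channel, _source, text in records:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        terms = [t for t in text.lower().split() if t not in STOPWORDS]
        counts[day].update(terms)
    return counts


def recurring_terms(counts, min_days=2):
    """Return, sorted, the terms that appear on at least `min_days` distinct days."""
    days_seen = Counter()
    for day_counter in counts.values():
        days_seen.update(day_counter.keys())
    return sorted(term for term, n in days_seen.items() if n >= min_days)


counts = term_counts_by_day(RECORDS)
print(counts["2014-03-01"]["flood"])  # occurrences of "flood" on the first day
print(recurring_terms(counts))        # terms seen on two or more days
```

Grouping by channel or region instead of by day would follow the same pattern, with a different key in `term_counts_by_day`.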