Keynote Talks

Tue., Sep. 4th   09:15   Peter Kroon        Media Signal Processing in Cell Phones – What is so Smart about it?
Tue., Sep. 4th   10:45   Peter Jax          Signal Enhancement for Future High-Resolution Spatial Audio Formats
Tue., Sep. 4th   14:00   Jan Skoglund       Interactive Audio in a Web-Based World
Tue., Sep. 4th   16:30   Bernd Geiser       Paths toward HD-Voice Communication
Wed., Sep. 5th   09:00   Patrick Naylor     Acoustic Signal Processing in Noise: It's Not Getting Any Quieter
Thu., Sep. 6th   09:00   Richard Heusdens   Distributed Signal Processing: Application to MVDR Beam-Forming
Thu., Sep. 6th   11:00   Henning Puder      Optimized Directional Processing in Hearing Aids with Integrated Spatial Noise Reduction
Thu., Sep. 6th   14:00   Sharon Gannot      Multi-Microphone Speech Enhancement

 

Media Signal Processing in Cell Phones – What is so Smart about it?

Tuesday, September 4th, 9:15, Generali-Saal

Peter Kroon
Intel, Mobile and Communications Group, Allentown, United States

Modern cell phones have become widely accepted across the world and are small wonders of media signal processing. Although voice will always remain the essential media signal for a phone, many other multimedia applications have found their way into cell phones, making the phone a true media signal processing device. Voice, audio, image, video, and graphics are all present and are processed using techniques based on years of media signal processing research. Making all of this work in a device constrained by power, size, and cost has turned out to be quite a challenge. This talk will review some of the relevant media standards and processing techniques commonly found in cell phones, highlight some interesting accomplishments, and describe some of the challenges that lie ahead.

 

Signal Enhancement for Future High-Resolution Spatial Audio Formats

Tuesday, September 4th, 10:45, Generali-Saal

Peter Jax
Technicolor Research & Innovation, Hanover, Germany

After the well-established stereo and surround sound standards have served as a robust baseline for sound creation over the last decades, there is currently a rising trend in the media and entertainment industry towards more sophisticated spatial audio formats. Proposed technical solutions include evolutionary concepts that extend the conventional stereo and surround sound specifications with additional loudspeakers at specific positions, as well as more universal techniques that represent spatial audio content in ways that are intrinsically independent of the target loudspeaker configuration. The latter comprise object-oriented and sound-field-oriented approaches, as well as hybrid technologies.

With the increasing complexity of loudspeaker setups and spatial audio formats, there is also a rising need for more advanced spatial audio capturing and post-production techniques. Sound mixers need at least the same level of quality and similar ways of manipulating recorded sound as they are accustomed to in today's production workflows for stereo and surround sound. One of the goals at Technicolor is to provide sound mixing artists with advanced signal enhancement and processing tools that enable them to create considerably more immersive and compelling spatial audio content than was possible with stereo and surround formats.

In this talk, we will highlight constraints, technical concepts, and open research challenges from this application area. One of the most important constraints is the necessity of keeping an interacting creative person in the loop. This provides interesting opportunities for designing very powerful algorithms without the need for fully automated processing, while at the same time creating the need for careful conditioning of the spatial audio signals for manual interaction.

 

Interactive Audio in a Web-Based World

Tuesday, September 4th, 14:00, Generali-Saal

Jan Skoglund
Chrome at Google, Mountain View, United States

This talk will discuss the challenges of building audio processing applications at scale and the need for these tools to be robust when used by millions of users in diverse environments, ranging from mobile devices to multi-channel home theatre setups. The approach we have chosen to address these challenges is to drive both real-time and non-real-time applications from a web browser. An open source model readily enables collaboration between industry and academia and can accelerate development beyond what a closed system allows. Examples of such collaboration will be given.

 

Paths toward HD-Voice Communication

Tuesday, September 4th, 16:30, Generali-Saal

Bernd Geiser
RWTH Aachen University, Aachen, Germany

These days, the telecommunication world is undergoing a major technology change toward a universal, packet-based network architecture for both fixed and mobile communications. The main motivations and incentives behind this effort are presumably improved flexibility and cost-efficiency. But in particular for speech and audio communication applications, the opportunity should be seized to promote high-quality services that are far superior to the long-accustomed narrowband telephony experience. Indeed, new audio codecs, delivering additional functionality and much better audio quality, can be deployed much more quickly within such a (future) network environment.

As a matter of fact, however, very little is being done to improve the audio quality of today's communication networks. Instead, "least common denominator" solutions are pursued, preserving the status quo of narrowband speech. At first sight, this might appear reasonable from the economic and marketing perspectives. Nevertheless, subscribers to new services will still experience inferior quality whenever their communication partner uses an old telephone or circuit-switched network access, e.g., via GSM/UMTS speech channels or private/government subnetworks. Large parts of the worldwide telephone network are in fact based on such legacy technology and can be expected to prevail for a long time. Therefore, new, more advanced methods and algorithms for "High Definition" audio transmission and reproduction are required that maintain interoperability with legacy network components.

In this contribution, current developments in packet-based HD-voice communication are summarized, and a future perspective toward systems for binaural/ambient audio communication is given. Moreover, the usually problematic interoperability issue is addressed. To this end, several algorithmic approaches, including embedded coding, receiver- or network-based parameter estimation, and steganographic parameter transmission, are discussed based on the practically relevant example of parametric bandwidth extension for speech and audio signals.
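To make the bandwidth extension example concrete, the following Python sketch regenerates a missing 4-8 kHz band by spectral folding and a single coarse gain. It is a toy under simplified assumptions (the function name and the fixed gain are illustrative), not one of the algorithms discussed in this contribution, where the envelope parameters would instead be estimated at the receiver or transmitted, e.g., steganographically:

    import numpy as np
    from scipy.signal import resample_poly, butter, sosfilt

    def extend_bandwidth(nb_speech, hb_gain=0.3):
        """Toy extension of 8 kHz narrowband speech to 16 kHz wideband."""
        # Upsample 8 kHz -> 16 kHz; the band above 4 kHz is empty afterwards.
        wb = resample_poly(nb_speech, up=2, down=1)

        # Spectral folding: modulation by (-1)^n mirrors the 0-4 kHz content
        # onto 4-8 kHz (f -> 8000 - f at a 16 kHz sampling rate).
        folded = wb * np.cos(np.pi * np.arange(len(wb)))

        # Keep only the artificial high band and shape it with one coarse
        # gain; a real parametric system applies an estimated envelope.
        sos = butter(8, 4000.0, btype='highpass', fs=16000.0, output='sos')
        high_band = sosfilt(sos, folded)

        return wb + hb_gain * high_band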

 

Acoustic Signal Processing in Noise: It's Not Getting Any Quieter

Wednesday, September 5th, 9:00, Generali-Saal

Patrick Naylor
Imperial College London, London, United Kingdom

Processing signals degraded by noise brings specific challenges, particularly for speech signals, which may also suffer from reverberation associated with room acoustics and from nonlinear distortions such as those caused in voice networks. Our societies' economic and social ambitions, together with a quickly growing global population and increasing urbanisation, point only to the increasing importance of human communication technology that is robust to noise, and which might eventually even help to reduce it. Even today one might sometimes ask: what would we give for a little peace and quiet?

Many researchers are continuously amazed at just how well humans can communicate, even in scenarios with severe degradations of the speech signal. It seems intuitive, however, that this human capability does not come without substantial cognitive load. Recent experiments will be used to illustrate how one might quantify the increase in cognitive load associated with human speech understanding as a function of the type and level of degradation applied to the speech. This kind of information then has the potential to inform the design of speech enhancement technology so as to maximize listening comfort and intelligibility.

As well as by additive noise, speech may be degraded by convolution with an unknown acoustic transmission channel, causing reverberation. If the channel characteristics can be estimated blindly and with the necessary accuracy, the estimates can be used to enhance the received speech signal. Recent research on multi-channel blind SIMO acoustic system identification has spawned new work on the resulting acoustic inverse filtering problem, which will be described as a way to reduce reverberation, though usually at the cost of introducing some level of undesired artefacts. For the case of single-channel acquisition, alternative approaches to channel estimation will be presented with examples.
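To make the inverse filtering step concrete, here is a minimal least-squares sketch in the spirit of MINT-style multichannel equalization, assuming the impulse responses (in practice obtained by blind SIMO identification) are already available; the plain unregularized solve is an illustrative simplification:

    import numpy as np
    from scipy.linalg import toeplitz

    def inverse_filters(rirs, filt_len):
        """Least-squares multichannel inverse filtering (toy sketch).

        rirs     : list of M identified room impulse responses (equal length).
        filt_len : length of each inverse filter g_m.
        Solves sum_m h_m * g_m ~= delayed unit impulse in the least-squares
        sense.
        """
        rir_len = len(rirs[0])
        out_len = rir_len + filt_len - 1

        # Convolution with g_m equals H_m @ g_m for a Toeplitz matrix H_m.
        blocks = []
        for h in rirs:
            col = np.concatenate([h, np.zeros(filt_len - 1)])
            row = np.zeros(filt_len)
            row[0] = h[0]
            blocks.append(toeplitz(col, row))
        H = np.hstack(blocks)            # shape: (out_len, M * filt_len)

        d = np.zeros(out_len)            # target: unit impulse, delayed to
        d[rir_len // 2] = 1.0            # allow a causal modelling delay

        g, *_ = np.linalg.lstsq(H, d, rcond=None)
        return g.reshape(len(rirs), filt_len)

With at least two channels that share no common zeros and sufficiently long inverse filters, exact equalization becomes possible; in practice, regularization is essential because blind channel estimates are imperfect, which is one source of the artefacts mentioned above.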

The talk will also emphasize the importance of comparable evaluations of acoustic signal processing algorithms, using the same data and the same metrics, as advocated by the recently launched IEEE AASP Challenges.

 

Distributed Signal Processing: Application to MVDR Beam-Forming

Thursday, September 6th, 9:00, Generali-Saal

Richard Heusdens
Delft University of Technology, Delft, Netherlands

With the emergence of (large-scale) wireless audio sensor networks, there is a need for a new class of algorithms that implement audio processing in a distributed fashion. Wireless audio sensor networks consist of a large number of nodes, each having a sensing (microphone), data processing, and communication component. In such networks, due to the absence of a central processing point (fusion center), nodes use their own processing ability to locally carry out simple computations and transmit only the required, partially processed data to neighboring nodes. Despite the simplicity of the individual nodes' operations, jointly the nodes are able to perform relatively complex tasks. The decentralized settings in which signal processing algorithms then have to be deployed are typically dynamic, in the sense that sensors are added, removed, or moving, usually in an unpredictable way. In such settings, the algorithms must allow for a parallel implementation, must be easily scalable, must be able to exploit the possibly large, sparse geometry of the problem, and must be numerically robust against (small) changes in the network topology.

We will present an iterative, distributed MVDR beamforming algorithm based on message passing and rooted in probabilistic inference on Markov random fields. At each iteration, every node in the network keeps track of so-called messages received from neighboring nodes, which are used to form a new estimate of the final solution and to construct new messages to be transmitted at the next iteration. In the talk, we will give an introduction to message-passing algorithms and show their suitability for a variety of signal processing applications in wireless sensor networks, using the MVDR beamformer application to illustrate the approach.
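The message-passing pattern itself can be illustrated on a much simpler problem than MVDR beamforming: distributed averaging by linear consensus. The Python sketch below is a toy stand-in (the ring topology, step size, and iteration count are arbitrary choices), not the probabilistic inference algorithm of the talk:

    import numpy as np

    def distributed_average(values, neighbors, iters=200, step=0.2):
        """Toy message-passing consensus on a connected graph.

        Each node repeatedly receives its neighbors' current estimates
        ("messages") and nudges its own estimate towards them; with no
        fusion center, all nodes still converge to the global average.
        """
        x = np.array(values, dtype=float)
        for _ in range(iters):
            updates = [sum(x[j] - x[i] for j in neighbors[i])
                       for i in range(len(x))]
            x += step * np.array(updates)
        return x

    # Example: four nodes in a ring, e.g., local noise power estimates.
    nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(distributed_average([1.0, 2.0, 3.0, 6.0], nbrs))  # -> ~[3, 3, 3, 3]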

 

Optimized Directional Processing in Hearing Aids with Integrated Spatial Noise Reduction

Thursday, September 6th, 11:00, Generali-Saal

Henning Puder, Eghart Fischer, Jens Hain
Siemens Audiologische Technik, Erlangen, Germany

In this contribution, a differential beamformer for hearing aids is presented, combined with a direction-dependent noise reduction. The goal is to achieve good speech intelligibility and quality in adverse noisy environments while coping with hearing aid constraints such as small microphone distances and head shading.

First, a differential beamformer operating in sub-bands is presented which achieves good interference cancellation for small beamformer apertures. The sub-band structure makes it possible to optimize the direction-dependent attenuation by adapting to the noise interference and the head shading. Differential structures, however, strongly amplify microphone noise; this amplification is generally limited by severely constraining the beamformer performance at low frequencies. In this contribution, we present a fast adaptation control that simultaneously minimizes the ambient interference and the microphone noise: at each time instant, and independently for each sub-band, the adaptation selects the maximum possible interference cancellation for which the residual interference after beamforming just masks the microphone noise.
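As an illustration of the underlying structure, a fixed first-order differential beamformer for one closely spaced microphone pair can be sketched per STFT bin as follows. The 12 mm spacing, the rear-facing null, and the equalization floor are assumptions made for this example, and the fast adaptation control described above is deliberately omitted:

    import numpy as np

    def differential_beamformer(X1, X2, freqs, d=0.012, c=343.0):
        """Fixed first-order differential beamformer (illustrative sketch).

        X1, X2 : STFT coefficients (n_freqs x n_frames) of the front and
                 rear microphone of a pair spaced d metres apart.
        Delaying the rear signal by d/c and subtracting yields a cardioid
        with its null directly behind the wearer.
        """
        tau = d / c                                    # delay for a rear null
        phase = np.exp(-2j * np.pi * freqs * tau)[:, None]
        Y = X1 - phase * X2

        # The array has a high-pass response 1 - exp(-2j*omega*tau) towards
        # the front; equalizing it flattens the target spectrum but amplifies
        # microphone noise at low frequencies -- hence the floor, and hence
        # the adaptation control discussed in the abstract.
        omega = 2.0 * np.pi * freqs[:, None]
        front_resp = 1.0 - np.exp(-2j * omega * tau)
        eq = 1.0 / np.maximum(np.abs(front_resp), 1e-2)
        return eq * Y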

The direction-dependent noise reduction complements the well-known stationary and transient noise reduction procedures that are typically applied after the beamformer, and it can suppress noise components regardless of their stationarity properties. A noise reference is calculated within the beamformer by attenuating signals from the front direction, which is assumed to be the target signal direction. The direction-dependent noise reduction can thus suppress all kinds of interference components arriving from outside the look direction. Finally, we show approaches for an optimized combination of the direction-dependent noise reduction with the well-known noise reduction procedures in order to minimize artefacts and optimize the sound quality.
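A minimal sketch of such a suppression rule, assuming the beamformer output and the front-attenuated noise reference are available as STFT coefficients, could look as follows; the Wiener-like gain and the fixed spectral floor are illustrative choices, not the optimized combination developed in the talk:

    import numpy as np

    def directional_postfilter(Y, N_ref, floor=0.1):
        """Direction-dependent noise suppression (illustrative sketch).

        Y     : beamformer output (n_freqs x n_frames), look direction front.
        N_ref : noise reference with the front direction attenuated.
        Bins dominated by the reference, i.e., by sound arriving from
        outside the look direction, are attenuated regardless of their
        stationarity.
        """
        snr = np.abs(Y) ** 2 / (np.abs(N_ref) ** 2 + 1e-12)
        gain = np.maximum(snr / (1.0 + snr), floor)  # floor limits artefacts
        return gain * Y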

 

Multi-Microphone Speech Enhancement

Thursday, September 6th, 14:00, Generali-Saal

Sharon Gannot
Bar-Ilan University, Ramat Gan, Israel

Microphone array algorithms emerged in the early 1990s as viable solutions to speech processing problems. However, the adaptation of beamforming methods to speech processing is still an open issue. Many difficulties arise from the characteristics of the speech signal and of the acoustic environment: the speech signal is wide-band and nonstationary, and very long, time-varying room impulse responses result from the multiple reflections of the sound field and from moving objects in the acoustic enclosure.

In this talk, we will focus on spatial processors ("beamformers") based on the linearly constrained minimum variance (LCMV) criterion and its special case, the minimum variance distortionless response (MVDR) beamformer. The implementation of the LCMV beamformer in the short-time Fourier transform (STFT) domain and its structuring as a generalized sidelobe canceller (GSC) facilitate the application of the presented algorithms to speech signals in real acoustic environments.
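For reference, the two criteria take their standard textbook forms; with Phi_nn denoting the noise covariance matrix at a single STFT bin, d the steering vector (e.g., an RTF), and C, g a set of linear constraints, one has:

    % MVDR: minimize output noise power under a distortionless constraint
    \mathbf{w}_{\mathrm{MVDR}}
        = \arg\min_{\mathbf{w}} \ \mathbf{w}^{H} \boldsymbol{\Phi}_{nn} \mathbf{w}
          \quad \text{s.t.} \quad \mathbf{w}^{H}\mathbf{d} = 1
        = \frac{\boldsymbol{\Phi}_{nn}^{-1}\mathbf{d}}
               {\mathbf{d}^{H}\boldsymbol{\Phi}_{nn}^{-1}\mathbf{d}}

    % LCMV: the same idea under multiple linear constraints C^H w = g
    \mathbf{w}_{\mathrm{LCMV}}
        = \boldsymbol{\Phi}_{nn}^{-1}\mathbf{C}
          \left(\mathbf{C}^{H}\boldsymbol{\Phi}_{nn}^{-1}\mathbf{C}\right)^{-1}\mathbf{g}

The GSC mentioned above is an equivalent implementation that splits the weight vector into a fixed beamformer satisfying the constraints and an unconstrained adaptive branch operating in the constraints' null space.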

We will show how the powerful LCMV criterion can be applied to various related problems, for example speech enhancement, extraction of desired speakers in environments with multiple competing speakers, and combined noise reduction and echo cancellation. Special attention will be given to blind estimation techniques for the GSC components and to the efficient design of its various blocks. We will also elaborate on the relative transfer function (RTF) and its importance in speech processing. We will conclude with a discussion of the applicability of the LCMV criterion to binaural processing. If time permits, novel distributed microphone array architectures will be reviewed, and the new advantages and challenges they raise will be explored. The presentation will be accompanied by processed audio files demonstrating the algorithms' performance.
