• Non ci sono risultati.

Measurement and numerical simulation of Binaural and B-format impulse responses in concert halls

N/A
N/A
Protected

Academic year: 2021

Condividi "Measurement and numerical simulation of Binaural and B-format impulse responses in concert halls"

Copied!
26
0
0

Testo completo

(1)

and B-format impulse responses in concert halls

Angelo Farina

1

, Lamberto Tronchin

2

1 Industrial Eng. Dept., University of Parma, Via delle Scienze 181/A - 43100 Parma – Italy – E-mail:

farina@pcfarina.eng.unipr.it

2DIENCA-CIARM, University of Bologna, Viale Risorgimento 2, 40136 Bologna – Italy – E-mail:

tronchin@ciarm.ing.unibo.it

ABSTRACT

The paper describes a new, hybrid method for measuring the spatial/temporal transfer function of a concert hall, which employs both a binaural probe and a B-format microphone.

It is shown how these two techniques can be complementary, both in terms of determination of objective parameters and for performing audible reconstructions (auralization).

The main advantage of this measurement/simulation method is that it makes it possible to conduct comparative listening tests with virtually zero delay for switching.

Consequently, the short-time perceptual memory of the subjects is employed, allowing for the detection of very subtle differences, which are not perceivable in normal listening tests, due to the excessive time delay between the presentation of the stimuli.

INTRODUCTION

The goal of this paper is to exploit the complementarities of two ways of representing the spatial/temporal transfer function of a concert hall: the binaural approach and the B-Format approach. In the first case, the impulse response is measured or computed between an omnidirectional point source and two pressure microphones located at the entrances of the ear channels of a dummy head and torso.

In the second case, 4 impulse responses are measured or computed, starting again from an omnidirectional sound source, and reaching a so-called Soundfield microphone, which is a 4-channel probe, which captures the sound pressure (omnidirectional microphone) and the three Cartesian components of the particle velocity.

From these two different data sets, different acoustical quantities can be derived: from binaural measurements, IACC and derivations can be computed; similarly, from B-format measurements, quantities such as LF or Front-to-back ratio can be derived.

The paper does not focus on the counter-position between the two techniques, but instead on the possibility to employ both of them simultaneously. In fact, a modern technique was recently developed by Ralph Glasgal [1], called Ambiophonics, which aims to the reproduction of a realistic sound field by simultaneous usage of the binaural approach (from which a pair of closely-spaced loudspeakers, with cross-talk canceling filters, can be driven) and of the B-format approach (driving, by convolution with the B-format impulse response, a suitable 3D array of loudspeakers, employing an Ambisonics decoder or other decoding schemes).

It must be noted that the original Ambiophonics scheme introduced by Ralph Glasgal is substantially less constrained than the implementation described here. In fact, the Glasgal method is applied to generic stereo recording (made with any true stereo recording technique, such as ORTF, sphere-microphone, etc.), and the

“surround” part of the system does not require a regular array of identical loudspeakers. In practice, the Glasgal approach is mainly devoted to high-quality playback of existing music recordings, whilst the technique described here is focused to instrumentation-degree playback under controlled conditions of special recordings made for conducting psychoacoustical tests.

The experiments conducted by the authors about the Ambiophonics approach made it possible to establish the strong and weak points of each of the two techniques which constitute it, and to find the optimal combination, which maximizes the benefits of both. In practice, it resulted that the binaural approach is far superior for describing the direct sound and the early reflections coming from the stage enclosure, whilst the B-format approach is better for describing the late part of the reverberant tail and its surrounding (or enveloping) effect.

This means that, for optimal Ambiophonics reproduction, the measured or computed impulse responses should be edited: the binaural ones, employed for driving the frontal pair of close loudspeakers (Stereo Dipole), need to be cut just after the first reflections. Instead, the B-format IR should be smoothed in its first part, leaving only the late reflections and the subsequent diffuse tail. Although this editing is somewhat arbitrary, a series of subjective blind listening test clearly demonstrated that this hybrid approach is much superior to each of the

(2)

This means that an optimal characterization of an existing concert hall, or the simulations performed during the design of a new one, should include both the binaural and the B-format IRs. The tools for doing the measurements or simulations are now easily available, and the paper presents some of them, which were developed by the authors.

METHODS AND EQUIPMENT FOR MEASUREMENT or RECORDING

First of all, what follows is applicable both to “live recordings”, made during a musical performance, and to measurements of the impulse responses. In the second case, the musical performance can be reconstructed later, by convolution of properly recorded anechoic signals with the measured impulse responses.

These measurements/recordings are made with two complementary techniques: dummy-head binaural recordings and 4-channels, B-format recordings made employing a Soundfield microphone. This means that a total of 6 channels need to be recorded simultaneously.

Figure 1 shows one of the dummy heads employed (a B&K Type 4100), side-by-side with the Soundfield microphone (MK-V). In this case, both are mounted on torso simulators, which can be easily placed on the chairs.

Fig. 1 – Binaural and B-format microphones

The signals coming from the dummy head and from the Soundfield microphone are recorded by

means of a professional multichannel soundboard (Echo Layla), which is capable of recording

up to 8 channels at 48 kHz, 24 bits.

(3)

This is a rack-mounted unit, which is installed, together with the industrial PC and the other audio equipment, in a portable rack. This is shown in figure 2.

Fig. 2 – The rack containing the measuring/recording equipment MEASUREMENTS IN CONCERT HALLS

M.Gerzon [2] first proposed to start a systematic collection of 3D impulse responses measured in ancient theatres and concert halls, for assessing their acoustical behavior and preserving it for the posterity. His proposal found sympathetic response only very recently, with the publication of the “Charta of Ferrara” [3] and the birth of an international group of researchers who agreed on the experimental methodology for collecting these measurements [4]. This methodology allows both for Binaural and/or B-format measurements.

Gerzon proposed to measure B-format impulse responses, however nowadays it is better to always measure simultaneously both Binaural and B-format impulse responses.

Only a small number of theatres have yielded a complete three-dimensional, dual-format impulse response characterization up till now. Among them, we employed for the present work the IRs measured in three Italian theatres:

- Auditorium “Paganini” in Parma

- Auditorium “S.Domenico” in Foligno

- Theatre “Mazzacorati” in Bologna

(4)

2 2+3 3 3+4 4 1+2

Fig. 3 – Plan and Section of Auditorium “Paganini” in Parma

S R

(5)

Fig. 4 – Plan and Sections of Auditorium “S.Domenico” in Foligno

42 m

16 m

S

R

(6)

Fig. 5 – Plan of Theatre “Mazzacorati” in Bologna

Figures 3, 4 and 5 show, for each theatre, a schematic plan of the room with the positions of the sound source and of the microphone.

The following table contains the main geometrical quantities for the three theatres:

Theater Volume (m

3

) Plan Area (m

2

) Seats

Paganini – Parma 16300 850 780 (stalls)

S.Domenico - Foligno 18400 1050 532 (stalls)

+130 (rear gradons)

Mazzacorati - Bologna 1000 90 70 (stalls)

+ 50 (balconies) Compared with their seat capacity and their volumes, these three rooms are quite live, much more than most Italian opera houses; in substance, these three performing spaces are among the few Italian theatres resembling the acoustical characteristics of north-European concert halls of the same size.

This is demonstrated by fig. 6, which shows the measured reverberation times in the three theatres (all measurements shown here were conducted with the room empty, but in the Parma Auditorium the measurements were also repeated with the room completely full, showing a reduction of the reverberation time substantially constant with frequency, and less than 5%).

S

R

(7)

Reverberation Time T20

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

63 125 250 500 1000 2000 4000 8000 16000

Frequency (Hz)

T20 (s)

Auditorium Parma Auditorium Foligno Teatro Mazzacorati

Fig. 6 – Reverberation time of the three theatres (unoccupied)

In the following figs. n. 7, 8, 9, the binaural impulse responses measured in the selected positions of the three rooms are shown.

Fig. 7 – Binaural Impulse Response. Auditorium of Parma, Source S1, Receiver n.10, row n. 23 – first 300 ms

time (ms)

amplitude

left

righ t

(8)

Fig. 8 – Binaural Impulse Response. Auditorium of Foligno, row n. 3 – first 300 ms

Fig. 9 – Binaural Impulse Response. Teatro Mazzacorati, row n. 8 – first 300 ms

time (ms)

time (ms)

amplitudeamplitude

left

right left

right

(9)

The following table contains the most important acoustical parameters (computed according to the ISO-3382 standard) in the three theatres (A-weighted band).

Parameter Parma Foligno Mazzacorati C50 [dB] -0.64 6.44 0.6 C80 [dB] 1.00 7.45 3.09

Ts [ms] 176 47 80

EDT [s] 2.40 2.10 1.23 T10 [s] 2.23 2.42 1.13 T20 [s] 2.23 2.54 1.14 T30 [s] 2.25 2.63 1.15

IACC 0.24 0.54 0.09

METHODS AND EQUIPMENT FOR SOUND SIMULATION

Once the recordings are done, these need to be replayed inside the listening room, making use of advanced digital signal processing techniques, for ensuring that the listener seating inside this special room is hearing exactly the same soundfield which was present inside the concert hall where the recordings were made.

In the case of impulse response measurements, the processing includes convolution of anechoic signals with the measured IRs, which is nowadays easily performed on the PC itself, thanks to a specific plug-in which makes use of the new, very fast FFT routines developed by Intel. After the convolution is done, there is no practical difference between the playback of “live”

recordings and of “virtual” recordings obtained by convolution; hence, in the following, no distinction is made among these two cases.

In practice, the binaural signal is reproduced by means of two stereo dipoles (one in front, one

behind the listener), as shown in figs. 10 and 11. It must be noted that the usage of a single

frontal pair of loudspeakers results in typical front-back confusion errors, so a dual stereo-dipole

system is definitely preferable, as clearly shown in [5].

(10)

Fig. 10 - Frontal Stereo Dipole

Fig. 11 – Rear Stereo Dipole

The frontal Stereo-Dipole is made with two Quested 2108 loudspeakers, the rear Stereo-Dipole is made with two Quested F11P monitors. Each loudspeaker pair is driven with a Crown K1 current-controlled power amplifier.

The signals driving the stereo dipoles are processed with cross-talk canceling filters, implemented making use of the methods described in the next chapter. The processed signals for the 2 stereo dipoles are reproduced by means of the first 4 channels of an Echo Layla soundboard.

Regarding the reproduction of the B-format soundtrack, a complete three-dimensional

Ambisonics system is employed. 8 identical loudspeakers (Turbosound Impact 50) are mounted

(11)

in double-square configuration (4 are at the corners of an horizontal square, and the other four are at the corners of a vertical square, at the sides of the listener). Figure 12 shows a panoramic view of the listening room, in which the position of the Ambisonics loudspeakers can be noticed on the walls and on the ceiling.

Fig. 12 – loudspeakers for Ambisonics reproduction

These 8 loudspeakers are powered by a single 8-channels amplifier (QSC CX168), which ensures that exactly the same gain is always maintained for the 8 channels.

The 4 channels of the B-format recording are played by means of the channels 5, 6, 7 and 8 of the Each Layla sound card. The conversion between the B-format signal and the 8 loudspeaker feeds (usually called “Ambisonics decoding”) is made be means of an external DSP processing unit (BSS Soundweb), which implements a suitable processing network, described in the following chapters.

The last two channels of the Echo Layla sound board (9, 10) are employed for driving a self- powered subwoofer (Audio Pro B1-20), which is located under the frontal Stereo Dipole, as shown in figure 13.

Finally, the computer employed for driving the reproduction of the recordings, and for collecting the automated questionnaires described at the end of this paper, required to be completely noiseless: a liquid-cooled, fan-less computer was employed for this (SID FutureClient). For the same reason, the cooling fans of the BSS Soundweb and of the QSC power amplifier were switched off.

Front-Right

Down-Right Up-Right

(12)

at 1m distance.

The computer is equipped with a large LCD display, and with wireless keyboard and mouse, so that it can be operated also from a listener seated at the center of the “sweet spot”, 2.5m far from the computer itself.

Fig. 13 – Active Subwoofer

The reproduction of the multichannel recordings is made employing a multichannel wave editor (CoolEditPro), which was also used, thanks to the availability of a set of specialized plug-ins (Aurora), for measuring the cross-talk functions and for computing the cross-talk canceling inverse filters.

Figure 14 shows the rack containing, from top to bottom, the Echo Layla sound board, the BSS Soundweb the two Crown K1 amplifiers, the QSC CX168 8-channels amplifier, and the liquid- cooled computer.

The semi-anechoic listening room utilized during these experiments has a rectangular shape (4.0

x 4.2 meters, 2.4 height).

(13)

Philips 15” Brilliance LCD display Logitech wireless keyboard & mouse Echo Layla Soundboard (8 ins, 10 outs)

BSS Soundweb digital processor 2 Crown K1 amplifiers 2 Crown K1 amplifiers QSC CX168 8-channels amplifier Signum Data Futureclient fanless PC

Fig. 14 – electronic equipment installed in the listening room.

The following chapters describe in detail the setup of the binaural reproduction over the stereo- dipoles, the Ambisonics reproduction of the B-format soundtrack and the subjective listening test system, employed for collecting the questionnaires being compiled by the subjects.

Finally, the results of a small comparative listening test between the three reproduction methods (Stereo Dipoles, Ambisonics and combined) are reported.

THEORY OF STEREO DIPOLE REPRODUCTION SYSTEM

The theory of the Stereo-Dipole reproduction system for binaural signals is widely known [6,7,8], and consequently it needs to be recalled here only very shortly.

As with other “transaural” reproduction systems, the concept is to cancel the cross-talk signal which is going from the left loudspeaker to the right ear, and vice versa.

The approach employed here is derived from the research about the Stereo Dipole implementation originally developed by Kirkeby and Nelson [9,10].

The 4 cross-talk canceling filters f, which are convolved with the original binaural material, have to be designed so that the signal collected at the ears of the listener are identical to the original signals. The following fig. 15 shows the cross-talk phenomenon in the reproduction space:

Binaural

stereo signal yl

p

l

p

r

x

l

x

r

Cross-talk canceling filters

f

lr

f

ll

f

rr

f

rl

y

l

y

L R

h

ll

h

rr

h

lr

h

rl r

Fig. 15 – cross-talk canceling scheme

(14)

Imposing that p

l

=x

l

and p

r

=x

r

, a 4x4 linear equation system is obtained. Its solution yields:

   

   

   

   

 

 

rl lr rr ll ll

rr

rl lr rr ll rl

rl

rl lr rr ll lr

lr

rl lr rr ll rr

ll

h h h h InvFilter h

f

h h h h InvFilter h

f

h h h h InvFilter h

f

h h h h InvFilter h

f

(1)

The problem is the computation of the

InvFilter

  function, as its argument is generally a mixed-phase function. One could try to compute the inverse as the reciprocal in frequency domain:

      

x IFFT FFT

) ( x

InvFilter

1

(2) But this approach is numerically unstable, as the reciprocal of a mixed-phase function is usually infinite and not-causal.

Of consequence, some sort of approximate inversion method is required: here the Kirkeby- Nelson frequency-domain regularization method was employed, due to its speed and robustness.

A further improvement over the previously published work [11] consists in the adoption of a frequency-dependent regularization parameter. In practice, the argument of the inverse filter is directly computed in the frequency domain, where the convolutions are simply multiplications, with the following formula:

) h ( FFT ) h ( FFT ) h ( FFT ) h ( FFT ) (

C   llrrlrrl

(3)

Then, the complex reciprocal of C() is taken, adding a small, frequency-dependent regularization parameter, and finally an IFFT yields the required inverse filter:

   

     





 

    

C C IFFT C h

h h h InvFilter

*

* rl

lr rr

ll

(4)

In practice, ) is chosen with a constant, small value in the useful frequency range of the

loudspeakers employed for reproduction (80 Hz – 16kHz in this case), and a much larger value

outside the useful range. A smooth, logarithmic transition between the two values is interpolated

over a transition band of 1/3 octave.

(15)

Fig. 16 – frequency-dependent regularization parameter

The advantage of this approach is that it takes into account the real performance of the loudspeakers and of the binaural dummy head, because the computation of the inverse filters is based on the measured transfer functions.

Fig. 17 – measured transfer functions of the frontal Stereo Dipole

Fig. 18 – inverse filters for the frontal Stereo Dipole

fNy q

0

(f)

0.01 1

80

Hz 16 kH

z

fll

hll

flr

frl

hlr

frr

hrl

hrr



time (ms)



time (ms)

(16)

and its inverse filters (both in terms of impulse responses and of frequency responses), obtained by a multiple MLS measured method, which was already presented in previous papers [12,13].

In practice, the inverse filters shown in fig. 18 do not simply provide effective cross-talk cancellation; they completely equalize both the magnitude and the phase response of each loudspeaker, and they also compensate for the particularities of the specific dummy head employed during the measurements.

Fig. 19 – verification of the frontal Stereo Dipole

Figure 19 shows the verification of the cross-talk cancellation and of the frequency response linearization obtained by means of the inverse filters on the frontal Stereo Dipole: comparing with Fig.17, the two transfer functions h

ll

and h

rr

have been equalized, obtaining a Dirac function in time domain i.e. flat function in frequency domain, while the two transfer functions h

rl

and h

lr

have been attenuated by more than 40 dB over the whole frequency range.

As both stereo-dipole systems are being operated simultaneously, this reproduction system is substantially more robust to head movements than traditional systems which make use of just two loudspeakers. In practice the system revealed to be almost immune from problems such as front-back confusion and apparent rotation of the sound field together with the rotation of the listener’s head: this means that the listening is much more natural than what can be obtained by means of headphones.

For quantitative results demonstrating the superiority of the dual Stereo-Dipole, see [7].

THEORY OF AMBISONICS REPRODUCTION SYSTEM

Although much older than the Binaural method, the Ambisonics technology needs a more detailed description, as it is nowadays almost unknown.

The Ambisonics method was initially developed by Michael Gerzon [14], and is based on the decomposition of the spatial sound pressure field in a point in its spherical harmonics of various orders.

Although “higher order” Ambisonics formulations do exist, here we refer simply to the so-called

“1

st

order” Ambisonics formulation, which employs the 0-th order spherical harmonics (which is simply a sphere) and the three first-order harmonics (which are clepsydrae-shaped surfaces).

Figure 20 shows the 3D shape of these spatial functions, which correspond to the directivity pattern of the 4 channels captured by a Soundfield microphone.

time (ms)



hll

hlr hrr

hrl

(17)

Fig. 20 – shape of spherical harmonics of orders 0 (top-left) and 1 (the other three).

The 4 channels coming from 4 microphones with the above-shown directivities are called respectively W (omnidirectional) and X, Y, Z (three orthogonal figure-of-eight microphones).

These 4 signals, together, constitute the so-called B-format signal. In the case of plane, progressive waves, it is easy to see how the three directive channels X,Y,Z are in substance proportional to three Cartesian components of the particle velocity, whilst the omnidirectional signal is proportional to the acoustical pressure.

The principle of the Ambisonics methodology is that these 4 channels can be processed in such a way to drive a three-dimensional array of loudspeakers, reconstructing properly the pressure, the particle velocity and hence the intensity vector at the center of the reproduction array (where the soundfield is nominally recreated identical to the one sampled in the original room where the recording of these 4 channels was done).

The computation of the speaker feeds from the B-format signal is called “Ambisonics decoding”.

This can be quite complex if the array of loudspeakers is irregular, but instead it becomes trivial if a regular array is employed (the simplest case being a cube, with the loudspeaker in its vertexes).

The following fig. 21 shows the geometry of such a simple case, where the angles between the

vectors pointing to the loudspeakers and the reference axes are denoted ,  and .

(18)

r

y

x

Fig. 21 – Geometry of the speaker array.

The feed of any of the 8 loudspeaker can be obtained simply as a weighted sum of the 4 input channels:

 

G W G X cos( ) Y cos( ) Z cos( )

2

Fi11  2        

(5)

Which corresponds to the block schematic reported in fig. 22.

W X Y Z

G1 G2 G2 G2

cos 1

cos 1

cos 1

InvFilt1

cos 2

cos 2

cos 2

InvFilt2

Fig. 22 – Flow diagram of the decoder – only the first two speaker feeds are shown

It must be noted that, just before the loudspeaker, an equalizing inverse filter is inserted, for

ensuring perfect gain and phase matching between all of them.

(19)

The gains G

1

and G

2

were originally delivered by proper analogue shelf filters, for adjusting the soundfield reconstruction to two different auditory mechanisms, one holding for the low frequencies (below 500 Hz), and the other for higher frequencies. These gains are also influenced by the number N of loudspeakers in the array.

If the array is not regular, it is necessary to introduce different gains for each loudspeaker, instead of employing the same gains for all of the loudspeakers. This topic was recently investigated by Moorer [15], who developed a mathematical formulation (derived from the original Ambisonics scheme) which allows for the computation of the gain matrix for any number and positions of a symmetrical 2D speaker array. The Moorer formulation, anyway, reduces simply to G

1

=G

2

=1 when the array is regular, and thus evidently it does not take into account the psychoacoustic shelf filtering. Furthermore, the case of complete 3D reconstruction is not covered, although certainly this is feasible with a simple extension of the Moorer equations.

For the following, only the cases regarding almost regular speaker arrays are considered.

Even in this case, the proper values of the gains were never published clearly, and probably different manufacturers of hardware decoders were using different values.

In U.S. pat. N. 3,997,725, M. Gerzon reports the optimal values of the gains G

1

and G

2

for a regular cubic array:

Frequency G

1

G

2

=G

2

/G

1

> 500 Hz

2 2

1

< 500 Hz 1

3 3

The above values are not consistent from an energetic point of view, as the total energy density at high frequency is 20% greater than at low frequency (probably this effect is not audible, and probably it is compensated by the fact that at low frequency the sum tends to happen more in pressure than in energy). What appears important is the ratio  between G

2

and G

1

, which ranges between 1 and 1.732.

It must be noted that a value of  = 1 means that no anti-phase signal comes out from the loudspeakers placed in the opposite direction from the original sound provenience, and this ensures that the surround effect is consistent in a wide listening area, at least at high frequency.

Although some authors reported that employing a value of  higher than 1 even at high frequency can yield some benefit in spatial localization, here a strict adherence to the Gerzon suggested values was employed (in our case, in fact, the goal for accurate localization is not very important, because the stereo-dipoles already provide it; instead, the Ambisonics array is devoted mainly to the recreation of the ambience and the surround effect).

The Ambisonics array of loudspeakers employed here is a derivation of the cubical system already described above, and presented in [16]. The principal difference is the different position of the 8 loudspeakers, but many other improvements were made, thanks to the better hardware now available, and to the fact that the DSP unit employed for the decoding (BSS Soundweb) can be easily programmed for performing a very complex real-time digital signal processing.

The location of the 8 loudspeakers chosen was the so-called “dual square” array, This is shown in fig. 23. In substance, a first quadrilateral array is created in the horizontal plane XY (resembling the old “quad” setup), and a second vertical array is created in the YZ plane.

If the two quadrilateral are perfect squares, the array is NOT isotropic: a third square lying on

the XZ plane would be required for completing the symmetry. An isotropic array is obtained if

the sides along X and Z are longer than the sides along Y, (with a ratio equal to

2

). This not

(20)

for the non-isotropic array inside the Ambisonics decoder.

Fig. 23 – geometry of the dual-square Ambisonics loudspeaker array

In practice, the use of the BSS Soundweb makes it possible to introduce easily variable gains, so that the Y signal is slightly reduced in comparison with X and Z (which instead are boosted):

this corrects for the fact that the not isotropic array tended to give more emphasis to the Y component of the B-format signal.

Figure 24 shows the processing network implemented on the BSS Soundweb. This network incorporates:

- Correction for the -3 dB offset of the W channel caused by the Soundfield microphone (standard B-format)

- Shelf-filters with a crossover frequency of 500 Hz, which implement max-Rv decoding (optimization for velocity vector) at low frequency and max-Re decoding (optimization for intensity vector) at high frequency [17].

- Reduction of the gain of the Y channel and increase of the gain of X and Z channels.

- Matrixing of the WXYZ modified signals for deriving the theoretical speaker feeds

- FIR equalization of each theoretical speaker feed for driving the corresponding loudspeaker,

so that any minor difference between individual transducer is completely compensated.

(21)

Fig. 24 – Soundweb processing network for Ambisonics decoding

The above features make this decoder implementation unique, and corrects for most of the

problems previously encountered with simpler implementations (which did not include shelf

filtering, nor FIR equalization of the individual reproduction channels).

(22)

Fig. 25 – response of the Turbosound loudspeaker Front-Left

Fig. 26 – 100-taps, minimum-phase inverse filter

Fig. 27 – Verification of the equalization

Of course, this approach required that the impulse response of each loudspeaker was separately measured and inverted. The processing for the loudspeaker Front-Left is shown here: fig. 25 is the measured impulse response, fig. 26 the corresponding inverse filter, and fig. 27 the verification of the equalization obtained.

Although the equalized curve is not perfectly flat, it can be seen how even employing a FIR filter of very limited length (only 100 taps) it is possible to improve significantly the reproduction fidelity.

time (ms)



time (ms)

time (ms)



(23)

AUTOMATIC COLLECTION OF QUESTIONNAIRES

The collection of subjective responses is based on the use of the same computer being employed for reproducing the sound samples. In practice, these two functions were integrated, by means of the creation of a special software tool. This program makes it possible for the listener to play at will several sound samples, and to fill up on the screen the questionnaire pertinent to each group of stimuli.

This way the listener can easily compare the samples in pairs or in triplets, switching back and forth among them, and this makes it possible to detect also very subtle differences.

Figure 28 shows the user’s interface of this Audio Testing software.

Fig. 28 – software for questionnaire collection

In practice, the software can be easily reconfigured for a different number of questions, with different attributes as left and right terms of each question, and also the number of sound samples (1,2,3,4 in the above example) for each concert hall (A,B,C in the above example) can be varied without the need to recompile the program.

The user has to fill up a different questionnaire for each of the “virtual halls” being presented (3, in this cases), making use of 4 different music samples recorded in each hall. Switching the hall causes automatically the appearance of the corresponding questionnaire on the screen.

Furthermore, switching the hall also switches the WAV file being reproduced, so that the listener can easily check if the differences in the responses to the questions fit with the listening perception. The sound switching is smooth and silent, and the exact position inside the sound file is kept, so that the switching causes no intrusion in the listening experience.

The analysis of the comparative subjective test between the three theatres has not been completed yet, so it is not possible, at this point, to attempt a correlation between the measured objective parameters presented in the previous chapters and the subjective responses to the questions shown in figure 28.

SUBJECTIVE RESULTS

(24)

reproduction systems (Stereo Dipoles, Ambisonics and Ambiophonics) produces a more realistic listening experience.

A panel of 9 students was employed for the comparative test. Each subject was asked to judge three repetitions of each music sample, 40s long. One repetition was presented through Stereo Dipoles only, one through the Ambisonics rig only, and the third one employing simultaneously both systems, that is by Ambiophonics.

It must be noted that all the signals were pre-computed off-line by means of the Aurora plugins [12], and CoolEditPro was employed for preparing the multi-channels files in the Microsoft Wave-EX format. The first two reproduction methods (Stereo Dipole and Ambisonics) were obtained by muting parts of the reproduction system, whilst the third (Ambiophonics) was obtained with all 14 channels working simultaneously (2 for the frontal Stereo Dipole, 2 for the rear Stereo Dipole, 2 for the subwoofer and the 4 channels of B-format being routed to the 8 loudspeakers by means of the Ambisonics decoder implemented on the Soundweb).

The Stereo-Dipole signals were prepared convolving the anechoic recordings with the whole binaural impulse responses; the B-format signals, instead, were obtained convolving the anechoic recordings with a modified version of the measured B-format impulse responses, in which the direct sound was attenuated by means of a smooth fade-in envelope.

The above methodology is partially misleading regarding Ambisonics, because in practice the B- format impulse responses employed for convolution were deprived of the direct sound, and this caused the reproduction of the sound through the Ambisonics rig to be excessively diffuse, with poor localization of the sound coming from the frontal stage. On the other hand, a brief informal test conducted employing IRs containing also the direct sound did show that the difference was not very evident, and thus it was considered not worth the increased complexity required.

What above can explain, however, the bad result obtained by the “pure” Ambisonics method.

The following music samples were employed:

N. Music piece

1 Mozart, Te Deum K141, Sennheiser MKE2002 (“Mozart Sacro”, track n. 1) 2 Bruxethude KFM-6 (Ambiopole demo CD n. 1, track n.13)

3 Mozart, Overture “Le nozze di Figaro”, bars 1-50, ORTF (Denon PG 6006, track n. 37) 4 Bruckner, 1

st

mov. symph. n. 4 in E-flat major “Romantic”, bars 560-573 (Denon PG

6006, track n. 42)

The results of this preliminary subjective test are indeed relevant, and show that the combined method (Ambiophonics) produces significantly better results than each of its subcomponents.

The following table reports the average scores (in the range 1 to 5) obtained by the three methods, with reference only to the question “pleasant-unpleasant”:

Method Stereo Dipoles Ambisonics Ambiophonics

Avg. Score 3.02 1.46 4.52

A multi-dimensional statistical test applied to the raw subjective responses demonstrated that the ranking is statistically significant (the 95% confidence interval resulted equal to 0.95, and thus both differences between average scores are significant).

Of course, the limited number of subjects and sound samples for each subject, and the fact that

they were not trained, nor selected for their discriminative listening capabilities, means that the

above results must be considered just a very initial confirmation of the superiority of the

Ambiophonics hybrid method.

(25)

This can also be interpreted as the solution of the dilemma between the better technique for recording or measuring the three-dimensional soundfield in a concert hall: with a binaural probe or with a Soundfield microphone? The response is clear: it is better to employ both these methods simultaneously, because each of them captures only part of the directional listening cues, and only employing both simultaneously a truly realistic sound reproduction can be achieved.

As it is not easy to conduct a listening experiment moving the listeners between the three concert halls, it is not possible to establish at what degree this artificial evaluation method is corresponding to the perceptions in the real life. On the other hand, the availability of this hybrid method will be in any case a powerful tool for the comparative evaluation of different acoustical spaces, and for the assessment of their “absolute” performance under conditions certainly closer to the real life than in traditional listening tests performed by headphone listening or by pure Ambisonics reproduction.

CONCLUSIONS AND FUTURE WORK

This preliminary paper reported on the steps taken for setting up a recording/measurement and reproduction/simulation system capable of recreating a realistic reconstruction of the three- dimensional soundfield inside an existing concert hall.

The method can be applied either to multichannel “realtime” recordings, or to synthetically simulated sound samples obtained by convolution of anechoic music with measured impulse responses. These signals are replayed inside a special listening room, equipped with two integrated reproduction chains: a dual Stereo Dipole for “transaural” presentation of binaural signals, and an advanced Ambisonics decoder for periphonic (3D) presentation of B-format signals.

The two systems can be operated separately or simultaneously, provided that, in the latter case, a proper correction for the gain and for the processing delay of the two systems is applied.

The paper described the theoretical principles and the actual implementation of the methods, alongside with the software employed. The reproduction of the sound samples employed for the listening tests is driven by specially-written software, which also enables for the automatic collection of questionnaires.

The hybrid Ambiophonics system resulted in very natural and convincing listening experience, and consequently this opens the possibility to comparatively assess minor acoustical differences between halls very far each other, particularly with reference to the spatial perception (envelopment, source imaging, depth, etc.) and to the temporal factors (Initial Time Delay Gap, difference between EDT and the subsequent reverberation time, etc.).

The encouraging results obtained by the first comparative experiments allows for the

continuation of the research, which will move to the execution of several listening tests, aimed

principally to defining the optimal listening conditions in terms of spatial attributes of the sound

field and of system’s frequency response, which are actually the less-explored perceptual aspects

for a concert hall.

(26)

[1] R. Glasgal, K. Yates, “Ambiophonics – Beyond Surround Sound to Virtual Sonic Reality”, Ambiophonics Institute, 1995.

[2]

M.A. Gerzon, "Recording Concert Hall Acoustics for Posterity", J. Audio Eng. Soc., vol. 23, pp. 569, 571 (1975 Sept.).

[3]

“Carta di Ferrara”, CIARM, http://acustica.ing.unife.it/ciarm/Carta.htm

[4]

"Guidelines for acoustical measurements inside historical opera houses: procedures and validation", CIARM, http://pcfarina.eng.unipr.it/Public/Ciarm

[5] R Miller, “Contrasting ITU 5.1 and Panor-ambiophonic 4.1 Surround Sound Recording Using OCT and Sphere Microphones,” Proceedings of AES 112th International Convention, Munich, Germany 2002, preprint #5577.

[6] H. Moller – “Fundamentals of Binaural Technology” – Applied Acoustics vol. 36 (1992) pp.

171-218.

[7] A. Farina, E. Ugolotti, “Subjective comparison of different car audio systems by the auralization technique”, Pre-prints of the 103rd AES Convention, New York, 26-29 September 1997.

[8] O. Kirkeby, P. A. Nelson, and H. Hamada – “Virtual Source Imaging Using the Stereo Dipole””, Pre-prints of the 103rd AES Convention, New York, 26-29 September 1997.

[9] O. Kirkeby, P. A. Nelson, H. Hamada – “The "Stereo Dipole"-A Virtual Source Imaging System Using Two Closely Spaced Loudspeakers” – JAES vol. 46, n. 5, 1998 May, pp. 387- 395.

[10] O. Kirkeby and P. A. Nelson – “Digital Filter Design for Virtual Source Imaging Systems”, Pre-prints of the 104th AES Convention, Amsterdam, 15 - 20 May, 1998.

[11] A. Farina, E. Ugolotti - "Spatial Equalization of sound systems in cars" - Proc. of 15th AES Conference "Audio, Acoustics & Small Spaces", Copenhagen, Denmark, 31/10-2/11 1998.

[12] A. Farina, F. Righini, ‘Software implementation of an MLS analyzer, with tools for convolution, auralization and inverse filtering’, Pre-prints of the 103rd AES Convention, New York, 26-29 September 1997.

[13] Farina, E. Ugolotti, “Automatic Measurement System For Car Audio Application”, Pre-prints of the 104rd AES Convention, Amsterdam, 15 - 20 May, 1998.

[14]

Gerzon M., “Ambisonics in Multichannel Broadcasting and Video” - Journal of Audio

Engineering Society, Vol. 33, Number 11 pp. 859 (1985).

[15]

Moorer J.A., “Music recording in the age of multi-channel” - Pre-prints of the 103

rd

AES Convention, New York, 26-29 September 1997.

[16]

A. Farina, E. Ugolotti, “Subjective comparison between Stereo Dipole and 3D Ambisonics surround systems for automotive applications”, 16th AES Conference, Rovaniemi (Finland) 12-14 April 1999.

[17] J. Daniel, J.B. Rault, J.D. Polack - "Ambisonics encoding of other audio formats for multiple listening conditions" - Pre-prints of the 105th AES Convention, S.Francisco, 26 - 29 September, 1998.

INTERNET REFERENCES

The Aurora software employed for performing the measurements and computing the inverse filters can be downloaded for free from HTTP://www.ramsete.com/aurora .

The Ambisonics Decoder setup for the Soundweb processor can be downloaded from

HTTP://pcfarina.eng.unipr.it/public/Soundweb .

Riferimenti

Documenti correlati

The main idea, in the application of elastic wavelets to the evolution of solitary waves, consists not only in the wavelet representation of the wave, but also in the assumption

Peculiarity of such integrable systems is that the generating functions for the corresponding hierarchies, which obey Euler-Poisson-Darboux equation, contain information about

In the second case it can be seen that, by taking the signed values, it is not possible to obtain values of the sound reduction index at many frequencies, while with unsigned

This confirms that, even without an explicit treatment of the diffuse sound field, the original pyramid tracing algorithm was capable of correctly modelling the statistical

− Azimuth angle θθθθ is the angle formed between the axis X' and the projection of the vector u on the X'Y' plane; it is considered positive if oriented anticlockwise respect the

Only the direct sound and the first reflections coming from the frontal hemispace are sent to the sphere microphone, which drives the frontal Stereo Dipole (again by means

The results obtained from the measurements are compatible with the already proposed methods for meas- urements in concert halls (Binaural, B-format), but allow also to derive

The paper shows how it is possible to record a 32-channels spatial PCM waveform employing a spherical microphone array, how a corresponding synthetic signal can be generated on