Spatial PCM Sampling: a new method for sound recording and playback

(1)

ANGELO FARINA, ALBERTO AMENDOLA, LORENZO CHIESI, ANDREA CAPRA, SIMONE CAMPANINI

Dipartimento di Ingegneria Industriale, Università di Parma, Via delle Scienze 181/A 43124 Parma, ITALY

HTTP://pcfarina.eng.unipr.it - mail: angelo.farina@unipr.it


(2)

Introduction

• This paper presents the first attempt to use a new recording/processing/playback method, and tells the story of its failure, from which we can all learn something.

• Of course, after the failure, a modified approach was refined, and this revealed some significant advantages over traditional High Order Ambisonics, despite the number of theoretical and practical issues affecting the new method.

• A set of listening tests proved the failure of the original approach and the success of the modified approach, which surpassed the performance of HOA, even if it did not reach the excellent results obtained with the simpler approach provided by the 3DVMS technology.

(3)

Topics

1. Definition of Spatial PCM Sampling (SPS)

2. Virtual Microphones, an example

3. SPS signals from a microphone array

4. SPS signals from theoretical encoding formulas

5. Processing SPS signals

6. SPS decoding, the “exact” way

7. SPS decoding, the “modified” way

8. Comparison with High Order Ambisonics (HOA) and with 3D Virtual Microphone System (3DVMS)

9. Conclusions

(4)

Spatial PCM Sampling

PCM modelling of a waveform and of a spatial balloon

A waveform is represented by a sequence of pulses, a balloon is a “sea urchin” of spikes

(5)

Spherical Harmonics vs. Spatial PCM Sampling

Whilst Spherical Harmonics are the “spatial” equivalent of the Fourier analysis of a waveform, the SPS approach is the equivalent of representing a waveform with a sequence of pulses (PCM, pulse code modulation).

32 virtual microphones

(6)

Spatial Fourier Sampling

A waveform can be expanded as the sum of a number of sinusoids (Fourier), exactly as a balloon can be represented by the sum of a number of spherical harmonics (Ambisonics)

Composite spatial balloon

(7)

Recording-processing-playback

Both SPS and HOA have the same recording-processing-playback framework:

Microphone Array → Encoding → Intermediate Format (B-format or P-format) → Decoding → Loudspeaker Array

The intermediate format can be manipulated

(8)

Direct recording-playback

3DVMS, instead, directly computes one virtual microphone for each loudspeaker and feeds it directly: fewer processing stages, fewer constraints

Microphone Array → Virtual Microphones (3DVMS) → Loudspeaker Array

(9)

Virtual microphones for encoding and decoding

In both SPS and HOA, the encoding and decoding stages can also be represented as the synthesis of virtual microphone signals

(10)

Virtual microphones for encoding and decoding

In Encoding, they have the shapes of the spatial functions employed as intermediate format (spatial Dirac’s deltas for SPS, spherical harmonics for HOA)

(11)

Virtual microphones for encoding and decoding

In Decoding, they have the shapes defined by the decoding procedure for feeding the corresponding loudspeakers

(12)

Decoding virtual microphones from HOA

Spherical Harmonics (H.O.Ambisonics)

The virtual microphones are obtained by linear combination of the B-format intermediate signals by applying proper weights. This limits spatial resolution, dynamic range and frequency range.

Virtual microphones

(13)

Decoding virtual microphones from SPS

Spatial PCM Sampling signals

The virtual microphones are obtained by linear combination of the SPS (P-format) intermediate signals by applying proper weights. As the SPS signals come from a larger number of microphones with simpler directivity patterns, they exhibit better spatial resolution, dynamic range and frequency range.

Virtual Microphones

(14)

The “total virtual microphone”

The complete encoding-processing-playback procedure can always be represented as a set of virtual microphones feeding the loudspeakers

Looking at the polar patterns of these “total virtual microphones” provides a visual display of the behaviour of the complete system

(15)

Example: the 2nd-order “exact” decoder for 5.0 “surround”

The decoding coefficients were computed by imposing that a 2nd-order microphone, placed in the center of the loudspeaker rig, re-records the same B-format signals that are fed to the decoder.

2nd-order B-format signal (5 channels) → Matrix of decoding coefficients → 5.0 “surround” loudspeaker rig → 2nd-order Ambisonics microphone
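As a rough illustration of the “exact” criterion, the sketch below builds the matrix that re-encodes the loudspeaker feeds at the central microphone and inverts it. The loudspeaker azimuths (0°, ±30°, ±110°), the unnormalized horizontal channel set (W, X, Y, U, V) and all function and variable names are assumptions made for this example; they are not taken from the slides.

```python
import numpy as np

# Assumed 5.0 loudspeaker azimuths in degrees (C, L, R, Ls, Rs); not from the slides.
SPEAKER_AZ_DEG = np.array([0.0, 30.0, -30.0, 110.0, -110.0])

def encode_2nd_order(az_rad):
    """Horizontal 2nd-order harmonics (W, X, Y, U, V), unnormalized (assumed convention)."""
    az = np.atleast_1d(az_rad)
    return np.stack([np.ones_like(az), np.cos(az), np.sin(az),
                     np.cos(2 * az), np.sin(2 * az)])  # shape (5 channels, n directions)

# Re-encoding matrix: column r is what the central microphone records from speaker r.
E = encode_2nd_order(np.radians(SPEAKER_AZ_DEG))        # shape (5, 5)

# "Exact" decoder: speaker feeds s = D @ b such that E @ s == b for any B-format vector b.
D = np.linalg.inv(E)                                     # shape (5 speakers, 5 channels)

# Total virtual microphone of each speaker: its gain versus source azimuth.
test_az = np.radians(np.arange(0, 360, 5))
total_patterns = D @ encode_2nd_order(test_az)           # shape (5 speakers, 72 directions)
```

Plotting each row of `total_patterns` against `test_az` gives the polar pattern of the corresponding “total virtual microphone”, which is the kind of display used on the following slides to judge a decoder.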

(16)

Example: the 2nd-order “exact” decoder for 5.0 “surround”

Computing the total virtual microphones for each loudspeaker shows that this solution is completely wrong…

(17)

Another decoder for 5.0 “surround”

Here is how the total virtual microphones of a proper 2nd-order decoder should behave…

(18)

The RAI-3DVMS project

GOALS:

• “Virtual” microphones with high directivity, controlled by mouse/joystick in order to follow actors on the stage in real time. They should be able to modify their directivity in a sort of “acoustical zoom”.

• Surround recordings with microphones that can be modified (directivity, angle, gain, etc.) in post-production.

• Get rid of the problems with Spherical Harmonics

(19)

VIRTUAL MICROPHONES

We want to synthesize virtual microphones that are highly directive, steerable, and with a variable directivity pattern

(20)

Virtual Microphones from arrays of transducers

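Array geometries: Linear Array, Planar Array, Cylindrical Array, Spherical Array

Processing Algorithm: a processor with N inputs x_i and M outputs y_j, where each output is the sum of the inputs convolved with the filters h_{i,j}:

$$y_j(t) = \sum_{i=1}^{N} x_i(t) * h_{i,j}(t)$$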
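The filter-and-sum structure of this slide can be sketched as follows. This is a minimal illustration under assumed array shapes; the function name `virtual_microphones` is hypothetical and the direct time-domain convolution is chosen for clarity, not speed.

```python
import numpy as np

def virtual_microphones(x, h):
    """Filter-and-sum: y_j = sum over i of (x_i convolved with h_ij).

    x: (N, T) array of N capsule signals.
    h: (N, M, L) array of FIR filters, one per input/output pair.
    Returns an (M, T + L - 1) array of M virtual microphone signals.
    """
    n_in, n_samples = x.shape
    _, n_out, n_taps = h.shape
    y = np.zeros((n_out, n_samples + n_taps - 1))
    for j in range(n_out):
        for i in range(n_in):
            y[j] += np.convolve(x[i], h[i, j])  # full linear convolution
    return y
```

For long filters a partitioned FFT convolution would normally replace `np.convolve`, but the signal flow stays the same.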

(21)

Computation of filter coefficients

• No theory is assumed: the set of filters h_{i,j} is derived directly from a set of impulse response measurements, and designed according to a least-squares principle.

• STEP1: a matrix C of impulse responses is measured

• STEP2: the target polar pattern P of the virtual microphone is defined

• STEP3: the processing filters H are found by imposing that

$$[C] \cdot [H] = [P]$$

and inverting the matrix.

• This way, the outputs of the microphone array are maximally close to the prescribed ideal responses

• This method also inherently corrects for transducer deviations and acoustical artifacts (shielding, diffraction, reflections, etc.)

(22)

STEP1: anechoic measurements

The microphone array impulse responses c_{m,d} are measured for a number D of incoming directions.

(23)

STEP1: anechoic measurements

The microphone array impulse responses c_{m,d} are measured for a number D of incoming directions (m = 1…M microphones, d = 1…D sources).

We get a matrix C of measured impulse responses for a large number D of directions:

$$C = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,d} & \cdots & c_{1,D} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,d} & \cdots & c_{2,D} \\ \vdots & \vdots & & \vdots & & \vdots \\ c_{m,1} & c_{m,2} & \cdots & c_{m,d} & \cdots & c_{m,D} \\ \vdots & \vdots & & \vdots & & \vdots \\ c_{M,1} & c_{M,2} & \cdots & c_{M,d} & \cdots & c_{M,D} \end{bmatrix}$$

(24)

STEP2: Target Directivity

For SPS, the “virtual” microphone is chosen as a 4th-order cardioid:

$$P_n(\vartheta, \varphi) = \left[ 0.5 + 0.5 \cos(\vartheta) \cos(\varphi) \right]^4$$
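A minimal sketch of the 4th-order cardioid target, written here as a function of the single angle between the arrival direction and the microphone axis (the same form used later for the gains Q_m). The function name and the `order` parameter are illustrative assumptions.

```python
import numpy as np

def cardioid_pattern(angle_rad, order=4):
    """Gain of an n-th order cardioid at a given angle from its axis."""
    return (0.5 + 0.5 * np.cos(angle_rad)) ** order

# Gain on axis, at 60 degrees, at 90 degrees and at the rear.
angles = np.radians([0.0, 60.0, 90.0, 180.0])
gains = cardioid_pattern(angles)
```

Raising the order narrows the main lobe: at 90° a first-order cardioid still has a gain of 0.5, while the 4th-order pattern is already down to 0.5^4 = 0.0625.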

(25)

STEP3 – solution of linear equation system

Applying the filter matrix H to the measured impulse responses C, the system should behave as a virtual microphone with prescribed directivity P:

$$\sum_{m=1}^{M} c_{m,d}(t) * h_m(t) \Rightarrow p_d(t), \qquad d = 1 \ldots D$$

with m = 1…M microphones and d = 1…D directions. The measured responses c_{1,d}(t), c_{2,d}(t), …, c_{M,d}(t) are filtered by h_1(t), h_2(t), …, h_M(t) and summed; the target function p_d(t) for the virtual microphone v ranges from P_{1,v} δ(t) to P_{D,v} δ(t).

But in practice the result of the filtering will never be exactly equal to the prescribed functions p_d…

(26)

STEP3 – solution of linear equation system

We now move to the frequency domain, where convolution becomes a simple multiplication at every frequency k, by taking an N-point FFT of all those impulse responses:

$$\sum_{m=1}^{M} C_{m,d,k} \cdot H_{m,k} \Rightarrow P_d, \qquad d = 1 \ldots D, \quad k = 0 \ldots N/2$$

We now try to invert this linear equation system at every frequency k, and for every virtual microphone v:

$$[C_k]_{D \times M} \cdot [H_k]_{M \times V} = [P]_{D \times V}$$

This over-determined system does not admit an exact solution, but it is possible to find an approximate solution with the Least Squares method

(27)

Least-squares solution

We compare the results of the numerical inversion with the theoretical response of our target microphones for all the D directions, properly delayed, and sum the squared deviations to define a total error.

The inversion of this matrix system is now performed adding a regularization parameter β, in such a way as to minimize the total error (Nelson/Kirkeby approach):

$$[H_k]_{M \times V} = \frac{[C_k]^*_{M \times D} \cdot [Q]_{D \times V} \cdot e^{-j \pi k}}{[C_k]^*_{M \times D} \cdot [C_k]_{D \times M} + \beta_k \cdot [I]_{M \times M}}$$

It proved advantageous to employ a frequency-dependent regularization parameter β_k.
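The regularized inversion can be sketched per frequency bin as below. This is an illustration under stated assumptions, not the authors' implementation: the array layouts, the function name `regularized_inverse_filters`, and the reading of Q as the matrix of target gains are all assumptions, and the term e^{-jπk} is interpreted as a delay of N/2 samples for an N-point FFT.

```python
import numpy as np

def regularized_inverse_filters(C, Q, beta):
    """Regularized least-squares solution at each frequency bin.

    C:    (K, D, M) complex array of measured responses (D directions, M capsules).
    Q:    (D, V) array, target gain of each of V virtual microphones per direction.
    beta: (K,) array, frequency-dependent regularization parameter.
    Returns H: (K, M, V) complex array of filter spectra.
    """
    n_bins, _, n_mics = C.shape
    H = np.empty((n_bins, n_mics, Q.shape[1]), dtype=complex)
    for k in range(n_bins):
        Ck_h = C[k].conj().T                          # conjugate transpose, shape (M, D)
        lhs = Ck_h @ C[k] + beta[k] * np.eye(n_mics)  # regularized normal matrix (M, M)
        delay = np.exp(-1j * np.pi * k)               # N/2-sample delay for an N-point FFT
        H[k] = np.linalg.solve(lhs, Ck_h @ Q) * delay
    return H
```

Using `np.linalg.solve` on the regularized normal matrix avoids forming an explicit inverse; a larger `beta[k]` trades accuracy of the polar pattern for smaller filter gains at that frequency.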

(28)

Spectral shape of the regularization parameter β_k

• At very low and very high frequencies it is advisable to increase the value of β.

(29)

Non-uniform spatial sampling with 32 spatial Diracs

Unfortunately, the largest regular polyhedron has 20 faces (icosahedron), so a 32-point sampling is slightly irregular

(30)

Creating synthetic SPS signals

Creating the SPS signals from a mono signal means to “spatially pan” it across the 32 P-format channels

Virtual source at (Az_in, El_in); m-th virtual microphone at (Az_m, El_m)

(31)

Creating synthetic SPS signals

The gain for each channel is easy to compute: first the angle α_m between the direction of the sound source and the direction of the virtual microphone is found with the Haversine formula:

$$\alpha_m = 2 \arcsin \sqrt{ \sin^2\left( \frac{El_m - El_{in}}{2} \right) + \cos(El_{in}) \cos(El_m) \sin^2\left( \frac{Az_m - Az_{in}}{2} \right) }$$

Then the gain Q_m is found by means of the 4th-order cardioid formula:

$$Q_m = \left[ 0.5 + 0.5 \cos(\alpha_m) \right]^4$$
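The two steps of this slide (angle via the Haversine formula, then the 4th-order cardioid gain) can be sketched as follows. The function names are hypothetical, and the microphone directions are passed in as arguments because the actual 32 directions are not listed in the slides.

```python
import numpy as np

def sps_gains(az_in, el_in, az_m, el_m, order=4):
    """Panning gains of a source at (az_in, el_in) over virtual microphones at (az_m, el_m).

    All angles are in radians; az_m and el_m hold one entry per virtual microphone.
    """
    az_m = np.asarray(az_m, dtype=float)
    el_m = np.asarray(el_m, dtype=float)
    # Haversine formula for the angle between source and microphone directions.
    hav = (np.sin((el_m - el_in) / 2) ** 2
           + np.cos(el_in) * np.cos(el_m) * np.sin((az_m - az_in) / 2) ** 2)
    alpha = 2 * np.arcsin(np.sqrt(np.clip(hav, 0.0, 1.0)))
    # n-th order cardioid gain (4th order by default).
    return (0.5 + 0.5 * np.cos(alpha)) ** order

def encode_mono(signal, gains):
    """Spread a mono signal over the P-format channels: one scaled copy per microphone."""
    return np.outer(gains, signal)  # shape (n_microphones, n_samples)
```

A source placed exactly in the direction of one virtual microphone gets unit gain on that channel, and the gain falls to zero for a microphone pointing the opposite way.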

(32)

Processing the SPS signals

Basically, it is important to perform two operations on the whole sound field:

1. Rotate the whole scene around an arbitrary axis

2. “Stretch” the sound field, giving more emphasis to the sound coming from some directions and reducing the sound from other directions

(33)

Rotating the SPS signal

As with PCM in the time domain, only “discrete” shifts are easy.

“Fractional” rotations require either SPS oversampling or going to the spatial frequency domain (HOA)

The only simple “discrete” rotation is based on the permutation of the faces of a dodecahedron

Hence the “unit rotation step” is 72°, and just 6 rotation axes are available

(34)

Stretching the SPS signal

In SPS it is trivial to boost the sound coming from some directions and reduce the sound from others: it is just a matter of adjusting the gain of the corresponding virtual microphones
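A minimal sketch of such a “stretch”, assuming the P-format signals are stored as one row per virtual microphone. The function name and the choice of expressing the gains in dB are illustrative assumptions.

```python
import numpy as np

def stretch_sps(p_format, gains_db):
    """Apply one gain per virtual microphone to P-format signals.

    p_format: (n_microphones, n_samples) array of SPS signals.
    gains_db: (n_microphones,) array of gains in dB (positive boosts, negative attenuates).
    """
    linear = 10.0 ** (np.asarray(gains_db, dtype=float) / 20.0)
    return p_format * linear[:, np.newaxis]
```

For example, setting +6 dB on the channels facing the stage and -6 dB on the rear channels emphasizes the frontal sound without touching the other directions.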

(35)

Decoding the SPS signal

It is possible to derive a number of signals for feeding a loudspeaker rig by means of another “decoding matrix” of FIR filters

No decoding would be required if the rig were made of 32 loudspeakers located in the same directions as the SPS virtual microphones and all at the same distance from the listening spot

(36)

From 32 virtual microphones to 16 loudspeakers

Casa della Musica, (Parma, ITALY)

(37)

Loudspeaker positions

Horizontal Ambisonics octagon, Ambisonics 3D cube, Standard Stereo

Frontal Stereo-Dipole, Rear Stereo-Dipole, Upper Stereo-Dipole

(38)

16-loudspeaker playback system

Decoding the SPS signals to loudspeakers

Eigenmike™ → Matrix of 32x32 encoding FIR filters → 32 virtual microphone signals y (SPS = P-format) → Matrix of 32x16 decoding FIR filters → 16 speaker feeds s

$$s_r(t) = \sum_{i=1}^{32} y_i(t) * f_{i,r}(t)$$

(39)

The transfer functions of the sound system are measured

The Eigenmike™ is placed in the center of the loudspeaker rig, and the transfer functions [k] (k_1, k_2, …, k_R, one set per loudspeaker) are measured from each loudspeaker to each of the 32 virtual microphones (SPS). The re-recorded virtual microphone signals y_out are:

$$\{y_{out}\} = \{s\} * [k] = \{y\} * [f] * [k]$$

Then we impose that the resulting signals {y_out} are identical to the virtual microphone signals recorded by the Eigenmike in the original room {y}.

And consequently we solve for the unknown filters [f]

(40)

Theoretically “exact” solution

As for the encoding filter matrix, we employ Least-Squares with regularization:

$$[F]_{32 \times 16} = \frac{[K]^*_{32 \times 16} \cdot e^{-j \pi k}}{[K]^*_{32 \times 16} \cdot [K]_{16 \times 32} + \beta \cdot [I]_{32 \times 32}}$$

However, in this case the system is NOT over-determined, as the number of measured transfer functions equals the number of filters to compute

(41)

Theoretically “exact” solution

Hence the resulting filter matrix has some problems:

There is signal coming from every virtual microphone to every loudspeaker…

Let’s focus on the first 8 loudspeakers (horizontal ring)

(42)

Theoretically “exact” solution

The problems are even more evident when looking at the “total virtual microphones” for the 8 loudspeakers on the horizontal plane:

(43)

“Modified” solution

To avoid these problems, a second set of decoding coefficients was computed, adding the constraint that each speaker feed is obtained from just 1, 2 or at most 3 virtual microphones

Real Loudspeakers, SPS Virtual Microphones

(44)

“Modified” solution

The resulting filter matrix is more “sensible”:

Each of the feeds for the first 8 loudspeakers (horizontal ring) get

(45)

“Modified” solution

The correct decoding is evident looking at the “total virtual microphones” for the 8 loudspeakers on the horizontal plane:

(46)

Alternative solutions

Two other decoding matrices were used for comparison:

(47)

Results of subjective listening tests

Overall preference score (11 subjects)

(48)

Results of subjective listening tests

Spectral balance (9 subjects)

(49)

Results of subjective listening tests

Transient performance score (8 subjects)

(50)

Results of subjective listening tests

Overall preference score (10 subjects)

(51)

Conclusions

The most versatile method currently available for capturing a 3D acoustical scene is to employ a microphone array, and to derive a number of virtual microphones

Three processing methods have been developed and tested:

1. High Order Ambisonics (spherical harmonics functions)

2. Spatial PCM Sampling (spatial Dirac’s Delta functions)

3. Direct synthesis of discrete speaker feeds (3DVMS)

Each of the three methods has some advantages and disadvantages

In all cases it is possible to process and reproduce the same recording over a given playback system (loudspeaker array)

Currently the direct synthesis of speaker feeds (3DVMS) works best, but SPS, when employing a suitable decoding scheme, performed better than traditional in-phase 3rd-order Ambisonics, and came quite close to 3DVMS
