Yong Qiu Liu's Web Page--G.711 Audio

Introduction to multimedia codecs

4. Audio codecs

Audio codecs, like video codecs, play a very important role in networked multimedia transmission, and different network media systems use different audio coding methods. Sometimes the audio codec is more crucial than the video codec. In a videoconference, if the picture quality is poor but the voice transfer is fine, communication can still continue; the session has in effect simply become an audio conference. On the other hand, if the voice is corrupted but the people at the remote end can still be seen on the screen, could you continue the conference? This section describes some frequently used audio codecs.

4.1 G.711 PCM Audio

G.711 is an ITU standard for converting an analogue voice signal into a digital data stream.^[37] It is a PCM (Pulse Code Modulation) scheme operating at an 8 kHz sample rate with 8 bits per sample, so it requires 64 kbps of data bandwidth (8,000 samples/s * 8 bits = 64 kbps). According to the Nyquist theorem, a signal must be sampled at no less than twice its highest frequency component,^[36] so G.711 can encode frequencies between 0 and 4 kHz. Normally G.711 is used to encode the 4 kHz analogue signal that defines "toll-quality" speech. G.711 also represents the worst-case bandwidth for a single voice channel.^[37]
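The bandwidth and frequency figures above follow from simple arithmetic, which the short Python sketch below reproduces. The function and constant names are made up for this illustration and are not part of any G.711 library.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate in bits per second of an uncompressed single-channel PCM stream."""
    return sample_rate_hz * bits_per_sample


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that can be represented at the given sample rate."""
    return sample_rate_hz / 2


# Figures quoted in the text for G.711.
G711_SAMPLE_RATE_HZ = 8000
G711_BITS_PER_SAMPLE = 8

print(pcm_bit_rate(G711_SAMPLE_RATE_HZ, G711_BITS_PER_SAMPLE))  # 64000 bits/s = 64 kbps
print(nyquist_limit_hz(G711_SAMPLE_RATE_HZ))                    # 4000.0 Hz = 4 kHz
```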

PCM coding:

First, the input analogue signal is sampled, producing a series of amplitude-modulated pulses (Figure 2-21). The heights of these pulses range from zero to full scale, and this range is divided into discrete steps called quantisation levels or discrete levels. Each level can then be represented by a binary code word, as shown in Table 2-5. The resulting sequence of binary pulses is the PCM signal.^[39]

Figure 2-21 Quantising and digitising a signal^[38]

Table 2-5 Quantisation levels and their code words

Level	Code word
0	000
1	001
2	010
3	011
4	100
5	101
6	110
7	111

The PCM-encoded signal in binary form then looks like this:

010 101 111 111 110 100 010 001 010 100 110 111
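The mapping between Table 2-5 and the bit stream above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the list of levels is obtained by decoding the bit stream itself, not by reading values off Figure 2-21.

```python
def encode_levels(levels, bits=3):
    """Turn a list of quantisation levels into space-separated binary code words."""
    code_words = []
    for level in levels:
        if not 0 <= level < 2 ** bits:
            raise ValueError(f"level {level} does not fit in {bits} bits")
        # Zero-padded binary, e.g. level 2 -> "010" when bits == 3.
        code_words.append(format(level, f"0{bits}b"))
    return " ".join(code_words)


def decode_code_words(stream):
    """Turn space-separated binary code words back into quantisation levels."""
    return [int(word, 2) for word in stream.split()]


# Levels implied by the bit stream in the text (an inference, see above).
levels = [2, 5, 7, 7, 6, 4, 2, 1, 2, 4, 6, 7]
stream = encode_levels(levels)
print(stream)  # 010 101 111 111 110 100 010 001 010 100 110 111
print(decode_code_words(stream) == levels)  # True
```

With 3 bits per sample only 8 levels are available; the same code with `bits=8` gives the 256 levels implied by the 8 bits per sample used by G.711.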

Each sample value is only an approximation of the original signal, rounded to the nearest discrete level, so the reconstructed signal is distorted. The difference between the original waveform and the quantised digital signal is called quantising noise. How accurate the approximation should be is primarily a matter of economics.^[38]
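To make quantising noise concrete, here is a minimal Python sketch that rounds samples of a test sine wave to the nearest level and measures the largest error. It assumes evenly spaced levels between zero and full scale, as in Table 2-5; G.711 itself uses non-uniform (A-law or mu-law) steps, which this sketch does not model. The function names and the test signal are chosen for illustration only.

```python
import math


def quantise(value, bits=3, full_scale=1.0):
    """Round a value in [0, full_scale] to the nearest of 2**bits evenly spaced levels.

    Returns (level, reconstructed_value).
    """
    levels = 2 ** bits
    step = full_scale / (levels - 1)
    level = int(value / step + 0.5)          # round to the nearest level
    level = max(0, min(levels - 1, level))   # clamp to the valid range
    return level, level * step


def worst_error(bits, num_samples=200):
    """Largest quantising error over one period of a sine wave scaled to [0, 1]."""
    worst = 0.0
    for n in range(num_samples):
        original = 0.5 + 0.5 * math.sin(2 * math.pi * n / num_samples)
        _, reconstructed = quantise(original, bits)
        worst = max(worst, abs(original - reconstructed))
    return worst


for bits in (3, 8):
    print(bits, "bits: worst quantising error =", worst_error(bits))
```

The error can never exceed half a step, so adding bits (more levels) shrinks the quantising noise at the cost of a higher bit rate, which is the economic trade-off mentioned above.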

So the sampling rate and the number of quantisation levels are the two main factors that influence the quality of the digitised signal. G.711 uses an 8 kHz sample rate with 8 bits per sample to produce "toll-quality" digitised speech.

Last update April 1, 2002