LPC Coding
Chapter 1
Introduction
1.1 Speech Production
When you speak, air is pushed from your lungs through your vocal tract and out of your mouth, producing speech.
For voiced sounds, your vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of your voice. Women and young children tend to have a high pitch (fast vibration), while adult males tend to have a low pitch (slow vibration).
For certain fricative and plosive (unvoiced) sounds, your vocal cords do not vibrate but remain open.
The shape of your vocal tract determines the sound that you make. As you speak, your vocal tract changes shape, producing different sounds.
The shape of the vocal tract changes relatively slowly (on the scale of 10 msec to 100 msec).
The amount of air coming from your lungs determines the loudness of your voice.
1.2 Speech Signal
The speech signal carries both message and speaker information. Speech conveys the message through a sequence of sound units, which are produced by exciting the time-varying vocal tract system with a time-varying excitation. Each sound unit is produced by a specific combination of excitation and vocal tract dynamics. To represent the message information in speech, the vocal tract system is modeled as a time-varying filter, and the excitation as voiced, unvoiced, plosive, or a combination of these types. The time-varying filter characteristics capture the variations in the shape of the vocal tract system in the form of resonances, antiresonances, and spectral roll-off characteristics. This representation of speech has been very effective for developing speech recognition systems. Since the vocal tract shape and its dynamics are also unique to a given speaker, the same time-varying filter representation has been exploited for developing speaker recognition systems as well.
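As a rough illustration of the excitation-plus-filter model described above, the following Python sketch drives a simple filter with either a voiced or an unvoiced excitation. The pitch period of 80 samples, the single 500 Hz resonance with a 100 Hz bandwidth, the 8000 Hz sampling rate, and all function names are assumptions chosen for illustration, not values taken from the text. The filter here is all-pole and fixed in time, so it models a single resonance only; antiresonances and the time variation of the vocal tract are left out.

```python
import math
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Impulse train for voiced sounds, uniform white noise for unvoiced ones."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def all_pole_filter(x, a):
    """Compute y[n] = x[n] - sum_k a_k * y[n-k], where a holds a_1..a_p."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y

def resonator(freq_hz, bandwidth_hz, fs=8000):
    """Coefficients (a_1, a_2) of a two-pole resonance (assumed values)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius, below 1 for stability
    theta = 2 * math.pi * freq_hz / fs           # pole angle
    return [-2 * r * math.cos(theta), r * r]

a = resonator(500, 100)
voiced = all_pole_filter(excitation(400, voiced=True), a)
unvoiced = all_pole_filter(excitation(400, voiced=False), a)
```

Changing the excitation while keeping the filter fixed changes the character of the sound (buzzy versus noisy), while changing the resonator coefficients changes which frequencies are emphasised.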
Historically, the primary use of encryption has been, of course, to protect messages in text form. Advancing technology has allowed images and audio to be stored and communicated in digital form. A particularly effective method of compressing images is the Discrete Cosine Transform, which is used in the JPEG (Joint Photographic Experts Group) file format.
When sound is converted to an analogue electrical signal by an appropriate transducer (a device for converting changing levels of one quantity to changing levels of another) such as a microphone, the resulting electrical signal has a value that changes over time, oscillating between positive and negative.
A Compact Disc stores stereo musical recordings in the form of two digital audio channels, each one containing 44,100 16-bit signed integers for every second of sound. This leads to a total data rate of 176,400 bytes per second.
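The data rate quoted above follows directly from the three figures in the paragraph; a minimal check in Python (the variable names are illustrative only):

```python
SAMPLE_RATE_HZ = 44_100      # samples per second, per channel
BITS_PER_SAMPLE = 16         # signed 16-bit integers
CHANNELS = 2                 # stereo

bytes_per_sample = BITS_PER_SAMPLE // 8
bytes_per_second = SAMPLE_RATE_HZ * bytes_per_sample * CHANNELS
print(bytes_per_second)      # 176400
```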
For transmitting a telephone conversation digitally, the same level of fidelity is not required. Only a single audio channel is used, and only frequencies of up to 3000 cycles per second (3000 Hertz) are needed. By a mathematical law called the Nyquist theorem, this requires 6000 samples of the level of the audio signal to be taken each second. The signal must first be bandlimited to the range of frequencies to be reproduced; otherwise aliasing may result.
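The sampling-rate requirement and the aliasing effect can be sketched in a few lines of Python. The function names and the 4000 Hz test tone are assumptions made for illustration; the 3000 Hz and 6000 samples-per-second figures come from the paragraph above.

```python
import math

def nyquist_rate(max_freq_hz):
    """Minimum sampling rate needed to represent frequencies up to max_freq_hz."""
    return 2 * max_freq_hz

def alias_frequency(freq_hz, fs):
    """Apparent frequency of a tone at freq_hz after sampling at fs."""
    return abs(freq_hz - fs * round(freq_hz / fs))

fs = nyquist_rate(3000)               # 6000 samples per second
apparent = alias_frequency(4000, fs)  # a 4000 Hz tone is above fs / 2

# Sampled at 6000 Hz, the 4000 Hz tone yields the same samples as a 2000 Hz tone.
tone_4k = [math.cos(2 * math.pi * 4000 * n / fs) for n in range(12)]
tone_2k = [math.cos(2 * math.pi * apparent * n / fs) for n in range(12)]
```

Because the two sample sequences are identical, nothing after the sampler can tell the tones apart, which is why the bandlimiting filter has to come before sampling.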
For many communications applications, samples of audio waveforms are one byte in length, and they are represented by a type of floating-point notation to allow one byte to represent an adequate range of levels.
Chapter 2
Speech Compression
Speech Coding
“Speech coding” is the term used for algorithms or devices whose purpose is to decrease the bit rate needed to transmit a digital speech signal across a digital channel. This channel could be a digital cellular channel, a satellite channel, or the Internet.
In other words, speech coding is the act of transforming the speech signal into a more compact form, which can then be transmitted over considerably less bandwidth or stored in considerably less memory. The motivation is that unlimited bandwidth is never available, so speech signals need to be coded and compressed. Speech compression is required in long-distance communication, high-quality speech storage, and message encryption. For example, in digital cellular technology many users need to share the same frequency bandwidth, and speech compression makes it possible for more users to share the available system. Another example is digital voice storage: for a fixed amount of available memory, compression makes it possible to store longer messages.
Speech coding is a lossy type of coding, which means that the output signal does not sound exactly like the input; a listener can tell the input and output signals apart. Coding of audio, however, is a different kind of problem from speech coding. Audio coding tries to code the audio in a perceptually lossless way. This means that even though the input and output signals are not mathematically equivalent, the output sounds the same as the input. This type of coding is used in applications for audio storage, broadcasting, and Internet streaming.
The compression of speech signals has many practical applications. One example is in digital cellular technology where many users share the same frequency bandwidth. Compression allows more users to share the system than otherwise possible. Another example is in digital voice storage (e.g. answering machines). For a given memory size, compression allows longer messages to be stored than otherwise.
Historically, digital speech signals are sampled at a rate of 8000 samples/sec. Typically, each sample is represented by 8 bits (using mu-law). This corresponds to an uncompressed rate of 64 kbps (kbits/sec).
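The 64 kbps figure, and the idea behind mu-law coding, can be sketched as follows. This is only the continuous mu-law companding curve followed by uniform 8-bit quantisation, assuming mu = 255 and samples normalised to the range -1 to 1; it is not the bit-exact segmented encoding used in telephone standards, and the function names are hypothetical.

```python
import math

MU = 255  # assumed companding parameter

def mu_law_compress(x):
    """Map x in [-1, 1] to [-1, 1], giving small amplitudes finer resolution."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_8bit(x):
    """Compress, then quantise to one of 256 levels (codes 0..255)."""
    y = mu_law_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1.0) / 2.0 * 255))

def decode_8bit(code):
    """Map an 8-bit code back to an approximate sample value."""
    return mu_law_expand(code / 255 * 2.0 - 1.0)

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # 64000 bits per second
```

The logarithmic curve spends more of the 256 codes on quiet samples than on loud ones, which is how one byte per sample can cover an adequate range of levels.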
As with all other coding systems, a speech coding algorithm has two primary parts: the “Encoder”, located at the transmitting or source end of the system, and the “Decoder”, located at the receiving or sink end of the system.
Speech Coders
Speech coders are algorithms that compress digital representations of speech signals to minimize the number