To use audio effectively in Web design, several factors need to be considered, including appropriateness of use, file format compatibility, quality, and file size. The roles of these factors are discussed here.
1. Audio File Formats and Audio File Size
Like image files, audio files can be large, and the size generated is directly related to quality. The sound that we hear is converted into a form that can be stored in a computer by converting it to an electrical signal, taking many samples of it at regular intervals per second, giving each sample a value, and storing it as digital data. In theory, the more samples taken per second and the larger the number of bits used to represent each sample, the more faithfully the digital audio represents the original sound; in other words, the better the quality. The number of samples taken per second is known as the sampling rate; it is measured in samples or cycles per second, or hertz (Hz). For example, 10,000 samples/sec is 10,000 Hz, or 10 kilohertz (10 kHz). The number of bits per sample is known as the sampling resolution (or bit depth) and is specified, for example, as 16-bit, meaning that the value assigned to each sample is determined using 16 bits (i.e., each sample can take one of 2^16 = 65,536 possible values, compared with just 256 at 8-bit). In theory, this means that the same audio sampled (or digitized) at 44.1 kHz and 16-bit will be of better quality than if it were sampled at 22.05 kHz and 8-bit.
Unfortunately, high-quality audio produces large file sizes, and without a means of reducing them, it would be impractical to use audio freely on the Web as we do today; Internet connection bandwidths (i.e., connection speeds) would simply not be able to cope. This is why audio files, like image files, require compression when used on the Web. The following calculations demonstrate the file size that digital audio can generate.
1.1. Audio File Size
The file size generated when 3 min of music is digitized at CD quality, that is, at 44.1 kHz and 16-bit in stereo (i.e., 2 channels), is:
File Size = (sampling rate × bit depth × duration in seconds × number of channels) / 8
= (44.1 kHz × 16 bits × 3 min × 2 channels) / 8
= [(44.1 × 1000 Hz) × 16 × (3 × 60 s) × 2] / 8
= 31,752,000 bytes
= [31,752,000 / (1000 × 1000)] MB
≈ 31.75 MB
Dividing by 8 converts the amount in bits to bytes. To convert bytes to kilobytes, you divide by 1000; to convert kilobytes to megabytes, you divide by 1000 again; and so on (the decimal convention of 1000 units per step is used here, rather than 1024).
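The same arithmetic can be expressed in code. The following is a minimal sketch; the function and parameter names are illustrative, not taken from the text:

// Minimal sketch of the file-size calculation above; names are illustrative.
function uncompressedAudioSizeBytes(
  sampleRateHz: number,    // samples per second, e.g., 44,100
  bitDepth: number,        // bits per sample, e.g., 16
  durationSeconds: number, // length of the audio in seconds
  channels: number,        // 1 for mono, 2 for stereo
): number {
  // Total bits divided by 8 gives bytes.
  return (sampleRateHz * bitDepth * durationSeconds * channels) / 8;
}

const bytes = uncompressedAudioSizeBytes(44_100, 16, 3 * 60, 2);
console.log(bytes);                 // 31752000 bytes
console.log(bytes / (1000 * 1000)); // 31.752 MB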
Although the quality of audio can be described in terms of sampling rate and bit depth, it is usually described in terms of bitrate (or bit rate), which is the number of bits processed, generated, or delivered per second. Higher sampling rates and bit depths naturally produce higher bitrates, and the higher the bitrate, the higher the quality of the audio. The bitrate for CD, for example, which is a standard for audio quality, is 1.4 Mbps (i.e., 1.4 megabits per second). How this is derived is shown in the NOTE box.
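Although the NOTE box itself is not reproduced here, the figure can be derived directly from the sampling parameters; the following is a brief sketch of the arithmetic:

// Sketch of how the CD bitrate figure follows from the sampling parameters.
const cdBitrate = 44_100 * 16 * 2;  // samples/sec × bits/sample × channels = 1,411,200 bits/sec
console.log(cdBitrate / 1_000_000); // ≈ 1.41 Mbps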
1.2. Audio File Formats
There are many audio file formats, and most of them support at least one codec, which, as with images, may be lossless (designed to retain the original quality) or lossy (designed to sacrifice some quality in order to reduce size). Lossy formats typically reduce file size far more than lossless formats. For example, some lossy formats, such as MP3, can reduce CD-quality audio (i.e., a 1.4 Mbps bitrate) to about 192 kbps without discernible quality loss. The main advantage of lossy file formats is that they allow you to balance quality against size. The most commonly used formats on the Web are listed in Table 7.6.
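Because support for these formats varies between browsers, it can help to test what a given browser can play. The sketch below uses the standard HTMLMediaElement.canPlayType() method; the format names and MIME strings are common examples and are not taken from Table 7.6:

// Probe which audio formats the current browser reports it can play.
const probe = document.createElement("audio");
const candidates: Record<string, string> = {
  "MP3": "audio/mpeg",
  "Ogg Vorbis": 'audio/ogg; codecs="vorbis"',
  "WAV": "audio/wav",
  "AAC (MP4)": 'audio/mp4; codecs="mp4a.40.2"',
};
for (const [name, mime] of Object.entries(candidates)) {
  // canPlayType() returns "probably", "maybe", or "" (empty string = no support).
  console.log(`${name}: ${probe.canPlayType(mime) || "not supported"}`);
}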
2. Guidelines on Effective Use of Sound
The ways in which sound is used in Web design depend on the type of Web application. For example, games and children's learning applications use different types of sounds, including speech, music, and sound effects. An academic website, on the other hand, is unlikely to use sound at all, or at least not in the way games do. For the majority of websites, which do not use sound to communicate specific messages, the use of sound is usually limited to music. The problem is that when sound is not used to communicate specific messages, it is easy to misuse it or to use it in ways that make Web content inaccessible to people with disabilities, particularly the visually impaired. Some guidelines for use are presented here.
2.1. Automatic Starting of Sound
Sound should not start automatically, whether when users arrive at a page, after they have been there for a while and are in the middle of browsing, or when an object or area receives focus, such as when the cursor is over it. Users generally do not appreciate this, especially as they then have the added task of turning the sound off if they do not like it. Even when a means of turning the sound off is provided, it can be difficult to find if not designed properly. It can be especially difficult for users with disabilities who rely on assistive technologies and can interact with a Web page only through the keyboard.
- If you must start sound automatically when users enter a page, then a control to turn it off should be provided, and it should be near the beginning of the page, where it can be easily found. The control should be clearly labeled, keyboard-operable, and located early in the tab and reading order, so that it is quickly and easily encountered by users of assistive technologies (e.g., a screen reader).
- If you must start sound automatically when an object or area receives focus, such as when the cursor is over it, you should provide a notice when focus reaches the object or area, such as a pop-up or a callout, saying what to do to listen to the sound. Such a pop-up or callout should disappear as soon as focus leaves the object or area and should not require users to click it to make it disappear, as this would force them to perform an unnecessary action.
- If sound must be used to announce a message on entry into a page, it should play for no more than 3 s and should stop automatically (a minimal sketch follows this list). This is particularly useful for users who use screen readers, for whom other sounds can make it difficult to hear the screen reader, even if the screen reader informs them about how to control the sound or where to find the control.
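As an illustration of the last point, the following is a minimal sketch of an entry announcement that stops itself after 3 seconds; the file name "welcome.mp3" is a placeholder:

// Play a short announcement on page entry and stop it automatically after 3 seconds.
const announcement = new Audio("welcome.mp3"); // placeholder file name
announcement.play().catch(() => {
  // Most browsers block unmuted autoplay; fail quietly if playback is not allowed.
});
setTimeout(() => {
  announcement.pause();
  announcement.currentTime = 0; // rewind so the clip does not resume midway later
}, 3000);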
2.2. Let Users Decide
Ideally, users should be given the courtesy of controlling the use of sound in Web content; they should be the ones who decide whether or not to start a sound. Again, this is especially beneficial for users of screen readers, as it allows them to decide when turning on a sound will not interfere with the screen reader's output. The option can take the form of a button that says, for example, “Turn sound on,” which, once it has been activated and the sound is playing, should change to “Turn sound off.” The option can also take the form of a link to the relevant audio file.
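A minimal sketch of such a toggle button follows; the labels come from the example above, while the audio file name and placement are placeholders:

// User-controlled sound toggle; a native button is keyboard-operable by default.
const music = new Audio("background-music.mp3"); // placeholder file name
const toggle = document.createElement("button");
toggle.textContent = "Turn sound on";
toggle.addEventListener("click", () => {
  if (music.paused) {
    music.play();
    toggle.textContent = "Turn sound off";
  } else {
    music.pause();
    toggle.textContent = "Turn sound on";
  }
});
// Placing the control early in the page keeps it early in the tab and reading order.
document.body.prepend(toggle);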
2.3. When Multiple Sounds Are Involved
When a sound, such as music or the background sound of a scene, plays behind speech, the sound should be 20 decibels (dB) quieter (i.e., about four times quieter) so that the speech can be heard clearly and understood. This is particularly useful for preventing situations in which people with hearing problems find it difficult to understand speech while other sounds are playing at the same time. The decibel is a measure of sound level, and applications used to create audio generally work in decibels.
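In terms of signal amplitude, a 20 dB reduction corresponds to multiplying the signal by 10^(-20/20) = 0.1, which is perceived as roughly a quarter of the original loudness. The following is a brief sketch using the Web Audio API (not mentioned in the text); the element ID "background" is a placeholder:

// Attenuate a background track by 20 dB relative to the speech track.
const ctx = new AudioContext();
const background = document.querySelector("#background") as HTMLAudioElement; // placeholder ID
const source = ctx.createMediaElementSource(background);
const gain = ctx.createGain();
gain.gain.value = Math.pow(10, -20 / 20); // -20 dB ≈ amplitude × 0.1
source.connect(gain).connect(ctx.destination);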
2.4. When Narrations Are Involved
If a narration is used in Web content, a text version should also be provided, as narration can be difficult for international users to understand for various reasons, such as accent and poor audio quality. Providing a text version gives these users time to work out meanings and even look up words in a dictionary.
Source: Sklar David (2016), HTML: A Gentle Introduction to the Web’s Most Popular Language, O’Reilly Media; 1st edition.