WebAL - Interactive audio for browsers
This document is a "White Paper" describing my proposal to produce a "WebAL" audio subsystem within web browsers - analogous to the WebGL graphics API.
At the time of writing, there is really only one mechanism supported within web browsers for producing sounds: the HTML5 <audio> tag. Failing that, one must resort to plugins - most likely Flash.
The HTML5 audio markup - and especially the JavaScript API - appears strongly oriented to streaming large audio files over the Internet, a task for which it is reasonably well suited.
However, when it comes to producing compelling games, simulations and other interactive content for the web, the demands placed on the audio system are vastly different and HTML5 audio becomes nearly useless, for several reasons:
- The markup/API described in the HTML5 specifications is very loosely described.
- None of the mainstream browsers actually implement what the specification describes.
- Even within the subset that these browsers claim to support, there are many bugs which have gone unfixed for over a year.
- There appears to be no support forum of any kind where HTML5 audio experts can be found to answer questions.
- Even if the specified API were more tightly described and implemented perfectly, it would still be inadequate for aggressively interactive applications such as games.
What do these applications need?
At the barest minimum, an interactive application will typically need the following features:
- The ability to have some kind of background or "ambient" sound track (e.g. music, chirping crickets, the sound of the ocean) looping without a break. Firefox provides no viable mechanism for this other than to inform you that a track has finished playing so that you can re-trigger it. But that process takes time - during which there will be an unacceptable break in the audio. Chrome does honor the "looping" command in the HTML5 spec - but it does so with a considerable break in the sound track. An acceptable implementation would have to guarantee automatic looping such that the first sample of the sound is played immediately after the last with no delay whatever.
- The ability to trigger a short sound with almost no latency. The HTML5 audio system has a complex set of commands you can use to control how a sound is "preloaded" - however, by some bizarre logic, the one truly important option...to completely preload the sound into memory and keep it there...is missing! Both Firefox and Chrome appear to stream data no matter what - so there is always a considerable delay between "pulling the trigger" and hearing the gun go "BANG!". Given the tiny amount of memory that a short sound sample might occupy (compared to, say a photograph or a WebGL texture), this is an unforgivable omission.
- The ability to reliably play some number of sounds simultaneously. The <audio> specification makes no mention whatever of what happens if you try to play multiple sounds at once - much less provide a means to find out what the maximum number actually is (it is surely not infinite!) - or how you control the use of available numerical precision and range during the mixing of multiple sounds. Browsers do seem to be able to play multiple sounds - but it's sporadic and unspecified. Games can manage the number of sounds they are playing - but they need control over that so that (for example) an ambient cricket chirp doesn't mask the sound of your gun going off when you pull that trigger.
- The ability to control the frequency of replay and volume of sounds dynamically in order to simulate (for example) doppler shift.
- The ability to control reverb/echo of sounds in order to adjust the audio to the virtual space in which it's being played.
- The ability to place monophonic sounds anywhere in the stereo or 5.1 surround-sound space.
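To make a few of these requirements concrete, here is a minimal sketch of a software mixer that loops a memory-resident sound without a gap, pans monophonic voices across a stereo pair, and clamps the sum of simultaneous voices. It is only an illustration: the names Voice, makeVoice and mixBlock are hypothetical and come from neither the HTML5 specification nor OpenAL, and the constant-power pan law and hard clamp are assumptions chosen for brevity.

```typescript
// Hypothetical sketch: these names are not part of HTML5 audio or OpenAL.
interface Voice {
  samples: Float32Array; // mono PCM in [-1, 1], held entirely in memory
  cursor: number;        // index of the next sample to play
  loop: boolean;         // wrap to sample 0 immediately after the last sample
  gain: number;          // linear volume, 0..1
  pan: number;           // -1 = hard left, 0 = centre, +1 = hard right
  active: boolean;       // false once a one-shot sound has finished
}

function makeVoice(samples: Float32Array, loop: boolean, gain = 1, pan = 0): Voice {
  return { samples, cursor: 0, loop, gain, pan, active: samples.length > 0 };
}

// Mix every active voice into one interleaved stereo block (L, R, L, R, ...).
function mixBlock(voices: Voice[], frames: number): Float32Array {
  const out = new Float32Array(frames * 2);
  for (const v of voices) {
    // Assumed constant-power pan law: left^2 + right^2 == gain^2.
    const angle = ((v.pan + 1) * Math.PI) / 4;
    const left = Math.cos(angle) * v.gain;
    const right = Math.sin(angle) * v.gain;
    for (let i = 0; i < frames && v.active; i++) {
      const s = v.samples[v.cursor];
      out[2 * i] += s * left;
      out[2 * i + 1] += s * right;
      v.cursor++;
      if (v.cursor >= v.samples.length) {
        if (v.loop) {
          v.cursor = 0; // gapless: the very next frame plays sample 0
        } else {
          v.active = false; // one-shot sounds simply stop
        }
      }
    }
  }
  // Hard clamp so that the sum of several voices stays within the output range.
  for (let i = 0; i < out.length; i++) {
    out[i] = Math.max(-1, Math.min(1, out[i]));
  }
  return out;
}
```

Because the cursor is kept on the voice, successive calls to mixBlock continue exactly where the previous block ended, which is what makes the loop seamless across block boundaries. A real implementation would want something gentler than a hard clamp, plus a priority scheme for deciding which voices to drop when too many are active.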
What is proposed here
Clearly we need something entirely new here. The <audio> tag would be exceedingly difficult to "repair" at this point. What is needed is the adoption of an existing, widely-accepted and "open sourced" sound specification - much along the lines that WebGL was developed from OpenGL-ES by the Khronos group and various other interested parties (Mozilla, Apple, Google, etc). Ideally, we would mirror the approach of taking an existing standard, "wrapping" it with JavaScript bindings and tweaking it for the web's needs for security and networkability. That model has proven extremely successful - and we should emulate it here.
I call this hypothetical API "WebAL" (AL=Audio Library). I propose that it be based on the existing and widely used OpenAL library.
Why OpenAL and not something else?
There is an existing standard that Khronos manage called "OpenSL" (SL==Sound Library) - and it has a version called "OpenSL-ES" that is intended for the mobile marketplace, just as OpenGL-ES is for graphics. However, unlike OpenGL, OpenSL is not widely used - although OpenSL-ES is becoming popular for some cellphone applications. OpenSL also lacks many of the higher level features present in the "wish list" for games and simulation, above.
A much more popular (and practical) standard for the desktop is "OpenAL" (AL==Audio Library). This standard has been around for a very long time and is widely implemented and used in hundreds of commercial games and simulations across PCs and game consoles. OpenAL is probably the number one choice for these applications - and it implements every one of the "Wish List" items above.
In discussion with the OpenAL people, it seems that the OpenAL specification is moderately well formalized - although perhaps not as well as OpenGL or OpenSL - but it's good enough to make a superb starting point. Because we know that it is widely used, we also know that it is complete - which is more than can be said for OpenSL - which has probably never been used in a commercial game. OpenAL is also a higher level specification than OpenSL-ES. Features like doppler shift and spatialized stereo can be built on top of OpenSL-ES, but they aren't a part of it. However, doing those things in software in JavaScript would be impractical at best - so these things do need to be included in the API.
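To give a feel for what "higher level" means here, the following sketch computes the pitch multiplier for doppler shift in the way the OpenAL 1.1 specification describes it, as far as I understand that document: the source and listener velocities are projected onto the source-to-listener vector, clamped, and combined with the speed of sound. The function and parameter names are my own, and the defaults (a speed of sound of 343.3 and a doppler factor of 1) are what I believe the OpenAL defaults to be. Per-source arithmetic like this is cheap; it is the per-sample resampling that applies the resulting pitch which belongs inside the library rather than in script.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// Returns the factor by which a source's pitch should be multiplied.
// Names are illustrative; the formula follows my reading of OpenAL 1.1.
function dopplerPitchFactor(
  sourcePos: Vec3,
  sourceVel: Vec3,
  listenerPos: Vec3,
  listenerVel: Vec3,
  speedOfSound = 343.3, // assumed default, in units per second
  dopplerFactor = 1     // assumed default; 0 disables the effect
): number {
  // Vector from the source to the listener.
  const sl: Vec3 = [
    listenerPos[0] - sourcePos[0],
    listenerPos[1] - sourcePos[1],
    listenerPos[2] - sourcePos[2],
  ];
  const distance = Math.sqrt(dot(sl, sl));
  if (distance === 0 || dopplerFactor <= 0) {
    return 1; // no meaningful direction, or effect disabled
  }
  // Velocity components along the source-to-listener direction, clamped
  // so that neither exceeds speedOfSound / dopplerFactor.
  const limit = speedOfSound / dopplerFactor;
  const vls = Math.min(dot(sl, listenerVel) / distance, limit);
  const vss = Math.min(dot(sl, sourceVel) / distance, limit);
  // Note: a source moving at exactly the limit makes the divisor zero;
  // this sketch does not guard against that case.
  return (speedOfSound - dopplerFactor * vls) / (speedOfSound - dopplerFactor * vss);
}
```

A source approaching a stationary listener at a tenth of the speed of sound yields a factor of roughly 1.11, and the same source receding yields roughly 0.91.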
The "Software OpenAL" implementation is claimed to be easily portable onto an OpenSL or OpenSL-ES implementation.
So:
- Take the OpenAL specification, and apply Khronos Group's level of formality to it - without changing much of what it is or how it works.
- For PC-based applications, have the browser either find a "native" OpenAL driver (such as Creative sound cards support) - or use the "Soft OpenAL" library.
- For embedded systems such as cellphones, layer OpenAL on top of OpenSL-ES.
- Provide JavaScript bindings for all of the OpenAL API.
- Provide loaders for at least Ogg/Vorbis and "Raw" audio formats.
- Provide interfaces from the "typed array" mechanism in WebGL to enable blocks of raw audio to be efficiently accessed or created via JavaScript.
- Tie down any security issues such as when audio is loaded from a site other than the one originating the web page - just as is already managed for WebGL textures.
- Build an acceptance-test suite as we go.
- Do it quickly - and get early versions into daily builds of Firefox and WebKit so that developers can beat on it to find the holes.
In short, repeat - as closely as possible - the work done on WebGL to produce a "WebAL".
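As an illustration of the typed-array item in the list above, the sketch below fills an Int16Array with signed 16-bit mono PCM for a short sine tone. The typed-array calls are real; makeSineTone is a hypothetical helper, and how the resulting block would be handed to WebAL is an open question. OpenAL's C API takes raw sample data through alBufferData, so one plausible binding would accept the typed array directly, but that is an assumption rather than part of any specification.

```typescript
// Build a block of raw 16-bit mono PCM entirely in script.
// makeSineTone is a hypothetical helper, not part of any existing API.
function makeSineTone(
  frequencyHz: number,
  durationSec: number,
  sampleRate = 44100,
  amplitude = 0.8 // fraction of full scale, 0..1
): Int16Array {
  const frameCount = Math.round(durationSec * sampleRate);
  const pcm = new Int16Array(frameCount);
  for (let i = 0; i < frameCount; i++) {
    const s = amplitude * Math.sin((2 * Math.PI * frequencyHz * i) / sampleRate);
    pcm[i] = Math.round(s * 32767); // scale to the signed 16-bit range
  }
  return pcm;
}

// A 10 ms, 441 Hz tone: 441 samples, exactly 4.41 cycles.
const tone = makeSineTone(441, 0.01);

// A hypothetical WebAL binding might then accept the array as-is, e.g.
//   webal.bufferData(buffer, webal.FORMAT_MONO16, tone, 44100);
// No such function exists today; it mirrors alBufferData from the C API.
```

The point is that a short sample like this occupies well under a kilobyte and can live in memory permanently, ready to be triggered with no streaming delay.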
What about the <audio> stuff?
The existing audio tag works reasonably well for playing long sounds - such as music tracks - that would benefit from streaming. Unlike WebGL, which is built as a layer atop the existing <canvas> system, the situation with <audio> would be reversed. Browser writers would be well-advised to rework the half-finished and broken audio tag support - and instead build it on top of the foundations provided by OpenAL/SL. The existing audio features should be considered a specialization of WebAL. This would permit things like the placement of streaming audio sources on moving objects - or placing them out in the stereo/surround-sound field.
Who does this? Who pays for it? When will it happen?
I have no idea.
I would hope that this would be a natural follow-on for the groups who have come together so successfully to build WebGL - and that the cooperative mechanisms that have achieved this feat could be extended or replicated to solve the audio problem.
We need this soon. If we are to have a future of high-end interactive applications on the web - as promised by WebGL on the graphics side - then audio support cannot be far behind. Fortunately, I believe that the OpenAL API is sufficiently similar to OpenGL that we could put together a draft standard and a rough implementation in short order. Many of the issues that have surrounded the philosophy of WebGL would carry over perfectly to WebAL. Both would use the same 4x4 matrix support - both would have loaders that work from a URL - both would use the same 'typed array' mechanisms. Issues of where we stand vis-a-vis extensions are already understood.
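As one example of how the shared matrix support might be used, the sketch below derives a listener position and orientation from the same 4x4 camera matrix a WebGL application already maintains. OpenAL describes the listener by a position plus "at" and "up" vectors, so the conversion is just a matter of picking out columns. The names ListenerPose and listenerFromMatrix are hypothetical, and the sketch assumes a column-major camera-to-world matrix with no scaling, in the usual OpenGL convention where the camera looks down its local -Z axis with +Y up.

```typescript
type Triple = [number, number, number];

// Hypothetical container for the values OpenAL expects for its listener.
interface ListenerPose {
  position: Triple; // world-space listener position
  at: Triple;       // direction the listener is facing
  up: Triple;       // direction of the top of the listener's head
}

// Assumes a column-major, unscaled camera-to-world matrix (OpenGL convention).
function listenerFromMatrix(m: Float32Array): ListenerPose {
  if (m.length !== 16) {
    throw new Error("expected a 4x4 matrix stored as 16 floats");
  }
  return {
    position: [m[12], m[13], m[14]], // fourth column: translation
    at: [-m[8], -m[9], -m[10]],      // negated third column: camera looks down -Z
    up: [m[4], m[5], m[6]],          // second column: local +Y
  };
}

// Identity orientation, with the camera moved to (1, 2, 3).
const cameraToWorld = new Float32Array([
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  1, 2, 3, 1,
]);
const pose = listenerFromMatrix(cameraToWorld);
```

With something like this in place, the "ears" follow the "eyes" automatically, and the application keeps a single source of truth for where the player is.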
The OpenAL specification
Version 1.1 is the latest:
http://connect.creativelabs.com/openal/Documentation/Forms/AllItems.aspx
There are additional places where extensions to the spec reside.
Implementations of OpenAL
The main implementations are the one built into Mac OS X, OpenAL-Soft (open source - any platform), and the Creative driver for Windows.
Conclusion
I hope everyone who reads this can understand the need - and find this proposal as compelling as I do.
Interested Parties
- Myself: steve@sjbaker.org - Professional graphics engineer working in games and simulation for 25 years. I was also an early proponent of OpenAL and wrote the "ALUT" companion library.
- The OpenAL developer list: openal-devel@opensource.creative.com
- The WebGL developer list: public_webgl@khronos.org