
Hi Harald,
Section 5.2 lists a number of requirements, but doesn't link them back to use cases. For some, this is obvious (they all need them); for others, less so. In cases where only one or two scenarios are the basis for the recommendation, linking would be good.
We can take care of that in the next version. ----
There's also some inconsistency between "MUST" and "must" - are they intended to mean the same thing here?
They are intended to mean the same thing. ----
Some comments:
F9: echo cancellation MUST be provided. Is this "provided" as in "made available", or "provided" as "must be used"? There are situations (headsets are one) where echo cancellation is not needed.
"Made available". I can modify the requirement to make it more clear. ----
F13: The browser MUST be able to pan, mix and render several concurrent audio streams. "Render" is obvious, "mix" is a prerequisite for "render" for n > # of speakers, but what is "pan", and why do we need it?
"Panning" is the capability to move the direction or point from which a user perceives a sound to originate. If you have several incoming mono audio streams and stereo (or better) playout, you can, when playing out the mono streams, create the impression that they come from different directions in the room. This enhances intelligibility in multiparty situations (motivated by the multiparty use case). The W3C Audio XG (becoming a WG) has done some work that could be reused. There is an early Chrome/Safari implementation <http://chromium.googlecode.com/svn/trunk/samples/audio/index.html> of one API for this. There is also a Mozilla implementation (using another API). ----
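To make the panning idea concrete, here is a minimal sketch, assuming mono float PCM samples and a constant-power (cosine/sine) pan law. That law is one common choice and not necessarily what the W3C Audio XG work or the Chrome/Mozilla APIs use; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(position: float) -> Tuple[float, float]:
    """Constant-power pan law (hypothetical helper).

    position: -1.0 = hard left, 0.0 = centre, +1.0 = hard right.
    Returns (left_gain, right_gain); left**2 + right**2 == 1 for every position.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be in [-1.0, 1.0]")
    theta = (position + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)


def pan_and_mix(
    streams: Sequence[Sequence[float]], positions: Sequence[float]
) -> List[Tuple[float, float]]:
    """Place each mono stream at its own position and sum them into one stereo buffer."""
    if len(streams) != len(positions):
        raise ValueError("need exactly one position per stream")
    length = max((len(s) for s in streams), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for stream, position in zip(streams, positions):
        gain_l, gain_r = pan_gains(position)
        for i, sample in enumerate(stream):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return list(zip(left, right))
```

In a three-party call, for example, the two remote talkers could be placed at -0.5 and +0.5. Constant-power gains keep perceived loudness roughly steady as a source moves across the stereo image; the sketch does not guard against the summed mix exceeding full scale.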
F15: The browser MUST be able to process and mix sound objects with audio streams. What is a "sound object", and in which scenario did this one occur?
A sound object is media that is retrieved from a source other than the established media stream(s) with the peer(s). It appears in the game example (section 4.4), where the sound of the tank might be generated locally but needs to be mixed with other media received over the established media streams. I can modify the requirement to make it more clear. ----
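A minimal sketch of mixing a locally generated sound object (such as the tank sound) into a block of decoded remote audio. It assumes both are mono float PCM in [-1.0, 1.0] at the same sample rate. The function name and the hard-clipping strategy are illustrative assumptions; a real mixer would more likely use gain staging or a limiter.

```python
from typing import List, Sequence


def mix_sound_object(
    received: Sequence[float],
    sound_object: Sequence[float],
    offset: int = 0,
    gain: float = 1.0,
) -> List[float]:
    """Mix a local sound object into one block of received audio (hypothetical helper).

    The sound object starts `offset` samples into the block. Any part that falls
    past the end of the block is dropped here; a real mixer would carry it over
    into the next block.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    out = list(received)
    for i, sample in enumerate(sound_object):
        j = offset + i
        if j >= len(out):
            break
        # Sum the two sources, then hard-clip so the result stays in range.
        out[j] = max(-1.0, min(1.0, out[j] + gain * sample))
    return out
```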
F18: Which use case mandates the audio media format commonly supported by existing telephony services (G.711?), and why is this a MUST? Is it impossible (as opposed to just expensive) to handle this requirement by a transcoding gateway?
The requirement is based on the Telephony use case and the wish to interoperate with legacy systems. The requirement can of course be met by transcoding, but the idea is to avoid that. I thought that was also the reason we have been trying to agree on a base codec in general. ----
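For reference, assuming the format in question is G.711 μ-law (PCMU), as the question suggests, the per-sample encoding looks like the sketch below. It follows the widely used bias-and-segment formulation for 16-bit signed linear input; the function names are hypothetical, and A-law (PCMA) would need a different routine.

```python
BIAS = 0x84   # 132: ensures every biased magnitude has a bit set at or above bit 7
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed linear PCM sample as an 8-bit G.711 mu-law byte."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample must be a 16-bit signed integer")
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent) is the position of the highest set bit above bit 7.
    exponent = (magnitude >> 7).bit_length() - 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit G.711 mu-law byte back to 16-bit signed linear PCM."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

Each sample becomes one byte, which is why G.711 runs at 64 kbit/s at its 8 kHz sampling rate.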
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one.
First, the sender and recipient need to be able to perform codec negotiation in order to find the common codecs. If the codec negotiation is handled by the web application (i.e. JavaScript based), the API must support this. If the codec negotiation is handled by the browser, the app might not need as much control. We try to cover that in the note associated with A5. ----
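A minimal sketch of the core of codec negotiation: finding the codecs both ends support and, in the A5 case, letting the web application pick one of them. This is not the SDP offer/answer procedure itself. The codec lists, the function names and the fallback when the application's choice is not common are all assumptions for illustration.

```python
from typing import List, Optional, Sequence


def common_codecs(local_prefs: Sequence[str], remote_supported: Sequence[str]) -> List[str]:
    """Codecs both ends support, kept in the local side's order of preference.

    Names are compared case-insensitively (e.g. 'PCMU' matches 'pcmu').
    """
    remote = {name.lower() for name in remote_supported}
    return [codec for codec in local_prefs if codec.lower() in remote]


def pick_codec(
    local_prefs: Sequence[str],
    remote_supported: Sequence[str],
    app_choice: Optional[str] = None,
) -> Optional[str]:
    """Return the codec to send with, or None if there is no common codec.

    If the web application expresses a choice, it is honoured only when that
    codec is actually common; otherwise the first common preference is used.
    """
    common = common_codecs(local_prefs, remote_supported)
    if not common:
        return None
    if app_choice is not None:
        for codec in common:
            if codec.lower() == app_choice.lower():
                return codec
    return common[0]
```

Whether an unusable application choice should fall back silently, as here, or be reported as an error is exactly the kind of API question the note on A5 would need to settle.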
In section 7.1, "security introduction", I think it would be more accurate to say that "this section will in the future describe"... there will be more text here as we get down to the details. Offhand, stuff that should get into section 7.2 (browser):
- the browser has to provide mechanisms to assure that streams are the ones the recipient intended to receive, and signal to the sender that it's OK to start sending media (this translates to "STUN handshake" in currently imagined implementations)
- the browser has to ensure that the sender doesn't begin to emit media until the stream has been OKed ("STUN handshake completed" is the currently imagined implementation)
- the browser has to rate-limit the number of attempts to negotiate a stream, so that this itself isn't a DoS attack
- the browser should ensure that recipient-specified limits on send rate are not exceeded
- it would be nice if the browser could keep some secrets from the JavaScript, so that a malicious webapp cannot use permission obtained from one interaction to get authorization for sending media from somewhere else (this may be impossible, however)
Thanks for the input! We'll use it in the next version. ----
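The second and third bullets can be sketched as two small pieces of browser-side state: a gate that drops media until the recipient has OKed the stream, and a sliding-window limit on negotiation attempts. The class names, the window parameters and the injectable clock (used only so the sketch is testable) are hypothetical; a token bucket would serve equally well for the rate limit.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class NegotiationRateLimiter:
    """Allow at most `max_attempts` stream negotiations per `window_s` seconds."""

    def __init__(self, max_attempts: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_attempts = max_attempts
        self.window_s = window_s
        self.clock = clock
        self._attempts: Deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget attempts that have slid out of the window.
        while self._attempts and now - self._attempts[0] >= self.window_s:
            self._attempts.popleft()
        if len(self._attempts) >= self.max_attempts:
            return False  # refuse: too many negotiation attempts recently
        self._attempts.append(now)
        return True


class ConsentGatedSender:
    """Drops media until the recipient has OKed the stream.

    In currently imagined implementations, on_consent() would be triggered by a
    completed STUN handshake with the peer.
    """

    def __init__(self) -> None:
        self._consented = False
        self.sent: List[bytes] = []

    def on_consent(self) -> None:
        self._consented = True

    def send(self, packet: bytes) -> bool:
        if not self._consented:
            return False  # recipient has not agreed to receive yet
        self.sent.append(packet)
        return True
```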
There will be more here. Good start!
Thanks for your comments!

Regards,
Christer