Re: [RTW] Review of draft-holmberg-rtcweb-ucreqs-00 (Web Real-Time Communication Use-cases and Requirements)

10 Mar 2011

Hi Harald,
...
Section 5.2 lists a number of requirements, but doesn't link 
them back to use cases. For some, this is obvious (they all 
need them); for others, less so. In cases where only one or 
two scenarios are the basis for the recommendation, linking 
would be good.
We can take care of that in the next version.

----
...
There's also some inconsistency between "MUST" and "must" - 
are they intended to mean the same thing here?
They are intended to mean the same thing.

----
...
Some comments:
F9: echo cancellation MUST be provided. Is this "provided" as 
in "made available", or "provided" as "must be used"? There 
are situations (headsets are one) where echo cancellation is 
not needed.
"Made available".

I can modify the requirement to make it clearer.

----
...
F13: The browser MUST be able to pan, mix and render several 
concurrent video streams.
"Render" is obvious, "mix" is a prerequisite for "render" for 
n > # of speakers, but what is "pan", and why do we need it?
"Panning" is the ability to move the direction or point from which a 
user perceives a sound to originate.

If you have several incoming mono audio streams and stereo (or better)
playout, you could play out the mono streams in a way that creates the 
impression that they are coming from different directions in the room.

This enhances intelligibility in multiparty situations (motivated by
the multiparty use case). 
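
As a rough illustration of panning mono streams into a stereo mix, here is
a minimal sketch. It assumes a constant-power panning law and float samples
in [-1, 1]; neither is specified in the draft, and the function names
pan_gains and pan_and_mix are hypothetical.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a position in [-1.0, +1.0].

    -1.0 is fully left, +1.0 is fully right. Constant-power law (assumed):
    left^2 + right^2 == 1 for every position.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be in [-1.0, +1.0]")
    angle = (position + 1.0) * math.pi / 4.0  # maps position to 0..pi/2
    return math.cos(angle), math.sin(angle)


def pan_and_mix(streams):
    """Mix several mono streams into one stereo stream.

    streams: list of (samples, position) pairs, where samples is a list of
    floats. Returns a list of (left, right) frames, as long as the longest
    input stream. No clipping or normalisation is applied here.
    """
    length = max((len(samples) for samples, _ in streams), default=0)
    out = [[0.0, 0.0] for _ in range(length)]
    for samples, position in streams:
        left_gain, right_gain = pan_gains(position)
        for i, x in enumerate(samples):
            out[i][0] += x * left_gain
            out[i][1] += x * right_gain
    return [(left, right) for left, right in out]
```

For example, pan_and_mix([(alice, -0.7), (bob, 0.0), (carol, 0.7)]) would
place three participants left, centre and right in the stereo image.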

The W3C Audio XG (becoming a WG) has done some work that could be re-used.

There is an early Chrome/Safari implementation 
<http://chromium.googlecode.com/svn/trunk/samples/audio/index.html> 
of one API for this. There is also a Mozilla
implementation (using another API).

----
...
F15: The browser MUST be able to process and mix sound 
objects with audio streams.
What is a "sound object", and in which scenario did this one occur?
A sound object is media that is retrieved from a source other than the 
established media stream(s) with the peer(s). It appears in the game example 
(section 4.4), where the sound of the tank might be generated locally
but needs to be mixed with other media received over established media 
streams.
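
A minimal sketch of what such mixing could look like, assuming both the
received stream and the local sound object are lists of float samples in
[-1, 1] at the same sample rate. The function name mix_sound_object and its
offset and gain parameters are hypothetical and not taken from the draft.

```python
def mix_sound_object(stream, sound, offset=0, gain=1.0):
    """Mix a locally generated sound into a received audio stream.

    stream: samples received from the peer.
    sound:  samples of the local sound object (e.g. the tank sound).
    offset: index in `stream` at which the sound starts (must be >= 0).
    gain:   scale factor applied to the sound before mixing.

    Returns a new list; samples are summed and clipped to [-1.0, 1.0].
    Any part of the sound that runs past the end of the stream is dropped.
    """
    if offset < 0:
        raise ValueError("offset must be >= 0")
    mixed = list(stream)
    for i, x in enumerate(sound):
        j = offset + i
        if j >= len(mixed):
            break
        mixed[j] = max(-1.0, min(1.0, mixed[j] + gain * x))
    return mixed
```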

I can modify the requirement to make it clearer.

----
...
F18: Which use case mandates the audio media format commonly 
supported by existing telephony services (G.711?), and why is 
this a MUST? Is it impossible (as opposed to just expensive) 
to handle this requirement by a transcoding gateway?
The requirement is based on the Telephony use-case, and the wish to 
interoperate with legacy telephony services.

The requirement can of course be met by transcoding, but the idea is 
to avoid that. I thought that was the reason we have been trying to 
agree on a base codec in general.

----
...
A5: The web application MUST be able to control the media 
format (codec) to be used for the streams sent to a peer. I 
think the MUST is that the sender and recipient need to be 
able to find a common codec, if one exists; I'm not sure I 
see a MUST for the webapp actually picking one.
First, the sender and recipient need to be able to perform 
codec negotiation, in order to find the common codecs.

If the codec negotiation is handled by the web application 
(i.e. JavaScript based) the API must support this. 

If the codec negotiation is handled by the browser, then the app 
might not need as much control. 

We try to cover that in the note associated with A5.
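
To make "finding the common codecs" concrete, here is a minimal sketch. It
assumes each side simply has an ordered list of codec names and that the
offerer's preference order wins; the draft does not define the negotiation
mechanism, and the function names and codec lists are illustrative only.

```python
def common_codecs(offered, supported):
    """Return the codecs both sides support, in the offerer's preference order.

    offered:   codec names listed by the sender, most preferred first.
    supported: codec names the recipient can handle.
    Names are compared case-insensitively.
    """
    supported_names = {name.lower() for name in supported}
    return [name for name in offered if name.lower() in supported_names]


def pick_codec(offered, supported):
    """Return the most preferred common codec, or None if there is none."""
    common = common_codecs(offered, supported)
    return common[0] if common else None
```

If the negotiation is done in JavaScript, the API would have to expose both
inputs (the codec lists) and accept the result; if the browser does it, the
app might only see the outcome.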

----
...
In section 7.1, "security introduction", I think it would be 
more accurate to say that "this section will in the future 
describe"... there will be more text here as we get down to 
the details. Offhand, stuff that should get into section 7.2 
(browser):
- the browser has to provide mechanisms to assure that 
streams are the ones the recipient intended to receive, and 
signal to sender that it's ok to start sending media (this 
translates to "STUN handshake" in currently-imagined implementations)
- the browser has to ensure that sender doesn't begin to emit 
media until the stream has been OKed ("stun handshake 
completed" is the currently imagined implementation)
- the browser has to ratelimit the # of attempts to negotiate 
a stream, so that this itself isn't a DOS attack
- the browser should ensure that recipient-specified limits 
on send rate are not exceeded
- it would be nice if the browser could keep some secrets 
from the Javascript so that it's not possible for a malicious 
webapp to use permission obtained from one interaction to get 
authorization for sending media from somewhere else (this may 
be impossible, however)
Thanks for the input! We'll use it in the next version.
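
Two of the browser-side points above (no media until the stream has been
OKed, and a limit on the number of negotiation attempts) could be sketched
roughly as follows. This is only an illustration: the class StreamGate, its
methods and the fixed attempt budget are hypothetical, the handshake itself
is reduced to a boolean, and a real limiter would most likely be time-based.

```python
class StreamGate:
    """Per-stream gate: media may flow only after the peer has OKed it."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts  # assumed fixed budget of attempts
        self.attempts = 0
        self.consented = False

    def try_handshake(self, peer_accepts):
        """Record one negotiation attempt.

        peer_accepts stands in for the result of the handshake (e.g. a
        completed STUN exchange). Returns True once the stream is OKed.
        Once the attempt budget is used up, further attempts are refused.
        """
        if self.consented:
            return True
        if self.attempts >= self.max_attempts:
            return False
        self.attempts += 1
        self.consented = bool(peer_accepts)
        return self.consented

    def can_send_media(self):
        """The sender must not emit media until this returns True."""
        return self.consented
```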

----
...
There will be more here. Good start!
Thanks for your comments!

Regards,

Christer

Christer Holmberg