Draft new: draft-holmberg-rtcweb-ucreqs-00 (Web Real-Time Communication Use-cases and Requirements)

Hi browser lovers,
We've submitted a use-case/requirement draft, draft-holmberg-rtcweb-ucreqs-00.txt, that we think could be used as a base when those discussions start. The document describes some use-cases and, based on those, proposes browser requirements and application-browser API requirements. It focuses on media-related issues. I.e., issues related to privacy, signalling between the browser and web server, etc., are currently not considered.
Best regards,
Christer

For those that are looking for a link, like me: http://tools.ietf.org/html/draft-holmberg-rtcweb-ucreqs-00
S.

On Mar 7, 2011, at 10:27 PM, Silvia Pfeiffer wrote:
For those that are looking for a link, like me: http://tools.ietf.org/html/draft-holmberg-rtcweb-ucreqs-00 S.
Thanks for forwarding a link.
Section 7 does not cover the previously-discussed requirement that the browser MUST be able to ensure that a receiving device has consented to the sending of media data (this is irrespective of user permission; reference the voice hammer attack).
Matthew Kaufman

Thanks for providing the link! My apologies for not doing it. Regards, Christer

-----Original Message----- From: Matthew Kaufman
Section 7 does not cover the previously-discussed requirement that the browser MUST be able to ensure that a receiving device has consented to the sending of media data (this is irrespective of user permission; reference the voice hammer attack).
Noted and will be addressed in updates. Thanx!

Hi Christer,
Thanks for putting together the document. One thing that struck me in reading it is that it has some use cases in which the downloadable web application is paramount, but others (notably 4.4 and 4.6) in which the description could equally apply to standalone applications. In side conversations, Harald and I have discussed whether the threat model in standalone applications, even those using the same underlying protocol mechanics for rendezvous and media streaming, is really the same. Would you see an MMORPG application using this method as having different threats than a downloaded casual game?
regards,
Ted

Hi Ted,
Our understanding, based on the discussions regarding the charter, is that the working group will focus on the browser, with the purpose being to ensure alignment with the work in W3C. Therefore our focus has been on browser based applications, and we haven't really considered native applications. If that is unclear in the draft, we can clarify it in the next version.
Regards, Christer

On 03/08/11 14:08, Christer Holmberg wrote:
Hi Ted,
Our understanding, based on the discussions regarding the charter, is that the working group will focus on the browser, with the purpose being to ensure alignment with the work in W3C.
Therefore our focus has been on browser based applications, and we haven't really considered native applications.
If that is unclear in the draft, we can clarify it in the next version.
One nice feature of the doc is that it has a few different use cases that don't strictly use web browsers - in particular, the talent scout of section 4.6.1 uses an app on a smartphone while his manager uses a desktop PC (presumably with a browser-based app).
In the total RTCWEB effort (IETF and W3C), we need to consider the fact that the user will likely have more trust in the non-maliciousness of the browser than in the non-maliciousness of Javascript downloaded from a Web page.
In the strict IETF effort, the Javascript API boundary is out of scope - but at the moment, this is the mailing list that contains the people interested in both efforts; we haven't started splitting up yet.
What I draw from that is that the IETF needs to specify security in terms of acceptable and unacceptable behaviour of end systems, whether they are browsers or not (video slamming, congestion-causing behaviour and making eavesdroppers' lives easy are all failures that can be observed on the network interface), while the W3C effort will have to address means of making it easy to prevent those problems by controlling the API presented to the less trusted parts of the overall system (the downloaded Javascripts).
Harald

Hi Harald,
In the total RTCWEB effort (IETF and W3C), we need to consider the fact that the user will likely have more trust in the non-maliciousness of the browser than in the non-maliciousness of Javascript downloaded from a Web page.
Is this also the case, even if the browser was downloaded from a Web page and several times updated via the Internet?
BR Christian

On 03/09/2011 09:45 AM, Schmidt, Christian 1. (NSN - DE/Munich) wrote:
Hi Harald
In the total RTCWEB effort (IETF and W3C), we need to consider the fact that the user will likely have more trust in the non-maliciousness of the browser than in the non-maliciousness of Javascript downloaded from a Web page.
Is this also the case, even if the browser was downloaded from a Web page and several times updated via the Internet?
Good question, unfortunately not many users seem to think that far....
If it was downloaded from a web page using HTTPS with a valid certificate chain, and each update followed the same constraint (possibly with additional verification mechanisms), you should have as much faith in the browser as you have in the integrity of the least trustworthy of the links involved in that process. The same actually goes for the Javascript, but where browser downloads/updates happen to a user a few times a month, Javascript downloads happen multiple times a minute.

Hi Harald,
thank you for the fast reply. I just tried to download a popular web browser and received an offer for an .exe file over a plain http:// link. What about this browser? Can I trust it?
BR Christian

On 03/09/11 10:23, Schmidt, Christian 1. (NSN - DE/Munich) wrote:
Hi Harald,
thank you for the fast reply. I just tried to download a popular web browser and received an offer for an .exe file over a plain http:// link. What about this browser? Can I trust it?
Good question. I'll file a bug against the installer of at least one.....

I believe the browser model is fundamentally different from the downloaded application case, and it has everything to do with the trust model. The problem we are solving is how to enable real-time communications when the raw communications capability is provided by one component (the browser), and that component is controlled by another, separate, untrusted component (the web application). Pictorially, we're trying to solve the following system problem (please view in a fixed-width font such as Monaco or Courier):

+------------+                +------------+
|            |                |            |
|            |                |            |
|    App     |                |  Browser   |
|            | ------------>  |            |
|            |    Control     |            |
|            |   "Protocol"   |            |
+------------+                +------------+
                                    |
                                    | Real-Time
                                    | Protocols
                                    |
                                    |
                                    V

What is the design of the "protocol" (which is really the API within the browser itself), and what is the design of the real-time protocols emitted by the browser, so that the untrusted application (from the perspective of the browser) can create real-time comms features for users? I find it illustrative to think of the browser API as a protocol, since it helps clarify the lack of trust and clearly shows that these two components are owned and controlled by different entities.
Thanks, Jonathan R.
--
Jonathan D. Rosenberg, Ph.D. SkypeID: jdrosen Skype Chief Technology Strategist jdrosen@skype.net http://www.skype.com jdrosen@jdrosen.net http://www.jdrosen.net
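
As a purely illustrative sketch (TypeScript, with invented names such as RtcSession and MediaHandle rather than any API proposed in this thread), the shape of that control "protocol" at the trust boundary might look like this:

// Hypothetical control "protocol" between the untrusted web app and the trusted browser.
// The app can only express intent; the browser owns the devices, the consent checks and
// the real-time protocol machinery, and never exposes raw sockets to the app.
interface MediaHandle {
  readonly kind: "audio" | "video";  // opaque handle: the app can route media, not read samples
  stop(): void;
}

interface RtcSession {
  // The app asks; the browser may refuse, prompt the user, or rate-limit.
  requestMediaStream(kind: "audio" | "video"): Promise<MediaHandle>;
  // The app hands over an opaque peer descriptor obtained via its own signalling channel;
  // the browser itself runs the real-time protocols (ICE/STUN, RTP, ...) towards that peer.
  connect(peerDescriptor: string): Promise<void>;
  sendStream(stream: MediaHandle): void;
}

// The only kind of control the downloaded, untrusted application gets:
async function startCall(session: RtcSession, peerDescriptor: string): Promise<void> {
  const mic = await session.requestMediaStream("audio"); // browser may prompt the user here
  await session.connect(peerDescriptor);                 // browser performs its own checks
  session.sendStream(mic);
}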

Hi Jonathan,
Just to clarify: in your picture, where it says "Real-Time Protocols", does that also cover session control and codec negotiation (e.g. SIP/SDP, to use familiar terms), or only media plane protocols (e.g. RTP, STUN)?
Regards, Christer

Just the media plane protocols. See figure 2 of http://datatracker.ietf.org/doc/draft-rosenberg-rtcweb-framework/.
-Jonathan R.

Some detailed questions/comments:
Section 5.2 lists a number of requirements, but doesn't link them back to use cases. For some, this is obvious (they all need them); for others, less so. In cases where only one or two scenarios are the basis for the recommendation, linking would be good.
There's also some inconsistency between "MUST" and "must" - are they intended to mean the same thing here?
Some comments:
F9: echo cancellation MUST be provided. Is this "provided" as in "made available", or "provided" as "must be used"? There are situations (headsets are one) where echo cancellation is not needed.
F13: The browser MUST be able to pan, mix and render several concurrent video streams. "Render" is obvious, "mix" is a prerequisite for "render" for n > # of speakers, but what is "pan", and why do we need it?
F15: The browser MUST be able to process and mix sound objects with audio streams. What is a "sound object", and in which scenario did this one occur?
F18: Which use case mandates the audio media format commonly supported by existing telephony services (G.711?), and why is this a MUST? Is it impossible (as opposed to just expensive) to handle this requirement by a transcoding gateway?
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one.
In section 7.1, "security introduction", I think it would be more accurate to say that "this section will in the future describe"... there will be more text here as we get down to the details. Offhand, stuff that should get into section 7.2 (browser):
- the browser has to provide mechanisms to assure that streams are the ones the recipient intended to receive, and signal to the sender that it's OK to start sending media (this translates to "STUN handshake" in currently-imagined implementations)
- the browser has to ensure that the sender doesn't begin to emit media until the stream has been OKed ("STUN handshake completed" is the currently imagined implementation)
- the browser has to rate-limit the number of attempts to negotiate a stream, so that this itself isn't a DoS attack
- the browser should ensure that recipient-specified limits on send rate are not exceeded
- it would be nice if the browser could keep some secrets from the Javascript, so that it's not possible for a malicious webapp to use permission obtained from one interaction to get authorization for sending media from somewhere else (this may be impossible, however)
There will be more here. Good start!
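
A rough sketch (TypeScript) of the kind of browser-side gating several of these bullets describe: consent handshake before any media is emitted, rate-limited negotiation, and respect for recipient-specified send-rate caps. The function names and the limit values are invented for illustration only:

// Illustrative browser-internal gating; all names and figures below are assumptions.
const MAX_NEGOTIATIONS_PER_MINUTE = 10;   // arbitrary illustrative limit
let negotiationTimestamps: number[] = [];

function mayStartNegotiation(now: number): boolean {
  negotiationTimestamps = negotiationTimestamps.filter(t => now - t < 60_000);
  if (negotiationTimestamps.length >= MAX_NEGOTIATIONS_PER_MINUTE) {
    return false;                         // negotiation itself must not become a DoS tool
  }
  negotiationTimestamps.push(now);
  return true;
}

interface Peer {
  runConsentHandshake(): Promise<boolean>;  // e.g. "STUN handshake completed"
  maxSendRateKbps?: number;                 // recipient-specified cap, if any
}

async function startSending(peer: Peer, emitMedia: (rateKbps: number) => void): Promise<void> {
  if (!mayStartNegotiation(Date.now())) throw new Error("negotiation rate limit exceeded");
  const consented = await peer.runConsentHandshake();
  if (!consented) throw new Error("recipient has not consented to receive media");
  emitMedia(Math.min(1000, peer.maxSendRateKbps ?? 1000)); // 1000 kbps: arbitrary default
}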

Hi Harald,
Section 5.2 lists a number of requirements, but doesn't link them back to use cases. For some, this is obvious (they all need them); for others, less so. In cases where only one or two scenarios are the basis for the recommendation, linking would be good.
We can take care of that in the next version. ----
There's also some inconsistency between "MUST" and "must" - are they intended to mean the same thing here?
They are intended to mean the same thing. ----
Some comments:
F9: echo cancellation MUST be provided. Is this "provided" as in "made available", or "provided" as "must be used"? There are situations (headsets are one) where echo cancellation is not needed.
"Made available". I can modify the requirement to make it more clear. ----
F13: The browser MUST be able to pan, mix and render several concurrent video streams. "Render" is obvious, "mix" is a prerequisite for "render" for n > # of speakers, but what is "pan", and why do we need it?
"Panning" is the capability to move the direction/point from where a user experience a sound to originate from. If you have several incoming mono audio streams, and stereo (or better) playout you could when playing the mono streams create the impression that they are coming from different directions in the room. This enhances intelligibility in multiparty situations (motivated by the multiparty use case). The W3C Audio XG (becoming a WG) has done some work that could be re-used. There is an early Chrome/Safari implementation <http://chromium.googlecode.com/svn/trunk/samples/audio/index.html> of one API for this. There is also a Mozilla implementation (using another API). ----
F15: The browser MUST be able to process and mix sound objects with audio streams. What is a "sound object", and in which scenario did this one occur?
A sound object is media that is retrieved from a source other than the established media stream(s) with the peer(s). It appears in the game example (section 4.4), where the sound of the tank might be generated locally, but needs to be mixed with other media received over established media streams. I can modify the requirement to make it more clear. ----
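
Again purely as an illustration (TypeScript, modern Web Audio API, invented helper name), mixing a locally fetched sound object such as the tank sound with audio received from a peer could be sketched as:

// Mix a locally fetched "sound object" (e.g. the tank sound) with a peer's audio stream.
async function playEffectWithPeerAudio(ctx: AudioContext, peerStream: MediaStream,
                                       effectUrl: string): Promise<void> {
  const mixBus = ctx.createGain();                       // simple mixing bus
  ctx.createMediaStreamSource(peerStream).connect(mixBus);

  // The sound object comes from a source other than the established media streams.
  const encoded = await fetch(effectUrl).then(r => r.arrayBuffer());
  const effect = ctx.createBufferSource();
  effect.buffer = await ctx.decodeAudioData(encoded);
  effect.connect(mixBus);
  effect.start();

  mixBus.connect(ctx.destination);                       // render the mix locally
}
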
F18: Which use case mandates the audio media format commonly supported by existing telephony services (G.711?), and why is this a MUST? Is it impossible (as opposed to just expensive) to handle this requirement by a transcoding gateway?
The requirement is based on the Telephony use-case, and the wish to interoperate with legacy systems. The requirement can of course be met by transcoding, but the idea is to avoid that; I thought that was the reason we have been trying to agree on a base codec in general. ----
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one.
First, the sender and recipient need to be able to perform codec negotiation, in order to find the common codecs. If the codec negotiation is handled by the web application (i.e. JavaScript based) the API must support this. If the codec negotiation is handled by the browser, then the app might not need to have as much control. We try to cover that in the note associated with A5. ----
In section 7.1, "security introduction", I think it would be more accurate to say that "this section will in the future describe"... there will be more text here as we get down to the details. Offhand, stuff that should get into section 7.2 (browser):
- the browser has to provide mechanisms to assure that streams are the ones the recipient intended to receive, and signal to the sender that it's OK to start sending media (this translates to "STUN handshake" in currently-imagined implementations)
- the browser has to ensure that the sender doesn't begin to emit media until the stream has been OKed ("STUN handshake completed" is the currently imagined implementation)
- the browser has to rate-limit the number of attempts to negotiate a stream, so that this itself isn't a DoS attack
- the browser should ensure that recipient-specified limits on send rate are not exceeded
- it would be nice if the browser could keep some secrets from the Javascript, so that it's not possible for a malicious webapp to use permission obtained from one interaction to get authorization for sending media from somewhere else (this may be impossible, however)
Thanks for the input! We'll use it in the next version. ----
There will be more here. Good start!
Thanks for your comments! Regards, Christer

On Thu, Mar 10, 2011 at 4:57 AM, Christer Holmberg
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one.
First, the sender and recipient need to be able to perform codec negotiation, in order to find the common codecs.
If the codec negotiation is handled by the web application (i.e. JavaScript based) the API must support this.
If the codec negotiation is handled by the browser, then the app might not need to have as much control.
We try to cover that in the note associated with A5.
So, I think we're all in agreement that rtc-web must specify a mechanism that allows for codec negotiation. But I think we may need some more discussion on the expected mechanics. The options include:
Web app queries browser/host system via API for available codecs and sends selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
Web app queries browser/host system via API for available codecs and sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
Web app requests browser/host system to select candidate codecs based on some set of characteristics; it then sends the selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
Web app requests browser/host system to select candidate codecs based on some set of characteristics; it sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
The two axes which vary in that set of choices are: whether the web app makes the selection from among candidate codecs or the browser/host system makes the selection based on info provided; whether the negotiation takes place in the rendezvous server or in a peer-based offer/answer/acknowledgement set. An obvious consequence of these choices is that the logic for codec selection moves around. An additional consequence of these choices will be what element in the system needs to know about the possibility of network-provided transcoding.
I think some discussion of which negotiation method is expected would be useful. If, for example, we rule out the negotiation server acting as the agent for negotiation, we can re-use the same protocol mechanics for offer-answer-acknowledgement, no matter whether the web app or browser/host system provides the codec selections.
regards,
Ted Hardie
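
Whichever of these options is chosen, the core selection step is the same: intersect the offered codecs with the locally supported ones while keeping a preference order. A minimal TypeScript sketch, deliberately independent of where it runs (web app, browser/host system, or rendezvous server), with invented type names:

interface CodecDescription {
  name: string;        // e.g. "opus", "G711"
  clockRate: number;
  params?: Record<string, string>;
}

// Answerer side of a simple offer/answer exchange: keep only the offered codecs that are
// also supported locally, preserving the offerer's preference order. Parameter matching
// is greatly simplified here; a real negotiation would have to compare params as well.
function selectCodecs(offered: CodecDescription[],
                      supported: CodecDescription[]): CodecDescription[] {
  return offered.filter(o =>
    supported.some(s => s.name === o.name && s.clockRate === o.clockRate));
}

// Example: the answer lists the common subset, which both sides then use.
const answer = selectCodecs(
  [{ name: "opus", clockRate: 48000 }, { name: "G711", clockRate: 8000 }],
  [{ name: "G711", clockRate: 8000 }]);
// answer -> [{ name: "G711", clockRate: 8000 }]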

Hi Ted,
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one.
First, the sender and recipient need to be able to perform codec negotiation, in order to find the common codecs.
If the codec negotiation is handled by the web application (i.e. JavaScript based) the API must support this.
If the codec negotiation is handled by the browser, then the app might not need to have as much control.
We try to cover that in the note associated with A5.
So, I think we're all in agreement that rtc-web must specify a mechanism that allows for codec negotiation.
I sure hope so. Choosing an exclusive set of codecs, and not allowing negotiation of other codecs, is not forward compatible, as new codecs are defined etc. -----
But I think we may need some more discussion on the expected mechanics. The options include:
Web app queries browser/host system via API for available codecs and sends selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
Web app queries browser/host system via API for available codecs and sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
Web app requests browser/host system to select candidate codecs based on some set of characteristics; it then sends the selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
Web app requests browser/host system to select candidate codecs based on some set of characteristics; it sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
The two axes which vary in that set of choices are: whether the web app makes the selection from among candidate codecs or the browser/host system makes the selection based on info provided;
Yes. A "hybrid" alternative could be that the web app provides characterics when querying the browser/host system for candicate codecs, only get codec candidates that match the characteristics, but then still makes the final codec(s) choise. -----
whether the negotiation takes place in the rendezvous server or in a peer-based offer/answer/acknowledgement set. An obvious consequence of these choices is that the logic for codec selection moves around. An additional consequence of these choices will be what element in the system needs to know about the possibility of network-provided transcoding.
The rendezvous server alternative might work when each peer uses the same web app. But, e.g., in the case of legacy interworking, things might become more tricky. In addition, the rendezvous server alternative might not be as flexible when it comes to changes of codecs, etc., during a session. -----
I think some discussion of which negotiation method is expected would be useful. If, for example, we rule out the negotiation server acting as the agent for negotiation, we can re-use the same protocol mechanics for offer-answer-acknowledgement, no matter whether the web app or browser/host system provides the codec selections.
Yes. As I haven't spent much time thinking about it, I would be interested in hearing what advantages people see in the rendezvous server alternative. (For example, codec selection shouldn't be a very "CPU/resource heavy" process, which could otherwise have been a good reason for not doing it in the browser.)
Regards, Christer

On 03/10/2011 07:05 PM, Ted Hardie wrote:
On Thu, Mar 10, 2011 at 4:57 AM, Christer Holmberg
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one.
First, the sender and recipient need to be able to perform codec negotiation, in order to find the common codecs.
If the codec negotiation is handled by the web application (i.e. JavaScript based) the API must support this.
If the codec negotiation is handled by the browser, then the app might not need to have as much control.
We try to cover that in the note associated with A5.
So, I think we're all in agreement that rtc-web must specify a mechanism that allows for codec negotiation. But I think we may need some more discussion on the expected mechanics. The options include:
I numbered them....
1) Web app queries browser/host system via API for available codecs and sends selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
2) Web app queries browser/host system via API for available codecs and sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
3) Web app requests browser/host system to select candidate codecs based on some set of characteristics; it then sends the selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
4) Web app requests browser/host system to select candidate codecs based on some set of characteristics; it sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there. Don't forget the option of "Web app tells the browser/host system to negotiate with another system about codecs, never passing the information about codecs to where the Web app sees it". This of course requires the negotiation protocol to be pretty browser-embedded.
The difference between 1+2) and 3+4) is that the Web app tells the system something about what it requires; there's some basic part of this going on in all reasonable cases, since the Web app is going to tell the system whether it wants audio or video codecs, but it can be extended with more parameters. For instance, for Opus, it's important to tell the system whether it's going to be used for music or for voice (in this case, the codec is the same, but when media starts flowing, the parameters are different). With parameter-rich codecs such as H.264, the enumeration of all the possible parameter combinations that the hardware/OS/browser might support might be an unreasonable task, and it's not certain the necessary interfaces are even available.
The two axes which vary in that set of choices are: whether the web app makes the selection from among candidate codecs or the browser/host system makes the selection based on info provided; whether the negotiation takes place in the rendezvous server or in a peer-based offer/answer/acknowledgement set. An obvious consequence of these choices is that the logic for codec selection moves around. An additional consequence of these choices will be what element in the system needs to know about the possibility of network-provided transcoding.
I think some discussion of which negotiation method is expected would be useful. If, for example, we rule out the negotiation server acting as the agent for negotiation, we can re-use the same protocol mechanics for offer-answer-acknowledgement, no matter whether the web app or browser/host system provides the codec selections. I'm not sure the Web app can actually tell the difference between the rendezvous server acting as the agent and the negotiation being forwarded by the rendezvous server to a third party (the destination); in both cases, the Web app sends an offer and gets an answer back (in the O/A model).
If we can't detect it, it's hard to rule it out.

Hi Harald,
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one. First, the sender and recipient need to be able to perform codec negotiation, in order to find the common codecs.
If the codec negotiation is handled by the web application (i.e. JavaScript based) the API must support this.
If the codec negotiation is handled by the browser, then the app might not need to have as much control.
We try to cover that in the note associated with A5.
So, I think we're all in agreement that rtc-web must specify a mechanism that allows for codec negotiation. But I think we may need some more discussion on the expected mechanics. The options include: I numbered them.... 1) Web app queries browser/host system via API for available codecs and sends selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
2) Web app queries browser/host system via API for available codecs and sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
3) Web app requests browser/host system to select candidate codecs based on some set of characteristics; it then sends the selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
4) Web app requests browser/host system to select candidate codecs based on some set of characteristics; it sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
Don't forget the option of "Web app tells the browser/host system to negotiate with another system about codecs, never passing the information about codecs to where the Web app sees it". This of course requires the negotiation protocol to be pretty browser-embedded.
Yes.
The difference between 1+2) and 3+4) is that the Web app tells the system something about what it requires; there's some basic part of this going on in all reasonable cases, since the Web app is going to tell the system whether it wants audio or video codecs, but it can be extended with more parameters. For instance, for Opus, it's important to tell the system whether it's going to be used for music or for voice (in this case, the codec is the same, but when media starts flowing, the parameters are different).
With parameter-rich codecs such as H.264, the enumeration of all the possible parameter combinations that the hardware/OS/browser might support might be an unreasonable task, and it's not certain the necessary interfaces are even available.
The two axes which vary in that set of choices are: whether the web app makes the selection from among candidate codecs or the browser/host system makes the selection based on info provided; whether the negotiation takes place in the rendezvous server or in a peer-based offer/answer/acknowledgement set. An obvious consequence of these choices is that the logic for codec selection moves around. An additional consequence of these choices will be what element in the system needs to know about the possibility of network-provided transcoding.
I think some discussion of which negotiation method is expected would be useful. If, for example, we rule out the negotiation server acting as the agent for negotiation, we can re-use the same protocol mechanics for offer-answer-acknowledgement, no matter whether the web app or browser/host system provides the codec selections. I'm not sure the Web app can actually tell the difference between the rendezvous server acting as the agent and the negotiation being forwarded by the rendezvous server to a third party (the destination); in both cases, the Web app sends an offer and gets an answer back (in the O/A model).
If we can't detect it, it's hard to rule it out.
I think the difference is the following:
- If the r-server does the negotiation, the app will send a list of its codecs to the r-server, and then be told "this is the codec(s) you are going to use" by the server.
- If the app itself does the negotiation, the app will send a list of its codecs to the r-server (which may forward them to the remote peer), then be told "this is the codec(s) supported by the remote peer", and then the app makes the decision on what codec(s) to use.
Of course, what exactly happens in the network might be impossible to say, and probably depends on what session control/codec negotiation protocol is used in the first place. But, the main difference is whether it's the app or the r-server that makes the decision on what codec(s) to use. Regards, Christer
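To make the two flows concrete, here is a rough sketch of the difference described above. The signaling helper and message shapes are assumptions made for illustration only, not anything defined by the draft or by any browser.

```javascript
// Illustrative only: "signaling" is an assumed helper that sends a message to
// the rendezvous server and resolves with whatever comes back.

// Flow A: the rendezvous server makes the codec decision.
async function negotiateViaServer(localCodecs, signaling) {
  var reply = await signaling.send({ type: "codec-list", codecs: localCodecs });
  return reply.selectedCodecs; // the app simply uses what it is told
}

// Flow B: the app makes the decision after learning the peer's codecs.
async function negotiateAtApp(localCodecs, signaling) {
  var reply = await signaling.send({ type: "codec-list", codecs: localCodecs });
  // Plain set intersection, keeping the local preference order;
  // the app then signals its choice back to the peer.
  return localCodecs.filter(function (c) {
    return reply.codecs.indexOf(c) !== -1;
  });
}
```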

I believe that our most important goal here is to keep things flexible and not bake too much into the solution.
We want to enable models where codec selection is supported by a server, and models where it's in the Javascript app.
We want to enable models where the selection is based strictly on capability intersection, and we want to enable models where it's some super complex selection algorithm.
We want to enable models where the protocol machinery follows our well-understood offer model, but we also want to enable more complex scenarios which might work better in different application scenarios (e.g., in a central mixing approach, one can argue that a command and control protocol model is better).
We want to enable models where multiparty calls are handled with distributed mixing, and where they are handled with central mixing, and hybrids in between.
The way we most easily enable all of these variations is to bake the absolute minimum functionality into the browser itself, and then enable all of this variation through whatever combination of local Javascript and server processing is desired by the application provider.
As such, I favor a model where the browser supports an API which allows a Javascript application to interrogate the browser for its supported codecs. It also has an API which allows a Javascript application to tell the browser which codec to send and/or receive with for each particular session. That's it. With that basic toolset, you can build all of the variations I suggest above.
Thanks, Jonathan R. -- Jonathan D. Rosenberg, Ph.D. SkypeID: jdrosen Skype Chief Technology Strategist jdrosen@skype.net http://www.skype.com jdrosen@jdrosen.net http://www.jdrosen.net
On 3/10/11 8:05 PM, "Ted Hardie" <ted.ietf@gmail.com> wrote:
On Thu, Mar 10, 2011 at 4:57 AM, Christer Holmberg
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one.
First, the sender and recipient need to be able to perform codec negotiation, in order to find the common codecs.
If the codec negotiation is handled by the web application (i.e. JavaScript based) the API must support this.
If the codec negotiation is handled by the browser, then the app might not need to have as much control.
We try to cover that in the note associated with A5.
So, I think we're all in agreement that rtc-web must specify a mechanism that allows for codec negotiation. But I think we may need some more discussion on the expected mechanics. The options include:
Web app queries browser/host system via API for available codecs and sends selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
Web app queries browser/host system via API for available codecs and sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
Web app requests browser/host system to select candidate codecs based on some set of characteristics; it then sends the selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
Web app requests browser/host system to select candidate codecs based on some set of characteristics; it sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
The two axes which vary in that set of choices are: whether the web app makes the selection from among candidate codecs or the browser/host system makes the selection based on info provided; whether the negotiation takes place in the rendezvous server or in a peer-based offer/answer/acknowledgement set. An obvious consequence of these choices is that the logic for codec selection moves around. An additional consequence of these choices will be what element in the system needs to know about the possibility of network-provided transcoding.
I think some discussion of which negotiation method is expected would be useful. If, for example, we rule out the negotiation server acting as the agent for negotiation, we can re-use the same protocol mechanics for offer-answer-acknowledgement, no matter whether the web app or browser/host system provides the codec selections.
regards,
Ted Hardie
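A possible shape for the minimal "interrogate + instruct" pair Jonathan describes: one call to ask the browser what it supports, one call to tell it what to use. The names getSupportedCodecs() and configureSession() are invented for illustration; this is only a sketch of how little the browser would need to expose under that model.

```javascript
// Sketch, not a real API: the browser exposes exactly two things,
// a capability query and a per-session codec instruction.
function setUpCodecs(browserMedia, sessionId, selectionLogic) {
  var supported = browserMedia.getSupportedCodecs(); // e.g. ["opus", "g711", "vp8"]
  // selectionLogic can live entirely in Javascript, on a server, or both.
  var choice = selectionLogic(supported);
  browserMedia.configureSession(sessionId, {
    sendCodec: choice.send,
    receiveCodecs: choice.receive
  });
}
```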

I agree with this in general. However, I'd suggest that the API needs to provide more information on browser capabilities than just the supported codecs, if the goal is to permit detailed negotiations such as Jingle or SDP offer/answer.
Date: Sat, 12 Mar 2011 18:36:55 +0100 From: jonathan.rosenberg@skype.net To: ted.ietf@gmail.com; christer.holmberg@ericsson.com CC: harald@alvestrand.no; rtc-web@alvestrand.no Subject: Re: [RTW] Review of draft-holmberg-rtcweb-ucreqs-00 (Web Real-Time Communication Use-cases and Requirements)
I believe that our most important goal here is to keep things flexible and not bake too much into the solution.
We want to enable models where codec selection is supported by a server, and models where it's in the Javascript app.
We want to enable models where the selection is based strictly on capability intersection, and we want to enable models where it's some super complex selection algorithm.
We want to enable models where the protocol machinery follows our well-understood offer-model, but we also want to enable more complex scenarios which might work better in different application scenarios (e.g., in a central mixing approach, one can argue that a command and control protocol model is better).
We want to enable models where multiparty calls are handled with distributed mixing, and where they are handled with central mixing, and hybrids in between.
The way we most easily enable all of these variations is to bake the absolute minimum functionality into the browser itself, and then enable all of this variation through whatever combination of local Javascript and server processing is desired by the application provider.
As such, I favor a model where the browser supports an API which allows a Javascript application to interrogate the browser for its supported codecs. It also has an API which allows a Javascript application to tell the browser which codec to send and/or receive with for each particular session. That's it. With that basic toolset, you can build all of the variations I suggest above.
Thanks, Jonathan R. -- Jonathan D. Rosenberg, Ph.D. SkypeID: jdrosen Skype Chief Technology Strategist jdrosen@skype.net http://www.skype.com jdrosen@jdrosen.net http://www.jdrosen.net
On 3/10/11 8:05 PM, "Ted Hardie" <ted.ietf@gmail.com> wrote:
On Thu, Mar 10, 2011 at 4:57 AM, Christer Holmberg
A5: The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer. I think the MUST is that the sender and recipient need to be able to find a common codec, if one exists; I'm not sure I see a MUST for the webapp actually picking one.
First, the sender and recipient need to be able to perform codec negotiation, in order to find the common codecs.
If the codec negotiation is handled by the web application (i.e. JavaScript based) the API must support this.
If the codec negotiation is handled by the browser, then the app might not need to have as much control.
We try to cover that in the note associated with A5.
So, I think we're all in agreement that rtc-web must specify a mechanism that allows for codec negotiation. But I think we may need some more discussion on the expected mechanics. The options include:
Web app queries browser/host system via API for available codecs and sends selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
Web app queries browser/host system via API for available codecs and sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
Web app requests browser/host system to select candidate codecs based on some set of characteristics; it then sends the selected codecs to rendezvous server, which runs the selection algorithm. All peers acknowledge the selection.
Web app requests browser/host system to select candidate codecs based on some set of characteristics; it sends selected codecs to peers, which answer the offer. The original app acknowledges the answer, and things move on from there.
The two axes which vary in that set of choices are: whether the web app makes the selection from among candidate codecs or the browser/host system makes the selection based on info provided; whether the negotiation takes place in the rendezvous server or in a peer-based offer/answer/acknowledgement set. An obvious consequence of these choices is that the logic for codec selection moves around. An additional consequence of these choices will be what element in the system needs to know about the possibility of network-provided transcoding.
I think some discussion of which negotiation method is expected would be useful. If, for example, we rule out the negotiation server acting as the agent for negotiation, we can re-use the same protocol mechanics for offer-answer-acknowledgement, no matter whether the web app or browser/host system provides the codec selections.
regards,
Ted Hardie

On 03/13/11 19:44, Bernard Aboba wrote:
I agree with this in general.
However, I'd suggest that the API needs to provide more information on browser capabilities than just the supported codecs, if the goal is to permit detailed negotiations such as Jingle or SDP offer/answer. What capabilities are you thinking of?
The obvious one I think about is that the negotiation/setup needs to contain info about supportable resolutions - on a 240x400 phone screen, I have no reason to ask for 1024x768.
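For instance, a capability answer that carries resolution limits lets the app avoid offering more than the device can render. The data structure below is purely illustrative (no such format was defined); only the idea of exposing limits beyond the codec name matters.

```javascript
// Hypothetical capability answer for a 240x400 phone; the field names are
// invented, only the idea matters.
var capabilities = {
  video: [
    { codec: "h264", maxWidth: 240, maxHeight: 400, maxFramerate: 15 }
  ],
  audio: [{ codec: "opus" }, { codec: "g711" }]
};

// The app clamps what it asks for to what the device says it can handle.
function clampRequest(requested, cap) {
  return {
    width: Math.min(requested.width, cap.maxWidth),
    height: Math.min(requested.height, cap.maxHeight)
  };
}
```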

Hi, We've submitted a new version of the ucreqs draft. The new version contains fixes and additions based on comments given on the previous version on the list. The draft can be found at: http://www.ietf.org/id/draft-holmberg-rtcweb-ucreqs-01.txt Regards, Christer

Hi, I think this is useful input. A few comments.
1. A5 "The web application MUST be able to control the media format (codec) to be used for the streams sent to a peer". It should also be able to control the codec used for the received stream (i.e. control what is offered to the peer).
2. A8 "The web application MUST be able to pause/unpause the sending of a stream to a peer". I think it should be able to set/change the direction of the stream(s), which in SDP terms means sendonly/recvonly/sendrecv/inactive.
3. A12 "The web application MUST be informed when a stream from a peer is no longer received". The web application MUST also be informed when a stream from a peer starts to be received. This probably also needs to take account of some early media scenarios when multiple streams could potentially be received from multiple peers. The application needs to be able to control which streams are rendered. This also impacts the browser requirements, as obviously the browser needs to be able to detect when the stream starts etc.
Regards Andy
-----Original Message----- From: rtc-web-bounces@alvestrand.no [mailto:rtc-web-bounces@alvestrand.no] On Behalf Of Christer Holmberg Sent: 14 March 2011 12:03 To: rtc-web@alvestrand.no; dispatch@ietf.org Subject: [RTW] Draft new version: draft-holmberg-rtcweb-ucreqs-01
Hi,
We've submitted a new version of the ucreqs draft.
The new version contains fixes and additions based on comments given on the previous version on the list.
The draft can be found at: http://www.ietf.org/id/draft-holmberg-rtcweb-ucreqs-01.txt
Regards,
Christer

Hi, Even if only talking about codecs, we also need to talk about how deep into detail we want to go. For example, things like: "codec X can only be used when codec Y is not used" etc. We might not need that detail of information when we only query for capabilities, because later when we actually try to reserve codecs we will find out whether the browser supports a specific combination or not. But, we need to have a clear and common understanding about what we want when we talk about "query supported codecs" - and other capabilities. To answer Harald's "What capabilities" question: just take a look at a typical SDP message (or the equivalent in Jingle) :) Regards, Christer ________________________________ From: rtc-web-bounces@alvestrand.no [mailto:rtc-web-bounces@alvestrand.no] On Behalf Of Harald Alvestrand Sent: 13. maaliskuuta 2011 21:23 To: Bernard Aboba Cc: rtc-web@alvestrand.no; jonathan.rosenberg@skype.net Subject: Re: [RTW] Review of draft-holmberg-rtcweb-ucreqs-00 (Web Real-Time Communication Use-cases and Requirements) On 03/13/11 19:44, Bernard Aboba wrote: I agree with this in general. However, I'd suggest that the API needs to provide more information on browser capabilities than just the supported codecs, if the goal is to permit detailed negotiations such as Jingle or SDP offer/answer. What capabilities are you thinking of? The obvious one I think about is that the negotiation/setup needs to contain info about supportable resolutions - on a 240x400 phone screen, I have no reason to ask for 1024x768.

On 03/14/11 13:10, Christer Holmberg wrote:
Hi,
Even if only talking about codecs, we also need to talk about how deep into detail we want to go.
For example, things like: "codec X can only be used when codec Y is not used" etc.
Is there a real life example of this situation, or are you just imagining the possibility?

Hi,
Hi,
Even if only talking about codecs, we also need to talk about how deep into detail we want to go.
For example, things like: "codec X can only be used when codec Y is not used" etc. Is there a real life example of this situation, or are you just imagining the possibility?
H.245. Probably you can do it with SDP CapNeg also... Anyway, my point was that even if the browser implements codec X, it doesn't mean it is able to use it (or, use all resolution variants etc.) in all situations. Also, related to that, when a browser resource reservation (e.g. for a codec) fails, we need to consider how detailed the information about the actual error(s) the browser needs to provide to the application should be. Regards, Christer

On 03/14/11 14:40, Christer Holmberg wrote:
Hi,
Hi,
Even if only talking about codecs, we also need to talk about how deep into detail we want to go.
For example, things like: "codec X can only be used when codec Y is not used" etc. Is there a real life example of this situation, or are you just imagining the possibility? H.245. Probably you can do it with SDP CapNeg also... Not whether you can express it - is there a real life situation on a real life device where you are capable of using two codecs, but not at the same time? Anyway, my point was that even if the browser implements codec X, it doesn't mean it able to use it (or, use all resolution variants etc) in all situations.
Also, related to that, when a browser resource reservation (e.g. for a codec) fails, we need to consider how detailed the information about the actual error(s) the browser needs to provide to the application should be.
Regards,
Christer

Hi Harald,
Even if only talking about codecs, we also need to talk about how deep into detail we want to go.
For example, things like: "codec X can only be used when codec Y is not used" etc. Is there a real life example of this situation, or are you just imagining the possibility? H.245. Probably you can do it with SDP CapNeg also... Not whether you can express it - is there a real life situation on a real life device where you are capable of using two codecs, but not at the same time?
I think the following are valid real life situations: - Device restrictions (CPU, DSP etc) - Network restrictions (bandwidth etc) - Number of streams Regards, Christer

On 03/15/11 07:27, Christer Holmberg wrote:
Hi Harald,
Even if only talking about codecs, we also need to talk about how deep into detail we want to go.
For example, things like: "codec X can only be used when codec Y is not used" etc. Is there a real life example of this situation, or are you just imagining the possibility? H.245. Probably you can do it with SDP CapNeg also... Not whether you can express it - is there a real life situation on a real life device where you are capable of using two codecs, but not at the same time? I think the following are valid real life situations:
- Device restrictions (CPU, DSP etc) - Network restrictions (bandwidth etc) - Number of streams I think it's possible to construct devices for which this can be a problem, yes. That's not what I was asking. To be even more specific:
Do you know of ONE device, presently existing in the Real World (that is, outside of labs) that supports TWO specific, different codecs, and is able to use either of them, but is not able to use them at the same time? If the name needs to be withheld, that's fine, but I would very much like to see if this is a requirement that comes out of experience, or if it is a problem we just imagine could happen. Network, bandwidth, processing power and so on restrictions are usually a problem even if the same codec is used for all streams, so that is something we have to deal with. I'm specifically trying to figure out if we have a requirement driven by a real life scenario for "you can choose this codec, or you can choose that codec, but you can't choose both". My reason for drilling down so hard on this is that a requirement to express set difference complicates a negotiation language by a rather large amount compared to just doing set intersection. (The problem is most acute when dealing with ACLs, where adding set difference usually makes it impossible for an administrator to figure out what exactly he's specified for any rule set of some complexity, but it's a problem for any such language.) Harald

Hi,
Even if only talking about codecs, we also need to talk about how deep into detail we want to go.
For example, things like: "codec X can only be used when codec Y is not used" etc. Is there a real life example of this situation, or are you just imagining the possibility? H.245. Probably you can do it with SDP CapNeg also... Not whether you can express it - is there a real life situation on a real life device where you are capable of using two codecs, but not at the same time? I think the following are valid real life situations:
- Device restrictions (CPU, DSP etc) - Network restrictions (bandwidth etc) - Number of streams I think it's possible to construct devices for which this can be a problem, yes. That's not what I was asking. To be even more specific:
Do you know of ONE device, presently existing in the Real World (that is, outside of labs) that supports TWO specific, different codecs, and is able to use either of them, but is not able to use them at the same time?
If the name needs to be withheld, that's fine, but I would very much like to see if this is a requirement that comes out of experience, or if it is a problem we just imagine could happen.
Network, bandwidth, processing power and so on restrictions are usually a problem even if the same codec is used for all streams, so that is something we have to deal with.
Yes. It's not only about codecs, but how many streams etc the browser is able to handle.
I'm specifically trying to figure out if we have a requirement driven by a real life scenario for "you can choose this codec, or you can choose that codec, but you can't choose both".
Unfortunately I don't have such information. Maybe the people that have been working on H.245, and/or the different cap neg extensions, know more? But, note that the codecs don't need to be for the same media type (audio, video). For example, a device might be able to use audio codec X when video is not used, but if video is also used the device is only able to use audio codec Y. Also, when we talk about video, it's not only about codecs as such, but different modes, different resolutions etc. for specific codecs. Of course, many cases can probably be solved by offering a list of codecs, and then offering a reduced list when it's known which codecs the peers support. I would assume that these issues will also be discussed in CLUE.
My reason for drilling down so hard on this is that a requirement to express set difference complicates a negotiation language by a rather large amount compared to just doing set intersection. (The problem is most acute when dealing with ACLs, where adding set difference usually makes it impossible for an administrator to figure out what exactly he's specified for any rule set of some complexity, but it's a problem for any such language.)
Again, the point is that we need to have a clear understanding and agreement on what exactly we want to do, in order to try to avoid "what if" situations later on. Regards, Christer

Harald Alvestrand wrote:
Do you know of ONE device, presently existing in the Real World (that is, outside of labs) that supports TWO specific, different codecs, and is able to use either of them, but is not able to use them at the same time?
I can confirm that there are currently mobile phones on the market that do have this type of restriction, i.e. they in general support certain audio and certain video codecs with certain modes, but not all combinations of them, due to limited HW. And these are phones that do have SIP VoIP/video clients, so in that sense they are relevant. So if we want to take RTC-Web to as low-end as SIP has already gone, the issue is probably real. At the moment these devices don't typically have a browser with all bells and whistles, though. Markus

I can recall desk phones from the past where hardware support was provided for things like encoding/decoding and encryption/decryption, and the number of parallel streams or stream combinations was limited. Whether this is still true for today's/tomorrow's hardware I am not so sure. But even if there is a restriction on the number of streams or stream combinations, we also have to consider that different applications might be using the browser (or different instances of the browser, or even different browsers), and these different applications could each place their own demands on hardware/software resources. For example, one application could be doing video streaming, and another application could be doing real-time voice or voice/video. The different applications will certainly be unaware of each other, so any assumption by an application that a given combination of streams is possible, based on its browser's static capabilities, might not be true if that combination is actually requested. In this circumstance the browser would either have to deny the request, or accept it and grant a "best effort" use of shared resources. John
-----Original Message----- From: rtc-web-bounces@alvestrand.no [mailto:rtc-web-bounces@alvestrand.no] On Behalf Of Markus.Isomaki@nokia.com Sent: 15 March 2011 08:45 To: harald@alvestrand.no; christer.holmberg@ericsson.com Cc: bernard_aboba@hotmail.com; rtc-web@alvestrand.no; jonathan.rosenberg@skype.net Subject: Re: [RTW] Review of draft-holmberg-rtcweb-ucreqs-00 (Web Real-Time Communication Use-cases and Requirements)
Harald Alvestrand wrote:
Do you know of ONE device, presently existing in the Real World (that is, outside of labs) that supports TWO specific, different
codecs, and
is able to use either of them, but is not able to use them at the same time?
I can confirm that there are currently mobile phones on the market that do have this type of restriction, i.e. they in general support certain audio and certain video codecs with certain modes, but not all combinations of them, due to limited HW. And these are phones that do have SIP VoIP/video clients, so in that sense they are relevant. So if we want to take RTC-Web to as low-end as SIP has already gone, the issue is probably real. At the moment these devices don't typically have a browser with all bells and whistles, though.
Markus

On Mon, Mar 14, 2011 at 11:42 PM, Harald Alvestrand <harald@alvestrand.no> wrote:
My reason for drilling down so hard on this is that a requirement to express set difference complicates a negotiation language by a rather large amount compared to just doing set intersection. (The problem is most acute when dealing with ACLs, where adding set difference usually makes it impossible for an administrator to figure out what exactly he's specified for any rule set of some complexity, but it's a problem for any such language.)
So, I've been assuming that this group would end up with something similar to CONNEG-style set intersection, in part because that was the choice we saw in SIP. In RFC 2533 semantics, this sort of grouping of permitted features into feature sets is certainly possible. I believe it would handle the issue of related features reasonably well (e.g. Video codec FOO can be used with audio codec BAR or BAZ, but not GOO; video codec HOO can be used with audio codec BAR or GOO, but not BAZ).
But it won't be simple if the issue is actually one of overall system constraints. If the first stream consuming X amount of resources by negotiating video codec FOO with audio codec BAZ means that the second one can only have FOO if it will take BAR, the situation will get ugly fast. In general, CONNEG style negotiation relies on the ability of an end system to express its constraints well, and shifting constraints are hard to deal with. Functionally, you'd have to restart the negotiation completely after each stream started to consume resources; that's going to make server-based negotiation pretty hard.
It seems to me it would be far better to add a "number of streams" parameter to the API requesting the supported codecs, so that the host system/browser can determine what it can support for that number of streams. It may still express feature sets, but it should eliminate those which it could not support over the full set of streams on resource grounds. This has some suboptimal cases, but the overall approach seems to me more likely to complete the negotiation in a reasonable time with acceptable results.
regards, Ted Hardie
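A sketch of what the "number of streams" parameter might look like; queryCodecs() is, again, a hypothetical call invented for illustration. The idea is that the capability answer already accounts for what the browser can sustain across all of the requested streams, so feature sets it could not support N times over are pruned before the app ever sees them.

```javascript
// Illustrative only: the capability query carries the intended stream count,
// so the answer already reflects overall resource limits.
function queryForConference(browserMedia, participantCount) {
  return browserMedia.queryCodecs({
    kind: "video",
    numberOfStreams: participantCount
  });
}
```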

Hi,
My reason for drilling down so hard on this is that a requirement to express set difference complicates a negotiation language by a rather large amount compared to just doing set intersection. (The problem is most acute when dealing with ACLs, where adding set difference usually makes it impossible for an administrator to figure out what exactly he's specified for any rule set of some complexity, but it's a problem for any such language.)
So, I've been assuming that this group would end up with something similar to CONNEG-style set intersection, in part because that was the choice we saw in SIP. In RFC 2533 semantics, this sort of grouping of permitted features into feature sets is certainly possible. I believe it would handle the issue of related features reasonably well (e.g. Video codec FOO can be used with audio codec BAR or BAZ, but not GOO; video codec HOO can be used with audio codec BAR or GOO, but not BAZ).
But it won't be simple if the issue is actually one of overall system constraints. If the first stream consuming X amount of resources by negotiating video codec FOO with audio codec BAZ means that the second one can only have FOO if it will take BAR, the situation will get ugly fast. In general, CONNEG style negotiation relies on the ability of an end system to express its constraints well, and shifting constraints are hard to deal with. Functionally, you'd have to restart the negotiation completely after each stream started to consume resources; that's going to make server-based negotiation pretty hard.
It seems to me it would be far better to add a "number of streams" parameter to the API requesting the supported codecs, so that the host system/browser can determine what it can support for that number of streams. It may still express feature sets, but it should eliminate those which it could not support over the full set of streams on resource grounds. This has some suboptimal cases, but the overall approach seems to me more likely to complete the negotiation in a reasonable time with acceptable results.
That can be done, but we need to remember that different types of streams can consume different amounts of resources. Take a video conference as an example: the stream for the main display (e.g. representing the active speaker) might use a very resource consuming codec/format/resolution, while there can be a number of streams for small "thumbnail" displays that require much less resources. We also need to ensure that we don't make life too difficult for the web app developer, by mandating him/her to browse through a potentially large list of alternatives before requesting streams. So, I agree that being able to query for different alternatives is a good thing, but it should also be possible for the web app to simply request an explicit stream set (based on web app logic and/or an offer received from the remote peer), which is then either accepted or rejected by the browser. And, if rejected, it would then be good if the browser provided some information on why it was rejected. The web app could then either query for alternatives, OR simply try to request a new stream set (based on the error information provided by the browser for the previous request). Regards, Christer
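A sketch of the request/accept-or-reject model described above. The reserveStreams() call, its callback signature, and the error fields are invented for illustration; the point is only that the rejection carries enough information for the app to scale the request down and try again.

```javascript
// Hypothetical API: the app asks for an explicit stream set and, on rejection,
// uses the error information to adjust the request and retry.
function requestStreamSet(browserMedia, streamSet, onReady, onFail) {
  browserMedia.reserveStreams(streamSet, function (err, reservation) {
    if (!err) {
      return onReady(reservation);
    }
    if (err.reason === "resolution-too-high" && err.supportedHeight) {
      // Scale the main stream down to what the browser says it can handle.
      streamSet.main.maxHeight = err.supportedHeight;
      return requestStreamSet(browserMedia, streamSet, onReady, onFail);
    }
    onFail(err); // nothing sensible to fall back to
  });
}
```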

On Wed, Mar 16, 2011 at 12:11 AM, Christer Holmberg <christer.holmberg@ericsson.com> wrote:
Take video conference as an example: the stream for the main display (e.g. representing the active speaker) might use a very resource consuming codec/format/resolution, while there can be a number of streams for small "thumbnail" displays that require much less resources.
Just to be sure I understand the example, the assumption here is that the "Active speaker" stream can switch to a different speaker, so that there would be a renegotiation for both the previously "Active speaker" and the newly selected one, right? I think this is pretty workable, provided we don't have to restart the negotiation from the beginning. If there are multiple known good feature sets for each speaker, switching among them isn't such a problem. It's easier, obviously, if it is a straight swap (high-end codec moves from speaker A to speaker B). It's a little harder if the codecs available for B are different, but it still works reasonably well if the resource consumption is similar. But if switching from speaker A to speaker B forces a complete renegotiation of all streams because speaker B's requirements are very onerous, we can expect some impact on app performance.
We also need to ensure that we don't make life too difficult for the web app developer, by mandating him/her to browse through a potentially large list of alternatives before requesting streams.
My personal concern is more with complexity than size. If there is a large list of capabilities, all to the good. But if the complexity of the feature sets is high or the interaction among them high, you can end up with least-common-denominator behavior pretty easily. There are some ways to improve that--CONNEG allows for named feature sets, for example, and well-known names for common clusters could significantly speed processing--but we don't want to make computing the optimal set intersection a major drain on time or resources.
So, I agree that being able to query for different alternatives is a good thing, but it should also be possible for the web app to simply request an explicit stream set (based on web app logic and/or an offer received from the remote peer), which is then either accepted or rejected by the browser. And, if rejected, it would then be good if the browser provided some information on why it was rejected. The web app could then either query for alternatives, OR simply try to request a new stream set (based on the error information provided by the browser for the previous request).
I really don't think the browser providing information on why it was rejected is all that useful; we don't want these to be consumed by humans. I personally think it is better if the web app passes down a list of requested feature sets and lets the browser compute the best intersection. You could manipulate that by having the web app decompose the offers itself and pass them down one by one in the order it felt was appropriate, taking the first match. There's no API difference there, it's just a bunch of serial one-possibility offers rather than a full set. But the typical APP is going to want "give me the best I can get", at least that's my guess, rather than wanting to keep on top of what the preferred set intersections are itself. regards, Ted Hardie
Regards,
Christer
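The two styles Ted contrasts could look roughly like the sketch below. selectBest() is a hypothetical browser call, assumed here to return the best match it can support from the given feature sets, or null; nothing like it was defined at this point.

```javascript
// Style 1: hand the browser the whole list and let it compute the intersection.
function pickBest(browserMedia, featureSets) {
  return browserMedia.selectBest(featureSets);
}

// Style 2: the app decomposes the offer and tries one possibility at a time,
// in its own preference order, taking the first one the browser accepts.
// Same API, just a series of one-possibility offers.
function pickFirstAcceptable(browserMedia, featureSets) {
  for (var i = 0; i < featureSets.length; i++) {
    var match = browserMedia.selectBest([featureSets[i]]);
    if (match) {
      return match;
    }
  }
  return null;
}
```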

Hi Ted,
Take video conference as an example: the stream for the main display (e.g. representing the active speaker) might use a very resource consuming codec/format/resolution, while there can be a number of streams for small "thumbnail" displays that require much less resources.
Just to be sure I understand the example, the assumption here is that the "Active speaker" stream can switch to a different speaker, so that there would be a renegotiation for both the previously "Active speaker" and the newly selected one, right?
The stream can switch to a different speaker, yes. Whether a renegotiation is needed depends on how the system is built up. If there is a middlebox that mixes the streams etc., there will be no need for renegotiation, because the browser will always receive the streams from that middlebox - no matter who is the active speaker. But, in peer-to-peer scenarios I guess a re-negotiation would be needed.
I think this is pretty workable, provided we don't have to restart the negotiation from the beginning. If there are multiple known good feature sets for each speaker, switching among them isn't such a problem. It's easier, obviously, if it is a straight swap (high-end codec moves from speaker A to speaker B). It's a little harder if the codecs available for B are different, but it still works reasonably well if the resource consumption is similar. But if switching from speaker A to speaker B forces a complete renegotiation of all streams because speaker B's requirements are very onerous, we can expect some impact on app performance.
True. But, renegotiation in general should be supported, in my opinion.
We also need to ensure that we don't make life too difficult for the web app developer, by mandating him/her to browse through a potentially large list of alternatives before requesting streams.
My personal concern is more with complexity than size. If there is a large list of capabilities, all to the good. But if the complexity of the feature sets is high or the interaction among them high, you can end up with least-common-denominator behavior pretty easily. There are some ways to improve that--CONNEG allows for named feature sets, for example, and well-known names for common clusters could significantly speed processing--but we don't want to make computing the optimal set intersection a major drain on time or resources.
So, I agree that being able to query for different alternatives is a good thing, but it should also be possible for the web app to simply request an explicit stream set (based on web app logic and/or an offer received from the remote peer), which is then either accepted or rejected by the browser. And, if rejected, it would then be good if the browser provided some information on why it was rejected. The web app could then either query for alternatives, OR simply try to request a new stream set (based on the error information provided by the browser for the previous request).
I really don't think the browser providing information on why it was rejected is all that useful; we don't want these to be consumed by humans.
Well, I don't consider the web app a "human" :) The web app doesn't need to forward that information to the users (humans), but the web app itself may use the information.
I personally think it is better if the web app passes down a list of requested feature sets and lets the browser compute the best intersection. You could manipulate that by having the web app decompose the offers itself and pass them down one by one in the order it felt was appropriate, taking the first match. There's no API difference there, it's just a bunch of serial one-possibility offers rather than a full set. But the typical APP is going to want "give me the best I can get", at least at my guess, rather than wanting to keep on top of what the preferred set intersections are itself.
We would need to define what those feature sets are. Also, it might work when the web app represents the originating user, i.e. the one who is going to send the initial offer. And, in the case where the web app represents the terminating user and uses SIP, it needs to be able to request the explicit codecs etc. (rather than a feature set) that it receives in the SDP offer. Regards, Christer

Wrapping up (?) on the "expressive power of constraint language" subthread: My (tentative) conclusion is that we have uncovered a requirement that even when a correspondent chooses a codec / # streams combination that seems to be within the specification given by the other end, it should adequately handle the situation where the stream is refused by the other end - and, conversely, that the entity that accepts the stream needs to be able to say "no, I can't handle that at the moment, even though it's listed in my capabilities". The number of (sometimes borderline) situations where it turns out that the system doesn't have the resources needed seems too large to express in a constraint language of finite complexity. Harald On 03/16/11 08:11, Christer Holmberg wrote:
Hi,
My reason for drilling down so hard on this is that a requirement to express set difference complicates a negotiation language by a rather large amount compared to just doing set intersection. (The problem is most acute when dealing with ACLs, where adding set difference usually makes it impossible for an administrator to figure out what exactly he's specified for any rule set of some complexity, but it's a problem for any such language.)
So, I've been assuming that this group would end up with something similar to CONNEG-style set intersection, in part because that was the choice we saw in SIP. In RFC 2533 semantics, this sort of grouping of permitted features into feature sets is certainly possible. I believe it would handle the issue of related features reasonably well (e.g. Video codec FOO can be used with audio codec BAR or BAZ, but not GOO; video codec HOO can be used with audio codec BAR or GOO, but not BAZ).
But it won't be simple if the issue is actually one of overall system constraints. If the first stream consuming X amount of resources by negotiating video codec FOO with audio codec BAZ means that the second one can only have FOO if it will take BAR, the situation will get ugly fast. In general, CONNEG style negotiation relies on the ability of an end system to express its constraints well, and shifting constraints are hard to deal with. Functionally, you'd have to restart the negotiation completely after each stream started to consume resources; that's going to make server-based negotiation pretty hard.
It seems to me it would be far better to add a "number of streams" parameter to the API requesting the supported codecs, so that the host system/browser can determine what it can support for that number of streams. It may still express feature sets, but it should eliminate those which it could not support over the full set of streams on resource grounds. This has some suboptimal cases, but the overall approach seems to me more likely to complete the negotiation in a reasonable time with acceptable results. That can be done, but we need to remember that different types of streams can consume different amounts of resources.
Take video conference as an example: the stream for the main display (e.g. representing the active speaker) might use a very resource consuming codec/format/resolution, while there can be a number of streams for small "thumbnail" displays that require much less resources.
We also need to ensure that we don't make life too difficult for the web app developer, by mandating him/her to browse through a potentially large list of alternatives before requesting streams.
So, I agree that being able to query for different alternatives is a good thing, but it should also be possible for the web app to simply request an explicit stream set (based on web app logic and/or an offer received from the remote peer), which is then either accepted or rejected by the browser. And, if rejected, it would then be good if the browser provided some information on why it was rejected. The web app could then either query for alternatives, OR simply try to request a new stream set (based on the error information provided by the browser for the previous request).
Regards,
Christer
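In code terms, the conclusion above roughly means that a stream request can fail even when it matches the advertised capabilities, so the app needs a fallback path rather than relying on the capability query alone. The addStream() call and its error callback below are hypothetical placeholders, shown only to illustrate that requirement.

```javascript
// Sketch only: "listed in the capabilities" does not guarantee "available right
// now", so the app retries with a cheaper alternative before giving up.
function addStreamWithFallback(session, preferred, fallback, done) {
  session.addStream(preferred, function (err) {
    if (!err) {
      return done(preferred);
    }
    session.addStream(fallback, function (err2) {
      done(err2 ? null : fallback);
    });
  });
}
```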
participants (13)
- Bernard Aboba
- Christer Holmberg
- Elwell, John
- Göran Eriksson AP
- Harald Alvestrand
- Hutton, Andrew
- Jonathan Rosenberg
- Markus.Isomaki@nokia.com
- Matthew Kaufman
- Rosenberg, Jonathan
- Schmidt, Christian 1. (NSN - DE/Munich)
- Silvia Pfeiffer
- Ted Hardie