
* no SIP or Jingle; leave that to proprietary over websockets/HTTP
One of the issues that will have to be solved, even without SIP or Jingle in the browser, is codec negotiation. If the web server is using SIP on the back end, then it needs to know about the clients' codec support in enough detail to carry out the SIP negotiation. Your simple example sends no details at all about the clients' supported codecs, which I guess means the server can only assume support for the MTI codecs.

Maybe this is a question for the W3C, and not the IETF, but I don't think the existing canPlayType API is sufficient. The strongest answer it gives is "probably", which is not a guarantee, and it doesn't allow for negotiating specific codec parameters (e.g., bitrate, sampling rate). For example, even with an MTI codec like Opus, some devices with limited memory may not support stereo. Presumably these would not be devices capable of running a web browser, but they might be the ones talking to the web browser, and the browser needs to know not to send them stereo frames.
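
To make the missing information concrete, here is a rough TypeScript sketch of the kind of per-codec detail a client might report and how a server could intersect two endpoints' capabilities. Every name in it (CodecCapability, NegotiatedCodec, negotiate) is hypothetical and not part of any existing browser API or of the proposal under discussion, and the numeric values are illustrative only.

```typescript
// Hypothetical capability record; not an existing browser API.
interface CodecCapability {
  name: string;            // codec name, e.g. "opus"
  maxChannels: number;     // 1 = mono only, 2 = stereo
  sampleRatesHz: number[]; // sampling rates the endpoint supports
  maxBitrateBps: number;   // highest bitrate the endpoint can handle
}

// Hypothetical result of intersecting two capability lists.
interface NegotiatedCodec {
  name: string;
  channels: number;
  sampleRateHz: number;
  maxBitrateBps: number;
}

// Walk the local list in preference order and return the first codec the
// remote side also supports, limited to parameters both sides can handle.
// Returns null when there is no common codec or no common sampling rate.
function negotiate(
  local: CodecCapability[],
  remote: CodecCapability[]
): NegotiatedCodec | null {
  for (const mine of local) {
    for (const theirs of remote) {
      if (mine.name.toLowerCase() !== theirs.name.toLowerCase()) continue;

      // Highest sampling rate supported by both endpoints (0 = none).
      let bestRate = 0;
      for (const rate of mine.sampleRatesHz) {
        if (theirs.sampleRatesHz.indexOf(rate) !== -1 && rate > bestRate) {
          bestRate = rate;
        }
      }
      if (bestRate === 0) continue;

      return {
        name: mine.name.toLowerCase(),
        channels: Math.min(mine.maxChannels, theirs.maxChannels),
        sampleRateHz: bestRate,
        maxBitrateBps: Math.min(mine.maxBitrateBps, theirs.maxBitrateBps),
      };
    }
  }
  return null;
}

// Illustrative values: a browser that can do stereo Opus talking to a
// memory-limited device that can only do mono.
const browserCaps: CodecCapability[] = [
  { name: "opus", maxChannels: 2, sampleRatesHz: [16000, 48000], maxBitrateBps: 510000 },
];
const deviceCaps: CodecCapability[] = [
  { name: "opus", maxChannels: 1, sampleRatesHz: [8000, 16000], maxBitrateBps: 64000 },
];
const agreed = negotiate(browserCaps, deviceCaps);
```

With these inputs the agreed parameters are mono at 16000 Hz with a 64000 bps cap, which is exactly what the browser would need to learn before sending. A plain yes/no/"probably" answer per codec cannot express any of this.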