
Guys, the draft IETF charter below might be of interest to potential participants of the workshop. FYI...

* * * * * * * * * * * * * * * * * * * *

Proposed names for the working group:

MALT - Multi-stream Attributes for Lifelike Telepresence
COCKTAIL - Communication and Correlation of Key Telepresence Attributes for Interoperable Links
MAITAI - Multi-stream Attributes for Improving Telepresence Application Interoperability
TEQUILA - Telepresence Encoding of QUalifiers for Interoperable Lifelike Applications
MOJITO - Multi-stream Orientation for Joining of Interoperable Telepresence Operations

In the context of this WG, the term telepresence is used in a general manner to describe systems that provide high-definition, high-quality audio/video enabling a "being-there" experience. One example is an immersive telepresence system using specially designed, special-purpose rooms with multiple displays permitting life-size image reproduction, using multiple cameras, encoders, decoders, microphones, and loudspeakers.

Current telepresence systems are based on open standards such as RTP, SIP, H.264, and the H.323 suite; however, they cannot easily interoperate with each other without operator assistance and expensive additional equipment that translates from one vendor's system to another's.

A major factor in the inability of telepresence systems to interwork is that there is no standardized way to describe and negotiate the use of the multiple streams of audio and video that comprise the media flows. In addition, there is no standardized way to exchange semantic information about what each media stream represents.

The WG will create specifications for SIP-based conferencing systems to enable communication of enough information about each media stream that each receiving system or bridge can make reasonable decisions about selecting and rendering media streams. This enables systems to make display choices that optimize the "just like being there" experience.

This working group is chartered to specify the following information about media streams, communicated from one entity to another:

* Spatial relationships of cameras, displays, microphones, and speakers - in relation to each other and to likely positions of participants

* Specific characteristics, such as viewpoint and field of view/capture, for each camera, microphone, display, and speaker - so that senders and middleboxes can understand how best to compose streams for receivers, and each receiver knows the characteristics of its received streams

* Usage of the stream, for example whether the stream is a presentation or document camera output

* Aspect ratio of cameras and displays

* Which sources a receiver wants to receive. For example, it might want the source from the left camera, or the source chosen by VAD (Voice Activity Detection).

Sources and sinks will exchange information about media stream capabilities. The working group will define the semantics, syntax, and transport mechanism necessary for communicating this information. It will consider whether existing signaling mechanisms (e.g., SDP) can be extended or whether another messaging method should be used.

The scope of the work includes describing relatively static relationships between entities (participants and devices). It also includes handling more dynamic relationships, such as identifying the audio and video streams for the current speaker.
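The charter deliberately leaves the concrete syntax open, so the following is a minimal, speculative Python sketch of the kind of per-stream metadata and receiver-side selection it describes. Everything in it - the StreamDescription fields, the usage labels, the select_streams policy, and the "vad-selected" sentinel - is an assumption made for illustration, not anything the WG has defined:

# Hypothetical illustration only: none of these names or fields are
# defined by the draft charter; they merely model the attributes it
# enumerates (spatial placement, field of view, usage, aspect ratio,
# and receiver selection of sources).
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class StreamDescription:
    stream_id: str                         # identifier for the media stream
    media_type: str                        # "audio" or "video"
    usage: str                             # e.g. "main", "presentation", "document-camera"
    position: Tuple[float, float, float]   # assumed (x, y, z) placement of the capture device
    field_of_view: Optional[float] = None  # horizontal FoV in degrees (video only)
    aspect_ratio: Optional[str] = None     # e.g. "16:9" (video only)

def select_streams(offered: List[StreamDescription],
                   want_usage: str = "main",
                   prefer_vad: bool = False) -> List[str]:
    """Toy receiver policy: pick streams by declared usage, or ask the
    sender for whatever source its voice-activity detection selects."""
    if prefer_vad:
        # A sentinel the sender would interpret as "send the VAD-chosen source".
        return ["vad-selected"]
    return [s.stream_id for s in offered if s.usage == want_usage]

offer = [
    StreamDescription("v-left", "video", "main", (-1.0, 0.0, 0.0), 70.0, "16:9"),
    StreamDescription("v-center", "video", "main", (0.0, 0.0, 0.0), 70.0, "16:9"),
    StreamDescription("v-slides", "video", "presentation", (0.0, 0.0, 0.0)),
]
print(select_streams(offer))                   # ['v-left', 'v-center']
print(select_streams(offer, prefer_vad=True))  # ['vad-selected']

In a real system this information would travel in the session signaling, whether as an extension of an existing mechanism such as SDP or as a new message set; choosing between those options is precisely the question the charter assigns to the working group.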
The scope includes both systems that provide a fully immersive experience and systems that interwork with them and therefore need to understand the same multiple-stream semantics.

The focus of this work is on multiple audio and video streams. Other media types may be considered; however, development of methodologies for them is not within the scope of this work.

Interoperation with SIP and related standards for audio and video is required. However, backwards compatibility with existing non-standards-compliant telepresence systems is not required.

This working group is not currently chartered to work on issues of continuous conference control, including far-end camera control, indication of fast frame update for video codecs or other rapid switches, floor control, and conference roster.

Reuse of existing protocols and backwards compatibility with SIP-compliant audio/video endpoints are important factors for the working group to consider. The work will be closely coordinated with the appropriate areas and working groups, including the OPS Area, AVT, MMUSIC, MEDIACTRL, XCON, and SIPCORE.

Milestones

Nov 2010 - Submit informational draft on use cases and requirements to the IESG.

Nov 2011 - Submit standards-track specification to the IESG specifying the spatial relationships of screens, cameras (including variable field of view and orientation), speakers, and microphones, and the "usage" of a stream as defined in the charter. Semantics, language, and transport mechanism will be specified.

David Singer
Multimedia and Software Standards, Apple Inc.