Criteria for what one can do in Javascript vs what one has to do inside the browser

Harald Alvestrand

17 Feb 2011

5:05 p.m.

Trying to sort out thoughts in my own head.... I've run a few experiments, and thought a few thoughts.

There are a couple of things that should NOT be permitted from Javascript in an RTC-Web enabled application, and there are a couple of things that just can't be done at the present stage of our codebase.

Things that should not be permitted:

- Sending a UDP packet with script-specified content to a script-specified destination. This has far too much potential for havoc (imagine them being fired at business critical systems, DNS servers or SNMP monitoring ports).

- Setting up a TCP connection to a script-specified destination and sending script-specified data down it. Same issues as above.

The traditional defense against the second type is the "same origin policy", policed by browsers in their implementation of XMLHttpRequest, WebSockets and similar interfaces, which also limits requests to more-or-less valid HTTP (but see http://www.alvestrand.no/ietf/http-traps.html for some fun abuses that used to work...)

So far, we have assumed that the STUN handshake is our defense against the first one, and that it's OK to send out a moderate number of STUN-formatted UDP packets to ports and IP addresses chosen by the script, believing that the STUN format prevents them from being parsed as valid packets by other protocols.

(Query: What other operations need to be protected against?)

Things that can't be done:

- Anything that requires timing of events within 20-100 ms of each other

- Anything that depends on multithreading behaviour in the browser

In both cases, Javascript just doesn't work that way.

I think the TCP constraint + the UDP constraint means that we can't implement SIP or XMPP in Javascript without a gateway that talks SIP-over-HTTP - you just can't get around the security features.

I think the timing constraint means that if you implement ICE in Javascript, you're going to need to have seriously relaxed timing constraints - the standard specifies that you should try candidates at <complicated expression that usually turns out to be 20 ms>.

What else are serious limitations on what we can or cannot do?

What are the consequences?

Harald
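Editorial sketch: the "complicated expression" mentioned above appears to be the pacing interval Ta from RFC 5245 Section 16.1. The function below follows that formula as I read it; the function name, the default STUN packet size, and the example stream sizes are illustrative assumptions, not values taken from the thread.

```python
def ice_pacing_ms(streams, stun_packet_size=100):
    """Pacing interval Ta in milliseconds, as read from RFC 5245 Section 16.1.

    streams: list of (rtp_packet_size_bytes, rtp_ptime_ms), one per media stream.
    stun_packet_size: assumed size in bytes of one connectivity check (illustrative).
    """
    if not streams:
        raise ValueError("at least one media stream is required")
    # Per stream: Ta_i = (stun_packet_size / rtp_packet_size) * rtp_ptime
    ta_i = [(stun_packet_size / size) * ptime for size, ptime in streams]
    # Combined: Ta = MAX(20 ms, 1 / sum(1 / Ta_i))
    combined = 1.0 / sum(1.0 / t for t in ta_i)
    return max(20.0, combined)


# One audio stream with 172-byte packets every 20 ms (assumed numbers):
# Ta_i is about 11.6 ms, so the 20 ms floor applies.
audio_only = ice_pacing_ms([(172, 20)])
```

With typical audio packet sizes the computed value falls below the floor, which is consistent with the observation that the expression "usually turns out to be 20 ms".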


Ted Hardie

17 Feb 2011

7:17 p.m.

Hi Harald, Some comments below. On Thu, Feb 17, 2011 at 9:05 AM, Harald Alvestrand <harald@alvestrand.no> wrote:

> Trying to sort out thoughts in my own head.... I've run a few experiments, and thought a few thoughts.
>
> There are a couple of things that should NOT be permitted from Javascript in an RTC-Web enabled application, and there are a couple of things that just can't be done at the present stage of our codebase.
>
> Things that should not be permitted:
>
> - Sending a UDP packet with script-specified content to a script-specified destination. This has far too much potential for havoc (imagine them being fired at business critical systems, DNS servers or SNMP monitoring ports).
>
> - Setting up a TCP connection to a script-specified destination and sending script-specified data down it. Same issues as above.
>
> The traditional defense against the second type is the "same origin policy", policed by browsers in their implementation of XMLHttpRequest, WebSockets and similar interfaces, which also limits requests to more-or-less valid HTTP (but see http://www.alvestrand.no/ietf/http-traps.html for some fun abuses that used to work...)

I'd like to understand a bit better what you see as the line between "in Javascript" and "in the Browser". To take an example, if the Javascript contains a dns URI with a pointer to a specific authoritative server (RFC 4501 gives the form as: dns:[//authority/]domain[?CLASS=class;TYPE=type] ), it is handing the crafting of the actual DNS packet to the OS subsystem that does that work. Is this a UDP packet with script-specified content to a script-specified destination, or does the OS subsystem's involvement mean that it is not script-specified content?

A script could theoretically do the same thing with snmp URIs, for example, but the DNS one is a bit more compelling, because we know the browser must handle DNS resolution at some level. If same-origin means it will never honor an authority section in a DNS URI, then the only thing a script could do would be to request resolution of a huge number of A records to DoS the local resolver.

I'm also curious about the extent to which document.domain style hacks would be okay for folks contemplating this work. If a web server providing rendezvous can point to other hosts within its domain as "same origin", some scaling will get easier because it will be able to hand off some tasks.

I also wonder whether shifting from same origin to pawn ticket URI-style authorization might not be better for this problem space, but I haven't baked an actual method yet. That would obviously not work in the current Javascript environment, but it might be possible to extend things to include a document.pawnticket in future.

regards,

Ted
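Editorial sketch: one way to picture the "never honor an authority section" policy discussed above is a check applied before a dns URI is handed to the resolver. The function name `dns_uri_allowed` and the policy itself are assumptions for illustration; nothing in the thread says any browser implements this.

```python
from urllib.parse import urlsplit


def dns_uri_allowed(uri: str) -> bool:
    """Hypothetical policy: accept dns URIs only when no authority is named.

    RFC 4501 form: dns:[//authority/]domain[?CLASS=class;TYPE=type]
    """
    parts = urlsplit(uri)
    if parts.scheme.lower() != "dns":
        return False
    # A non-empty netloc means the script picked the server to query.
    return parts.netloc == ""


# The script leaves the choice of server to the local resolver: allowed.
ok = dns_uri_allowed("dns:example.com?TYPE=A")
# The script names its own target server (192.0.2.53 is a documentation address): refused.
refused = dns_uri_allowed("dns://192.0.2.53/example.com?TYPE=A")
```

A check like this only addresses the destination. Limiting how many lookups a script may trigger would need separate safeguards.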


_______________________________________________ RTC-Web mailing list RTC-Web@alvestrand.no http://www.alvestrand.no/mailman/listinfo/rtc-web

Harald Alvestrand

9:47 p.m.

On 02/17/2011 08:17 PM, Ted Hardie wrote:


>> The traditional defense against the second type is the "same origin policy", policed by browsers in their implementation of XMLHttpRequest, WebSockets and similar interfaces, which also limits requests to more-or-less valid HTTP (but see http://www.alvestrand.no/ietf/http-traps.html for some fun abuses that used to work...)
>
> I'd like to understand a bit better what you see as the line between "in Javascript" and "in the Browser". To take an example, if the javascript contains a dns URI with a pointer to a specific authoritative server (RFC 4501 gives the form as: dns:[//authority/]domain[?CLASS=class;TYPE=type] ), it is handing the crafting of the actual DNS packet to the OS subsystem that does that work. Is this a UDP packet with script-specified content to a script-specified destination, or does the OS subsystem's involvement mean that it is not script-specified content?

In this case, we're guaranteed that what comes out is a DNS packet, and it is (presumably, I don't remember if the dns URL allows server specification) sent towards a destination that the OS/browser thinks is a DNS server. So I think of this as "browser-vetted content".

> A script could theoretically do the same thing with snmp URIs, for example, but the DNS one is a bit more compelling, because we know the browser must handle DNS resolution at some level. If same-origin means it will never honor an authority section in a DNS uri, then the only thing a script could do would be to request resolution of a huge number of a records to DoS the local resolver.

Hopefully, safeguards against this can be installed in the browser, since it knows it's dealing with DNS packets.

> I'm also curious about the extent to which document.domain style hacks would be okay for folks contemplating this work. If a web server providing rendezvous can point to other hosts within its domain as "same origin", some scaling will get easier because it will be able to hand off some tasks.
>
> I also wonder whether shifting from same origin to pawn ticket uri-style authorization might not be better for this problem space, but I haven't baked an actual method yet. That would obviously not work in the current Javascript environment, but it might be possible to extend things to include a document.pawnticket in future.

Yup, something similar is contemplated within the Caja effort with regard to passing Javascript resources between contexts running in the same browser without compromising security. And of course the whole STUN handshake is an example of "pawn ticket" in action (if I understand the term correctly).

> regards,
>
> Ted


Ted Hardie

10:20 p.m.

On Thu, Feb 17, 2011 at 1:47 PM, Harald Alvestrand <harald@alvestrand.no> wrote:

> In this case, we're guaranteed that what comes out is a DNS packet, and it is (presumably, I don't remember if the dns URL allows server specification) sent towards a destination that the OS/browser thinks is a DNS server. So I think of this as "browser-vetted content".

Guaranteed to be a DNS packet, but the "authority" section allows you to name where the packet is meant to go. Something similar to a same-origin policy could say "can't name your own authority in a DNS URI", but as far as I know it does not.

<snip>

> Yup, something similar is contemplated within the Caja effort with regard to passing Javascript resources between contexts running in the same browsers without compromising security. And of course the whole STUN handshake is an example of "pawn ticket" in action (if I understand the term correctly)

I'm thinking of the URLAUTH mechanism described by LEMONADE: http://tools.ietf.org/search/rfc4467

That's a limited-use proof-of-possession model for authorization, with no authentication implied (just as anyone in possession of a pawn ticket can redeem the item out of pawn). STUN is a user-name and password model, either long term or short term. The short-term method can use some out-of-band mechanism to assign time-limited username/passwords.

I'm still noodling out the pawn-ticket model's applicability here, and some of it depends on the use case at hand. For the "see the faces of the poker players" case, I can see using something like a limited-use cookie being shared by the players as a nonce for creating the session. In more DTLS terms:

- Client 1 visits gaming site and gets a token: "32bitF000" when assigned to game 1.

- Client 2 gets the same token when assigned to game 1.

- Each uses 32bitF000 as a nonce to replace their address when supplying a client Hello to their peers. The shared token indicates they got it from the same place and intend to be in the same game.

- Each peer sees the shared token, and uses it to provide a session cookie (since it uses more than the nonce to create the cookie, the peer cannot gen one up on his own). This provides the return routability check.

- Whenever a player drops out of the game, they drop their recognition of the 32bitF000 token, so they will no longer start sessions with anyone presenting that nonce/token.

Using a similar idea to replace same-origin in Javascript is harder than the DTLS example, and I'm still working through in my head whether it will work.

regards,

Ted
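Editorial sketch: the session-cookie step in the poker example can be pictured as a keyed hash over the shared token plus something only the issuing peer knows, in the spirit of the DTLS cookie exchange. Every name here (`PEER_SECRET`, `make_cookie`, `check_cookie`, the address string) is hypothetical, and the construction is an illustration of the idea rather than the method Ted had in mind.

```python
import hashlib
import hmac
import secrets

# Local secret known only to the peer issuing cookies (assumption).
PEER_SECRET = secrets.token_bytes(32)


def make_cookie(secret: bytes, token: str, peer_addr: str) -> str:
    """Derive a cookie from the shared token and the claimed peer address."""
    msg = token.encode() + b"|" + peer_addr.encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def check_cookie(secret, token, peer_addr, cookie, active_tokens) -> bool:
    """Accept only cookies for tokens this player still recognises."""
    if token not in active_tokens:
        return False  # player left the game, token no longer honoured
    expected = make_cookie(secret, token, peer_addr)
    return hmac.compare_digest(expected, cookie)


active = {"32bitF000"}
cookie = make_cookie(PEER_SECRET, "32bitF000", "198.51.100.7:5000")
accepted = check_cookie(PEER_SECRET, "32bitF000", "198.51.100.7:5000", cookie, active)
```

Because the cookie depends on the local secret as well as the token, a peer that only knows the token cannot mint a valid cookie itself; it has to receive one at the address it claimed, which is the return routability check.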


Harald Alvestrand

10:36 p.m.

On 02/17/2011 11:20 PM, Ted Hardie wrote:

> I'm thinking of the URLAUTH mechanism described by LEMONADE: http://tools.ietf.org/search/rfc4467
>
> That's a limited-use proof-of-possession model for authorization, with no authentication implied (just as anyone in possession of a pawn ticket can redeem the item out of pawn). STUN is a user-name and password model either long term or short term. The short-term method can use some out-of-band mechanism to assign time-limited username/passwords.

The reason I think of this as a proof-of-possession mechanism is that in the use I'm most familiar with, both the username and password are random strings generated at the time-of-use; they are carried in fields named "username" and "password" in SDP / Jingle, but that doesn't mean they are tied to a user in the traditional sense - that's what makes them "short-term".

It would be nice if the STUN spec had called the fields something different, but that's what you get from not wanting to reinvent protocols all the time....

Harald
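Editorial sketch: "random strings generated at the time-of-use" might look like the following. The alphabet and the minimum lengths (4 characters for the username fragment, 22 for the password) follow my reading of RFC 5245; the function names are assumptions.

```python
import secrets
import string

# RFC 5245 restricts these strings to letters, digits, "+" and "/".
ICE_CHARS = string.ascii_letters + string.digits + "+/"


def random_ice_string(length: int) -> str:
    """Return a random string drawn from the ICE character set."""
    return "".join(secrets.choice(ICE_CHARS) for _ in range(length))


def new_short_term_credentials() -> dict:
    """Fresh per-session values; they identify a session, not a person."""
    return {"ufrag": random_ice_string(4), "pwd": random_ice_string(22)}


creds = new_short_term_credentials()
```

Nothing here refers to an account or a user database, which is why possession of the pair is the only thing being proven.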

Jonathan Rosenberg

18 Feb 2011

2:34 p.m.

In my view the STUN/ICE solution is most definitely proof-of-possession. It works under the assumption of a trusted intermediary (the web server and/or a network behind it). If user A wants to connect to user B, this connection is only permitted through an out-of-band introduction of A to B through the trusted intermediary. The introduction manifests in the form of a one time use token which is then sent directly from A to B in the handshake.

As you say, it is unfortunate that STUN calls these "username" and "password" as they are not that; however as you know this is a consequence of the evolution of STUN from its original purpose of an unauthenticated NAT probing technology to a p2p handshake technique for ICE.

Thanks,
Jonathan R.

--
Jonathan D. Rosenberg, Ph.D.
SkypeID: jdrosen
Skype Chief Technology Strategist
jdrosen@skype.net
http://www.skype.com
jdrosen@jdrosen.net
http://www.jdrosen.net
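Editorial sketch: the bookkeeping behind a one-time-use introduction token could be as small as this. The `Introducer` class and its method names are hypothetical; in the STUN/ICE case the check is carried out between the endpoints during the handshake rather than by a call back to the server.

```python
import secrets


class Introducer:
    """Hypothetical trusted intermediary that introduces A to B."""

    def __init__(self):
        self._pending = {}  # token -> (caller, callee)

    def introduce(self, caller: str, callee: str) -> str:
        """Issue an unguessable token for exactly one connection attempt."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (caller, callee)
        return token

    def redeem(self, token: str, caller: str, callee: str) -> bool:
        """Valid once only; pop() discards the token on first presentation."""
        return self._pending.pop(token, None) == (caller, callee)


server = Introducer()
ticket = server.introduce("A", "B")
first_use = server.redeem(ticket, "A", "B")
second_use = server.redeem(ticket, "A", "B")
```

Note that in this sketch a token is consumed even when it is presented for the wrong pair, a design choice made here for simplicity.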


Bernard Aboba

3:03 p.m.

New subject: Criteria for what one can do in Javascript vs what one has to do inside the browser

To be clear, we are not talking about just any STUN exchange. For example, an unauthenticated binding request/response doesn't prove anything.

I think we're talking about a STUN request which is authenticated with a username/password attribute exchanged out-of-band (e.g. via offer/answer), and which elicits a successful response (e.g. according to the criteria in RFC 5245 Section 7.1.3.2) with a matching transaction ID and the nominated flag set, no?

This in turn implies that the STUN Javascript API has to provide a high level of control of the details of the STUN request and response.
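Editorial sketch: the transaction-ID part of that check can be shown with the fixed 20-byte STUN header from RFC 5389 (message type, length, magic cookie, 96-bit transaction ID). The helper names are assumptions, and the sketch deliberately leaves out all attributes, including USERNAME, MESSAGE-INTEGRITY and USE-CANDIDATE, so on its own it proves no more than an unauthenticated binding exchange would.

```python
import secrets
import struct

MAGIC_COOKIE = 0x2112A442   # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101


def build_binding_request():
    """Return (header_bytes, transaction_id) for an attribute-less request."""
    transaction_id = secrets.token_bytes(12)
    # Type, message length (0: no attributes), magic cookie, then the ID.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id
    return header, transaction_id


def is_matching_success(packet: bytes, transaction_id: bytes) -> bool:
    """Header-only check: success response carrying the same transaction ID."""
    if len(packet) < 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", packet[:8])
    return (
        msg_type == BINDING_SUCCESS
        and cookie == MAGIC_COOKIE
        and packet[8:20] == transaction_id
    )


request, tid = build_binding_request()
```

A real check would also verify the message integrity attribute against the exchanged password before treating the response as consent.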


Ted Hardie

3:31 p.m.

Howdy, On Fri, Feb 18, 2011 at 6:34 AM, Jonathan Rosenberg <jonathan.rosenberg@skype.net> wrote:

> In my view the STUN/ICE solution is most definitely proof-of-possession. It works under the assumption of a trusted intermediary (the web server and/or a network behind it). If user A wants to connect to user B, this connection is only permitted through an out-of-band introduction of A to B through the trusted intermediary. The introduction manifests in the form of a one time use token which is then sent directly from A to B in the handshake.
>
> As you say, it is unfortunate that STUN calls these "username" and "password" as they are not that; however as you know this is a consequence of the evolution of STUN from its original purpose of an unauthenticated NAT probing technology to a p2p handshake technique for ICE.

I agree that the STUN/ICE context with time-of-use creation acts pretty close to proof-of-possession. But the long term credential mechanism in STUN itself (5389 version) looks much closer to a standard username and password, with the realm and cookie mechanism giving you the opportunity to deal in specific durations.

The original context of my comment was in thinking through how same-origin policies limit the options for sharing context among servers, though, and to think about what better options might eventually be available without raising too much worry for cross-site attacks. My main point wasn't to argue about whether STUN qualified for proof-of-possession or not, but to say I think proof of possession (however arrived at) is likely our best bet here.

To restate this, we're talking about the web server providing proof-of-possession style credentials (one time username/password or some other style) to each party, and those parties using that as authorization to join or set up a specific session, whether or not the host they talk to in order to do so qualifies under same origin. Getting that right without enabling cross-site attacks is not going to be simple, but I think it is the best way forward.

regards,

Ted Hardie


Stefan Håkansson LK

1:01 p.m.

> (Query: What other operations need to be protected against?)

I think there should be a protection against starving other traffic. As the transport will in many cases be UDP, TCP backoff will not protect other traffic; instead this must be handled by the implementation.

Stefan
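Editorial sketch: one simple form of "handled by the implementation" is a token bucket that caps what a sender may emit. The class name, rate and burst values are illustrative assumptions. A fixed cap like this is not congestion control, since it does not react to loss or delay; it only bounds the damage.

```python
import time


class TokenBucket:
    """Cap outgoing bytes at a sustained rate with a bounded burst."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float, now=None):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic() if now is None else now

    def allow(self, packet_size: int, now=None) -> bool:
        """Return True if a packet of this size may be sent right now."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_size <= self.tokens:
            self.tokens -= packet_size
            return True
        return False


# 1000 bytes/s sustained with a 500-byte burst (assumed numbers);
# explicit timestamps keep the example deterministic.
bucket = TokenBucket(1000, 500, now=0.0)
burst_ok = bucket.allow(500, now=0.0)
blocked = bucket.allow(1, now=0.0)
```

Passing `now` explicitly is only for testing; in normal use the bucket reads the monotonic clock itself.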
