Socket lifetime and its effect on privacy #387

Open
ghost opened this issue May 2, 2018 · 29 comments

@ghost

ghost commented May 2, 2018

Continuing my research of browsers as explained in #365 I found something interesting. The browsers which I tested in the last 2 days (for some it was a re-test with newer versions) were Midori, Epiphany, qutebrowser, Chromium, Firefox, Dooble.

The test procedure is fairly simple:

In 2 different consoles I run:

rcnetwork restart;tcpdump -i eth1 ip src host pc and dst host not router and dst host not pc -tq
watch 'netstat -anpt'

Then I start the browser, in which I have beforehand set the homepage to about:blank and tightened everything possible (disabled JS, cookies, plugins, etc.). Then I visit the URL of a simple text file, e.g. http://fsf.org/robots.txt, and look at the packets and connections.

Results:

In my test all browsers show some weird behavior. Although the simple text file takes less than a second to download, the browser continues to "chatter" with the remote host for several minutes. netstat also shows that there are active connections. I see that as a privacy issue because it literally means the user is telling the remote host "I am still online, here are some more TCP packets".

I received an explanation from Dooble's developer that this is due to the underlying web engine:

textbrowser/dooble#23

Despite my hope that testing browsers with different web engines would give different results, that doesn't seem to be the case. All of them keep sending TCP packets. The one and only browser which does not do that is lynx - it simply downloads the document and instantly closes the socket.
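For comparison, a minimal sketch of the "download and instantly close" pattern described above. It does not claim to reproduce lynx's internals; the function name `fetch_once` and the local stand-in server are assumptions made only so the example is self-contained.

```python
import http.client
import http.server
import threading


def fetch_once(host: str, port: int, path: str) -> tuple[int, bytes]:
    """Fetch one resource over plain HTTP and close the socket right away."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        # "Connection: close" asks the server not to keep the connection open.
        conn.request("GET", path, headers={"Connection": "close"})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        # Close our side regardless of what the server decided.
        conn.close()


class RobotsHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local stand-in for a server hosting a small text file."""

    def do_GET(self):
        body = b"User-agent: *\nDisallow:\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo() -> tuple[int, bytes]:
    """Serve the file on a random local port and fetch it exactly once."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RobotsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_once("127.0.0.1", server.server_address[1], "/robots.txt")
    finally:
        server.shutdown()
        server.server_close()
```

After `fetch_once` returns, the client holds no open socket to the server, so there is nothing left to send follow-up packets on.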

Using my user.js (a modified version of pyllyukko's one with some added settings which ensure zero packets sent to Mozilla etc) I tested Firefox 59.0.3 too. What I noticed as a difference from non-Firefox browsers is that FF quite actively sends the after-packets. Here is what happens:

Open http://fsf.org/robots.txt:

IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 325
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 517
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 342
IP pc.37792 > www.fsf.org.https: tcp 373
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 517
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 342
IP pc.37794 > www.fsf.org.https: tcp 389
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 357
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 517
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 326
IP pc.33248 > svnweb.fsf.org.https: tcp 37
IP pc.33248 > svnweb.fsf.org.https: tcp 357
IP pc.33248 > svnweb.fsf.org.https: tcp 0

Page loaded.
Waiting (touch nothing)... tcpdump shows:

IP pc.33248 > svnweb.fsf.org.https: tcp 37
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 53
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 53
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0

All of the above are sent in groups of 3 lines every 8-10 seconds. netstat shows:

tcp        0      0 pc:37794                www.fsf.org:https       ESTABLISHED 10346/firefox
tcp        0      0 pc:37792                www.fsf.org:https       ESTABLISHED 10346/firefox
tcp        0      0 pc:43500                www.fsf.org:www-http    ESTABLISHED 10346/firefox

After about 2-3 minutes all this chattering stops.

tcp        0      0 pc:43500                www.fsf.org:www-http    TIME_WAIT   -

Another minute and this socket disappears too.
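To compare browsers more objectively, the tcpdump output can be tallied per connection. The following is a rough sketch that assumes the `-tq` line format shown above; the names `LINE_RE` and `summarize` are made up for illustration. Note that zero-length packets include ordinary handshake and ACK segments, not only keep-alive probes.

```python
import re
from collections import defaultdict

# Matches lines such as "IP pc.43500 > www.fsf.org.http: tcp 325".
LINE_RE = re.compile(r"^IP \S+\.(\d+) > (\S+)\.(\w+): tcp (\d+)$")

SAMPLE = """\
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 325
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 517
"""


def summarize(dump: str) -> dict:
    """Count empty and payload packets per (local port, remote host, service)."""
    stats = defaultdict(lambda: {"empty": 0, "payload": 0, "bytes": 0})
    for line in dump.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a "-tq" style TCP line
        local_port, remote_host, service, length = match.groups()
        entry = stats[(local_port, remote_host, service)]
        size = int(length)
        if size == 0:
            entry["empty"] += 1
        else:
            entry["payload"] += 1
            entry["bytes"] += size
    return dict(stats)


if __name__ == "__main__":
    for key, counts in summarize(SAMPLE).items():
        print(key, counts)
```

Feeding it the capture taken while idle, after the page has loaded, shows how many packets each connection still sent.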

In summary: Several minutes of TCP chatter for a 6-line text file which loads in a few milliseconds. In different browsers this time and the number of the additional packets varies, as well as the time until all sockets "die". In Firefox the number of additional packets is particularly high although upon browser exit it closes them somewhat faster than others. Still it is far from as good as lynx.

So I was wondering: is there a way to control this through about:config settings? Or are all modern engine-based browsers already doomed?

@aramazano

The one and only browser which does not do that is lynx - it simply downloads the document and instantly closes the socket.

Have you ever tried this test with dillo? It is quite a strict browser, albeit somewhat spartan. I have just checked its dependencies on Debian against lynx:

aramazan@torik:~$ LANG=C apt-cache depends lynx
lynx
  Depends: libbsd0
  Depends: libbz2-1.0
  Depends: libc6
  Depends: libgnutls30
  Depends: libidn11
  Depends: libncursesw5
  Depends: libtinfo5
  Depends: zlib1g
  Depends: lynx-common
  Conflicts: <lynx-ssl>
  Breaks: <lynx-cur>
  Breaks: <lynx-cur-wrapper>
  Recommends: mime-support
  Replaces: <lynx-cur>
  Replaces: <lynx-cur-wrapper>

aramazan@torik:~$ LANG=C apt-cache depends dillo
dillo
  Depends: wget
  Depends: libc6
  Depends: libfltk1.3
  Depends: libgcc1
  Depends: libjpeg62-turbo
  Depends: libpng16-16
  Depends: libssl1.1
  Depends: libstdc++6
  Depends: libx11-6
  Depends: zlib1g
  Recommends: perl
  Recommends: <perl:any>
    perl

Neither of them uses a web engine. The correlation between web engine usage and background chatter is noteworthy. (Assuming dillo behaves the way lynx does.)

@ghost
Author

ghost commented May 2, 2018

Have you ever tried this test with dillo?

No. Isn't it abandoned?

@aramazano

The changelog you've linked to seems not to have been updated for a long while. However, dillo is still hosted in Debian Sid, which suggests no problems. Had it been abandoned, Debian would have phased it out, like Midori.

Also, I am beginning to wonder if these multiple connections are due to some feature or performance reasons, e.g. keeping multiple open connections handy for parallel loading in case multiple downloads are needed from the same page. (I am no browser expert, so please take it with a grain of salt.)

@beerisgood

Maybe the pipelining feature?

@ghost
Author

ghost commented May 2, 2018

The changelog you've linked to seems not to have been updated for a long while.

OK, I will try to look deeper.

some feature or performance reasons

I really don't know and I am not an expert either but approaching it logically:

  1. lynx's performance is excellent.
  2. There is no need for 3 sockets to download a text file of only a few bytes.
  3. Maintaining open sockets does not reduce TTFB (e.g. if the page contains a link to another page). Only so-called HTTP prefetching does that (or rather, works in the background toward it), but that is turned off.
  4. It does not affect caching (e.g. if the server replies with HTTP 304).

I can't think of any other performance aspects. In fact, clogging the net with unnecessary packets may have a negative effect (and probably drains the device battery faster).

@ghost
Author

ghost commented May 2, 2018

Maybe the pipelining feature?

Can you explain?

@aramazano

There is no need for 3 sockets to download a text file of only a few bytes.

But the browser cannot know this beforehand, and may not be intelligent enough to infer from a page's extension (.txt) how many parallel connections it will likely need. If a page contains multiple frames, images, etc. from the same host, then they can be loaded in parallel. I am just speculating.

@aramazano

Also, the browser may be oblivious to the number of connections. It may well be delegating all the plumbing work to the web engine, and the web engine, being a bit too diligent, may open multiple connections.

As I am not into browser design whatsoever, I can't assess how the work is shared between browser and web engine. There is a possibility that the browser just occupies itself with the user side, delegating all the network work to the web engine. In that case, it is the web engine development team that needs to be addressed. Or maybe it is all mixed, i.e. both the browser and the web engine may be doing their share of chattering. E.g. advert sites may be accessed by the browser, whereas others by the engine. I don't know how to tell which is responsible for which.

@ghost
Author

ghost commented May 2, 2018

I am not so sure. It is not the extension but the HTTP header which determines what the browser should load:

Content-Length: 185
Content-Type: text/html

Additionally this particular page also sends Connection: keep-alive which is generally a way to reduce the needed TCP connections, not to increase them:

https://en.wikipedia.org/wiki/HTTP_persistent_connection#Advantages
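To make the point about keep-alive reducing the number of TCP connections concrete, here is a small self-contained sketch. It runs a local HTTP/1.1 test server (the handler `CountingHandler` and the function `count_connections` are hypothetical names for this illustration only) and counts how many TCP connections the server sees for three requests, with and without connection reuse.

```python
import http.client
import http.server
import threading


class CountingHandler(http.server.BaseHTTPRequestHandler):
    """Local test handler; one handler instance is created per TCP connection."""

    protocol_version = "HTTP/1.1"  # allows persistent connections
    connections = 0

    def setup(self):
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(reuse: bool, requests: int = 3) -> int:
    """Issue several GETs and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
        for i in range(requests):
            if i > 0 and not reuse:
                # Simulate a keep-alive timeout of 0: new socket per request.
                conn.close()
                conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
            conn.request("GET", "/")
            conn.getresponse().read()
        conn.close()
        return CountingHandler.connections
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("with reuse:", count_connections(reuse=True))
    print("without reuse:", count_connections(reuse=False))
```

With reuse, all three requests travel over one connection; without it, each request pays for its own handshake. That is the trade-off behind the keep-alive timeout: fewer handshakes versus a socket that stays open while idle.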

As for the page content: the browser can surely see what the page references and apply program logic to open connections only when necessary (e.g. to load an image). In this particular case I assume a second connection may be needed for a favicon and the third one may be just the redirect from HTTP to HTTPS (speculation). But I don't see why connections should be kept open for so long after the resource has been downloaded. During my tests with other pages I have noticed connections being made and packets being sent to tracking domains, fbcdn.net, etc. when the browser is closing.

Also if we assume that the browser and the web engine it uses work each one for itself - that sounds to me like a serious design problem. If the engine sends packets on its own without being asked to - practically it can do whatever it wants. I really don't know for sure. Perhaps someone with more expertise could explain.

@ghost
Author

ghost commented May 2, 2018

Reading further... it seems all this may be related to HTTP connection persistence, i.e. keeping the connection alive in order to avoid opening new connections for later requests. Perhaps this is beneficial for the server. I should probably test this:

http://kb.mozillazine.org/Network.http.keep-alive.timeout

@ghost
Author

ghost commented May 2, 2018

I think this may be it:

user_pref("network.http.keep-alive.timeout", 0);

Now FF behaves like lynx :)
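For anyone who wants to script the change, here is a rough sketch of a helper that sets a preference in the text of a user.js file, replacing an existing line for the same preference or appending a new one. The function name `set_user_pref` is hypothetical, and the sketch assumes one `user_pref(...)` call per line.

```python
import json
import re


def set_user_pref(user_js: str, name: str, value) -> str:
    """Return user.js text with `name` set to `value` (replace or append)."""
    # json.dumps renders ints, booleans and strings in JavaScript-compatible form.
    new_line = f'user_pref("{name}", {json.dumps(value)});'
    pattern = re.compile(
        r'^[ \t]*user_pref\([ \t]*"' + re.escape(name) + r'"[ \t]*,.*\);[ \t]*$',
        re.MULTILINE,
    )
    if pattern.search(user_js):
        # Replace every existing line that sets this preference.
        return pattern.sub(lambda _match: new_line, user_js)
    separator = "" if user_js == "" or user_js.endswith("\n") else "\n"
    return user_js + separator + new_line + "\n"


if __name__ == "__main__":
    text = 'user_pref("a.b", true);\n'
    print(set_user_pref(text, "network.http.keep-alive.timeout", 0), end="")
```

Writing the result back to the profile's user.js and restarting Firefox would apply it; the same setting can be changed by hand in about:config.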

@ghost
Author

ghost commented May 2, 2018

Perhaps a good value should be 10-15.

@Atavic

Atavic commented May 3, 2018

The code contains images hosted on the static.fsf.org subdomain, and then there are iframes; these are all needed connections unless you harden your browser (text mode, local CSS).

The main culprit of the behaviour you're looking at is Plone CMS. If you look at the bottom icons, they are loaded from this .css: static.fsf.org/nosvn/plone4/css/fsf-2017-11-13.css

Sockets aren't needed on static pages at all.

@ghost
Author

ghost commented May 3, 2018

The code contains images hosted on the static.fsf.org subdomain, and then there are iframes; these are all needed connections unless you harden your browser (text mode, local CSS).

No, robots.txt does not contain that. That's why I am testing with it explicitly.

BTW how do you harden your browser to use local css? And which browser allows such hardening?

@Atavic

Atavic commented May 3, 2018

In Firefox it's View > Page Style > No Style

Sorry, I forgot you just go to robots.txt

You can use Fiddler proxy by Telerik to look into this.

@ghost
Author

ghost commented May 3, 2018

In Firefox it's View > Page Style > No Style

I didn't know that. Thanks. So far I used to block it using uMatrix.

You can use Fiddler proxy by Telerik to look into this.

I will check that too (also new to me). Thanks.

@pyllyukko
Owner

Interesting.

Perhaps a good value should be 10-15.

Would require some reference and/or research regarding what would be the optimal value for this setting.

@Atavic

Atavic commented May 11, 2018

HTTP persistent connection allows:

  • Reduced latency in subsequent requests (no handshaking).

  • HTTP pipelining of requests and responses.

Setting this to more than 115 probably won't help and will make things worse. See here.

Mozilla networking preferences page lowered it to:

network.http.keep-alive.timeout 30

@ghost
Author

ghost commented May 11, 2018

Would require some reference and/or research regarding what would be the optimal value for this setting.

According to Apache's docs the default value is 5 seconds.

HTTP persistent connection allows...

It's a balance, not just a benefit. The above link explains that too.

@pyllyukko
Owner

According to Apache's docs the default value is 5 seconds.

From some quick testing, I would go with 15. That accounts for one TCP Keep-Alive ACK response from the server:

    1   0.000000 XXXXXXXXXXXX → 208.118.235.174 TCP 66 41424 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 WS=128
...
   19  10.559493 XXXXXXXXXXXX → 208.118.235.174 TCP 54 [TCP Keep-Alive] 41424 → 443 [ACK] Seq=1312 Ack=4687 Win=43904 Len=0
   20  10.690260 208.118.235.174 → XXXXXXXXXXXX TCP 60 [TCP Keep-Alive ACK] 443 → 41424 [ACK] Seq=4687 Ack=1313 Win=17856 Len=0
...
   28  15.691863 XXXXXXXXXXXX → 208.118.235.174 TCP 54 41424 → 443 [RST] Seq=1367 Win=0 Len=0
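The arithmetic behind "15 accounts for one keep-alive" can be sketched as follows. This is only an estimate under stated assumptions: a probe interval of roughly 10 seconds (read off the trace above, where the probe appears at about 10.56 s, and consistent with the 8-10 second groups reported earlier), and an idle timer that restarts after each acknowledged probe. The function name is made up for illustration.

```python
def estimated_keepalive_probes(timeout_s: float, probe_interval_s: float = 10.0) -> int:
    """Estimate how many TCP keep-alive probes fit into one idle timeout.

    Assumes a probe is sent after every `probe_interval_s` seconds of idle
    time until the HTTP keep-alive timeout closes the connection.
    """
    if timeout_s <= 0 or probe_interval_s <= 0:
        return 0
    return int(timeout_s // probe_interval_s)


if __name__ == "__main__":
    for timeout in (0, 5, 15, 30, 115):
        print(timeout, estimated_keepalive_probes(timeout))
```

Under these assumptions a timeout of 15 allows one probe per idle connection, 115 allows about eleven, and anything below the probe interval allows none. Real timing will vary with the platform and the browser's timers.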

Have you used this setting with your regular browsing? Any undesirable side effects?

@ghost
Author

ghost commented May 11, 2018

Have you used this setting with your regular browsing?

Just a little in Firefox. But I have set it to 15 in TBB which I use more often.

Any undesirable side effects?

No. But I browse the web with JS turned off. Generally I would expect "side effects" in the sense of increased number of connections in a more active browsing scenario (lots of XHRs). I also suppose the more negative effect (memory-wise) may be server side. But the server can terminate the connection regardless of client timeout setting.

@pyllyukko
Owner

pyllyukko commented May 11, 2018

Some benchmarks (with this very page):

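network.http.keep-alive.timeout == 0

[Image "0-2": benchmark screenshot, not preserved]

network.http.keep-alive.timeout == 15

[Image "15-2": benchmark screenshot, not preserved]

network.http.keep-alive.timeout == 115

[Image "115-2": benchmark screenshot, not preserved]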

@ghost
Author

ghost commented May 11, 2018

Testing just any page cannot be a universal measure for anything. There are many other factors influencing page load time.

@pyllyukko
Owner

Testing just any page cannot be a universal measure for anything. There are many other factors influencing page load time.

True. Just wanted to do some quick tests.

@Atavic

Atavic commented May 20, 2018

The test confirms that default values (Chrome has even higher values than Firefox) aren't optimized.

@ghost
Author

ghost commented May 20, 2018

Chrome has even higher values than Firefox

What are the values for Chrome? Where do you read/set them? (I couldn't find a setting)

@Atavic

Atavic commented May 20, 2018

Correction: Chrome had a value of 300 seconds. Looking at https://src.chromium.org/ I found:

Wait 45s until sending first TCP keep-alive packet.

@ghost
Author

ghost commented May 20, 2018

Thanks. Do you think you could provide a link to the actual source code? Maybe we can file a request to Chromium for providing a setting.

@Atavic

Atavic commented May 23, 2018

Can't find the src.chromium page quoted above, but...

setKeepAlive is set to 45 seconds here, which means that:

For Chrome, TCP keep-alive packets are sent every 45 seconds to ensure that the connection stays active.

See: Efficiency and Performance of WebSockets.pdf
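For context, TCP keep-alive (the probes discussed here) is an operating-system socket option, separate from the HTTP keep-alive timeout. The sketch below is not Chromium's code; it only illustrates how an application could request a first probe after 45 seconds of idle time. The function name is hypothetical, and `TCP_KEEPIDLE` is platform-specific, so the code checks for it before use.

```python
import socket


def make_keepalive_socket(idle_s: int = 45) -> socket.socket:
    """Create a TCP socket that asks the OS to send keep-alive probes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Turn on TCP keep-alive for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle seconds before the first probe; the option is not available on
    # every platform, so skip it where the constant is missing.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    return sock


if __name__ == "__main__":
    s = make_keepalive_socket()
    print("keep-alive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

A client that wants no idle probes at all would leave `SO_KEEPALIVE` off or close the socket as soon as the transfer finishes.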

pyllyukko added a commit that referenced this issue Jun 17, 2018

4 participants