Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

networking: Fix connection corruption after turning network off and on. #2

Open
wants to merge 2 commits into
base: 0.55.4-zulip
Choose a base branch
from

Conversation

roberthoenig
Copy link

@gnprice
Copy link
Member

gnprice commented Jun 22, 2018

Huh interesting -- it looks like in master getOkHttpClient isn't actually used at all! There must be some backstory there.

Looking at git log --stat -p -S getOkHttpClient ReactAndroid/, I find where the places that you switch to using it previously stopped using it: commit 0a71f48, "Fix NetworkingModule losing Cookies when multiple CatalystInstances exist and one is destroyed".

So there was some kind of bug where the fix involved not doing this. I'd like to understand that better (the rest of that history might be useful reading for that) and have a story why we won't recreate the bug, or else find another way to apply this fix.

@gnprice
Copy link
Member

gnprice commented Jun 22, 2018

For example: there's already some code in OkHttpClientProvider.java that demonstrates a couple of ways of adding our ("us" as RN) own code around the creation of every OkHttpClient. The app has the option of setting some other factory (sFactory) which replaces that, but if they do then they can invoke enableTls12OnPreLollipop for themselves in that code (and they probably should).

We could set up that broadcast receiver there. Or if we lack the appropriate context to do that: we can add it to some data structure that's static on OkHttpClientProvider, and the receiver can iterate through that.

Maybe use WeakReferences in that data structure, so it doesn't get in the way of the clients getting GC'd if they become unused.

@roberthoenig
Copy link
Author

Looking at git log --stat -p -S getOkHttpClient ReactAndroid/, I find where the places that you switch to using it previously stopped using it: commit 0a71f48, "Fix NetworkingModule losing Cookies when multiple CatalystInstances exist and one is destroyed".
So there was some kind of bug where the fix involved not doing this. I'd like to understand that better (the rest of that history might be useful reading for that) and have a story why we won't recreate the bug, or else find another way to apply this fix.

Interesting! If you look at 0a71f48 in master, it has received quite a lot of angry comments: facebook@0a71f48#comments
However, all of them are complaining about how devs are not able to fiddle with the OkHttpClient anymore. No details about the change itself. Looking into this,

/**
 * A higher level API on top of the asynchronous JSC bridge. This provides an
 * environment allowing the invocation of JavaScript methods and lets a set of
 * Java APIs be invokable from JavaScript as well.
 */
@DoNotStrip
public interface CatalystInstance

So if there are in fact multiple CatalystInstance, the commit makes sense, because of this code in NetworkingModule:

  @Override
  public void onCatalystInstanceDestroy() {
    mShuttingDown = true;
    cancelAllRequests();

    mCookieHandler.destroy(); <-------------------------------------------------
    mCookieJarContainer.removeCookieJar(); <-----------------------------
    mRequestBodyHandlers.clear();
    mResponseHandlers.clear();
    mUriHandlers.clear();
  }

However, it doesn't look like there are ever > 1 CatalystInstance existing: CatalystInstance gets implemented by CatalystInstanceImpl. CatalystInstanceImpl is only ever instantiated in ReactInstanceManager in createReactContext(). createReactContext() is only ever called (over several other calls) by createReactContextInBackground():

  /**
   * Trigger react context initialization asynchronously in a background async task. This enables
   * applications to pre-load the application JS, and execute global code before
   * {@link ReactRootView} is available and measured. This should only be called the first time the
   * application is set up, which is enforced to keep developers from accidentally creating their
   * application multiple times without realizing it.
   *
   * Called from UI thread.
   */
  @ThreadConfined(UI)
  public void createReactContextInBackground() {

As the docstring says, createReactContextInBackground() should only be called once for an application. The reactContext gets recreated when the app is reloaded, but not before the old context is destroyed.
So, I think the code in this PR is safe. However, there surely is a reason for 0a71f48, and even if it's just some internal quirk at Facebook, the dev made clear that this change is necessary:

(but please don't make it use a single instance for everything as before, it would need to replace it in a way where you define how it's created rather than give it an instance.)

So for that reason and to get it merged into upstream at some point, I agree we should avoid going back to getHttpClient.

@roberthoenig
Copy link
Author

Ah, before I forget it: This code requires

<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />

@roberthoenig
Copy link
Author

Updated with this solution:

We could set up that broadcast receiver there. Or if we lack the appropriate context to do that: we can add it to some data structure that's static on OkHttpClientProvider, and the receiver can iterate through that.
Maybe use WeakReferences in that data structure, so it doesn't get in the way of the clients getting GC'd if they become unused.

(WeakReferences are amazing!!)

@gnprice
Copy link
Member

gnprice commented Jun 22, 2018

If you look at 0a71f48 in master, it has received quite a lot of angry comments: facebook/react-native@0a71f48#comments

Interesting -- it didn't even occur to me to look at comments on GitHub! When I mentioned the commit ID I knew it'd get auto-linkified, but I didn't follow the link.

However, all of them are complaining about how devs are not able to fiddle with the OkHttpClient anymore.

Yeah -- which it looks like the sFactory logic was subsequently added to address.

Haven't read through all the related code, but one possible way this could become an issue (a way that what happens in some application could differ from the chain of logic you describe) is this bit:

CatalystInstance gets implemented by CatalystInstanceImpl. CatalystInstanceImpl is only ever instantiated in ReactInstanceManager in createReactContext().

That separation between an interface and an implementation sounds like something that could be there to allow some users of this code to provide an alternate implementation. In particular, if there's one public implementation and then one or more other implementations within Facebook that do some more complex things that are tied to other Facebook-internal systems, that'd be a pretty classic pattern for a system open-sourced from a big company. (Two other examples from Facebook: GraphQL is a hilarious one, because the server-side implementation they open-sourced was totally unusable for real work; a startup named Meteor saw an opportunity and actually pivoted themselves to providing the missing server, calling it Apollo. MonkeyType is an extremely useful tool for adding types to an existing Python codebase, by logging what they are in production; the logger backend used internally sends to some internal giant distributed service named Scuba, and the backend they open-sourced just logs to SQLite, which is simple to deploy and perfectly good enough for a small-to-medium site.)

Or maybe that's not it. Like you said, even if we don't fully understand why it's this way, we'll just leave it undisturbed.

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again! This looks great. I have some comments below: mainly on nailing down some details related to the APIs we're using (from the enormous Android SDK).

@@ -33,6 +46,9 @@
// User-provided OkHttpClient factory
private static @Nullable OkHttpClientFactory sFactory;

private static Set<OkHttpClient> sClients = Collections.newSetFromMap(
new WeakHashMap<OkHttpClient, Boolean>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't 100% confident this was an appropriate way to combine these APIs, never having used them (will the set somehow get confused by the weakrefs disappearing?)... and then I found that the newSetFromMap doc has this as an example nearly verbatim. 😃 Definitely correct.

(I don't think this needs a comment or anything; it's clear with just basic documented facts about the underlying APIs, so best for the curious reader to just look those up, like I did just now.)

@@ -33,6 +46,9 @@
// User-provided OkHttpClient factory
private static @Nullable OkHttpClientFactory sFactory;

private static Set<OkHttpClient> sClients = Collections.newSetFromMap(
new WeakHashMap<OkHttpClient, Boolean>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to the thought about the collection APIs being used here: probably good to add final. Generally good to use that almost everywhere it'd be correct to use.

};
final IntentFilter intentFilter = new IntentFilter();
intentFilter.addAction(ConnectivityManager.CONNECTIVITY_ACTION);
context.registerReceiver(br, intentFilter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at the doc for CONNECTIVITY_ACTION:
https://developer.android.com/reference/android/net/ConnectivityManager#CONNECTIVITY_ACTION

The doc says this is deprecated and there are better APIs -- but those require API level 24. So that's not happening soon.

Also, it looks like we'll get this if one network connection drops, even if there are others or if the system is attempting to connect to another. For this usage, I think that's fine. ... Oh, I see, the DISCONNECTED is a level-triggered state, not an edge-triggered event:
https://developer.android.com/reference/android/net/NetworkInfo.State
and if we're connecting to some other network, it'll be CONNECTING instead.

I wonder if we should do this also on CONNECTING -- for example, we might go CONNECTED -> CONNECTING -> CONNECTED, having switched to a different network, and the same bug will probably appear. The simple version would end up evicting twice in the case of CONNECTED -> DISCONNECTED -> CONNECTING -> CONNECTED, but that's probably fine -- there shouldn't be tons of clients, and evicting an empty connection pool on each one ought to be very cheap.

I'm also curious what SUSPENDED means, and wonder if we should be evicting on that too. In which case it's just state != NetworkInfo.State.CONNECTED.

In any case, I think a comment is in order explaining the logic here and the choices about when to do the eviction -- that will be helpful for anyone trying to understand it, including (a) a reviewer (me, and someone upstream), (b) someone debugging it in the future 😉 , (c) someone porting it in maybe 2023 to use the spiffy new APIs from today's recent Android versions. Maybe a block comment right above this function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious what SUSPENDED means, and wonder if we should be evicting on that too. In which case it's just state != NetworkInfo.State.CONNECTED.

According to this SO answer:

The connected state indicates that your phone is connected and should be able to access IP traffic. The suspended state indicates that your IP traffic is temporarily unavailable but you are still connected.
An example from the TelephonyManager is when you have access to a 2G network and you receive a phone call, the data traffic may be suspended.

I think we should opt for the state != NetworkInfo.State.CONNECTED. IP traffic unavailable -> possible interference with okhttp. Also, evicting too often isn't nearly as problematic as not evicting often enough, and given the above description, SUSPEND shouldn't occur often.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we could just trigger this on any NetworkInfo change event, that is, including CONNECT. After all, we don't know yet exactly which event causes okhttp to break. In my tests, it worked fine just evicting on DISCONNECT, but that was in my isolated emulator environment.

public static void addNetworkListenerToEvictClientConnectionsOnDisconnect(Context context) {
BroadcastReceiver br = new BroadcastReceiver() {
@Override
public void onReceive(Context context, Intent intent) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the doc for this method, it looks like we should be checking the intent is what we expect:

The Intent filters used in Context.registerReceiver(BroadcastReceiver, IntentFilter) and in application manifests are not guaranteed to be exclusive. They are hints to the operating system about how to find suitable recipients. It is possible for senders to force delivery to specific recipients, bypassing filter resolution. For this reason, onReceive() implementations should respond only to known actions, ignoring any unexpected Intents that they may receive.

NetworkInfo info = extras.getParcelable("networkInfo");
NetworkInfo.State state = info.getState();
if (state == NetworkInfo.State.DISCONNECTED) {
new EvictIdleConnectionsTask().execute();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you choose this approach, with an AsyncTask, for executing this work?

I know this is a pretty tricky area of Android programming, and I don't really know what the right way to do it is; so I'm curious what docs you relied on (if you found some that helped) or how else you chose this version.

Reading docs a bit (both the method's reference doc I linked above, and the guide, which tends to be a very helpful genre of Android documentation when it exists):

  • this method is called on the main thread, so we shouldn't do anything slow;
  • I can't tell from OkHttp's docs if this method might be slow -- seems like probably it shouldn't but I'm not sure;
  • there's a warning about async tasks, saying the process might get killed after the receiver returns. With a recommendation to use goAsync to avoid that... whose own doc says it was introduced in API level 11, so plenty ancient enough for RN.

So I think probably the async isn't necessary, but it may be wise to do it anyway to be sure; but if we do, we should probably use goAsync.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If connectionPool().evictAll() is called on the UI thread, a NetworkOnMainThreadException is thrown. Android wants you to execute any networking operations in the background. I chose AsyncTask because the internet told me to (some high-voted SO answer used it and another blog post for which I cannot find the link anymore). Looking a bit more into this, it seems like AsyncTask -can potentially cause trouble if it's performing a longer operation, because it might get terminated when the activity changes (e.g. screen rotation). That shouldn't be a problem because evictAll is fast, but goAsync seems definitely better since it's built for BroadcastReceivers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, reading a bit more carefully, the guide you linked goes:

However, once your code returns from onReceive(), the BroadcastReceiver is no longer active. The receiver's host process becomes only as important as the other app components that are running in it. If that process hosts only a manifest-declared receiver (a common case for apps that the user has never or not recently interacted with), then upon returning from onReceive(), the system considers its process to be a low-priority process and may kill it to make resources available for other more important processes.

Our receiver is not manifest-declared (manifest-declared receivers don't receive CONNECTIVITY_ACTIONs in recent API versions). Rather, it's initialized in a function called from the UI thread. So our receiver's host process would probably still be treated as important as the UI thread. Furthermore, goAsync seems to have its own bag of gotcha's:

Keep in mind that the work you do here will block further broadcasts until it completes, so taking advantage of this at all excessively can be counter-productive and cause later events to be received more slowly.

Again, evictAll is fast so this shouldn't be an issue. But these things make the decision between goAsync and AsyncTask seem more like a 50/50 trade-off. I'll leave it as AsyncTask for now unless you think otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Replied in main thread.)

else {
client = createClientBuilder().build();
}
sClients.add(client);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My inclination would actually be to stick this logic in createClientBuilder, rather than here. That leaves apps with full flexibility to drop or customize this logic if they choose to set their own factory.

An extra friendly version of that approach would be to pull out this line sClients.add(client) into a public static method, like enableTls12OnPreLollipop. That way, such an app can call that method if they want to opt back into this feature of OkHttpClientProvider.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this in createClientBuilder because that returns a client builder, not a client. However, flexibility is a good point -- I'll drop the sClients.add for replaceOkHttpClient and createClient with an sFactory, and put sClients.add(client); into a helper method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this in createClientBuilder because that returns a client builder, not a client.

Hmm, indeed. I guess I was misled by the variable and parameter names -- it gets called client here and in passing to enableTls12OnPreLollipop.

This other solution is good.

@@ -50,6 +51,7 @@ protected ReactActivityDelegate createReactActivityDelegate() {
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
mDelegate.onCreate(savedInstanceState);
OkHttpClientProvider.addNetworkListenerToEvictClientConnectionsOnDisconnect(getApplicationContext());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name strikes me as very Java. 😛

(Which is not a complaint! We are after all writing Java here. The name is good and clear.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rereading this name, I do actually have a suggestion, though: s/ClientConnections/IdleConnections/. That more closely reflects what that evictAll method does (according to OkHttp's docs), and might avoid someone worrying that this cure could be causing its own disease by closing active connections that might have successfully finished their requests with a bit of patience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, though now that you've added the cancelAll, this new name isn't accurate -- in fact it sounds too innocent, which is dangerous.

Maybe s/EvictIdleConnections/DropConnections/g? It's a bit more vague, leaving room for the specifics of what goes inside that sClients for-loop to vary. And appropriately scary. :-p

@gnprice
Copy link
Member

gnprice commented Jun 22, 2018

Oh, and:

This code requires

Hmm. This is a pretty mild requirement, which probably any app that uses the network will want anyway, so I don't think it should stand in the way of this solution -- but it looks like that isn't something that RN requires in general. So we'll probably need to find a way to make this functionality degrade gracefully if this permission is missing.

Part of how I determined RN doesn't require this in general is I grepped for android.permission, and found among other things an error string in NetInfoModule.java telling you to add that permission if you want to use that module. That code is using a different API, though (the point-in-time getActiveNetworkInfo), so the way the permission failure manifests may be different here from there.

For this logic, because it's a workaround for a bug, I think it's fine if this logic just doesn't do anything if the permission is missing. So we just need to make sure it does that without blowing up.

@roberthoenig roberthoenig force-pushed the v0.55.4-custom branch 2 times, most recently from bff6301 to d44c384 Compare June 25, 2018 14:37
@roberthoenig
Copy link
Author

roberthoenig commented Jun 25, 2018

Updated! Thanks @gnprice for the Android insights; things are more complex than I expected them to be.

Here is a diff with the new changes since your last comments:

diff --git a/ReactAndroid/src/main/java/com/facebook/react/ReactActivity.java b/ReactAndroid/src/main/java/com/facebook/react/ReactActivity.java
index 779de56bcf..afe65b7fcd 100644
--- a/ReactAndroid/src/main/java/com/facebook/react/ReactActivity.java
+++ b/ReactAndroid/src/main/java/com/facebook/react/ReactActivity.java
@@ -9,9 +9,12 @@ package com.facebook.react;
 
 import javax.annotation.Nullable;
 
+import android.Manifest;
 import android.app.Activity;
 import android.content.Intent;
+import android.content.pm.PackageManager;
 import android.os.Bundle;
+import android.support.v4.content.ContextCompat;
 import android.view.KeyEvent;
 
 import com.facebook.react.modules.core.DefaultHardwareBackBtnHandler;
@@ -51,7 +54,10 @@ public abstract class ReactActivity extends Activity
   protected void onCreate(Bundle savedInstanceState) {
     super.onCreate(savedInstanceState);
     mDelegate.onCreate(savedInstanceState);
-    OkHttpClientProvider.addNetworkListenerToEvictClientConnectionsOnDisconnect(getApplicationContext());
+    if (ContextCompat.checkSelfPermission(this, Manifest.permission.ACCESS_NETWORK_STATE)
+        == PackageManager.PERMISSION_GRANTED) {
+      OkHttpClientProvider.addNetworkListenerToEvictIdleConnectionsOnNetworkChange(getApplicationContext());
+    }
   }
 
   @Override
diff --git a/ReactAndroid/src/main/java/com/facebook/react/modules/network/OkHttpClientProvider.java b/ReactAndroid/src/main/java/com/facebook/react/modules/network/OkHttpClientProvider.java
index e84e033dde..edcb8e383c 100644
--- a/ReactAndroid/src/main/java/com/facebook/react/modules/network/OkHttpClientProvider.java
+++ b/ReactAndroid/src/main/java/com/facebook/react/modules/network/OkHttpClientProvider.java
@@ -46,7 +46,7 @@ public class OkHttpClientProvider {
   // User-provided OkHttpClient factory
   private static @Nullable OkHttpClientFactory sFactory;
 
-  private static Set<OkHttpClient> sClients = Collections.newSetFromMap(
+  private final static Set<OkHttpClient> sClients = Collections.newSetFromMap(
     new WeakHashMap<OkHttpClient, Boolean>());
 
   public static void setOkHttpClientFactory(OkHttpClientFactory factory) {
@@ -70,14 +70,19 @@ public class OkHttpClientProvider {
     }
   }
 
-  public static void addNetworkListenerToEvictClientConnectionsOnDisconnect(Context context) {
-    BroadcastReceiver br = new BroadcastReceiver() {
+  /*
+    See https://github.com/facebook/react-native/issues/19709 for context.
+    We know that idle connections get corrupted when the connectivity state
+    changes to disconnected, but the debugging of this issue hasn't been
+    exhaustive and it's possible that other changes in connectivity also
+    corrupt idle connections. `CONNECTIVITY_ACTION`s occur infrequently
+    enough to go safe and evict all idle connections on every such action.
+   */
+  public static void addNetworkListenerToEvictIdleConnectionsOnNetworkChange(Context context) {
+    final BroadcastReceiver br = new BroadcastReceiver() {
       @Override
       public void onReceive(Context context, Intent intent) {
-        Bundle extras = intent.getExtras();
-        NetworkInfo info = extras.getParcelable("networkInfo");
-        NetworkInfo.State state = info.getState();
-        if (state == NetworkInfo.State.DISCONNECTED) {
+        if (intent.getAction().equals(ConnectivityManager.CONNECTIVITY_ACTION)) {
           new EvictIdleConnectionsTask().execute();
         }
       }
@@ -91,19 +96,18 @@ public class OkHttpClientProvider {
   // This allows app to init an OkHttpClient with custom settings.
   public static void replaceOkHttpClient(OkHttpClient client) {
     sClient = client;
-    sClients.add(client);
   }
 
   public static OkHttpClient createClient() {
-     OkHttpClient client;
     if (sFactory != null) {
-      client = sFactory.createNewNetworkModuleClient();
+      return sFactory.createNewNetworkModuleClient();
     }
     else {
-       client = createClientBuilder().build();
+      OkHttpClient client = createClientBuilder().build();
+      registerClientToEvictIdleConnectionsOnNetworkChange(client);
+      return client;
     }
-    sClients.add(client);
-    return client;
   }
 
   public static OkHttpClient.Builder createClientBuilder() {
@@ -117,6 +121,10 @@ public class OkHttpClientProvider {
     return enableTls12OnPreLollipop(client);
   }
 
+  public static void registerClientToEvictIdleConnectionsOnNetworkChange(OkHttpClient client) {
+    sClients.add(client);
+  }
+
   /*
     On Android 4.1-4.4 (API level 16 to 19) TLS 1.1 and 1.2 are
     available but not enabled by default. The following method

@roberthoenig roberthoenig force-pushed the v0.55.4-custom branch 2 times, most recently from 0f8c285 to c7258c3 Compare June 27, 2018 18:19
@roberthoenig
Copy link
Author

Updated again, this time to use client.dispatcher().cancelAll(); as well, since only that fully fixes the issue for zulip-mobile.

@gnprice
Copy link
Member

gnprice commented Jun 27, 2018

Thanks for all these changes! Reading in detail now.

Adapting here a remark I made in chat yesterday:

Still, given that we only do this [i.e., cancelAll] on disconnect, it shouldn't be too much of a problem.

Hmm. I think I'd be OK doing this as a workaround in our app; but if I were maintaining RN, this is a workaround I would be unhappy about taking upstream -- I might do it, but only if I really didn't have any confidence in getting a proper fix anytime soon. (And given how responsive the OkHttp folks have been and how competent they seem in general, I think we probably will get a proper fix before too long.)

That's because it's not so innocent a remedy. We might have just briefly lost connection, and be about to regain it. For some phones in some conditions, it may happen regularly that the connection drops out briefly -- and then for those users this remedy could be worse than the original issue. So for a platform like RN, which is widely used in diverse conditions by users the platform has no feedback channel from, it's a dicey change to take.

But for our app, we have a clearer picture of our users; we get a good stream of feedback from them so we can hope to hear about it if the remedy causes its own problems; and if so we can easily fix the problem by reverting and releasing. That makes it a much more palatable workaround for us.

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If connectionPool().evictAll() is called on the UI thread, a NetworkOnMainThreadException is thrown. Android wants you to execute any networking operations in the background.

OK -- that is certainly a good reason! :)

I chose AsyncTask because the internet told me to (some high-voted SO answer used it and another blog post for which I cannot find the link anymore).

Cool, makes sense.

Some general advice about SO: it's best treated much like Wikipedia, in that it can make you aware of interesting things but the authors are often mistaken and instead of relying on an answer being right you generally want to go read a primary source yourself. For programming and SO, that means upstream docs (like that guide, and the associated reference docs) and/or source code.

Furthermore, goAsync seems to have its own bag of gotcha's:

Keep in mind that the work you do here will block further broadcasts until it completes, so taking advantage of this at all excessively can be counter-productive and cause later events to be received more slowly.

Hmm, interesting question.

Here's the BroadcastReceiver source code:
https://android.googlesource.com/platform/frameworks/base/+/master/core/java/android/content/BroadcastReceiver.java

The first interesting thing there is goAsync:

    public final PendingResult goAsync() {
        PendingResult res = mPendingResult;
        mPendingResult = null;
        return res;
    }

That's the whole thing!

So what it seems to do is just sort of transfer ownership/responsibility for this PendingResult object. And skimming the rest of the file, the interesting guts are almost all actually in PendingResult; BroadcastReceiver itself is basically a thin shell around that.

Based on that, I think it's unlikely that goAsync actually causes anything to block on something it wasn't already blocking on -- instead, probably further broadcasts will block until the PendingResult is finished one way or another, and goAsync just makes the PendingResult last as long as the work we're doing does.

I think something a lot of us have learned from JavaScript in recent years is that "single-threadedness", i.e. each bit of code finishing something that should be fast and then explicitly yielding control at a point in the code the author explicitly chose, can be a pretty great way to structure an application. In that spirit, I think the behavior we get with goAsync is the right one here: we take the risk that evictAll is slow after all, but we basically transmute that risk from a correctness issue (what happens if we have these pile up, and randomly get around to killing connections much later?) into a performance one (we might be slow to get the next broadcast.)

(I can't seem to find a box to make this reply directly on that inline comment thread, so here it is. There's also a few small new comments below. Still reading the rest.)

@@ -54,7 +108,11 @@ public static OkHttpClient createClient() {
if (sFactory != null) {
return sFactory.createNewNetworkModuleClient();
}
return createClientBuilder().build();
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small style bit: No need for the else and the extra indentation.

The existing code here is following the "early return" pattern, which is often helpful for code like this: you get the exceptional cases out of the way with return, and then the main narrative of the function gets to continue at constant indentation. Which helps both in fitting on the screen and in reading it as a narrative, because it emphasizes that as the primary control path that the function is about.

final BroadcastReceiver br = new BroadcastReceiver() {
@Override
public void onReceive(Context context, Intent intent) {
if (intent.getAction().equals(ConnectivityManager.CONNECTIVITY_ACTION)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually another good use case for the early-return pattern: you can instead write

if (!...) {
  return;
}
Bundle extras = intent.getExtras();
...

That makes the narrative clearer, at least once you're accustomed to this pattern. (Which I think most people reading Java code will be.) Also saves a level of indentation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, never thought about it this way.

changes to disconnected, but the debugging of this issue hasn't been
exhaustive and it's possible that other changes in connectivity also
corrupt idle connections. `CONNECTIVITY_ACTION`s occur infrequently
enough to go safe and evict all idle connections on every such action.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- this is a great comment!

It doesn't seem to match the code in one respect: we do this on DISCONNECTED, CONNECTING, and SUSPENDED states, but not CONNECTED.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, yes, I readded the check for CONNECTED yesterday, but forgot to amend the comment.

@gnprice
Copy link
Member

gnprice commented Jun 27, 2018

I can't seem to find a box to make this reply directly on that inline comment thread, so here it is.

(FTR: the reply box is back now. It looks like the deal is that once I've started a "review", I can't reply to any existing review threads until I've submitted the "review". This is not a very good fit with real code-review workflows... oh GitHub.)

protected Object doInBackground(Object[] objects) {
for (OkHttpClient client: sClients) {
client.connectionPool().evictAll();
client.dispatcher().cancelAll();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps cancel first, then evict? It seems like evicting then canceling would mean we end up with just those connections that were doing something sitting around in the pool, now idle. If we're going to be so rough as to cancel active requests, we should probably make sure those connections get evicted too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, very good point!

@gnprice
Copy link
Member

gnprice commented Jun 27, 2018

Thanks again @roberthoenig ! Done reading the new version. Some comments above, but I think they're all fairly small at this point; this is converging quite well.

For sending this on upstream, I kind of think it might make sense to have two commits: one that does almost all of this, and then one that just adds the cancelAll line. If I understand the situation right:

  • the evictAll alone solves the problem in your test app
  • but cancelAll is needed to solve it in Zulip.

As long as the evictAll is known to be useful on its own, it's relatively easy for upstream to take, because it should be pretty harmless. But they'll also have the option to take both (or a squash of both) if they think that's warranted.

@roberthoenig
Copy link
Author

If I understand the situation right:
the evictAll alone solves the problem in your test app
but cancelAll is needed to solve it in Zulip.

Almost.

  • the evictAll alone solves the problem with a Zulip dev server.
  • but cancelAll is needed to solve it for chat.zulip.org.

I'm not sure if the underlying issue here is different http connection settings for dev and production version, or different traffic. I'd say probably the former.

@roberthoenig
Copy link
Author

In that spirit, I think the behavior we get with goAsync is the right one here: we take the risk that evictAll is slow after all, but we basically transmute that risk from a correctness issue (what happens if we have these pile up, and randomly get around to killing connections much later?) into a performance one (we might be slow to get the next broadcast.)

I'm not sure I can follow you. So first of all, this is what a goAsync solution would look like:

public void onReceive(Context context, Intent intent) {
...
final PendingResult result = goAsync();
Thread thread = new Thread() {
   public void run() {
      // Do processing
      result.finish();
   }
};
thread.start();
}

Afaic, this behavior:

what happens if we have these pile up, and randomly get around to killing connections much later?

can still happen with goAsync: If evictAll happens to be slow, the broadcast events will pile up, and be dispatched when we are already reconnected, again killing connections much later.

Still, I'll go Async if only for the reason that it's nice to have all this code in one big block, rather than a separate AsyncTask class...

…book#19709.

This bug is probably actually a bug in OkHttp: square/okhttp#4079
Both issues linked above contain extensive details about the issue,
its likely origins and how to reproduce it. A short summary of the
issue and the fix in this commit:

On Android, disconnecting from the network somehow corrupts the idle
connections and ongoing calls in okhttp clients. New requests made over
these clients fail. This commit works around that bug by evicting the idle
connection pool when we receive a DISCONNECTED or CONNECTING event (we
don't know yet if only one or both of them cause the issue). Technically,
to fully fix this issue, we would also need to cancel all ongoing calls.
However, cancelling all ongoing calls is aggressive, and not always desired
(when the app disconnects only for a short time, ongoing calls might still
succeed). In practice, just evicting idle connections results in this issue
occurring less often, so let's go with that for now.
…facebook#19709.

The previous commit partially fixed facebook#19709 by evicting all idle
connections on DISCONNECT and CONNECTING events. To fully fix facebook#19709, also
cancel all ongoing calls on these events. Cancel the ongoing calls
before evicting all idle connections, because cancelling the ongoing
calls can result in more idle connections.
@roberthoenig
Copy link
Author

Updated again :)

  • split evictAll and cancelAll into two commits
  • use goAsync
  • use early return

Let me know when I should open a PR at RN.

@roberthoenig
Copy link
Author

At this point, I'd also be happy about someone else trying out these changes on their own setup, confirm the bug, and confirm that this fixes it.

gnprice pushed a commit that referenced this pull request Aug 3, 2022
Summary:
In D30104853 (facebook@e35a963), we added fix to the issue where touches on the child view that is outside of its parent view's boundary are not registered. The only exception to that is if the parent view has overflow style `overflow: hidden`. In that case, touches happen outside of the parent view should be ignored.

{F686521911}

That fix works, but it increases the complexity for the DFS algorithm used to find the touch target in two ways:

1. Before we only traverse views that contain the touch event point. If the touch event point is outside of the view, we won't step in and traverse that part of the tree. This is actually what caused the initial problem. The fix removed that boundary check (where `isTransformedTouchPointInView` used to do) and push it later. This increases the number of tree traversal a lot.
2. The check for `overflow: hidden` is happened after we find a potential target view, which means we've spent time to find the target view before we decide if the target view is even a candidate.

This diff aims to update for the #2 item above. Since we are checking the style of the parent view, not the target view, it's not necessary to check that after we find the target view. We could check the parent view and if it has `overflow: hidden` on it, any pointer event that are not in the boundary can be ignored already.

Changelog:
[Internal][Android]

Reviewed By: mdvacca, ShikaSD

Differential Revision: D33079157

fbshipit-source-id: c79c2b38b8affb9ea0fd25b5e880b22466ab7ed9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants