Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OutOfMemoryError while opening a form with choices loaded from an external csv file #6446

Open
grzesiek2010 opened this issue Oct 9, 2024 · 6 comments
Assignees

Comments

@grzesiek2010
Copy link
Member

grzesiek2010 commented Oct 9, 2024

ODK Collect version

v2024.2.1 and newer versions

Device used

Most commonly OnePlus, Lenovo, Oppo, but also others that might appear more reliable, such as Samsung.

Problem description

We've recently received numerous reports in Crashlytics regarding OutOfMemoryError occurrences during form loading. See the following reports:

Fatal Exception: java.lang.OutOfMemoryError: Failed to allocate a 16 byte allocation with 1377288 free bytes and 1345KB until OOM, target footprint 536870912, growth limit 536870912; giving up on allocation because <1% of heap free after GC.
       at org.javarosa.core.model.instance.CsvExternalInstance.parse(CsvExternalInstance.java:36)
       at org.javarosa.xform.parse.ExternalInstanceParser.parse(ExternalInstanceParser.java:40)
       at org.javarosa.xform.util.XFormUtils.getExternalInstance(XFormUtils.java:183)
       at org.javarosa.core.model.instance.ExternalDataInstance.parseExternalInstance(ExternalDataInstance.java:81)
       at org.javarosa.core.model.instance.ExternalDataInstance.readExternal(ExternalDataInstance.java:117)
       at org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:183)
       at org.javarosa.core.util.externalizable.ExtWrapBase.readExternal(ExtWrapBase.java:56)
       at org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:217)
       at org.javarosa.core.util.externalizable.ExtWrapTagged.readExternal(ExtWrapTagged.java:64)
       at org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:217)
       at org.javarosa.core.util.externalizable.ExtWrapMap.readExternal(ExtWrapMap.java:127)
       at org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:217)
       at org.javarosa.core.model.FormDef.readExternal(FormDef.java:1029)

These errors occur while reading external CSV attachments using CsvExternalInstance, which suggests they are triggered when forms use large choice lists loaded from external files (not through search/pulldata as initially suspected, @lognaturel). The stack trace rules out forms with entities as well, which would use a different parser (LocalEntitiesExternalInstanceParserFactory).

The code responsible for reading CSV files (CsvExternalInstance) was recently updated:

However, I don't believe these changes could significantly impact memory usage, feel free to correct me if I'm mistaken, @seadowg.

I've also tested forms used last year when memory leaks were an issue (see this fix), and I don't see any major differences.

After thoroughly investigating, my conclusion is that this isn't a new issue. We likely encountered similar errors before, but they were previously caught and didn't appear as crashes. This changed in v2024.1, where we decided to let errors crash the app during form loading. While this isn't a new problem, we shouldn't ignore it, though I don't think it needs more immediate attention in the current release cycle.

Steps to reproduce the problem

This issue may be reproducible by opening a form with a large list of choices loaded from an external file on a device with limited resources. However, I have not been able to reproduce it.

Other information

One interesting aspect of these crash reports is that about 50% of them include an additional exception thrown while loading a form from the cache.

W/null: org.javarosa.core.util.externalizable.CannotCreateObjectException: 
    org.odk.collect.android.dynamicpreload.DynamicPreloadExtra: not instantiable at 
    org.javarosa.core.util.externalizable.PrototypeFactory.getInstance(PrototypeFactory.java:130) at 
    org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:182) at 
    org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:177) at 
    org.javarosa.core.util.externalizable.ExtWrapExternalizable.readExternal(ExtWrapExternalizable.java:37) at 
    org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:217) at 
    org.javarosa.core.util.externalizable.ExtWrapList.readExternal(ExtWrapList.java:80) at 
    org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:217) at 
    org.javarosa.core.util.externalizable.ExternalizableExtras.readExternal(ExternalizableExtras.java:16) at 
    org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:183) at 
    org.javarosa.core.util.externalizable.ExtUtil.read(ExtUtil.java:177) at 
    org.javarosa.core.model.FormDef.readExternal(FormDef.java:1045) at

I've also looked into this and have some thoughts, but I'll file a separate issue for it since I don't believe it's directly related. While it might have a slight impact on memory usage, it shouldn't make a significant difference.

@lognaturel
Copy link
Member

not through search/pulldata as initially suspected, @lognaturel

The possibility I was thinking of is CSVs intended for search/pulldata parsing being read into memory instead or additionally, for example if forms weren't correctly being identified as needing search/pulldata parsing.

@grzesiek2010
Copy link
Member Author

The possibility I was thinking of is CSVs intended for search/pulldata parsing being read into memory instead or additionally, for example if forms weren't correctly being identified as needing search/pulldata parsing.

I don't see anything like that, and even if it were true, I don't believe it would significantly impact memory usage because:

  1. The code that loads these CSV files is highly efficient and doesn't consume much memory.
  2. Loading CSV files occurs after reading the form from file or cache, so the crashes happen before attempting to load the CSV files. It’s possible that we could load a form, read the cache, and then improperly load CSV files in a way that consumes a lot of memory, leading to insufficient memory when opening another form, but that doesn't seem to be the case either.

@lognaturel
Copy link
Member

The code that loads these CSV files is highly efficient and doesn't consume much memory.

It's not the parsing that I'm worried about, it's the actually having it all in memory. As we saw with the 100k Entities form, modern low-end devices like my Samsung A13 don't have enough memory to represent even mid-sized choice lists. If a CSV was supposed to be parsed into a db for search/pulldata but was instead being read to memory, that would be a serious problem. Good to know that doesn't seem to be happening but it would be good to align our mental models on where memory gets used.

Loading CSV files occurs after reading the form from file or cache

Doesn't it happen during the process of reading the form from file or cache? Regardless, we know for sure that the crash happens while loading a CSV into memory, right? Otherwise we wouldn't be in CsvExternalInstance.parse so I'm not understanding your second point.

@grzesiek2010
Copy link
Member Author

Good to know that doesn't seem to be happening

Yeah I don't see anything like that.

Doesn't it happen during the process of reading the form from file or cache?

I was taking about CSVs used by search/pulldata. First we createFormDefFromCacheOrXml and then load CSVs a few lines below: ExternalDataUseCases.create

Regardless, we know for sure that the crash happens while loading a CSV into memory, right?

Yes and to be clear here it's about CVSs with choices but not these used by search/pulldata. The app crashes in CsvExternalInstance where we read a CSV file row by to and build a TreeElement: https://github.com/getodk/javarosa/blob/master/src/main/java/org/javarosa/core/model/instance/CsvExternalInstance.java#L27

@lognaturel
Copy link
Member

When pulldata is used (but not search), pyxform includes an external secondary instance declaration. That means there isn't really that strong of a distinction between the different ways of attaching CSVs. There needs to be (correct) code to make sure that for any given declared secondary instance, it's only loaded by the search/pulldata code or as a secondary instance, not both.

There have been moments in time in the past where both happened in some cases (see getodk/javarosa#417, #5620 which is why the "extra" was introduced) so the situation I was describing is not theoretical.

Now that I've refreshed my memory on that extra, I remember that it's used to so that the search/pulldata dbs are only populated if it's true so I agree with you that it's very unlikely to be related to the memory issue.

@grzesiek2010
Copy link
Member Author

As I remember I used the forms from #5780 (comment) for testing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: not ready
Development

No branches or pull requests

2 participants