Commit

Remove cache from version control and create caching module #27
aazuspan committed Jul 15, 2023
1 parent 2fc5ce5 commit 5ba73c1
Showing 5 changed files with 82 additions and 75 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -8,4 +8,6 @@ __pycache__/
htmlcov/
dist/
*.egg-info/
.tox/
.tox/

tests/data/
15 changes: 9 additions & 6 deletions CONTRIBUTING.md
@@ -58,20 +58,23 @@ hatch run test:all -k feature

### Building New Tests

New features should have unit tests. To avoid having to use `getInfo` every time a test is run against a client-side Earth Engine object, `eerepr` uses a caching function `tests.test_html.load_info` to load data. This function takes an Earth Engine object and either 1) retrieves it from the local cache in `tests/data/data.json` if it has been used before, or 2) retrieves it from the server and adds it to the cache. Objects in the cache use their serialized form as an identifying key.
New features should have unit tests. If your test needs to use `getInfo` to retrieve data from an Earth Engine object, you'll need to use the caching system described below.

To demonstrate, let's write a new dummy test that uses a custom `ee.Image`.
Using `getInfo` to retrieve data from an Earth Engine object can be slow and network-dependent. To speed up tests, `eerepr` uses a caching function `tests.cache.get_info` to load data. This function takes an Earth Engine object and either 1) retrieves its info from a local cache file if it has been used before, or 2) retrieves it from the server and adds it to the cache. The cache directory and file (`tests/data/data.json`) will be created automatically the first time tests are run.

To demonstrate, let's write a new dummy test that checks the properties of a custom `ee.Image`.

```python
from tests.test_html import load_info
from tests.cache import get_info

def test_my_image():
    img = ee.Image.constant(42).set("custom_property", ["a", "b", "c"])
    info = load_info(img)
    # Use `get_info` instead of `img.getInfo` to utilize the cache
    info = get_info(img)

    assert info
    assert "custom_property" in info["properties"]
```

The first time the test is run, `getInfo` will be used to retrieve the image metadata and store it in `tests/data/data.json`. Subsequent runs will pull the data directly from the cache.

When you add a new test, be sure to commit the updated data cache.
Caches are kept locally and are not version-controlled, so there's no need to commit newly added objects.
36 changes: 36 additions & 0 deletions tests/cache.py
@@ -0,0 +1,36 @@
import json
import os

CACHE_DIR = "./tests/data/"
CACHE_PATH = CACHE_DIR + "data.json"


def get_info(obj):
    """Load client-side info for an Earth Engine object.

    Info is retrieved (if available) from a local JSON file using the serialized
    object as the key. If the data does not exist locally, it is loaded from Earth
    Engine servers and stored for future use.
    """
    serialized = obj.serialize()

    if not os.path.isdir("./tests/data"):
        os.mkdir("./tests/data")

    try:
        with open("./tests/data/data.json") as src:
            existing_data = json.load(src)

    # File is missing or unreadable
    except (FileNotFoundError, json.JSONDecodeError):
        existing_data = {}
        with open("./tests/data/data.json", "w") as dst:
            json.dump(existing_data, dst)

    # File exists, but info does not
    if serialized not in existing_data:
        with open("./tests/data/data.json", "w") as dst:
            existing_data[serialized] = obj.getInfo()
            json.dump(existing_data, dst, indent=2)

    return existing_data[serialized]
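
Not part of this commit: the read-through caching pattern in `get_info` can be exercised without Earth Engine credentials. The sketch below is a minimal illustration under stated assumptions. `StubObject` and `get_info_sketch` are hypothetical names, the stub only mimics the `serialize` and `getInfo` methods that `get_info` calls, and the cache path is passed in as an argument rather than hardcoded. It also calls `getInfo` before opening the file for writing, which is one possible way to avoid leaving an empty cache file behind if the request fails.

```python
import json
import os
import tempfile


class StubObject:
    """Hypothetical stand-in for an Earth Engine object (not part of `ee`)."""

    def __init__(self, key, info):
        self._key = key
        self._info = info
        self.calls = 0  # counts simulated server round trips

    def serialize(self):
        return self._key

    def getInfo(self):
        self.calls += 1
        return self._info


def get_info_sketch(obj, cache_path):
    """Illustrative variant of `get_info` with a configurable cache path."""
    serialized = obj.serialize()

    # Create the cache directory if the path has one and it is missing
    cache_dir = os.path.dirname(cache_path)
    if cache_dir:
        os.makedirs(cache_dir, exist_ok=True)

    try:
        with open(cache_path) as src:
            existing_data = json.load(src)
    # File is missing or unreadable
    except (FileNotFoundError, json.JSONDecodeError):
        existing_data = {}

    if serialized not in existing_data:
        # Fetch first, then write, so a failed fetch never truncates the file
        existing_data[serialized] = obj.getInfo()
        with open(cache_path, "w") as dst:
            json.dump(existing_data, dst, indent=2)

    return existing_data[serialized]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data", "data.json")
    stub = StubObject(
        '{"constant": 42}',
        {"properties": {"custom_property": ["a", "b", "c"]}},
    )
    first = get_info_sketch(stub, path)   # cache miss: calls stub.getInfo()
    second = get_info_sketch(stub, path)  # cache hit: read from the JSON file
```

Here the second call is served from the temporary JSON file, so the stub's `getInfo` runs only once across both calls.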
1 change: 0 additions & 1 deletion tests/data/data.json

This file was deleted.
