LSLI skid initial development #5

jacobdadams · 2024-12-19T16:30:41Z

This is a manual-run-as-needed, non-GCP skid to load data for the Lead Service Line Inventory map Zach is putting together for the Division of Drinking Water.

I tried a more object-oriented approach to the Google Sheets part of the process, using a class to store data as instance variables. I was trying to avoid constantly returning the results of one step just to use them as the input to the next and to allow access to the lists/dicts of records that should be included in the status email.

I'd appreciate feedback on that design: whether it properly follows OOP paradigms/patterns, or whether there's a better way I should have done it.

stdavis

Looks nice!

.github/workflows/push.yml

steveoh · 2024-12-19T17:40:13Z

LICENSE

@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2022 UGRC
+Copyright (c) 2024 UGRC


I might suggest removing the year so we don't have to think about it.

This is an interesting rabbit hole to go down. Some say you only need the initial year, others that you should update it every year it's released, and Microsoft appears not to include a year at all. We should probably come to an office consensus on this.

GitHub adds the year by default if you choose the MIT template, which is probably why it was initially there. I've been removing it everywhere I notice it since I learned that the year is not required. This sums it up pretty well: https://opensource.stackexchange.com/a/5779. I'm open to whatever, but my preference is to remove it.

README.md

src/lsli/config.py

steveoh · 2024-12-19T17:48:55Z

src/lsli/main.py

+        )
+
+        #: Strip off trailing digits for any zipcodes in ZIP+4 format
+        spatial_records["pws_zipcode"] = spatial_records["pws_zipcode"].astype(str).str[:5].astype("Int64")


Not that memory matters much here, but a 5-digit number can be represented with fewer bits as an Int32.
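A minimal sketch of the dtype suggestion, assuming made-up sample ZIP codes (none of these values come from the skid's data). The largest 5-digit value, 99999, is too big for Int16 (max 32,767), so Int32 is the smallest nullable integer type that fits. `pd.to_numeric` is swapped in here for the string-to-integer step; it is not what the diff uses.

```python
import pandas as pd

# Hypothetical sample values, including two in ZIP+4 format
zipcodes = pd.Series(["84114-1234", "84101", "84770-0001"])

# Keep the first five characters, then store as a nullable 32-bit integer
five_digit = pd.to_numeric(zipcodes.astype(str).str[:5]).astype("Int32")

print(five_digit.dtype)
print(five_digit.tolist())
```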

steveoh · 2024-12-19T17:54:05Z

src/lsli/main.py

+        module_logger.debug("Loading %s rows with WGS84 coordinates", format(len(wgs_data), ","))
+        wgs_spatial = pd.DataFrame.spatial.from_xy(wgs_data, "longitude", "latitude", sr=4326)
+        module_logger.debug("Projecting WGS84 data to Web Mercator")
+        wgs_spatial.spatial.project(3857)
+        web_mercator_dfs.append(wgs_spatial)
+
+    utm_data = df[df["latitude"] > 100]
+    if not utm_data.empty:
+        module_logger.debug("Loading %s rows with UTM coordinates", format(len(utm_data), ","))
+        utm_spatial = pd.DataFrame.spatial.from_xy(utm_data, "longitude", "latitude", sr=26912)
+        module_logger.debug("Projecting UTM data to Web Mercator")
+        utm_spatial.spatial.project(3857)
+        web_mercator_dfs.append(utm_spatial)


Do either of these need a transformation method applied? I think Sean chose NAD_1983_To_WGS_1984_5 for UTM to Web Mercator.

steveoh · 2024-12-19T18:01:09Z

src/lsli/main.py

+    final_systems = pd.DataFrame()
+
+    missing_geometries = {}
+    invalid_pwsids = []


Python is weird about mutable values assigned in the class body, outside of a constructor: they become class attributes shared by every instance, which can lead to unexpected shared state and bugs between instances of this class. Generally speaking, it's safer to initialize these lists and dicts in __init__ if they're not intended to be shared.

Maybe this example will describe the issue a bit better

    class MyClass:
        my_list = []  # Initialized outside the constructor

        def __init__(self):
            pass

    obj1 = MyClass()
    obj2 = MyClass()

    obj1.my_list.append(1)

    print(obj1.my_list)  # Output: [1]
    print(obj2.my_list)  # Output: [1] - Unexpected!
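For comparison, here is a rough sketch of the per-instance version. The class name `SheetState` and the sample ID are hypothetical and not taken from the PR; only the attribute names `missing_geometries` and `invalid_pwsids` come from the diff above.

```python
class SheetState:
    """Hypothetical stand-in for the class under review."""

    def __init__(self):
        # Created fresh for every instance, so nothing is shared
        self.missing_geometries = {}
        self.invalid_pwsids = []


first = SheetState()
second = SheetState()

first.invalid_pwsids.append("UTAH00000")  # made-up ID for illustration

print(first.invalid_pwsids)   # only the first instance holds the ID
print(second.invalid_pwsids)  # the second instance's list is still empty
```

If only one instance is ever created per run, the class-level version behaves the same in practice, but the constructor version stays safe if that ever changes.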

steveoh · 2024-12-19T18:11:50Z

src/lsli/main.py

+
+        module_logger.debug("Loading system area geometries from %s...", service_areas_service_url)
+        water_service_areas = arcgis.features.FeatureLayer(service_areas_service_url).query(as_df=True)
+        self.cleaned_water_service_areas = water_service_areas[water_service_areas["DWSYSNUM"] != " "].copy()


This seems a little fragile. Would it be safer to trim the column and check for an empty string, or convert invalid values to NA? I assume this one-space convention is working now, but I wonder how consistent it will stay.
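One possible shape for the trim-and-check idea, as a sketch only. The sample frame and its ID values are invented for illustration; the real data comes from the feature layer query in the diff, and only the `DWSYSNUM` column name is taken from it.

```python
import pandas as pd

# Hypothetical rows: valid IDs mixed with blank, whitespace-only, and missing values
water_service_areas = pd.DataFrame(
    {"DWSYSNUM": ["UTAH00001", " ", "", "   ", None, "UTAH00002"]}
)

# Trim whitespace, then keep only rows that still have a non-empty, non-null ID
stripped = water_service_areas["DWSYSNUM"].str.strip()
has_id = stripped.notna() & (stripped != "")
cleaned_water_service_areas = water_service_areas[has_id].copy()

print(cleaned_water_service_areas["DWSYSNUM"].tolist())
```

This treats a single space, multiple spaces, an empty string, and a null the same way, so the filter keeps working if the upstream convention drifts.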

steveoh

I think your abstractions look good. Nice job 🐍

jacobdadams added 16 commits November 21, 2024 10:26

ci: no gcp needed

c8e6ceb

chore: initial setup

0b26a47

chore: forgot to save

fe2993c

chore: remove black, other config

723765d

feat: load points from graphql endpoint

e481540

feat: load service area data from gsheet

954c80e

refactor: separate sheet loading and cleaning

e4fca0f

chore: check and report invalid PWSIDs

cf9a592

feat: load links to interactive maps

9dafc6e

chore: update secrets template, page size

bc97c92

chore: adding logging

6ebba6f

fix: time is not an allowable agol field name

0a83096

chore: live areas itemid

2d9388c

docs: readme

e4b498f

fix: empty rows in links sheet

59e1499

chore: comments

a667738

jacobdadams requested review from steveoh and stdavis December 19, 2024 16:30

jacobdadams added 2 commits December 19, 2024 09:40

chore: missing dep

80ef0f8

ci: pre-emptive dbot

bb37918

stdavis approved these changes Dec 19, 2024


steveoh reviewed Dec 19, 2024


steveoh approved these changes Dec 19, 2024
