Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSLI skid initial development #5

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

LSLI skid initial development #5

wants to merge 18 commits into from

Conversation

jacobdadams
Copy link
Member

This is a manual-run-as-needed, non-GCP skid to load data for the Lead Service Line Inventory map Zach is putting together for the Division of Drinking Water.

I tried a more object-oriented approach to the Google Sheets part of the process, using a class to store data as instance variables. I was trying to avoid constantly returning the results of one step just to use them as the input to the next and to allow access to the lists/dicts of records that should be included in the status email.

I'd appreciate feedback on that design- if its properly following OOP paradigms/patterns or if there's a better way I should have done it.

Copy link
Member

@stdavis stdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice!

.github/workflows/push.yml Show resolved Hide resolved
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2022 UGRC
Copyright (c) 2024 UGRC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i might suggest removing the year to not have to think about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting rabbit hole to go down. Some say you only need the initial year, others that you should update every year its released, and Microsoft appears not to include a year at all. We should probably come to an office consensus on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github adds the year by default if you choose the mit template which is probably why it was initially there. I've been removing it everywhere I notice it since I learned that the year was not required. This sums it up pretty well. https://opensource.stackexchange.com/a/5779. I'm open to whatever but my preference is to remove it.

README.md Show resolved Hide resolved
)

#: Strip off trailing digits for any zipcodes in ZIP+4 format
spatial_records["pws_zipcode"] = spatial_records["pws_zipcode"].astype(str).str[:5].astype("Int64")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that memory matters much here, but a 5 digit number can be represented by less bits in a int32

Comment on lines +276 to +288
module_logger.debug("Loading %s rows with WGS84 coordinates", format(len(wgs_data), ","))
wgs_spatial = pd.DataFrame.spatial.from_xy(wgs_data, "longitude", "latitude", sr=4326)
module_logger.debug("Projecting WGS84 data to Web Mercator")
wgs_spatial.spatial.project(3857)
web_mercator_dfs.append(wgs_spatial)

utm_data = df[df["latitude"] > 100]
if not utm_data.empty:
module_logger.debug("Loading %s rows with UTM coordinates", format(len(utm_data), ","))
utm_spatial = pd.DataFrame.spatial.from_xy(utm_data, "longitude", "latitude", sr=26912)
module_logger.debug("Projecting UTM data to Web Mercator")
utm_spatial.spatial.project(3857)
web_mercator_dfs.append(utm_spatial)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do either of these need a transformation method applied? I think Sean chose NAD_1983_To_WGS_1984_5 for utm to web mercator.

final_systems = pd.DataFrame()

missing_geometries = {}
invalid_pwsids = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python is weird about initializing default values to mutable data outside of a constructor. This can lead to unexpected shared memory and bugs between instances of this class. It's generally speaking safer to initialize this array in the init if it's not intended to be shared.

Maybe this example will describe the issue a bit better

class MyClass:
    my_list = []  # Initialized outside the constructor

    def __init__(self):
        pass

obj1 = MyClass()
obj2 = MyClass()

obj1.my_list.append(1)

print(obj1.my_list)  # Output: [1]
print(obj2.my_list)  # Output: [1] - Unexpected!


module_logger.debug("Loading system area geometries from %s...", service_areas_service_url)
water_service_areas = arcgis.features.FeatureLayer(service_areas_service_url).query(as_df=True)
self.cleaned_water_service_areas = water_service_areas[water_service_areas["DWSYSNUM"] != " "].copy()
Copy link
Member

@steveoh steveoh Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a little fragile. Would it be safer to trim the column and check for an empty string or convert invalids to na? I assume this one space convention is working now but I wonder how consistent it will stay.

Copy link
Member

@steveoh steveoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your abstractions look good. nice job 🐍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants