-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LSLI skid initial development #5
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice!
@@ -1,6 +1,6 @@ | |||
MIT License | |||
|
|||
Copyright (c) 2022 UGRC | |||
Copyright (c) 2024 UGRC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i might suggest removing the year to not have to think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting rabbit hole to go down. Some say you only need the initial year, others that you should update every year its released, and Microsoft appears not to include a year at all. We should probably come to an office consensus on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github adds the year by default if you choose the mit template which is probably why it was initially there. I've been removing it everywhere I notice it since I learned that the year was not required. This sums it up pretty well. https://opensource.stackexchange.com/a/5779. I'm open to whatever but my preference is to remove it.
) | ||
|
||
#: Strip off trailing digits for any zipcodes in ZIP+4 format | ||
spatial_records["pws_zipcode"] = spatial_records["pws_zipcode"].astype(str).str[:5].astype("Int64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not that memory matters much here, but a 5 digit number can be represented by less bits in a int32
module_logger.debug("Loading %s rows with WGS84 coordinates", format(len(wgs_data), ",")) | ||
wgs_spatial = pd.DataFrame.spatial.from_xy(wgs_data, "longitude", "latitude", sr=4326) | ||
module_logger.debug("Projecting WGS84 data to Web Mercator") | ||
wgs_spatial.spatial.project(3857) | ||
web_mercator_dfs.append(wgs_spatial) | ||
|
||
utm_data = df[df["latitude"] > 100] | ||
if not utm_data.empty: | ||
module_logger.debug("Loading %s rows with UTM coordinates", format(len(utm_data), ",")) | ||
utm_spatial = pd.DataFrame.spatial.from_xy(utm_data, "longitude", "latitude", sr=26912) | ||
module_logger.debug("Projecting UTM data to Web Mercator") | ||
utm_spatial.spatial.project(3857) | ||
web_mercator_dfs.append(utm_spatial) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do either of these need a transformation method applied? I think Sean chose NAD_1983_To_WGS_1984_5
for utm to web mercator.
final_systems = pd.DataFrame() | ||
|
||
missing_geometries = {} | ||
invalid_pwsids = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python is weird about initializing default values to mutable data outside of a constructor. This can lead to unexpected shared memory and bugs between instances of this class. It's generally speaking safer to initialize this array in the init if it's not intended to be shared.
Maybe this example will describe the issue a bit better
class MyClass:
my_list = [] # Initialized outside the constructor
def __init__(self):
pass
obj1 = MyClass()
obj2 = MyClass()
obj1.my_list.append(1)
print(obj1.my_list) # Output: [1]
print(obj2.my_list) # Output: [1] - Unexpected!
|
||
module_logger.debug("Loading system area geometries from %s...", service_areas_service_url) | ||
water_service_areas = arcgis.features.FeatureLayer(service_areas_service_url).query(as_df=True) | ||
self.cleaned_water_service_areas = water_service_areas[water_service_areas["DWSYSNUM"] != " "].copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a little fragile. Would it be safer to trim the column and check for an empty string or convert invalids to na? I assume this one space convention is working now but I wonder how consistent it will stay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your abstractions look good. nice job 🐍
This is a manual-run-as-needed, non-GCP skid to load data for the Lead Service Line Inventory map Zach is putting together for the Division of Drinking Water.
I tried a more object-oriented approach to the Google Sheets part of the process, using a class to store data as instance variables. I was trying to avoid constantly returning the results of one step just to use them as the input to the next and to allow access to the lists/dicts of records that should be included in the status email.
I'd appreciate feedback on that design- if its properly following OOP paradigms/patterns or if there's a better way I should have done it.