Quickly Generating Geodataset from *many* scenes #2191
Replies: 3 comments 3 replies
-
Hi, I've noticed datasets taking a few seconds to build for remote files (which is slow enough as it is) but never dozens of seconds. Currently, Questions:
Another solution would be to pre-chip the dataset and then use
Thank you very much for the kind words!
-
This is very helpful! Thank you!

My idea for speeding things up was that I should be able to crop the image using the extent of the AOI and the bbox parameter for RasterioSource, and then only generate windows for that small area (1 km x 1 km) instead of for each massive image (15 km x 100 km). This seems to work really well for unlabeled data. Previously, using the from_uris constructor, it took over 2 hours to create the dataset from our 6300 AOIs across our survey area. Cropping the images first let us do it in less than 10 minutes!

The problem comes when I try the same approach using GeoJSON labels for chip classification. I'm pretty sure there's something simple I'm missing, but I keep getting a KeyError when I try to query the label source. Would you be willing to look at this and help me figure out what I'm missing? I've been fighting it for a couple of days with no luck.

```python
from rastervision.core.data import (
    ChipClassificationLabelSource,
    ChipClassificationLabelSourceConfig,
    ClassConfig,
    ClassInferenceTransformer,
    GeoJSONVectorSource,
    RasterioCRSTransformer,
    RasterioSource,
)

image_path = "data/Analysis_Imagery/Region01/Full/506412069050_Ortho_Bundle_Mosaic_8bit.TIF"
aoi_path = "data/AOIs/Region01/Training/imageid_506412069050.geojson"
label_path = "data/Labels/Region01/imageid_506412069050.geojson"

# Create the class config
class_config = ClassConfig(
    names=['nothing', 'object'],
    colors=['lightgray', 'lightblue'],
    null_class='nothing')

# Create a crs_transformer from the image
crs_transformer = RasterioCRSTransformer.from_uri(image_path)

# Create an extent to clip to from the AOI
aoiSource = GeoJSONVectorSource(aoi_path, crs_transformer)
myextent = aoiSource.extent

# Clip the image to the extent of the AOI. This means chip windows will
# only be created within the bounds of the AOI extent.
rasterSource = RasterioSource(
    image_path,            # path to the image
    allow_streaming=True,  # so we don't have to load the whole image
    bbox=myextent)

# Create the AOI
aoiSource = GeoJSONVectorSource(
    aoi_path, rasterSource.crs_transformer, bbox=rasterSource.bbox)

# If there are labels, import them as a GeoJSONVectorSource, clipping them
# to the AOI extent using bbox
labelSource = None
if label_path is not None:
    # Import labels as a GeoJSONVectorSource
    labelVectorSource = GeoJSONVectorSource(
        label_path,                    # path to the label GeoJSON
        rasterSource.crs_transformer,  # convert labels from geographic to pixel coordinates
        bbox=rasterSource.bbox,        # clip them to the AOI extent
        vector_transformers=[
            ClassInferenceTransformer(
                default_class_id=class_config.get_class_id('object'))  # use class config
        ])

    # Configure labels for chip classification
    labelSourceConfig = ChipClassificationLabelSourceConfig(
        # 50% of the feature must be in the chip for the chip to be positive.
        # NOTE: This threshold could be changed.
        ioa_thresh=0.5,
        # Figure out what the cells are; we're not providing them explicitly
        infer_cells=True,
        background_class_id=class_config.null_class_id,
        # If True, ioa_thresh would require 50% of the *chip* to contain
        # features to be positive. That's not what we want.
        use_intersection_over_cell=False)

    # Convert the label vectors to a label source (a format suitable for
    # machine learning)
    labelSource = ChipClassificationLabelSource(
        labelSourceConfig,       # use the above config
        labelVectorSource,       # use the above label vectors
        bbox=rasterSource.bbox,  # clip to the AOI extent
        lazy=True)               # don't create labels until they are requested

chip = rasterSource[:100, :100, [4, 2, 1]]
label = labelSource[:100, :100]
print(label)
```

This results in the following error
-
This is awesome! Thank you! I was afraid that I was losing my mind... I've tried it with the new update and everything works now. Thank you so much for your help!
-
Hello! I want to start out by saying that we've been using RasterVision for a couple of years now and love it! It just keeps getting better.
We're creating ClassificationSlidingWindowGeoDatasets, and depending on the size and complexity of the AOI/transformations, it takes anywhere from a dozen seconds to a few minutes to create a dataset for a given scene.
The issue we're running into now is that we're trying to scale up. We have 670 scenes distributed across ~100k sqkm we want to use to create our dataset. This means it would take a few hours just to build the dataset. This is a minor inconvenience if we have to do it once, but if we have to do it every time we want to run an experiment it becomes untenable. We've thought of two possible solutions:
Unfortunately, both of these ideas have been defeated by the fact that Rasterio data can't be pickled (a requirement both for saving as a PyTorch dataset and for Python's multiprocessing libraries).
Does anyone have any suggestions about how we might speed this up? And/or save the dataset output so that it can be reloaded instead of starting over each time?
Thanks in advance for any ideas!
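For anyone hitting the same pickling wall, one general Python pattern (not a Raster Vision API) is to keep only plain, picklable parameters on the object and open the unpicklable handle lazily. Below is a minimal sketch under stated assumptions: `LazyScene` and `roundtrip` are hypothetical names, and a plain file object stands in for the Rasterio dataset because it is likewise unpicklable. Whether this carries over cleanly to a full Raster Vision dataset is untested here.

```python
import pickle
from dataclasses import dataclass, field
from typing import Any, Optional, Tuple


@dataclass
class LazyScene:
    """Hypothetical wrapper: stores only picklable parameters and opens
    the underlying (unpicklable) handle on first use."""
    uri: str
    bbox: Tuple[int, int, int, int]  # assumed pixel bounds for the crop
    _handle: Optional[Any] = field(default=None, repr=False, compare=False)

    @property
    def handle(self) -> Any:
        # Stand-in for opening a raster dataset. A plain binary file
        # object is used here because it cannot be pickled either.
        if self._handle is None:
            self._handle = open(self.uri, "rb")
        return self._handle

    def close(self) -> None:
        # Release the handle if it was ever opened.
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self) -> dict:
        # Drop the open handle so pickle only sees plain Python data.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


def roundtrip(scene: LazyScene) -> LazyScene:
    """Pickle and unpickle a scene, as multiprocessing would do."""
    return pickle.loads(pickle.dumps(scene))
```

A copy returned by `roundtrip` starts with no open handle and reopens the file the first time `handle` is accessed, so each worker process (or each reload from disk) pays the open cost only when it actually reads data.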