Combine 2016-2017 and 2012-2014 datasets #120
Comments
Would it be possible to train models for each different part of the year? So a model for summer, a model for fall, etc.
It would be possible, but I doubt it would be better. Do you think it would be?
I think it would be possible to get better results. Detecting objects (or in this case animals) is easier and more efficient in similar environments, and I think that would factor in for this algorithm as well.
Try it!
Hi all! I came across this project and it looks both interesting and in my wheelhouse. I'm interested in contributing if you are open to it and it is still an active project. @DigitalPhilosopher are you working on this issue, or is it up for grabs?
Great! It is absolutely an active project in theory :-). Addressing this issue in particular would be a great help in moving it forward.
@gsganden Good to know! I've made progress cleaning the 2012-14 labels in the CSV to align with 2016-17, but I may need some extra information to restructure the image filenames to run through the build_dataset process, particularly to recreate a process_raw.py for the 2012-14 images. Is there any additional documentation regarding the filename structure?
I am not aware of any. @mfidino, can you provide additional information?
The filename structure should be something like:
This represents:
The transect is the section of the city we are sampling (DPT is northwest, RST is west, SCT is southwest, JNT is the heart of Chicago). What I think is likely the most important thing is the site-sampled part of the file name, which likely lines up with the 2012-2014 data? You could also just parse the
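Picking up on that parsing idea, here is a minimal sketch of pulling the transect, site, and season codes out of a 2012-2014 raw image path. It assumes every path follows the same layout as the one example that appears later in this thread (data/lpz_2012-2014/raw/FA14/JNT/J01-LMP1/J01-LMP1-FA14 (11).JPG); the regex and field names are assumptions, not the repo's actual process_raw.py logic.

# Hypothetical sketch: parse transect, site, and season codes from a
# 2012-2014 raw image path. The directory layout and filename regex are
# assumptions based on the single example path in this thread.
import re
from pathlib import Path

# e.g. "J01-LMP1-FA14 (11).JPG" -> site "J01-LMP1", season "FA14", frame 11
FILENAME_PATTERN = re.compile(
    r"(?P<site>[A-Z]\d{2}-[A-Z]+\d*)-(?P<season>[A-Z]{2}\d{2}) \((?P<index>\d+)\)\.jpg$",
    re.IGNORECASE,
)

def parse_raw_path(path):
    """Return transect, site, season, and frame index for one raw image path."""
    path = Path(path)
    # Assumed directory layout: .../<season>/<transect>/<site>/<filename>
    transect = path.parts[-3]
    match = FILENAME_PATTERN.search(path.name)
    if match is None:
        raise ValueError(f"Unexpected filename: {path.name}")
    return {
        "transect": transect,                # e.g. JNT
        "site": match.group("site"),         # e.g. J01-LMP1
        "season": match.group("season"),     # e.g. FA14 (fall 2014)
        "index": int(match.group("index")),  # e.g. 11
    }

print(parse_raw_path("data/lpz_2012-2014/raw/FA14/JNT/J01-LMP1/J01-LMP1-FA14 (11).JPG"))
# {'transect': 'JNT', 'site': 'J01-LMP1', 'season': 'FA14', 'index': 11}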
I'm seeing two issues with the 2012-2014 data:
It's going to take me a bit to look into the labels for the 2013 and 2014 images for a couple reasons.
Way back when we tagged those images, we used to write the species tag into the photo metadata. I wrote a Ruby script a long time ago to pull those tags (if I recall, all the keywords are in the metadata); the Ruby script is here, which may at least point out where you can look to get the species tags.
Thank you, and congrats on the new baby!
@mfidino thank you, that's good to know! @gsganden
No worries, this is all voluntary so anything you can do is a bonus. I have time to work on this project today and this ticket is by far the top priority, so I'll be working on it as well.
>>> from PIL import Image
>>>
>>> def get_exif(filename):
...     image = Image.open(filename)
...     image.verify()
...     return image._getexif()
...
>>> df.loc[:, "path"].progress_apply(get_exif).notna().mean()
0.0
>>> import piexif
>>> piexif.load(str(df1.loc[0, "path"]))
{'0th': {}, 'Exif': {}, 'GPS': {}, 'Interop': {}, '1st': {}, 'thumbnail': None}

$ identify -verbose "data/lpz_2012-2014/raw/FA14/JNT/J01-LMP1/J01-LMP1-FA14 (11).JPG"
Image: data/lpz_2012-2014/raw/FA14/JNT/J01-LMP1/J01-LMP1-FA14 (11).JPG
Format: JPEG (Joint Photographic Experts Group JFIF format)
Class: DirectClass
Geometry: 227x227+0+0
Resolution: 72x72
Print size: 3.15278x3.15278
Units: Undefined
Type: TrueColor
Endianess: Undefined
Colorspace: sRGB
Depth: 8-bit
Channel depth:
red: 8-bit
green: 8-bit
blue: 8-bit
Channel statistics:
Red:
min: 0 (0)
max: 255 (1)
mean: 125.861 (0.493571)
standard deviation: 47.4512 (0.186083)
kurtosis: 0.0259633
skewness: 0.116636
Green:
min: 0 (0)
max: 255 (1)
mean: 129.525 (0.507942)
standard deviation: 47.0568 (0.184536)
kurtosis: 0.046401
skewness: -0.0850175
Blue:
min: 0 (0)
max: 224 (0.878431)
mean: 67.0616 (0.262987)
standard deviation: 43.2846 (0.169743)
kurtosis: -0.0088973
skewness: 0.538302
Image statistics:
Overall:
min: 0 (0)
max: 255 (1)
mean: 107.482 (0.4215)
standard deviation: 45.9693 (0.180272)
kurtosis: 1.94601
skewness: 0.14245
Rendering intent: Perceptual
Gamma: 0.454545
Chromaticity:
red primary: (0.64,0.33)
green primary: (0.3,0.6)
blue primary: (0.15,0.06)
white point: (0.3127,0.329)
Interlace: None
Background color: white
Border color: srgb(223,223,223)
Matte color: grey74
Transparent color: black
Compose: Over
Page geometry: 227x227+0+0
Dispose: Undefined
Iterations: 0
Compression: JPEG
Quality: 75
Orientation: Undefined
Properties:
date:create: 2020-03-17T15:09:12+00:00
date:modify: 2016-09-02T07:01:19+00:00
jpeg:colorspace: 2
jpeg:sampling-factor: 2x2,1x1,1x1
signature: b74a2e95ab2c5990d8da57c24705af34221be10b7db1d93d0153241429216bc6
Artifacts:
filename: data/lpz_2012-2014/raw/FA14/JNT/J01-LMP1/J01-LMP1-FA14 (11).JPG
verbose: true
Tainted: False
Filesize: 24.6KB
Number pixels: 51.5K
Pixels per second: 0B
User time: 0.000u
Elapsed time: 0:01.000
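Since both PIL's _getexif() and piexif come back empty and identify -verbose lists no EXIF properties, any surviving species keywords would have to live in IPTC/XMP fields rather than EXIF (if they survived the resize to 227x227 at all). Here is a hedged sketch of one way to check, assuming the exiftool CLI is installed; -Keywords and -Subject are standard exiftool tag names, but whether these particular files actually carry them is exactly what needs verifying.

# Hypothetical check for IPTC/XMP keyword tags that plain EXIF readers miss.
# Assumes the exiftool CLI is available on PATH; these files may simply have
# had all metadata stripped when they were resized.
import json
import subprocess

def get_keywords(path):
    """Return any IPTC Keywords / XMP Subject values exiftool finds, or []."""
    result = subprocess.run(
        ["exiftool", "-json", "-Keywords", "-Subject", str(path)],
        capture_output=True, text=True, check=True,
    )
    record = json.loads(result.stdout)[0]
    tags = record.get("Keywords") or record.get("Subject") or []
    return tags if isinstance(tags, list) else [tags]

print(get_keywords("data/lpz_2012-2014/raw/FA14/JNT/J01-LMP1/J01-LMP1-FA14 (11).JPG"))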
Using only the 2016-2017 data is very limiting because it is only from mid-summer. I wouldn't expect models trained on just this data to generalize to other times of year, and indeed we have seen substantial performance drops on images from other times of year. We have data from all seasons from 2012-2014. It is formatted differently but contains roughly the same information. Putting these datasets together and training on the result is the lowest-hanging fruit for providing more value with this project.
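For what the "putting these datasets together" step might look like at the label level, once the 2012-2014 classes are mapped onto the 2016-2017 scheme, here is a minimal pandas sketch; the file names, column names, and label_map contents are placeholders, not the repo's actual schema.

# Minimal sketch of combining the two label tables after mapping the
# 2012-2014 class names onto the 2016-2017 scheme. File names, column
# names, and label_map values are illustrative placeholders only.
import pandas as pd

label_map = {"whitetailed_deer": "deer", "domestic_dog": "dog"}  # placeholder mapping

old = pd.read_csv("labels_2012_2014.csv")   # placeholder filename
new = pd.read_csv("labels_2016_2017.csv")   # placeholder filename

old["label"] = old["label"].replace(label_map)
combined = pd.concat([old, new], ignore_index=True)
combined.to_csv("labels_combined.csv", index=False)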