# Appendix

## Background

Algal blooms (e.g., Red Tide) pose a threat to the health of humans, marine life, and aquatic ecosystems. These blooms, often fueled by nutrient runoff and warmer temperatures, are increasing in prevalence and can degrade water quality and deplete oxygen levels, hence the need to monitor harmful algae and algal blooms by collecting natural water samples for analysis.

An expensive and cumbersome microscope is often needed to view samples and slides in high resolution. While offering very high visual fidelity, these microscopes are impractical for use in the field. Conversely, affordable, lightweight microscopes come with their own limitations, such as subpar resolution and focus. The manual nature of detection, quantification, and classification further compounds these drawbacks, resulting in time-consuming and labor-intensive procedures.

Although it certainly isn't a 1:1 comparison, I like to think of the camera(s) as the system's eyes and the detection model as its brain:

  • This project applies computer vision (a subfield of AI) techniques to fetch visual data from the camera(s)
  • The type of model being used (a CNN, which is a subset of DNN) is loosely inspired by the human brain
  • In both cases, the eyes (cameras) receive the input and send it to the brain (model) for processing

## Boards

The following boards are compatible with this project:

| Board | MCU | SRAM | Flash | PSRAM | Camera | Microphone |
| --- | --- | --- | --- | --- | --- | --- |
| Espressif ESP32-Wrover CAM | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
| AI Thinker ESP32-CAM | ESP32-S | 520 KB | 4 MB | 4 MB | OV2640 | No |
| Espressif ESP-EYE | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
| Espressif ESP-S3-EYE | ESP32-S3 | 520 KB | 4 MB | 4 MB | OV2640 | No |
| LilyGo camera module | ESP32 Wrover | 520 KB | 4 MB | 4 MB | OV2640 / OV5640 | No |
| LilyGo Simcam | ESP32-S3R8 | | | | OV2640 | No |
| LilyGo TTGO-T Camera | | | | | OV2640 | No |
| M5Stack ESP32CAM | ESP32 | 520 KB | 4 MB | | OV2640 | Yes |
| M5Stack UnitCam | ESP32-WROOM-32E | 520 KB | 4 MB | | OV2640 | No |
| M5Stack Camera | ESP32 | 520 KB | 4 MB | | OV2640 | No |
| M5Stack Camera PSRAM | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
| M5Stack UnitCamS3 | ESP32-S3-WROOM-1-N16R8 | 520 KB | 16 MB | 8 MB | OV2640 | No |
| Seeed Studio Xiao ESP32S3 Sense | ESP32-S3R8 | 520 KB | 8 MB | 8 MB | OV2640 | Yes |

(Software tested with the AI Thinker ESP32-CAM and the Espressif ESP-S3-EYE)

## Diagrams

### Model Performance

For each pre-trained model below, the repository includes a normalized confusion matrix, precision-confidence curve, precision-recall curve, recall-confidence curve, F1-confidence curve, training results, validation output, and an example prediction (see assets/models/):

  • YOLOv8 Nano
  • YOLOv8 Extra-Large
  • YOLOv8 Nano with SAHI
### System Design

ESP32 and iPhone system design diagrams, plus UML diagrams (see assets/diagrams/).

### Dataset

Dataset flowchart (see assets/diagrams/dataset_flowchart.png), plus an example image for each class:

| Class | Example |
| --- | --- |
| Closterium | ![Closterium](../assets/algae/closterium.jpg) |
| Microcystis | ![Microcystis](../assets/algae/microcystis.jpg) |
| Nitzschia | ![Nitzschia](../assets/algae/nitzschia.jpg) |
| Oscillatoria | ![Oscillatoria](../assets/algae/oscillatoria.jpg) |
| Non-Algae | ![Non-Algae](../assets/algae/non-algae.jpg) |
### Repository Structure

```
.
├── assets/
│   ├── algae/
│   │   ├── closterium.jpg
│   │   ├── microcystis.jpg
│   │   ├── nitzschia.jpg
│   │   ├── non-algae.jpg
│   │   └── oscillatoria.jpg
│   ├── diagrams/
│   │   ├── drawio/
│   │   │   ├── Camera_uml.drawio
│   │   │   ├── dataset_flowchart.drawio
│   │   │   ├── esp32_sys_design.drawio
│   │   │   └── streaming_uml.drawio
│   │   ├── dataset_flowchart.png
│   │   ├── detection_uml.png
│   │   ├── esp32_sys_des.png
│   │   ├── iphone_sys_des.png
│   │   ├── saft_framework.png
│   │   ├── sahi_framework.png
│   │   ├── streaming_uml.png
│   │   └── yolov8_architecture.jpg
│   ├── esp32/
│   │   ├── ai_thinker.jpg
│   │   ├── ap_popup.png
│   │   ├── board_port.png
│   │   ├── build_upload_monitor.png
│   │   ├── choose_ap.png
│   │   ├── config.png
│   │   ├── disconnect.png
│   │   ├── esp32_ip.png
│   │   ├── index.png
│   │   ├── init_config.png
│   │   ├── open_streaming.png
│   │   └── platformio_folder.png
│   ├── misc/
│   │   ├── demo.gif
│   │   ├── iphone_ui_connect.png
│   │   ├── microscope.jpg
│   │   └── user_interface.png
│   └── models/
│       ├── custom_yolov8n/
│       │   ├── confusion_matrix_normalized.png
│       │   ├── confusion_matrix.png
│       │   ├── example.jpg
│       │   ├── F1_curve.png
│       │   ├── P_curve.png
│       │   ├── PR_curve.png
│       │   ├── R_curve.png
│       │   ├── results.png
│       │   ├── val_label.jpg
│       │   ├── val_pred.jpg
│       │   └── validation.png
│       ├── custom_yolov8x/
│       │   ├── confusion_matrix_normalized.png
│       │   ├── confusion_matrix.png
│       │   ├── example.jpg
│       │   ├── F1_curve.png
│       │   ├── P_curve.png
│       │   ├── PR_curve.png
│       │   ├── R_curve.png
│       │   ├── results.png
│       │   └── validation.png
│       └── sahi_yolov8n/
│           ├── confusion_matrix_normalized.png
│           ├── confusion_matrix.png
│           ├── example.jpg
│           ├── F1_curve.png
│           ├── P_curve.png
│           ├── PR_curve.png
│           ├── R_curve.png
│           ├── results.png
│           └── validation.png
├── docs/
│   ├── appendix.md
│   ├── CONTRIBUTING.md
│   ├── manual.md
│   ├── README.md
│   └── test_samples.pdf
├── src/
│   ├── detection/
│   │   ├── camera.py
│   │   └── esp32.py
│   └── streaming/
│       ├── boards/
│       │   ├── esp32cam_ai_thinker.json
│       │   ├── esp32cam_espressif_esp_eye.json
│       │   ├── esp32cam_espressif_esp32s2_cam_board.json
│       │   ├── esp32cam_espressif_esp32s2_cam_header.json
│       │   ├── esp32cam_espressif_esp32s3_cam_lcd.json
│       │   ├── esp32cam_espressif_esp32s3_eye.json
│       │   ├── esp32cam_freenove_s3_wroom_n8r8.json
│       │   ├── esp32cam_freenove_wrover_kit.json
│       │   ├── esp32cam_m5stack_camera_psram.json
│       │   ├── esp32cam_m5stack_camera.json
│       │   ├── esp32cam_m5stack_esp32cam.json
│       │   ├── esp32cam_m5stack_unitcam.json
│       │   ├── esp32cam_m5stack_unitcams3.json
│       │   ├── esp32cam_m5stack_wide.json
│       │   ├── esp32cam_seeed_xiao_esp32s3_sense.json
│       │   ├── esp32cam_ttgo_t_camera.json
│       │   └── esp32cam_ttgo_t_journal.json
│       ├── html/
│       │   └── index.min.html
│       ├── include/
│       │   ├── format_duration.h
│       │   ├── format_number.h
│       │   ├── lookup_camera_effect.h
│       │   ├── lookup_camera_frame_size.h
│       │   ├── lookup_camera_gainceiling.h
│       │   ├── lookup_camera_wb_mode.h
│       │   └── settings.h
│       ├── lib/
│       │   └── rtsp_server/
│       │       ├── library.json
│       │       ├── rtsp_server.cpp
│       │       └── rtsp_server.h
│       ├── src/
│       │   └── main.cpp
│       └── platformio.ini
├── weights/
│   └── custom_yolov8n.pt
├── .gitattributes
├── .gitignore
├── environment.yml
└── LICENSE.md
```
### YOLOv8 Architecture

YOLOv8 architecture diagram (see assets/diagrams/yolov8_architecture.jpg).

### Framework

SAHI and SAFT framework diagrams (see assets/diagrams/sahi_framework.png and assets/diagrams/saft_framework.png).

## Customization

Due to its modular, generalizable design, this project can easily be adapted to detect any object(s) of your choosing (i.e., it is not limited to harmful algae).

To do so, you may forgo these requirements:

  • Modded microscope
  • Algae dataset
  • ESP32-CAM
  • Micro-USB cable
  • PlatformIO Visual Studio Code extension

If you still want to use an ESP32-CAM, disregard the last 3 bullets and only forgo the microscope and algae dataset. Then (a minimal training sketch follows these steps):

  1. Use your own dataset, composed of images of the object(s) you want your custom model to detect, to create a new, custom object detection model
  2. Save/download the resulting model once training finishes
  3. Use the model with your camera(s) for real-time detection and classification
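
As a rough illustration of these steps, here is a minimal sketch using the Ultralytics Python API; the dataset path, hyperparameters, and output locations are placeholder assumptions rather than this project's actual settings:

```python
from ultralytics import YOLO

# Step 1: fine-tune a pre-trained YOLOv8 model on your own dataset
model = YOLO("yolov8n.pt")
model.train(data = "my_dataset/data.yaml", epochs = 100, imgsz = 640)

# Step 2: training saves the best weights automatically
# (by default under runs/detect/train/weights/best.pt)

# Step 3: use the custom model with a camera for real-time detection
custom = YOLO("runs/detect/train/weights/best.pt")
custom.predict(source = 0, show = True)  # source = 0 is the default webcam
```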

### Inference with a Deployed Model

Using the inference library in camera.py would look similar to:

```python
from cv2 import imshow, waitKey
from cv2.typing import MatLike
from inference import get_model
from supervision import Detections, BoundingBoxAnnotator, LabelAnnotator

def _process_frame(self, frame: MatLike) -> None:
  # Annotators
  label, bbox = LabelAnnotator(), BoundingBoxAnnotator()

  # Load model via Roboflow (substitute your own model ID and API key)
  model = get_model(model_id = "algae-detection-1opyx/22", api_key = "YOUR_API_KEY")

  # Process frames
  for result in model.infer(frame):
    # Get detected object(s)
    detection = Detections.from_inference(result)

    # Annotate the frame with its result, then show in window
    imshow(self._args.title, label.annotate(scene = bbox.annotate(frame, detection), detections = detection))
    waitKey(1)  # required for the OpenCV window to refresh
```
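
Note that, for brevity, the snippet above re-creates the annotators and re-loads the model on every frame; in practice, you would construct them once (e.g., in the class's constructor) and reuse them across frames. Also keep the Roboflow API key out of source control, e.g., by reading it from an environment variable.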
## Future Improvements

  • Increase the dataset and improve model versatility by taking quality images of various types of algae

  • Increase model accuracy

    • Try different models, such as RetinaNet and YOLOv9
    • Use DC-GAN to generate additional synthetic images for training
  • Connect to the ESP32 without a server (e.g., via USB) OR use RTSP instead of HTTP

  • Improve and optimize the model for inference on the ESP32 (i.e., an edge device rather than a computer); TFLite Edge TPU?

  • Heatsink for ESP32 to prevent overheating

  • Update the microscope's 3D-printed lens attachment by making it adjustable AND/OR create multiple attachments for different devices (e.g., iPhone, Android)

  • Add camera settings to UI (C++ instead of Python for OpenCV?)

  • Add Android compatibility (if applicable and/or necessary)

  • Write cross-platform script to automate ESP32 setup

  • Use roboflow.js to integrate the project with streaming (which has its own web UI)?

    • Realtime on-device inference is available via roboflow.js
    • This loads your model and runs realtime inference directly in your users' web browser using WebGL, instead of passing images to the server side
  • Save the streaming URL after entering it once in the CLI?

  • If calling the model via the Roboflow API, incorporate auth (API key or login creds/token?)

  • Add an option / args for running the model locally (i.e., without internet; the default) vs. via the hosted API (i.e., with internet)

  • Active learning to improve model performance?

## Further Reading

### Glossary

  1. Access Point (AP): Networking device that allows wireless-capable devices to connect to a WLAN; in this case, it provides WiFi to the ESP32
  2. Algae: Group of mostly aquatic, photosynthetic, and nucleus-bearing organisms that lack many features of larger multicellular plants
  3. Anaconda: Open-source platform for managing and installing various Python packages
  4. Artificial Intelligence (AI): Simulation of human intelligence in machines that can perform tasks like problem-solving, decision-making, learning, etc.
  5. Closterium: Type of algae identified by their elongated or crescent shape
  6. Computer Vision (CV): Field of computer science that focuses on enabling computers to identify and understand objects and people in images and videos
  7. Confusion Matrix: Visualizes model performance (i.e., the number of correct and incorrect predictions per class), where the x-axis is the true label and the y-axis is the model's predicted label; diagonal elements count the points whose predicted label equals the true label (higher diagonal values are better, indicating many correct predictions), while off-diagonal elements count the points the model mislabeled (lower off-diagonal values are better, indicating few incorrect predictions)
  8. Convolutional Neural Network (CNN): Type of DNN specifically designed for image recognition and processing
  9. Deep Neural Network (DNN): ML method inspired by the human brain's neural structure that can recognize complex patterns in data (e.g., pictures, text, sounds, etc.) to produce accurate insights and predictions
  10. Epoch: One complete iteration of the entire training dataset through the ML algorithm
  11. ESP32: Series of low-cost, low-power system-on-chip microcontrollers with integrated WiFi and Bluetooth capabilities
  12. Espressif: Manufacturer of ESP32 microcontrollers
  13. Fine-Tuning: Process that takes a model (architecture + weights) already trained for one given task and tunes/tweaks the model to make it perform a second similar task
  14. Google Colab: Hosted Jupyter Notebook service that provides free and paid access to computing resources, including GPUs and TPUs, and requires no setup to use
  15. Graphics Processing Unit (GPU): Specialized electronic circuit that can perform mathematical calculations at high speed; useful for training AI and DNNs
  16. Inference: Process of using a trained ML model to make predictions, classifications, and/or detections on new data
  17. Local Area Network (LAN): Group of connected computing devices within a limited area (usually sharing a centralized Internet connection) that can communicate and share resources with each other
  18. Machine Learning (ML): Subfield of AI that involves training computer systems to learn from data and make decisions or predictions without being explicitly programmed
  19. Microcystis: Highly toxic genus of cyanobacteria that looks like clusters of small dots and is known for forming harmful algal blooms in bodies of water
  20. Motion JPEG (MJPEG): Video compression format where each frame of a digital video sequence is compressed separately as a JPEG image
  21. Nitzschia: Type of thin, elongated algae that can cause harmful algal blooms
  22. Normalize: Within the context of confusion matrices, it means the matrix elements are displayed as a percentage
  23. Oscillatoria: Genus of filamentous cyanobacteria that forms blue-green algal blooms
  24. PlatformIO: Cross-platform, cross-architecture, multi-framework tool for embedded system engineers and software engineers who write embedded applications
  25. Python: High-level programming language widely used for data analysis and ML
  26. PyTorch: ML library used for various applications, including CV
  27. Red Tide: Harmful algal bloom event occurring along Florida’s coastline, in which algae grow uncontrollably
  28. Roboflow: CV developer framework for better data collection, dataset preprocessing, dataset augmentation, model training techniques, model deployment, and more
  29. Slicing Aided Fine Tuning (SAFT): Novel approach that augments the fine-tuning dataset by dividing images into overlapping patches, thus providing a more balanced representation of small objects and overcoming the bias towards larger objects in the original pre-training datasets
  30. Slicing Aided Hyper Inference (SAHI): Common method of improving the detection accuracy of small objects, which involves running inference over portions (slices) of an image and then accumulating the results; see the sketch after this glossary
  31. System-on-Chip (SoC): Integrated circuit that compresses all of an electronic system's required components onto one piece of silicon
  32. Tensor Processing Unit (TPU): Google’s application-specific integrated circuit (ASIC) used to accelerate ML workloads; useful for training AI and DNNs
  33. Ultralytics: Company that aims to make AI model development accessible, efficient to train, and easy to deploy
  34. Weights: Numbers associated with the connections between neurons/nodes across different layers of a DNN
  35. Wireless Local Area Network (WLAN): Computer network that links two or more devices using wireless communication to form a LAN
  36. You Only Look Once (YOLO): High performance, real-time object detection and image segmentation model developed by Ultralytics
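
To make the SAHI entry (30) above concrete, here is a minimal sketch using the sahi Python package; the input image name, slice sizes, and confidence threshold are illustrative assumptions:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap a YOLOv8 model so sahi can drive it
model = AutoDetectionModel.from_pretrained(
    model_type = "yolov8",
    model_path = "weights/custom_yolov8n.pt",  # weights shipped with this repo
    confidence_threshold = 0.5,
)

# Run inference over overlapping 256x256 slices of the image,
# then merge the per-slice detections into a single result
result = get_sliced_prediction(
    "water_sample.jpg",  # hypothetical input image
    model,
    slice_height = 256,
    slice_width = 256,
    overlap_height_ratio = 0.2,
    overlap_width_ratio = 0.2,
)
print(result.object_prediction_list)  # merged detections
```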