# Appendix

## Background

Algal blooms (e.g., Red Tide) pose a threat to the health of humans, marine life, and aquatic ecosystems. These blooms, often fueled by nutrient runoff and warmer temperatures, are increasing in prevalence and can degrade water quality and deplete oxygen levels, hence the need to monitor harmful algae and algal blooms by collecting natural water samples for analysis.

An expensive and cumbersome microscope is often needed to view samples and slides in high resolution. While offering very high visual fidelity, these microscopes are impractical for use in the field. Conversely, affordable, lightweight microscopes come with their own limitations, such as subpar resolution and focus. The manual nature of detection, quantification, and classification further compounds these drawbacks, resulting in time-consuming and labor-intensive procedures.

Although it certainly isn't a 1:1 comparison, I like to think of the camera(s) as the system's eyes and the detection model as its brain:

  • This project applies computer vision (a subfield of AI) techniques to fetch visual data from the camera(s)
  • The type of model being used (a CNN, which is a subset of DNN) is loosely inspired by the human brain
  • In both cases, the eyes (cameras) receive the input and send it to the brain (model) for processing

## Boards

The following boards are compatible with this project:

| Board | MCU | SRAM | Flash | PSRAM | Camera | Microphone |
| --- | --- | --- | --- | --- | --- | --- |
| Espressif ESP32-Wrover CAM | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
| AI Thinker ESP32-CAM | ESP32-S | 520 KB | 4 MB | 4 MB | OV2640 | No |
| Espressif ESP-EYE | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
| Espressif ESP-S3-EYE | ESP32-S3 | 520 KB | 4 MB | 4 MB | OV2640 | No |
| LilyGo camera module | ESP32 Wrover | 520 KB | 4 MB | 4 MB | OV2640 / OV5640 | No |
| LilyGo Simcam | ESP32-S3R8 | | | | OV2640 | No |
| LilyGo TTGO-T Camera | | | | | OV2640 | No |
| M5Stack ESP32CAM | ESP32 | 520 KB | 4 MB | | OV2640 | Yes |
| M5Stack UnitCam | ESP32-WROOM-32E | 520 KB | 4 MB | | OV2640 | No |
| M5Stack Camera | ESP32 | 520 KB | 4 MB | | OV2640 | No |
| M5Stack Camera PSRAM | ESP32 | 520 KB | 4 MB | 4 MB | OV2640 | No |
| M5Stack UnitCamS3 | ESP32-S3-WROOM-1-N16R8 | 520 KB | 16 MB | 8 MB | OV2640 | No |
| Seeed Studio Xiao ESP32S3 Sense | ESP32-S3R8 | 520 KB | 8 MB | 8 MB | OV2640 | Yes |

(Software tested with the AI Thinker ESP32-CAM and the Espressif ESP-S3-EYE)

## Diagrams

### Model Performance

For each pre-trained model below, the repository includes a normalized confusion matrix, precision-confidence curve, precision-recall curve, recall-confidence curve, F1-confidence curve, training results, validation output, and an example prediction (see assets/models/):

  • YOLOv8 Nano
  • YOLOv8 Extra-Large
  • YOLOv8 Nano with SAHI
### System Design

ESP32 and iPhone system design diagrams, plus UML diagrams (see assets/diagrams/).

### Dataset

Dataset flowchart (see assets/diagrams/dataset_flowchart.png), plus an example image for each class:

| Class | Example |
| --- | --- |
| Closterium | ![Closterium](../assets/algae/closterium.jpg) |
| Microcystis | ![Microcystis](../assets/algae/microcystis.jpg) |
| Nitzschia | ![Nitzschia](../assets/algae/nitzschia.jpg) |
| Oscillatoria | ![Oscillatoria](../assets/algae/oscillatoria.jpg) |
| Non-Algae | ![Non-Algae](../assets/algae/non-algae.jpg) |
### Repository Structure

```
.
├── assets/
│   ├── algae/
│   │   ├── closterium.jpg
│   │   ├── microcystis.jpg
│   │   ├── nitzschia.jpg
│   │   ├── non-algae.jpg
│   │   └── oscillatoria.jpg
│   ├── diagrams/
│   │   ├── drawio/
│   │   │   ├── Camera_uml.drawio
│   │   │   ├── dataset_flowchart.drawio
│   │   │   ├── esp32_sys_design.drawio
│   │   │   └── streaming_uml.drawio
│   │   ├── dataset_flowchart.png
│   │   ├── detection_uml.png
│   │   ├── esp32_sys_des.png
│   │   ├── iphone_sys_des.png
│   │   ├── saft_framework.png
│   │   ├── sahi_framework.png
│   │   ├── streaming_uml.png
│   │   └── yolov8_architecture.jpg
│   ├── esp32/
│   │   ├── ai_thinker.jpg
│   │   ├── ap_popup.png
│   │   ├── board_port.png
│   │   ├── build_upload_monitor.png
│   │   ├── choose_ap.png
│   │   ├── config.png
│   │   ├── disconnect.png
│   │   ├── esp32_ip.png
│   │   ├── index.png
│   │   ├── init_config.png
│   │   ├── open_streaming.png
│   │   └── platformio_folder.png
│   ├── misc/
│   │   ├── demo.gif
│   │   ├── iphone_ui_connect.png
│   │   ├── microscope.jpg
│   │   └── user_interface.png
│   └── models/
│       ├── custom_yolov8n/
│       │   ├── confusion_matrix_normalized.png
│       │   ├── confusion_matrix.png
│       │   ├── example.jpg
│       │   ├── F1_curve.png
│       │   ├── P_curve.png
│       │   ├── PR_curve.png
│       │   ├── R_curve.png
│       │   ├── results.png
│       │   ├── val_label.jpg
│       │   ├── val_pred.jpg
│       │   └── validation.png
│       ├── custom_yolov8x/
│       │   ├── confusion_matrix_normalized.png
│       │   ├── confusion_matrix.png
│       │   ├── example.jpg
│       │   ├── F1_curve.png
│       │   ├── P_curve.png
│       │   ├── PR_curve.png
│       │   ├── R_curve.png
│       │   ├── results.png
│       │   └── validation.png
│       └── sahi_yolov8n/
│           ├── confusion_matrix_normalized.png
│           ├── confusion_matrix.png
│           ├── example.jpg
│           ├── F1_curve.png
│           ├── P_curve.png
│           ├── PR_curve.png
│           ├── R_curve.png
│           ├── results.png
│           └── validation.png
├── docs/
│   ├── appendix.md
│   ├── CONTRIBUTING.md
│   ├── manual.md
│   ├── README.md
│   └── test_samples.pdf
├── src/
│   ├── detection/
│   │   ├── camera.py
│   │   └── esp32.py
│   └── streaming/
│       ├── boards/
│       │   ├── esp32cam_ai_thinker.json
│       │   ├── esp32cam_espressif_esp_eye.json
│       │   ├── esp32cam_espressif_esp32s2_cam_board.json
│       │   ├── esp32cam_espressif_esp32s2_cam_header.json
│       │   ├── esp32cam_espressif_esp32s3_cam_lcd.json
│       │   ├── esp32cam_espressif_esp32s3_eye.json
│       │   ├── esp32cam_freenove_s3_wroom_n8r8.json
│       │   ├── esp32cam_freenove_wrover_kit.json
│       │   ├── esp32cam_m5stack_camera_psram.json
│       │   ├── esp32cam_m5stack_camera.json
│       │   ├── esp32cam_m5stack_esp32cam.json
│       │   ├── esp32cam_m5stack_unitcam.json
│       │   ├── esp32cam_m5stack_unitcams3.json
│       │   ├── esp32cam_m5stack_wide.json
│       │   ├── esp32cam_seeed_xiao_esp32s3_sense.json
│       │   ├── esp32cam_ttgo_t_camera.json
│       │   └── esp32cam_ttgo_t_journal.json
│       ├── html/
│       │   └── index.min.html
│       ├── include/
│       │   ├── format_duration.h
│       │   ├── format_number.h
│       │   ├── lookup_camera_effect.h
│       │   ├── lookup_camera_frame_size.h
│       │   ├── lookup_camera_gainceiling.h
│       │   ├── lookup_camera_wb_mode.h
│       │   └── settings.h
│       ├── lib/
│       │   └── rtsp_server/
│       │       ├── library.json
│       │       ├── rtsp_server.cpp
│       │       └── rtsp_server.h
│       ├── src/
│       │   └── main.cpp
│       └── platformio.ini
├── weights/
│   └── custom_yolov8n.pt
├── .gitattributes
├── .gitignore
├── environment.yml
└── LICENSE.md
```
### YOLOv8 Architecture

YOLOv8 architecture diagram (see assets/diagrams/yolov8_architecture.jpg).

### Framework

SAHI and SAFT framework diagrams (see assets/diagrams/sahi_framework.png and assets/diagrams/saft_framework.png).

## Customization

Due to its modular, generalizable design, this project can easily be adapted to detect any object(s) of your choosing (i.e., it is not limited to harmful algae).

To do so, you may forgo these requirements:

  • Modded microscope
  • Algae dataset
  • ESP32-CAM
  • Micro-USB cable
  • PlatformIO Visual Studio Code extension

If you still want to use an ESP32-CAM, disregard the last 3 bullets and only forgo the microscope and algae dataset. Then (a minimal training sketch follows these steps):

  1. Use your own dataset, composed of images of the object(s) you want your custom model to detect, to create a new, custom object detection model
  2. Save/download the resulting model once training finishes
  3. Use the model with your camera(s) for real-time detection and classification
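
As a rough illustration of these steps, here is a minimal sketch using the Ultralytics Python API; the dataset path, hyperparameters, and output locations are placeholder assumptions rather than this project's actual settings:

```python
from ultralytics import YOLO

# Step 1: fine-tune a pre-trained YOLOv8 model on your own dataset
model = YOLO("yolov8n.pt")
model.train(data = "my_dataset/data.yaml", epochs = 100, imgsz = 640)

# Step 2: training saves the best weights automatically
# (by default under runs/detect/train/weights/best.pt)

# Step 3: use the custom model with a camera for real-time detection
custom = YOLO("runs/detect/train/weights/best.pt")
custom.predict(source = 0, show = True)  # source = 0 is the default webcam
```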

### Inference with a Deployed Model

Using the inference library in camera.py would look similar to:

```python
from cv2 import imshow, waitKey
from cv2.typing import MatLike
from inference import get_model
from supervision import Detections, BoundingBoxAnnotator, LabelAnnotator

def _process_frame(self, frame: MatLike) -> None:
  # Annotators
  label, bbox = LabelAnnotator(), BoundingBoxAnnotator()

  # Load model via Roboflow (substitute your own model ID and API key)
  model = get_model(model_id = "algae-detection-1opyx/22", api_key = "YOUR_API_KEY")

  # Process frames
  for result in model.infer(frame):
    # Get detected object(s)
    detection = Detections.from_inference(result)

    # Annotate the frame with its result, then show in window
    imshow(self._args.title, label.annotate(scene = bbox.annotate(frame, detection), detections = detection))
    waitKey(1)  # required for the OpenCV window to refresh
```
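
Note that, for brevity, the snippet above re-creates the annotators and re-loads the model on every frame; in practice, you would construct them once (e.g., in the class's constructor) and reuse them across frames. Also keep the Roboflow API key out of source control, e.g., by reading it from an environment variable.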
## Future Improvements

  • Increase the dataset and improve model versatility by taking quality images of various types of algae

  • Increase model accuracy

    • Try different models, such as RetinaNet and YOLOv9
    • Use DC-GAN to generate additional synthetic images for training
  • Connect to the ESP32 without a server (e.g., via USB) OR use RTSP instead of HTTP

  • Improve and optimize the model for inference on the ESP32 (i.e., an edge device rather than a computer); TFLite Edge TPU?

  • Heatsink for ESP32 to prevent overheating

  • Update the microscope's 3D-printed lens attachment by making it adjustable AND/OR create multiple attachments for different devices (e.g., iPhone, Android)

  • Add camera settings to UI (C++ instead of Python for OpenCV?)

  • Add Android compatibility (if applicable and/or necessary)

  • Write cross-platform script to automate ESP32 setup

  • Use roboflow.js to integrate the project with streaming (which has its own web UI)?

    • Realtime on-device inference is available via roboflow.js
    • This loads your model and runs realtime inference directly in your users' web browser using WebGL, instead of passing images to the server side
  • Save the streaming URL after entering it once in the CLI?

  • If calling the model via the Roboflow API, incorporate auth (API key or login creds/token?)

  • Add an option / args for running the model locally (i.e., without internet; the default) vs. via the hosted API (i.e., with internet)

  • Active learning to improve model performance?

## Further Reading

### Glossary

  1. Access Point (AP): Networking device that allows wireless-capable devices to connect to a WLAN; in this case, it provides WiFi to the ESP32
  2. Algae: Group of mostly aquatic, photosynthetic, and nucleus-bearing organisms that lack many features of larger multicellular plants
  3. Anaconda: Open-source platform for managing and installing various Python packages
  4. Artificial Intelligence (AI): Simulation of human intelligence in machines that can perform tasks like problem-solving, decision-making, learning, etc.
  5. Closterium: Type of algae identified by their elongated or crescent shape
  6. Computer Vision (CV): Field of computer science that focuses on enabling computers to identify and understand objects and people in images and videos
  7. Confusion Matrix: Visualizes model performance (i.e., the number of correct and incorrect predictions per class), where the x-axis is the true label and the y-axis is the model's predicted label; diagonal elements count the points whose predicted label equals the true label (higher diagonal values are better, indicating many correct predictions), while off-diagonal elements count the points the model mislabeled (lower off-diagonal values are better, indicating few incorrect predictions)
  8. Convolutional Neural Network (CNN): Type of DNN specifically designed for image recognition and processing
  9. Deep Neural Network (DNN): ML method inspired by the human brain's neural structure that can recognize complex patterns in data (e.g., pictures, text, sounds, etc.) to produce accurate insights and predictions
  10. Epoch: One complete iteration of the entire training dataset through the ML algorithm
  11. ESP32: Series of low-cost, low-power system-on-chip microcontrollers with integrated WiFi and Bluetooth capabilities
  12. Espressif: Manufacturer of ESP32 microcontrollers
  13. Fine-Tuning: Process that takes a model (architecture + weights) already trained for one given task and tunes/tweaks the model to make it perform a second similar task
  14. Google Colab: Hosted Jupyter Notebook service that provides free and paid access to computing resources, including GPUs and TPUs, and requires no setup to use
  15. Graphics Processing Unit (GPU): Specialized electronic circuit that can perform mathematical calculations at high speed; useful for training AI and DNNs
  16. Inference: Process of using a trained ML model to make predictions, classifications, and/or detections on new data
  17. Local Area Network (LAN): Group of connected computing devices within a limited area (usually sharing a centralized Internet connection) that can communicate and share resources with each other
  18. Machine Learning (ML): Subfield of AI that involves training computer systems to learn from data and make decisions or predictions without being explicitly programmed
  19. Microcystis: Highly toxic genus of cyanobacteria that looks like clusters of small dots and is known for forming harmful algal blooms in bodies of water
  20. Motion JPEG (MJPEG): Video compression format where each frame of a digital video sequence is compressed separately as a JPEG image
  21. Nitzschia: Type of thin, elongated algae that can cause harmful algal blooms
  22. Normalize: Within the context of confusion matrices, it means the matrix elements are displayed as a percentage
  23. Oscillatoria: Genus of filamentous cyanobacteria that forms blue-green algal blooms
  24. PlatformIO: Cross-platform, cross-architecture, multi-framework tool for embedded system engineers and software engineers who write embedded applications
  25. Python: High-level programming language widely used for data analysis and ML
  26. PyTorch: ML library used for various applications, including CV
  27. Red Tide: Harmful algal bloom event occurring along Florida’s coastline, in which algae grow uncontrollably
  28. Roboflow: CV developer framework for better data collection, dataset preprocessing, dataset augmentation, model training techniques, model deployment, and more
  29. Slicing Aided Fine Tuning (SAFT): Novel approach that augments the fine-tuning dataset by dividing images into overlapping patches, thus providing a more balanced representation of small objects and overcoming the bias towards larger objects in the original pre-training datasets
  30. Slicing Aided Hyper Inference (SAHI): Common method of improving the detection accuracy of small objects, which involves running inference over portions (slices) of an image and then accumulating the results; see the sketch after this glossary
  31. System-on-Chip (SoC): Integrated circuit that compresses all of an electronic system's required components onto one piece of silicon
  32. Tensor Processing Unit (TPU): Google’s application-specific integrated circuit (ASIC) used to accelerate ML workloads; useful for training AI and DNNs
  33. Ultralytics: Company that aims to make AI model development accessible, efficient to train, and easy to deploy
  34. Weights: Numbers associated with the connections between neurons/nodes across different layers of a DNN
  35. Wireless Local Area Network (WLAN): Computer network that links two or more devices using wireless communication to form a LAN
  36. You Only Look Once (YOLO): High performance, real-time object detection and image segmentation model developed by Ultralytics
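
To make the SAHI entry (30) above concrete, here is a minimal sketch using the sahi Python package; the input image name, slice sizes, and confidence threshold are illustrative assumptions:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap a YOLOv8 model so sahi can drive it
model = AutoDetectionModel.from_pretrained(
    model_type = "yolov8",
    model_path = "weights/custom_yolov8n.pt",  # weights shipped with this repo
    confidence_threshold = 0.5,
)

# Run inference over overlapping 256x256 slices of the image,
# then merge the per-slice detections into a single result
result = get_sliced_prediction(
    "water_sample.jpg",  # hypothetical input image
    model,
    slice_height = 256,
    slice_width = 256,
    overlap_height_ratio = 0.2,
    overlap_width_ratio = 0.2,
)
print(result.object_prediction_list)  # merged detections
```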