Autonomous Vehicle with Active Perception

Computer Vision • Hardware • Perception • Object Detection • Instance Segmentation

Problem Statement
To reduce manual labor and the impact of herbicides by prototyping an agricultural robot that detects and removes weeds from crops.

Target Audience

• Object Detection using YOLO (You Only Look Once) v5 model
• 550+ images for model training
• iPhone camera input processed on a MacBook laptop*
• Probabilistic exploration mode
• 95% weed removal accuracy
• Hardware implemented using the LEGO Mindstorms EV3 kit

* A macOS and iOS based system is required for Apple's Continuity Camera feature
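
As a rough illustration of how detections could drive the robot, the sketch below takes rows in the `[x1, y1, x2, y2, confidence, class]` layout that YOLOv5 reports for each box and picks the most confident weed to steer toward. The class index `WEED_CLASS`, the threshold `MIN_CONFIDENCE`, and the function `pick_weed_target` are hypothetical names chosen for this example, not taken from the project code.

```python
from typing import List, Optional, Sequence

WEED_CLASS = 1        # hypothetical class index for "weed"
MIN_CONFIDENCE = 0.5  # hypothetical confidence cut-off


def pick_weed_target(
    detections: List[Sequence[float]], frame_width: int
) -> Optional[float]:
    """Return the horizontal offset of the most confident weed.

    The offset is normalised to [-1, 1]: negative means left of the
    frame centre, positive means right. Returns None if no weed passes
    the confidence threshold.
    """
    weeds = [
        d for d in detections
        if int(d[5]) == WEED_CLASS and d[4] >= MIN_CONFIDENCE
    ]
    if not weeds:
        return None
    best = max(weeds, key=lambda d: d[4])
    center_x = (best[0] + best[2]) / 2
    half_width = frame_width / 2
    return (center_x - half_width) / half_width


# Made-up detections for a 640-pixel-wide frame.
detections = [
    [100.0, 50.0, 200.0, 150.0, 0.90, 0],  # crop, ignored
    [400.0, 60.0, 480.0, 140.0, 0.80, 1],  # weed, right of centre
    [10.0, 10.0, 30.0, 30.0, 0.30, 1],     # weed, below threshold
]
offset = pick_weed_target(detections, frame_width=640)
print(offset)
```

A normalised offset like this could be mapped to a turn command for the EV3 motors; how the actual robot does this is not described here.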

Tools & Frameworks
Python • PyTorch • OpenCV

Image Generation from Voice Prompts

NLP • CV • Deep Learning • CNNs

Problem Statement
To develop a system for generating realistic images of an individual from voice input using machine learning algorithms.

Target Audience
Artists (Design Inspiration), Entertainment & Social Media (Content Generation), Game Development (e.g. Fortnite Emotes)

• Voice Input
• Speech-to-Text using Whisper API
• Text-to-Image using Stable Diffusion Model
• Custom Dataset Training with 3e-2 training loss (see images for reference)
• Unique Images as output

Voice Input: "Affan person as a masterpiece portrait painting by John Singer Sargent in the style of Rembrandt"
Output: See images for reference
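
The voice-to-image flow above can be sketched as two stages chained together. This is only a structural sketch: `voice_to_image`, `fake_transcribe`, and `fake_generate` are hypothetical names, and the two stand-in functions replace the real Whisper and Stable Diffusion calls so the example runs without any model.

```python
from typing import Callable, Tuple


def voice_to_image(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Run speech-to-text, then text-to-image, and return both results."""
    # Collapse stray whitespace that a transcription might contain.
    prompt = " ".join(transcribe(audio).split())
    if not prompt:
        raise ValueError("transcription produced an empty prompt")
    return prompt, generate(prompt)


def fake_transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text model (assumption for this sketch)."""
    return "  Affan person as a masterpiece portrait painting  "


def fake_generate(prompt: str) -> bytes:
    """Stand-in for a text-to-image model (assumption for this sketch)."""
    return ("IMAGE:" + prompt).encode("utf-8")


prompt, image = voice_to_image(b"\x00\x01", fake_transcribe, fake_generate)
print(prompt)
```

Passing the two stages in as callables keeps the pipeline independent of any particular model, so either stage could be swapped without touching the other.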

Tools & Frameworks
Python • PyTorch • CUDA/Distributed Computing

Image Captioning & Evaluation

Supervised Learning • Large Language Model (LLM) • CNN • RNN • NLP • CV

Problem Statement
To re-train an image captioning model on a different dataset and evaluate it using established metrics.
To train a model from scratch to demonstrate the effectiveness of utilizing multiple datasets.

Target Audience
Visually Disabled People (Image Interpretation), Story Writers & Artists (Imaginative Inspiration)

• Segmentation using Attention Network
• Beam Search for sentence formation / captioning
• Evaluation using F1 Score, Recall & established metrics (SacreBLEU, ROUGE-L, METEOR, CIDEr)
• Datasets (Large-Scale): MS COCO, Flickr8k & Google Conceptual Captions
• Fine-Tuning / Hyperparameter Tuning ~ 79.9% improvement*
• Training from scratch vs. Model Re-training ~ 40% improvement*
• Pipeline: Image - Caption - Story - Stable Diffusion Model - Images

* See table below for reference
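
For readers unfamiliar with beam search, here is a minimal sketch of the idea on a toy next-word table. It is not the project's decoder: `beam_search`, `TOY_MODEL`, and `toy_probs` are hypothetical, and a real captioning model would supply the next-token probabilities instead of a lookup table.

```python
import math
from typing import Callable, Dict, List, Tuple


def beam_search(
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
    eos: str = "<eos>",
) -> List[str]:
    """Return the highest-scoring token sequence found by beam search."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in next_token_probs(tokens).items():
                if prob > 0:
                    # Sum log-probabilities to avoid numeric underflow.
                    candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens[:-1], score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished beams if max_len was reached
    return max(finished, key=lambda c: c[1])[0]


# Toy bigram table: the greedy first choice ("a") leads to a worse sentence.
TOY_MODEL = {
    None: {"a": 0.6, "the": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "the": {"cat": 0.9, "dog": 0.1},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}


def toy_probs(prefix: List[str]) -> Dict[str, float]:
    return TOY_MODEL[prefix[-1] if prefix else None]


print(beam_search(toy_probs, beam_width=1))  # greedy decoding
print(beam_search(toy_probs, beam_width=2))  # keeps "the" alive as a second beam
```

With a beam width of 1 the search is greedy and commits to "a"; with a width of 2 it also keeps "the" and finds the more probable sentence. This sketch compares hypotheses by raw log-probability and applies no length normalisation.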

Input & Output Image(s): See cats image for reference
Output Caption: "A cat laying on top of a couch next to another cat"
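
To give a feel for what recall and F1 measure on a caption, the sketch below scores the output caption above against a reference caption by word overlap. The reference sentence and the function `token_prf` are made up for this example; the project's actual scoring procedure may differ.

```python
from collections import Counter
from typing import Tuple


def token_prf(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Word-overlap precision, recall and F1 between two captions."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


candidate = "A cat laying on top of a couch next to another cat"
reference = "two cats laying on a couch"  # hypothetical reference caption
precision, recall, f1 = token_prf(candidate, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here 4 of the 12 candidate words appear in the reference, and 4 of the 6 reference words are covered, so recall is higher than precision. Note that "cat" and "cats" do not match, which is one reason metrics with stemming or synonym handling exist.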

Tools & Frameworks
Python • PyTorch • Linux • Bash Scripting • GPUs/CUDA/Distributed Computing


Metrics | ViT-GPT2 ~ Trained on COCO dataset (Base Model) | Model ~ Re-Trained using Flickr8k dataset | Model ~ Trained from scratch on Flickr8k dataset
SacreBLEU Score | 18.58 % | 33.44 % | 12.42 %
ROUGE Metric (ROUGE-L) | 14.1 % | 20 % | 12.52 %
METEOR Score | 3.5 % | 3.6 % | 3.5 %
CIDEr Metric | 2.88 % | 1.25 % | 0.79 %
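
The relative improvement between two scores can be worked out as shown below. Applying it to the SacreBLEU row (base model vs. re-trained model) gives a value close to the ~79.9% quoted above; that this row is the source of the quoted figure is an assumption, and `relative_improvement` is a hypothetical helper name.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


base_bleu = 18.58       # ViT-GPT2 base model, SacreBLEU
retrained_bleu = 33.44  # model re-trained on Flickr8k, SacreBLEU
gain = relative_improvement(base_bleu, retrained_bleu)
print(f"{gain:.2f}%")   # roughly 79.98%
```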




Arizona (AZ), USA