AudioAid

Hear to Help

UC Berkeley MIDS Capstone

An edge-deployed AI system that passively monitors household audio to detect elderly falls in real time, enabling privacy-preserving safety monitoring without cameras or wearables.

96% Test Accuracy
99% F1 Score
0 False Negatives
Python · TensorFlow · TensorFlow Lite · CNN · Raspberry Pi · Twilio · Librosa

The Problem

93% of adults 65+ want to age in place (CDC)

Every second, an older adult suffers a fall. That adds up to 36 million falls per year and 32,000+ deaths in the U.S. alone. For elderly individuals living independently, undetected falls are one of the leading causes of fatal outcomes.

Current solutions fall short

Wearable fall detection devices

Wearables

Expensive and unreliable; users forget to wear them or refuse them due to stigma.

Camera-based monitoring system

Camera Systems

Major privacy concerns, ethical issues, high cost, and limited room coverage.

$151B Aging-in-place market opportunity, growing at 13% CAGR

The Solution

AudioAid is a Raspberry Pi with an attached microphone that passively listens to household sounds and classifies falls using a Convolutional Neural Network, then instantly sends SMS alerts via Twilio to caregivers and emergency contacts.

AudioAid Raspberry Pi device
Phone notification showing fall alert
01

Space-Agnostic Resiliency

Works in any room, any layout. Unlike cameras, there is no line-of-sight requirement: sound travels around corners and through doorways.

02

User-Based Functionality

Zero user interaction required. No devices to wear, charge, or remember. Always on, always monitoring, and completely passive.

03

Applied Ethical Regulation

Privacy-first design. Audio is processed locally on-device and never stored or transmitted; only classification results trigger alerts.

Technical Architecture

Data Collection

With no publicly available fall audio datasets, we built our own from scratch, recording a 150lb test dummy alongside 7 real human volunteers (ages 20–57) across 3 different microphones (22kHz–192kHz sample rates).

564 Fall clips (188 recorded + augmented via time stretch)
915 No-fall clips (305 recorded + augmented)

Augmentation techniques: frequency masking, time masking, and time stretching to increase dataset diversity and model robustness.
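The masking augmentations above can be sketched in a few lines of NumPy; the spectrogram shape and maximum band widths here are illustrative assumptions, not the project's actual settings.

```python
import numpy as np

def freq_mask(spec, max_width=8, rng=None):
    """Zero out a random band of frequency (mel) bins in a spectrogram copy."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, out.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out

def time_mask(spec, max_width=16, rng=None):
    """Zero out a random span of time frames in a spectrogram copy."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, out.shape[1] - width + 1))
    out[:, start:start + width] = 0.0
    return out

# Each recorded clip can yield several masked variants for training.
spec = np.random.default_rng(0).random((40, 128)) + 0.1  # (mel bins, time frames)
variants = [freq_mask(spec), time_mask(spec)]
```

Because each mask zeroes a different random band, one recording produces many distinct training examples while preserving the fall's overall spectral shape.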

150lb crash-test dummy used for fall audio recording

Feature Engineering

Raw audio waveforms were transformed into visual representations that capture both frequency and temporal information, critical for distinguishing the sharp, broadband impact of a fall from everyday household sounds.

  • MFCCs (Mel-Frequency Cepstral Coefficients): a compact representation of the audio spectrum that mimics human auditory perception
  • Mel Spectrograms: a time-frequency representation weighted to match human hearing sensitivity
MFCC visualization of fall audio
Mel spectrogram of fall audio
Waveform and spectrogram comparison

Model Architecture

After experimenting with 25+ model configurations, the top-performing architecture uses MFCC input features through a CNN with the following layer structure:

ZeroPadding2D → Conv2D → MaxPooling2D → Flatten → Dense → Dropout

A higher classification threshold of 0.9 (vs. the standard 0.5) was applied in production to minimize false positives while maintaining zero false negatives, which is critical for a safety application.
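A minimal Keras sketch of that layer stack, with the raised 0.9 decision threshold; the input shape, filter counts, and dense width are illustrative assumptions, not the exact hyperparameters from the 25+ experiments.

```python
import tensorflow as tf

# Hypothetical input: 13 MFCC coefficients x 100 time frames x 1 channel.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(13, 100, 1)),
    tf.keras.layers.ZeroPadding2D(padding=(2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs P(fall)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

FALL_THRESHOLD = 0.9  # raised from the default 0.5 to suppress false positives

def is_fall(prob):
    """Apply the production decision threshold to the model's probability."""
    return prob >= FALL_THRESHOLD
```

Raising the threshold trades a small amount of recall headroom for far fewer spurious alerts; on this dataset the model still caught every real fall.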

CNN architecture visualization Architecture layer legend

Model Performance

98.99% Validation Accuracy
96% Test Accuracy
Class      Precision   Recall   F1 Score
Fall       0.97        1.00     0.99
No Fall    1.00        0.98     0.99

K-Fold Cross Validation (k=5)

Fold 1: 0.96
Fold 2: 0.94
Fold 3: 0.95
Fold 4: 0.96
Fold 5: 0.97

Mean accuracy: 0.96
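The k-fold procedure can be sketched as a plain NumPy index split; the fold accuracies below are the reported values, and the splitter itself is a generic illustration rather than the project's code.

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Yield (train, val) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        yield train, fold

# The reported per-fold accuracies average to the quoted 0.96.
fold_acc = [0.96, 0.94, 0.95, 0.96, 0.97]
mean_acc = round(float(np.mean(fold_acc)), 2)
```

Every clip serves as validation data exactly once, so the mean is an estimate of generalization rather than a single lucky split.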

0
Zero false negatives

Every real fall was correctly detected, the most critical metric for a safety application.

Confusion matrix showing model classification results

Deployment Pipeline

🎤 Live Audio Stream Sounddevice + Portaudio
Standardize Clips Fixed-length segments
🎵 MFCC Extraction Librosa + python_speech_features
🧠 TF Lite Inference On-device classification
📩 Twilio Alert SMS to caregivers
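The pipeline above can be sketched as a simple loop. The sample rate, clip length, and the `classify`/`alert` callables are stand-ins: on the device, `classify` would wrap the TF Lite interpreter and `alert` would send the SMS via `twilio.rest.Client`.

```python
import numpy as np

SR = 22050          # assumed sample rate
CLIP_SECONDS = 2    # assumed fixed clip length
FALL_THRESHOLD = 0.9

def standardize(clip, length=SR * CLIP_SECONDS):
    """Pad or truncate a raw audio buffer to the fixed length the model expects."""
    if len(clip) >= length:
        return clip[:length]
    return np.pad(clip, (0, length - len(clip)))

def monitor(stream, classify, alert):
    """Core loop: `stream` yields audio buffers (e.g. from sounddevice),
    `classify` maps a standardized clip to P(fall), and `alert` fires the SMS."""
    for buf in stream:
        clip = standardize(buf)
        if classify(clip) >= FALL_THRESHOLD:
            alert("Possible fall detected")
```

Keeping classification and alerting behind callables is what lets the same loop run against recorded test audio on a laptop and live microphone input on the Pi.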

Skills & Competencies

Machine Learning

CNN design, training, hyperparameter tuning, and model evaluation across 25+ experiments to optimize for safety-critical performance.

Audio / Signal Processing

MFCCs, Mel Spectrograms, audio augmentation (time stretch, masking), and feature engineering from raw waveforms.

Edge / Embedded ML

TensorFlow Lite deployment on Raspberry Pi, real-time inference on constrained hardware with optimized model conversion.

Data Engineering

Custom dataset creation from scratch, audio preprocessing pipeline, data augmentation strategies, and data quality validation.

Full-Stack Product Thinking

End-to-end ownership from data collection to alerting system, including Twilio integration, UX considerations, and stakeholder communication.

Hardware Integration

Raspberry Pi configuration, microphone input handling, Sounddevice/Portaudio setup, and hardware-software interface development.

Research & Ethics

Privacy-first design philosophy, ethical evaluation framework, domain expert consultation, and IRB-style considerations for human subjects.

Cross-Validation & Rigor

K-fold validation, classification-threshold tuning for safety-critical operation, confusion matrix analysis, and systematic experimentation methodology.

Challenges & Impact

📌

No existing fall audio data

With no publicly available fall audio datasets, we designed and executed a custom data collection protocol with volunteers across age groups, building the first dataset of its kind.

🎓

Self-taught from zero

No prior experience with audio ML or hardware deployment. Self-taught through research papers, expert consultation, and iterative experimentation across 25+ model configurations.

🏆

First of its kind

AudioAid is the first purely sound-based fall detection system, a novel approach in a space dominated by wearable sensors and computer vision.