Habitat Signature Analyzer (H.S.A.): a Multi-Sensor Edge AI System Running Entirely on Arduino Hardware (UNO Q / Nano 33 BLE Sense Rev2)

by shredermann in Circuits > Arduino



nano_uno_q_titled.png
dashboard.gif

Every room has a behavioral pattern. Not just temperature changes or motion spikes, but micro-vibrations, acoustic textures, magnetic disturbances, and environmental rhythms.

Instead of detecting isolated events, this project learns to classify the behavioral state of a space. Everything runs locally. No cloud. Just structured signals, engineered features, and interpretable machine learning.

Supplies

nano_uno_q.png

The Hardware Foundation

Capture d’écran 2026-02-20 à 15.09.19.png
hsa_table.png
hsa_packet.png
hsa_architecture.png
hsa_fft_mel_diagram_fixed.png

This project is built around two complementary boards: the Arduino Nano 33 BLE Sense Rev2 and the Arduino UNO Q. They play very different roles.


Arduino Nano 33 BLE Sense Rev2 : The Sensing Core


The Nano captures the physical world. Its 7 integrated sensors cover every relevant environmental dimension:

  1. IMU (BMI270): micro-vibrations and movement
  2. Microphone (MP34DT05): audio energy and spectral features via FFT
  3. Magnetometer (BMM150): magnetic field vector (more on this later)
  4. Proximity sensor (APDS9960): IR proximity 0-255
  5. Pressure (LPS22HB): barometric pressure
  6. Temperature & Humidity (HS3003): ambient climate

The Nano does not decide. It observes, processes, and transmits.

Audio is processed onboard using a double-windowed FFT pipeline: two 128-point FFTs with 50% overlap, Hann-windowed, producing 16 mel-scale frequency bands averaged across both frames, plus RMS and ZCR. All other sensor data is transmitted raw. The results are assembled into a 118-byte binary packet sent to the UNO Q at 10 Hz; the 14 engineered features used for classification are computed later on the UNO Q from windows of these packets.

header[2] = 2 bytes
packet_id = 2 bytes
timestamp_us = 4 bytes
audio_bands[16] = 64 bytes
audio_rms = 4 bytes
audio_zcr = 4 bytes
imu_data[3] = 12 bytes
env_data[6] = 24 bytes
prox = 1 byte
_padding = 1 byte
─────────────────────────
TOTAL = 118 bytes
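The layout above can be sanity-checked on the receiving side with Python's struct module. The format string below is simply my reading of the field list (the project's actual scripts may spell it differently):

```python
import struct

# '<' = little-endian, no padding — mirrors __attribute__((packed)) on the Nano
# 2s  header, H packet_id, I timestamp_us,
# 16f audio_bands, f audio_rms, f audio_zcr,
# 3f imu_data, 6f env_data, B prox, B padding
PACKET_FMT = '<2sHI16fff3f6fBB'

print(struct.calcsize(PACKET_FMT))  # → 118
```

If this ever disagrees with sizeof(SensorFeatures) on the Nano, the receiver will misread every field.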


Sensors as More Than Data Sources


During development, two sensors revealed capabilities beyond their primary function, opening possibilities that were not part of the original design.

The Magnetometer Discovery

The BMM150 magnetometer was initially included as a directional sensor. What emerged during testing was unexpected: it reacts measurably and consistently to nearby metallic or electronic objects.

This suggests a future interaction model: placing a phone to the left or right of the device produces a distinct magnetic signature on the X/Y axis. No buttons, no additional hardware — a magnetic gesture interface built from a standard compass sensor. This has not been implemented in this version, but the data clearly supports it as a next step.

The Proximity Sensor as a Presence Gate

The APDS9960 proximity sensor outputs a raw IR value from 255 (nothing) to 0 (very close). In the final model, its mean and max over a 10-point window became two of the most discriminative features, directly solving the ambiguity between a quiet empty room and a quiet occupied one. A hand above the device drops the proximity reading from ~235 to ~115, a difference no other sensor captured reliably. It can also serve as a gesture interface (changing the visualization mode, for example).
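As a rough illustration of why these two readings are so discriminative, a naive threshold gate already separates the two cases quoted above. The threshold of 180 is an assumption picked between the ~235 baseline and the ~115 hand reading; the real model learns this boundary from data instead:

```python
from statistics import mean

def presence_gate(prox_window, threshold=180):
    """Naive presence check: True when the average IR reading over
    the window falls below the (assumed) threshold of 180."""
    return mean(prox_window) < threshold

print(presence_gate([235] * 10))  # empty-room baseline → False
print(presence_gate([115] * 10))  # hand above the device → True
```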

The Barometer is a hidden gem in this setup. During my real-world tests, I noticed it reacts instantly to air pressure drops when a window is opened, especially with wind. It adds a 'security' layer: the system doesn't just hear the room, it feels the air flow.


Arduino UNO Q — The Edge Intelligence Layer


The UNO Q is the system's brain. It runs Python, InfluxDB, Scikit-learn, and Grafana, all locally, with 4GB RAM and 32GB eMMC persistent storage. Unlike typical microcontrollers, the UNO Q preserves all historical data across reboots, enabling long-term behavioral monitoring rather than just real-time reaction.


How the Two Boards Communicate


Nano TX -------> UNO Q RX: a simple wire connection


Packets are transmitted via UART at 921600 baud, a deliberately high baud rate that ensures stable, low-latency streaming of 118-byte binary packets at 10 Hz without congestion or buffer overflow.
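A quick back-of-the-envelope check shows how much headroom 921600 baud leaves, assuming the usual 8N1 framing (10 bits per byte on the wire):

```python
PACKET_BYTES = 118
RATE_HZ = 10
BITS_PER_BYTE_ON_WIRE = 10  # 8 data bits + start + stop (8N1)
BAUD = 921600

bits_per_second = PACKET_BYTES * RATE_HZ * BITS_PER_BYTE_ON_WIRE
print(bits_per_second)                         # → 11800
print(round(bits_per_second / BAUD * 100, 1))  # → 1.3 (% of link capacity)
```

The link runs at barely over 1% utilization, which is why congestion never becomes an issue.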

The pipeline is clean: Nano handles feature extraction, UNO Q handles storage, inference, and visualization. A transparent STM32 passthrough on the UNO Q routes bytes between the two without any processing overhead.

As described earlier, the Nano does not simply forward raw audio: the double-windowed FFT pipeline runs entirely onboard, so the UNO Q never handles a single audio sample, only interpreted spectral features. All other sensors (IMU, magnetometer, pressure, temperature, humidity, proximity) transmit raw values. The full packet is 118 bytes at 10 Hz.

From there, the UNO Q computes 14 engineered features over a sliding window of 10 data points.


Ground Truth and Labeling


Behavioral states were defined under controlled conditions:

  1. Calme: 60 seconds immobile, near-total silence. Audio RMS ~0.002, proximity at baseline ~235.
  2. Presence: Hand held above the Nano with ambient sound nearby. Proximity drops to ~115-125.
  3. Activite: Intentional movement near the device. Audio RMS ~0.007-0.008, magnetometer variance spikes to 10-12 µT².
  4. Ambiance: Music or film playing. Audio RMS ~0.033-0.035, sustained and consistent.

For this demonstration, 3 sessions of 60 seconds per class (655 windows total) were sufficient to achieve 97.2% accuracy in a controlled domestic environment. In practice, a more robust model would benefit from 10+ sessions per class, collected across different times of day, varying ambient conditions, and multiple environments. The pipeline is designed to make additional collection straightforward: simply run ml_collect_habitat.py and retrain.

Sensor Fusion and Model Training

hsa_ml_training.png
hsa_confusion_matrix.png
hsa_rolling_buffer.png

Early models with 12 features struggled to separate Calme from Presence; both classes share low audio energy and a stable IMU. Adding proximity_mean and proximity_max as features 13 and 14 eliminated the ambiguity completely.

Model: RandomForestClassifier (200 trees, class_weight='balanced').

Why not deep learning? Because the goal of this tutorial is to understand every step of the AI pipeline, from raw sensor data to a live prediction on screen. Deep learning models are powerful but opaque: you feed data in, a prediction comes out, and what happens in between is difficult to explain or debug.

RandomForest is different. You can inspect feature importance, trace exactly why a classification was made, and understand what the model learned. With 655 samples and 14 tabular features, it is also the correct engineering choice, but more importantly, it is the right pedagogical choice. If you understand this pipeline, you can adapt it, improve it, and apply it to any sensor-based classification problem.

Train/test split was done by session, not randomly. The last session of each class was held out for testing, preventing data leakage between correlated windows. GroupKFold cross-validation confirmed generalization.

Test accuracy: 97.2% across 218 test windows.
Confusion matrix (test set):
Activite: 54/54 correct
Ambiance: 48/54 correct (6 misclassified as Activite — both classes share elevated audio)
Calme: 55/55 correct
Presence: 55/55 correct


Feature importance (top 5):

  1. pressure_mean 21.5%,
  2. audio_rms_mean 20.9%,
  3. audio_rms_var 12.9%,
  4. proximity_mean 12.7%,
  5. proximity_max 9.4%.


Real-Time Edge Inference on the UNO Q


The trained model is exported as a .pkl file and loaded on the UNO Q. Every 500ms, the inference loop queries the last 10 data points from InfluxDB, computes 14 features, runs predict_proba(), and averages the result over a rolling buffer of 5 predictions.

This rolling average is the key to stability. Individual predictions can show 30-50% confidence due to transient sensor noise, but the 5-prediction buffer consistently converges to the correct label. In practice, zero misclassifications were observed during live testing.
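A small numeric example makes the effect concrete. The probability vectors below are invented for illustration: one of the five predictions even favors the wrong class on its own, yet the buffer average still picks the right one:

```python
import numpy as np

# Five noisy per-class probability vectors (values invented for illustration)
preds = np.array([
    [0.40, 0.35, 0.15, 0.10],
    [0.30, 0.45, 0.15, 0.10],  # wrong argmax on its own
    [0.50, 0.25, 0.15, 0.10],
    [0.45, 0.30, 0.15, 0.10],
    [0.42, 0.33, 0.15, 0.10],
])

mean_proba = preds.mean(axis=0)
print(int(mean_proba.argmax()))  # → 0: the buffer converges to class 0
```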

Performance on the UNO Q:

  1. Inference time: under 10 ms
  2. RAM: 1.11 / 3.58 GB
  3. CPU: 39% usage
  4. Detection latency to dashboard update: about 4-5 seconds (inference loop + buffer stabilization + Grafana 5 s refresh cycle)

All inference runs locally. No external API. No cloud dependency. The UNO Q operates as a compact industrial edge AI node.


Dashboard and Monitoring


Two Grafana dashboards provide full system visibility.

The sensor dashboard shows real-time audio RMS, mel band spectrum, IMU axes, magnetometer XYZ, temperature, humidity, pressure trend, and proximity.

The habitat dashboard shows the current classified state, confidence gauge, per-class probability history, and an event log with timestamps.

Data persists on the eMMC across reboots. This means the behavioral history of a room accumulates over days and weeks, a foundation for long-term environmental analysis.

The CODE

hsa_unoq_linux.png
hsa_frame_sync-2.png
hsa_shared_memory.png
hsa_groupkfold.png
hsa_randomforest.png

A complete walkthrough of every pipeline stage, from sensor to live classification


This step walks through the code behind each pipeline stage. Rather than listing every function, we focus on the design decisions that make this system work, and why each choice was made. The full source code is available in the project files.

You can find all the code in the "HSA" repository:

https://github.com/shredermann-hash/Habitat-Signature-Analyzer/tree/instructable

How the System is Deployed


The Arduino UNO Q runs two environments simultaneously, and understanding this is essential before diving into the code.

App Lab - The STM32 firmware


App Lab is the UNO Q's embedded development environment. It manages the STM32 microcontroller, which handles two roles: driving the onboard LED matrix and acting as a transparent UART passthrough between the Nano and the Linux SoC. The STM32 firmware (unoQstm32.cpp) is uploaded via App Lab. A minimal Python stub (unoQmain.py) runs alongside it to keep the App Lab runtime alive:

# unoQmain.py — keeps App Lab runtime alive for STM32
from arduino.app_utils import App
import time

def loop():
    time.sleep(1)  # STM32 handles everything else

App.run(user_loop=loop)


SSH - The Python intelligence layer


Everything else runs over SSH on the Qualcomm SoC Linux environment: the capture daemon, InfluxDB writes, ML inference, and Grafana. Three Python processes are launched by a single shell script (start_phisualize.sh) using setsid so they survive terminal closure:

# start_phisualize.sh — launches all three processes
setsid python3 capture_daemon.py > /tmp/capture.log 2>&1 &
setsid python3 ml_process_influx.py > /tmp/ml_process.log 2>&1 &
setsid python3 ml_predict_habitat.py > /tmp/habitat.log 2>&1 &

The STM32 and the Python layer communicate through the same physical UART at 921600 baud — bytes flow transparently in both directions. The STM32 forwards sensor packets from the Nano to Linux, and forwards LED commands from Linux to the matrix.


Pipeline 1 - The Nano Firmware (nano.cpp)


The firmware reuses the per-sensor .h/.cpp classes for the Nano 33 BLE Sense Rev2 that I wrote for my previous Instructable, "Phisualize it!".


The binary packet

All sensor data is packed into a single 118-byte struct and sent over UART once per audio frame (~10 Hz). Three design choices here matter:

  1. __attribute__((packed)) eliminates compiler padding. Without it, the struct would be larger and the receiver would misread field positions.
  2. The 0xAA 0xBB header acts as a synchronization marker. If bytes are lost in transit, the receiver scans for this pattern to re-lock onto the frame boundary.
  3. packet_id is a rolling 16-bit counter. A gap in the sequence on the receiver side means packets were dropped, not corrupted.
struct __attribute__((packed)) SensorFeatures {
    uint8_t  header[2];       // 0xAA 0xBB — sync marker
    uint16_t packet_id;       // loss detection
    uint32_t timestamp_us;    // micros() — temporal alignment
    float    audio_bands[16]; // 16 mel-scale bands (64 bytes)
    float    audio_rms;       // RMS energy
    float    audio_zcr;       // zero-crossing rate
    float    imu_data[3];     // last IMU sample X, Y, Z (12 bytes)
    float    env_data[6];     // mag XYZ, pressure, temp, humidity
    uint8_t  prox;            // proximity 0-255
    uint8_t  _padding;
}; // Total: 118 bytes @ 921600 baud = negligible bandwidth


The double-windowed FFT audio pipeline


This is the most technically complex part of the firmware. Raw PCM audio is transformed into 16 mel-scale frequency bands onboard; the Linux side never sees a single audio sample, only interpreted spectral features.

A single FFT per 96-sample PDM buffer would miss spectral detail between frames. Two FFTs with 50% overlap give a more stable estimate. The Hann window prevents spectral leakage; without it, the sharp frame edges would introduce artificial frequency content:

// Precompute Hann window once at startup
for (int i = 0; i < AUDIO_FFT_SIZE; i++) {
    hann_window[i] = 0.5f * (1.0f - arm_cos_f32(2 * PI * i / AUDIO_FFT_SIZE));
}

// Apply window then FFT (repeated twice with 50% overlap)
for (int i = 0; i < AUDIO_FFT_SIZE; i++) {
    audio_fft_input[i] = (buffer[i] / 32768.0f) * hann_window[i];
}
arm_rfft_fast_f32(&audio_fft, audio_fft_input, audio_fft_output, 0);


The 64 FFT bins are grouped into 16 mel-scale bands. Lower frequencies get narrower bands (more detail), higher frequencies get wider bands, matching how human hearing perceives sound:

const int mel_bins[17] = {
    0, 1, 2, 3, 4, 5, 7, 9, 12, 15, 19, 24, 30, 37, 45, 54, 64
};

for (int b = 0; b < 16; b++) {
    float sum = 0;
    int count = mel_bins[b + 1] - mel_bins[b];
    for (int i = mel_bins[b]; i < mel_bins[b + 1]; i++)
        sum += audio_magnitudes[i];
    bands[b] = sum / count;
}

// Average both FFTs before transmission
for (int i = 0; i < 16; i++)
    feat.audio_bands[i] = (bands1[i] + bands2[i]) * 0.5f;
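For readers who prefer to experiment offline, the same band-grouping idea can be prototyped in NumPy before porting it to CMSIS-DSP. This is a sketch of the concept, not the firmware itself (it reuses one frame instead of two genuinely overlapping ones):

```python
import numpy as np

FFT_SIZE = 128
MEL_BINS = [0, 1, 2, 3, 4, 5, 7, 9, 12, 15, 19, 24, 30, 37, 45, 54, 64]

def mel_bands(samples):
    """16 coarse mel-style bands from one Hann-windowed 128-point FFT."""
    window = 0.5 * (1.0 - np.cos(2 * np.pi * np.arange(FFT_SIZE) / FFT_SIZE))
    spectrum = np.abs(np.fft.rfft(samples * window))[:64]
    return [float(spectrum[a:b].mean()) for a, b in zip(MEL_BINS, MEL_BINS[1:])]

# A toy tone at FFT bin 5; the firmware would average two overlapping frames
frame = np.sin(2 * np.pi * 5 * np.arange(FFT_SIZE) / FFT_SIZE)
bands = [(a + b) / 2 for a, b in zip(mel_bands(frame), mel_bands(frame))]
print(len(bands))  # → 16
```

The energy of the toy tone lands in band 5, exactly as the mel_bins table predicts.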


Pipeline 2 - The STM32 (unoQstm32.cpp)


The STM32 has two responsibilities: transparent passthrough and LED matrix rendering. The passthrough is intentionally simple; any complexity here would add latency and potential failure points:

void loop() {
    // Nano -> Linux SoC (sensor packets)
    while (Serial.available())
        Serial1.write(Serial.read());

    // Linux SoC -> LED matrix (habitat patterns)
    // Detect 0xDD header, validate XOR checksum, render
    while (Serial1.available()) {
        uint8_t b = Serial1.read();
        if (b == 0xDD && spectrum_index == 0) { ... }
        // 15-byte packet: 0xDD + 13 band values + checksum
    }
}


The LED matrix renders habitat states as bar patterns:

  1. Calme: 3 columns lit
  2. Presence: 6
  3. Activite: 9
  4. Ambiance: all 13

An XOR checksum validates each packet before rendering; corrupted packets are silently discarded.


Pipeline 3 - The Capture Daemon (capture_daemon.py)


The capture daemon runs over SSH on the Linux SoC. It reads the UART byte stream, reconstructs packets, and writes them to shared memory for the ML process to consume. It never touches InfluxDB directly; that is the ML process's responsibility.

Frame synchronization

The byte stream may contain partial packets, especially on startup. The daemon scans for the 0xAA 0xBB header to re-lock onto frame boundaries:

while len(buffer) >= FRAME_SIZE:
    try:
        idx = buffer.index(b'\xAA\xBB')  # find sync marker
    except ValueError:
        buffer = buffer[-FRAME_SIZE:]    # keep tail, discard noise
        break

    if idx + FRAME_SIZE <= len(buffer):
        packet = buffer[idx:idx+FRAME_SIZE]
        buffer = buffer[idx+FRAME_SIZE:]
        # validate and store...
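The resynchronization behavior is easy to exercise with a synthetic byte stream. The helper below is a self-contained sketch of the same idea (the daemon's real code differs in details):

```python
FRAME_SIZE = 118
HEADER = b'\xAA\xBB'

def extract_frames(buffer):
    """Scan for the sync header and peel off complete fixed-size frames."""
    frames = []
    while len(buffer) >= FRAME_SIZE:
        idx = buffer.find(HEADER)
        if idx < 0:
            buffer = buffer[-(len(HEADER) - 1):]  # keep a possible partial header
            break
        if idx + FRAME_SIZE > len(buffer):
            break  # header found, but the frame is still incomplete
        frames.append(buffer[idx:idx + FRAME_SIZE])
        buffer = buffer[idx + FRAME_SIZE:]
    return frames, buffer

# Three bytes of line noise, then two complete frames back to back
frame = HEADER + bytes(FRAME_SIZE - 2)
frames, rest = extract_frames(b'\x00\x01\x02' + frame + frame)
print(len(frames))  # → 2
```

The leading noise is discarded and both frames are recovered intact.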


Loss detection

The packet_id field enables real-time loss monitoring, essential for validating UART reliability at 921600 baud:

packet_id, timestamp_us = struct.unpack('<HI', packet[2:8])

if last_id is not None:
    lost = (packet_id - last_id - 1) % 65536
    if lost > 0:
        print(f'Lost {lost} packets (ID {last_id} -> {packet_id})')

last_id = packet_id
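The modulo is what makes the gap calculation survive the 16-bit counter wrapping from 65535 back to 0. A quick check:

```python
def packets_lost(last_id, packet_id):
    """Gap between two consecutive 16-bit rolling packet IDs."""
    return (packet_id - last_id - 1) % 65536

print(packets_lost(41, 42))    # → 0 (consecutive, nothing lost)
print(packets_lost(41, 45))    # → 3
print(packets_lost(65534, 2))  # → 3 (IDs 65535, 0, 1 lost across the wrap)
```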


Shared memory ring buffer

Packets are stored in a 100-slot ring buffer in /dev/shm, shared memory on the Linux filesystem. This avoids disk I/O overhead and allows the ML process to consume data independently at its own pace:

# 100 slots × 118 bytes + 4 bytes write index = 11804 bytes
shm = shared_memory.SharedMemory(create=True,
                                 size=4 + RING_SIZE * FRAME_SIZE,
                                 name='phisualize_buffer')

slot = write_idx_ptr[0] % RING_SIZE
frames_buffer[slot] = np.frombuffer(packet, dtype=np.uint8)
write_idx_ptr[0] += 1
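The ring indexing can be tried in isolation with the stdlib multiprocessing.shared_memory module. This sketch mirrors the sizes above but lets the OS pick the segment name; the daemon's actual code may differ:

```python
import numpy as np
from multiprocessing import shared_memory

RING_SIZE, FRAME_SIZE = 100, 118

# 4-byte write index followed by 100 frame slots
shm = shared_memory.SharedMemory(create=True, size=4 + RING_SIZE * FRAME_SIZE)
write_idx = np.ndarray((1,), dtype=np.uint32, buffer=shm.buf[:4])
frames = np.ndarray((RING_SIZE, FRAME_SIZE), dtype=np.uint8, buffer=shm.buf[4:])

packet = bytes(range(118))
slot = int(write_idx[0]) % RING_SIZE
frames[slot] = np.frombuffer(packet, dtype=np.uint8)
write_idx[0] += 1

print(bytes(frames[0][:4]))  # → b'\x00\x01\x02\x03'

del write_idx, frames  # release views before closing the segment
shm.close()
shm.unlink()
```

A second process would attach with SharedMemory(name=...) and read frames at its own pace, which is exactly the decoupling the daemon relies on.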


Pipeline 4 - The ML Process (ml_process_influx.py)


The ML process reads from shared memory and decodes each packet into individual sensor fields written to InfluxDB. The struct unpacking must exactly match the Nano's packed struct layout:

# Unpack: 2 header + 2 packet_id + 4 timestamp + 27 floats + 2 bytes
packet_id, timestamp_us = struct.unpack('<HI', packet[2:8])
values = struct.unpack('<27fBB', packet[8:])

# values[0:16]  = audio bands
# values[16]    = audio_rms
# values[17]    = audio_zcr
# values[18:21] = imu X, Y, Z
# values[21:24] = mag X, Y, Z
# values[24]    = pressure
# values[25]    = temperature
# values[26]    = humidity
# values[27]    = proximity (uint8)
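The offsets are easy to verify by round-tripping a synthetic packet (all values invented) through the same format strings:

```python
import struct

floats = [float(i) for i in range(27)]  # 16 bands, rms, zcr, imu, env
packet = (b'\xAA\xBB'
          + struct.pack('<HI', 7, 123456)             # packet_id, timestamp_us
          + struct.pack('<27fBB', *floats, 200, 0))   # ..., prox=200, padding

print(len(packet))  # → 118

packet_id, timestamp_us = struct.unpack('<HI', packet[2:8])
values = struct.unpack('<27fBB', packet[8:])
print(packet_id, values[27])  # → 7 200
```

If a field ever lands at the wrong index here, the Nano struct and the Python format string have drifted apart.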


Points are batched in groups of 10 before writing to InfluxDB, reducing write overhead from 10 individual writes to 1 batch write per second:

batch_points.append(point)
if len(batch_points) >= 10:
    influx.write_points(batch_points)
    batch_points = []

Process isolation is key: capture_daemon and ml_process_influx are two separate Python processes communicating only through shared memory. If the ML process crashes, the capture daemon keeps running and no sensor data is lost. This architecture is what enabled 3+ days of continuous uptime.


Pipeline 5 - Feature Engineering (ml_collect_habitat.py)


InfluxDB stores one data point per packet at ~10 Hz. The feature engineering layer takes the last 10 points (approximately 1 second of data) and computes 14 descriptors that summarize the behavioral state of the environment.

A single sensor reading at one instant is meaningless. A distribution of readings over time tells a story.

def compute_features(window):
    audio_rms = window['audio_rms'].values.astype(float)
    imu_norm  = sqrt(imu_x**2 + imu_y**2 + imu_z**2)
    mag_norm  = sqrt(mag_x**2 + mag_y**2 + mag_z**2)
    proximity = window['proximity'].values.astype(float)

    return {
        'audio_rms_mean':  mean(audio_rms),               # average energy
        'audio_rms_var':   var(audio_rms),                # energy stability
        'audio_rms_delta': audio_rms[-1] - audio_rms[0],  # energy trend
        'mag_norm_var':    var(mag_norm),                 # magnetic disturbance
        'proximity_mean':  mean(proximity),               # average IR reading
        'proximity_max':   max(proximity),                # closest detection
        # ... 8 more features
    }


Two features deserve special attention:

  1. mag_norm_var is the magnetometer discovery. Human movement perturbs the local magnetic field. At rest: ~2 µT². During activity: 10-12 µT². This was not designed; it emerged from the data.
  2. proximity_mean and proximity_max solved the Calme vs Presence problem. A hand above the Nano drops the IR reading from ~235 to ~115. No other sensor captured this difference reliably. Adding these two features increased the feature set from 12 to 14 and eliminated all Calme/Presence confusion.

The timestamp fix

An important implementation detail: InfluxDB's internal now() clock was offset from the system clock. Queries using now() returned 0 results. The fix is to use Python's timestamp() multiplied by 1e9 for nanosecond-epoch alignment:

start_time = pd.Timestamp.now('UTC')
# ...
start_ns = int(start_time.timestamp() * 1e9) # correct nanosecond epoch
end_ns = int(end_time.timestamp() * 1e9)

query = f'SELECT ... FROM sensors WHERE time >= {start_ns} AND time <= {end_ns}'
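The same conversion works with the standard library alone, without pandas, since Timestamp.timestamp() and datetime.timestamp() both return seconds since the Unix epoch:

```python
from datetime import datetime, timezone

now_ns = int(datetime.now(timezone.utc).timestamp() * 1e9)
print(len(str(now_ns)))  # → 19 (a nanosecond epoch has 19 digits in this era)
```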


Pipeline 6 - Training Strategy (ml_train_habitat.py)


The data leakage problem

A naive random train/test split is fundamentally wrong for time-series data. Consecutive windows share 9 out of 10 data points; splitting them randomly means the model has seen nearly identical samples during training. The reported accuracy would be inflated and the model would fail in the real world.

The correct approach splits by session: the last session of each class is held out for testing:

# Dynamic split: last session of each class → test set
test_sessions = []
for label in np.unique(y):
    label_sessions = np.unique(groups[y == label])
    test_sessions.append(int(label_sessions[-1]))

test_mask = np.isin(groups, test_sessions)
test_idx  = np.where(test_mask)[0]
train_idx = np.where(~test_mask)[0]


GroupKFold cross-validation

The same logic applies to cross-validation. GroupKFold ensures each fold's validation set contains only sessions unseen during training for that fold:

gkf = GroupKFold(n_splits=5)
cv_scores = cross_val_score(
    pipeline, X_train, y_train,
    cv=gkf, groups=groups_train
)
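A toy check makes the guarantee tangible: with invented data and five session IDs, GroupKFold never lets a session appear on both sides of a fold:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(20).reshape(10, 2)                 # toy features
y = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0])     # toy labels
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])  # session IDs

for train_idx, val_idx in GroupKFold(n_splits=5).split(X, y, groups):
    # no session ever appears in both the train and validation sets
    assert not set(groups[train_idx]) & set(groups[val_idx])
print('sessions never leak across folds')
```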


Why RandomForest?

The goal of this tutorial is to understand every step of the AI pipeline. RandomForest is transparent: you can inspect feature importance, trace exactly why a classification was made, and understand what the model learned. With 655 samples and 14 tabular features, it is also the correct engineering choice, but more importantly, it is the right pedagogical choice. Deep learning would be a black box here, not a learning tool.

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('rf', RandomForestClassifier(
        n_estimators=200,
        class_weight='balanced',  # equal weight per class
        min_samples_leaf=3,       # reduces overfitting
        random_state=42,
        n_jobs=-1                 # all CPU cores
    ))
])


Pipeline 7 - Real-Time Inference (ml_predict_habitat.py)


The inference loop

Every 500ms, the prediction process queries the last 10 data points from InfluxDB, reorders them chronologically (the query returns DESC), computes 14 features, and asks the model for per-class probabilities:

# Query DESC for recency, reorder ASC for correct delta calculations
df = pd.DataFrame(points).iloc[::-1].reset_index(drop=True)

# Compute 14 features
features = [mean(audio_rms), var(audio_rms), ...]

# Get probability vector — not just a label
proba = pipeline.predict_proba([features])[0]
# → [0.02, 0.91, 0.05, 0.02] for ambiance


The rolling buffer

A single prediction can be uncertain due to transient sensor noise. The rolling buffer of 5 predictions smooths these transients. Individual predictions often show 30-50% confidence; the average consistently converges to the correct label. In live testing, zero misclassifications were observed despite the seemingly low per-prediction confidence:

PROBA_BUFFER_SIZE = 5

proba_buffer.append(proba)
if len(proba_buffer) > PROBA_BUFFER_SIZE:
    proba_buffer.pop(0)

mean_proba = np.mean(proba_buffer, axis=0)
label = CLASSES[np.argmax(mean_proba)]
confidence = mean_proba[np.argmax(mean_proba)] * 100


Writing back to InfluxDB

The predicted label, confidence, and per-class probabilities are written to a separate InfluxDB measurement (habitat) every second. Grafana reads from this measurement to drive the dashboard panels:

point = [{
    'measurement': 'habitat',
    'fields': {
        'label': label,            # 'calme', 'presence'...
        'confidence': confidence,  # 0-100 (%)
        'proba_calme': mean_proba[0],
        'proba_presence': mean_proba[1],
        'proba_activite': mean_proba[2],
        'proba_ambiance': mean_proba[3],
    }
}]
client.write_points(point)


The LED matrix feedback

When the predicted class changes, a 15-byte command is sent over /dev/ttyHS1 to the STM32, which renders the corresponding bar pattern on the LED matrix. An XOR checksum prevents corrupted commands from displaying garbage:

HABITAT_PATTERNS = {
    'calme'    : [8,8,8,0,0,0,0,0,0,0,0,0,0],  # 3 columns
    'presence' : [8,8,8,8,8,8,0,0,0,0,0,0,0],  # 6 columns
    'activite' : [8,8,8,8,8,8,8,8,8,0,0,0,0],  # 9 columns
    'ambiance' : [8,8,8,8,8,8,8,8,8,8,8,8,8],  # 13 columns
}

data = bytes([0xDD] + HABITAT_PATTERNS[label])
checksum = 0
for b in data:        # XOR of all bytes
    checksum ^= b
uart.write(data + bytes([checksum]))
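The validation side of the checksum can be tested without any hardware. XOR has the convenient property that XOR-ing the payload together with its own checksum cancels to zero, so the receiver only needs one pass over the frame:

```python
from functools import reduce
from operator import xor

def frame_with_checksum(bands):
    """Build a 15-byte LED command: 0xDD + 13 band values + XOR checksum."""
    data = bytes([0xDD] + bands)
    return data + bytes([reduce(xor, data)])

def valid(frame):
    # XOR of payload plus its checksum cancels to 0 for an intact frame
    return reduce(xor, frame) == 0

frame = frame_with_checksum([8] * 13)
print(len(frame), valid(frame))  # → 15 True

corrupted = frame[:-1] + bytes([frame[-1] ^ 0xFF])
print(valid(corrupted))          # → False
```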


Detection Latency

The end-to-end latency from a real-world event to a dashboard update is approximately 4-5 seconds. This breaks down as: 500ms inference loop + ~2.5s for the 5-prediction buffer to stabilize + up to 5s for Grafana's refresh cycle. For behavioral classification, distinguishing whether a room is calm, occupied, or active, this latency is entirely acceptable. This system does not react to events. It interprets context.

INSTALLATION

hsa_survival_linux.png
hsa_survival_python.png

Everything you need to get the system running from scratch on the Arduino UNO Q


This step covers the complete installation process: connecting to the UNO Q, installing Python dependencies, setting up InfluxDB and Grafana, uploading the Nano firmware, and launching the system. The process takes approximately 1-2 hours on first install.

Prerequisites

  1. Arduino Nano 33 BLE Sense Rev2 with the Phisualize firmware uploaded (covered in the Nano Firmware step)
  2. Arduino UNO Q powered and connected to your local network via WiFi or USB-C
  3. Arduino IDE 2.x installed on your development machine for Nano firmware upload
  4. SSH client on your development machine (Terminal on Mac/Linux, PowerShell or PuTTY on Windows)
  5. Arduino App Lab installed from the Arduino website for UNO Q STM32 firmware

1 - Connect to the UNO Q via SSH

Find the UNO Q's IP address from your router's DHCP table, or from the Arduino Lab for Boards app. Then connect:

ssh arduino@<UNO-Q-IP>
# Password: the one you chose when installing Arduino App Lab

# Verify the environment
python3 --version
# → Python 3.13.x

uname -m
# → aarch64 (confirms 64-bit ARM — use arm64 packages)

df -h /home/arduino
# → Should show ~18 GB available

All Python processes in this project run over SSH, not via App Lab. App Lab is used only for the STM32 LED matrix firmware. Keep two terminals open: one for SSH, one for App Lab.

2 - Install Python Dependencies

The UNO Q ships with Python 3.13 but without pip. The first step is to install pip via apt, then install the required packages:

Install pip and base packages via apt

sudo apt update
sudo apt install -y python3-pip python3-numpy python3-serial

Install remaining packages via pip

Use this flag with caution: running "pip3 install --break-system-packages" bypasses the system package manager's dependency protection. On a stable, dedicated system like the UNO Q running only Phisualize, this is safe. On a general-purpose Linux system, it can create dependency conflicts that are difficult to recover from without a full reinstall.


The --break-system-packages flag is required on the UNO Q to bypass the system package manager's restrictions. This is the correct and expected approach on this platform:

pip3 install influxdb --break-system-packages
pip3 install scikit-learn --break-system-packages
pip3 install pandas --break-system-packages
pip3 install joblib --break-system-packages
scikit-learn pulls in scipy (~32 MB) and joblib automatically. The full download is ~50 MB; allow 2-3 minutes at the UNO Q's WiFi speed.

Verify all packages

python3 -c "import serial; print('✓ pyserial', __import__('serial').__version__)"
python3 -c "import numpy; print('✓ numpy', __import__('numpy').__version__)"
python3 -c "import pandas; print('✓ pandas', __import__('pandas').__version__)"
python3 -c "import sklearn; print('✓ sklearn', __import__('sklearn').__version__)"
python3 -c "import influxdb; print('✓ influxdb', __import__('influxdb').__version__)"
python3 -c "import joblib; print('✓ joblib', __import__('joblib').__version__)"
All six should print a version number. If any fail, re-run the corresponding pip3 install command.

3 - Install and Configure InfluxDB

Install InfluxDB 1.x (arm64)

# Download InfluxDB 1.8 arm64 package
wget https://dl.influxdata.com/influxdb/releases/influxdb_1.8.10_arm64.deb

sudo dpkg -i influxdb_1.8.10_arm64.deb

sudo systemctl enable influxdb
sudo systemctl start influxdb

# Verify
influxd version
# → InfluxDB v1.8.x
Use the arm64 .deb package. Verify your architecture first with 'uname -m'; an aarch64 result confirms arm64 is correct.

Create the database

influx

> CREATE DATABASE phisualize
> SHOW DATABASES
# → phisualize should appear
> exit

Set retention policy (recommended):

influx

> USE phisualize
> CREATE RETENTION POLICY "30d" ON "phisualize" DURATION 30d REPLICATION 1 DEFAULT
> exit


4 - Install and Configure Grafana

Install Grafana

Do NOT use the apt repository method for Grafana on the UNO Q. The Grafana GPG key (B53AE77B...) is not recognized by the sqv signature tool on Debian Trixie, which causes apt update to fail with a signature error. This error blocks App Lab from starting on every subsequent boot, a critical problem.

The correct approach is to install Grafana directly from the .deb package, bypassing the repository entirely:

# Install Grafana directly from .deb — skip the apt repository
cd /tmp
wget https://dl.grafana.com/oss/release/grafana_11.4.0_arm64.deb
sudo dpkg -i grafana_11.4.0_arm64.deb

# Fix any missing dependencies
sudo apt --fix-broken install -y

sudo systemctl enable grafana-server
sudo systemctl start grafana-server

# Verify
sudo systemctl status grafana-server
# → Active: active (running)
If you accidentally added the Grafana apt repository before installing, remove it immediately:
sudo rm /etc/apt/sources.list.d/grafana.list && sudo apt update.
Leaving this file in place causes App Lab to fail on every launch with a GPG signature error that blocks the entire environment.

5 - Upload the Nano Firmware

On your development machine (not the UNO Q), open Arduino IDE 2.x:

  1. Install board support: Boards Manager → search 'Arduino Mbed OS Nano Boards' → Install
  2. Install libraries: Library Manager → install Arduino_BMI270_BMM150, Arduino_LPS22HB, Arduino_HS300x, Arduino_APDS9960, Arduino_CMSIS-DSP
  3. Open nano.cpp and all PhiXxx files in the same sketch folder
  4. Select board: Tools → Board → Arduino Nano 33 BLE
  5. Upload: verify sizeof(SensorFeatures) prints 118 in Serial Monitor after upload
# Expected Serial Monitor output after boot:
================================
PHISUALIZE V2 - NANO SENSE
sizeof(SensorFeatures) = 118
CORRIGÉ : IMU 3 floats + timestamp
================================
If sizeof(SensorFeatures) prints anything other than 118, there is a struct alignment mismatch and the Python capture_daemon will receive malformed packets. Do not proceed until this shows 118.

6 - Upload the STM32 Firmware via App Lab

App Lab manages the UNO Q's STM32 microcontroller separately from the Linux SoC. Open App Lab, connect to the UNO Q, and create a new project with two files:

  1. unoQstm32.cpp : the STM32 passthrough and LED matrix firmware
  2. unoQmain.py : the minimal Python stub that keeps the App Lab runtime alive

Upload via App Lab's Run button. The LED matrix should flash white briefly on boot, then go dark. This confirms the STM32 is running.

7 - Deploy the Python Scripts

Copy all Python scripts to the UNO Q via SCP from your development machine:

# From your development machine:
scp capture_daemon.py arduino@<IP>:~/phisualizematrix/
scp ml_process_influx.py arduino@<IP>:~/phisualizematrix/
scp ml_predict_habitat.py arduino@<IP>:~/phisualizematrix/
scp ml_collect_habitat.py arduino@<IP>:~/phisualizematrix/
scp ml_train_habitat.py arduino@<IP>:~/phisualizematrix/
scp start_phisualize.sh arduino@<IP>:~/phisualizematrix/
scp stop_phisualize.sh arduino@<IP>:~/phisualizematrix/

Then on the UNO Q via SSH:

chmod +x ~/phisualizematrix/start_phisualize.sh
chmod +x ~/phisualizematrix/stop_phisualize.sh
mkdir -p ~/ml_data
mkdir -p ~/ml_models

8 - First Launch

With the Nano connected to the UNO Q via UART and both powered on, launch the system from SSH:

cd ~/phisualizematrix
./start_phisualize.sh

The script launches three processes with setsid, so they survive terminal closure. Expected output:

🧹 Nettoyage système...
📦 Démarrage capture_daemon...
🧠 Démarrage ml_process...
🏠 Démarrage habitat signature...

✅ PHISUALIZE V2 DÉMARRÉ
capture_daemon : PID xxxxx
ml_process : PID xxxxx
habitat : PID xxxxx
Verify data is flowing
# Check capture daemon is receiving packets
tail -f /tmp/capture.log
# → Should show packet counts increasing

# Check InfluxDB is receiving data
influx -execute 'SELECT count(audio_rms) FROM sensors' -database phisualize
# → count should grow every few seconds

# Check habitat predictions
tail -f /tmp/habitat.log
# → Should show state classifications every ~500ms
Stop the system
./stop_phisualize.sh

The three processes run independently with setsid. Closing the SSH terminal does NOT stop them. Always use stop_phisualize.sh for a clean shutdown: it also removes the shared memory segment cleanly.
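Assuming the daemons exchange the latest packet through a named POSIX shared-memory segment (the segment name here is hypothetical), the cleanup step inside stop_phisualize.sh could be sketched like this:

```python
from multiprocessing import shared_memory

SHM_NAME = "phisualize_packet"  # hypothetical segment name

def cleanup_shared_memory(name: str = SHM_NAME) -> bool:
    """Detach and unlink the segment so a stale copy never survives a stop."""
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False  # nothing to clean up
    shm.close()
    shm.unlink()      # removes /dev/shm/<name> on Linux
    return True

# Simulate the situation the stop script guards against: a leftover segment.
leftover = shared_memory.SharedMemory(name=SHM_NAME, create=True, size=118)
leftover.close()
print(cleanup_shared_memory())  # → True  (segment found and removed)
print(cleanup_shared_memory())  # → False (already gone)
```

A forgotten segment is harmless until the next start, when the capture daemon may attach to stale data; unlinking on shutdown avoids that entirely.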

Next Steps

At this point the system is running in production mode, collecting sensor data, writing to InfluxDB, and attempting live classification. However, since no model has been trained yet, ml_predict_habitat.py will fail to load the .pkl file and log an error.

The next step is data collection and training: run ml_collect_habitat.py for each of the four habitat classes, then run ml_train_habitat.py to generate the model. Once habitat_signature_pipeline.pkl exists in ~/ml_models/, restart the system and live classification will begin.

Data Collection and Training

hsa_ml_training.png
hsa_confusion_matrix.png

Teaching the system to recognize your environment : from labeled sessions to a live model

This step is where the system learns. You will collect labeled sensor data for each habitat class, then train a RandomForest classifier on that data. The result is a .pkl model file that the prediction process loads to run live inference. The entire process runs on the UNO Q, no external compute required.

Before starting, make sure the system is running: ./start_phisualize.sh launched, Nano connected, data flowing into InfluxDB. Verify with:

influx -execute 'SELECT count(audio_rms) FROM sensors' -database phisualize
# → count should be non-zero and growing


The Four Habitat Classes

The system classifies the environment into four behavioral states. Each must be collected under controlled, reproducible conditions: the quality of your labels directly determines model accuracy.

  1. 😴 Calme : Empty room, near-total silence. No one nearby, no audio playing, device completely immobile. This is the baseline. Audio RMS ~0.001-0.002, proximity at baseline ~235, magnetometer stable.
  2. 🧍 Presence : Your hand held 5-10 cm above the Nano, with some ambient sound nearby (phone, low music). The proximity sensor is the key discriminator: it drops from ~235 to ~115-125. Audio slightly elevated but not dominant.
  3. 🏃 Activite : Active movement near the device. Walking around it, moving objects nearby, gesturing. The magnetometer variance spikes to 10-12 µT² (vs ~2 µT² at rest). Audio RMS rises to ~0.007-0.008.
  4. 🎵 Ambiance : Music or film playing at normal listening volume. Audio RMS ~0.033-0.035, sustained and consistent. Device and environment otherwise still.

These definitions are specific to your environment. If you replicate this project, define your own classes based on your use case; the pipeline works for any set of behavioral states as long as they produce distinct sensor signatures.

Step 1 - Collect Data (ml_collect_habitat.py)

The collection script queries InfluxDB for sensor data recorded during a timed session, computes 14 features over a sliding window of 10 points, and saves the result as a CSV file in ~/ml_data/.
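The window arithmetic behind the "~51 windows per 60-second session" figure can be sketched as follows. The feature names are taken from the training logs shown later; the exact feature set in ml_collect_habitat.py is assumed, this is only the windowing pattern.

```python
import numpy as np

def sliding_windows(points: np.ndarray, size: int = 10, stride: int = 1):
    """Yield overlapping windows: 60 points -> 51 windows of 10 (stride 1)."""
    for start in range(0, len(points) - size + 1, stride):
        yield points[start:start + size]

def window_features(w: np.ndarray) -> dict:
    # Illustrative subset of the 14 features (names assumed from the logs).
    return {
        "audio_rms_mean": float(w.mean()),
        "audio_rms_var": float(w.var()),
        "audio_rms_delta": float(w[-1] - w[0]),  # end minus start
    }

session = np.linspace(0.001, 0.002, 60)   # stand-in for one timed session
rows = [window_features(w) for w in sliding_windows(session)]
print(len(rows))  # → 51
```

With a 10-point window and stride 1 over N points you always get N - 9 windows, which is why the session logs report 51-53 windows rather than 60.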

Run a collection session

Via SSH on the UNO Q, with the system running in a separate terminal:

cd ~/phisualizematrix
python3 ml_collect_habitat.py

# The script will prompt:
# Label: calme
# Duree (secondes, defaut 60): 60

# Then:
# Prepare le contexte...
# Appuie sur ENTER pour demarrer ->

# After ENTER, stay completely still and silent for 60 seconds
# The script then queries InfluxDB and saves:
# /home/arduino/ml_data/habitat_calme_20260219_143022.csv
# → ~51 windows ("fenetres"): stride 1 over 60 points @ 10 Hz

Collect all four classes

Repeat for each class. Run 3 sessions per class minimum, more is better. Between sessions, take a short break to reset the environment:

# Session 1 - Calme
python3 ml_collect_habitat.py # label: calme, 60s

# Session 2 - Presence
python3 ml_collect_habitat.py # label: presence, 60s
# Hold your hand 5-10cm above the Nano for the full 60s

# Session 3 - Activite
python3 ml_collect_habitat.py # label: activite, 60s
# Move actively near the device for the full 60s

# Session 4 - Ambiance
python3 ml_collect_habitat.py # label: ambiance, 60s
# Play music at normal volume, stay still

# Repeat each 2 more times (3 sessions per class minimum)
Verify collected data
ls -la ~/ml_data/
# → 12+ files: habitat_calme_*.csv, habitat_presence_*.csv, ...

# Check a file
head -2 ~/ml_data/habitat_calme_*.csv | head -3
# → header row + first data row with 14 features + label

# Count total windows
wc -l ~/ml_data/*.csv
# → Should be 150+ lines total (excluding headers)
A common mistake is collecting all sessions of the same class back-to-back in identical conditions. Vary the time of day and exact setup slightly between sessions; this makes the model more robust to natural variation.

Step 2 - Train the Model (ml_train_habitat.py)

The training script loads all CSV files from ~/ml_data/, builds the feature matrix, splits by session to avoid data leakage, trains a RandomForest pipeline, and saves the model to ~/ml_models/.
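The session-grouped split can be sketched with scikit-learn's GroupKFold. The dimensions below match the tutorial (12 sessions, ~51 windows each, 14 features); the data itself is synthetic, purely to show that no session ever straddles the train/test boundary.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# 12 sessions of 51 windows, grouped so a session never appears in both
# train and test — the leakage the random split would cause.
rng = np.random.default_rng(0)
X = rng.normal(size=(12 * 51, 14))        # 14 engineered features
y = np.repeat(np.arange(4), 3 * 51)       # 4 classes x 3 sessions each
groups = np.repeat(np.arange(12), 51)     # session id per window

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    train_sessions = set(groups[train_idx])
    test_sessions = set(groups[test_idx])
    assert train_sessions.isdisjoint(test_sessions)  # no session leaks
print("no session appears in both train and test")
```

This is the property that makes the reported test accuracy trustworthy: adjacent windows from one session are highly correlated, so a random split would inflate the score.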

Run training
cd ~/phisualizematrix
python3 ml_train_habitat.py
Expected output:
===================================
HABITAT SIGNATURE V2 - TRAINING
===================================

Session 0 | calme | 51 fenetres
Session 1 | calme | 51 fenetres
Session 2 | calme | 53 fenetres
Session 3 | presence | 51 fenetres
...

Total : 655 fenetres
Sessions : 12

GroupKFold CV (n=5)...
Scores : ['100.0%', '100.0%', '50.5%', '100.0%', '100.0%']
Mean : 90.1% +/- 19.8%

Test Accuracy : 97.2%

Feature Importance:
pressure_mean 0.215 ###########
audio_rms_mean 0.209 ##########
audio_rms_var 0.129 ######
proximity_mean 0.127 ######
proximity_max 0.094 ####
...

Understanding the CV scores

The GroupKFold cross-validation score of 50.5% on one fold is not a bug; it is expected. GroupKFold splits by session, and with only 3 sessions per class, one fold may contain a session that is more variable than the others. The test accuracy of 97.2% on a held-out session per class is the reliable metric. Cross-validation here is a consistency check, not the primary accuracy measure.

Understanding the confusion matrix

The 6 misclassifications in the original training were all Ambiance predicted as Activite; both classes share elevated audio energy. This is the model's one genuine ambiguity. In practice it resolves within 2-3 inference cycles as the rolling buffer stabilizes.
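The stabilizing effect of the rolling buffer can be sketched in a few lines. The class order and buffer depth of 5 come from the logs in this tutorial; the smoothing implementation itself is assumed.

```python
from collections import deque
import numpy as np

CLASSES = ["activite", "ambiance", "calme", "presence"]
buffer = deque(maxlen=5)  # last 5 raw predict_proba outputs

def smoothed_state(proba):
    """Average the last 5 probability vectors; report label + confidence."""
    buffer.append(np.asarray(proba, dtype=float))
    mean = np.mean(buffer, axis=0)
    i = int(mean.argmax())
    return CLASSES[i], float(mean[i])

# One ambiguous 'ambiance' frame is drowned out by four confident ones:
for p in [[0.05, 0.80, 0.05, 0.10]] + [[0.05, 0.05, 0.85, 0.05]] * 4:
    label, conf = smoothed_state(p)
print(label)  # → calme
```

A single Ambiance/Activite confusion therefore nudges the confidence gauge down for one cycle instead of flipping the displayed state.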

Confusion Matrix:

See the confusion matrix schema (hsa_confusion_matrix.png).

Saved files

ls ~/ml_models/
# → habitat_signature_pipeline.pkl (scikit-learn Pipeline)
# → habitat_signature_config.json (metadata: classes, features, accuracy)
The .pkl file contains the full pipeline: StandardScaler + RandomForestClassifier. Loading it with joblib.load() gives you a pipeline.predict_proba() method that accepts a 14-feature vector and returns class probabilities directly.
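The load-and-predict path can be demonstrated with a synthetic stand-in pipeline of the same shape (StandardScaler + RandomForestClassifier). The fitted model and file path below are placeholders, not the real habitat_signature_pipeline.pkl.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for habitat_signature_pipeline.pkl, fitted on synthetic data
# purely to demonstrate the joblib load/predict_proba path.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 14))            # 14 engineered features
y = np.repeat(np.arange(4), 50)           # 4 habitat classes

pipe = make_pipeline(StandardScaler(),
                     RandomForestClassifier(n_estimators=20, random_state=1))
pipe.fit(X, y)
joblib.dump(pipe, "/tmp/habitat_demo_pipeline.pkl")

loaded = joblib.load("/tmp/habitat_demo_pipeline.pkl")
proba = loaded.predict_proba(rng.normal(size=(1, 14)))  # one feature window
print(proba.shape)  # → (1, 4)
```

Because scaling lives inside the pipeline, the prediction process never has to remember to normalize the 14-feature vector itself.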

Step 3 - Launch Live Inference


Once the model exists, restart the system to activate the prediction process:

./stop_phisualize.sh
sleep 3
./start_phisualize.sh

# Monitor predictions
tail -f /tmp/habitat.log
Expected output in habitat.log:
Chargement pipeline...
Classes : ['activite', 'ambiance', 'calme', 'presence']
Features : 14
Accuracy : 97.2%

=============================================
😴 HABITAT : CALME
Confiance : 94.3%
Probas : activite=1% ambiance=2% calme=94% presence=3%
=============================================

The system is now classifying your environment in real time. Open Grafana at http://<UNO-Q-IP>:3000 and watch the Habitat Rescue dashboard — the state panel, confidence gauge, and probability history should all update within 5 seconds.

Retraining and Adaptation

The model is trained on your specific environment at a specific time. If accuracy degrades over days or weeks (seasonal changes, new furniture, different ambient noise floor), simply collect new sessions and retrain. The pipeline is designed for this:

# Add new sessions to existing data
python3 ml_collect_habitat.py # collect more sessions as needed

# Retrain on all data including new sessions
python3 ml_train_habitat.py

# Restart to load new model
./stop_phisualize.sh && sleep 3 && ./start_phisualize.sh

For this tutorial, 3 sessions of 60 seconds per class (655 windows total) achieved 97.2% accuracy in a controlled domestic environment. For a production deployment, 10+ sessions per class collected across different times of day and varying conditions would significantly improve robustness. Each additional real-world session is worth more than any hyperparameter adjustment.

InfluxDB

hsa_influx_A_datamodel.png
hsa_influx_B_measurements.png
hsa_influx_C_write_read.png

Time-series persistent storage running locally on the Arduino UNO Q


InfluxDB is the data backbone of Phisualize. Every sensor packet received from the Nano is decoded and written to InfluxDB. The ML prediction process reads from it to build feature windows. The prediction results are written back to it. Grafana reads from it to render the dashboards. Everything flows through this single local database, no cloud, no external dependency.

The version used is InfluxDB 1.8.10, installed manually on the UNO Q Linux environment. Version 1.x uses InfluxQL, a SQL-like query language that is straightforward to read and write, and compatible with the Python influxdb client library used throughout this project.

Why InfluxDB?

Most sensor projects store data in CSV files or SQLite. InfluxDB offers three specific advantages for this use case:

  1. Time-series native: Every data point is automatically indexed by timestamp. Queries like 'last 10 points' or 'data between two times' are first-class operations, not workarounds.
  2. Persistent across reboots: Data is stored on the UNO Q's 32GB eMMC. The behavioral history of a room accumulates over days and weeks without any manual intervention. A SQLite file would work too, but InfluxDB handles concurrent writes from multiple processes cleanly.
  3. Grafana native integration: InfluxDB is one of Grafana's core datasources. No adapter, no middleware: just a direct connection with full InfluxQL support in the query editor.

Installation on the UNO Q

InfluxDB 1.x is installed manually via the official Debian/Ubuntu package (the 1.8.10 arm64 build below). On the UNO Q Linux environment, connect via SSH and run:

# Download InfluxDB 1.x package (ARM64 for UNO Q)
wget https://dl.influxdata.com/influxdb/releases/influxdb_1.8.10_arm64.deb

# Install
sudo dpkg -i influxdb_1.8.10_arm64.deb

# Enable and start service
sudo systemctl enable influxdb
sudo systemctl start influxdb

# Verify
influxd version
# → InfluxDB v1.x.x

Use the arm64 variant for the UNO Q's Qualcomm SoC. The amd64 package will fail silently or produce a SIGILL error at runtime.

Create the database

Once InfluxDB is running, create the 'phisualize' database used by all three Python processes:

# Open InfluxDB CLI
influx

# Inside the CLI:
CREATE DATABASE phisualize
SHOW DATABASES
# → phisualize should appear in the list
exit


Configure data retention (optional but recommended)

By default, InfluxDB keeps data forever. On the UNO Q's 32GB eMMC, several months of data at 10 Hz would eventually fill the disk. A retention policy limits storage to a practical window; 30 days is sufficient for behavioral analysis:

influx

USE phisualize
CREATE RETENTION POLICY "30d" ON "phisualize" DURATION 30d REPLICATION 1 DEFAULT

SHOW RETENTION POLICIES ON phisualize

If you skip the retention policy, monitor disk usage with 'df -h' periodically. A full eMMC will cause InfluxDB write failures and silently stop the entire pipeline.


Data Model

InfluxDB organizes data into measurements (equivalent to tables). This project uses two measurements in the 'phisualize' database:

Measurement: sensors

Written by ml_process_influx.py at ~10 Hz. One point per decoded packet from the Nano. Contains all raw sensor fields:

measurement: sensors
tags: device=nano
fields:
audio_band_0 ... audio_band_15 (float) -- 16 mel bands
audio_rms (float) -- RMS energy
audio_zcr (float) -- zero-crossing rate
imu_x, imu_y, imu_z (float) -- accelerometer g
mag_x, mag_y, mag_z (float) -- magnetic field µT
pressure (float) -- hPa
temperature (float) -- °C
humidity (float) -- % RH
proximity (int) -- 0-255 IR


Measurement: habitat

Written by ml_predict_habitat.py every ~500ms. Contains the ML output:

measurement: habitat
fields:
label (string) -- 'calme', 'presence', 'activite', 'ambiance'
confidence (float) -- rolling buffer confidence 0.0-1.0
proba_calme (float) -- per-class probability
proba_presence (float)
proba_activite (float)
proba_ambiance (float)

InfluxDB 1.x stores string fields differently from numeric fields. The label field is a string; Grafana's value mapping system handles the string-to-display conversion without any additional processing.

Python Client Usage

All three Python processes use the influxdb Python client (v5.x). The connection is identical in all three:

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='phisualize')


Writing data

Points are written as a list of dicts. The ml_process batches 10 points per write to reduce I/O overhead:

point = {
    'measurement': 'sensors',
    'tags': {'device': 'nano'},
    'fields': {
        'audio_rms': 0.0023,
        'proximity': 235,
        # ... all other fields
    }
}

batch.append(point)
if len(batch) >= 10:
    client.write_points(batch)  # single write for 10 points
    batch = []


Reading data

The feature engineering and prediction processes query InfluxDB using raw InfluxQL. The critical detail is timestamp alignment: InfluxDB uses nanosecond Unix epoch internally, so Python's time objects must be converted correctly:

# CORRECT: nanosecond epoch from Python timestamp
start_ns = int(pd.Timestamp.now('UTC').timestamp() * 1e9)

# WRONG: this returns nanoseconds from .value but may be offset
# start_ns = pd.Timestamp.utcnow().value ← do not use

# end_ns marks the end of the query window, converted the same way
query = f'''
SELECT audio_rms, audio_zcr, imu_x, imu_y, imu_z,
mag_x, mag_y, mag_z, pressure, proximity
FROM sensors
WHERE time >= {start_ns} AND time <= {end_ns}
ORDER BY time ASC
'''
result = client.query(query)
points = list(result.get_points())


Real-time query for inference

The prediction process uses a different pattern, it always wants the latest 10 points regardless of absolute time, then reorders them chronologically for correct delta calculations:

query = '''
SELECT audio_rms, audio_zcr, imu_x, imu_y, imu_z,
mag_x, mag_y, mag_z, pressure, proximity
FROM sensors
ORDER BY time DESC LIMIT 10
'''
result = client.query(query)
points = list(result.get_points())

# Reorder ASC — critical for delta features (audio_rms_delta, pressure_grad)
df = pd.DataFrame(points).iloc[::-1].reset_index(drop=True)

Without the .iloc[::-1] reorder, delta features compute end-minus-start in reverse: audio_rms_delta and pressure_grad return inverted values, corrupting those features silently. The model will still run, but accuracy degrades.
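A two-line example makes the inversion concrete. With a rising audio_rms series returned newest-first, the naive delta comes out negative:

```python
import pandas as pd

# Points as InfluxDB returns them with ORDER BY time DESC (newest first).
desc = pd.DataFrame({"audio_rms": [0.008, 0.006, 0.004, 0.002]})

wrong = desc["audio_rms"].iloc[-1] - desc["audio_rms"].iloc[0]  # sign flipped
asc = desc.iloc[::-1].reset_index(drop=True)                    # oldest first
right = asc["audio_rms"].iloc[-1] - asc["audio_rms"].iloc[0]    # true delta

print(round(wrong, 6), round(right, 6))  # → -0.006 0.006
```

Both values have the same magnitude, which is exactly why the bug is silent: nothing crashes, the feature distribution just mirrors itself.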

Useful CLI Commands for Debugging


These commands are useful during development to verify data is flowing correctly:

# Connect to InfluxDB CLI
influx
USE phisualize

# Count points in sensors (should grow at ~10/s)
SELECT count(audio_rms) FROM sensors

# Check latest 5 sensor points
SELECT audio_rms, proximity FROM sensors ORDER BY time DESC LIMIT 5

# Check latest habitat predictions
SELECT label, confidence FROM habitat ORDER BY time DESC LIMIT 5

# Check data rate (should be ~600 points/minute)
SELECT count(audio_rms) FROM sensors
WHERE time > now() - 1m

# Database size on disk
exit
# From SSH shell:
du -sh /var/lib/influxdb/


Service Management

InfluxDB runs as a systemd service. Standard commands apply:

sudo systemctl status influxdb # check if running
sudo systemctl restart influxdb # restart after config change
sudo systemctl stop influxdb # stop
sudo journalctl -u influxdb -f # live logs
InfluxDB starts automatically on boot thanks to 'systemctl enable'. If the UNO Q loses power, the database service resumes on next boot and all historical data on the eMMC is preserved; this is one of the core advantages of the UNO Q over a standard microcontroller.

DASHBOARDS: GRAFANA

hsa_grafana_A_architecture.png
hsa_grafana_B_flux_queries.png
hsa_grafana_C_alerting.png

Two dashboards : raw sensor observability and real-time behavioral intelligence


Grafana connects to InfluxDB running locally on the UNO Q and refreshes every 5 seconds. Two separate dashboards serve two distinct purposes: one for raw sensor monitoring, one for ML output and behavioral interpretation. Both read from the same InfluxDB instance but from different measurements.

Grafana is accessible from any device on the local network at http://[UNO-Q-IP]:3000, no cloud, no external account required. All data stays local on the eMMC.

Data Architecture

Two InfluxDB measurements feed the dashboards:

  1. sensors : written by ml_process_influx.py at ~10 Hz. Contains all raw sensor fields: audio_rms, audio_zcr, audio_band_0 through audio_band_15, imu_x/y/z, mag_x/y/z, pressure, temperature, humidity, proximity.
  2. habitat : written by ml_predict_habitat.py every 500ms. Contains ML output: label (string), confidence (0.0-1.0), and per-class probabilities proba_calme, proba_presence, proba_activite, proba_ambiance.

This separation is intentional: sensors is high-frequency raw data for observability, habitat is low-frequency interpreted data for decision-making.

Dashboard 1 : Phisualize V2 Complete (Raw Sensors)


This dashboard shows what the system sees, every sensor reading in real time. It is the diagnostic and observability layer. During data collection, keep this dashboard open to visually confirm sensors are responding before labeling a session.

Audio Row

🔊 Audio RMS : Gauge (0 to 0.02)

Immediate visual reading of ambient sound level. Silence reads near zero, conversation around 0.005; music (~0.03) pegs the gauge at its 0.02 maximum.

SELECT mean("audio_rms") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval)


🎵 Audio ZCR : Time Series

Zero-crossing rate over time. High for noisy or percussive sounds, low for tonal sounds like music. Complements RMS for audio characterization: two environments can have similar energy levels but very different spectral texture.

🎼 Mel Bands : Time Series (4 representative bands)

Bands 0, 4, 8, and 12 plotted together. Band 0 is the lowest frequency, band 12 is upper mid-range. Music fills all bands relatively evenly. Speech concentrates in the mid bands. Silence keeps all bands near zero. Watching this panel live makes the FFT pipeline tangible.

SELECT mean("audio_band_0") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval) -- Bass
SELECT mean("audio_band_4") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval) -- Low-mid
SELECT mean("audio_band_8") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval) -- Mid
SELECT mean("audio_band_12") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval) -- High-mid


Motion & Magnetic Field Row

📐 IMU X/Y/Z : Time Series

Three accelerometer axes. At rest on a flat surface: X and Y near 0g, Z near 1g (gravity). Physical movement or shock introduces spikes. In practice the IMU proved less discriminative than the magnetometer for human presence, but essential for detecting device displacement.

🧲 Magnetometer X/Y/Z : Time Series

The most revealing panel in the sensor dashboard. At rest, all three axes are stable around the local Earth field vector. When a person moves nearby, or a phone or powered device is brought close, all three axes shift measurably. This is the EMF disturbance detection that became a core ML feature. Mag_norm_var at rest: ~2 µT². During human activity: 10-12 µT². Watching this panel live makes the discovery immediately intuitive.

SELECT mean("mag_x") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval)
SELECT mean("mag_y") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval)
SELECT mean("mag_z") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval)


Environment Row

🌡️ Temperature : Gauge (10°C to 35°C)

Current temperature from the HS3003. Slow-moving metric, useful for verifying sensor health and for long-term environmental context.

💧 Humidity : Gauge (0% to 100%)

Relative humidity. The pressure gradient from the companion LPS22HB proved to be the single most important feature in the ML model at 21.5% importance; pressure itself is remarkably stable indoors, which makes any gradient meaningful.

☁️ Pressure : Time Series (hPa)

Atmospheric pressure over time. The trend matters more than the absolute value. The pressure_grad feature (last minus first point in a window) captures slow environmental drifts that correlate with broader activity patterns.

Proximity Row

📍 Proximity : Bar Gauge (0 to 255)

A horizontal bar gauge spanning the full dashboard width. The APDS9960 IR sensor reads ~235 when nothing is detected, and drops to ~115 when a hand is held above the Nano. This single panel makes the Presence class immediately visible during data collection; it is the ground truth sensor for the class that was hardest to separate from Calme.

SELECT mean("proximity") FROM "sensors" WHERE $timeFilter GROUP BY time($__interval)

During data collection: before labeling a session as Presence, verify the proximity panel drops below 150. Before labeling Calme, verify audio_rms is near zero and proximity is at baseline ~235. The sensors dashboard is your ground truth.


Dashboard 2 : Habitat Rescue V1.4 (ML Intelligence)


"Note for English readers: The dashboard panels are in French as the system is deployed in my home. 'Calme' = Quiet, 'Présence' = Presence, 'Activité' = Activity."

This dashboard shows what the system concludes. It reads exclusively from the habitat measurement. Where the sensors dashboard shows data, this dashboard shows meaning.

Current State Row

🏠 État Actuel : Table with value mappings

The last predicted label, colored and labeled by value mapping rules:

  1. calme → 😴 CALME (blue)
  2. presence → 🧍 PRÉSENCE (orange)
  3. activite → 🏃 ACTIVITÉ (red)
  4. ambiance → 🎵 AMBIANCE (purple)
SELECT last("label") FROM "habitat" WHERE $timeFilter


🎯 Confiance IA : Gauge (0% to 100%)

The rolling-averaged confidence score. Threshold at 80%: below is red, above is green. Values above 85% indicate a stable unambiguous environment. The gauge turns red when the system is uncertain, typically during transitions between states or in mixed environments.

SELECT last("confidence") FROM "habitat" WHERE $timeFilter


Probability History : Full Width Time Series

📈 Historique des Probabilités

The most informative panel in the habitat dashboard. Four lines show per-class probability over the selected time range. The four probabilities always sum to 100%. Transitions appear as one line rising while others fall. The smoothness reflects the 5-prediction rolling buffer.

Reading this panel is intuitive: Calme near 95% for an extended period = empty quiet room. Sudden rise in Activite with Calme dropping = movement detected. Ambiance rising sharply = music started. This single panel gives a complete behavioral history of the room.

SELECT "proba_calme" * 100 FROM "habitat" WHERE $timeFilter -- alias: Calme
SELECT "proba_presence" * 100 FROM "habitat" WHERE $timeFilter -- alias: Présence
SELECT "proba_activite" * 100 FROM "habitat" WHERE $timeFilter -- alias: Activité
SELECT "proba_ambiance" * 100 FROM "habitat" WHERE $timeFilter -- alias: Ambiance


Application Status Row : Four Stat Panels

Four panels in a row demonstrate how behavioral states map to real-world automation triggers. Each reads the same last label and applies regex mappings:

🌙 Mode Éco → blue / ÉCO when calme. Heating or lighting could be reduced.
💡 Lampe Salon → yellow / ALLUMÉ when presence. Comfort lighting activates.
🚨 Alarme → red / ALERTE when activite. Unexpected movement is flagged.
🎵 Musique → purple / PLAY when ambiance. Music or film detected.

All four panels use the same base query:

SELECT last("label") FROM "habitat" WHERE $timeFilter


Regex mapping example (Alarme panel):

Pattern: .*activite.* → 🚨 ALERTE (red background)
Pattern: .* → 🛡️ OK (dark-gray background)

These panels are concept demonstrations. Connecting them to Home Assistant, Node-RED, or any MQTT broker is a straightforward next step; the classified label is already available in InfluxDB as a queryable string field.
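A minimal sketch of such a trigger layer, mirroring the dashboard's 80% confidence gate. The action names and mapping are illustrative, not part of the project:

```python
# Hypothetical label -> automation mapping (names are illustrative only).
ACTIONS = {
    "calme":    {"heating": "eco", "lights": "off"},
    "presence": {"heating": "comfort", "lights": "on"},
    "activite": {"alarm": "check"},
    "ambiance": {"lights": "scene_media"},
}

def decide_actions(label: str, confidence: float, threshold: float = 0.80):
    """Act only on confident predictions, like the dashboard's 80% gate."""
    if confidence < threshold:
        return {}  # uncertain: change nothing
    return ACTIONS.get(label, {})

print(decide_actions("calme", 0.94))     # → {'heating': 'eco', 'lights': 'off'}
print(decide_actions("activite", 0.55))  # → {}
```

Feeding the returned dict to an MQTT publish or a Home Assistant service call is all that remains; the classification and confidence already live in the habitat measurement.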

Event Log : Journal des Événements

A table showing all non-Calme events in reverse chronological order, with their confidence scores. Calme is excluded: it represents the absence of activity, not an event worth logging. Only Presence, Activite, and Ambiance detections appear. This gives a behavioral diary of the room at a glance.

SELECT "label" AS "Evenement", "confidence" AS "Fiabilite"
FROM "habitat"
WHERE "label" != 'calme' AND $timeFilter
ORDER BY time DESC


Design Decisions


Why two dashboards?

Mixing raw sensor data with ML output creates visual noise. The sensors dashboard is for development and debugging: you use it to understand what the sensors see. The habitat dashboard is the operational view: you use it to understand what the room is doing. Keeping them separate preserves the clarity of purpose for each.

Why InfluxQL raw queries instead of Flux?

InfluxQL queries in InfluxDB 1.x are self-explanatory:

SELECT mean(field) FROM measurement WHERE timeFilter.

No transformation pipelines to debug, no unfamiliar syntax for readers new to Grafana. Every query in these dashboards can be read and understood in seconds.

Refresh rate: 5 seconds

The ML inference loop runs every 500ms, but refreshing Grafana faster than 5 seconds adds unnecessary query load on the UNO Q. With a ~4-5 second end-to-end detection latency, a 5-second refresh shows state changes within one or two cycles: fast enough to feel responsive, light enough to preserve system stability.

eMMC persistence across reboots

InfluxDB stores its data on the UNO Q's 32GB eMMC. All sensor and habitat history survives a reboot. After restarting the system, Grafana immediately shows the complete history going back days. This transforms the system from a real-time detector into a long-term behavioral monitoring platform, without any manual backup or data management.

CONCLUSION

hsa_extended_architecture.png
hsa_security_alarm.png
hsa_future_work-2.png

Beyond Classification: Real-World Applications


Because the system outputs interpreted behavioral states rather than raw sensor values, it can serve as a contextual trigger layer for home automation platforms such as Home Assistant.

Possible applications, not implemented here, but straightforward to add:

  1. Eco mode: Reduce heating or lighting during prolonged Calme state
  2. Comfort mode: Restore ambient settings when Presence is detected
  3. Security mode: Flag unexpected Activite during nighttime Calme periods, a behavioral anomaly, not a motion trigger
  4. Ambiance mode: Synchronize lighting or display profiles when music is detected

The distinction matters: this is not a motion detector. It is a behavioral classifier. It does not react to events, it interprets context.

Challenges and Lessons Learned


  1. Process isolation: Running ML inference in the same process as UART capture caused crashes under sustained load. Separating them into two independent Python processes, with InfluxDB as the bridge, solved the stability problem completely. The system then ran for 3 days without an unplanned stop.
  2. Power sensitivity: The UNO Q triggered a self-protection shutdown early in testing due to a minor board shock combined with a marginal power supply. This is normal protective behavior. Lesson: use quality cables and a stable 5V supply. Once addressed, the issue never recurred.
  3. Timestamp synchronization: InfluxDB's internal clock was offset from the system clock. Queries using now() returned no results. Solution: use timestamp() * 1e9 in Python for accurate nanosecond-epoch alignment.
  4. Session-based train/test split: Random splitting caused data leakage between correlated windows. The correct approach is to split by session, ensuring no session appears in both train and test.
  5. Magnetometer discovery: Its variance under human movement (10-12 µT² vs ~2 µT² at rest) proved more discriminative than IMU data. An unexpected finding that became a core feature.
  6. The barometer proved surprisingly effective. During testing, opening a window caused an immediate pressure shift, suggesting that H.S.A. could easily be adapted for intrusion detection or monitoring home ventilation.

Future Work


  1. Magnetic gesture interface: left/right object placement as directional commands, using the magnetometer X/Y axis differential
  2. Multi-room generalization with transfer learning
  3. Long-term drift adaptation as the environment changes seasonally
  4. Anomaly detection mode: flag behavioral states that fall outside learned patterns
  5. On-device incremental training without restarting the pipeline

Conclusion


A room has a behavioral fingerprint. With structured feature engineering and efficient edge AI, that fingerprint can be recognized, and acted upon.

Running entirely on Arduino hardware, centered around the UNO Q, this project demonstrates that meaningful environmental intelligence does not require cloud infrastructure, specialized AI accelerators, or massive datasets. 655 labeled windows, 14 hand-crafted features, a RandomForest, and a thoughtful architecture.


"Credits & Acknowledgments"

"A big thanks to Gemini AI and its 'Nano Banana' image engine for this beautiful hero shot!"

"Original project documented in French by me, translated into English with the assistance of ClaudeAI to ensure technical clarity for the global community."

Thank you !