Habitat Signature Analyzer (H.S.A)- a Multi-Sensor Edge AI System Running Entirely on Arduino Hardware Uno-Q/nano 33BLE Sense Rev2
by shredermann in Circuits > Arduino
Every room has a behavioral pattern. Not just temperature changes or motion spikes, but micro-vibrations, acoustic textures, magnetic disturbances, and environmental rhythms.
Instead of detecting isolated events, this project learns to classify the behavioral state of a space. Everything runs locally. No cloud. Just structured signals, engineered features, and interpretable machine learning.
Supplies
The Hardware Foundation
This project is built around two complementary boards: the Arduino Nano 33 BLE Sense Rev2 and the Arduino UNO Q. They play very different roles.
Arduino Nano 33 BLE Sense Rev2: The Sensing Core
The Nano captures the physical world. Its integrated sensors cover every relevant environmental dimension:
- IMU (BMI270): micro-vibrations and movement
- Microphone (MP34DT05): audio energy and spectral features via FFT
- Magnetometer (BMM150): magnetic field vector (more on this later)
- Proximity sensor (APDS9960): IR proximity, 0-255
- Pressure (LPS22HB): barometric pressure and airflow events
- Temperature & Humidity (HS3003): ambient climate baseline
The Nano does not decide. It observes, processes, and transmits.
Audio is processed onboard using a double-windowed FFT pipeline: two 128-point FFTs with 50% overlap, Hann-windowed, producing 16 mel-scale frequency bands averaged across both frames, plus RMS and ZCR. All other sensor data is transmitted raw. These audio features and raw sensor values are assembled into a 118-byte binary packet sent to the UNO Q at 10 Hz.
Sensors as More Than Data Sources
During development, two sensors revealed capabilities beyond their primary function, opening possibilities that were not part of the original design.
The Magnetometer Discovery
The BMM150 magnetometer was initially included as a directional sensor. What emerged during testing was unexpected: it reacts measurably and consistently to nearby metallic or electronic objects.
This suggests a future interaction model: placing a phone to the left or right of the device produces a distinct magnetic signature on the X/Y axis. No buttons, no additional hardware — a magnetic gesture interface built from a standard compass sensor. This has not been implemented in this version, but the data clearly supports it as a next step.
The Proximity Sensor as a Presence Gate
The APDS9960 proximity sensor outputs a raw IR value from 255 (nothing) to 0 (very close). In the final model, its mean and max over a 10-point window became two of the most discriminative features, directly solving the ambiguity between a quiet empty room and a quiet occupied one. A hand above the device drops the proximity reading from ~235 to ~115, a difference no other sensor captured reliably. It can also serve as a gesture interface (for example, to switch visualization modes).
The Barometer is a hidden gem in this setup. During my real-world tests, I noticed it reacts instantly to air pressure drops when a window is opened, especially with wind. It adds a 'security' layer: the system doesn't just hear the room, it feels the air flow.
Arduino UNO Q — The Edge Intelligence Layer
The UNO Q is the system's brain. It runs Python, InfluxDB, Scikit-learn, and Grafana, all locally, with 4GB RAM and 32GB eMMC persistent storage. Unlike typical microcontrollers, the UNO Q preserves all historical data across reboots, enabling long-term behavioral monitoring rather than just real-time reaction.
How the Two Boards Communicate
Nano TX -------> UNO Q RX: a simple wire connection
Feature windows are transmitted via UART at 921600 baud, a deliberately high baud rate that ensures stable, low-latency streaming of 118-byte binary packets at 10 Hz without congestion or buffer overflow.
The pipeline is clean: Nano handles feature extraction, UNO Q handles storage, inference, and visualization. A transparent STM32 passthrough on the UNO Q routes bytes between the two without any processing overhead.
The Nano does not simply forward sensor data. Audio processing happens entirely onboard: two 128-point FFTs with 50% overlap, Hann-windowed, produce 16 mel-scale frequency bands averaged across both frames, plus RMS and ZCR. This means the UNO Q never handles raw audio — it receives interpreted spectral features.
All other sensors (IMU, magnetometer, pressure, temperature, humidity, proximity) transmit raw values. The full packet is 118 bytes at 10 Hz.
From there, the UNO Q computes 14 engineered features over a sliding window of 10 data points.
Ground Truth and Labeling
Behavioral states were defined under controlled conditions:
- Calme: 60 seconds immobile, near-total silence. Audio RMS ~0.002, proximity at baseline ~235.
- Presence: Hand held above the Nano with ambient sound nearby. Proximity drops to ~115-125.
- Activite: Intentional movement near the device. Audio RMS ~0.007-0.008, magnetometer variance spikes to 10-12 µT².
- Ambiance: Music or film playing. Audio RMS ~0.033-0.035, sustained and consistent.
For this demonstration, 3 sessions of 60 seconds per class (655 windows total) were sufficient to achieve 97.2% accuracy in a controlled domestic environment. In practice, a more robust model would benefit from 10+ sessions per class, collected across different times of day, varying ambient conditions, and multiple environments. The pipeline is designed to make additional collection straightforward: simply run ml_collect_habitat.py and retrain.
Sensor Fusion and Model Training
Early models with 12 features struggled to separate Calme from Presence, since both share low audio energy and a stable IMU. Adding proximity_mean and proximity_max as features 13 and 14 eliminated the ambiguity completely.
Model: RandomForestClassifier (200 trees, class_weight='balanced').
Why not deep learning? Because the goal of this tutorial is to understand every step of the AI pipeline, from raw sensor data to a live prediction on screen. Deep learning models are powerful but opaque: you feed data in, a prediction comes out, and what happens in between is difficult to explain or debug.
RandomForest is different. You can inspect feature importance, trace exactly why a classification was made, and understand what the model learned. With 655 samples and 14 tabular features, it is also the correct engineering choice, but more importantly, it is the right pedagogical choice. If you understand this pipeline, you can adapt it, improve it, and apply it to any sensor-based classification problem.
Train/test split was done by session, not randomly. The last session of each class was held out for testing, preventing data leakage between correlated windows. GroupKFold cross-validation confirmed generalization.
Feature importance (top 5):
- pressure_mean: 21.5%
- audio_rms_mean: 20.9%
- audio_rms_var: 12.9%
- proximity_mean: 12.7%
- proximity_max: 9.4%
Real-Time Edge Inference on the UNO Q
The trained model is exported as a .pkl file and loaded on the UNO Q. Every 500ms, the inference loop queries the last 10 data points from InfluxDB, computes 14 features, runs predict_proba(), and averages the result over a rolling buffer of 5 predictions.
This rolling average is the key to stability. Individual predictions can show 30-50% confidence due to transient sensor noise, but the 5-prediction buffer consistently converges to the correct label. In practice, zero misclassifications were observed during live testing.
Performance on the UNO Q:
- inference time: under 10 ms
- RAM: 1.11 / 3.58 GB, CPU usage: 39%
Detection latency to dashboard update: about 4-5 seconds (inference loop + buffer stabilization + Grafana 5s refresh cycle).
All inference runs locally. No external API. No cloud dependency. The UNO Q operates as a compact industrial edge AI node.
Dashboard and Monitoring
Two Grafana dashboards provide full system visibility.
The sensor dashboard shows real-time audio RMS, mel band spectrum, IMU axes, magnetometer XYZ, temperature, humidity, pressure trend, and proximity.
The habitat dashboard shows the current classified state, confidence gauge, per-class probability history, and an event log with timestamps.
Data persists on the eMMC across reboots. This means the behavioral history of a room accumulates over days and weeks, a foundation for long-term environmental analysis.
The CODE
A complete walkthrough of every pipeline stage, from sensor to live classification
This step walks through the code behind each pipeline stage. Rather than listing every function, we focus on the design decisions that make this system work, and why each choice was made. The full source code is available in the project files.
You can find all the code in the "HSA" repository:
https://github.com/shredermann-hash/Habitat-Signature-Analyzer/tree/instructable
How the System is Deployed
The Arduino UNO Q runs two environments simultaneously, and understanding this is essential before diving into the code.
App Lab - The STM32 firmware
App Lab is the UNO Q's embedded development environment. It manages the STM32 microcontroller, which handles two roles: driving the onboard LED matrix and acting as a transparent UART passthrough between the Nano and the Linux SoC. The STM32 firmware (unoQstm32.cpp) is uploaded via App Lab. A minimal Python stub (unoQmain.py) runs alongside it to keep the App Lab runtime alive:
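The exact contents of unoQmain.py are in the repository; a minimal keep-alive stub might look like the sketch below (the `keep_alive` helper and its parameters are illustrative, not the project's actual code):

```python
# Sketch of a minimal App Lab keep-alive stub. Its only job is to keep the
# App Lab runtime alive; all real work happens in the STM32 firmware and the
# SSH-side Python processes.
import time

def keep_alive(iterations=None, interval=1.0):
    """Idle loop. Pass a finite iteration count to make it testable;
    in deployment it runs forever (iterations=None)."""
    count = 0
    while iterations is None or count < iterations:
        time.sleep(interval)
        count += 1
    return count

if __name__ == "__main__":
    keep_alive()  # block forever so App Lab considers the sketch running
```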
SSH - The Python intelligence layer
Everything else runs over SSH on the Qualcomm SoC Linux environment: the capture daemon, InfluxDB writes, ML inference, and Grafana. Three Python processes are launched by a single shell script (start_phisualize.sh) using setsid so they survive terminal closure:
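A launcher in this style might look like the following sketch. The three script names come from this writeup; the log paths are assumptions:

```shell
#!/bin/bash
# Sketch of start_phisualize.sh. setsid detaches each process from the
# controlling terminal, so closing the SSH session does not kill them.
mkdir -p ~/logs
setsid python3 ~/capture_daemon.py     > ~/logs/capture.log    2>&1 &
setsid python3 ~/ml_process_influx.py  > ~/logs/ml_process.log 2>&1 &
setsid python3 ~/ml_predict_habitat.py > ~/logs/predict.log    2>&1 &
echo "Phisualize started: capture daemon, ML process, predictor"
```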
The STM32 and the Python layer communicate through the same physical UART at 921600 baud — bytes flow transparently in both directions. The STM32 forwards sensor packets from the Nano to Linux, and forwards LED commands from Linux to the matrix.
Pipeline 1 - The Nano Firmware (nano.cpp)
I reused the same sensor wrapper classes (.h/.cpp files) for the Nano 33 BLE Sense Rev2 from my previous Instructable, "Phisualize it!".
The binary packet
All sensor data is packed into a single 118-byte struct and sent over UART once per audio frame (~10 Hz). Three design choices here matter:
- __attribute__((packed)) eliminates compiler padding. Without it, the struct would be larger and the receiver would misread field positions.
- The 0xAA 0xBB header acts as a synchronization marker. If bytes are lost in transit, the receiver scans for this pattern to re-lock onto the frame boundary.
- packet_id is a rolling 16-bit counter. A gap in the sequence on the receiver side means packets were dropped, not corrupted.
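The authoritative field order lives in the packed struct in nano.cpp. To illustrate the three choices above, here is a hypothetical layout that happens to total 118 bytes; `struct`'s `<` prefix disables padding, mirroring `__attribute__((packed))` on the Nano side:

```python
import struct

# Hypothetical packet layout (the real field order is defined in nano.cpp).
PACKET_FMT = "<" + "".join([
    "BB",    # 0xAA 0xBB sync header
    "H",     # packet_id: rolling 16-bit counter for loss detection
    "f",     # audio RMS
    "f",     # audio ZCR
    "16f",   # 16 mel-band energies
    "3f",    # IMU x/y/z
    "3f",    # magnetometer x/y/z
    "f",     # pressure
    "f",     # temperature
    "f",     # humidity
    "f",     # proximity
    "H",     # trailer/checksum (assumption)
])
# With '<' there is no compiler-style padding, so the sizes simply add up.
```

Whatever the actual order, the key invariant is that `struct.calcsize` on the Python side must match `sizeof(SensorFeatures)` on the Nano side, i.e. 118 bytes.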
The double-windowed FFT audio pipeline
This is the most technically complex part of the firmware. Raw PCM audio is transformed into 16 mel-scale frequency bands onboard; the Linux side never sees a single audio sample, only interpreted spectral features.
A single FFT per 96-sample PDM buffer would miss spectral detail between frames. Two FFTs with 50% overlap give a more stable estimate. The Hann window prevents spectral leakage; without it, the sharp frame edges would introduce artificial frequency content:
The 64 FFT bins are grouped into 16 mel-scale bands. Lower frequencies get narrower bands (more detail), higher frequencies get wider bands, matching how human hearing perceives sound:
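The firmware does this in C++ with CMSIS-DSP; the NumPy sketch below re-creates the same idea for clarity. The band-edge formula is an illustrative mel-style approximation, not the firmware's exact band table:

```python
import numpy as np

def mel_band_features(samples: np.ndarray, n_bands: int = 16) -> np.ndarray:
    """Two Hann-windowed 128-point FFTs with 50% overlap, magnitudes averaged
    across both frames, then grouped into mel-style bands (narrow at low
    frequencies, wide at high frequencies)."""
    assert len(samples) >= 192                 # 128 + 64 for the overlapped frame
    window = np.hanning(128)                   # suppresses spectral leakage
    frames = [samples[0:128], samples[64:192]] # 50% overlap
    spectra = [np.abs(np.fft.rfft(f * window))[:64] for f in frames]
    mag = (spectra[0] + spectra[1]) / 2.0      # average across both frames

    # Mel-style edges over the 64 bins: geometric spacing, forced strictly
    # increasing so every band contains at least one bin.
    edges = np.round(np.geomspace(1, 65, n_bands + 1)).astype(int) - 1
    for i in range(1, len(edges)):
        edges[i] = max(edges[i], edges[i - 1] + 1)

    return np.array([mag[edges[i]:edges[i + 1]].mean()
                     for i in range(n_bands)])
```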
Pipeline 2 - The STM32 (unoQstm32.cpp)
The STM32 has two responsibilities: transparent passthrough and LED matrix rendering. The passthrough is intentionally simple; any complexity here would add latency and potential failure points:
The LED matrix renders habitat states as bar patterns.
Calme = 3 columns lit,
Presence = 6,
Activite = 9,
Ambiance = all 13.
An XOR checksum validates each packet before rendering; corrupted packets are silently discarded.
Pipeline 3 - The Capture Daemon (capture_daemon.py)
The capture daemon runs over SSH on the Linux SoC. It reads the UART byte stream, reconstructs packets, and writes them to shared memory for the ML process to consume. It never touches InfluxDB directly; that is the ML process's responsibility.
Frame synchronization
The byte stream may contain partial packets, especially on startup. The daemon scans for the 0xAA 0xBB header to re-lock onto frame boundaries:
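The re-lock logic can be sketched as follows (simplified to operate on an in-memory byte string rather than the live serial stream):

```python
SYNC = b"\xAA\xBB"   # synchronization header sent by the Nano
PACKET_SIZE = 118

def extract_packets(stream: bytes):
    """Scan for the 0xAA 0xBB header and slice out complete 118-byte
    packets, discarding junk before the first valid header (e.g. the
    partial packet typically seen at startup)."""
    packets, i = [], 0
    while True:
        i = stream.find(SYNC, i)
        if i < 0 or i + PACKET_SIZE > len(stream):
            break                 # no header found, or trailing partial packet
        packets.append(stream[i:i + PACKET_SIZE])
        i += PACKET_SIZE
    return packets
```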
Loss detection
The packet_id field enables real-time loss monitoring, essential for validating UART reliability at 921600 baud:
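Because packet_id is a 16-bit rolling counter, the gap calculation has to survive the rollover at 65535. A minimal helper:

```python
def packets_lost(prev_id: int, new_id: int) -> int:
    """Number of packets dropped between two consecutively received
    packet_id values, accounting for the 16-bit counter wrapping
    from 65535 back to 0."""
    return (new_id - prev_id - 1) % 65536
```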
Shared memory ring buffer
Packets are stored in a 100-slot ring buffer in /dev/shm, shared memory on the Linux filesystem. This avoids disk I/O overhead and allows the ML process to consume data independently at its own pace:
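A memory-mapped ring buffer in this style could be sketched as below. In deployment the backing file would live under /dev/shm (RAM-backed, so no disk I/O); the example uses a temp file so it stays self-contained, and the 4-byte write counter at offset 0 is an assumed layout detail:

```python
import mmap
import struct
import tempfile

SLOTS, PACKET_SIZE, HEADER = 100, 118, 4   # 4-byte write counter at offset 0

class PacketRing:
    """100-slot fixed-size ring buffer over a memory-mapped file."""
    def __init__(self, path=None):
        if path is None:                     # /dev/shm/... in deployment
            tmp = tempfile.NamedTemporaryFile(delete=False)
            path, _ = tmp.name, tmp.close()
        size = HEADER + SLOTS * PACKET_SIZE
        with open(path, "wb") as f:
            f.write(b"\x00" * size)          # pre-size the segment
        self._f = open(path, "r+b")
        self.mm = mmap.mmap(self._f.fileno(), size)

    def push(self, packet: bytes):
        idx = struct.unpack_from("<I", self.mm, 0)[0]        # write counter
        offset = HEADER + (idx % SLOTS) * PACKET_SIZE
        self.mm[offset:offset + PACKET_SIZE] = packet[:PACKET_SIZE].ljust(
            PACKET_SIZE, b"\x00")
        struct.pack_into("<I", self.mm, 0, idx + 1)

    def read(self, slot: int) -> bytes:
        offset = HEADER + (slot % SLOTS) * PACKET_SIZE
        return bytes(self.mm[offset:offset + PACKET_SIZE])
```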
Pipeline 4 - The ML Process (ml_process_influx.py)
The ML process reads from shared memory and decodes each packet into individual sensor fields written to InfluxDB. The struct unpacking must exactly match the Nano's packed struct layout:
Points are batched in groups of 10 before writing to InfluxDB, reducing write overhead from 10 individual writes to 1 batch write per second:
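The batching strategy looks roughly like this. Here `write_fn` stands in for the influxdb client's `write_points` call used in the real script, so the sketch runs without a database:

```python
class BatchWriter:
    """Accumulate decoded sensor points and flush them 10 at a time:
    at 10 Hz input, that is one batched write per second instead of ten."""
    def __init__(self, write_fn, batch_size=10):
        self.write_fn = write_fn       # e.g. InfluxDBClient(...).write_points
        self.batch_size = batch_size
        self.buffer = []

    def add(self, point: dict):
        self.buffer.append(point)
        if len(self.buffer) >= self.batch_size:
            self.write_fn(self.buffer)
            self.buffer = []
```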
Process isolation is key: capture_daemon and ml_process_influx are two separate Python processes communicating only through shared memory. If the ML process crashes, the capture daemon keeps running and no sensor data is lost. This architecture is what enabled 3+ days of continuous uptime.
Pipeline 5 - Feature Engineering (ml_collect_habitat.py)
InfluxDB stores one data point per packet at ~10 Hz. The feature engineering layer takes the last 10 points (approximately 1 second of data) and computes 14 descriptors that summarize the behavioral state of the environment.
A single sensor reading at one instant is meaningless. A distribution of readings over time tells a story.
Two features deserve special attention:
- mag_norm_var is the magnetometer discovery. Human movement perturbs the local magnetic field. At rest: ~2 µT². During activity: 10-12 µT². This was not designed; it emerged from the data.
- proximity_mean and proximity_max solved the Calme vs Presence problem. A hand above the Nano drops the IR reading from ~235 to ~115. No other sensor captured this difference reliably. Adding these two features increased the feature set from 12 to 14 and eliminated all Calme/Presence confusion.
The timestamp fix
An important implementation detail: InfluxDB's internal now() clock was offset from the system clock. Queries using now() returned 0 results. The fix is to use Python's timestamp() multiplied by 1e9 for nanosecond-epoch alignment:
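The conversion itself is a one-liner; the query string below is a simplified sketch of how it is used:

```python
from datetime import datetime, timedelta

def to_influx_ns(dt: datetime) -> int:
    """Convert a Python datetime to the nanosecond Unix epoch that InfluxDB
    uses internally, sidestepping the board's offset now() clock."""
    return int(dt.timestamp() * 1e9)

# Build the query window from the system clock instead of InfluxDB's now():
start = to_influx_ns(datetime.now() - timedelta(seconds=60))
query = f"SELECT * FROM sensors WHERE time >= {start}"
```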
Pipeline 6 - Training Strategy (ml_train_habitat.py)
The data leakage problem
A naive random train/test split is fundamentally wrong for time-series data. Consecutive windows share 9 out of 10 data points; splitting them randomly means the model has seen nearly identical samples during training. The reported accuracy would be inflated and the model would fail in the real world.
The correct approach splits by session: the last session of each class is held out for testing:
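In plain Python the split logic amounts to this (a sketch of the strategy in ml_train_habitat.py; the `rows` dict shape is illustrative):

```python
def split_by_session(rows):
    """Hold out the last session of each class for testing.
    `rows` are dicts with at least 'label' and 'session' keys."""
    last = {}
    for r in rows:                      # find each class's last session id
        last[r["label"]] = max(last.get(r["label"], -1), r["session"])
    train = [r for r in rows if r["session"] != last[r["label"]]]
    test = [r for r in rows if r["session"] == last[r["label"]]]
    return train, test
```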
GroupKFold cross-validation
The same logic applies to cross-validation. GroupKFold ensures each fold's validation set contains only sessions unseen during training for that fold:
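With scikit-learn, passing the session id as the `groups` argument is all that is needed; the synthetic data below just demonstrates the guarantee (the real script uses the collected feature matrix):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Synthetic stand-in: 60 windows, 14 features, 4 classes, 3 sessions.
rng = np.random.default_rng(0)
X = rng.random((60, 14))
y = np.repeat([0, 1, 2, 3], 15)
groups = np.tile([0, 1, 2], 20)          # session id per window

# Each fold's validation set contains only whole sessions unseen in training.
for train_idx, val_idx in GroupKFold(n_splits=3).split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(set(groups[val_idx]))
```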
Why RandomForest?
The goal of this tutorial is to understand every step of the AI pipeline. RandomForest is transparent: you can inspect feature importance, trace exactly why a classification was made, and understand what the model learned. With 655 samples and 14 tabular features, it is also the correct engineering choice, but more importantly, it is the right pedagogical choice. Deep learning would be a black box here, not a learning tool.
Pipeline 7 - Real-Time Inference (ml_predict_habitat.py)
The inference loop
Every 500ms, the prediction process queries the last 10 data points from InfluxDB, reorders them chronologically (the query returns DESC), computes 14 features, and asks the model for per-class probabilities:
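The reorder step matters because delta features are directional. A simplified sketch with plain lists in place of the real DataFrame:

```python
def delta_features(points_desc):
    """The InfluxDB query returns newest-first (DESC); delta features must
    be computed oldest-to-newest, so reorder before differencing."""
    points = points_desc[::-1]          # chronological order
    return {
        "audio_rms_delta": points[-1]["audio_rms"] - points[0]["audio_rms"],
        "pressure_grad": points[-1]["pressure"] - points[0]["pressure"],
    }
```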
The rolling buffer
A single prediction can be uncertain due to transient sensor noise. The rolling buffer of 5 predictions smooths these transients. Individual predictions often show 30-50% confidence; the average consistently converges to the correct label. In live testing, zero misclassifications were observed despite the seemingly low per-prediction confidence:
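The smoothing itself is a small amount of code; a sketch with a `collections.deque` of fixed depth:

```python
from collections import deque

class RollingPredictor:
    """Average per-class probabilities over the last 5 predict_proba()
    outputs before choosing a label."""
    def __init__(self, classes, depth=5):
        self.classes = classes
        self.buffer = deque(maxlen=depth)   # old entries fall off automatically

    def update(self, proba):
        self.buffer.append(proba)
        avg = [sum(p[i] for p in self.buffer) / len(self.buffer)
               for i in range(len(self.classes))]
        best = max(range(len(avg)), key=avg.__getitem__)
        return self.classes[best], avg[best]
```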
Writing back to InfluxDB
The predicted label, confidence, and per-class probabilities are written to a separate InfluxDB measurement (habitat) every second. Grafana reads from this measurement to drive the dashboard panels:
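The point written back has a shape along these lines (field names taken from the dashboard section of this writeup; the builder function itself is illustrative):

```python
def habitat_point(label: str, confidence: float, probas: dict) -> dict:
    """Build one point for the 'habitat' measurement, in the dict format
    the influxdb Python client's write_points() accepts."""
    return {
        "measurement": "habitat",
        "fields": {
            "label": label,                 # string field
            "confidence": float(confidence),
            **{f"proba_{k}": float(v) for k, v in probas.items()},
        },
    }
```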
The LED matrix feedback
When the predicted class changes, a 15-byte command is sent over /dev/ttyHS1 to the STM32, which renders the corresponding bar pattern on the LED matrix. An XOR checksum prevents corrupted commands from displaying garbage:
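The exact 15-byte field order lives in unoQstm32.cpp; the sketch below uses a hypothetical layout (header, state id, padding) just to show the XOR checksum on both the sending and receiving side:

```python
def build_led_command(state_id: int) -> bytes:
    """Hypothetical 15-byte LED command: 14 payload bytes followed by an
    XOR checksum over those 14 bytes."""
    body = bytes([0xAA, 0xBB, state_id & 0xFF]) + bytes(11)  # 14 bytes
    checksum = 0
    for b in body:
        checksum ^= b
    return body + bytes([checksum])

def validate(cmd: bytes) -> bool:
    """Receiver side (STM32 in the real system): recompute the XOR and
    compare it with the last byte; discard on mismatch."""
    checksum = 0
    for b in cmd[:-1]:
        checksum ^= b
    return len(cmd) == 15 and checksum == cmd[-1]
```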
Detection Latency
The end-to-end latency from a real-world event to a dashboard update is approximately 4-5 seconds. This breaks down as: 500ms inference loop + ~2.5s for the 5-prediction buffer to stabilize + up to 5s for Grafana's refresh cycle. For behavioral classification, distinguishing whether a room is calm, occupied, or active, this latency is entirely acceptable. This system does not react to events. It interprets context.
INSTALLATION
Everything you need to get the system running from scratch on the Arduino UNO Q
This step covers the complete installation process: connecting to the UNO Q, installing Python dependencies, setting up InfluxDB and Grafana, uploading the Nano firmware, and launching the system. The process takes approximately 1-2 hours on first install.
Prerequisites
- Arduino Nano 33 BLE Sense Rev2 with the Phisualize firmware uploaded (covered in the Nano Firmware step)
- Arduino UNO Q powered and connected to your local network via WiFi or USB-C
- Arduino IDE 2.x installed on your development machine for Nano firmware upload
- SSH client on your development machine (Terminal on Mac/Linux, PowerShell or PuTTY on Windows)
- Arduino App Lab installed from the Arduino website for UNO Q STM32 firmware
1 - Connect to the UNO Q via SSH
Find the UNO Q's IP address from your router's DHCP table, or from the Arduino Lab for Boards app. Then connect:
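The connection is a standard SSH login; the IP below is a placeholder and `arduino` is the user name assumed here (use whatever your board image configures):

```shell
ssh arduino@192.168.1.42    # replace with your UNO Q's IP
```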
All Python processes in this project run over SSH, not via App Lab. App Lab is used only for the STM32 LED matrix firmware. Keep two terminals open: one for SSH, one for App Lab.
2 - Install Python Dependencies
The UNO Q ships with Python 3.13 but without pip. The first step is to install pip via apt, then install the required packages:
Install pip and base packages via apt
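The apt step could look like this (package names are the usual Debian ones; adjust if your image differs):

```shell
sudo apt update
sudo apt install -y python3-pip python3-numpy
```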
Install remaining packages via pip
The --break-system-packages flag is required on the UNO Q to bypass the system package manager's restrictions; on this dedicated platform it is the expected approach. Use it with caution elsewhere: it bypasses apt's dependency protection, and on a general-purpose Linux system it can create conflicts that are difficult to recover from without a full reinstall. On a stable board running only Phisualize, it is safe:
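The package list below is inferred from the imports this writeup mentions (serial I/O, InfluxDB client, pandas, scikit-learn, joblib); check the repository's requirements if one is provided:

```shell
pip3 install --break-system-packages pyserial influxdb pandas scikit-learn joblib
```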
scikit-learn pulls in scipy (32 MB) and joblib automatically. The full download is ~50 MB, allow 2-3 minutes on a typical home connection at the UNO Q's WiFi speed.
Verify all packages
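A quick check could iterate over the six modules (module names are assumptions based on the packages above):

```shell
python3 - <<'EOF'
# Each import should print a version; a failure means that install did not land.
for mod in ("numpy", "pandas", "sklearn", "joblib", "influxdb", "serial"):
    m = __import__(mod)
    print(mod, getattr(m, "__version__", "ok"))
EOF
```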
All six should print a version number. If any fail, re-run the corresponding pip3 install command.
3 - Install and Configure InfluxDB
Install InfluxDB 1.x (arm64)
Use the arm64 .deb package. Verify your architecture first with:
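```shell
uname -m    # should print aarch64 on the UNO Q
```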
An aarch64 result confirms arm64 is correct.
Create the database
Set retention policy (recommended)
4 - Install and Configure Grafana
Install Grafana
Do NOT use the apt repository method for Grafana on the UNO Q. The Grafana GPG key (B53AE77B...) is not recognized by the sqv signature tool on Debian Trixie, which causes apt update to fail with a signature error. This error blocks App Lab from starting on every subsequent boot, a critical problem.
The correct approach is to install Grafana directly from the .deb package, bypassing the repository entirely:
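A direct .deb install might look like this; the version number is an example, so check dl.grafana.com for the current arm64 OSS release before copying:

```shell
wget https://dl.grafana.com/oss/release/grafana_11.2.0_arm64.deb
sudo dpkg -i grafana_11.2.0_arm64.deb
sudo apt-get install -f                     # resolve any missing dependencies
sudo systemctl enable --now grafana-server  # start now and on every boot
```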
If you accidentally added the Grafana apt repository before installing, remove it immediately:
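The repository list file is typically created at the path below; adjust if you named it differently when adding the repo:

```shell
sudo rm /etc/apt/sources.list.d/grafana.list
sudo apt update    # must now complete without the GPG signature error
```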
Leaving this file in place causes App Lab to fail on every launch with a GPG signature error that blocks the entire environment.
5 - Upload the Nano Firmware
On your development machine (not the UNO Q), open Arduino IDE 2.x:
- Install board support: Boards Manager → search 'Arduino Mbed OS Nano Boards' → Install
- Install libraries: Library Manager → install Arduino_BMI270_BMM150, Arduino_LPS22HB, Arduino_HS300x, Arduino_APDS9960, Arduino_CMSIS-DSP
- Open nano.cpp and all PhiXxx files in the same sketch folder
- Select board: Tools → Board → Arduino Nano 33 BLE
- Upload: verify sizeof(SensorFeatures) prints 118 in Serial Monitor after upload
If sizeof prints anything other than 118, there is a struct alignment mismatch. The Python capture_daemon will receive malformed packets. Do not proceed until this shows 118.
6 - Upload the STM32 Firmware via App Lab
App Lab manages the UNO Q's STM32 microcontroller separately from the Linux SoC. Open App Lab, connect to the UNO Q, and create a new project with two files:
- unoQstm32.cpp : the STM32 passthrough and LED matrix firmware
- unoQmain.py : the minimal Python stub that keeps the App Lab runtime alive
Upload via App Lab's Run button. The LED matrix should flash white briefly on boot, then go dark. This confirms the STM32 is running.
7 - Deploy the Python Scripts
Copy all Python scripts to the UNO Q via SCP from your development machine:
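For example (IP, user name, and target folder are placeholders; the script names come from this writeup):

```shell
scp capture_daemon.py ml_process_influx.py ml_predict_habitat.py \
    ml_collect_habitat.py ml_train_habitat.py \
    start_phisualize.sh stop_phisualize.sh arduino@192.168.1.42:~/
```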
Then on the UNO Q via SSH:
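Typical housekeeping after copying; the data and model folder names are the ones used later in this writeup:

```shell
chmod +x start_phisualize.sh stop_phisualize.sh
mkdir -p ~/ml_data ~/ml_models
```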
8 - First Launch
With the Nano connected to the UNO Q via UART and both powered on, launch the system from SSH:
The script launches three processes with setsid, so they survive terminal closure. Expected output:
The three processes run independently with setsid. Closing the SSH terminal does NOT stop them. Always use stop_phisualize.sh for a clean shutdown; it also removes the shared memory segment.
Next Steps
At this point the system is running in production mode, collecting sensor data, writing to InfluxDB, and attempting live classification. However, since no model has been trained yet, ml_predict_habitat.py will fail to load the .pkl file and log an error.
The next step is data collection and training: run ml_collect_habitat.py for each of the four habitat classes, then run ml_train_habitat.py to generate the model. Once habitat_signature_pipeline.pkl exists in ~/ml_models/, restart the system and live classification will begin.
Data Collection and Training
Teaching the system to recognize your environment: from labeled sessions to a live model
This step is where the system learns. You will collect labeled sensor data for each habitat class, then train a RandomForest classifier on that data. The result is a .pkl model file that the prediction process loads to run live inference. The entire process runs on the UNO Q, no external compute required.
Before starting, make sure the system is running: ./start_phisualize.sh launched, Nano connected, data flowing into InfluxDB. Verify with:
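One way to verify, using the influx CLI (the measurement and field names come from this writeup):

```shell
# A growing count confirms sensor points are being written.
influx -database phisualize -execute "SELECT COUNT(audio_rms) FROM sensors"
```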
The Four Habitat Classes
The system classifies the environment into four behavioral states. Each must be collected under controlled, reproducible conditions; the quality of your labels directly determines model accuracy.
- 😴 Calme: Empty room, near-total silence. No one nearby, no audio playing, device completely immobile. This is the baseline. Audio RMS ~0.001-0.002, proximity at baseline ~235, magnetometer stable.
- 🧍 Presence: Your hand held 5-10 cm above the Nano, with some ambient sound nearby (phone, low music). The proximity sensor is the key discriminator: it drops from ~235 to ~115-125. Audio slightly elevated but not dominant.
- 🏃 Activite: Active movement near the device. Walking around it, moving objects nearby, gesturing. The magnetometer variance spikes to 10-12 µT² (vs ~2 µT² at rest). Audio RMS rises to ~0.007-0.008.
- 🎵 Ambiance: Music or film playing at normal listening volume. Audio RMS ~0.033-0.035, sustained and consistent. Device and environment otherwise still.
These definitions are specific to your environment. If you replicate this project, define your own classes based on your use case; the pipeline works for any set of behavioral states as long as they produce distinct sensor signatures.
Step 1 - Collect Data (ml_collect_habitat.py)
The collection script queries InfluxDB for sensor data recorded during a timed session, computes 14 features over a sliding window of 10 points, and saves the result as a CSV file in ~/ml_data/.
Run a collection session
Via SSH on the UNO Q, with the system running in a separate terminal:
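A hypothetical invocation is shown below; check the script's actual usage/help for the real argument names, since the label and duration flags here are assumptions:

```shell
python3 ml_collect_habitat.py --label calme --duration 60
```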
Collect all four classes
Repeat for each class. Run 3 sessions per class minimum; more is better. Between sessions, take a short break to reset the environment:
A common mistake is collecting all sessions of the same class back-to-back in identical conditions. Vary the time of day and exact setup slightly between sessions; this makes the model more robust to natural variation.
Step 2 - Train the Model (ml_train_habitat.py)
The training script loads all CSV files from ~/ml_data/, builds the feature matrix, splits by session to avoid data leakage, trains a RandomForest pipeline, and saves the model to ~/ml_models/.
Understanding the CV scores
The GroupKFold cross-validation score of 50.5% on one fold is not a bug; it is expected. GroupKFold splits by session, and with only 3 sessions per class, one fold may contain a session that is more variable than the others. The test accuracy of 97.2% on a held-out session per class is the reliable metric. Cross-validation here is a consistency check, not the primary accuracy measure.
Understanding the confusion matrix
The 6 misclassifications in the original training were all Ambiance predicted as Activite; both classes share elevated audio energy. This is the model's one genuine ambiguity. In practice it resolves within 2-3 inference cycles as the rolling buffer stabilizes.
Confusion Matrix:
See the confusion matrix schema in the project images.
Saved files
The .pkl file contains the full pipeline: StandardScaler + RandomForestClassifier. Loading it with joblib.load() gives you a pipeline.predict_proba() method that accepts a 14-feature vector and returns class probabilities directly.
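The round trip can be demonstrated with a miniature stand-in trained on random data; the real habitat_signature_pipeline.pkl has the same StandardScaler + RandomForest structure but is trained on your collected sessions:

```python
import numpy as np
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Miniature stand-in for the real pipeline: 14 features, 4 classes.
X = np.random.rand(80, 14)
y = np.repeat([0, 1, 2, 3], 20)
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=10, random_state=0),
)
pipeline.fit(X, y)

joblib.dump(pipeline, "demo_pipeline.pkl")
loaded = joblib.load("demo_pipeline.pkl")
proba = loaded.predict_proba(X[:1])   # one 14-feature window in, 4 probabilities out
```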
Step 3 - Launch Live Inference
Once the model exists, restart the system to activate the prediction process:
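Using the two launcher scripts named earlier in this writeup:

```shell
./stop_phisualize.sh && ./start_phisualize.sh
```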
The system is now classifying your environment in real time. Open Grafana at http://<UNO-Q-IP>:3000 and watch the Habitat Rescue dashboard — the state panel, confidence gauge, and probability history should all update within 5 seconds.
Retraining and Adaptation
The model is trained on your specific environment at a specific time. If accuracy degrades over days or weeks (seasonal changes, new furniture, different ambient noise floor), simply collect new sessions and retrain. The pipeline is designed for this:
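A retraining cycle could look like the following sketch; the ml_collect_habitat.py arguments are assumptions, so check the script's real interface:

```shell
./stop_phisualize.sh
python3 ml_collect_habitat.py --label calme --duration 60   # repeat per class
python3 ml_train_habitat.py                                 # writes the new .pkl
./start_phisualize.sh                                       # reload the model
```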
For this tutorial, 3 sessions of 60 seconds per class (655 windows total) achieved 97.2% accuracy in a controlled domestic environment. For a production deployment, 10+ sessions per class collected across different times of day and varying conditions would significantly improve robustness. Each additional real-world session is worth more than any hyperparameter adjustment.
InfluxDB
Time-series persistent storage running locally on the Arduino UNO Q
InfluxDB is the data backbone of Phisualize. Every sensor packet received from the Nano is decoded and written to InfluxDB. The ML prediction process reads from it to build feature windows. The prediction results are written back to it. Grafana reads from it to render the dashboards. Everything flows through this single local database, no cloud, no external dependency.
The version used is InfluxDB 1.6.7, installed manually on the UNO Q Linux environment. Version 1.x uses InfluxQL, a SQL-like query language that is straightforward to read and write, and compatible with the Python influxdb client library used throughout this project.
Why InfluxDB?
Most sensor projects store data in CSV files or SQLite. InfluxDB offers three specific advantages for this use case:
- Time-series native: Every data point is automatically indexed by timestamp. Queries like 'last 10 points' or 'data between two times' are first-class operations, not workarounds.
- Persistent across reboots: Data is stored on the UNO Q's 32GB eMMC. The behavioral history of a room accumulates over days and weeks without any manual intervention. A SQLite file would work too, but InfluxDB handles concurrent writes from multiple processes cleanly.
- Grafana native integration: InfluxDB is one of Grafana's core datasources. No adapter, no middleware: just a direct connection with full InfluxQL support in the query editor.
Installation on the UNO Q
InfluxDB 1.6.7 is installed manually via the official Debian/Ubuntu package. On the UNO Q Linux environment, connect via SSH and run:
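The download URL follows InfluxData's release naming; verify the exact arm64 package name for 1.6.7 on their downloads page before copying:

```shell
wget https://dl.influxdata.com/influxdb/releases/influxdb_1.6.7_arm64.deb
sudo dpkg -i influxdb_1.6.7_arm64.deb
sudo systemctl enable --now influxdb    # start now and on every boot
```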
Use the arm64 variant for the UNO Q's Qualcomm SoC. The amd64 package will fail silently or produce a SIGILL error at runtime.
Create the database
Once InfluxDB is running, create the 'phisualize' database used by all three Python processes:
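```shell
influx -execute "CREATE DATABASE phisualize"
influx -execute "SHOW DATABASES"        # confirm 'phisualize' appears
```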
Configure data retention (optional but recommended)
By default, InfluxDB keeps data forever. On the UNO Q's 32GB eMMC, several months of data at 10 Hz would eventually fill the disk. A retention policy limits storage to a practical window; 30 days is sufficient for behavioral analysis:
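In InfluxQL, a 30-day rolling window looks like this (older points are then dropped automatically):

```shell
influx -execute "CREATE RETENTION POLICY thirty_days ON phisualize DURATION 30d REPLICATION 1 DEFAULT"
```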
If you skip the retention policy, monitor disk usage with 'df -h' periodically. A full eMMC will cause InfluxDB write failures and stop the entire pipeline silently.
Data Model
InfluxDB organizes data into measurements (equivalent to tables). This project uses two measurements in the 'phisualize' database:
Measurement: sensors
Written by ml_process_influx.py at ~10 Hz. One point per decoded packet from the Nano. Contains all raw sensor fields:
Measurement: habitat
Written by ml_predict_habitat.py every ~500ms. Contains the ML output:
InfluxDB 1.x stores string fields differently from numeric fields. The label field is a string; Grafana's value mapping system handles the string-to-display conversion without any additional processing.
Python Client Usage
All three Python processes use the influxdb Python client (v5.x). The connection is identical in all three:
Writing data
Points are written as a list of dicts. The ml_process batches 10 points per write to reduce I/O overhead:
Reading data
The feature engineering and prediction processes query InfluxDB using raw InfluxQL. The critical detail is timestamp alignment: InfluxDB uses the nanosecond Unix epoch internally, so Python's time objects must be converted correctly:
Real-time query for inference
The prediction process uses a different pattern, it always wants the latest 10 points regardless of absolute time, then reorders them chronologically for correct delta calculations:
Without the .iloc[::-1] reorder, delta features compute end-minus-start in reverse: audio_rms_delta and pressure_grad return inverted values, corrupting those features silently. The model will still run, but accuracy degrades.
Useful CLI Commands for Debugging
These commands are useful during development to verify data is flowing correctly:
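A few checks that cover the common questions (is data arriving, what was the last prediction); measurement and field names come from this writeup:

```shell
influx -database phisualize -execute "SHOW MEASUREMENTS"
influx -database phisualize -execute "SELECT COUNT(audio_rms) FROM sensors"
influx -database phisualize -execute "SELECT * FROM habitat ORDER BY time DESC LIMIT 5"
```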
Service Management
InfluxDB runs as a systemd service. Standard commands apply:
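```shell
sudo systemctl status influxdb     # is it running?
sudo systemctl restart influxdb    # restart after a config change
sudo systemctl enable influxdb     # start automatically on boot
journalctl -u influxdb -n 50       # recent service logs
```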
InfluxDB starts automatically on boot thanks to 'systemctl enable'. If the UNO Q loses power, the database service resumes on next boot and all historical data on the eMMC is preserved; this is one of the core advantages of the UNO Q over a standard microcontroller.
DASHBOARDS: GRAFANA
Two dashboards: raw sensor observability and real-time behavioral intelligence
Grafana connects to InfluxDB running locally on the UNO Q and refreshes every 5 seconds. Two separate dashboards serve two distinct purposes: one for raw sensor monitoring, one for ML output and behavioral interpretation. Both read from the same InfluxDB instance but from different measurements.
Grafana is accessible from any device on the local network at http://[UNO-Q-IP]:3000; no cloud and no external account are required. All data stays local on the eMMC.
Data Architecture
Two InfluxDB measurements feed the dashboards:
- sensors : written by ml_process_influx.py at ~10 Hz. Contains all raw sensor fields: audio_rms, audio_zcr, audio_band_0 through audio_band_15, imu_x/y/z, mag_x/y/z, pressure, temperature, humidity, proximity.
- habitat : written by ml_predict_habitat.py every 500ms. Contains ML output: label (string), confidence (0.0-1.0), and per-class probabilities proba_calme, proba_presence, proba_activite, proba_ambiance.
This separation is intentional: sensors is high-frequency raw data for observability; habitat is low-frequency interpreted data for decision-making.
Dashboard 1 : Phisualize V2 Complete (Raw Sensors)
This dashboard shows what the system sees, every sensor reading in real time. It is the diagnostic and observability layer. During data collection, keep this dashboard open to visually confirm sensors are responding before labeling a session.
Audio Row
🔊 Audio RMS : Gauge (0 to 0.02)
Immediate visual reading of the ambient sound level. Silence reads near zero, conversation around 0.005, and music around 0.03 (which pins the gauge at its maximum).
🎵 Audio ZCR : Time Series
Zero-crossing rate over time: high for noisy or percussive sounds, low for tonal sounds like music. It complements RMS for audio characterization, since two environments can have similar energy levels but very different spectral texture.
🎼 Mel Bands : Time Series (4 representative bands)
Bands 0, 4, 8, and 12 plotted together. Band 0 is the lowest frequency, band 12 is upper mid-range. Music fills all bands relatively evenly. Speech concentrates in the mid bands. Silence keeps all bands near zero. Watching this panel live makes the FFT pipeline tangible.
Motion & Magnetic Field Row
📐 IMU X/Y/Z : Time Series
Three accelerometer axes. At rest on a flat surface: X and Y near 0g, Z near 1g (gravity). Physical movement or shock introduces spikes. In practice the IMU proved less discriminative than the magnetometer for human presence, but essential for detecting device displacement.
🧲 Magnetometer X/Y/Z : Time Series
The most revealing panel in the sensor dashboard. At rest, all three axes are stable around the local Earth field vector. When a person moves nearby, or a phone or powered device is brought close, all three axes shift measurably. This is the EMF disturbance detection that became a core ML feature. Mag_norm_var at rest: ~2 µT². During human activity: 10-12 µT². Watching this panel live makes the discovery immediately intuitive.
Environment Row
🌡️ Temperature : Gauge (10°C to 35°C)
Current temperature from the HS3003. Slow-moving metric, useful for verifying sensor health and for long-term environmental context.
💧 Humidity : Gauge (0% to 100%)
Relative humidity from the HS3003. Like temperature, it moves slowly and provides long-term environmental context.
☁️ Pressure : Time Series (hPa)
Atmospheric pressure from the LPS22HB over time. The trend matters more than the absolute value: pressure is remarkably stable indoors, which makes any gradient meaningful. The pressure_grad feature (last minus first point in a window) captures slow environmental drifts that correlate with broader activity patterns, and it proved to be the single most important feature in the ML model at 21.5% importance.
Proximity Row
📍 Proximity : Bar Gauge (0 to 255)
A horizontal bar gauge spanning the full dashboard width. The APDS9960 IR sensor reads ~235 when nothing is detected and drops to ~115 when a hand is held 15-20 cm above the Nano. This single panel makes the Presence class immediately visible during data collection; it is the ground-truth sensor for the class that was hardest to separate from Calme.
During data collection: before labeling a session as Presence, verify the proximity panel drops below 150. Before labeling Calme, verify audio_rms is near zero and proximity is at baseline ~235. The sensors dashboard is your ground truth.
Dashboard 2 : Habitat Rescue V1.4 (ML Intelligence)
Note for English readers: the dashboard panels are in French because the system is deployed in my home. 'Calme' = Quiet, 'Présence' = Presence, 'Activité' = Activity, 'Ambiance' = Ambience.
This dashboard shows what the system concludes. It reads exclusively from the habitat measurement. Where the sensors dashboard shows data, this dashboard shows meaning.
Current State Row
🏠 État Actuel : Table with value mappings
The last predicted label, colored and labeled by value mapping rules:
- calme → 😴 CALME (blue)
- presence → 🧍 PRÉSENCE (orange)
- activite → 🏃 ACTIVITÉ (red)
- ambiance → 🎵 AMBIANCE (purple)
🎯 Confiance IA : Gauge (0% to 100%)
The rolling-averaged confidence score. Threshold at 80%: below is red, above is green. Values above 85% indicate a stable unambiguous environment. The gauge turns red when the system is uncertain, typically during transitions between states or in mixed environments.
Probability History : Full Width Time Series
📈 Historique des Probabilités
The most informative panel in the habitat dashboard. Four lines show per-class probability over the selected time range. The four probabilities always sum to 100%. Transitions appear as one line rising while others fall. The smoothness reflects the 5-prediction rolling buffer.
Reading this panel is intuitive: Calme near 95% for an extended period = empty quiet room. Sudden rise in Activite with Calme dropping = movement detected. Ambiance rising sharply = music started. This single panel gives a complete behavioral history of the room.
Application Status Row : Four Stat Panels
Four panels in a row demonstrate how behavioral states map to real-world automation triggers. Each reads the same last label and applies regex mappings:
🌙 Mode Éco → blue / ÉCO when calme. Heating or lighting could be reduced.
💡 Lampe Salon → yellow / ALLUMÉ when presence. Comfort lighting activates.
🚨 Alarme → red / ALERTE when activite. Unexpected movement is flagged.
🎵 Musique → purple / PLAY when ambiance. Music or a film is detected.
All four panels use the same base query:
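The query has this shape (measurement and field names match the habitat schema above; $timeFilter is Grafana's built-in time-range macro):

```sql
SELECT last("label") FROM "habitat" WHERE $timeFilter
```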
Regex mapping example (Alarme panel):
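Illustrative value-mapping rules for the Alarme panel, configured in Grafana's panel editor rather than in a file (the fallback "OK" row is an assumption for illustration):

```
Type: Regex   Pattern: activite   → Display: 🚨 ALERTE   Color: red
Type: Regex   Pattern: .*         → Display: OK          Color: green
```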
These panels are concept demonstrations. Connecting them to Home Assistant, Node-RED, or any MQTT broker is a straightforward next step: the classified label is already available in InfluxDB as a queryable string field.
Event Log : Journal des Événements
A table showing all non-Calme events in reverse chronological order, with their confidence scores. Calme is excluded: it represents the absence of activity, not an event worth logging. Only Presence, Activite, and Ambiance detections appear. This gives a behavioral diary of the room at a glance.
Design Decisions
Why two dashboards?
Mixing raw sensor data with ML output creates visual noise. The sensors dashboard is for development and debugging: you use it to understand what the sensors see. The habitat dashboard is the operational view: you use it to understand what the room is doing. Keeping them separate preserves a clear purpose for each.
Why InfluxQL raw queries instead of Flux?
InfluxQL, the query language of InfluxDB 1.x, reads like SQL and is self-explanatory:
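For example, the probability-history panel is a single query (field and measurement names match the habitat schema described earlier):

```sql
SELECT "proba_calme", "proba_presence", "proba_activite", "proba_ambiance"
FROM "habitat" WHERE $timeFilter
```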
No transformation pipelines to debug, no unfamiliar syntax for readers new to Grafana. Every query in these dashboards can be read and understood in seconds.
Refresh rate: 5 seconds
The ML inference loop runs every 500ms, but refreshing Grafana faster than 5 seconds adds unnecessary query load on the UNO Q. With a ~4-5 second end-to-end detection latency, a 5-second refresh shows state changes within one or two cycles, fast enough to feel responsive, light enough to preserve system stability.
eMMC persistence across reboots
InfluxDB stores its data on the UNO Q's 32GB eMMC. All sensor and habitat history survives a reboot. After restarting the system, Grafana immediately shows the complete history going back days. This transforms the system from a real-time detector into a long-term behavioral monitoring platform, without any manual backup or data management.
CONCLUSION
Beyond Classification: Real-World Applications
Because the system outputs interpreted behavioral states rather than raw sensor values, it can serve as a contextual trigger layer for home automation platforms such as Home Assistant.
Possible applications, not implemented here, but straightforward to add:
- Eco mode: Reduce heating or lighting during prolonged Calme state
- Comfort mode: Restore ambient settings when Presence is detected
- Security mode: Flag unexpected Activite during nighttime Calme periods, a behavioral anomaly, not a motion trigger
- Ambiance mode: Synchronize lighting or display profiles when music is detected
The distinction matters: this is not a motion detector. It is a behavioral classifier. It does not react to events; it interprets context.
Challenges and Lessons Learned
- Process isolation: Running ML inference in the same process as UART capture caused crashes under sustained load. Separating them into two independent Python processes, with InfluxDB as the bridge, solved the stability problem completely. The system then ran for 3 days without an unplanned stop.
- Power sensitivity: The UNO Q triggered a self-protection shutdown early in testing due to a minor board shock combined with a marginal power supply. This is normal protective behavior. Lesson: use quality cables and a stable 5V supply. Once addressed, the issue never recurred.
- Timestamp synchronization: InfluxDB's internal clock was offset from the system clock, so queries using now() returned no results. Solution: convert explicitly in Python, e.g. int(time.time() * 1e9), for accurate nanosecond-epoch alignment.
- Session-based train/test split: Random splitting caused data leakage between correlated windows. The correct approach is to split by session, ensuring no session appears in both train and test.
- Magnetometer discovery: Its variance under human movement (10-12 µT² vs ~2 µT² at rest) proved more discriminative than IMU data. An unexpected finding that became a core feature.
- Barometer sensitivity: The barometer proved surprisingly effective. During testing, opening a window caused an immediate pressure shift, suggesting that H.S.A. could easily be adapted for intrusion detection or for monitoring home ventilation.
Future Work
- Magnetic gesture interface: left/right object placement as directional commands, using the magnetometer X/Y axis differential
- Multi-room generalization with transfer learning
- Long-term drift adaptation as the environment changes seasonally
- Anomaly detection mode: flag behavioral states that fall outside learned patterns
- On-device incremental training without restarting the pipeline
Conclusion
A room has a behavioral fingerprint. With structured feature engineering and efficient edge AI, that fingerprint can be recognized, and acted upon.
Running entirely on Arduino hardware, centered around the UNO Q, this project demonstrates that meaningful environmental intelligence does not require cloud infrastructure, specialized AI accelerators, or massive datasets: 655 labeled windows, 14 hand-crafted features, a RandomForest, and a thoughtful architecture are enough.
Credits & Acknowledgments
A big thanks to Gemini AI and its 'Nano Banana' image engine for this beautiful hero shot!
Original project documented in French by me, and translated into English with the assistance of ClaudeAI to ensure technical clarity for the global community.
Thank you !