Feb 2026 · Research Reflection

Perception is becoming the real interface layer

Future systems will rely less on explicit commands and more on fused estimates built from physiology, behaviour, light, thermal exposure, and environmental context.

Research question. How does perception change when a system has to read people and environments together rather than classify a single modality?

Interfaces are changing. For many intelligent systems, the next interface may not be a menu, a typed prompt, or even a voice command. It may be an ongoing perceptual loop in which the system estimates state continuously from multiple sensors and adapts before the user has to ask. That framing shifts perception from a front-end concern to a systems problem.

A useful abstraction is an observation model:

Observation Model
\[z_t^{(m)} = h_m(x_t) + \varepsilon_t^{(m)}, \qquad \hat{x}_t = f(z_{1:T}^{(1)}, \ldots, z_{1:T}^{(M)})\]

Each modality \(m\) produces a noisy observation \(z_t^{(m)}\) of an unobserved state \(x_t\) through its own sensor model \(h_m\), and a fusion function \(f\) turns the observation histories into a state estimate \(\hat{x}_t\). The hard part is not only sensing but fusing asynchronous, incomplete, and differently calibrated measurements. Multimodal machine learning has treated representation and alignment as central challenges for this reason [1]. In my own work this is not abstract: it appears concretely in datasets and studies where thermal variables, light conditions, and human physiological responses need to be interpreted together rather than as parallel measurements.
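One way to make the observation model concrete is a minimal sketch, assuming a scalar state that drifts as a random walk, linear sensor models \(h_m(x) = h_m x\) with known Gaussian noise variances, and a scalar Kalman filter standing in for the fusion function \(f\). The function name `fuse_kalman`, the noise values, and the simulated data are all hypothetical. The point is only to show how a missing sample (NaN) can be skipped so that asynchronous, incomplete streams still update one shared estimate. Because it uses observations only up to time \(t\), this is a causal (filtering) version of \(f\); a smoother would use the full window \(1{:}T\).

```python
import numpy as np


def fuse_kalman(z, h, r, q=0.01, x0=0.0, p0=1.0):
    """Causally fuse M scalar modalities into one state estimate per time step.

    Hypothetical model: x_t = x_{t-1} + w_t with Var(w_t) = q, and
    z_t^(m) = h[m] * x_t + eps_t^(m) with Var(eps_t^(m)) = r[m].
    z has shape (T, M); NaN marks a sample that modality m did not deliver.
    """
    n_steps, n_mod = z.shape
    x, p = x0, p0
    x_hat = np.empty(n_steps)
    for t in range(n_steps):
        p = p + q  # predict: the random-walk prior widens the uncertainty
        for m in range(n_mod):
            if np.isnan(z[t, m]):
                continue  # missing sample: no update from this modality
            k = p * h[m] / (h[m] ** 2 * p + r[m])  # Kalman gain
            x = x + k * (z[t, m] - h[m] * x)
            p = (1.0 - k * h[m]) * p
        x_hat[t] = x
    return x_hat


# Synthetic example: two modalities with different gains and noise levels.
rng = np.random.default_rng(0)
T = 200
x_true = np.cumsum(rng.normal(0.0, 0.1, T))  # hidden state x_t
h = np.array([1.0, 2.0])
r = np.array([0.5, 0.2])
z = x_true[:, None] * h + rng.normal(0.0, np.sqrt(r), (T, 2))
z[::3, 1] = np.nan  # modality 2 drops every third sample

x_hat = fuse_kalman(z, h, r)
rmse_fused = np.sqrt(np.mean((x_hat - x_true) ** 2))
rmse_single = np.sqrt(np.mean((z[:, 0] / h[0] - x_true) ** 2))
```

In this setup the fused estimate tracks the hidden state more closely than the first modality read on its own, even though the second stream is incomplete. Real physiological and environmental channels are rarely linear or Gaussian, so this should be read as a template for the structure of \(f\), not as a recommended model.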

Why richer perception matters

Vision-only interfaces are powerful, but they do not exhaust what a system can know. Behavioural observations can tell a system what happened; physiological and environmental signals can help explain why it happened and whether the same visible behaviour means the same internal state. In human settings that distinction matters. Attention, overload, discomfort, and uncertainty often present weakly in any single channel.

The CLTR dataset [5] is close to the kind of perception stack I think will matter more in future systems: chrono-biological context, light exposure, thermal variables, and physiological response are all part of the sensing problem, not secondary annotations. Likewise, the physiological-perceptual divergence (PPD) work [6] shows why perception models must account for the fact that subjective thermal perception and physiological response can diverge under controlled indoor exposure. A perception layer that tracks only one of them will miss part of the human state.

Wearable sensing is making this technically plausible. Recent reviews emphasize that skin-interfaced and continuously worn sensors are increasing the quality and breadth of physiological measurements available outside traditional laboratory settings [2, 3]. That does not remove the research problem. It deepens it, because more sensors mean more synchronization, more context dependence, and more opportunities for artifact leakage.

Perception is a calibration problem

Once a system moves beyond a single modality, perception becomes a calibration problem. Temperature influences electrodermal measures. Motion introduces artifacts into wearable recordings. Task structure influences EEG interpretation. Even thermoregulatory state estimation from non-invasive sensors depends on context and modelling assumptions [4]. In other words, perception quality depends not only on sensor availability but on how carefully the sensing stack is tied to environment and timing.
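The temperature-electrodermal coupling can be illustrated with a deliberately simple correction: regress the physiological signal on a context covariate and subtract the fitted context slope. This is a sketch under strong assumptions (a linear, instantaneous effect of skin temperature on the signal, synthetic data, and a hypothetical helper named `remove_context_effect`), not a validated calibration procedure for any particular sensor.

```python
import numpy as np


def remove_context_effect(signal, context):
    """Remove a linear context effect from a physiological signal (hypothetical helper).

    Fits signal ~ a + b * context by ordinary least squares, then subtracts
    b * (context - mean(context)), so the mean level of the signal is kept.
    """
    X = np.column_stack([np.ones_like(context), context])
    coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
    corrected = signal - coef[1] * (context - context.mean())
    return corrected, coef


# Synthetic session: skin temperature drifts upward while an
# unrelated arousal component oscillates (illustrative units only).
rng = np.random.default_rng(1)
n = 500
skin_temp = np.linspace(32.0, 35.0, n)  # degrees C
arousal = 0.5 * np.sin(np.linspace(0.0, 6.0 * np.pi, n))
eda = 2.0 + arousal + 0.8 * (skin_temp - 32.0) + rng.normal(0.0, 0.05, n)

eda_corrected, coef = remove_context_effect(eda, skin_temp)
raw_corr = np.corrcoef(eda, skin_temp)[0, 1]
corrected_corr = np.corrcoef(eda_corrected, skin_temp)[0, 1]
```

After correction the signal is uncorrelated with temperature by construction, but the fitted slope lands below the simulated 0.8 because part of the arousal oscillation happens to correlate with the temperature ramp. That is the general caveat of this approach: when the internal state of interest co-varies with the context variable, residualization removes part of the state too, which is one reason calibration has to be planned with the protocol rather than bolted on afterwards.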

A useful engineering requirement is alignment in both time and context. If \(\Delta t\) is the acquisition lag between streams, \(c_t\) is the contextual mismatch at time \(t\), and \(\alpha, \beta \ge 0\) weight the two terms, then perceptual error can be thought of as increasing with both:

Alignment Burden
\[E_{\text{perc}} \propto \alpha |\Delta t| + \beta c_t\]
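The lag term \(\Delta t\) can be estimated directly from data. The sketch below assumes two streams already resampled to a common rate `fs` and a hypothetical scalar mismatch score \(c_t\). It finds the lag that maximizes the normalized cross-correlation and plugs it into the bracketed score. The helpers `estimate_lag` and `alignment_burden`, as well as the weight values, are illustrative choices, not calibrated constants.

```python
import numpy as np


def estimate_lag(a, b, fs, max_lag_s):
    """Return the lag in seconds at which b best matches a (positive: b trails a).

    Assumes a and b are equal-length streams already resampled to rate fs (Hz).
    """
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    scores = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            scores[i] = np.dot(a[: n - k], b[k:]) / (n - k)  # b delayed by k
        else:
            scores[i] = np.dot(a[-k:], b[: n + k]) / (n + k)  # b ahead by -k
    return lags[np.argmax(scores)] / fs


def alignment_burden(delta_t, c_t, alpha, beta):
    """Score proportional to E_perc: alpha * |delta_t| + beta * c_t (hypothetical weights)."""
    return alpha * abs(delta_t) + beta * c_t


# Synthetic pair: stream b is stream a delayed by 10 samples (2.5 s at 4 Hz).
rng = np.random.default_rng(2)
fs = 4.0
base = np.convolve(rng.normal(size=900), np.ones(8) / 8, mode="same")
a = base[20:820]
b = base[10:810] + rng.normal(0.0, 0.05, 800)

lag_s = estimate_lag(a, b, fs, max_lag_s=5.0)
score = alignment_burden(lag_s, c_t=0.2, alpha=0.5, beta=2.0)
```

Because the relation is a proportionality, the returned score is only meaningful for comparing streams or sessions under the same weights. A single global lag is also an approximation, since real acquisition lag can drift over a session.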

This is exactly why protocol-aware multimodal studies matter. They define the conditions under which fusion is meaningful. For me, perception is becoming the real interface layer because future systems will need to infer human state across light, thermal, physiological, and behavioural dimensions rather than reading one stream in isolation.

References

  1. Baltrušaitis T, Ahuja C, Morency LP. Multimodal Machine Learning: A Survey and Taxonomy. IEEE TPAMI, 2018.
  2. Stuart T, Hanna J, Gutruf P. Wearable devices for continuous monitoring of biosignals: Challenges and opportunities. APL Bioengineering, 2022.
  3. Cherian J et al. Wearable Sensing for Clinical Physiology Monitoring: Emerging Paradigms. Physiology, 2025.
  4. Buller MJ et al. Human thermoregulatory system state estimation using non-invasive physiological sensors. EMBC, 2011.
  5. El Kounni A, Tomar P, Vellei M, Le Dreau J, Pisello AL, Inard C, Ramallo Gonzalez A. Chrono-Light Thermophysiology Response (CLTR) dataset. Zenodo, 2026.
  6. Tomar P, Pisello AL. Physiological-perceptual divergence in human thermal adaptation. Building and Environment, 2026.