Human-Robot Interaction Architecture Design
Human-robot interaction (HRI) architecture defines the structural frameworks, communication protocols, safety layers, and interface modalities that govern how robotic systems receive intent from and return feedback to human operators or collaborators. This page covers the scope of HRI architecture as an engineering discipline, the layered mechanisms through which interaction is mediated, the deployment scenarios that shape architectural decisions, and the boundaries that distinguish one design approach from another. The subject is central to robotics architecture frameworks and to the regulatory standards that increasingly govern collaborative robot deployment in the United States.
Definition and scope
HRI architecture encompasses the complete set of engineered subsystems — perception, decision logic, communication interfaces, and safety interlocks — that allow a robotic system to detect, interpret, and respond to human presence, instruction, or physical contact. It is distinct from robot control architecture in that it treats the human as an active variable in the feedback loop rather than an environmental obstacle or passive load.
The scope spans three recognized interaction paradigms, classified by physical and operational proximity:
- Remote interaction — human and robot operate in separate physical spaces; communication is mediated entirely through teleoperation consoles, supervisory control dashboards, or network interfaces.
- Proximate interaction — human and robot share a workspace but do not physically contact one another; spatial monitoring and collision-avoidance logic dominate the architecture.
- Contact interaction — physical exchange between human and robot is an intended function, as in collaborative assembly, surgical assistance, or rehabilitation robotics.
ISO/TS 15066:2016, published by the International Organization for Standardization, establishes the technical specification governing collaborative robot (cobot) operations and defines four collaboration modes — safety-rated monitored stop, hand guiding, speed and separation monitoring, and power and force limiting — each with distinct architectural implications. These modes correspond directly to the classification boundaries that HRI architects use to scope system requirements.
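The architectural implications of the four collaboration modes can be made concrete with a small scoping table. The sketch below is illustrative only: the `ArchRequirements` fields and the mode-to-requirement mapping are simplified assumptions for discussion, not a substitute for a risk assessment against ISO/TS 15066 and ISO 10218.

```python
from dataclasses import dataclass
from enum import Enum, auto

class CollabMode(Enum):
    """The four collaboration modes defined in ISO/TS 15066."""
    SAFETY_RATED_MONITORED_STOP = auto()
    HAND_GUIDING = auto()
    SPEED_AND_SEPARATION_MONITORING = auto()
    POWER_AND_FORCE_LIMITING = auto()

@dataclass(frozen=True)
class ArchRequirements:
    """Simplified architectural implications of a collaboration mode."""
    presence_sensing: bool      # safety-rated human detection required
    force_torque_sensing: bool  # joint-level force/torque sensing required
    contact_intended: bool      # physical contact is a designed function

# Hypothetical mapping for early scoping discussions.
MODE_REQUIREMENTS = {
    CollabMode.SAFETY_RATED_MONITORED_STOP: ArchRequirements(True, False, False),
    CollabMode.HAND_GUIDING: ArchRequirements(False, True, True),
    CollabMode.SPEED_AND_SEPARATION_MONITORING: ArchRequirements(True, False, False),
    CollabMode.POWER_AND_FORCE_LIMITING: ArchRequirements(False, True, True),
}

def requires_contact_hardware(mode: CollabMode) -> bool:
    """A mode that intends contact needs force-controlled actuators."""
    return MODE_REQUIREMENTS[mode].contact_intended
```

A scoping tool built on such a mapping can flag, for example, that selecting power-and-force-limiting mode commits the design to joint-level force/torque sensing from the outset.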
The field draws on contributions from NIST's Robotics Systems Group, which has published measurement science work on interaction quality metrics, operator cognitive load, and interface standardization relevant to HRI systems deployed in manufacturing, logistics, and public-sector environments.
How it works
HRI architecture functions as a layered stack in which each layer processes signals from the environment or from adjacent layers and passes structured outputs upward or downward. A representative decomposition includes five discrete layers:
- Sensing and perception layer — Cameras, LiDAR, force-torque sensors, EMG inputs, and proximity detectors generate raw data about human state, position, gesture, or intent. This layer connects directly to sensor fusion architecture pipelines that resolve multi-modal signals into unified human-state estimates.
- Intent interpretation layer — Machine learning classifiers, rule-based parsers, or hybrid models convert fused sensor data into discrete commands or behavioral predictions. Speech recognition, gesture classification, and gaze tracking operate here.
- Decision and arbitration layer — The robot's control logic evaluates interpreted human intent against task objectives, safety constraints, and operational envelopes. This layer interfaces with motion planning architecture and real-time control systems to translate decisions into executable trajectories.
- Safety and compliance layer — Hardware and software interlocks enforce the force limits, speed thresholds, and separation distances mandated by ISO/TS 15066 and ISO 10218-1/2. This layer cannot be overridden by higher-level commands — it operates as a floor, not a preference. Related structural considerations appear in the robot safety architecture reference.
- Feedback and communication layer — Visual displays, haptic actuators, audio alerts, and status indicators relay robot state and intent back to the human operator, closing the interaction loop.
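The data flow through these layers, and in particular the safety layer's role as a non-overridable floor, can be sketched as a minimal pipeline. All names, thresholds, and the trivial rule-based parser below are illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass

@dataclass
class HumanState:
    distance_m: float   # fused estimate of human-robot separation
    commanded: str      # interpreted intent, e.g. "resume" or "stop"

@dataclass
class MotionCommand:
    velocity_scale: float  # 0.0 (stopped) .. 1.0 (full speed)

# Hypothetical zone boundaries for illustration.
SPEED_FLOOR_DISTANCE_M = 0.3
FULL_SPEED_DISTANCE_M = 1.0

def interpret_intent(raw_utterance: str) -> str:
    """Intent interpretation layer: trivial rule-based parser stand-in."""
    return "stop" if "stop" in raw_utterance.lower() else "resume"

def decide(state: HumanState) -> MotionCommand:
    """Decision/arbitration layer: proposes a trajectory speed."""
    return MotionCommand(velocity_scale=0.0 if state.commanded == "stop" else 1.0)

def safety_clamp(state: HumanState, cmd: MotionCommand) -> MotionCommand:
    """Safety layer: a floor that higher layers cannot override."""
    if state.distance_m < SPEED_FLOOR_DISTANCE_M:
        return MotionCommand(velocity_scale=0.0)  # separation violated: stop
    if state.distance_m < FULL_SPEED_DISTANCE_M:
        # Linear velocity scaling inside the monitored zone.
        span = FULL_SPEED_DISTANCE_M - SPEED_FLOOR_DISTANCE_M
        limit = (state.distance_m - SPEED_FLOOR_DISTANCE_M) / span
        return MotionCommand(velocity_scale=min(cmd.velocity_scale, limit))
    return cmd

state = HumanState(distance_m=0.65, commanded=interpret_intent("please resume"))
cmd = safety_clamp(state, decide(state))  # clamped despite a "resume" intent
```

Note that `safety_clamp` runs after `decide` and can only reduce the commanded speed; this ordering is the structural expression of the "floor, not a preference" rule.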
The ROS 2 (Robot Operating System 2) middleware framework, maintained by Open Robotics and extended for industrial applications by the ROS-Industrial Consortium, is the dominant open-source platform for implementing HRI communication buses and topic-based message passing between these layers. Its architecture is detailed in the ROS Robot Operating System architecture reference.
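The topic-based publish/subscribe pattern that ROS 2 provides between layers can be illustrated with a minimal in-process stand-in. This is not `rclpy` and carries none of ROS 2's discovery, serialization, or quality-of-service machinery; it only shows the decoupling the pattern buys between a perception-layer publisher and a decision-layer subscriber.

```python
from collections import defaultdict
from typing import Any, Callable

class TopicBus:
    """Minimal in-process stand-in for topic-based publish/subscribe."""
    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        """Register a callback for a named topic."""
        self._subs[topic].append(callback)

    def publish(self, topic: str, msg: Any) -> None:
        """Deliver a message to every subscriber of the topic."""
        for cb in self._subs[topic]:
            cb(msg)

bus = TopicBus()
received = []
# A decision-layer node subscribing to fused human-state estimates.
bus.subscribe("/human_state", received.append)
# A perception-layer node publishing an estimate; neither node
# references the other directly, only the shared topic name.
bus.publish("/human_state", {"distance_m": 0.8, "gesture": "wave"})
```

The same decoupling is why layers in an HRI stack can be developed, tested, and replaced independently: each one depends only on topic names and message schemas.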
Common scenarios
HRI architecture manifests differently across deployment environments. Three scenarios illustrate the structural variation:
Collaborative manufacturing assembly — A cobot arm operating on a shared workbench uses power-and-force-limiting mode per ISO/TS 15066. The architecture must enforce contact force limits below 150 N on most body regions (per the standard's biomechanical limits) while maintaining task throughput. Speed and separation monitoring zones — typically 300 mm to 1,000 mm from the operator — trigger adaptive velocity scaling through the decision layer. This scenario is common in electronics and automotive sub-assembly lines.
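The zone logic in this scenario rests on computing a protective separation distance. The sketch below follows the general shape of the speed-and-separation-monitoring relation in ISO/TS 15066, but omits the standard's measurement-uncertainty terms and assumes a linear robot deceleration; the numeric inputs are illustrative only.

```python
def protective_distance(v_human: float, v_robot: float,
                        t_react: float, t_stop: float,
                        c_intrusion: float) -> float:
    """Simplified protective separation distance (meters), after the
    speed-and-separation-monitoring relation in ISO/TS 15066.
    Uncertainty terms are omitted for clarity."""
    s_human = v_human * (t_react + t_stop)  # human closes distance while robot reacts and stops
    s_robot = v_robot * t_react             # robot travel during its reaction time
    s_brake = 0.5 * v_robot * t_stop        # robot travel while braking (linear decel assumed)
    return s_human + s_robot + s_brake + c_intrusion

# Illustrative inputs: human approach at 1.6 m/s, robot TCP at 1.0 m/s,
# 0.1 s system reaction time, 0.3 s stopping time, 0.2 m intrusion allowance.
d = protective_distance(1.6, 1.0, 0.1, 0.3, 0.2)
# If the measured separation falls below d, the decision layer scales
# velocity down or triggers a protective stop.
```

Because the distance grows with robot speed, solving this relation in reverse is what drives the adaptive velocity scaling described above: a smaller measured separation permits only a slower robot.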
Healthcare and rehabilitation robotics — Surgical assist robots and exoskeletal rehabilitation devices require contact-interaction architectures with sub-millimeter positional accuracy and closed-loop force feedback measured in the 1–10 N range. The FDA regulates these systems as medical devices — rehabilitation devices fall under 21 CFR Part 890 for physical medicine devices — imposing design controls and software validation requirements that shape every layer of the HRI stack. The AI integration in robotics architecture page covers how learning-based components are qualified within these regulated contexts.
Public-facing service robots — Robots deployed in retail, hospitality, or healthcare reception environments interact with untrained members of the public who have not opted into the interaction. These systems require robust multi-modal intent detection — combining proximity sensing, speech processing, and behavioral prediction — without relying on task-specific interfaces. The architecture must handle unexpected contact, erratic movement, and linguistic diversity. Mobile robot architecture frameworks address the navigation dimensions of these deployments.
Decision boundaries
Selecting an HRI architecture variant requires resolving five structural questions that establish hard design boundaries:
Physical contact vs. no-contact — Systems where contact is an intended function require force-controlled actuators, torque sensing at each joint, and a safety layer capable of responding within 50 ms. Systems where contact is a failure mode require spatial monitoring and emergency stop chains instead. The actuator control interfaces reference covers the hardware requirements for each branch.
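The contact branch's safety layer can be sketched as a force monitor that latches a protective stop. The 140 N figure below is an illustrative per-region quasi-static limit of the kind tabulated in ISO/TS 15066, and the one-way latch expresses the rule that higher-level commands cannot clear a safety stop; this is a structural sketch, not certified safety logic.

```python
FORCE_LIMIT_N = 140.0  # illustrative quasi-static limit for one body region

class ContactSafetyMonitor:
    """Contact-interaction branch: joint force/torque samples feed a
    monitor that latches a protective stop on any limit violation.
    The no-contact branch would instead watch separation distance."""

    def __init__(self, force_limit_n: float = FORCE_LIMIT_N) -> None:
        self.force_limit_n = force_limit_n
        self.stop_latched = False

    def check(self, joint_forces_n: list[float]) -> bool:
        """Returns True when motion must halt. The latch is one-way:
        once tripped, higher layers cannot un-stop the robot from here;
        clearing it requires a separate, deliberate reset procedure."""
        if any(f > self.force_limit_n for f in joint_forces_n):
            self.stop_latched = True
        return self.stop_latched
```

In a real system this check would run inside the hard real-time loop so that the 50 ms response bound holds regardless of load on the higher layers.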
Structured vs. unstructured environments — Factory deployments with fixed operator roles and known task sequences permit deterministic, rule-based intent interpretation. Unstructured public deployments require probabilistic models and higher tolerance for ambiguous inputs, with corresponding increases in processing latency and hardware cost.
Supervised vs. autonomous operation — In supervisory control architectures, a human operator retains override authority at all times; the robot's autonomy is bounded by explicit command grants. In autonomous architectures with human monitoring, the decision layer operates independently except when safety thresholds are breached. This distinction determines the applicable regulatory classification and the depth of software validation required under ANSI/RIA R15.06 and related OSHA general duty clause interpretations (OSHA 29 CFR 1910.212).
Real-time vs. deferred feedback — Applications requiring immediate reaction to human state — surgical robots, exoskeletons, collaborative press-tending — demand hard real-time control loops with cycle times under 1 ms. Supervisory and remote-operation applications can tolerate soft real-time or near-real-time loops. This boundary propagates into middleware selection, processor architecture, and OS configuration. The middleware selection for robotics page addresses these tradeoffs.
Edge vs. cloud computation — Latency-sensitive intent interpretation must run on local processors at the robot or workstation. Heavy model inference for speech or gesture recognition may offload to cloud infrastructure where latency permits. The distribution boundary is set by the interaction paradigm, with contact-interaction systems typically prohibiting cloud dependency for safety-critical paths. Edge computing for robotics and cloud robotics architecture provide the comparative framework for this decision.
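A simple placement rule captures this boundary: safety-critical paths are pinned to the edge, and everything else is offloaded only when its latency budget absorbs the cloud round trip. The task names and latency figures below are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    safety_critical: bool
    latency_budget_ms: float

# Assumed round-trip latencies for illustration.
CLOUD_ROUND_TRIP_MS = 120.0

def place(task: Task) -> str:
    """Illustrative placement rule: safety-critical paths never leave
    the edge; otherwise offload whenever the latency budget allows."""
    if task.safety_critical:
        return "edge"
    return "cloud" if task.latency_budget_ms >= CLOUD_ROUND_TRIP_MS else "edge"

# A collision check in a contact-interaction system stays local even
# with a generous budget; batch speech transcription can offload.
collision_check = Task("collision_check", safety_critical=True, latency_budget_ms=10.0)
speech_to_text = Task("speech_to_text", safety_critical=False, latency_budget_ms=500.0)
```

The hard constraint is the first clause: for contact-interaction systems, no network outage or latency spike may sit between a sensor reading and a protective stop.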
HRI architecture design at a professional level requires coordinated expertise across perception engineering, control systems, human factors, and regulatory compliance — disciplines covered within the broader robotics architecture landscape described on the domain index.
References
- ISO/TS 15066:2016 — Robots and Robotic Devices: Collaborative Robots
- ISO 10218-1:2011 — Robots and Robotic Devices: Safety Requirements for Industrial Robots, Part 1
- NIST Robotics Systems Group — Robot Systems Research
- OSHA 29 CFR 1910.212 — General Machine Guarding Standards
- FDA 21 CFR Part 890 — Medical Device Classification: Physical Medicine Devices
- ROS 2 Documentation — Robot Operating System (Open Robotics)
- Association for Advancing Automation (A3) / Robotic Industries Association — Safety Standards