Agent Integration Notes
Agent Integration
Official HA-VLN repository: https://github.com/F1y1113/HA-VLN
This page explains how to connect your own agent to the HA-VLN environment for local method development.
The goal of this page is practical integration, not to force a specific model architecture. The repository agent/ directory is reference material only. What matters is that your agent can run correctly with the HA-VLN task configuration, simulator APIs, and evaluation workflow.
1. Redirect Base Task Config
Redirect BASE_TASK_CONFIG_PATH in your agent configuration to the HA-VLN task config:
BASE_TASK_CONFIG_PATH: path/to/HASimulator/config/HAVLNCE_task.yaml
Also verify the following key switches:
SIMULATOR:
ADD_HUMAN: True
ALLOW_SLIDING: True
HUMAN_COUNTING: False
TASK:
MEASUREMENTS:
- COLLISIONS_DETAIL
- DISTANCE_TO_HUMAN
Important path note:
HAVLNCE_task.yamloften contains relative paths such as../Data/...- these paths are resolved against the process CWD where your agent starts
- if your launcher runs in another directory, switch those paths to absolute paths to avoid silent file-not-found failures
2. Choose the Task Variant You Need
If you are comparing settings or running ablations, switch only BASE_TASK_CONFIG_PATH while keeping the rest of your agent code unchanged:
HAVLNCE_task.yaml: dynamic environment + HA-R2R instructionsHAVLNCE_R2R_task.yaml: dynamic environment + original R2R instructionsVLNCE_task.yaml: static environment + original R2R instructionsVLNCE_HAR2R_task.yaml: static environment + HA-R2R instructions
3. Synchronize with the Dynamic Human Timeline
Consume dynamic-scene clock signals before step() so each agent action runs on the latest physical state.
from habitat.core.env import Env
from HASimulator.environments import HAVLNCE
class HAVLNWrapper(Env):
def __init__(self, config, dataset=None):
super().__init__(config, dataset)
self.use_dynamic_human = getattr(self._config.TASK_CONFIG.SIMULATOR, "ADD_HUMAN", False)
if self.use_dynamic_human:
self.havlnce_tool = HAVLNCE(self._config.TASK_CONFIG, self._sim)
self.havlnce_tool._reset_signal_queue_and_counters()
def reset(self):
if self.use_dynamic_human:
self.havlnce_tool.reset()
return super().reset()
def step(self, action):
if self.use_dynamic_human:
self.havlnce_tool._handle_signals()
return super().step(action)
Why this matters:
- HA-VLN advances human motions using a child-thread clock
- without
_handle_signals()before each step, policy actions may be evaluated on stale geometry or stale NavMesh
4. Use the Provided Text Resources First
Before building custom text preprocessing, check the text resources already bundled with HA-R2R:
- the main dataset files such as
Data/HA-R2R/train/train.json.gzincludeinstruction_vocab.word_list - the public splits also provide pre-tokenized BERT-format files such as
train_bertidx.json.gz,val_seen_bertidx.json.gz, andval_unseen_bertidx.json.gz
Recommended guidance:
- if your method uses a classic word-level vocabulary, first try the vocabulary bundled in the HA-R2R dataset files
- if your method uses a BERT-style text encoder, first try the provided
*_bertidx.json.gzfiles - only build or extend your own text pipeline if your method truly requires it
5. Integrate Human-Aware Metrics in Eval
After each episode ends, you can inject TCR, CR, and strict SR and strip high-dimensional fields before aggregation:
from HASimulator.metric import Calculate_Metric
metric_calc = Calculate_Metric(split="val_unseen")
metric_calc(stats_episodes[ep_id], ep_id)
for key in ["distance_to_human", "collisions_detail"]:
if key in stats_episodes[ep_id]:
stats_info.setdefault(ep_id, {})[key] = stats_episodes[ep_id].pop(key)
Metric semantics:
TCR: net new trajectory collisions after excluding the pre-computed unavoidable collision componentCR: collision flag for the episodeSR: strict success rate under the extra conditionTCR == 0
6. Optional Human Counting and Visualization Hook
If HUMAN_COUNTING: True, ensure GroundingDINO is installed and the weight path is configured correctly in detector.py before evaluation.
You can also stitch observation frames and instruction text inside the evaluation loop for debugging videos.
from copy import deepcopy
from habitat_extensions.utils import observations_to_image, generate_video
from habitat.utils.visualizations.utils import append_text_to_image
rgb_frames = [[] for _ in range(envs.num_envs)]
for i in range(envs.num_envs):
if len(config.VIDEO_OPTION) > 0:
if config.TASK_CONFIG.SIMULATOR.HUMAN_COUNTING:
observations_ = deepcopy(observations)
observations_[i]["rgb"] = detected_img[i]
frame = observations_to_image(observations_[i], infos[i])
else:
frame = observations_to_image(observations[i], infos[i])
frame = append_text_to_image(frame, current_episodes[i].instruction.instruction_text)
rgb_frames[i].append(frame)
if dones[i] and len(config.VIDEO_OPTION) > 0:
ep_id = current_episodes[i].episode_id
generate_video(
video_option=config.VIDEO_OPTION,
video_dir=config.VIDEO_DIR,
images=rgb_frames[i],
episode_id=ep_id,
checkpoint_idx=checkpoint_index,
metrics={"spl": stats_episodes[ep_id]["spl"]},
tb_writer=writer,
)
rgb_frames[i] = []
7. API Cross-Reference
Use the API pages together with this integration guide:
- human distance, angle, and counting hooks:
api/human_state.md - clock sync and forced frame controls:
api/scene_updates.md - collision detail and strict human-aware metrics:
api/collision_checks.md