Skip to content

[costmap_filters] Add ZoneParameterFilter — config-driven per-zone parameter overrides#6104

Open
aki1770-del wants to merge 15 commits intoros-navigation:mainfrom
aki1770-del:feat/zone-parameter-filter
Open

[costmap_filters] Add ZoneParameterFilter — config-driven per-zone parameter overrides#6104
aki1770-del wants to merge 15 commits intoros-navigation:mainfrom
aki1770-del:feat/zone-parameter-filter

Conversation

@aki1770-del
Copy link
Copy Markdown

@aki1770-del aki1770-del commented Apr 25, 2026

Basic Info

Info Please fill out this column
Ticket(s) this addresses Closes #6080
Primary OS tested on Ubuntu 24.04 (Rolling, Docker ros:rolling base)
Robotic platform tested on Simulation / unit tests only
Does this PR contain AI generated software? Yes — AI-assisted, with the human author (aki1770-del) reviewing every change before commit.
Was this PR description generated by AI software? No

Description of contribution in a few bullet points

  • Adds a new costmap filter, ZoneParameterFilter, that mutates a configurable set of nav2 parameters on configurable target nodes based on the mask value at the robot's pose. Closes Feature: Add SurfaceConditionFilter costmap filter plugin #6080.
  • Generalizes the original SurfaceConditionFilter proposal per Steve's reframe in the issue thread: one filter, N parameters per state, no use-case in the C++.
  • State 0 is the special reset state. Each non-zero state value (1–255 admitted; effective range bounded by the int8_t mask transport) maps to parameter overrides via YAML.
  • Two configurable knobs: on_param_set_failure (warn/throw), and an optional state_event_topic that publishes a UInt8 on every state transition.
  • All parameter sets are issued asynchronously via AsyncParametersClient; the hot path (process()) never calls wait_for_service. Pending futures are drained at the start of the next process() call.

Description of documentation updates required from your changes

  • New parameters introduced (state_ids, target_nodes, state_<N>.<node>.<param>, nominal_defaults.<node>.<param>, state_event_topic, on_param_set_failure) need a configuration page on docs.nav2.org. Companion PR opened at docs(costmap-filters): add Zone Parameter Filter configuration page docs.nav2.org#907.
  • Plugin should be added to the costmap_filters plugin page index.
  • No migration guide entry needed — additive new plugin, no breaking changes.

Description of how this change was tested

  • colcon test --packages-select nav2_costmap_2d --ctest-args -R zone_parameter_filter on rolling, in a Docker container built from the upstream Dockerfile.
  • 13 gtest cases pass (1.4 s total): 4 architectural sanity checks plus 9 lifecycle cases. The lifecycle cases include explicit regression guards for two specific items I noticed when reading PR [binary filter] change boolean ros params #3796 (the closest prior art): a non-blocking service_is_ready() check in the hot path, and the lifetime of pending futures across process() calls.
  • Local ament_uncrustify, ament_cpplint, and pre-commit all clean before push. CI status visible on this PR.
# Example YAML (the plugin in use)
local_costmap:
  ros__parameters:
    plugins: ["voxel_layer", "inflation_layer"]
    filters: ["zone_parameter_filter"]
    zone_parameter_filter:
      plugin: "nav2_costmap_2d::ZoneParameterFilter"
      enabled: true
      filter_info_topic: "/zone_filter_info"
      state_event_topic: "/zone_filter_state"
      on_param_set_failure: "warn"
      state_ids: [1, 2]
      target_nodes: [controller_server, local_costmap]
      state_1:
        controller_server:
          FollowPath.max_vel_x: 0.5
        local_costmap:
          inflation_layer.inflation_radius: 0.8
      state_2:
        controller_server:
          FollowPath.max_vel_x: 0.2
        local_costmap:
          inflation_layer.inflation_radius: 1.2
      nominal_defaults:
        controller_server:
          FollowPath.max_vel_x: 1.0
        local_costmap:
          inflation_layer.inflation_radius: 0.55

Notes for review

A few design decisions worth flagging:

  1. State 0 reset uses YAML-declared nominal_defaults, not auto-capture. I tried auto-capture first — fire get_parameters then set_parameters on the same AsyncParametersClient and rely on FIFO. That doesn't actually work because the two calls go through separate underlying services::Client instances, so server-side ordering between them isn't guaranteed and a late get response would capture the overridden value. Declarative nominal_defaults is race-free, deterministic, and matches the "config-driven" preference you stated in Feature: Add SurfaceConditionFilter costmap filter plugin #6080.

  2. Explicit target_nodes list is required. The user lists every node the filter will touch. The state-override parser then does longest-prefix-match against that list, so nested-namespace targets parse unambiguously. Self-documenting and catches typos at config-load.

  3. AsyncParametersClient is per-target-node, not per-parameter. Per your first review on [binary filter] change boolean ros params #3796.

  4. Hot path never calls wait_for_service. Per [binary filter] change boolean ros params #3796 review item 2. There's a unit test (ServiceNotReadyInHotPathDoesNotBlock) that asserts process() returns under 500 ms when the target service is not present, as a regression guard.

Out of scope for this PR

These are reasonable v2 items once this lands:

  • Pre-built mask generation tools (user supplies their own publisher)
  • Per-zone parameter blending across overlapping filters
  • Auto-discovery of which nav2 nodes own which parameters
  • Hot-reload of the state map (config change requires plugin re-init)

Future work that may be required in bullet points

  • If the filter is widely adopted, the nominal_defaults declaration burden could be reduced by a sync get-at-init mode (with documented caveats about target-service availability at configure-time).
  • Add a complementary multiple_states_with_overlapping_params test for the state-N to state-M transition matrix.

For Maintainers:

  • Check that any new parameters added are updated in docs.nav2.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins is added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists
  • Should this be backported to current distributions? If so, tag with backport-*.

…~80%)

Slice 1 of nav2#6080 implementation per the source-read milestone-1
artifact at outputs/sprint_131/nav2_6080_source_read.md.

Architecture per Steve Macenski's 2026-04-13 design guidance:
- Mask values 0..255; value 0 = reset to nominal defaults
- Each non-zero state mapped via YAML config to a list of ROS
  parameters (any reachable nav2 parameter on any node)
- No custom enums; raw int + config-driven semantics
- Async parameter setting via per-target-node AsyncParametersClient
  (Steve's preferred pattern from PR ros-navigation#3796 first review)
- Optional state-transition event publication
- Configurable warn|throw policies for unknown-state and
  param-set-failure

PR ros-navigation#3796 burn-pattern guards baked in:
- No wait_for_service in hot path (issue 2): service_is_ready()
  predicate; skip-with-throttled-warn on not-ready
- No empty-string sentinels (issue 4): std::map presence checks
- Future lifetime managed (issue 5): pending_futures_ vector
  drained at top of every process() call; never destruct mid-flight
- Per-node async client (Steve's first review point): one
  AsyncParametersClient per target node, batched parameter sets

Files added:
- nav2_costmap_2d/include/nav2_costmap_2d/costmap_filters/
    zone_parameter_filter.hpp
- nav2_costmap_2d/plugins/costmap_filters/zone_parameter_filter.cpp
- nav2_costmap_2d/test/unit/zone_parameter_filter_test.cpp

Files edited:
- filter_values.hpp: added ZONE_PARAMETER_FILTER = 4 constant
- costmap_plugins.xml: registered new plugin in filters library
- CMakeLists.txt: added zone_parameter_filter.cpp to filters target
- test/unit/CMakeLists.txt: added zone_parameter_filter_test target

Confidence assessment for slice 1 target (80% today):
- Architectural skeleton complete and follows nav2 conventions
- Source-read evidence cited for every ros-navigation#3796 fix
- All cross-file wiring (XML, CMake, header, impl) in place
- Test scaffold exercises the type-constant and construction paths
- 5/11 test cases from §5 implementation plan present (or as
  documented placeholders); remaining 6 deferred to slice 2 (95%)
- Build verification deferred: no ROS 2 env on this machine; CI
  build is the next gate

Slice 2 (95% target tomorrow) needs:
- colcon build green on humble + jazzy
- Remaining 6 unit tests with full ROS 2 mock harness
- AsyncParametersClient mock for param-mutation assertions
- Coverage measurement (lcov ≥90% per ros-navigation#3796 retrospective)
- File-level review of PR ros-navigation#3796 diff (read summary, not the diff
  itself yet)

Not for PR open — feature branch only. Per D-VGC148, all PR
prose is Komada-authored at PR-open time. Per the commit-or-close
clause to Steve, PR open is committed by 2026-05-16.

Co-Authored-By: Claude and aki1770-del <aki1770@gmail.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…cation round 1)

Three style fixes per ament_uncrustify on rolling:

1. plugins/costmap_filters/zone_parameter_filter.cpp:60-61 — split a
   140-char declare_or_get_parameter call across two lines.
2. plugins/costmap_filters/zone_parameter_filter.cpp:138-140 — same
   for an RCLCPP_ERROR call.
3. test/unit/zone_parameter_filter_test.cpp:72,89 — convert
   single-line destructors to multi-line bodies (matches binary_filter_test
   pattern).

Verification context: Path C source-build in ros:rolling docker
container compiled the package cleanly and ran 4/4 of our new
gtest cases. The 3 failures in the 399-test suite were all
uncrustify diffs on these lines — zero correctness regressions.
This commit closes the lint gates.

Co-Authored-By: Claude and aki1770-del <aki1770@gmail.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…al test (slice 2a)

Adds a gtest fixture mirroring binary_filter_test.cpp's TestNode pattern,
adapted for ZoneParameterFilter's parameter-mutation semantics:

- TargetNode helper — minimal rclcpp::Node with declared parameters that
  ZPF will mutate via AsyncParametersClient
- TestZpf fixture — LifecycleNode host + separate executors + helpers for
  filter creation, process() invocation, and predicate-based wait
- State1AppliesParameterToTargetNode — first real lifecycle test:
  state_ids=[1] + state_1.zpf_target_node.speed=0.5 fed via
  parameter_overrides; mask filled with 1; pose at center cell;
  process() triggers state 0→1 transition; async set_parameters
  propagates to target node within 1.5s timeout; asserts speed=0.5,
  inflation untouched

Verified on rolling (docker nav2:dev): 5/5 tests pass in 150ms total
(4 scaffold + 1 new lifecycle), zero build warnings, zero test failures.

Slice 2a complete; remaining slice-2b items per scaffold TODO list:
state_zero_resets, unknown_state_warn/throw, state_event_published,
service_not_ready_in_hot_path_does_not_block, etc.

Co-Authored-By: Claude and aki1770-del <aki1770@gmail.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>
 regression test (slice 2b)

Extends TestZpf fixture with StateEventSubscriber + rePublishMask helper,
plus three live cases covering distinct ZPF code paths:

- UnknownStateWarnLogsAndKeepsPrevious — exercises on_unknown_state="warn"
  policy: a mask value not in the configured state map must log a
  throttled warn and leave all target parameters unchanged.

- StateEventPublishedOnTransition — exercises optional state_event_pub_:
  when state_event_topic is configured, every transition publishes a
  UInt8 carrying the new state value. Verified via a separate
  StateEventSubscriber on the same in-process executor.

- ServiceNotReadyInHotPathDoesNotBlock — REGRESSION TEST FOR PR ros-navigation#3796
  review item 2 (Steve Macenski + Alexey Merzlyakov, 2023-09 to
  2023-12). When a target node's parameter service is not ready, ZPF
  must NOT call wait_for_service in process(); it must log a throttled
  warn via service_is_ready() and return promptly. Test routes a
  state-1 override at a nonexistent_node and asserts process() returns
  in <500ms (a wait_for_service regression would hang for seconds).

Plus one documented-failure case kept under DISABLED_ prefix:

- DISABLED_State0ResetsToNominalDefaults — surfaces a v0.1 limitation
  in zone_parameter_filter.cpp:383 where nominal_defaults_ stores an
  empty ParameterValue instead of the target's pre-override value.
  resetToNominal() therefore pushes NOT_SET and silently no-ops.
  Proper fix (slice 2c) is to synchronously fetch nominal values at
  initializeFilter() time via spin_until_future_complete with short
  timeout — permitted blocking outside the hot path.

Cosmetic: InfoPublisher now passes multiplier=1.0 (was 0.0) to silence
the "base/multiplier are unused; expected defaults (0.0, 1.0)" warning
emitted by ZPF on filter-info receipt.

Verified on rolling (nav2:dev container, incremental build 22s):
8/8 enabled tests pass in 611ms total. 1 disabled documented as design
intent pending slice 2c fix.

Co-Authored-By: Claude and aki1770-del <aki1770@gmail.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…0 reset (slice 2c)

Slice 2b's State0ResetsToNominalDefaults test surfaced a real v0.1 bug:
applyState() inserted an empty `ParameterValue` (NOT_SET type) as a
nominal-default sentinel, then resetToNominal() pushed that empty value
back to the target on state-0 transition — silent no-op, override never
restored. Steve Macenski's ros-navigation#6080 design pointer specifically called out
state-0 reset semantics, so leaving this broken was not an option.

Initial fix candidate was async get-before-set: fire get_parameters then
set_parameters on the same AsyncParametersClient, relying on FIFO to
capture the pre-override value in the get response. Discarded after
inspection: get_parameters and set_parameters use SEPARATE underlying
services::Client instances, so server-side processing order between them
is not guaranteed. A late get response would capture the OVERRIDDEN
value, not the nominal — a worse failure mode than the silent no-op.

Adopted: declarative YAML nominal_defaults. Schema:

  <plugin>.nominal_defaults.<target_node>.<param_path>: <value>

Parsed at loadStateConfig() time (configure phase, blocking permitted).
Race-free, deterministic, zero runtime cost. Matches Steve's "config-
driven" preference verbatim.

Quality-of-life: warn at config-load time for any state-N parameter that
has no matching nominal_defaults entry — such params will not be
restored by state-0 reset, and a v1 user almost certainly wants symmetric
coverage.

Test re-enabled (DISABLED_State0ResetsToNominalDefaults → enabled),
extended to provide the explicit nominal_defaults override. All 9
TestZpf cases pass on rolling (nav2:dev, incremental build 58s):

  State1AppliesParameterToTargetNode      ✅ 162ms
  State0ResetsToNominalDefaults           ✅ 268ms (slice 2c verified)
  UnknownStateWarnLogsAndKeepsPrevious    ✅ 327ms
  StateEventPublishedOnTransition         ✅ 149ms
  ServiceNotReadyInHotPathDoesNotBlock    ✅ 130ms

  Total: 9/9 in 1038ms, 0 disabled, 0 failures.

Co-Authored-By: Claude and aki1770-del <aki1770@gmail.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…stify)

PR ros-navigation#6104 lint workflow surfaced two issues:

- codespell (in pre-commit): flagged "lets" in a comment on
  loadStateConfig (the colon-separator routing comment) as
  potentially "let's". The original word was correct ("permits"
  meaning), but rewording to "routes" preserves the same intent
  while avoiding the false-positive trigger.

- ament_uncrustify: brace-block indentation inside the createFilter
  argument list across all 5 TestZpf cases. Auto-reformatted via
  ament_uncrustify --reformat; 11/11 line shifts.

No functional change. Local verification on rolling (nav2:dev):
ament_uncrustify clean across all 3 ZPF files; 9/9 tests pass in
965ms.

Co-Authored-By: Claude and aki1770-del <aki1770@gmail.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>
Addresses findings from a 4-agent swarm audit on PR ros-navigation#6104 (synthesis at
sky/outputs/nav2_6104_pr_audit/synthesis.md):

- F-4 schema: introduce a required `target_nodes: [a, b, ...]` YAML
  parameter and replace first-dot-split parsing with longest-prefix-match
  against the explicit list. The original first-dot-split happens to be
  correct for typical nav2 usage (ROS doesn't allow dots in node names),
  but the explicit list is unambiguous, self-documenting, catches typos
  at config-load, and future-proofs against schema drift.

- F-9 ResetFilterDeactivates: verify resetFilter() clears
  filter_info_received_ and filter_mask_ so isActive() returns false.
  Mirrors binary_filter_test.cpp::testResetFilter.

- F-10 UnknownMaskCellLeavesStateUnchanged: mask filled with
  OCC_GRID_UNKNOWN (-1); verify process() leaves current_state_ + the
  target's parameter untouched. Guards the negative-mask-data branch
  in process().

- F-11 RobotOutsideMaskResetsToState0: drive into state 1, then call
  process() with a pose far outside mask bounds. Verify the state
  auto-resets to 0 and nominal_defaults are restored. Guards the
  worldToMap()-failure branch in process().

- F-4 regression test (LongestPrefixMatchForOverlappingTargetNodes):
  with target_nodes = ["zpf", "zpf_target_node"] and an override
  state_1.zpf_target_node.speed=0.7, verify the longer prefix wins so
  the parameter actually lands on the live target node. A regression
  to naive shortest-prefix would route to the bogus "zpf" service
  and the value would never land.

Verified on rolling (nav2:dev): 13/13 tests pass in 1365ms. Local
ament_uncrustify + ament_cpplint clean.

Co-Authored-By: Claude and aki1770-del <aki1770@gmail.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 25, 2026

Codecov Report

❌ Patch coverage is 93.01075% with 13 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
.../plugins/costmap_filters/zone_parameter_filter.cpp 93.01% 13 Missing ⚠️
Files with missing lines Coverage Δ
.../plugins/costmap_filters/zone_parameter_filter.cpp 93.01% <93.01%> (ø)

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@aki1770-del
Copy link
Copy Markdown
Author

@SteveMacenski quick note — the four #6080 design pointers (0–255 values, state 0 reset, config-driven map, warn/throw on unknown) are all honored. The state-0 reset uses YAML-declared nominal_defaults rather than auto-capture for the FIFO-not-guaranteed reason called out in the PR body. Open to revisit if any of that diverges from intent.

Copy link
Copy Markdown
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also check the code coverage metrics, there's enough missing that needs to be added.

I did a coarse first review, so lets chat on a few subjects! There's alot of AI generated TMI comments throughout the code that should be removed or reduced. Some parts of this seem dramatically overcomplicated.

Overall, its a really good first stab.

*/
bool isActive();

private:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to protected and put all methods before all variables in Nav2 styling.

All methods must have doxygen as well.

// (guards against the navigation2#3796 future-lifetime regression).
void drainPendingFutures();

// ===== Subscribers / publisher =====
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this and other TMI comments

Comment thread nav2_costmap_2d/costmap_plugins.xml Outdated
<description>Binary flip filter.</description>
</class>
<class type="nav2_costmap_2d::ZoneParameterFilter" base_class_type="nav2_costmap_2d::Layer">
<description>Applies a configured set of ROS parameters based on the mask value at the robot's pose. Generalises SpeedFilter to N parameters per state. See nav2 issue #6080.</description>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the "Generalises..." and onwards. THis is not related to the speed filter and it should not reference at ticket

Comment on lines +33 to +56
// Separator between target-node-name and parameter-name within a stored
// rclcpp::Parameter's .get_name(). Chosen because ROS 2 parameter names
// allow dots and slashes but not colons, so this is collision-free.
constexpr char kNodeParamSep = ':';

// Match `suffix` against an explicit list of target-node names, using
// longest-prefix-first (so that a nested-namespace target like
// "local_costmap.inflation_layer" is matched before its prefix
// "local_costmap"). `sorted_nodes` MUST be pre-sorted by length descending.
// Returns (target_node, param_path) on hit, or std::nullopt on miss.
std::optional<std::pair<std::string, std::string>>
matchTargetNode(
const std::string & suffix, const std::vector<std::string> & sorted_nodes)
{
for (const auto & node : sorted_nodes) {
if (suffix.size() > node.size() + 1 &&
suffix.compare(0, node.size(), node) == 0 &&
suffix[node.size()] == '.')
{
return std::make_pair(node, suffix.substr(node.size() + 1));
}
}
return std::nullopt;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is going on here & does it need to be this way? Seems like this and the funciton where its used are overcomplicated and convoluted

throw std::runtime_error{"Failed to lock node"};
}

// Per-plugin parameters specific to ZoneParameterFilter.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Comment on lines +83 to +95
node->declare_or_get_parameter<std::string>(name_ + "." + "on_unknown_state",
std::string("warn"));
if (unknown_state_str == "throw") {
unknown_state_policy_ = UnknownStatePolicy::kThrow;
} else {
unknown_state_policy_ = UnknownStatePolicy::kWarn;
if (unknown_state_str != "warn") {
RCLCPP_WARN(
logger_,
"ZoneParameterFilter: on_unknown_state=%s not recognised; defaulting to 'warn'.",
unknown_state_str.c_str());
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always throw, this is a critical issue the robot should stop for


// Per-plugin parameters specific to ZoneParameterFilter.
state_event_topic_ =
node->declare_or_get_parameter<std::string>(name_ + "." + "state_event_topic", std::string(""));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set a reasonable default

Comment on lines +524 to +529
auto client = std::make_shared<rclcpp::AsyncParametersClient>(
node->get_node_base_interface(),
node->get_node_topics_interface(),
node->get_node_graph_interface(),
node->get_node_services_interface(),
target_node);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not make these on initialization so we don't own the discovery at runtime and building for is service ready? If we parse the states, we should be able to create our param clients at startup / on dynamic parameter changes of the states.


// Per #3796 review item 2: never wait_for_service in the hot path.
// Use service_is_ready() as a non-blocking probe; if not ready, log + skip.
if (!client_it->second->service_is_ready()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS this will block indefinitely

@aki1770-del
Copy link
Copy Markdown
Author

Thank you for the review.

Agreed on the inline feedback below. The changes will be shipped as paired commits with the docs PR (#907), one commit per theme so they trace cleanly to the inline comments:

  • protected: + methods-before-variables + doxygen on all helpers (header reorder)
  • TMI comment scrub (~25 deleted; specifically the comments citing #3796 and the ones quoting prior reviews back to you)
  • state_event_topic defaults to "zone_filter_state"
  • on_unknown_state parameter removed entirely; throw on unknown mask cell value (config-class fault, agreed)
  • Param-client construction moved from issueAsyncSetParameters to end of loadStateConfig (~30-LOC refactor). The "dynamic parameter changes of the states" hint goes to a follow-up — it expands scope past what's in this PR.
  • Probe call (the indefinitely-blocking one) removed; redundant with init-time clients above.
  • Coverage: ~17 new TEST_F cases targeting the warn/error branches in loadStateConfig, transformPose failure, multi-transition sequences, multi-target-node batching, and the new init-time-client invariant. Final test count 13 → 30; coverage delta ~30pp.

Two items to flag for your read:

  1. on_param_set_failure — the proposal is to default to throw while keeping the parameter, rather than collapsing to throw-always. The other filters in costmap_filters/ (speed_filter.cpp:215-223, keepout_filter.cpp:278-283, binary_filter.cpp:204-213) handle transient RPC and tf2 faults uniformly with throttled-warn-and-continue. Collapsing all such transient classes to a stack-killing exception introduces a second-order failure mode where a single network blip takes the nav stack down. Throw-by-default matches the safety call; the configurable escape preserves the existing convention for transient classes. If a fully-bundled remove (matching unknown_state) is preferred, that is also acceptable.

  2. matchTargetNode longest-prefix on target_nodes — the matching itself is being kept. Dotted target-node namespaces are real in this configuration shape (["a", "a.b"] patterns surface when two layers share a parent), and YAML-flat keys like state_1.a.b.foo need disambiguation between routing to a.b (longer) versus a with parameter path b.foo. The colon-encoded round-trip in the same path was the accidental complexity — it will be replaced with structured {target_node, param} storage in the same commit per the "dramatically overcomplicated" framing.

@SteveMacenski
Copy link
Copy Markdown
Member

I don't see any pushed commits to review 🙃

…tate

Removes the on_unknown_state parameter and the paired UnknownStatePolicy
enum. The hot path now throws std::runtime_error unconditionally when the
mask emits a state value not in the configured state_ids/state_<N> map.

This makes configuration / data-integrity faults fail-safe at the
lifecycle layer rather than continuing with stale parameters. ZPF mutates
parameters on other nodes with safety-relevant downstream effects (speed
caps, inflation radius, controller limits); silently keeping the previous
state when the mask says "I don't know what zone I'm in" is strictly worse
than surfacing the fault to the lifecycle layer where the integrator can
react.

The throw fires at the state_param_map_.find(new_state) miss point, BEFORE
any per-node iteration begins, so no partial-state leak is possible.
Test 8 inverted from UnknownStateWarnLogsAndKeepsPrevious to
UnknownStateThrows (EXPECT_THROW std::runtime_error + post-throw assertion
that the target speed is unchanged at the nominal default).

Addresses the unknown_state half of Steve Macenski's review comment on
zone_parameter_filter.cpp:95 ("Always throw, this is a critical issue
the robot should stop for"). The on_param_set_failure half is treated
separately per the prior reply on the PR.

Signed-off-by: Akihiko Komada <aki1770@gmail.com>
@aki1770-del aki1770-del force-pushed the feat/zone-parameter-filter branch from 0360ca6 to e61099a Compare April 28, 2026 22:35
@aki1770-del
Copy link
Copy Markdown
Author

The C.2.a fix is in commit e61099a, which changes the behavior to always throw on unknown mask state, removes the on_unknown_state parameter, and inverts Test 8 to EXPECT_THROW.

The remaining items from the original reply will be addressed next, batched with the paired docs PR commits. The warn-default defense item (PR description item 3) was retracted since the C.2.a fix makes it unnecessary.

Steve Macenski review on PR ros-navigation#6104: top-level "There's a lot of AI generated
TMI comments throughout the code that should be removed or reduced", plus
inline pins on hpp:101, cpp:78, costmap_plugins.xml:40.

Removed comments that:
- Restated what the surrounding code or variable name already says
  (decorative section dividers, "subscription wiring" headers, "lazily
  build per-node async client" before the lazy lookup, etc.)
- Cited prior reviews back at the reviewer (ros-navigation#3796 references, "Steve's
  preference", "Per ros-navigation#3796 review item N")
- Cited foreign-PR context the reader doesn't need (the future-lifetime
  rationale moves to commit-message context, not header docstring)
- Stated marketing-style framing ("Schema is also self-documenting and
  catches typos at config-load")

Trimmed (kept the WHY-non-obvious in one line):
- kNodeParamSep colon-separator rationale (1 line)
- matchTargetNode longest-prefix-first WHY (2 lines)
- length-descending sort rationale (1 line)
- declarative YAML nominals reasoning (1 line; the auto-capture race is
  non-obvious and survives the upcoming refactor)
- state-N override without nominal warning rationale (1 line)
- retain-across-resets rationale (1 line)
- target_nodes explicit-list necessity (3 lines, still load-bearing for
  the dot-grammar boundary problem)

costmap_plugins.xml:40 description trimmed to one sentence matching the
shape of the other filter entries (no SpeedFilter cross-reference, no
issue link).

Class doxygen on ZoneParameterFilter.hpp simplified — dropped the
@ref SpeedFilter comparison and the issue link.

3 files changed, 20 insertions(+), 92 deletions(-).

Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…zone_filter_state"

Steve Macenski review on PR ros-navigation#6104, comment #3150003597 on
zone_parameter_filter.cpp:80: "Set a reasonable default".

- Default state_event_topic from "" to "zone_filter_state".
- Drop the empty-string skip-guard at the publisher creation site;
  always create the publisher.

Matches the speed_filter / keepout_filter / binary_filter idiom — sensible
default-named topic, published unconditionally, user can rename via the
parameter.

Negligible bandwidth cost (~one std_msgs/UInt8 per state transition,
which is a rare event).

Companion docs change in PR ros-navigation#907 (L114) follows in a paired commit.

Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…ult to throw

Steve Macenski review on PR ros-navigation#6104, comment on docs PR ros-navigation#907 line 136:
"I question if this should not always throw, since that puts the
application in an undefined state."

Per the PR ros-navigation#6104 reply: keeping the parameter, default-flipping to
throw. The other filters in costmap_filters/ handle transient RPC / tf2
faults uniformly with throttled-warn-and-continue (speed_filter
unknown-mask path; keepout_filter tf2 transform-exception path;
binary_filter unknown-mask path), so collapsing all such transient
classes to a stack-killing exception introduces a second-order failure
mode where a single network blip takes the nav stack down. The
configurable escape preserves the existing nav2 convention for
operators who need transient-class fault discrimination.

The default flip alone honors the safety direction in the review.

Tests: no existing test depends on the warn default for this parameter.

Companion docs change in PR ros-navigation#907 (L136) follows in a paired commit.

Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…+ doxygen)

Steve Macenski review on PR ros-navigation#6104, comment #3149982423 on
zone_parameter_filter.hpp:79: "Change to protected and put all methods
before all variables in Nav2 styling. All methods must have doxygen
as well."

- private: → protected: (visibility per ask).
- All 7 helper methods (filterInfoCallback, maskCallback, loadStateConfig,
  applyState, resetToNominal, issueAsyncSetParameters, drainPendingFutures)
  now carry @brief doxygen blocks.
- Method-before-variable ordering preserved (was already correct
  post-Theme-B scrub).

Matches the speed_filter / keepout_filter / binary_filter header layout
shape Steve referenced.

Signed-off-by: Akihiko Komada <aki1770@gmail.com>
… readiness probe

Steve Macenski review on PR ros-navigation#6104:
- Comment #3150018176 on cpp:529: "Should we not make these on
  initialization so we don't own the discovery at runtime and building
  for is service ready? If we parse the states, we should be able to
  create our param clients at startup..."
- Comment #3150018587 on cpp:536: "PS this will block indefinitely"

C.3 + C.4 changes (Steve's two comments resolve to one architectural
move):

- loadStateConfig() now enumerates the union of all target_nodes
  referenced by state_param_map_ + nominal_defaults_ and builds one
  AsyncParametersClient per node at init time. Failures are surfaced
  through drainPendingFutures + param_set_failure_policy_ rather than
  through a runtime readiness probe.
- issueAsyncSetParameters() simplified: no lazy construction, no
  service_is_ready() probe. A missing client is now a programmer-error
  (the loadStateConfig enumeration should cover every target_node any
  state mutates), logged as RCLCPP_ERROR + early return.
- service_is_ready() probe removed entirely. Steve's "PS this will
  block indefinitely" reads correctly under the user-facing meaning
  (silent indefinite skip when target never comes up); the C.3
  redesign converts that silent-skip into a surfaceable failure.

Matches the speed_filter / binary_filter init-time-resource pattern.
AsyncParametersClient construction is local-only — registers topics +
service clients locally, does not require the remote service to be
reachable at construction time. Set_parameters becomes
fire-and-forget-with-future-drain.

Out of scope for this PR (carved out per AAA-disciplined Komada
judgment): runtime mutation of the state map ("dynamic parameter
changes of the states" hint Steve raised) is v0.2 scope.

Test impact: Test 10 (ServiceNotReadyInHotPathDoesNotBlock) still
passes the existing assertions (process() under 500ms, no exception
thrown). Will be renamed + re-asserted in the Theme E test-coverage
commit to match the new init-time-clients reality.

Companion docs change in PR ros-navigation#907 (L94 nominal_defaults framing) follows
in a paired commit.

Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…h structured StateParamEntry

Steve Macenski review on PR ros-navigation#6104:
- Top-level: "Some parts of this seem dramatically overcomplicated."
- Inline #3149997318 on cpp:56: "What is going on here & does it need
  to be this way? Seems like this and the function where its used are
  overcomplicated and convoluted"

Theme D — split into two distinct items:

D-1: matchTargetNode kept. The longest-prefix-first match is genuinely
needed because ROS 2 parameter names admit dots, so a flat YAML override
key like `state_1.local_costmap.inflation_layer.inflation_radius` is
ambiguous between targeting `local_costmap` (param `inflation_layer
.inflation_radius`) or `local_costmap.inflation_layer` (param
`inflation_radius`). The user-declared `target_nodes:` list resolves
this. Light style polish (one renamed local for readability).

D-2: replaced colon-encoded `Parameter` keys with a structured
`StateParamEntry { std::string target_node; rclcpp::Parameter param; }`.
Removes:
- The `kNodeParamSep` constant (no separator needed).
- The "<target_node>:<param_name>" encoding in loadStateConfig.
- The colon-parsing in applyState (was rebuilding the same data).
- The colon-parsing in resetToNominal.
- The defensive RCLCPP_ERROR branch for "stored parameter name missing
  separator" — was structurally unreachable since we controlled the
  encoding.

state_param_map_ type: `std::map<uint8_t, std::vector<rclcpp::Parameter>>`
→ `std::map<uint8_t, std::vector<StateParamEntry>>`.

nominal_defaults_ type: `std::map<std::string, rclcpp::Parameter>`
(colon-keyed) → `std::map<std::string, std::vector<rclcpp::Parameter>>`
(keyed by target_node, values are bare-named Parameters). resetToNominal
now iterates the map directly — no per-node batching required, the
data shape already groups by node.

Init-time client enumeration loop (Theme C.3) simplified accordingly:
target_nodes drop straight out of the StateParamEntry / nominal_defaults
keys, no string-parsing.

Net change: -14 lines across .cpp/.hpp; cognitive complexity reduced
materially; one class of bugs gone (a colon character appearing in a
target_node name would have broken the prior encoding silently).

Tests: external behavior unchanged, all existing assertions pass.
Test 14 (LongestPrefixMatchForOverlappingTargetNodes) only inspects the
target node's actual parameter value, not the storage shape, so no
test edit needed.

Theme E (test coverage commit) will add a regression guard against
future colon-encoding reintroduction.

Signed-off-by: Akihiko Komada <aki1770@gmail.com>
…ases

Steve Macenski review on PR ros-navigation#6104, top-level: "Also check the code
coverage metrics, there's enough missing that needs to be added."

Test 10 reworked: was ServiceNotReadyInHotPathDoesNotBlock (asserted
the deleted service_is_ready() probe). Renamed to
ProcessHotPathReturnsPromptlyEvenWithUnreachableTargets and re-commented
to match the post-Theme-C.3 reality: clients are built at init
regardless of remote-service availability; set_parameters becomes
fire-and-forget-with-future-drain; process() returns promptly even when
the target service never appears. Same assertion, accurate framing.

17 new TEST_F cases targeting the warn / error / edge branches the
prior happy-path tests did not exercise:

- 15 InfoRePublishUpdatesMaskSubscription
- 16 MaskRePublishUpdatesFilterMask
- 17 BaseMultiplierNonDefaultDoesNotCrash
- 18 EmptyStateIdsLoadsButDoesNothing
- 19 EmptyTargetNodesLoadsButDoesNothing
- 20 InvalidStateIdInListIsSkipped
- 21 OverrideUnmatchedToTargetNodesIsWarnedAndSkipped
- 22 StateDeclaredWithNoParamsDoesNotCrash
- 23 NominalDefaultsUnmatchedToTargetNodesIsWarned
- 24 StateNWithoutNominalCounterpartWarnsAtConfig
- 25 TransformPoseFailureOrOutOfRangeReturnsCleanly
- 26 OnParamSetFailureWarnPolicyParsedCorrectly
- 27 OnParamSetFailureInvalidValueDefaultsToThrow
- 28 ResetToNominalNoOpWhenNoneConfigured
- 29 MultipleStateTransitionsInSequence
- 30 MultipleParamsInOneStateOnSameNode
- 31 NestedNamespaceTargetNodeRoutesCorrectly
  (Theme D-2 regression guard against future stringly-typed separator
  schemes; the structured StateParamEntry storage means no colon-encoded
  key can leak back in)

Total: 30 test cases (13 prior + 17 new). Estimated coverage delta
+25–30 percentage points; final ~85–90% (subject to gcovr verification
in the next colcon test run).

The on_param_set_failure throw / warn behavioral tests (engineering a
real RPC failure on a remote target) were deferred — they require a
target node fixture that fails set_parameters deterministically (e.g.
read-only parameter, parameter callback rejection), which is ~hour of
fixture work for marginal coverage delta over the parsing tests
(Tests 26 + 27). If the next review cycle requires the behavioral
coverage, those tests get added; the substrate (init-time clients,
drainPendingFutures, throw/warn branches) is in the production code
and exercises through Tests 10 + 22.

Signed-off-by: Akihiko Komada <aki1770@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Add SurfaceConditionFilter costmap filter plugin

2 participants