nvme-apple: serialize the controller-global command tag space on t8015#2

Open
yhavry wants to merge 1 commit into HoolockLinux:hoolock from yhavry:A11-NVMe-duplicate-tag-error-for-tag-0-workaround

@yhavry yhavry commented Jun 5, 2026

On A11, ANS2 apparently treats the firmware-visible command tag as
controller-global rather than queue-local. The admin queue uses tag 0
and the IO queue is not given reserved tags, so both can submit tag 0
concurrently, which the controller reports as a duplicate-tag error.

Track active t8015 tags in a controller-wide bitmap:

  • reserve the tag before hardware submission;
  • if it is already active, return BLK_STS_RESOURCE so blk-mq retries
    rather than busy-waiting;
  • release it only after the CQ head has been acknowledged to ANS;
  • clear stale tag state on controller reset.
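
For illustration, here is a minimal userspace model of the steps above. It uses C11 atomics in place of the kernel's test_and_set_bit() and clear_bit(), and every name in it (model_ctrl, model_reserve_tag, and so on) is hypothetical rather than taken from the driver. The 64-tag limit is an assumption made so the bitmap fits in one word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_TAGS 64u /* assumption: the tag space fits in one 64-bit word */

/* Hypothetical stand-in for the controller-wide state; not the driver's struct. */
struct model_ctrl {
	_Atomic uint64_t active_tags; /* bit n set => tag n is outstanding */
};

/* Reserve a tag before submission; false means some queue already holds it. */
static bool model_reserve_tag(struct model_ctrl *c, unsigned int tag)
{
	assert(tag < MODEL_MAX_TAGS);
	uint64_t bit = UINT64_C(1) << tag;

	/* fetch_or returns the previous value, much like test_and_set_bit(). */
	return !(atomic_fetch_or(&c->active_tags, bit) & bit);
}

/* Release a tag; meant to be called only after the CQ head has been acknowledged. */
static void model_release_tag(struct model_ctrl *c, unsigned int tag)
{
	assert(tag < MODEL_MAX_TAGS);
	atomic_fetch_and(&c->active_tags, ~(UINT64_C(1) << tag));
}

/* Drop all tag state, as on controller reset. */
static void model_reset_tags(struct model_ctrl *c)
{
	atomic_store(&c->active_tags, 0);
}
```

A false return from model_reserve_tag() corresponds to the point where the patch returns BLK_STS_RESOURCE, leaving the retry to blk-mq instead of spinning on the bit.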

Signed-off-by: Yuriy Havrylyuk <yhavry@gmail.com>

On t8015 (A11), the admin queue uses command tag 0 and the IO tagset is
given no reserved tags, so the admin and IO queues can have tag 0
outstanding at the same time. When that happens the controller reports a
duplicate-tag error for tag 0.

We have no documentation for ANS2's tag handling on this generation, but
the failure strongly suggests it treats the command tag as a single
controller-wide resource rather than a per-queue one: making the tag
unique across both queues makes the error go away. Enforce that with a
controller-wide bitmap of active tags and:
 - reserve the tag before hardware submission;
 - if it is already active, return BLK_STS_RESOURCE so blk-mq retries
   rather than busy-waiting;
 - release it only after the CQ head has been acknowledged to ANS;
 - clear stale tag state on controller reset.

Signed-off-by: Yuriy Havrylyuk <yhavry@gmail.com>

Signed-off-by: yhavry <34289148+yhavry@users.noreply.github.com>
@asdfugil asdfugil force-pushed the A11-NVMe-duplicate-tag-error-for-tag-0-workaround branch from bc64db6 to da64d58 Compare June 5, 2026 09:53
Comment thread drivers/nvme/host/apple.c
	return BLK_STS_OK;

out_release_tag:
	apple_nvme_t8015_release_cid(anv, cmnd->common.command_id);

No need for a separate label here, since there was only one goto out_free_cmd. Just put this line after the out_free_cmd: label.

Comment thread drivers/nvme/host/apple.c
"NVMMU TCB invalidation failed\n");
}

static bool apple_nvme_t8015_reserve_tag(struct apple_nvme *anv,

For consistency with the existing functions, put _t8015 at the end of the function name instead of in the middle.

Comment thread drivers/nvme/host/apple.c
{
	u16 tag;

	if (anv->hw->has_lsq_nvmmu)

Do not check for anv->hw->has_lsq_nvmmu in functions marked as t8015. Instead check at the call site.
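
A rough sketch of what checking at the call site could look like. It assumes, as the quoted hunk suggests, that t8015 is the hardware without has_lsq_nvmmu. The structures are trimmed-down, hypothetical stand-ins (model_hw, model_anv, model_submit are not driver names), and the bitmap update is deliberately non-atomic to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-ins for the driver's structures. */
struct model_hw {
	bool has_lsq_nvmmu;
};

struct model_anv {
	const struct model_hw *hw;
	unsigned long t8015_active_tags;
};

/* t8015-only helper: no hardware check inside, the caller decides. */
static bool model_reserve_tag_t8015(struct model_anv *anv, unsigned int tag)
{
	assert(tag < 8 * sizeof(anv->t8015_active_tags));
	unsigned long bit = 1UL << tag;

	if (anv->t8015_active_tags & bit)
		return false;
	anv->t8015_active_tags |= bit;
	return true;
}

/* Call site: the hardware flag is tested here. false means "retry later". */
static bool model_submit(struct model_anv *anv, unsigned int tag)
{
	if (!anv->hw->has_lsq_nvmmu && !model_reserve_tag_t8015(anv, tag))
		return false;
	/* ... hardware submission would go here ... */
	return true;
}
```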

Comment thread drivers/nvme/host/apple.c
	return !test_and_set_bit(tag, &anv->t8015_active_tags);
}

static void apple_nvme_t8015_release_cid(struct apple_nvme *anv,

Ditto.

Comment thread drivers/nvme/host/apple.c
{
	u16 tag;

	if (anv->hw->has_lsq_nvmmu)

Ditto.

Comment thread drivers/nvme/host/apple.c

if (!anv->hw->has_lsq_nvmmu) {
	writel(q->cq_head, q->cq_db);
	readl(q->cq_db);

What is the purpose of this readl()?

Comment thread drivers/nvme/host/apple.c
}

-	if (found)
+	if (found && anv->hw->has_lsq_nvmmu)

apple_nvme_poll_cq() may complete multiple commands in one invocation. Is it possible to set a bitfield inside the apple_nvme_cqe_pending() while loop, and only release the tags and write the doorbell according to that bitfield after the loop has completed?
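
One way to picture that suggestion, as a userspace model only. The queue layout, the names (model_queue, model_poll_cq) and the 64-tag limit are all assumptions, the completion array is linear rather than a ring, and whether ANS on t8015 tolerates the delayed doorbell write is not something this sketch can show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model types; none of these names exist in the driver. */
struct model_queue {
	const uint16_t *cqe_tags;     /* tags of pending completions, in CQ order */
	unsigned int pending;         /* entries not yet consumed */
	unsigned int cq_head;         /* software copy of the CQ head (no wraparound) */
	unsigned int doorbell;        /* stands in for the CQ doorbell register */
	unsigned int doorbell_writes; /* counts acknowledgements, for illustration */
};

/*
 * Consume every pending completion, remembering the finished tags in a
 * local mask. The doorbell is written once after the loop, and the tags
 * are released only after that write.
 */
static unsigned int model_poll_cq(struct model_queue *q,
				  _Atomic uint64_t *active_tags)
{
	uint64_t done = 0;
	unsigned int found = 0;

	while (q->pending) {
		uint16_t tag = q->cqe_tags[q->cq_head];

		assert(tag < 64); /* assumption: tags fit in one 64-bit mask */
		done |= UINT64_C(1) << tag;
		q->cq_head++;
		q->pending--;
		found++;
	}

	if (found) {
		q->doorbell = q->cq_head; /* one acknowledgement per batch */
		q->doorbell_writes++;
		atomic_fetch_and(active_tags, ~done); /* release all tags at once */
	}
	return found;
}
```

Because the tags stay marked active until the single doorbell write, a tag cannot be handed out again before its completion has been acknowledged, which keeps the ordering the patch description asks for.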
