Cortex-M backend: Add AoT scratch-buffer planning. by Erik-Lundell · Pull Request #19636 · pytorch/executorch

Erik-Lundell · 2026-05-18T12:06:17Z

This is done for conv, depthwise conv, transpose conv, and bmm.

Add scratch tensors to the operator signatures, which are then
assigned exir.memory.alloc. These allocs are automatically memory
planned by ExecuTorch.

Introduce required_cmsis_buffer_sizewhich computes the buffer
size from node properties + the Cortex-M configuration.
The function uses functions registered by target in
backends/cortex_m/passes/scratch_buffer_sizes.py
This is used to set the size of the allocs in ConvertToCortexMPass

Finally, modify the kernels to use the new scratch tensor instead
of allocating temporary memory. Add a new macro CORTEX_M_ENABLE_ASSERT
to do a safety check that the aot computed buffer size is equal to the
buffer size computed at runtime. Use this when testing.

cc @psiddh @AdrianLundell @digantdesai @rascani @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

Add scratch tensors to the operator signatures, which are then assigned exir.memory.alloc. These allocs are automatically memory planned by ExecuTorch. Introduce `required_cmsis_buffer_size`which computes the buffer size from node properties + the Cortex-M configuration. The function uses functions registered by target in backends/cortex_m/passes/scratch_buffer_sizes.py This is used to set the size of the allocs in ConvertToCortexMPass Finally, modify the kernels to use the new scratch tensor instead of allocating temporary memory. Add a new macro CORTEX_M_ENABLE_ASSERT to do a safety check that the aot computed buffer size is equal to the buffer size computed at runtime. Use this when testing. Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: Ia7ec8eda87833888a0639b480e531fd17818298a

Follow the plan from previous buffer planning work. Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: I4bf3ca1cc421421b61903cba24856d0fd635d64a

We can now reduce the memory size to 0 when building the cortex_m test runner. Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: Ieb1292c2db4651cd1f0756aa9d43ecedd5e262e5

pytorch-bot · 2026-05-18T12:06:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19636

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Run pull request jobs on OSDC runners in shadow mode

❌ 2 New Failures

As of commit 264fc9e with merge base 824cbff ():

NEW FAILURES - The following jobs have failed:

pull / unittest-editable / macos / macos-job (gh)
backends/xnnpack/test/ops/test_conv2d.py::TestConv2d::test_fp16_conv2d
trunk / test-arm-backend-ethos-u (test_pytest_ops_ethos_u85) / linux-job (gh)
RuntimeError: Command docker exec -t 201be11cdc139a715560669a4fa0871f1d3d6b189264a4e9a45dfb18010cf192 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Erik-Lundell · 2026-05-18T12:09:35Z

This is a polished version of #16580

rascani · 2026-05-18T16:53:42Z

+#ifdef CORTEX_M_ENABLE_ASSERTS
+  const int32_t runtime_buffer_bytes =
+      arm_fully_connected_s8_get_buffer_size(&out_dims);
+  if (ctx.size != static_cast<size_t>(runtime_buffer_bytes)) {


WDYT about doing ctx.size < runtime_buffer_bytes here? Essentially, if we have more memory than what we actually need, do we still want to error out?

To me, this is a correctness assert. We don't just want to avoid failure, we want to make sure to ensure correctness.

rascani · 2026-05-18T16:57:07Z

+    ctx.buf = scratch.mutable_data_ptr<int8_t>();
+  }
+
+#ifdef CORTEX_M_ENABLE_ASSERTS


As much as I want to limit the runtime checks, I think it'd be good to have this check always on and non-optional. Without this, we could wind up writing past end of buffers.

Also, naming nit: technically this is not an assert, as it should not crash the program if it fails. Maybe ENABLE_RUNTIME_CHECKS?

The idea is that we should be confident that we are doing the correct allocation after testing. Users can turn this on to verify for example that they have not mixed up cmsis_nn versions, but then skip it in production. That's also why I want it to be a crash. If there is a mismatch here, I want to enforce a fix. Also, when we have this flag available, we can use it in more places.

rascani · 2026-05-18T17:01:01Z

+        int(
+            cmsis_nn.convolve_wrapper_buffer_size(
+                backend,
+                cmsis_nn.DataType.A8W8,


Probably be good to make data type a parameter for all of these.

Can we keep that for a later patch, when the backend starts supporting more than int8?

rascani · 2026-05-18T19:42:02Z

            )

+        with node.graph.inserting_before(node):
+            scratch = node.graph.call_function(


exir.passes.make_alloc_node is the canonical helper for emitting memory.alloc nodes. It's used by to_out_variant and to_scratch_op_pass use internally, and it sets meta["val"] and meta["tensor_meta"] on the alloc. The memory planner keys off meta["val"] / meta["tensor_meta"] to build the TensorSpec that drives lifetime analysis and arena placement.

Thanks, I'll use that

rascani · 2026-05-18T19:49:42Z

            )
            return exir_ops.edge.cortex_m.quantized_conv2d.default, new_args

+    def _set_scratch_buffer_size(self, node: torch.fx.Node) -> None:


IIUC, the current flow creates each scratch alloc with a placeholder shape, wires it into the cortex_m op's args, and then walks back through node.args[-(i+1)] in _set_scratch_buffer_size to mutate the alloc's shape once the size is known. This works, but the two-phase flow has a few drawbacks:

The alloc -> size pairing is positional and implicit, split across two files. Adding a non-scratch trailing arg to any op signature silently mis-pairs.

The UNINITIALIZED_ALLOC_ARGS sentinel + later mutation requires a reader to follow two hops to understand the alloc's actual size.

_set_scratch_buffer_size exists only to undo the placeholder.

All the values required_cmsis_nn_buffer_sizes needs are already available locally in get*_replacement. If the size functions took inputs directly (e.g. an explicit ConvBufferSizeInputs dataclass), you could compute sizes before constructing the cortex_m op and emit allocs at their final size on one pass.

I see your point, but I would like to keep the current design with the following arguments:

node already perfectly captures the context needed to compute scratch buffer sizes. Introducing a new dataclass interface creates more boilerplate code with the risk of mismatching args.

Importantly, node.meta["val"].shape is needed for conv, which requires either executing the node the calculate it (as is done now), or some duplication of logic to compute the shape from its args.

cortex_m node creation is only done in one place in the pass (L513) , and the initialization happens directly after, so the logic is not too far separated.

There is a check that the trailing args are allocs, so there is no silent mis-pairing. If alloc sizes were given in the wrong order, the runtime check would catch it.

I have pushed a commit to try to clarify the pattern though.

digantdesai · 2026-05-18T20:05:36Z

+
+    return [
+        int(
+            cmsis_nn.convolve_wrapper_buffer_size(


Mainly clarify the uninitialized/intialize alloc pattern. Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: I062a5048094129be6ed8e9f7eafc096f34132b2f

Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: I8da2906a5f4cc69d15d033d8e5d1113d8b4afc4e

mansnils and others added 3 commits May 18, 2026 08:09

Cortex-M backend: Add support for planned bmm scratch size.

936ed4b

Follow the plan from previous buffer planning work. Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: I4bf3ca1cc421421b61903cba24856d0fd635d64a

Cortex-M backend: Add support for planned transposed conv scratch size.

699e731

We can now reduce the memory size to 0 when building the cortex_m test runner. Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: Ieb1292c2db4651cd1f0756aa9d43ecedd5e262e5

Erik-Lundell requested review from digantdesai, kirklandsign, larryliu0820 and rascani as code owners May 18, 2026 12:06

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 18, 2026

github-actions Bot added ciflow/trunk module: arm Issues related to arm backend labels May 18, 2026

Erik-Lundell added the release notes: ops & kernels Changes to the opset and any new / changed kernel implementations label May 18, 2026

rascani reviewed May 18, 2026

View reviewed changes

digantdesai reviewed May 18, 2026

View reviewed changes

Address review comments

23c86c1

Mainly clarify the uninitialized/intialize alloc pattern. Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: I062a5048094129be6ed8e9f7eafc096f34132b2f

github-actions Bot added the module: arm Issues related to arm backend label May 19, 2026

Change helper function names.

264fc9e

Signed-off-by: Erik Lundell <erik.lundell@arm.com> Change-Id: I8da2906a5f4cc69d15d033d8e5d1113d8b4afc4e

Conversation

Erik-Lundell commented May 18, 2026 • edited by pytorch-bot Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19636

❗ 1 Active SEVs

❌ 2 New Failures

Uh oh!

Erik-Lundell commented May 18, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Erik-Lundell commented May 18, 2026 •

edited by pytorch-bot Bot

Loading

pytorch-bot Bot commented May 18, 2026 •

edited

Loading