All: Remove deprecated methods for 1.12.0 by dramaticlly · Pull Request #16449 · apache/iceberg

dramaticlly · 2026-05-20T18:02:11Z

Remove all methods, classes, and fields marked with "@deprecated will be removed in 1.12.0" across all modules, since 1.11.0 has been released.

Deleted Classes

SystemProperties → use SystemConfigs
PartitionStats → use PartitionStatistics interface / BasePartitionStatistics
DataReader → use PlannedDataReader
GenericAppenderFactory → use GenericFileWriterFactory
BaseFileWriterFactory → use RegistryBasedFileWriterFactory
S3SignRequest, S3SignResponse, S3SignRequestParser, S3SignResponseParser, S3ObjectMapper → use RemoteSign* equivalents

Notable Non-Mechanical Changes

Beyond deleting deprecated code, the following adjustments were needed. Items marked ⚠️ reviewer attention depart from a strict deprecation removal and warrant a closer look:

⚠️ PositionDelete.set(path, pos, row) and row() — Kept without @Deprecated. The public deprecation notice said position deletes with row data are no longer supported, but internal code (BaseTaskWriter, BasePositionDeltaWriter, RewriteTablePathUtil, Avro) still relies on these methods. Reviewer: confirm the methods are safe to retain as package-internal helpers, or whether a follow-up should narrow visibility / push removal to 1.13.0.
BaseScan.io() — Kept as a non-deprecated protected method. Internal subclasses (DataScan, BaseDistributedDataScan) use it; the deprecation targeted external callers only.
⚠️ HadoopFileIO.serializeConfWith() — Kept without @Deprecated because the HadoopConfigurable interface still declares the method. Also inlined the constructor logic since the removed constructor was the delegation target. Reviewer: the cleaner path is to deprecate the interface method (and the ResolvingFileIO / SerializableFileIOWithSize × 3 implementations) for 1.13.0 removal, then drop the SerializationUtil instanceof HadoopConfigurable branch and the iceberg.mr.config.serialization.disabled opt-in — those are the only remaining consumers after Core: Move Hadoop conf serialization into SerializableConfiguration #15583 made SerializableConfiguration itself SerializableSupplier-compatible. Happy to land this as a follow-up.
BaseParquetReaders.createStructReader(List, StructType, Integer) — Changed from a concrete method (that delegated to the removed 2-param version) to abstract. Both existing subclasses (InternalReader, GenericParquetReaders) already override it.
RewriteTablePathUtil.PositionDeleteReaderWriter.writer() — The 4-arg method was a default delegating to the removed 5-arg version. Made it abstract; updated all implementors (RewriteTablePathSparkAction in v3.5/v4.0/v4.1) to implement the 4-arg signature directly.
⚠️ S3V4RestSignerClient — Removed deprecated property fallback logic from baseSignerUri(), endpoint(), and check(). The legacy properties (s3.signer.uri, s3.signer.endpoint) are no longer supported, and endpoint() now requires RESTCatalogProperties.SIGNER_ENDPOINT (previously defaulted to "v1/aws/s3/sign"). Reviewer: this is a stricter behavior change than a pure removal — please confirm downstream catalog adapters / integration tests are aware they must now supply RESTCatalogProperties.SIGNER_ENDPOINT explicitly.
Parquet.java — Removed the NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED branch; without the config, always use NameMapping.empty() (the safe default).
MetricsConfig.forPositionDelete(Table) — All callers (FlinkAppenderFactory, RegistryBasedFileWriterFactory) updated to use the no-arg forPositionDelete() since row-level position delete metrics are no longer supported.

9. ⚠️ GenericFileWriterFactory — Re-added writerProperties to the constructor and Builder since it was a public non-deprecated setter used by GenericAppenderHelper and kafka-connect RecordUtils. Reviewer: confirm this re-addition matches the intended public surface (i.e., the field was incidentally lost while removing GenericAppenderFactory and should remain on GenericFileWriterFactory).

nastra · 2026-05-22T10:20:20Z

-    // TODO change to required in 1.12.0
-    if (!properties().containsKey(S3_SIGNER_ENDPOINT)
-        && !properties().containsKey(RESTCatalogProperties.SIGNER_ENDPOINT)) {
-      LOG.warn(


this shouldn't be removed. Instead, this should go through a Preconditions.checkArgument check

yeah I think RESTCatalogProperties.SIGNER_ENDPOINT is replaced with RESTCatalogProperties.SIGNER_URI and we can rely on LINE 206 to ensure its existence. Sorry it's a bit messy right now

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…DeleteRowSchema - TestS3FileIOProperties: Add required signer.endpoint property since the deprecated default was removed - TestSparkWriterMetrics: Override checkRowStatistics and checkNotExistingRowStatistics to match behavior without positionDeleteRowSchema (consistent with Flink tests) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

BaseScan.io() and PositionDelete.set(path, pos, row)/row() still have callers and cannot be removed yet. Restore their deprecation annotations to preserve the original removal intent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…io() The method had only 4 internal callers (DataScan, BaseDistributedDataScan) which are trivially replaced with table().io(). Also moves revapi entries to the 1.12.0 section with unified justification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ification All deprecated API removals for this release should be under the "1.12.0" version key with justification "Removing deprecated API scheduled for removal in 1.12.0", sorted alphabetically by class/method name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dramaticlly · 2026-05-26T06:37:22Z

@nastra do you want to take another look as now CI is green. I leave the PositionDelete.set(path, pos, row) and row() as is given there's nontrivial amount of caller still use the Deprecated method, plan to handle in a follow up PR

The `specsById` map carried on `BaseScanTaskResponse` and its builder is parser context, not part of the REST spec for scan-planning response models. It was deprecated in 1.11.0 (apache#14485) with a note that visibility would be reduced in 1.12.0. Demote `specsById()` getter and `withSpecsById()` setter (plus the builder's `deleteFiles()` getter) from public to protected so the parsers and subclasses can still use them while keeping the field out of the public API. To preserve cross-package construction paths used by `CatalogHandlers`, `RESTServerCatalogAdapter`, and tests in `org.apache.iceberg.rest`: - Add `builder(Map<Integer, PartitionSpec> specsById)` factory on each of `PlanTableScanResponse`, `FetchPlanningResultResponse`, and `FetchScanTasksResponse` for initial server-side construction. - Add `toBuilder()` on each response for copy-with-modification patterns (used when adapters rebuild a response to inject credentials or rewrite status). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…NSAFE branch The prior 1.11 logic in `Parquet.ReadBuilder` had three branches when picking the name mapping for a vectorized read: 1. caller-provided nameMapping → use it 2. `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED=true` → use null (unsafe fallback to position-based field resolution) 3. default → use `NameMapping.empty()` (safe) PR apache#16449 removed the deprecated NETFLIX_UNSAFE branch (case 2) but collapsed the default to `null` rather than `NameMapping.empty()`, making unsafe position-based resolution the new default. That contradicts the PR description ("always use NameMapping.empty() (the safe default)") and silently regresses readers that don't supply an explicit mapping. Restore the safe default by using `NameMapping.empty()` when no mapping is provided. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`build.gradle` line 145 sets `oldVersion = "1.11.0"` — revapi compares the current code against the 1.11.0 released baseline. Section keys in `.palantir/revapi.yml` correspond to that baseline, so new breaks accepted during the 1.12.0 dev cycle belong under `"1.11.0"`, not under a not-yet-released `"1.12.0"` section. Evidence in-file: the `"1.10.0"` section already contains entries with justifications like "Removing deprecated code for 1.11.0" — entries added during the 1.11.0 dev cycle when `oldVersion` was `"1.10.0"`. Same shape applies now. This commit: - Renames the section header `"1.12.0"` -> `"1.11.0"`, merging the new entries from PR apache#16449 into the existing 1.11.0 section that was already on upstream/main (the partition-stats entries from PR apache#14998). - Restores the original justifications on the 17 partition-stats entries that were rewritten by an earlier fixup commit (`fb5ddae9b2 Core: Move partition stats revapi entries to 1.12.0`), back to "Removed deprecated functionality for partition stats". - Drops a duplicate `class org.apache.iceberg.PartitionStats` entry that the earlier fixup commit had introduced. Net effect: the cumulative PR diff against upstream/main is now additive only (new entries appended to the existing 1.11.0 section, no churn on existing entries). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pvary · 2026-06-25T11:58:06Z

⚠️ PositionDelete.set(path, pos, row) and row() — Kept without @deprecated. The public deprecation notice said position deletes with row data are no longer supported, but internal code (BaseTaskWriter, BasePositionDeltaWriter, RewriteTablePathUtil, Avro) still relies on these methods. Reviewer: confirm the methods are safe to retain as package-internal helpers, or whether a follow-up should narrow visibility / push removal to 1.13.0.

We should remove them. Some of the calls are already with null, so that should not be an issue, the others are not something where we can communicate the deprecation with annotation. That is why we added https://iceberg.apache.org/spec/#position-delete-files-with-row-data.

So I suggest that remove the functionality too - Definitely in another PR.

BaseParquetReaders.createStructReader(List, StructType, Integer) — Changed from a concrete method (that delegated to the removed 2-param version) to abstract. Both existing subclasses (InternalReader, GenericParquetReaders) already override it.

This is ok.

RewriteTablePathUtil.PositionDeleteReaderWriter.writer() — The 4-arg method was a default delegating to the removed 5-arg version. Made it abstract; updated all implementors (RewriteTablePathSparkAction in v3.5/v4.0/v4.1) to implement the 4-arg signature directly.

This is ok.

MetricsConfig.forPositionDelete(Table) — All callers (FlinkAppenderFactory, RegistryBasedFileWriterFactory) updated to use the no-arg forPositionDelete() since row-level position delete metrics are no longer supported.

This is ok.

GenericFileWriterFactory — Re-added writerProperties to the constructor and Builder since it was a public non-deprecated setter used by GenericAppenderHelper and kafka-connect RecordUtils. Reviewer: confirm this re-addition matches the intended public surface (i.e., the field was incidentally lost while removing GenericAppenderFactory and should remain on GenericFileWriterFactory).

I don't see this "re-addition" in the PR, but it should be fine, as long as the positionDeleteRowSchema is not added back

After restoring `NameMapping.empty()` as the default in `Parquet.ReadBuilder` (commit 8881516), two data-module tests fail because they wrote parquet files via raw `AvroParquetWriter` (no Iceberg field IDs in the file metadata) and read them back without providing a `NameMapping`, implicitly relying on the unsafe positional-fallback ID assignment that the `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED` config previously enabled by default. The deprecation message on that config explicitly said "Fallback ID assignment in Parquet is UNSAFE and will be removed in 1.12.0. Use name mapping instead." Honor that guidance: - `TestGenericData.testTwoLevelList` and `TestParquetEncryptionWithWriteSupport.testTwoLevelList` now pass an explicit `withNameMapping(MappingUtil.create(schema))` to the reader. - Add `TestGenericData.testReadWithoutFieldIdsOrNameMappingReturnsNullFields` to assert the new strict default behavior: when a parquet file has no field IDs and no `NameMapping` is supplied, projected fields read as null (rather than being bound by position as before). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…itional-fallback Same root cause as the data-module migrations in ac9ee16: the test writes a parquet file via raw AvroParquetWriter (no Iceberg field IDs) and then reads it back without supplying a NameMapping. Pre-1.12 this worked via the NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED positional fallback (defaulted to true on upstream/main); after this PR's Parquet ReadBuilder restores NameMapping.empty() as the default, projected fields no longer bind by position and the assertion NPEs. Migrate all three flink versions (v1.20, v2.0, v2.1) to supply `withNameMapping(MappingUtil.create(schema))`, matching the strict default and aligning with the deprecation message's guidance to "Use name mapping instead". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After restoring `NameMapping.empty()` as the safe default in `Parquet.ReadBuilder` (commit 8881516), Spark tests that import external Parquet files via `SparkTableUtil.importSparkTable` and then read them through Iceberg fail field resolution: the imported files have no Iceberg field IDs and no `schema.name-mapping.default` property is auto-set, so the strict empty mapping returns null for every projected field. Mirror the established pattern from `AddFilesProcedure.ensureNameMappingPresent` and `BaseTableCreationSparkAction`: when the target Iceberg table does not already have `schema.name-mapping.default` set, auto-derive a name mapping from the target schema before importing. Subsequent reads via Iceberg pick up the property automatically through `BaseReader`. Applied identically to v3.5, v4.0, and v4.1. This honors the deprecation guidance Russell Spitzer documented when deprecating the NETFLIX_UNSAFE positional fallback ("Use name mapping instead") and brings `importSparkTable` in line with `AddFilesProcedure`, which has been doing this for some time. Verified on Spark 4.1 via the full `org.apache.iceberg.spark.source.*` test package (`-DtestParallelism=8`); previously-failing `TestIcebergSourceHadoopTables.testTableWithInt96Timestamp` and `TestIdentityPartitionData.testProjections` now pass with no collateral failures elsewhere. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ate TestSparkParquetReader fixtures After restoring `NameMapping.empty()` as the safe default in `Parquet.ReadBuilder` (commit 8881516), `IcebergGenerics.read(table)` started failing field resolution on tables whose data files lack Iceberg field IDs. Pre-1.12 the unsafe positional-fallback path masked this; now the strict default refuses to bind by position. `GenericReader` (data module) never consulted `schema.name-mapping.default` even though every Iceberg table that reads non-Iceberg-written files is expected to carry it. Spark's `BaseReader` already reads the property (`BaseReader.java:100`); bring `GenericReader` in line with that pattern so `IcebergGenerics.read(table)` auto-applies the mapping when present. Also migrate `TestSparkParquetReader.testInt96TimestampProducedBySparkIsReadCorrectly` (v3.5 / v4.0 / v4.1) — the test creates a `HadoopTables` table directly (bypassing `SparkTableUtil.importSparkTable`, so the auto-set in commit 9ebb273 doesn't reach it) and reads files written by raw Spark `ParquetWriteSupport` without Iceberg field IDs: - `rowsFromFile`: pass `.withNameMapping(MappingUtil.create(schema))` on the direct `Parquet.read(...)` call. - `tableFromInputFile`: set `schema.name-mapping.default` at table creation time so the `GenericReader` change above picks it up automatically. Verified: - `:iceberg-data:test` clean - Targeted `TestSparkParquetReader.testInt96TimestampProducedBySparkIsReadCorrectly` passes on v3.5, v4.0, v4.1 - `:iceberg-flink:iceberg-flink-2.1:test` full sweep clean (30m, `-DtestParallelism=8`) - Spotless / checkstyle / revapi clean across all versions Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dramaticlly · 2026-06-26T05:30:29Z

Want to callout the potential behavior change due to removal of the SystemConfigs .NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED config (deprecated since 1.5.0), given #9324 suggested that we shall always use NameMapping.empty() as the the safe default for assigning column IDs by position if a file is missing IDs and there is no name mapping.

Before we have

iceberg/parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java

Lines 1556 to 1563 in cfebc8e

    
           NameMapping mapping; 
        
           if (nameMapping != null) { 
        
             mapping = nameMapping; 
        
           } else if (SystemConfigs.NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED.value()) { 
        
             mapping = null; 
        
           } else { 
        
             mapping = NameMapping.empty(); 
        
           }

, after this PR the default change from null to NameMapping.empty()

`mapping` value	Behavior when file has no field IDs
`null`	unsafe positional ID assignment
`NameMapping.empty()`	strict resolution against an empty mapping, projected fields read as null / fail

Test migrations off positional fallback

These tests wrote Parquet files via raw AvroParquetWriter / ParquetWriteSupport (no Iceberg field IDs) and read them back without a NameMapping. They had been passing only because the default for unset NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED was true. With the strict default in place, each is migrated to supply withNameMapping(MappingUtil.create(schema))

ac9ee16a64 TestGenericData.testTwoLevelList, TestParquetEncryptionWithWriteSupport.testTwoLevelList + a new TestGenericData.testReadWithoutFieldIdsOrNameMappingReturnsNullFields that asserts the new strict-default behavior (previously no test covered the NameMapping.empty() path that the PR is shipping).
8da506219d TestFlinkParquetReader.testTwoLevelList across v1.20, v2.0, v2.1.
8846661ea7 TestSparkParquetReader.testInt96TimestampProducedBySparkIsReadCorrectly across v3.5, v4.0, v4.1.

Production fixes auto-applying the table's `schema.name-mapping.default`

While migrating the Spark tests we found two real product gaps where Iceberg APIs were not consulting the well-known DEFAULT_NAME_MAPPING table property even though the comparable APIs already did. Both gaps used to be invisible because the same callers relied on the unsafe positional fallback; with that path gone, the gaps surface as test failures. Each commit brings the API into line with the existing pattern in AddFilesProcedure.ensureNameMappingPresent / BaseReader:

9ebb273258 SparkTableUtil.importSparkTable now auto-sets schema.name-mapping.default on the target table when not already set, matching the long-standing behavior of AddFilesProcedure and BaseTableCreationSparkAction. Without this, every user who imported a non-Iceberg Parquet table needed to manually configure name mapping to read it back.
8846661ea7 GenericReader (used by IcebergGenerics.read(table)) now reads schema.name-mapping.default from the table properties and threads it through ReadBuilder.withNameMapping(...), matching what Spark's BaseReader does. Without this, even tables that do have the property set would still fail to apply it on the generic read path.

Migrate the vectorized dictionary-encoded parquet read test off unsafe positional-fallback ID assignment by attaching a NameMapping created from the schema to the Parquet read builder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot added spark parquet core data flink ORC AWS GCP KAFKACONNECT labels May 20, 2026

nastra reviewed May 22, 2026

View reviewed changes

dramaticlly force-pushed the 1.12deprecation branch from cadfa6b to d62346b Compare May 23, 2026 00:04

github-actions Bot removed the ORC label May 23, 2026

dramaticlly force-pushed the 1.12deprecation branch 2 times, most recently from ab4da8d to aa6430d Compare May 23, 2026 00:19

All: Remove deprecated methods for 1.12.0

0436357

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dramaticlly force-pushed the 1.12deprecation branch from aa6430d to 0436357 Compare May 25, 2026 17:08

dramaticlly and others added 5 commits May 25, 2026 12:04

AWS: Fix testApplySignerConfiguration missing required signer.endpoint

d2a0084

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dramaticlly marked this pull request as ready for review May 26, 2026 06:35

gaborkaszab mentioned this pull request May 26, 2026

Core: Remove deprecated APIs for 1.12.0 #16452

Open

5 tasks

nssalian added this to the Iceberg 1.12.0 milestone Jun 13, 2026

manuzhang mentioned this pull request Jun 20, 2026

Remove deprecated methods for 1.12.0 #16445

Open

manuzhang requested a review from nastra June 20, 2026 12:25

github-actions Bot added the OPENAPI label Jun 25, 2026

dramaticlly force-pushed the 1.12deprecation branch from 73bda60 to 13b5244 Compare June 25, 2026 07:31

dramaticlly force-pushed the 1.12deprecation branch from 13b5244 to 8881516 Compare June 25, 2026 07:32

nastra requested review from pvary and singhpk234 June 25, 2026 09:24

dramaticlly and others added 4 commits June 25, 2026 09:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

All: Remove deprecated methods for 1.12.0#16449

All: Remove deprecated methods for 1.12.0#16449
dramaticlly wants to merge 14 commits into
apache:mainfrom
dramaticlly:1.12deprecation

dramaticlly commented May 20, 2026 •

edited

Loading

Uh oh!

nastra May 22, 2026

Uh oh!

dramaticlly May 22, 2026

Uh oh!

dramaticlly commented May 26, 2026 •

edited

Loading

Uh oh!

pvary commented Jun 25, 2026

Uh oh!

dramaticlly commented Jun 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

dramaticlly commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deleted Classes

Notable Non-Mechanical Changes

Uh oh!

nastra May 22, 2026

Choose a reason for hiding this comment

Uh oh!

dramaticlly May 22, 2026

Choose a reason for hiding this comment

Uh oh!

dramaticlly commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pvary commented Jun 25, 2026

Uh oh!

dramaticlly commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test migrations off positional fallback

Production fixes auto-applying the table's schema.name-mapping.default

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dramaticlly commented May 20, 2026 •

edited

Loading

dramaticlly commented May 26, 2026 •

edited

Loading

dramaticlly commented Jun 26, 2026 •

edited

Loading

Production fixes auto-applying the table's `schema.name-mapping.default`