Skip to content

All: Remove deprecated methods for 1.12.0#16449

Open
dramaticlly wants to merge 14 commits into
apache:mainfrom
dramaticlly:1.12deprecation
Open

All: Remove deprecated methods for 1.12.0#16449
dramaticlly wants to merge 14 commits into
apache:mainfrom
dramaticlly:1.12deprecation

Conversation

@dramaticlly

@dramaticlly dramaticlly commented May 20, 2026

Copy link
Copy Markdown
Contributor

Remove all methods, classes, and fields marked with "@deprecated will be removed in 1.12.0" across all modules, since 1.11.0 has been released.

Deleted Classes

  • SystemProperties → use SystemConfigs
  • PartitionStats → use PartitionStatistics interface / BasePartitionStatistics
  • DataReader → use PlannedDataReader
  • GenericAppenderFactory → use GenericFileWriterFactory
  • BaseFileWriterFactory → use RegistryBasedFileWriterFactory
  • S3SignRequest, S3SignResponse, S3SignRequestParser, S3SignResponseParser, S3ObjectMapper → use RemoteSign* equivalents

Notable Non-Mechanical Changes

Beyond deleting deprecated code, the following adjustments were needed. Items marked ⚠️ reviewer attention depart from a strict deprecation removal and warrant a closer look:

  1. ⚠️ PositionDelete.set(path, pos, row) and row() — Kept without @Deprecated. The public deprecation notice said position deletes with row data are no longer supported, but internal code (BaseTaskWriter, BasePositionDeltaWriter, RewriteTablePathUtil, Avro) still relies on these methods. Reviewer: confirm the methods are safe to retain as package-internal helpers, or whether a follow-up should narrow visibility / push removal to 1.13.0.

  2. BaseScan.io() — Kept as a non-deprecated protected method. Internal subclasses (DataScan, BaseDistributedDataScan) use it; the deprecation targeted external callers only.

  3. ⚠️ HadoopFileIO.serializeConfWith() — Kept without @Deprecated because the HadoopConfigurable interface still declares the method. Also inlined the constructor logic since the removed constructor was the delegation target. Reviewer: the cleaner path is to deprecate the interface method (and the ResolvingFileIO / SerializableFileIOWithSize × 3 implementations) for 1.13.0 removal, then drop the SerializationUtil instanceof HadoopConfigurable branch and the iceberg.mr.config.serialization.disabled opt-in — those are the only remaining consumers after Core: Move Hadoop conf serialization into SerializableConfiguration #15583 made SerializableConfiguration itself SerializableSupplier-compatible. Happy to land this as a follow-up.

  4. BaseParquetReaders.createStructReader(List, StructType, Integer) — Changed from a concrete method (that delegated to the removed 2-param version) to abstract. Both existing subclasses (InternalReader, GenericParquetReaders) already override it.

  5. RewriteTablePathUtil.PositionDeleteReaderWriter.writer() — The 4-arg method was a default delegating to the removed 5-arg version. Made it abstract; updated all implementors (RewriteTablePathSparkAction in v3.5/v4.0/v4.1) to implement the 4-arg signature directly.

  6. ⚠️ S3V4RestSignerClient — Removed deprecated property fallback logic from baseSignerUri(), endpoint(), and check(). The legacy properties (s3.signer.uri, s3.signer.endpoint) are no longer supported, and endpoint() now requires RESTCatalogProperties.SIGNER_ENDPOINT (previously defaulted to "v1/aws/s3/sign"). Reviewer: this is a stricter behavior change than a pure removal — please confirm downstream catalog adapters / integration tests are aware they must now supply RESTCatalogProperties.SIGNER_ENDPOINT explicitly.

  7. Parquet.java — Removed the NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED branch; without the config, always use NameMapping.empty() (the safe default).

  8. MetricsConfig.forPositionDelete(Table) — All callers (FlinkAppenderFactory, RegistryBasedFileWriterFactory) updated to use the no-arg forPositionDelete() since row-level position delete metrics are no longer supported.

9. ⚠️ GenericFileWriterFactory — Re-added writerProperties to the constructor and Builder since it was a public non-deprecated setter used by GenericAppenderHelper and kafka-connect RecordUtils. Reviewer: confirm this re-addition matches the intended public surface (i.e., the field was incidentally lost while removing GenericAppenderFactory and should remain on GenericFileWriterFactory).

// TODO change to required in 1.12.0
if (!properties().containsKey(S3_SIGNER_ENDPOINT)
&& !properties().containsKey(RESTCatalogProperties.SIGNER_ENDPOINT)) {
LOG.warn(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be removed. Instead, this should go through a Preconditions.checkArgument check

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think RESTCatalogProperties.SIGNER_ENDPOINT is replaced with RESTCatalogProperties.SIGNER_URI and we can rely on LINE 206 to ensure its existence. Sorry it's a bit messy right now

@github-actions github-actions Bot removed the ORC label May 23, 2026
@dramaticlly dramaticlly force-pushed the 1.12deprecation branch 2 times, most recently from ab4da8d to aa6430d Compare May 23, 2026 00:19
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dramaticlly and others added 5 commits May 25, 2026 12:04
…DeleteRowSchema

- TestS3FileIOProperties: Add required signer.endpoint property since the
  deprecated default was removed
- TestSparkWriterMetrics: Override checkRowStatistics and
  checkNotExistingRowStatistics to match behavior without
  positionDeleteRowSchema (consistent with Flink tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BaseScan.io() and PositionDelete.set(path, pos, row)/row() still have
callers and cannot be removed yet. Restore their deprecation annotations
to preserve the original removal intent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…io()

The method had only 4 internal callers (DataScan, BaseDistributedDataScan)
which are trivially replaced with table().io(). Also moves revapi entries
to the 1.12.0 section with unified justification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ification

All deprecated API removals for this release should be under the "1.12.0"
version key with justification "Removing deprecated API scheduled for
removal in 1.12.0", sorted alphabetically by class/method name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dramaticlly dramaticlly marked this pull request as ready for review May 26, 2026 06:35
@dramaticlly

dramaticlly commented May 26, 2026

Copy link
Copy Markdown
Contributor Author

@nastra do you want to take another look as now CI is green. I leave the PositionDelete.set(path, pos, row) and row() as is given there's nontrivial amount of caller still use the Deprecated method, plan to handle in a follow up PR

The `specsById` map carried on `BaseScanTaskResponse` and its builder is
parser context, not part of the REST spec for scan-planning response
models. It was deprecated in 1.11.0 (apache#14485) with a note that visibility
would be reduced in 1.12.0.

Demote `specsById()` getter and `withSpecsById()` setter (plus the
builder's `deleteFiles()` getter) from public to protected so the
parsers and subclasses can still use them while keeping the field out of
the public API.

To preserve cross-package construction paths used by `CatalogHandlers`,
`RESTServerCatalogAdapter`, and tests in `org.apache.iceberg.rest`:

- Add `builder(Map<Integer, PartitionSpec> specsById)` factory on each
  of `PlanTableScanResponse`, `FetchPlanningResultResponse`, and
  `FetchScanTasksResponse` for initial server-side construction.
- Add `toBuilder()` on each response for copy-with-modification patterns
  (used when adapters rebuild a response to inject credentials or
  rewrite status).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dramaticlly added a commit to dramaticlly/iceberg that referenced this pull request Jun 25, 2026
…NSAFE branch

The prior 1.11 logic in `Parquet.ReadBuilder` had three branches when picking
the name mapping for a vectorized read:

  1. caller-provided nameMapping → use it
  2. `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED=true` → use null (unsafe
     fallback to position-based field resolution)
  3. default → use `NameMapping.empty()` (safe)

PR apache#16449 removed the deprecated NETFLIX_UNSAFE branch (case 2) but
collapsed the default to `null` rather than `NameMapping.empty()`, making
unsafe position-based resolution the new default. That contradicts the
PR description ("always use NameMapping.empty() (the safe default)") and
silently regresses readers that don't supply an explicit mapping.

Restore the safe default by using `NameMapping.empty()` when no mapping
is provided.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dramaticlly added a commit to dramaticlly/iceberg that referenced this pull request Jun 25, 2026
…NSAFE branch

The prior 1.11 logic in `Parquet.ReadBuilder` had three branches when picking
the name mapping for a vectorized read:

  1. caller-provided nameMapping → use it
  2. `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED=true` → use null (unsafe
     fallback to position-based field resolution)
  3. default → use `NameMapping.empty()` (safe)

PR apache#16449 removed the deprecated NETFLIX_UNSAFE branch (case 2) but
collapsed the default to `null` rather than `NameMapping.empty()`, making
unsafe position-based resolution the new default. That contradicts the
PR description ("always use NameMapping.empty() (the safe default)") and
silently regresses readers that don't supply an explicit mapping.

Restore the safe default by using `NameMapping.empty()` when no mapping
is provided.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…NSAFE branch

The prior 1.11 logic in `Parquet.ReadBuilder` had three branches when picking
the name mapping for a vectorized read:

  1. caller-provided nameMapping → use it
  2. `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED=true` → use null (unsafe
     fallback to position-based field resolution)
  3. default → use `NameMapping.empty()` (safe)

PR apache#16449 removed the deprecated NETFLIX_UNSAFE branch (case 2) but
collapsed the default to `null` rather than `NameMapping.empty()`, making
unsafe position-based resolution the new default. That contradicts the
PR description ("always use NameMapping.empty() (the safe default)") and
silently regresses readers that don't supply an explicit mapping.

Restore the safe default by using `NameMapping.empty()` when no mapping
is provided.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`build.gradle` line 145 sets `oldVersion = "1.11.0"` — revapi compares
the current code against the 1.11.0 released baseline. Section keys
in `.palantir/revapi.yml` correspond to that baseline, so new
breaks accepted during the 1.12.0 dev cycle belong under `"1.11.0"`,
not under a not-yet-released `"1.12.0"` section.

Evidence in-file: the `"1.10.0"` section already contains entries with
justifications like "Removing deprecated code for 1.11.0" — entries added
during the 1.11.0 dev cycle when `oldVersion` was `"1.10.0"`. Same shape
applies now.

This commit:
- Renames the section header `"1.12.0"` -> `"1.11.0"`, merging the new
  entries from PR apache#16449 into the existing 1.11.0 section that was
  already on upstream/main (the partition-stats entries from PR apache#14998).
- Restores the original justifications on the 17 partition-stats entries
  that were rewritten by an earlier fixup commit
  (`fb5ddae9b2 Core: Move partition stats revapi entries to 1.12.0`),
  back to "Removed deprecated functionality for partition stats".
- Drops a duplicate `class org.apache.iceberg.PartitionStats` entry
  that the earlier fixup commit had introduced.

Net effect: the cumulative PR diff against upstream/main is now
additive only (new entries appended to the existing 1.11.0 section,
no churn on existing entries).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nastra nastra requested review from pvary and singhpk234 June 25, 2026 09:24
@pvary

pvary commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

⚠️ PositionDelete.set(path, pos, row) and row() — Kept without @deprecated. The public deprecation notice said position deletes with row data are no longer supported, but internal code (BaseTaskWriter, BasePositionDeltaWriter, RewriteTablePathUtil, Avro) still relies on these methods. Reviewer: confirm the methods are safe to retain as package-internal helpers, or whether a follow-up should narrow visibility / push removal to 1.13.0.

We should remove them. Some of the calls are already with null, so that should not be an issue, the others are not something where we can communicate the deprecation with annotation. That is why we added https://iceberg.apache.org/spec/#position-delete-files-with-row-data.

So I suggest that remove the functionality too - Definitely in another PR.

BaseParquetReaders.createStructReader(List, StructType, Integer) — Changed from a concrete method (that delegated to the removed 2-param version) to abstract. Both existing subclasses (InternalReader, GenericParquetReaders) already override it.

This is ok.

RewriteTablePathUtil.PositionDeleteReaderWriter.writer() — The 4-arg method was a default delegating to the removed 5-arg version. Made it abstract; updated all implementors (RewriteTablePathSparkAction in v3.5/v4.0/v4.1) to implement the 4-arg signature directly.

This is ok.

MetricsConfig.forPositionDelete(Table) — All callers (FlinkAppenderFactory, RegistryBasedFileWriterFactory) updated to use the no-arg forPositionDelete() since row-level position delete metrics are no longer supported.

This is ok.

GenericFileWriterFactory — Re-added writerProperties to the constructor and Builder since it was a public non-deprecated setter used by GenericAppenderHelper and kafka-connect RecordUtils. Reviewer: confirm this re-addition matches the intended public surface (i.e., the field was incidentally lost while removing GenericAppenderFactory and should remain on GenericFileWriterFactory).

I don't see this "re-addition" in the PR, but it should be fine, as long as the positionDeleteRowSchema is not added back

dramaticlly and others added 4 commits June 25, 2026 09:36
After restoring `NameMapping.empty()` as the default in `Parquet.ReadBuilder`
(commit 8881516), two data-module tests fail because they wrote parquet
files via raw `AvroParquetWriter` (no Iceberg field IDs in the file metadata)
and read them back without providing a `NameMapping`, implicitly relying on
the unsafe positional-fallback ID assignment that the
`NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED` config previously enabled by
default. The deprecation message on that config explicitly said
"Fallback ID assignment in Parquet is UNSAFE and will be removed in 1.12.0.
Use name mapping instead."

Honor that guidance:

- `TestGenericData.testTwoLevelList` and
  `TestParquetEncryptionWithWriteSupport.testTwoLevelList` now pass an
  explicit `withNameMapping(MappingUtil.create(schema))` to the reader.
- Add `TestGenericData.testReadWithoutFieldIdsOrNameMappingReturnsNullFields`
  to assert the new strict default behavior: when a parquet file has no
  field IDs and no `NameMapping` is supplied, projected fields read as null
  (rather than being bound by position as before).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itional-fallback

Same root cause as the data-module migrations in ac9ee16: the test
writes a parquet file via raw AvroParquetWriter (no Iceberg field IDs)
and then reads it back without supplying a NameMapping. Pre-1.12 this
worked via the NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED positional
fallback (defaulted to true on upstream/main); after this PR's Parquet
ReadBuilder restores NameMapping.empty() as the default, projected
fields no longer bind by position and the assertion NPEs.

Migrate all three flink versions (v1.20, v2.0, v2.1) to supply
`withNameMapping(MappingUtil.create(schema))`, matching the strict
default and aligning with the deprecation message's guidance to
"Use name mapping instead".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After restoring `NameMapping.empty()` as the safe default in `Parquet.ReadBuilder`
(commit 8881516), Spark tests that import external Parquet files via
`SparkTableUtil.importSparkTable` and then read them through Iceberg fail
field resolution: the imported files have no Iceberg field IDs and no
`schema.name-mapping.default` property is auto-set, so the strict empty
mapping returns null for every projected field.

Mirror the established pattern from `AddFilesProcedure.ensureNameMappingPresent`
and `BaseTableCreationSparkAction`: when the target Iceberg table does not
already have `schema.name-mapping.default` set, auto-derive a name mapping
from the target schema before importing. Subsequent reads via Iceberg pick
up the property automatically through `BaseReader`.

Applied identically to v3.5, v4.0, and v4.1.

This honors the deprecation guidance Russell Spitzer documented when
deprecating the NETFLIX_UNSAFE positional fallback ("Use name mapping
instead") and brings `importSparkTable` in line with `AddFilesProcedure`,
which has been doing this for some time.

Verified on Spark 4.1 via the full `org.apache.iceberg.spark.source.*`
test package (`-DtestParallelism=8`); previously-failing
`TestIcebergSourceHadoopTables.testTableWithInt96Timestamp` and
`TestIdentityPartitionData.testProjections` now pass with no
collateral failures elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ate TestSparkParquetReader fixtures

After restoring `NameMapping.empty()` as the safe default in `Parquet.ReadBuilder`
(commit 8881516), `IcebergGenerics.read(table)` started failing field
resolution on tables whose data files lack Iceberg field IDs. Pre-1.12 the
unsafe positional-fallback path masked this; now the strict default refuses
to bind by position.

`GenericReader` (data module) never consulted `schema.name-mapping.default`
even though every Iceberg table that reads non-Iceberg-written files is
expected to carry it. Spark's `BaseReader` already reads the property
(`BaseReader.java:100`); bring `GenericReader` in line with that pattern so
`IcebergGenerics.read(table)` auto-applies the mapping when present.

Also migrate `TestSparkParquetReader.testInt96TimestampProducedBySparkIsReadCorrectly`
(v3.5 / v4.0 / v4.1) — the test creates a `HadoopTables` table directly
(bypassing `SparkTableUtil.importSparkTable`, so the auto-set in commit
9ebb273 doesn't reach it) and reads files written by raw Spark
`ParquetWriteSupport` without Iceberg field IDs:

- `rowsFromFile`: pass `.withNameMapping(MappingUtil.create(schema))` on
  the direct `Parquet.read(...)` call.
- `tableFromInputFile`: set `schema.name-mapping.default` at table creation
  time so the `GenericReader` change above picks it up automatically.

Verified:
- `:iceberg-data:test` clean
- Targeted `TestSparkParquetReader.testInt96TimestampProducedBySparkIsReadCorrectly`
  passes on v3.5, v4.0, v4.1
- `:iceberg-flink:iceberg-flink-2.1:test` full sweep clean (30m,
  `-DtestParallelism=8`)
- Spotless / checkstyle / revapi clean across all versions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dramaticlly

dramaticlly commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Want to callout the potential behavior change due to removal of the SystemConfigs .NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED config (deprecated since 1.5.0), given #9324 suggested that we shall always use NameMapping.empty() as the the safe default for assigning column IDs by position if a file is missing IDs and there is no name mapping.

Before we have

NameMapping mapping;
if (nameMapping != null) {
mapping = nameMapping;
} else if (SystemConfigs.NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED.value()) {
mapping = null;
} else {
mapping = NameMapping.empty();
}
, after this PR the default change from null to NameMapping.empty()

mapping value Behavior when file has no field IDs
null unsafe positional ID assignment
NameMapping.empty() strict resolution against an empty mapping, projected fields read as null / fail

Test migrations off positional fallback

These tests wrote Parquet files via raw AvroParquetWriter / ParquetWriteSupport (no Iceberg field IDs) and read them back without a NameMapping. They had been passing only because the default for unset NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED was true. With the strict default in place, each is migrated to supply withNameMapping(MappingUtil.create(schema))

  • ac9ee16a64 TestGenericData.testTwoLevelList, TestParquetEncryptionWithWriteSupport.testTwoLevelList + a new TestGenericData.testReadWithoutFieldIdsOrNameMappingReturnsNullFields that asserts the new strict-default behavior (previously no test covered the NameMapping.empty() path that the PR is shipping).
  • 8da506219d TestFlinkParquetReader.testTwoLevelList across v1.20, v2.0, v2.1.
  • 8846661ea7 TestSparkParquetReader.testInt96TimestampProducedBySparkIsReadCorrectly across v3.5, v4.0, v4.1.

Production fixes auto-applying the table's schema.name-mapping.default

While migrating the Spark tests we found two real product gaps where Iceberg APIs were not consulting the well-known DEFAULT_NAME_MAPPING table property even though the comparable APIs already did. Both gaps used to be invisible because the same callers relied on the unsafe positional fallback; with that path gone, the gaps surface as test failures. Each commit brings the API into line with the existing pattern in AddFilesProcedure.ensureNameMappingPresent / BaseReader:

  • 9ebb273258 SparkTableUtil.importSparkTable now auto-sets schema.name-mapping.default on the target table when not already set, matching the long-standing behavior of AddFilesProcedure and BaseTableCreationSparkAction. Without this, every user who imported a non-Iceberg Parquet table needed to manually configure name mapping to read it back.
  • 8846661ea7 GenericReader (used by IcebergGenerics.read(table)) now reads schema.name-mapping.default from the table properties and threads it through ReadBuilder.withNameMapping(...), matching what Spark's BaseReader does. Without this, even tables that do have the property set would still fail to apply it on the generic read path.

Migrate the vectorized dictionary-encoded parquet read test off
unsafe positional-fallback ID assignment by attaching a NameMapping
created from the schema to the Parquet read builder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants