All: Remove deprecated methods for 1.12.0#16449
Conversation
| // TODO change to required in 1.12.0 | ||
| if (!properties().containsKey(S3_SIGNER_ENDPOINT) | ||
| && !properties().containsKey(RESTCatalogProperties.SIGNER_ENDPOINT)) { | ||
| LOG.warn( |
There was a problem hiding this comment.
this shouldn't be removed. Instead, this should go through a Preconditions.checkArgument check
There was a problem hiding this comment.
yeah I think RESTCatalogProperties.SIGNER_ENDPOINT is replaced with RESTCatalogProperties.SIGNER_URI and we can rely on LINE 206 to ensure its existence. Sorry it's a bit messy right now
cadfa6b to
d62346b
Compare
ab4da8d to
aa6430d
Compare
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
aa6430d to
0436357
Compare
…DeleteRowSchema - TestS3FileIOProperties: Add required signer.endpoint property since the deprecated default was removed - TestSparkWriterMetrics: Override checkRowStatistics and checkNotExistingRowStatistics to match behavior without positionDeleteRowSchema (consistent with Flink tests) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BaseScan.io() and PositionDelete.set(path, pos, row)/row() still have callers and cannot be removed yet. Restore their deprecation annotations to preserve the original removal intent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…io() The method had only 4 internal callers (DataScan, BaseDistributedDataScan) which are trivially replaced with table().io(). Also moves revapi entries to the 1.12.0 section with unified justification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ification All deprecated API removals for this release should be under the "1.12.0" version key with justification "Removing deprecated API scheduled for removal in 1.12.0", sorted alphabetically by class/method name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@nastra do you want to take another look as now CI is green. I leave the |
The `specsById` map carried on `BaseScanTaskResponse` and its builder is parser context, not part of the REST spec for scan-planning response models. It was deprecated in 1.11.0 (apache#14485) with a note that visibility would be reduced in 1.12.0. Demote `specsById()` getter and `withSpecsById()` setter (plus the builder's `deleteFiles()` getter) from public to protected so the parsers and subclasses can still use them while keeping the field out of the public API. To preserve cross-package construction paths used by `CatalogHandlers`, `RESTServerCatalogAdapter`, and tests in `org.apache.iceberg.rest`: - Add `builder(Map<Integer, PartitionSpec> specsById)` factory on each of `PlanTableScanResponse`, `FetchPlanningResultResponse`, and `FetchScanTasksResponse` for initial server-side construction. - Add `toBuilder()` on each response for copy-with-modification patterns (used when adapters rebuild a response to inject credentials or rewrite status). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…NSAFE branch
The prior 1.11 logic in `Parquet.ReadBuilder` had three branches when picking
the name mapping for a vectorized read:
1. caller-provided nameMapping → use it
2. `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED=true` → use null (unsafe
fallback to position-based field resolution)
3. default → use `NameMapping.empty()` (safe)
PR apache#16449 removed the deprecated NETFLIX_UNSAFE branch (case 2) but
collapsed the default to `null` rather than `NameMapping.empty()`, making
unsafe position-based resolution the new default. That contradicts the
PR description ("always use NameMapping.empty() (the safe default)") and
silently regresses readers that don't supply an explicit mapping.
Restore the safe default by using `NameMapping.empty()` when no mapping
is provided.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…NSAFE branch
The prior 1.11 logic in `Parquet.ReadBuilder` had three branches when picking
the name mapping for a vectorized read:
1. caller-provided nameMapping → use it
2. `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED=true` → use null (unsafe
fallback to position-based field resolution)
3. default → use `NameMapping.empty()` (safe)
PR apache#16449 removed the deprecated NETFLIX_UNSAFE branch (case 2) but
collapsed the default to `null` rather than `NameMapping.empty()`, making
unsafe position-based resolution the new default. That contradicts the
PR description ("always use NameMapping.empty() (the safe default)") and
silently regresses readers that don't supply an explicit mapping.
Restore the safe default by using `NameMapping.empty()` when no mapping
is provided.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
73bda60 to
13b5244
Compare
…NSAFE branch
The prior 1.11 logic in `Parquet.ReadBuilder` had three branches when picking
the name mapping for a vectorized read:
1. caller-provided nameMapping → use it
2. `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED=true` → use null (unsafe
fallback to position-based field resolution)
3. default → use `NameMapping.empty()` (safe)
PR apache#16449 removed the deprecated NETFLIX_UNSAFE branch (case 2) but
collapsed the default to `null` rather than `NameMapping.empty()`, making
unsafe position-based resolution the new default. That contradicts the
PR description ("always use NameMapping.empty() (the safe default)") and
silently regresses readers that don't supply an explicit mapping.
Restore the safe default by using `NameMapping.empty()` when no mapping
is provided.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13b5244 to
8881516
Compare
`build.gradle` line 145 sets `oldVersion = "1.11.0"` — revapi compares the current code against the 1.11.0 released baseline. Section keys in `.palantir/revapi.yml` correspond to that baseline, so new breaks accepted during the 1.12.0 dev cycle belong under `"1.11.0"`, not under a not-yet-released `"1.12.0"` section. Evidence in-file: the `"1.10.0"` section already contains entries with justifications like "Removing deprecated code for 1.11.0" — entries added during the 1.11.0 dev cycle when `oldVersion` was `"1.10.0"`. Same shape applies now. This commit: - Renames the section header `"1.12.0"` -> `"1.11.0"`, merging the new entries from PR apache#16449 into the existing 1.11.0 section that was already on upstream/main (the partition-stats entries from PR apache#14998). - Restores the original justifications on the 17 partition-stats entries that were rewritten by an earlier fixup commit (`fb5ddae9b2 Core: Move partition stats revapi entries to 1.12.0`), back to "Removed deprecated functionality for partition stats". - Drops a duplicate `class org.apache.iceberg.PartitionStats` entry that the earlier fixup commit had introduced. Net effect: the cumulative PR diff against upstream/main is now additive only (new entries appended to the existing 1.11.0 section, no churn on existing entries). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
We should remove them. Some of the calls are already with So I suggest that remove the functionality too - Definitely in another PR.
This is ok.
This is ok.
This is ok.
I don't see this "re-addition" in the PR, but it should be fine, as long as the |
After restoring `NameMapping.empty()` as the default in `Parquet.ReadBuilder` (commit 8881516), two data-module tests fail because they wrote parquet files via raw `AvroParquetWriter` (no Iceberg field IDs in the file metadata) and read them back without providing a `NameMapping`, implicitly relying on the unsafe positional-fallback ID assignment that the `NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED` config previously enabled by default. The deprecation message on that config explicitly said "Fallback ID assignment in Parquet is UNSAFE and will be removed in 1.12.0. Use name mapping instead." Honor that guidance: - `TestGenericData.testTwoLevelList` and `TestParquetEncryptionWithWriteSupport.testTwoLevelList` now pass an explicit `withNameMapping(MappingUtil.create(schema))` to the reader. - Add `TestGenericData.testReadWithoutFieldIdsOrNameMappingReturnsNullFields` to assert the new strict default behavior: when a parquet file has no field IDs and no `NameMapping` is supplied, projected fields read as null (rather than being bound by position as before). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itional-fallback Same root cause as the data-module migrations in ac9ee16: the test writes a parquet file via raw AvroParquetWriter (no Iceberg field IDs) and then reads it back without supplying a NameMapping. Pre-1.12 this worked via the NETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLED positional fallback (defaulted to true on upstream/main); after this PR's Parquet ReadBuilder restores NameMapping.empty() as the default, projected fields no longer bind by position and the assertion NPEs. Migrate all three flink versions (v1.20, v2.0, v2.1) to supply `withNameMapping(MappingUtil.create(schema))`, matching the strict default and aligning with the deprecation message's guidance to "Use name mapping instead". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After restoring `NameMapping.empty()` as the safe default in `Parquet.ReadBuilder` (commit 8881516), Spark tests that import external Parquet files via `SparkTableUtil.importSparkTable` and then read them through Iceberg fail field resolution: the imported files have no Iceberg field IDs and no `schema.name-mapping.default` property is auto-set, so the strict empty mapping returns null for every projected field. Mirror the established pattern from `AddFilesProcedure.ensureNameMappingPresent` and `BaseTableCreationSparkAction`: when the target Iceberg table does not already have `schema.name-mapping.default` set, auto-derive a name mapping from the target schema before importing. Subsequent reads via Iceberg pick up the property automatically through `BaseReader`. Applied identically to v3.5, v4.0, and v4.1. This honors the deprecation guidance Russell Spitzer documented when deprecating the NETFLIX_UNSAFE positional fallback ("Use name mapping instead") and brings `importSparkTable` in line with `AddFilesProcedure`, which has been doing this for some time. Verified on Spark 4.1 via the full `org.apache.iceberg.spark.source.*` test package (`-DtestParallelism=8`); previously-failing `TestIcebergSourceHadoopTables.testTableWithInt96Timestamp` and `TestIdentityPartitionData.testProjections` now pass with no collateral failures elsewhere. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ate TestSparkParquetReader fixtures After restoring `NameMapping.empty()` as the safe default in `Parquet.ReadBuilder` (commit 8881516), `IcebergGenerics.read(table)` started failing field resolution on tables whose data files lack Iceberg field IDs. Pre-1.12 the unsafe positional-fallback path masked this; now the strict default refuses to bind by position. `GenericReader` (data module) never consulted `schema.name-mapping.default` even though every Iceberg table that reads non-Iceberg-written files is expected to carry it. Spark's `BaseReader` already reads the property (`BaseReader.java:100`); bring `GenericReader` in line with that pattern so `IcebergGenerics.read(table)` auto-applies the mapping when present. Also migrate `TestSparkParquetReader.testInt96TimestampProducedBySparkIsReadCorrectly` (v3.5 / v4.0 / v4.1) — the test creates a `HadoopTables` table directly (bypassing `SparkTableUtil.importSparkTable`, so the auto-set in commit 9ebb273 doesn't reach it) and reads files written by raw Spark `ParquetWriteSupport` without Iceberg field IDs: - `rowsFromFile`: pass `.withNameMapping(MappingUtil.create(schema))` on the direct `Parquet.read(...)` call. - `tableFromInputFile`: set `schema.name-mapping.default` at table creation time so the `GenericReader` change above picks it up automatically. Verified: - `:iceberg-data:test` clean - Targeted `TestSparkParquetReader.testInt96TimestampProducedBySparkIsReadCorrectly` passes on v3.5, v4.0, v4.1 - `:iceberg-flink:iceberg-flink-2.1:test` full sweep clean (30m, `-DtestParallelism=8`) - Spotless / checkstyle / revapi clean across all versions Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Want to callout the potential behavior change due to removal of the Before we have iceberg/parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java Lines 1556 to 1563 in cfebc8e null to NameMapping.empty()
Test migrations off positional fallbackThese tests wrote Parquet files via raw
Production fixes auto-applying the table's
|
Migrate the vectorized dictionary-encoded parquet read test off unsafe positional-fallback ID assignment by attaching a NameMapping created from the schema to the Parquet read builder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove all methods, classes, and fields marked with "@deprecated will be removed in 1.12.0" across all modules, since 1.11.0 has been released.
Deleted Classes
SystemProperties→ useSystemConfigsPartitionStats→ usePartitionStatisticsinterface /BasePartitionStatisticsDataReader→ usePlannedDataReaderGenericAppenderFactory→ useGenericFileWriterFactoryBaseFileWriterFactory→ useRegistryBasedFileWriterFactoryS3SignRequest,S3SignResponse,S3SignRequestParser,S3SignResponseParser,S3ObjectMapper→ useRemoteSign*equivalentsNotable Non-Mechanical Changes
Beyond deleting deprecated code, the following adjustments were needed. Items marked⚠️ reviewer attention depart from a strict deprecation removal and warrant a closer look:
PositionDelete.set(path, pos, row)androw()— Kept without@Deprecated. The public deprecation notice said position deletes with row data are no longer supported, but internal code (BaseTaskWriter,BasePositionDeltaWriter,RewriteTablePathUtil,Avro) still relies on these methods. Reviewer: confirm the methods are safe to retain as package-internal helpers, or whether a follow-up should narrow visibility / push removal to 1.13.0.BaseScan.io()— Kept as a non-deprecatedprotectedmethod. Internal subclasses (DataScan,BaseDistributedDataScan) use it; the deprecation targeted external callers only.HadoopFileIO.serializeConfWith()— Kept without@Deprecatedbecause theHadoopConfigurableinterface still declares the method. Also inlined the constructor logic since the removed constructor was the delegation target. Reviewer: the cleaner path is to deprecate the interface method (and theResolvingFileIO/SerializableFileIOWithSize× 3 implementations) for 1.13.0 removal, then drop theSerializationUtilinstanceof HadoopConfigurablebranch and theiceberg.mr.config.serialization.disabledopt-in — those are the only remaining consumers after Core: Move Hadoop conf serialization into SerializableConfiguration #15583 madeSerializableConfigurationitselfSerializableSupplier-compatible. Happy to land this as a follow-up.BaseParquetReaders.createStructReader(List, StructType, Integer)— Changed from a concrete method (that delegated to the removed 2-param version) toabstract. Both existing subclasses (InternalReader,GenericParquetReaders) already override it.RewriteTablePathUtil.PositionDeleteReaderWriter.writer()— The 4-arg method was adefaultdelegating to the removed 5-arg version. Made itabstract; updated all implementors (RewriteTablePathSparkActionin v3.5/v4.0/v4.1) to implement the 4-arg signature directly.S3V4RestSignerClient— Removed deprecated property fallback logic frombaseSignerUri(),endpoint(), andcheck(). The legacy properties (s3.signer.uri,s3.signer.endpoint) are no longer supported, andendpoint()now requiresRESTCatalogProperties.SIGNER_ENDPOINT(previously defaulted to"v1/aws/s3/sign"). Reviewer: this is a stricter behavior change than a pure removal — please confirm downstream catalog adapters / integration tests are aware they must now supplyRESTCatalogProperties.SIGNER_ENDPOINTexplicitly.Parquet.java— Removed theNETFLIX_UNSAFE_PARQUET_ID_FALLBACK_ENABLEDbranch; without the config, always useNameMapping.empty()(the safe default).MetricsConfig.forPositionDelete(Table)— All callers (FlinkAppenderFactory,RegistryBasedFileWriterFactory) updated to use the no-argforPositionDelete()since row-level position delete metrics are no longer supported.9.⚠️ GenericFileWriterFactory— Re-addedwriterPropertiesto the constructor and Builder since it was a public non-deprecated setter used byGenericAppenderHelperand kafka-connectRecordUtils. Reviewer: confirm this re-addition matches the intended public surface (i.e., the field was incidentally lost while removingGenericAppenderFactoryand should remain onGenericFileWriterFactory).