Merge tag 'REL_16_9' into Cloudberry #1760

Open: chenjinbao1989 wants to merge 5739 commits into apache:main from chenjinbao1989:cbdb-pg16-merge
Merge upstream PostgreSQL REL_16_9 into Cloudberry, upgrading the
kernel from PG 14.4 to PG 16.9. This spans two major versions (PG 15
and PG 16), bringing in 5730 commits.

Key features and changes, sorted by importance:

  1. MERGE command

    • SQL-standard MERGE for INSERT/UPDATE/DELETE in a single statement.
    • ruleutils decompilation, PlaceHolderVar support in actions
    • Fixes for partitioned tables, cross-partition triggers, RLS,
      EvalPlanQual, self-modified tuples, pg_stat_statements
  2. Logical replication enhancements

    • Built-in support for prepared transactions (2PC)
    • Enable two-phase via replication protocol
    • Streaming changes after speculative aborts
    • New pg_create_subscription predefined role
    • Parallel apply workers (PG 16)
    • Row/column filtering, schema-level publication (PG 15)
    • Critical fix for data loss in logical replication
    • Slot invalidation, snapshot, and memory leak fixes
  3. SQL/JSON

    • IS JSON predicate
    • Standard JSON constructor functions (json_array, json_object,
      json_scalar, json_serialize)
    • Auto-generated query jumbling via gen_node_support.pl
    • Fix json_array() subquery double transformation
  4. WAL and storage subsystem

    • LZ4 compression for full-page writes (FPW)
    • Custom WAL resource managers
    • Skip WAL recycling/preallocation during archive recovery
    • XLOG_FPI_FOR_HINT honors full_page_writes setting
    • Direct I/O support on macOS
    • Buffer manager infrastructure refactor for faster relation extension
    • New smgrzeroextend(), FileZero(), FileFallocate()
    • Remove HeapBitmapScan skip_fetch optimization (correctness issue)
    • WAL/Recovery separation: xlogrecovery.c, xlogprefetcher.c (PG 15)
    • Backup sink architecture (PG 15)
  5. Query optimizer / executor
    Performance:

    • Hash table acceleration for NOT IN(values)
    • Datum sorts for single-column sorts
    • Result Cache renamed to Memoize with ongoing fixes
    • Parallel Hash Full Join
    • Reduced planning cost for deeply-nested views
    • Fewer pallocs when building partition bounds
    • Outer join optimization enhancement (PG 16)
    • Symmetric hash join optimization (PG 16)
    Critical fixes:
    • Planner failure to identify multiple hashable ScalarArrayOpExprs
    • Consistent whole-row Var construction in parsing and planning
    • WindowAgg evaluation crash
    • WITH RECURSIVE UNION assert failure
    • ORDER BY / DISTINCT aggregates with FILTER
    • setrefs.c missing expression processing on prune steps
    • Parallel Hash Join extreme skew detection
  6. pgstat monitoring enhancements
    pgstat shared memory rewrite (PG 15):

    • Stats collector process removed; stats now in shared memory
    • Monolithic pgstat.c split into 14 files under utils/activity/
    pg_stat_io (PG 16 new view):
    • Detailed I/O statistics: shared buffer hits, I/O times, writeback
    pg_stat_statements:
    • JIT counters, temp file block I/O timing
    • Utility query string normalization, MERGE support
    • 32-bit integer overflow fix
    Other:
    • pg_stat_wal time accumulation as instr_time
    • Macro-generated pg_stat_get*() functions for tables and databases
    • Enhanced pg_stat_reset_single_table_counters
    • SP-GiST index scans counted in pg_stat
    • New test helpers: pg_stat_force_next_flush(), pg_stat_have_stats()
  7. libpq and client protocol

    • New PQsendFlushRequest
    • Pipeline mode state machine fix
    • SASL code refactored to generic interface
    • Escape function fixes for invalid encoding data (security)
    • PQescapeLiteral()/PQescapeIdentifier() length handling fix
    • New pg_encoding_set_invalid()
    • Build-time check that libpq doesn't call exit()/abort()
  8. Security and privilege model

    • CREATEROLE privilege restrictions (major security improvement)
    • New GUCs: createrole_self_grant, reserved_connections
    • New predefined roles: pg_maintain, pg_create_subscription
    • Non-superuser predefined roles for vacuum/analyze
    • Revoke PUBLIC CREATE from public schema (now owned by pg_database_owner)
    • Security invoker views (SECURITY INVOKER)
    • session_authorization and role interaction fixes
    • scram_SaltedPassword() integer overflow fix
  9. pg_dump / pg_upgrade / pg_basebackup toolchain
    pg_dump:

    • Generic compression API, zstd support, LZ4 frame-only format
    pg_basebackup:
    • Extended compression options, server-side compression with -Fp
    • Client-side LZ4 decompression, parallel zstd compression
    pg_upgrade:
    • Fix unintentional 'NULL' string literal
    • Fix XMLSERIALIZE(NO INDENT) cross-version upgrade
    • pg_dumpall handling of dangling OIDs in pg_auth_members
  10. Partitioned table improvements

    • Self-referencing FKs in partitioned tables
    • Detach partition with top-level FK fix
    • Reset relhassubclass on ATTACH
    • Trigger rename consistency, preserve firing state on clone
    • Disallow partitionwise join/grouping on collation mismatch
  11. ICU and collation

    • Build ICU support by default
    • initdb uses uloc_getDefault()
    • CREATE DATABASE LOCALE applies to all collation providers
  12. Node support function auto-generation (FIXME)

    • gen_node_support.pl auto-generates copyfuncs/equalfuncs/outfuncs/readfuncs
    • Query jumbling code auto-generation
    • NodeTag ABI stability check
  13. Global renames and structural changes

    • RelFileNode -> RelFileLocator (pervasive rename across hundreds of files)
    • GUC system split: guc.c -> guc.c + guc_tables.c + guc_funcs.c
    • varatt.h separated from postgres.h
    • Test framework: PostgresNode.pm -> PostgreSQL::Test::Cluster
  14. SQL types and functions

    • Numeric scale allows negative or greater than precision
    • New ANY_VALUE aggregate function
    • unnest(multirange), range_agg with multirange inputs
    • pg_size_pretty/pg_size_bytes support petabytes
    • ALTER TABLE SET ACCESS METHOD
    • SYSTEM_USER function
    • numeric_mul() overflow fix
  15. psql improvements

    • PSQL_WATCH_PAGER for \watch command
    • New \drg command for role grants
    • \copy from sends data in larger chunks
    • pg_waldump --rmgr multiple specification
  16. Critical data corruption and crash fixes

    • Fix data corruption when relation truncation fails
    • Fail instead of corrupting page header on non-LP_NORMAL TID update
    • Fix unsafe BufferDescriptors access
    • GB18030 SIGSEGV from out-of-bounds read
    • Snowball stemmer null pointer dereference after OOM
    • Rare standby assertion failure on primary restart
    • catcache invalidation during list entry construction
  17. Vacuum / Autovacuum

    • Parallel VACUUM (PG 16, vacuumparallel.c new file)
    • ON COMMIT DELETE ROWS avoids ERROR after relhassubclass=f
    • Use WaitLatch() instead of pg_usleep() at end-of-vacuum truncation
    • Prevent numeric overflow in parallel numeric aggregates
    • ANALYZE preserves relhasindex for partitioned tables

Co-authored-by: liushengsong <lss602726449@gmail.com>
Co-authored-by: reshke <reshke@double.cloud>
Co-authored-by: Hao Wu <gfphoenix78@gmail.com>
Co-authored-by: Jianghua.yjh <yjhjstz@gmail.com>
Co-authored-by: Dianjin Wang <wangdianjin@gmail.com>

michaelpq and others added 30 commits November 11, 2024 10:19
This commit changes libpq so that errors reported by the backend during
the protocol negotiation for SSL and GSS are discarded by the client, as
these may include bytes that could be consumed by the client and write
arbitrary bytes to a client's terminal.

A failure with the SSL negotiation now leads to an error immediately
reported, without a retry on any other methods allowed, like a fallback
to a plaintext connection.

A failure with GSS discards the error message received, and we allow a
fallback as it may be possible that the error is caused by a connection
attempt with a pre-11 server, GSS encryption having been introduced in
v12.  This was a problem only with v17 and newer versions; older
versions discard the error message already in this case, assuming a
failure caused by a lack of support for GSS encryption.

Author: Jacob Champion
Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Michael Paquier
Security: CVE-2024-10977
Backpatch-through: 12

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 2bf252d27e0167b62b663baaab5e9b4c773ba9de

Many process environment variables (e.g. PATH) bypass the containment
expected of a trusted PL.  Hence, trusted PLs must not offer features
that achieve setenv().  Otherwise, an attacker having USAGE privilege on
the language often can achieve arbitrary code execution, even if the
attacker lacks a database server operating system user.

To fix PL/Perl, replace trusted PL/Perl %ENV with a tied hash that just
replaces each modification attempt with a warning.  Sites that reach
these warnings should evaluate the application-specific implications of
proceeding without the environment modification:

  Can the application reasonably proceed without the modification?

    If no, switch to plperlu or another approach.

    If yes, the application should change the code to stop attempting
    environment modifications.  If that's too difficult, add "untie
    %main::ENV" in any code executed before the warning.  For example,
    one might add it to the start of the affected function or even to
    the plperl.on_plperl_init setting.
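
As a loose analogue of the tied-hash approach described above (written in Python rather than Perl, and not the actual PL/Perl change), the sketch below wraps an environment mapping so that every modification attempt produces a warning and leaves the data untouched. The class name `WarnOnlyEnv` and the warning texts are hypothetical.

```python
import warnings
from collections.abc import MutableMapping


class WarnOnlyEnv(MutableMapping):
    """Mapping that reads normally but turns every write into a warning."""

    def __init__(self, initial):
        self._data = dict(initial)  # private snapshot, never mutated afterwards

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __setitem__(self, key, value):
        # Replace the modification with a warning instead of applying it.
        warnings.warn(f"attempted alteration of ENV[{key!r}] ignored")

    def __delitem__(self, key):
        warnings.warn(f"attempted deletion of ENV[{key!r}] ignored")


env = WarnOnlyEnv({"PATH": "/usr/bin"})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    env["PATH"] = "/tmp/evil"  # warns, has no effect
    del env["PATH"]            # warns, has no effect
```

Reads keep working, so code that only inspects the environment is unaffected; only writers see the warnings.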

In passing, link to Perl's guidance about the Perl features behind the
security posture of PL/Perl.

Back-patch to v12 (all supported versions).

Andrew Dunstan and Noah Misch

Security: CVE-2024-10979

If a CTE, subquery, sublink, security invoker view, or coercion
projection references a table with row-level security policies, we
neglected to mark the plan as potentially dependent on which role
is executing it.  This could lead to later executions in the same
session returning or hiding rows that should have been hidden or
returned instead.

Reported-by: Wolfgang Walther
Reviewed-by: Noah Misch
Security: CVE-2024-10976
Backpatch-through: 12

The SQL spec mandates that SET SESSION AUTHORIZATION implies
SET ROLE NONE.  We tried to implement that within the lowest-level
functions that manipulate these settings, but that was a bad idea.
In particular, guc.c assumes that it doesn't matter in what order
it applies GUC variable updates, but that was not the case for these
two variables.  This problem, compounded by some hackish attempts to
work around it, led to some security-grade issues:

* Rolling back a transaction that had done SET SESSION AUTHORIZATION
would revert to SET ROLE NONE, even if that had not been the previous
state, so that the effective user ID might now be different from what
it had been.

* The same for SET SESSION AUTHORIZATION in a function SET clause.

* If a parallel worker inspected current_setting('role'), it saw
"none" even when it should see something else.

Also, although the parallel worker startup code intended to cope
with the current role's pg_authid row having disappeared, its
implementation of that was incomplete so it would still fail.

Fix by fully separating the miscinit.c functions that assign
session_authorization from those that assign role.  To implement the
spec's requirement, teach set_config_option itself to perform "SET
ROLE NONE" when it sets session_authorization.  (This is undoubtedly
ugly, but the alternatives seem worse.  In particular, there's no way
to do it within assign_session_authorization without incompatible
changes in the API for GUC assign hooks.)  Also, improve
ParallelWorkerMain to directly set all the relevant user-ID variables
instead of relying on some of them to get set indirectly.  That
allows us to survive not finding the pg_authid row during worker
startup.

In v16 and earlier, this includes back-patching 9987a7bf3 which
fixed a violation of GUC coding rules: SetSessionAuthorization
is not an appropriate place to be throwing errors from.

Security: CVE-2024-10978

meson makes the backslashes in text2macro.pl's --strip argument
into forward slashes, effectively disabling comment stripping.
That hasn't caused us issues before, but it breaks the test case
for b7e3a52a8.  We don't really need the pattern to be adjustable,
so just hard-wire it into the script instead.

Context: mesonbuild/meson#1564
Security: CVE-2024-10979

Ooops, missed that v16 has another text2macro call in the MSVC scripts.

Security: CVE-2024-10979

v16 commit 8fe3e69 used REGRESS_OPTS in
a way needing this.  That broke "vcregress plcheck".  Back-patch
v16..v12; newer versions don't have this build system.

TestUpgradeXversion knows how to make the main regression database's
references to pg_regress.so be version-independent.  But it doesn't
do that for plperl's database, so that the C function added by
commit b7e3a52a8 is causing cross-version upgrade test failures.
Path of least resistance is to just drop the function at the end
of the new test.

In <= v14, also take the opportunity to clean up the generated
test files.

Security: CVE-2024-10979

…cks.

Commit 5a2fed911 had an unexpected side-effect: the parallel worker
launched for the new test case would fail if it couldn't use a
superuser-reserved connection slot.  The reason that test failed
while all our pre-existing ones worked is that the connection
privilege tests in InitPostgres had been based on the superuserness
of the leader's AuthenticatedUserId, but after the rearrangements
of 5a2fed911 we were testing the superuserness of CurrentUserId,
which the new test case deliberately made to be a non-superuser.

This all seems very accidental and probably not the behavior we really
want, but a security patch is no time to be redesigning things.
Pending some discussion about desirable semantics, hack it so that
InitPostgres continues to pay attention to the superuserness of
AuthenticatedUserId when starting a parallel worker.

Nathan Bossart and Tom Lane, per buildfarm member sawshark.

Security: CVE-2024-10978

The current code calls array_eq() and does not provide FmgrInfo.  This commit
provides initialization of FmgrInfo and uses C collation as the safe option
for text comparison because we don't know anything about the semantics of
opclass options.

Backpatch to 13, where opclass options were introduced.

Reported-by: Nicolas Maus
Discussion: https://postgr.es/m/18692-72ea398df3ec6712%40postgresql.org
Backpatch-through: 13

Maintain the pg_stat_user_indexes.idx_scan pgstat counter during
contrib/Bloom index scans.

Oversight in commit 9ee014f, which added the Bloom index contrib
module.

Author: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/c48839d881388ee401a01807c686004d@oss.nttdata.com
Backpatch: 13- (all supported branches).

This fixes a set of race conditions with cumulative statistics where a
shared stats entry could be dropped while it should still be valid in
the event when it is reused: an entry may refer to a different object
but requires the same hash key.  This can happen with various stats
kinds, like:
- Replication slots that compute internally an index number, for
different slot names.
- Stats kinds that use an OID in the object key, where a wraparound
causes the same key to be used if an OID is used for the same object.
- As of PostgreSQL 18, custom pgstats kinds could also be an issue,
depending on their implementation.

This issue is fixed by introducing a counter called "generation" in the
shared entries via PgStatShared_HashEntry, initialized at 0 when an
entry is created and incremented when the same entry is reused, to avoid
concurrent issues on drop because of other backends still holding a
reference to it.  This "generation" is copied to the local copy that a
backend holds when looking at an object, then cross-checked with the
shared entry to make sure that the entry is not dropped even if its
"refcount" justifies that if it has been reused.
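
The generation cross-check can be illustrated with a small toy model. This is a sketch under stated assumptions, not the pgstat code: `SharedEntry`, `SharedStats`, and the two-step `begin_release()`/`finish_release()` split are hypothetical names, and the split only imitates the window between dropping the last reference and freeing the entry.

```python
from dataclasses import dataclass


@dataclass
class SharedEntry:
    refcount: int = 0
    generation: int = 0   # starts at 0, incremented on each reuse
    dropped: bool = False


class SharedStats:
    def __init__(self):
        self.entries = {}

    def acquire(self, key):
        """Return a local reference (key, generation) to the shared entry."""
        entry = self.entries.setdefault(key, SharedEntry())
        if entry.dropped:
            # Same hash key, different object: reuse the entry.
            entry.dropped = False
            entry.refcount = 0
            entry.generation += 1
        entry.refcount += 1
        return key, entry.generation

    def drop(self, key):
        self.entries[key].dropped = True

    def begin_release(self, ref):
        """Give up the reference; True means the caller should free the entry."""
        entry = self.entries[ref[0]]
        entry.refcount -= 1
        return entry.refcount == 0 and entry.dropped

    def finish_release(self, ref, check_generation=True):
        key, generation = ref
        entry = self.entries.get(key)
        if entry is None:
            return
        if check_generation and entry.generation != generation:
            return  # entry was reused in the meantime, so keep it
        del self.entries[key]


def entry_survives(check_generation):
    stats = SharedStats()
    old_ref = stats.acquire("slot:0")
    stats.drop("slot:0")
    if stats.begin_release(old_ref):
        stats.acquire("slot:0")  # reuse happens inside the window
        stats.finish_release(old_ref, check_generation)
    return "slot:0" in stats.entries
```

With the check enabled, the stale reference notices that its copied generation no longer matches and leaves the reused entry alone; without it, the reused entry is removed.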

This problem could show up when a backend shuts down and needs to
discard any entries it still holds, causing statistics to be removed
when they should not, or even an assertion failure.  Another report
involved a failure in a standby after an OID wraparound, where the
startup process would FATAL on a "can only drop stats once", stopping
recovery abruptly.  The buildfarm has been sporadically complaining
about the problem, as well, but the window is hard to reach with the
in-core tests.

Note that the issue can be reproduced easily by adding a sleep before
dshash_find() in pgstat_release_entry_ref() to enlarge the problematic
window while repeating test_decoding's isolation test oldest_xmin a
couple of times, for example, as pointed out by Alexander Lakhin.

Reported-by: Alexander Lakhin, Peter Smith
Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com
Discussion: https://postgr.es/m/17947-b9554521ad963c9c@postgresql.org
Backpatch-through: 15

Previously, in unlucky cases, it was possible for pg_rewind to remove
certain WAL segments from the rewound demoted primary.  In particular
this happens if those files have been marked for archival (i.e., their
.ready files were created) but not yet archived; the newly promoted node
no longer has such files because of them having been recycled, but they
are likely critical for recovery in the demoted node.  If pg_rewind
removes them, recovery is not possible anymore.

Fix this by maintaining a hash table of files in this situation in the
scan that looks for a checkpoint, which the decide_file_actions phase
can consult so that it knows to preserve them.
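
A minimal sketch of the idea, assuming the set of segments to preserve has already been collected during the WAL scan. The function body, the action names, and the file names are illustrative only and do not mirror pg_rewind's real data structures.

```python
def decide_file_actions(target_files, source_files, keep_wal):
    """Return {path: action} for files present on the rewound (target) node."""
    actions = {}
    for path in sorted(target_files):
        if path in source_files:
            actions[path] = "compare"  # placeholder for the normal copy/skip logic
        elif path in keep_wal:
            actions[path] = "keep"     # needed for recovery, so do not remove
        else:
            actions[path] = "remove"   # exists only on the target
    return actions


target = {
    "pg_wal/000000010000000000000003",
    "pg_wal/000000010000000000000004",
    "base/1/1234",
}
source = {"base/1/1234"}
keep_wal = {"pg_wal/000000010000000000000003"}
actions = decide_file_actions(target, source, keep_wal)
```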

Backpatch to 14.  The problem also exists in 13, but that branch was not
blessed with commit eb00f1d, so this patch is difficult to apply
there.  Users of older releases will just have to continue to be extra
careful when rewinding.

Co-authored-by: Полина Бунгина (Polina Bungina) <bungina@gmail.com>
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Discussion: https://postgr.es/m/CAAtGL4AhzmBRsEsaDdz7065T+k+BscNadfTqP1NcPmsqwA5HBw@mail.gmail.com

In commit 08c0d6a which introduced "rainbow" arcs in regex NFAs,
I didn't think terribly hard about what to do when creating the color
complement of a rainbow arc.  Clearly, the complement cannot match any
characters, and I took the easy way out by just not building any arcs
at all in the complement arc set.  That mostly works, but Nikolay
Shaplov found a case where it doesn't: if we decide to delete that
sub-NFA later because it's inside a "{0}" quantifier, delsub()
suffered an assertion failure.  That's because delsub() relies on
the target sub-NFA being fully connected.  That was always true
before, and the best fix seems to be to restore that property.
Hence, invent a new arc type CANTMATCH that can be generated in
place of an empty color complement, and drop it again later when we
start NFA optimization.  (At that point we don't need to do delsub()
any more, and besides there are other cases where NFA optimization can
lead to disconnected subgraphs.)

It appears that this bug has no consequences in a non-assert-enabled
build: there will be some transiently leaked NFA states/arcs, but
they'll get cleaned up eventually.  Still, we don't like assertion
failures, so back-patch to v14 where rainbow arcs were introduced.

Per bug #18708 from Nikolay Shaplov.

Discussion: https://postgr.es/m/18708-f94f2599c9d2c005@postgresql.org

…kwards.

Previously LogicalIncreaseRestartDecodingForSlot() accidentally
accepted any LSN as the candidate_lsn and candidate_valid after the
restart_lsn of the replication slot was updated, so it potentially
caused the restart_lsn to move backwards.

A scenario where this could happen in logical replication is: after a
logical replication restart, based on previous candidate_lsn and
candidate_valid values in memory, the restart_lsn advances upon
receiving a subscriber acknowledgment. Then, logical decoding restarts
from an older point, setting candidate_lsn and candidate_valid based
on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments
then update the restart_lsn to an LSN older than the current value.

In the reported case, after WAL files were removed by a checkpoint,
the retreated restart_lsn prevented logical replication from
restarting due to missing WAL segments.

This change essentially modifies the 'if' condition to 'else if'
condition within the function. The previous code had an asymmetry in
this regard compared to LogicalIncreaseXminForSlot(), which does
almost the same thing for different fields.
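
The effect of turning the trailing 'if' into 'else if' can be shown with a toy model. This is a sketch only: the `Slot` class, the branch conditions, and the `acknowledge()` helper are simplified assumptions, not the code of LogicalIncreaseRestartDecodingForSlot().

```python
from dataclasses import dataclass

INVALID_LSN = 0


@dataclass
class Slot:
    restart_lsn: int
    confirmed_flush: int
    candidate_valid: int = INVALID_LSN
    candidate_lsn: int = INVALID_LSN


def propose_restart(slot, current_lsn, restart_lsn, use_else_if=True):
    if restart_lsn <= slot.restart_lsn:
        handled = True   # stale proposal: must not move backwards
    elif current_lsn <= slot.confirmed_flush:
        slot.candidate_valid, slot.candidate_lsn = current_lsn, restart_lsn
        handled = True
    else:
        handled = False

    # As a separate 'if' (use_else_if=False) this check also runs after the
    # stale branch above and accepts the old LSN as a candidate.
    if (not use_else_if or not handled) and slot.candidate_valid == INVALID_LSN:
        slot.candidate_valid, slot.candidate_lsn = current_lsn, restart_lsn


def acknowledge(slot, flushed_lsn):
    """Subscriber acknowledgment: apply a pending candidate if it is covered."""
    slot.confirmed_flush = max(slot.confirmed_flush, flushed_lsn)
    if slot.candidate_valid != INVALID_LSN and slot.candidate_valid <= flushed_lsn:
        slot.restart_lsn = slot.candidate_lsn
        slot.candidate_valid = slot.candidate_lsn = INVALID_LSN


def restart_after_stale_proposal(use_else_if):
    slot = Slot(restart_lsn=100, confirmed_flush=120)
    propose_restart(slot, current_lsn=150, restart_lsn=50, use_else_if=use_else_if)
    acknowledge(slot, flushed_lsn=200)
    return slot.restart_lsn
```

In this model, the chained form leaves restart_lsn at 100, while the unchained form lets the stale proposal pull it back to 50.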

The WAL removal issue was reported by Hubert Depesz Lubaczewski.

Backpatch to all supported versions, since the bug exists since 9.4
where logical decoding was introduced.

Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila
Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com
Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me
Backpatch-through: 13

After commit 5a2fed911a85ed6d8a015a6bafe3a0d9a69334ae, the catalog state
resulting from these commands ceased to affect sessions.  Restore the
longstanding behavior, which is like beginning the session with a SET
ROLE command.  If cherry-picking the CVE-2024-10978 fixes, default to
including this, too.  (This fixes an unintended side effect of fixing
CVE-2024-10978.)  Back-patch to v12, like that commit.  The release team
decided to include v12, despite the original intent to halt v12 commits
earlier this week.

Tom Lane and Noah Misch.  Reported by Etienne LAFARGE.

Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com

Commits aac2c9b4f et al. added a bool field to struct ResultRelInfo.
That's no problem in the master branch, but in released branches
care must be taken when modifying publicly-visible structs to avoid
an ABI break for extensions.  Frequently we solve that by adding the
new field at the end of the struct, and that's what was done here.
But ResultRelInfo has stricter constraints than just about any other
node type in Postgres.  Some executor APIs require extensions to index
into arrays of ResultRelInfo, which means that any change whatever in
sizeof(ResultRelInfo) causes a fatal ABI break.

Fortunately, this is easy to fix, because the new field can be
squeezed into available padding space instead --- indeed, that's where
it was put in master, so this fix also removes a cross-branch coding
variation.
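
The size argument can be checked with a small ctypes experiment. The three structs below are hypothetical stand-ins, not ResultRelInfo itself; they only show that a bool placed in existing padding leaves the struct size and the offsets of later fields unchanged, while a bool appended at the end grows the struct and therefore shifts every array element after the first.

```python
import ctypes


class Before(ctypes.Structure):
    _fields_ = [
        ("count", ctypes.c_int32),
        ("flag", ctypes.c_bool),
        # the compiler pads here so that 'next' is aligned
        ("next", ctypes.c_int32),
    ]


class AppendedAtEnd(ctypes.Structure):
    _fields_ = [
        ("count", ctypes.c_int32),
        ("flag", ctypes.c_bool),
        ("next", ctypes.c_int32),
        ("new_flag", ctypes.c_bool),  # new field after the last member
    ]


class InPadding(ctypes.Structure):
    _fields_ = [
        ("count", ctypes.c_int32),
        ("flag", ctypes.c_bool),
        ("new_flag", ctypes.c_bool),  # new field squeezed into the padding
        ("next", ctypes.c_int32),
    ]


size_before = ctypes.sizeof(Before)
size_appended = ctypes.sizeof(AppendedAtEnd)
size_in_padding = ctypes.sizeof(InPadding)
```

Code compiled against `Before` that indexes into an array would compute the wrong address for element 1 if the library switched to `AppendedAtEnd`, but not if it switched to `InPadding`.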

Per report from Pavan Deolasee.  Patch v14-v17 only; earlier versions
did not gain the extra field, nor is there any problem in master.

Discussion: https://postgr.es/m/CABOikdNmVBC1LL6pY26dyxAS2f+gLZvTsNt=2XbcyG7WxXVBBQ@mail.gmail.com

In the dim past we figured it was okay to ignore collations
when combining UNION set-operation nodes into a single N-way
UNION operation.  I believe that was fine at the time, but
it stopped being fine when we added nondeterministic collations:
the semantics of distinct-ness are affected by those.  v17 made
it even less fine by allowing per-child sorting operations to
be merged via MergeAppend, although I think we accidentally
avoided any live bug from that.

Add a check that collations match before deciding that two
UNION nodes are equivalent.  I also failed to resist the
temptation to comment plan_union_children() a little better.
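
A toy version of the added check, assuming a simplified tree of UNION nodes. `Leaf`, `UnionNode`, and `flatten_union()` are hypothetical and only model the rule that a nested UNION is merged into its parent when the collations match.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class UnionNode:
    left: object
    right: object
    all: bool
    collations: Tuple[str, ...]  # one collation name per output column


def flatten_union(top):
    """Collect the children that may be planned as one N-way UNION."""
    children = []
    for child in (top.left, top.right):
        same_kind = isinstance(child, UnionNode) and child.all == top.all
        if same_kind and child.collations == top.collations:
            children.extend(flatten_union(child))  # equivalent: merge upward
        else:
            children.append(child)                 # keep as a separate input
    return children


inner = UnionNode(Leaf("a"), Leaf("b"), all=False, collations=("ci_nondet",))
same = UnionNode(inner, Leaf("c"), all=False, collations=("ci_nondet",))
different = UnionNode(inner, Leaf("c"), all=False, collations=("C",))
```

Here `flatten_union(same)` yields the three leaves, while `flatten_union(different)` keeps `inner` as a unit so that its own collation governs its duplicate elimination.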

Back-patch to all supported branches (v13 now), since they
all have nondeterministic collations.

Discussion: https://postgr.es/m/3605568.1731970579@sss.pgh.pa.us

In 17~, age(xid) and mxid_age(xid) were listed as deprecated.  Based on
the discussion that led to 48b5aa3143, this is not intentional as this
could break many existing monitoring queries.  Note that vacuumdb also
uses both of them.

In 16, both functions were listed under "Control Data Functions", which
is incorrect, so let's move them to the list of functions related to
transaction IDs and snapshots.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/Zzr2zZFyeFKXWe8a@ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://postgr.es/m/20231114013224.4z6oxa6p6va33rxr@awork3.anarazel.de
Backpatch-through: 16

Ordinarily transformSetOperationTree will collect all UNION/
INTERSECT/EXCEPT steps into the setOperations tree of the topmost
Query, so that leaf queries do not contain any setOperations.
However, it cannot thus flatten a subquery that also contains
WITH, ORDER BY, FOR UPDATE, or LIMIT.  I (tgl) forgot that in
commit 07b4c48 and wrote an assertion in rule deparsing that
a leaf's setOperations would always be empty.

If it were nonempty then we would want to parenthesize the subquery
to ensure that the output represents the setop nesting correctly
(e.g. UNION below INTERSECT had better get parenthesized).  So
rather than just removing the faulty Assert, let's change it into
an additional case to check to decide whether to add parens.  We
don't expect that the additional case will ever fire, but it's
cheap insurance.

Man Zeng and Tom Lane

Discussion: https://postgr.es/m/tencent_7ABF9B1F23B0C77606FC5FE3@qq.com

RelationSyncCache, the hash table in charge of tracking the relation
schemas sent through pgoutput, was forgetting to free the TupleDesc
associated to the two slots used to store the new and old tuples,
causing some memory to be leaked each time a relation is invalidated
when the slots of an existing relation entry are cleaned up.

This is rather hard to notice as the bloat is pretty minimal, but a
long-running WAL sender would be in trouble over time depending on the
workload.  sysbench has proved to be pretty good at showing the problem,
coupled with some memory monitoring of the WAL sender.

Issue introduced in 52e4f0c, that has added row filters for tables
logically replicated.

Author: Boyu Yang
Reviewed-by: Michael Paquier, Hou Zhijie
Discussion: https://postgr.es/m/DM3PR84MB3442E14B340E553313B5C816E3252@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM
Backpatch-through: 15

Apparently this information has been outdated since first committed,
because we adopted a different implementation during development per
reviews and this detail was not updated in the README.

This has been wrong since commit 0ac5ad5 introduced the file in
2013.  Backpatch to all live branches.

Reported-by: Will Mortensen <will@extrahop.com>
Discussion: https://postgr.es/m/CAMpnoC6yEQ=c0Rdq-J7uRedrP7Zo9UMp6VZyP23QMT68n06cvA@mail.gmail.com

It failed to set the archive_command as it desired because of a syntax
problem.  Oversight in commit 90bcc7c2db1d.

This bug doesn't cause the test to fail, because the test only checks
pg_rewind's output messages, not the actual outcome (and the outcome in
both cases is that the file is kept, not deleted).  But in either case
the message about the file being kept is there, so it's hard to get
excited about doing much more.

Reported-by: Antonin Houska <ah@cybertec.at>
Author: Alexander Kukushkin <cyberdemn@gmail.com>
Discussion: https://postgr.es/m/7822.1732167825@antos

If the executable's .o files were produced by a compiler (probably gcc)
not using -moutline-atomics, and the corresponding .bc files were
produced by clang using -moutline-atomics (probably by default), then
the generated bitcode functions would have the target attribute
"+outline-atomics", and could fail at runtime when inlined.  If the
target ISA at bitcode generation time was armv8-a (the most conservative
aarch64 target, no LSE), then LLVM IR atomic instructions would generate
calls to functions in libgcc.a or libclang_rt.*.a that switch between
LL/SC and faster LSE instructions depending on a runtime AT_HWCAP check.
Since the corresponding .o files didn't need those functions, they
wouldn't have been included in the executable, and resolution would
fail.

At least Debian and Ubuntu are known to ship gcc and clang compilers
that target armv8-a but differ on the use of outline atomics by default.

Fix, by suppressing the outline atomics attribute in bitcode explicitly.
Inline LL/SC instructions will be generated for atomic operations in
bitcode built for armv8-a.  Only configure scripts are adjusted for now,
because the meson build system doesn't generate bitcode yet.

This doesn't seem to be a new phenomenon, so real cases of functions
using atomics that are inlined by JIT must be rare in the wild given how
long it took for a bug report to arrive.  The reported case could be
reduced to:

postgres=# set jit_inline_above_cost = 0;
SET
postgres=# set jit_above_cost = 0;
SET
postgres=# select pg_last_wal_receive_lsn();
WARNING:  failed to resolve name __aarch64_swp4_acq_rel
FATAL:  fatal llvm error: Program used external function
'__aarch64_swp4_acq_rel' which could not be resolved!

The change doesn't affect non-ARM systems or later target ISAs.

Back-patch to all supported releases.

Reported-by: Alexander Kozhemyakin <a.kozhemyakin@postgrespro.ru>
Discussion: https://postgr.es/m/18610-37bf303f904fede3%40postgresql.org

psql's --help was missing the description of the \pset variable
xheader_width, that should be listed when using \? or --help=commands,
and described for --help=variables.

Oversight in a45388d.

Author: Pavel Luzanov
Discussion: https://postgr.es/m/1e3e06d6-0807-4e62-a9f6-c11481e6eb10@postgrespro.ru
Backpatch-through: 16

petere and others added 7 commits May 5, 2025 12:17
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 73452f0d3ca43035a492ff657802cc9060561413

Start the file with static functions not specific to pe_test_vectors
tests.  This way, new tests can use them without disrupting the file's
layout.  Change report_result() PQExpBuffer arguments to plain strings.
Back-patch to v13 (all supported versions), for the next commit.

Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207

With GB18030 as source encoding, applications could crash the server via
SQL functions convert() or convert_from().  Applications themselves
could crash after passing unterminated GB18030 input to libpq functions
PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
PQescapeString().  Extension code could crash by passing unterminated
GB18030 input to jsonapi.h functions.  All those functions have been
intended to handle untrusted, unterminated input safely.

A crash required allocating the input such that the last byte of the
allocation was the last byte of a virtual memory page.  Some malloc()
implementations take measures against that, making the SIGSEGV hard to
reach.  Back-patch to v13 (all supported versions).
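
To see why unterminated input matters, note that a GB18030 character's length is decided by looking at the byte after the lead byte. The sketch below is a simplified, length-bounded classifier, not PostgreSQL's implementation: `gb18030_char_len()` is a hypothetical helper that checks only the ranges needed to pick a length and reports truncation instead of reading past the end of the buffer.

```python
from typing import Optional


def gb18030_char_len(buf: bytes, pos: int) -> Optional[int]:
    """Byte length of the character at pos, or None if the input is truncated."""
    remaining = len(buf) - pos
    if remaining <= 0:
        return None
    if buf[pos] < 0x80:
        return 1                  # single-byte (ASCII range)
    if remaining < 2:
        return None               # an unbounded reader would peek past the end here
    if 0x30 <= buf[pos + 1] <= 0x39:
        # Second byte in 0x30..0x39 marks a four-byte sequence.
        return 4 if remaining >= 4 else None
    return 2
```

In C, that unchecked peek at the second byte is the out-of-bounds read; it only faults when the lead byte is the last byte of a mapped page, which matches the allocation condition described above.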

Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207

Merge upstream PostgreSQL REL_16_9 into Cloudberry, upgrading the
kernel from PG 14.4 to PG 16.9. This spans two major versions (PG 15
and PG 16), bringing in 5730 commits.

Key features and changes, sorted by importance:

1. MERGE command
   - SQL-standard MERGE for INSERT/UPDATE/DELETE in a single statement.
   - ruleutils decompilation, PlaceHolderVar support in actions
   - Fixes for partitioned tables, cross-partition triggers, RLS,
     EvalPlanQual, self-modified tuples, pg_stat_statements

2. Logical replication enhancements
   - Built-in support for prepared transactions (2PC)
   - Enable two-phase via replication protocol
   - Streaming changes after speculative aborts
   - New pg_create_subscription predefined role
   - Parallel apply workers (PG 16)
   - Row/column filtering, schema-level publication (PG 15)
   - Critical fix for data loss in logical replication
   - Slot invalidation, snapshot, and memory leak fixes

3. SQL/JSON
   - IS JSON predicate
   - Standard JSON constructor functions (json_array, json_object,
     json_scalar, json_serialize)
   - Auto-generated query jumbling via gen_node_support.pl
   - Fix json_array() subquery double transformation

4. WAL and storage subsystem
   - LZ4 compression for full-page writes (FPW)
   - Custom WAL resource managers
   - Skip WAL recycling/preallocation during archive recovery
   - XLOG_FPI_FOR_HINT honors full_page_writes setting
   - Direct I/O support on macOS
   - Buffer manager infrastructure refactor for faster relation extension
   - New smgrzeroextend(), FileZero(), FileFallocate()
   - Remove HeapBitmapScan skip_fetch optimization (correctness issue)
   - WAL/Recovery separation: xlogrecovery.c, xlogprefetcher.c (PG 15)
   - Backup sink architecture (PG 15)

5. Query optimizer / executor
   Performance:
   - Hash table acceleration for NOT IN(values)
   - Datum sorts for single-column sorts
   - Result Cache renamed to Memoize with ongoing fixes
   - Parallel Hash Full Join
   - Reduced planning cost for deeply-nested views
   - Fewer pallocs when building partition bounds
   - Outer join optimization enhancement (PG 16)
   - Symmetric hash join optimization (PG 16)
   Critical fixes:
   - Planner failure to identify multiple hashable ScalarArrayOpExprs
   - Consistent whole-row Var construction in parsing and planning
   - WindowAgg evaluation crash
   - WITH RECURSIVE UNION assert failure
   - ORDER BY / DISTINCT aggregates with FILTER
   - setrefs.c missing expression processing on prune steps
   - Parallel Hash Join extreme skew detection

6. pgstat monitoring enhancements
   pgstat shared memory rewrite (PG 15):
   - Stats collector process removed; stats now in shared memory
   - Monolithic pgstat.c split into 14 files under utils/activity/
   pg_stat_io (PG 16 new view):
   - Detailed I/O statistics: shared buffer hits, I/O times, writeback
   pg_stat_statements:
   - JIT counters, temp file block I/O timing
   - Utility query string normalization, MERGE support
   - 32-bit integer overflow fix
   Other:
   - pg_stat_wal time accumulation as instr_time
   - Macro-generated pg_stat_get*() functions for tables and databases
   - Enhanced pg_stat_reset_single_table_counters
   - SP-GiST index scans counted in pg_stat
   - New test helpers: pg_stat_force_next_flush(), pg_stat_have_stats()

7. libpq and client protocol
   - New PQsendFlushRequest
   - Pipeline mode state machine fix
   - SASL code refactored to generic interface
   - Escape function fixes for invalid encoding data (security)
   - PQescapeLiteral()/PQescapeIdentifier() length handling fix
   - New pg_encoding_set_invalid()
   - Build-time check that libpq doesn't call exit()/abort()

8. Security and privilege model
   - CREATEROLE privilege restrictions (major security improvement)
   - New GUCs: createrole_self_grant, reserved_connections
   - New predefined roles: pg_maintain, pg_create_subscription
   - Non-superuser predefined roles for vacuum/analyze
   - Revoke PUBLIC CREATE from public schema (now owned by pg_database_owner)
   - Security invoker views (SECURITY INVOKER)
   - session_authorization and role interaction fixes
   - scram_SaltedPassword() integer overflow fix

9. pg_dump / pg_upgrade / pg_basebackup toolchain
   pg_dump:
   - Generic compression API, zstd support, LZ4 frame-only format
   pg_basebackup:
   - Extended compression options, server-side compression with -Fp
   - Client-side LZ4 decompression, parallel zstd compression
   pg_upgrade:
   - Fix unintentional 'NULL' string literal
   - Fix XMLSERIALIZE(NO INDENT) cross-version upgrade
   - pg_dumpall handling of dangling OIDs in pg_auth_members

10. Partitioned table improvements
    - Self-referencing FKs in partitioned tables
    - Detach partition with top-level FK fix
    - Reset relhassubclass on ATTACH
    - Trigger rename consistency, preserve firing state on clone
    - Disallow partitionwise join/grouping on collation mismatch

11. ICU and collation
    - Build ICU support by default
    - initdb uses uloc_getDefault()
    - CREATE DATABASE LOCALE applies to all collation providers

12. Node support function auto-generation (FIXME)
    - gen_node_support.pl auto-generates copyfuncs/equalfuncs/outfuncs/readfuncs
    - Query jumbling code auto-generation
    - NodeTag ABI stability check

13. Global renames and structural changes
    - RelFileNode -> RelFileLocator (pervasive rename across hundreds of files)
    - GUC system split: guc.c -> guc.c + guc_tables.c + guc_funcs.c
    - varatt.h separated from postgres.h
    - Test framework: PostgresNode.pm -> PostgreSQL::Test::Cluster

14. SQL types and functions
    - Numeric scale allows negative or greater than precision
    - New ANY_VALUE aggregate function
    - unnest(multirange), range_agg with multirange inputs
    - pg_size_pretty/pg_size_bytes support petabytes
    - ALTER TABLE SET ACCESS METHOD
    - SYSTEM_USER function
    - numeric_mul() overflow fix

15. psql improvements
    - PSQL_WATCH_PAGER for \watch command
    - New \drg command for role grants
    - \copy from sends data in larger chunks
    - pg_waldump --rmgr multiple specification

16. Critical data corruption and crash fixes
    - Fix data corruption when relation truncation fails
    - Fail instead of corrupting page header on non-LP_NORMAL TID update
    - Fix unsafe BufferDescriptors access
    - GB18030 SIGSEGV from out-of-bounds read
    - Snowball stemmer null pointer dereference after OOM
    - Rare standby assertion failure on primary restart
    - catcache invalidation during list entry construction

17. Vacuum / Autovacuum
    - Parallel VACUUM (PG 16, vacuumparallel.c new file)
    - ON COMMIT DELETE ROWS avoids ERROR after relhassubclass=f
    - Use WaitLatch() instead of pg_usleep() at end-of-vacuum truncation
    - Prevent numeric overflow in parallel numeric aggregates
    - ANALYZE preserves relhasindex for partitioned tables

Co-authored-by: liushengsong <lss602726449@gmail.com>
Co-authored-by: reshke <reshke@double.cloud>
Co-authored-by: Hao Wu <gfphoenix78@gmail.com>
Co-authored-by: Jianghua.yjh <yjhjstz@gmail.com>
Co-authored-by: Dianjin Wang <wangdianjin@gmail.com>
@chenjinbao1989 chenjinbao1989 changed the title Merge tag 'REL_16_9' into Cloudberry [test] Merge tag 'REL_16_9' into Cloudberry May 21, 2026
@chenjinbao1989 chenjinbao1989 changed the title [test] Merge tag 'REL_16_9' into Cloudberry Merge tag 'REL_16_9' into Cloudberry May 22, 2026
@yjhjstz
Member

yjhjstz commented May 23, 2026

@tuhaihe
Member

tuhaihe commented May 25, 2026

Great!

Have the related extensions been tested?

I think adopting PG16 support for them should be follow-up work.

Comment thread .github/workflows/apache-rat-audit.yml Outdated
Comment thread .github/workflows/pg16-merge-validation.yml Outdated
Comment thread GNUmakefile.in Outdated
Comment thread .cirrus.star Outdated
@my-ship-it
Contributor

Great work!! Thanks @chenjinbao1989 and @lss602726449 !!!

Contributor

@avamingli avamingli left a comment

Thanks~ LGTM.

- Remove Cirrus CI configuration files (.cirrus.star, .cirrus.tasks.yml,
  .cirrus.yml) as the project no longer uses Cirrus CI
- Remove pg16-merge-validation GitHub Actions workflow, no longer needed
- Update apache-rat-audit branch references from cbdb-postgres-merge
  to REL_2_STABLE
- Re-enable contrib modules in GNUmakefile.in that were commented out
  during the PG16 merge: auto_explain, formatter_fixedwidth,
  fuzzystrmatch, dblink, indexscan, hstore, pgcrypto, btree_gin,
  pg_trgm, tablefunc, passwordcheck, pg_buffercache

The Cirrus CI files were removed in the previous commit, so their
Apache Rat license check exclusions in pom.xml are no longer needed.

Wrap the DELETE FROM x1 WHERE f_leak(b) statement in start_ignore/
end_ignore because PAX does not support TupleFetchRowVersion, and
the NOTICE messages emitted before the ERROR are non-deterministic
depending on how many rows are processed before hitting the error.

Change gpstop -rai (immediate) to gpstop -raf (fast) in the
resgroup_auxiliary_tools_v2 test setup. Immediate shutdown may
cause restart failures in CI environments, preventing the resource
manager from being switched to group-v2 and causing all subsequent
resgroup tests to fail.
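
The start_ignore/end_ignore wrapping mentioned in the commit messages above can be pictured with a small filter. This is only a sketch: it assumes the ignore markers appear as SQL comments at the start of a line, and it mimics the GP_IGNORE: prefix that shows up in the regression diffs later in this thread (where diff is invoked with -I GP_IGNORE:). The function name mask_ignored is hypothetical and is not part of the Cloudberry test tooling.

```shell
#!/bin/sh
# Sketch: prefix every line between "-- start_ignore" and "-- end_ignore"
# (markers included) with "GP_IGNORE:", so that a later
# "diff -I GP_IGNORE:" can skip differences confined to those lines.
# Reads test output on stdin, writes the masked output to stdout.
mask_ignored() {
    sed '/^-- start_ignore/,/^-- end_ignore/s/^/GP_IGNORE:/'
}

# Hypothetical usage:
#   mask_ignored < results/some_test.out > /tmp/some_test.masked
```

With the non-deterministic NOTICE lines inside such a block, two runs that differ only in how many NOTICEs were printed should no longer produce a reportable difference.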
Comment thread .github/workflows/build-cloudberry-rocky8.yml Outdated
Comment thread .github/workflows/build-cloudberry-rocky8.yml Outdated
Comment thread .github/workflows/build-cloudberry.yml Outdated
Comment thread .github/workflows/build-cloudberry.yml Outdated
Comment thread .github/workflows/build-deb-cloudberry.yml Outdated
Comment thread .github/workflows/build-deb-cloudberry.yml Outdated
Contributor

@leborchuk leborchuk left a comment

LGTM

@tuhaihe
Member

tuhaihe commented May 26, 2026

When running autoconf to see if any changes were made to the configure file, I found some weird changes in it:

-#
-# gp_stats_collector
-#
-
-
-
-# Check whether --with-gp-stats-collector was given.
-if test "${with_gp_stats_collector+set}" = set; then :
-  withval=$with_gp_stats_collector;
-  case $withval in
-    yes)
-      :
-      ;;
-    no)
-      :
-      ;;
-    *)
-      as_fn_error $? "no argument expected for --with-gp-stats-collector option" "$LINENO" 5
-      ;;
-  esac
-
-else
-  with_gp_stats_collector=no
-
-fi
-
-#

Env: autoconf can be installed (sudo dnf install autoconf -y) and run in the Cloudberry Rocky 8 dev image: docker run --name cbdb-dev -it --rm -h cdw --shm-size=2gb apache/incubator-cloudberry:cbdb-build-rocky8-latest. FYI.
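
A check like the one described above could be scripted roughly as follows. This is a sketch under assumptions: the function name configure_drift and the /tmp path are hypothetical, and the caller is expected to have regenerated configure into a separate file first (autoconf's -o option writes the output to a named file).

```shell
#!/bin/sh
# Sketch: report whether a regenerated configure script differs from the
# committed one. Both file paths are supplied by the caller.
configure_drift() {
    committed=$1
    regenerated=$2
    if cmp -s "$committed" "$regenerated"; then
        echo "configure is up to date"
        return 0
    fi
    echo "configure differs from regenerated output:"
    # Drop the two header lines of the unified diff (they carry timestamps).
    diff -u "$committed" "$regenerated" | sed -n '3,$p'
    return 1
}

# Hypothetical usage from the source root (paths are assumptions):
#   autoconf -o /tmp/configure.new
#   configure_drift configure /tmp/configure.new
```

A non-zero return value signals that the committed configure no longer matches configure.ac and should be regenerated.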

Comment thread src/backend/storage/file/fd.c Outdated
@tuhaihe
Member

tuhaihe commented May 26, 2026

When creating a demo cluster, some warnings were printed, FYI:

  • Update: running make destroy-demo-cluster --directory=~/cloudberry produces the same warnings.
make: Entering directory '/home/gpadmin/cloudberry'
GNUmakefile:206: warning: overriding recipe for target 'check-world-src/test-recurse'
GNUmakefile:176: warning: ignoring old recipe for target 'check-world-src/test-recurse'
GNUmakefile:206: warning: overriding recipe for target 'check-world-src/pl-recurse'
GNUmakefile:176: warning: ignoring old recipe for target 'check-world-src/pl-recurse'
GNUmakefile:206: warning: overriding recipe for target 'check-world-contrib-recurse'
GNUmakefile:176: warning: ignoring old recipe for target 'check-world-contrib-recurse'
GNUmakefile:206: warning: overriding recipe for target 'check-world-src/bin-recurse'
GNUmakefile:176: warning: ignoring old recipe for target 'check-world-src/bin-recurse'
GNUmakefile:207: warning: overriding recipe for target 'checkprep-src/test-recurse'
GNUmakefile:177: warning: ignoring old recipe for target 'checkprep-src/test-recurse'
GNUmakefile:207: warning: overriding recipe for target 'checkprep-src/pl-recurse'
GNUmakefile:177: warning: ignoring old recipe for target 'checkprep-src/pl-recurse'
GNUmakefile:207: warning: overriding recipe for target 'checkprep-contrib-recurse'
GNUmakefile:177: warning: ignoring old recipe for target 'checkprep-contrib-recurse'
GNUmakefile:207: warning: overriding recipe for target 'checkprep-src/bin-recurse'
GNUmakefile:177: warning: ignoring old recipe for target 'checkprep-src/bin-recurse'
GNUmakefile:209: warning: overriding recipe for target 'installcheck-world-src/test-recurse'
GNUmakefile:204: warning: ignoring old recipe for target 'installcheck-world-src/test-recurse'
GNUmakefile:209: warning: overriding recipe for target 'installcheck-world-src/pl-recurse'
GNUmakefile:204: warning: ignoring old recipe for target 'installcheck-world-src/pl-recurse'
GNUmakefile:209: warning: overriding recipe for target 'installcheck-world-src/bin-recurse'
GNUmakefile:204: warning: ignoring old recipe for target 'installcheck-world-src/bin-recurse'
make -C gpAux/gpdemo create-demo-cluster
make[1]: Entering directory '/home/gpadmin/cloudberry/gpAux/gpdemo'

I can create the demo cluster successfully, but removing these warnings would give users a better experience. An AI assistant suggested that this can be fixed in GNUmakefile.in.
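
To see which targets are affected without reading every warning line, the make output above can be condensed. The following is a minimal sketch; the function name overridden_targets is hypothetical, and the pattern assumes the warning wording shown in the log above.

```shell
#!/bin/sh
# Sketch: list each target whose recipe make reports as overridden,
# together with the makefile line that overrides it.
# Reads make's combined output on stdin.
overridden_targets() {
    # The "." before and after the target name matches the quote character,
    # whichever style the installed make version prints.
    sed -n 's/^\([^:]*\):\([0-9]*\): warning: overriding recipe for target .\(.*\).$/\3 (\1:\2)/p' |
        sort -u
}

# Hypothetical usage:
#   make create-demo-cluster 2>&1 | overridden_targets
```

Each output line names one target defined twice, which points at the duplicate rule definitions to remove in GNUmakefile.in.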

@tuhaihe
Member

tuhaihe commented May 26, 2026

Hi @lss602726449 @chenjinbao1989, since this PR includes 100+ commits, we need to merge it via the CLI. Once it's ready, please let me know. I can help with this.

We can take this as a reference: https://github.com/apache/cloudberry/wiki/Rebase-and-merge.

@tuhaihe
Member

tuhaihe commented May 26, 2026

Some tests failed when running make installcheck in the gpdemo environment on Rocky Linux 8 and 9, FYI:

  • OS: Rocky 8 + 9
  • Env: gpdemo
  • Cmd: make installcheck
[gpadmin@cdw cloudberry]$ cat src/test/regress/regression.diffs
diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /home/gpadmin/cloudberry/src/test/regress/expected/cte_prune_optimizer.out /home/gpadmin/cloudberry/src/test/regress/results/cte_prune.out
--- /home/gpadmin/cloudberry/src/test/regress/expected/cte_prune_optimizer.out	2026-05-26 02:23:30.040989482 -0700
+++ /home/gpadmin/cloudberry/src/test/regress/results/cte_prune.out	2026-05-26 02:23:30.124990517 -0700
@@ -2259,7 +2259,7 @@
    ->  Sequence  (cost=0.00..2161.00 rows=1 width=4)
    Output: t1.v1
  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..2161.00 rows=1 width=4)
-(55 rows)
+(54 rows)

 -- sql 95
 explain verbose with ws_wh as
@@ -2312,7 +2312,7 @@
    ->  Sequence  (cost=0.00..1730.00 rows=1 width=12)
    Output: t1.v1, t1.v2, t1.v3
  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1730.00 rows=1 width=12)
-(44 rows)
+(43 rows)

 explain verbose with ws_wh as
 (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
@@ -2366,7 +2366,7 @@
    ->  Sequence  (cost=0.00..1730.00 rows=1 width=12)
    Output: t1.v1, t1.v2, t1.v3
  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1730.00 rows=1 width=12)
-(46 rows)
+(45 rows)

 GP_IGNORE:-- start_ignore
 GP_IGNORE:drop table tpcds_store_sales;
@@ -2488,7 +2488,7 @@
    Merge Key: t4_1.c, t4_1.d, (avg(share0_ref3.b) OVER (?)), (sum(share0_ref2.d) OVER (?))
    Output: t4_1.c, t4_1.d, (avg(share0_ref3.b) OVER (?)), (sum(share0_ref2.d) OVER (?))
  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1356697001.21 rows=10 width=24)
-(93 rows)
+(92 rows)

 WITH t(a,b,d) AS
 (
@@ -2783,7 +2783,7 @@
    Merge Key: share0_ref2.name, share0_ref3.language
    Output: share0_ref3.code, share0_ref3.name, share0_ref3.name_1, share0_ref3.language, share0_ref3.isofficial, share0_ref3.percentage, share0_ref2.code, share0_ref2.name, share0_ref2.language, share0_ref2.isofficial, share0_ref2.percentage
  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1736.01 rows=1 width=66)
-(74 rows)
+(73 rows)

 -- CTE in the main query and subqueries within the main query
 explain verbose with bad_headofstates as
@@ -2882,7 +2882,7 @@
    ->  Shared Scan (share slice:id 0:0)  (cost=0.00..437.00 rows=1 width=1)
    Output: (avg(country_1.population)), country_1.region, share0_ref2.headofstate
  Sequence  (cost=0.00..1748.00 rows=1 width=24)
-(77 rows)
+(76 rows)

 GP_IGNORE:-- start_ignore
 GP_IGNORE:drop table city;
diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /home/gpadmin/cloudberry/src/test/regress/expected/explain_optimizer.out /home/gpadmin/cloudberry/src/test/regress/results/explain.out
--- /home/gpadmin/cloudberry/src/test/regress/expected/explain_optimizer.out	2026-05-26 02:23:45.974185626 -0700
+++ /home/gpadmin/cloudberry/src/test/regress/results/explain.out	2026-05-26 02:23:46.010186069 -0700
@@ -465,7 +466,6 @@
          "Settings": {                                      +
              "jit": "off",                                  +
              "Optimizer": "GPORCA",                         +
-             "optimizer": "on",                             +
              "enable_parallel": "off",                      +
              "parallel_setup_cost": "0",                    +
              "parallel_tuple_cost": "0",                    +

If these are not bugs, please ignore them.
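
For a quick overview of which tests differ in a regression.diffs file like the one above, the result file names can be pulled out of the diff headers. This is a sketch only: the function name failed_tests is hypothetical, and it assumes result files live under a results/ directory and end in .out, as in the paths shown above.

```shell
#!/bin/sh
# Sketch: print the name of every test that appears in regression.diffs,
# taken from the "+++ .../results/<name>.out" header lines.
# Reads regression.diffs on stdin.
failed_tests() {
    sed -n 's|^+++ .*/results/\([^/[:space:]]*\)\.out.*$|\1|p' | sort -u
}

# Hypothetical usage:
#   failed_tests < src/test/regress/regression.diffs
```

For the output pasted above this would list cte_prune and explain, the two tests whose results differ from the expected _optimizer.out files.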

lss602726449 and others added 3 commits May 26, 2026 18:24
- Remove duplicate recurse calls in GNUmakefile.in that caused
  "overriding recipe" warnings, and merge src/tools/pg_bsd_indent
  into Cloudberry's check-world target
- Fix --with-gp_stats_collector option name to use hyphens
  (--with-gp-stats-collector) matching autoconf convention
- Regenerate configure with autoconf
- Fix flaky resgroup_cancel_terminate_concurrency test by adding
  pg_sleep before DROP ROLE to wait for temp table cleanup
- Fix flaky resgroup_dumpinfo test by adding pg_sleep before
  dump_test_check to wait for wait queue state to stabilize
- Update CI workflow branch refs from cbdb-postgres-merge to
  REL_2_STABLE
- Remove residual git conflict marker in 013_partition.pl
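
The option-name fix listed above (underscores replaced by hyphens) suggests a simple scan for similar cases. The following is a sketch, not part of the build system: the function name underscore_options is hypothetical, and it only looks for --with-* and --enable-* option names that contain an underscore in whatever text it is given.

```shell
#!/bin/sh
# Sketch: find --with-*/--enable-* option names that contain an underscore.
# Reads any text (for example a configure script) on stdin.
underscore_options() {
    grep -E -o -e '--(with|enable)-[A-Za-z0-9_-]+' | grep '_' | sort -u
}

# Hypothetical usage from the source root:
#   underscore_options < configure
```

An empty result means every scanned option name already uses hyphens only.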