Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT SQUASH MERGE] Community Merge for 3_8 stable #498

Conversation

roshan0708
Copy link
Contributor

@roshan0708 roshan0708 commented Dec 9, 2024

Description

Merge community commits till stamp 15.10.
Extension PR: babelfish-for-postgresql/babelfish_extensions#3223

Issues Resolved

Task: BABEL-5435

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the PostgreSQL license, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

hlinnaka and others added 30 commits December 9, 2024 13:07
Replace a static scratch buffer with a local variable, because a
static buffer makes the function not thread-safe. This function is
used in client-code in libpq, so it needs to be thread-safe. It was
until commit b67b57a, which replaced the implementation with the
one from pgcrypto.

Backpatch to v14, where we switched to the new implementation.

Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://www.postgresql.org/message-id/[email protected]
(cherry picked from commit a38f5f880d1e86fedf1616de5391df5f0d5e4239)
(cherry picked from commit 5d0eadb)
If the plancache entry for the CALL statement is already stale,
it's possible for us to fetch an old procedure OID out of it,
and then fail with "cache lookup failed for function NNN".
In ordinary usage this never happens because make_callstmt_target
is called just once immediately after building the plancache
entry.  It can be forced however by setting up an erroneous CALL
(that causes make_callstmt_target itself to report an error),
then dropping/recreating the target procedure, then repeating
the erroneous CALL.

To fix, use SPI_plan_get_cached_plan() to fetch the plancache's
plan, rather than assuming we can use SPI_plan_get_plan_sources().
This shouldn't add any noticeable overhead in the normal case,
and in the stale-plan case we'd have had to replan anyway a little
further down.

The other callers of SPI_plan_get_plan_sources() seem OK, because
either they don't need up-to-date plans or they know that the
query was just (re) planned.  But add some commentary in hopes
of not falling into this trap again.

Per bug #18574 from Song Hongyu.  Back-patch to v14 where this coding
was introduced.  (Older branches have comparable code, but it's run
after any required replanning, so there's no issue.)

Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit de35207015567da892d825a704933e6681347f19)
(cherry picked from commit f119cd3)
Commit 0b9466f added a dependency on fe_memutils' pnstrdup() inside
informix.c.  This adds an exit() path in a library, which we don't
want.  (Unlike libpq, the ecpg libraries don't have an automated check
for that, but it makes sense to keep them to a similar standard.)  The
ecpg code can already handle failure results from the *strdup() call
by itself.

Author: Jacob Champion <[email protected]>
Discussion: https://www.postgresql.org/message-id/CAOYmi+=pg=W5L1h=3MEP_EB24jaBu2FyATrLXqQHGe7cpuvwyg@mail.gmail.com
(cherry picked from commit 2de129b356bfde9a02269c174396add065ace260)
(cherry picked from commit 25e4983)
getTimelineHistory() is called twice, to read the source and the
target timeline history files. However, the loop to print the file
with the --debug option used the wrong variable when dealing with the
source. As a result, the source's history was always printed as empty.

Spotted while debugging bug #18575, but this does not fix that bug,
just the debugging output. Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/[email protected]
(cherry picked from commit b5a5027c9796e2958392f4682928d47b5b0d0e47)
(cherry picked from commit 5f99767)
Trying to attach a table as a partition which is already on the
referenced side of a foreign key on the partitioned table that it is
being attached to, leads to strange behavior: we try to clone the
foreign key from the parent to the partition, but this new FK points to
the partition itself, and the mix of pg_constraint rows and triggers
doesn't behave well.

Rather than trying to untangle the mess (which might be possible given
sufficient time), I opted to forbid the ATTACH.  This doesn't seem a
problematic restriction, given that we already fail to create the
foreign key if you do it the other way around, that is, having the
partition first and the FK second.

Backpatch to all supported branches.

Reported-by: Alexander Lakhin <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit 8c0944ac878d3d3e433da8feb2f7e134be8353f3)
(cherry picked from commit 30c4133)
To deparse a reference to a field of a RECORD-type output of a
subquery, EXPLAIN normally digs down into the subquery's plan to try
to discover exactly which anonymous RECORD type is meant.  However,
this can fail if the subquery has been optimized out of the plan
altogether on the grounds that no rows could pass the WHERE quals,
which has been possible at least since 3fc6e2d.  There isn't
anything remaining in the plan tree that would help us, so fall back
to printing the field name as "fN" for the N'th column of the record.
(This will actually be the right thing some of the time, since it
matches the column names we assign to RowExprs.)

In passing, fix a comment typo in create_projection_plan, which
I noticed while experimenting with an alternative fix for this.

Per bug #18576 from Vasya B.  Back-patch to all supported branches.

Richard Guo and Tom Lane

Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit 12010f4146156ae276e5e606400ed9ce39a83e0b)
(cherry picked from commit 164de03)
This section claims we use CRC-32 for WAL records and two-phase
state files, but we've actually used CRC-32C since v9.5 (commit
5028f22).  Fix that.

Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/ZrUFpLP-w2zTAHqq%40nathan
Backpatch-through: 12
(cherry picked from commit 75bb3354e6da6cceae9756d2e3c473088e554b16)
(cherry picked from commit f44e4ba)
The code intends to allow GUCs to be set within parallel workers
via function SET clauses, but not otherwise.  However, doing so fails
for "session_authorization" and "role", because the assign hooks for
those attempt to set the subsidiary "is_superuser" GUC, and that call
falls foul of the "not otherwise" prohibition.  We can't switch to
using GUC_ACTION_SAVE for this, so instead add a new GUC variable
flag GUC_ALLOW_IN_PARALLEL to mark is_superuser as being safe to set
anyway.  (This is okay because is_superuser has context PGC_INTERNAL
and thus only hard-wired calls can change it.  We'd need more thought
before applying the flag to other GUCs; but maybe there are other
use-cases.)  This isn't the prettiest fix perhaps, but other
alternatives we thought of would be much more invasive.

While here, correct a thinko in commit 059de3ca4: when rejecting
a GUC setting within a parallel worker, we should return 0 not -1
if the ereport doesn't longjmp.  (This seems to have no consequences
right now because no caller cares, but it's inconsistent.)  Improve
the comments to try to forestall future confusion of the same kind.

Despite the lack of field complaints, this seems worth back-patching.
Thanks to Nathan Bossart for the idea to invent a new flag,
and for review.

Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit 2f4e895be76044fc09bdf96617043c94a12079e7)
(cherry picked from commit 6025dba)
Coverity thinks dpns->plan could be null at these points.  That
shouldn't really be possible, but it's easy enough to modify the
Asserts so they'd not core-dump if it were true.

These are new in b919a97a6.  Back-patch to v13; the v12 version
of the patch didn't have these Asserts.

(cherry picked from commit 16e67bc5f98f2f5691fb5c01d5a8310de1c62b81)
(cherry picked from commit ffdeeba)
If a partition undergoes DETACH CONCURRENTLY immediately followed by
DROP, this could cause a problem for a concurrent transaction
recomputing the partition descriptor when running a prepared statement,
because it tries to dereference a pointer to a tuple that's not found in
a catalog scan.

The existing retry logic added in commit dbca3469ebf8 is sufficient to
cope with the overall problem, provided we don't try to dereference a
non-existant heap tuple.

Arguably, the code in RelationBuildPartitionDesc() has been wrong all
along, since no check was added in commit 898e5e3 against receiving
a NULL tuple from the catalog scan; that bug has only become
user-visible with DETACH CONCURRENTLY which was added in branch 14.
Therefore, even though there's no known mechanism to cause a crash
because of this, backpatch the addition of such a check to all supported
branches.  In branches prior to 14, this would cause the code to fail
with a "missing relpartbound for relation XYZ" error instead of
crashing; that's okay, because there are no reports of such behavior
anyway.

Author: Kuntal Ghosh <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit 305db95434c9f68811f2489ec57d5a0435608c46)
(cherry picked from commit ba3226a)
Change "$1" to "username".

Reported-by: [email protected]

Discussion: https://postgr.es/m/[email protected]

Backpatch-through: 12
(cherry picked from commit e8f36414afac579f32c06573344c8ff4d854b59d)
(cherry picked from commit 3e9fd85)
Commit c66a7d75e652 modified DROP DATABASE so that if interrupted, the
database is known to be in an invalid state and can only be dropped.
This is done by setting a flag using an in-place update, so that it's
not lost in case of rollback.

For databases with many ACLs, this may however fail like this:

  ERROR:  wrong tuple length

This happens because with many ACLs, the pg_database.datacl attribute
gets TOASTed. The dropdb() code reads the tuple from the syscache, which
means it's detoasted. But the in-place update expects the tuple length
to match the on-disk tuple.

Fixed by reading the tuple from the catalog directly, not from syscache.

Report and fix by Ayush Tiwari. Backpatch to 12. The DROP DATABASE fix
was backpatched to 11, but 11 is EOL at this point.

Reported-by: Ayush Tiwari
Author: Ayush Tiwari
Reviewed-by: Tomas Vondra
Backpatch-through: 12
Discussion: https://postgr.es/m/CAJTYsWWNkCt+-UnMhg=BiCD3Mh8c2JdHLofPxsW3m2dkDFw8RA@mail.gmail.com
(cherry picked from commit df9c5fb5837fed76e2c80daa03a2b0443983614b)
(cherry picked from commit 49da0d0)
MacPorts version 2.9.3 started failing in our ci_macports_packages.sh
script, for reasons not fully determined, but plausibly linked to the
release of 2.10.1.  2.10.1 seems to work, so let's switch to it.

Back-patch to 15, where CI began.

Reported-by: Peter Eisentraut <[email protected]>
Discussion: https://postgr.es/m/81f104e8-f0a9-43c0-85bd-2bbbf590a5b8%40eisentraut.org
(cherry picked from commit 4247575c637b4d049346b96514ac9ca3363276d5)
(cherry picked from commit 7b9dd52)
Commit ca051d8 called newlocale(LC_COLLATE, ...) instead of
newlocale(LC_COLLATE_MASK, ...), in code reached only on FreeBSD.  They
have the same value on that OS, explaining why it worked.  Fix.

Back-patch to 14, where ca051d8 landed.

(cherry picked from commit c1bb534bae431f4bbc50abf471507457e1b24efc)
(cherry picked from commit e481c91)
Commit 274bbced disabled session tickets for TLSv1.3 on top of the
already disabled TLSv1.2 session tickets, but accidentally caused
a regression where TLSv1.2 session tickets were incorrectly sent.
Fix by unconditionally disabling TLSv1.2 session tickets and only
disable TLSv1.3 tickets when the right version of OpenSSL is used.

Backpatch to all supported branches.

Reported-by: Cameron Vogt <[email protected]>
Reported-by: Fire Emerald <[email protected]>
Reviewed-by: Jacob Champion <[email protected]>
Discussion: https://postgr.es/m/DM6PR16MB3145CF62857226F350C710D1AB852@DM6PR16MB3145.namprd16.prod.outlook.com
Backpatch-through: v12
(cherry picked from commit 23c200940eae6e7d9cf5712c3514691bfdaf3904)
(cherry picked from commit c0dfbe3)
Add a comment explaining dropdb() can't rely on syscache. The issue with
flattened rows was fixed by commit 0f92b230f88b, but better to have
a clear explanation why the systable scan is necessary. The other places
doing in-place updates on pg_database have the same comment.

Suggestion and patch by Yugo Nagata. Backpatch to 12, same as the fix.

Author: Yugo Nagata
Backpatch-through: 12
Discussion: https://postgr.es/m/CAJTYsWWNkCt+-UnMhg=BiCD3Mh8c2JdHLofPxsW3m2dkDFw8RA@mail.gmail.com
(cherry picked from commit e498d22e21358e196c1db9b5f1300eae667ea2bd)
(cherry picked from commit 4651b01)
When a partition is detached and immediately dropped, a prepared
statement could try to compute a new partition descriptor that includes
it.  This leads to this kind of error:
ERROR:  could not open relation with OID 457639

Avoid this by skipping the partition in expand_partitioned_rtentry if it
doesn't exist.

Noted by me while investigating bug #18559.  Kuntal Gosh helped to
identify the exact failure.

Backpatch to 14, where DETACH CONCURRENTLY was introduced.

Author: Álvaro Herrera <[email protected]>
Reviewed-by: Kuntal Ghosh <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit be73e7008558cef1f59b0f4a61cefb6ec53c45c5)
(cherry picked from commit 320c5bc)
Document the hard limit stemming from the size of an OID, and also
mention the perfomance impact that occurs before the hard limit
is reached.

Jakub Wartak and Robert Haas
Backpatch to all supported versions

Discussion: https://postgr.es/m/CAKZiRmwWhp2yxjqJLwbBjHdfbJBcUmmKMNAZyBjjtpgM9AMatQ%40mail.gmail.com
(cherry picked from commit e51160fa0bab7203e37094fbb0349b0c324b09e7)
(cherry picked from commit 1d3ffda)
The descriptions for ProcArrayGroupUpdate and XactGroupUpdate claim
that these events mean we are waiting for the group leader "at end
of a parallel operation," but neither pertains to parallel
operations.  This commit reverts these descriptions to their
wording before commit 3048898, i.e., "end of a parallel
operation" is changed to "transaction end."

Author: Sameer Kumar
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAGPeHmh6UMrKQHKCmX%2B5vV5TH9P%3DKw9en3k68qEem6J%3DyrZPUA%40mail.gmail.com
Backpatch-through: 13
(cherry picked from commit 5d9170697e4e6dd282b015890bc29baf978c6cb5)
(cherry picked from commit 8c47f98)
This does not make sense.  It would write the output of the USING
clause into the converted column, which would violate the generation
expression.  This adds a check to error out if this is specified.

There was a test for this, but that test errored out for a different
reason, so it was not effective.

Reported-by: Jian He <[email protected]>
Reviewed-by: Yugo NAGATA <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/c7083982-69f4-4b14-8315-f9ddb20b9834%40eisentraut.org
(cherry picked from commit cf49a606c416634fd6af82eb559331fd6052835d)
(cherry picked from commit 7a1f030)
This change improves the description of the
restrict_nonsystem_relation_kind parameter in guc_table.c and the
documentation for better clarity.

Backpatch to 12, where this GUC parameter was introduced.

Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/6a96f1af-22b4-4a80-8161-1f26606b9ee2%40eisentraut.org
Backpatch-through: 12
(cherry picked from commit 886549fb105f0c3d6a24064a9ddaf4e8602e0382)
(cherry picked from commit 2e5c0e9)
The first test was sensitive to the insert LSN after setting up the
catalogs, which depended on environmental things like the locales on the
OS and usernames.  Switch to a new WAL file before the first test, as a
simple way to put every computer into the same state.

Back-patch to all supported releases.

Reported-by: Anton Voloshin <[email protected]>
Reported-by: Nathan Bossart <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Discussion: https://postgr.es/m/b26aeac2-cb6d-4633-a7ea-945baae83dcf%40postgrespro.ru
(cherry picked from commit 777f50b9b5822d1058c59fc2ee1d1a7e9c766bf8)
(cherry picked from commit 19abfd7)
Since commit 2549f06, we reject an identifier immediately following
a numeric literal (without separating whitespace), because that risks
ambiguity with hex/octal/binary integers.  However, that patch used
token patterns like "{integer}{ident_start}", which is problematic
because {ident_start} matches only a single byte.  If the first
character after the integer is a multibyte character, this ends up
with flex reporting an error message that includes a partial multibyte
character.  That can cause assorted bad-encoding problems downstream,
both in the report to the client and in the postmaster log file.

To fix, use {identifier} not {ident_start} in the "junk" token
patterns, so that they will match complete multibyte characters.
This seems generally better user experience quite aside from the
encoding problem: for "123abc" the error message will now say that
the error appeared at or near "123abc" instead of "123a".

While at it, add some commentary about why these patterns exist
and how they work.

Report and patch by Karina Litskevich; review by Pavel Borisov.
Back-patch to v15 where the problem came in.

Discussion: https://postgr.es/m/CACiT8iZ_diop=0zJ7zuY3BXegJpkKK1Av-PU7xh0EDYHsa5+=g@mail.gmail.com
(cherry picked from commit f37ac613a835c8ff28a2f23abe14c88fbac8b039)
(cherry picked from commit 8c1c2ba)
…essions

As introduced by f9900df, a REINDEX CONCURRENTLY job done for an
index with predicates or expressions would set PROC_IN_SAFE_IC in its
MyProc->statusFlags, causing it to be ignored by other concurrent
operations.

Such concurrent index rebuilds should never be ignored, as a predicate
or an expression could call a user-defined function that accesses a
different table than the table where the index is rebuilt.

A test that uses injection points is added, backpatched down to 17.
Michail has proposed a different test, but I have added something
simpler with more coverage.

Oversight in f9900df.

Author: Michail Nikolaev
Discussion: https://postgr.es/m/CANtu0oj9A3kZVduFTG0vrmGnKB+DCHgEpzOp0qAyOgmks84j0w@mail.gmail.com
Backpatch-through: 14
(cherry picked from commit 239837a708cc2c6f27d96ddb3faf89691f2a22cd)
(cherry picked from commit e54e012)
check_agglevels_and_constraints() asserted that if we find an
aggregate function in an EXPR_KIND_FROM_SUBSELECT expression, the
expression must be in a LATERAL subquery.  Alexander Lakhin found a
case where that's not so: because of the odd scoping rules for NEW/OLD
within a rule, a reference to NEW/OLD could cause an aggregate to be
considered top-level even though it's in an unmarked sub-select.
The error message that would be thrown seems sufficiently on-point,
so just remove the Assert.  (Hence, this is not a bug for production
builds.)

This Assert was added by me in commit eaccfde (9.3 era).  It looks
like I put it in to cross-check that the new logic for detecting
misplaced aggregates (using agglevelsup) caught the same cases that a
previous check on p_lateral_active did.  So there might have been some
related misbehavior before eaccfde ... but that's very ancient
history by now, so I didn't dig any deeper.

Per bug #18608 from Alexander Lakhin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit 78d0bd452c78b17a788d2f15dc4e28bd3d3159c0)
(cherry picked from commit f8f7741)
Commit 4b82664 restricted a number of functions provided by
contrib modules to only relations that use the "heap" table access
method.  Sequences always use this table access method, but they do
not advertise as such in the pg_class system catalog, so the
aforementioned commit also (presumably unintentionally) removed
support for sequences from some of these functions.  This commit
reintroduces said support for sequences to these functions and adds
a couple of relevant tests.

Co-authored-by: Ayush Vatsa
Reviewed-by: Robert Haas, Michael Paquier, Matthias van de Meent
Discussion: https://postgr.es/m/CACX%2BKaP3i%2Bi9tdPLjF5JCHVv93xobEdcd_eB%2B638VDvZ3i%3DcQA%40mail.gmail.com
Backpatch-through: 12
(cherry picked from commit e03042a7003dcf0e966f0c6ea3ecf128455bf2b3)
(cherry picked from commit 9c0e0f9)
I managed to break this test in two different ways in commit
05036a3155.

First, the output of the new call to tuple_data_split() on the test
sequence is dependent on endianness.  This is fixed by setting a
special start value for the test sequence that produces the same
output regardless of the endianness of the machine.

Second, on versions older than v15, the new test case fails under
"force_parallel_mode = regress" with the following error:

	ERROR:  cannot access temporary tables during a parallel operation

This is because pageinspect's disk-accessing functions are
incorrectly marked PARALLEL SAFE on versions older than v15 (see
commit aeaaf520f4 for details).  This one is fixed by changing the
test sequence to be permanent.  The only reason it was previously
marked temporary was to avoid needing a DROP SEQUENCE command at
the end of the test.  Unlike some other tests in this file, the use
of a permanent sequence here shouldn't result in any test
instability like what was fixed by commit e2933a6e11.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/ZuOKOut5hhDlf_bP%40nathan
Backpatch-through: 12
(cherry picked from commit a63aef5e496c1db0325845892d946e8569e7f9d6)
(cherry picked from commit 9909c62)
When we are building a hash index that is large enough to need
pre-sorting (larger than either maintenance_work_mem or NBuffers),
the initial sorting phase is interruptible, but the insertion
phase wasn't.  Add the missing CHECK_FOR_INTERRUPTS().

Per bug #18616 from Alexander Lakhin.  Back-patch to all
supported branches.

Pavel Borisov

Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit e0857898b87b745235e65c28feed6e23e121876d)
(cherry picked from commit 6c4dad4)
Latest versions of Strawberry Perl define USE_THREAD_SAFE_LOCALE, and we
therefore get a handshake error when building against such instances.
The solution is to perform a test to see if USE_THREAD_SAFE_LOCALE is
defined and only define NO_THREAD_SAFE_LOCALE if it isn't.

Backpatch the meson.build fix back to release 16 and apply the same
logic to Mkvcbuild.pm in releases 12 through 16.

Original report of the issue from Muralikrishna Bandaru.

(cherry picked from commit 17c35ab236980fce2989f7fac7cee42ca4d5ca04)
(cherry picked from commit dcb5c51)
Historically we've used timezone "PST8PDT", but the recent release
2024b of tzdb changes the definition of that zone in a way that
breaks many test cases concerned with dates before 1970.  Although
we've not yet adopted 2024b into our own tree, this is already
problematic for people using --with-system-tzdata if their platform
has already adopted 2024b.  To work with both older and newer
versions of tzdb, switch to using "America/Los_Angeles", accepting
the ensuing changes in regression test results.

Back-patch to all supported branches.

Per report and patch from Wolfgang Walther.

Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit 2b94ee58bf256e1b4682f21f13d0f2c73e1b0cc2)
(cherry picked from commit 516a630)
michaelpq and others added 25 commits December 9, 2024 13:07
This commit changes libpq so that errors reported by the backend during
the protocol negotiation for SSL and GSS are discarded by the client, as
these may include bytes that could be consumed by the client and write
arbitrary bytes to a client's terminal.

A failure with the SSL negotiation now leads to an error immediately
reported, without a retry on any other methods allowed, like a fallback
to a plaintext connection.

A failure with GSS discards the error message received, and we allow a
fallback as it may be possible that the error is caused by a connection
attempt with a pre-11 server, GSS encryption having been introduced in
v12.  This was a problem only with v17 and newer versions; older
versions discard the error message already in this case, assuming a
failure caused by a lack of support for GSS encryption.

Author: Jacob Champion
Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Michael Paquier
Security: CVE-2024-10977
Backpatch-through: 12
(cherry picked from commit d2c3e31c13a6820980c2c6019f0b8f9f0b63ae6e)
(cherry picked from commit 9186308)
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: ecbca81dbf801f683e24897668cec8d1fb0f55a5
(cherry picked from commit 3f2c24e55b7ecfd06a4b65b9a52f74f27b580046)
(cherry picked from commit e4000ed)
Many process environment variables (e.g. PATH), bypass the containment
expected of a trusted PL.  Hence, trusted PLs must not offer features
that achieve setenv().  Otherwise, an attacker having USAGE privilege on
the language often can achieve arbitrary code execution, even if the
attacker lacks a database server operating system user.

To fix PL/Perl, replace trusted PL/Perl %ENV with a tied hash that just
replaces each modification attempt with a warning.  Sites that reach
these warnings should evaluate the application-specific implications of
proceeding without the environment modification:

  Can the application reasonably proceed without the modification?

    If no, switch to plperlu or another approach.

    If yes, the application should change the code to stop attempting
    environment modifications.  If that's too difficult, add "untie
    %main::ENV" in any code executed before the warning.  For example,
    one might add it to the start of the affected function or even to
    the plperl.on_plperl_init setting.

In passing, link to Perl's guidance about the Perl features behind the
security posture of PL/Perl.

Back-patch to v12 (all supported versions).

Andrew Dunstan and Noah Misch

Security: CVE-2024-10979
(cherry picked from commit e530835c6cc5b2dbf330ebe6b0a7fb9f19f5a54c)
(cherry picked from commit 713085a)
If a CTE, subquery, sublink, security invoker view, or coercion
projection references a table with row-level security policies, we
neglected to mark the plan as potentially dependent on which role
is executing it.  This could lead to later executions in the same
session returning or hiding rows that should have been hidden or
returned instead.

Reported-by: Wolfgang Walther
Reviewed-by: Noah Misch
Security: CVE-2024-10976
Backpatch-through: 12
(cherry picked from commit 6db5ea8de8ce15897b706009aaf701d23bd65b23)
(cherry picked from commit 0de4f01)
The SQL spec mandates that SET SESSION AUTHORIZATION implies
SET ROLE NONE.  We tried to implement that within the lowest-level
functions that manipulate these settings, but that was a bad idea.
In particular, guc.c assumes that it doesn't matter in what order
it applies GUC variable updates, but that was not the case for these
two variables.  This problem, compounded by some hackish attempts to
work around it, led to some security-grade issues:

* Rolling back a transaction that had done SET SESSION AUTHORIZATION
would revert to SET ROLE NONE, even if that had not been the previous
state, so that the effective user ID might now be different from what
it had been.

* The same for SET SESSION AUTHORIZATION in a function SET clause.

* If a parallel worker inspected current_setting('role'), it saw
"none" even when it should see something else.

Also, although the parallel worker startup code intended to cope
with the current role's pg_authid row having disappeared, its
implementation of that was incomplete so it would still fail.

Fix by fully separating the miscinit.c functions that assign
session_authorization from those that assign role.  To implement the
spec's requirement, teach set_config_option itself to perform "SET
ROLE NONE" when it sets session_authorization.  (This is undoubtedly
ugly, but the alternatives seem worse.  In particular, there's no way
to do it within assign_session_authorization without incompatible
changes in the API for GUC assign hooks.)  Also, improve
ParallelWorkerMain to directly set all the relevant user-ID variables
instead of relying on some of them to get set indirectly.  That
allows us to survive not finding the pg_authid row during worker
startup.

In v16 and earlier, this includes back-patching 9987a7bf3 which
fixed a violation of GUC coding rules: SetSessionAuthorization
is not an appropriate place to be throwing errors from.

Security: CVE-2024-10978
(cherry picked from commit a5d2e6205f716c79ecfb15eb1aae75bae3f8daa9)
(cherry picked from commit 97b12af)
v16 commit 8fe3e697a1a83a722b107c7cb9c31084e1f4d077 used REGRESS_OPTS in
a way needing this.  That broke "vcregress plcheck".  Back-patch
v16..v12; newer versions don't have this build system.

(cherry picked from commit 16ed4f4d08d69f2aced8ece69ba1076b48c9ce06)
(cherry picked from commit 2a92ae4)
TestUpgradeXversion knows how to make the main regression database's
references to pg_regress.so be version-independent.  But it doesn't
do that for plperl's database, so that the C function added by
commit b7e3a52a8 is causing cross-version upgrade test failures.
Path of least resistance is to just drop the function at the end
of the new test.

In <= v14, also take the opportunity to clean up the generated
test files.

Security: CVE-2024-10979
(cherry picked from commit c834b375a6dc36ff92f9f738ef1d7af09d91165f)
(cherry picked from commit 6fc3618)
…cks.

Commit 5a2fed911 had an unexpected side-effect: the parallel worker
launched for the new test case would fail if it couldn't use a
superuser-reserved connection slot.  The reason that test failed
while all our pre-existing ones worked is that the connection
privilege tests in InitPostgres had been based on the superuserness
of the leader's AuthenticatedUserId, but after the rearrangements
of 5a2fed911 we were testing the superuserness of CurrentUserId,
which the new test case deliberately made to be a non-superuser.

This all seems very accidental and probably not the behavior we really
want, but a security patch is no time to be redesigning things.
Pending some discussion about desirable semantics, hack it so that
InitPostgres continues to pay attention to the superuserness of
AuthenticatedUserId when starting a parallel worker.

Nathan Bossart and Tom Lane, per buildfarm member sawshark.

Security: CVE-2024-10978
(cherry picked from commit 109a323807d752f66699a9ce0762244f536e784f)
(cherry picked from commit 0c0f2ec)
Security: CVE-2024-10976, CVE-2024-10977, CVE-2024-10978, CVE-2024-10979
(cherry picked from commit b83b358b1bac61ca46af995a3da83cc515a434c6)
(cherry picked from commit 3b777bc)
(cherry picked from commit 0c53d54c812cea0d840490fd107910ed949e18c2)
(cherry picked from commit f668fe1)
The current code calls array_eq() and does not provide FmgrInfo.  This commit
provides initialization of FmgrInfo and uses C collation as the safe option
for text comparison because we don't know anything about the semantics of
opclass options.

Backpatch to 13, where opclass options were introduced.

Reported-by: Nicolas Maus
Discussion: https://postgr.es/m/18692-72ea398df3ec6712%40postgresql.org
Backpatch-through: 13
(cherry picked from commit 713b8546aba66be102bdd8b320c06ea3b2813fd9)
(cherry picked from commit a10058e)
Maintain the pg_stat_user_indexes.idx_scan pgstat counter during
contrib/Bloom index scans.

Oversight in commit 9ee014f, which added the Bloom index contrib
module.

Author: Masahiro Ikeda <[email protected]>
Reviewed-By: Peter Geoghegan <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch: 13- (all supported branches).
(cherry picked from commit 16a2bb0793ac269943b0d676bac781e3691e28a0)
(cherry picked from commit 7c29c29)
This fixes a set of race conditions with cumulative statistics where a
shared stats entry could be dropped while it should still be valid in
the event when it is reused: an entry may refer to a different object
but requires the same hash key.  This can happen with various stats
kinds, like:
- Replication slots that compute internally an index number, for
different slot names.
- Stats kinds that use an OID in the object key, where a wraparound
causes the same key to be used if an OID is used for the same object.
- As of PostgreSQL 18, custom pgstats kinds could also be an issue,
depending on their implementation.

This issue is fixed by introducing a counter called "generation" in the
shared entries via PgStatShared_HashEntry, initialized at 0 when an
entry is created and incremented when the same entry is reused, to avoid
concurrent issues on drop because of other backends still holding a
reference to it.  This "generation" is copied to the local copy that a
backend holds when looking at an object, then cross-checked with the
shared entry to make sure that the entry is not dropped even if its
"refcount" justifies that if it has been reused.

This problem could show up when a backend shuts down and needs to
discard any entries it still holds, causing statistics to be removed
when they should not, or even an assertion failure.  Another report
involved a failure in a standby after an OID wraparound, where the
startup process would FATAL on a "can only drop stats once", stopping
recovery abruptly.  The buildfarm has been sporadically complaining
about the problem, as well, but the window is hard to reach with the
in-core tests.

Note that the issue can be reproduced easily by adding a sleep before
dshash_find() in pgstat_release_entry_ref() to enlarge the problematic
window while repeating test_decoding's isolation test oldest_xmin a
couple of times, for example, as pointed out by Alexander Lakhin.

Reported-by: Alexander Lakhin, Peter Smith
Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 15
(cherry picked from commit 154c5b42a3d80424f7b7beef33a69600245c147d)
(cherry picked from commit 1fc9015)
Previously, in unlucky cases, it was possible for pg_rewind to remove
certain WAL segments from the rewound demoted primary.  In particular
this happens if those files have been marked for archival (i.e., their
.ready files were created) but not yet archived; the newly promoted node
no longer has such files because of them having been recycled, but they
are likely critical for recovery in the demoted node.  If pg_rewind
removes them, recovery is not possible anymore.

Fix this by maintaining a hash table of files in this situation in the
scan that looks for a checkpoint, which the decide_file_actions phase
can consult so that it knows to preserve them.

Backpatch to 14.  The problem also exists in 13, but that branch was not
blessed with commit eb00f1d, so this patch is difficult to apply
there.  Users of older releases will just have to continue to be extra
careful when rewinding.

Co-authored-by: Полина Бунгина (Polina Bungina) <[email protected]>
Co-authored-by: Alexander Kukushkin <[email protected]>
Reviewed-by: Kyotaro Horiguchi <[email protected]>
Reviewed-by: Atsushi Torikoshi <[email protected]>
Discussion: https://postgr.es/m/CAAtGL4AhzmBRsEsaDdz7065T+k+BscNadfTqP1NcPmsqwA5HBw@mail.gmail.com
(cherry picked from commit e28cf2fbc222a607377813590e4bee448fcf0a29)
(cherry picked from commit 97b814c)
In commit 08c0d6a which introduced "rainbow" arcs in regex NFAs,
I didn't think terribly hard about what to do when creating the color
complement of a rainbow arc.  Clearly, the complement cannot match any
characters, and I took the easy way out by just not building any arcs
at all in the complement arc set.  That mostly works, but Nikolay
Shaplov found a case where it doesn't: if we decide to delete that
sub-NFA later because it's inside a "{0}" quantifier, delsub()
suffered an assertion failure.  That's because delsub() relies on
the target sub-NFA being fully connected.  That was always true
before, and the best fix seems to be to restore that property.
Hence, invent a new arc type CANTMATCH that can be generated in
place of an empty color complement, and drop it again later when we
start NFA optimization.  (At that point we don't need to do delsub()
any more, and besides there are other cases where NFA optimization can
lead to disconnected subgraphs.)

It appears that this bug has no consequences in a non-assert-enabled
build: there will be some transiently leaked NFA states/arcs, but
they'll get cleaned up eventually.  Still, we don't like assertion
failures, so back-patch to v14 where rainbow arcs were introduced.

Per bug #18708 from Nikolay Shaplov.

Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit 2496c3f6f1bf5a735184d27d81527dfea7ad9e9b)
(cherry picked from commit c6e2eeb)
…kwards.

Previously LogicalIncreaseRestartDecodingForSlot() accidentally
accepted any LSN as the candidate_lsn and candidate_valid after the
restart_lsn of the replication slot was updated, so it potentially
caused the restart_lsn to move backwards.

A scenario where this could happen in logical replication is: after a
logical replication restart, based on previous candidate_lsn and
candidate_valid values in memory, the restart_lsn advances upon
receiving a subscriber acknowledgment. Then, logical decoding restarts
from an older point, setting candidate_lsn and candidate_valid based
on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments
then update the restart_lsn to an LSN older than the current value.

In the reported case, after WAL files were removed by a checkpoint,
the retreated restart_lsn prevented logical replication from
restarting due to missing WAL segments.

This change essentially modifies the 'if' condition to 'else if'
condition within the function. The previous code had an asymmetry in
this regard compared to LogicalIncreaseXminForSlot(), which does
almost the same thing for different fields.

The WAL removal issue was reported by Hubert Depesz Lubaczewski.

Backpatch to all supported versions, since the bug exists since 9.4
where logical decoding was introduced.

Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila
Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com
Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me
Backpatch-through: 13
(cherry picked from commit 91771b3fbbc33e066e9a28a7d85bde87f5a0c900)
(cherry picked from commit 528310a)
After commit 5a2fed911a85ed6d8a015a6bafe3a0d9a69334ae, the catalog state
resulting from these commands ceased to affect sessions.  Restore the
longstanding behavior, which is like beginning the session with a SET
ROLE command.  If cherry-picking the CVE-2024-10978 fixes, default to
including this, too.  (This fixes an unintended side effect of fixing
CVE-2024-10978.)  Back-patch to v12, like that commit.  The release team
decided to include v12, despite the original intent to halt v12 commits
earlier this week.

Tom Lane and Noah Misch.  Reported by Etienne LAFARGE.

Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com
(cherry picked from commit edf80895f6bda824403f843df91cbc83890e4b6c)
(cherry picked from commit e06cd04)
Commits aac2c9b4f et al. added a bool field to struct ResultRelInfo.
That's no problem in the master branch, but in released branches
care must be taken when modifying publicly-visible structs to avoid
an ABI break for extensions.  Frequently we solve that by adding the
new field at the end of the struct, and that's what was done here.
But ResultRelInfo has stricter constraints than just about any other
node type in Postgres.  Some executor APIs require extensions to index
into arrays of ResultRelInfo, which means that any change whatever in
sizeof(ResultRelInfo) causes a fatal ABI break.

Fortunately, this is easy to fix, because the new field can be
squeezed into available padding space instead --- indeed, that's where
it was put in master, so this fix also removes a cross-branch coding
variation.

Per report from Pavan Deolasee.  Patch v14-v17 only; earlier versions
did not gain the extra field, nor is there any problem in master.

Discussion: https://postgr.es/m/CABOikdNmVBC1LL6pY26dyxAS2f+gLZvTsNt=2XbcyG7WxXVBBQ@mail.gmail.com
(cherry picked from commit 17db248f318f09b143af208fdcc1f067b3b0b2cb)
(cherry picked from commit b94f988)
(cherry picked from commit b57d9d2e5d5543cc0c4b2de70d65d7b7c4115da6)
(cherry picked from commit c467927)
fixempties() counts the number of in-arcs in the regex NFA and then
allocates an array of that many arc pointers.  If the NFA contains no
arcs, this amounts to malloc(0) for which some platforms return NULL.
The code mistakenly treats that as indicating out-of-memory.  Thus,
we can get a bogus "out of memory" failure for some unsatisfiable
regexes.

This happens only in v15 and earlier, since bea3d7e38 switched to
using palloc() rather than bare malloc().  And at least of the
platforms in the buildfarm, only AIX seems to return NULL.  So the
impact is pretty narrow.  But I don't especially want to ship code
that is failing its own regression tests, so let's fix this for
this week's releases.

A quick code survey says that there is only the one place in
src/backend/regex/ that is at risk of doing malloc(0), so we'll just
band-aid that place.  A more future-proof fix could be to install a
malloc() wrapper similar to pg_malloc().  But this code seems unlikely
to change much more in the affected branches, so that's probably
overkill.

The only known test case for this involves a complemented character
class in a bracket expression, for example [^\s\S], and we didn't
support that in v13.  So it may be that the problem is unreachable
in v13.  But I'm not 100% sure of that, so patch v13 too.

Discussion: https://postgr.es/m/[email protected]
(cherry picked from commit 6ab39c02747c33173e5e33291e66cebbdbc75d82)
(cherry picked from commit 79ae156)
(cherry picked from commit a4bd20b6d7f9d42750b797c450592f55d5374c1f)
(cherry picked from commit 06df9a4)
Signed-off-by: Mihir Jadhav <[email protected]>
(cherry picked from commit 5f36804)
This is due to changes from community commit
8590c942c1a6b861d0cf4fa5aa694ab3a65fa306

Fix data loss at inplace update after heap_update().

(cherry picked from commit 3d9aef7fbc9a8e2da60e8cc67b85a5d09cded629)
(cherry picked from commit 7cbce07)
Commit 09f0820095ea8b7c6ca4269454a94695c8421628 introduced
locking for individual tuples during certain operations. However,
Babelfish introduced the concept of ENR-only relations, which
store all catalog tuples in their own local cache and not in the
physical catalogs. As these relations and their catalog tuples
are completely session-local, there is no need to acquire locks on
tuples for these relations (and it would just lead to a crash
anyways since the tuples do not have an underlying TID to lock against).

Signed-off-by: Jason Teng <[email protected]>
(cherry picked from commit 60916a8f80f1164daade1fc707637ec93ff3ee68)
(cherry picked from commit e905197)
…t 25d639ee

Signed-off-by: Tanzeel Khan <[email protected]>
(cherry picked from commit c260de9cbca52ce5acbbe2c2acbf67762024af58)
(cherry picked from commit 10d3078)
@shardgupta shardgupta merged commit 762ffc6 into babelfish-for-postgresql:BABEL_3_8_STABLE__PG_15_10 Dec 11, 2024
2 checks passed
@shardgupta shardgupta deleted the community_merge_3x branch December 11, 2024 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.