Created by `brew bump`
Created with `brew bump-formula-pr`
.release notes
…ing replication
Dolt PR: Fix dolt_schemas for doltgres dolthub/dolt#8401
Dolt PR: Add pointer for dolt_docs schema so it can be replaced by doltgres dolthub/dolt#8398
`ParenExpr` to match MySQL's requirements
PostgreSQL's syntax does not require column default expressions to be wrapped in parens, but MySQL's does, so when we translate the column default value expressions to the vitess AST, we need to wrap them in parens so that they execute in GMS without triggering an error.
Fixes: Column default expressions should not require parentheses dolthub/doltgresql#751
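To illustrate the translation step, here is a minimal Go sketch of the paren-wrapping; the `Expr`/`ParenExpr` types below are simplified stand-ins for the vitess AST nodes, not the actual dolthub/vitess definitions:

```go
package main

import "fmt"

// Simplified stand-ins for vitess-style AST nodes, purely to illustrate the wrapping step.
type Expr interface{ String() string }

type ColumnDefault struct{ Expr Expr }

type ParenExpr struct{ Expr Expr }

func (p ParenExpr) String() string { return "(" + p.Expr.String() + ")" }

type RawExpr string

func (r RawExpr) String() string { return string(r) }

// wrapDefault ensures a translated column default expression is parenthesized,
// since MySQL (and therefore GMS) requires expression defaults to be wrapped
// in parens even though Postgres does not.
func wrapDefault(e Expr) Expr {
	if _, ok := e.(ParenExpr); ok {
		return e // already wrapped, nothing to do
	}
	return ParenExpr{Expr: e}
}

func main() {
	def := ColumnDefault{Expr: wrapDefault(RawExpr("now()"))}
	fmt.Println(def.Expr) // prints: (now())
}
```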
`ALTER TABLE ADD FOREIGN KEY`
Helps with data imports, since it's common to add FKs at the end of an import script.
The regression below seems to come from the FK being added, but we don't yet support removing an FK, so the `drop table` call now fails.
Related to `ALTER TABLE` support dolthub/doltgresql#724
GMS: Support information_schema views/tables hooks for doltgres dolthub/go-mysql-server#2678
Dolt: Return information_schema schema for doltgres dolthub/dolt#8391
Better test harness allowing us to unskip about half of the dolt merge tests (after various bug fixes in dolt, already in main)
Dolt PR: Fix diff related table functions for doltgres dolthub/dolt#8381
Fixes:
For some index joins, the analyzer will create a specific type of plan that creates MySQL ranges rather than Doltgres ranges. It appears as though there may be divergent branches for the join logic, so I attempted to look for the source of the divergence; however, I came up short.
For now, rather than chasing this down and delaying a PR (since Tim needs this fixed ASAP), we can pass the lookup to the internal Dolt table. This will return incorrect results in some situations, but it won't panic for now, so I'll follow up with a better search through GMS at a later date to merge the index join paths.
`COPY FROM STDIN`
Added more test coverage over the Doltgres Getting Started Guide and pulled those tests out into their own file.
This implements the initial portion of the authentication protocol.
Postgres Reference Documentation:
Primarily, this implements SASL `SCRAM-SHA-256`, which appears to be the primary form of authentication used in modern Postgres. It has been built by following the RFC specification:
There are no tests since the implementation is incomplete. It cannot truly be tested until we have passwords that it can verify against (the results must be sent back to the client for verification, so they can't be faked); however, I have tested it up through what has been written, and what exists works as it should.
Surprisingly, there aren't any libraries that we could really leverage for this. Most SASL libraries don't implement `SCRAM`. The closest was the following:
However, I couldn't really find a way to integrate it using raw messages and the eventual Doltgres user backend, so this is all custom-written using the RFC as a guideline (along with capturing packets using the regression capture tool to ensure that Postgres follows the RFC's implementation). For now, the logic is hidden behind a bogus parameter check so that the static analyzer is happy, and the next step is to make a mock in-memory database of users and passwords so I can fully test the entire workflow.
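For reference, this is a minimal sketch of the SCRAM-SHA-256 client-proof math from RFC 5802/7677 that the server ultimately has to verify; it is not the Doltgres implementation, just the RFC formulas in Go (using `golang.org/x/crypto/pbkdf2`):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"

	"golang.org/x/crypto/pbkdf2"
)

func hmacSHA256(key, msg []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return mac.Sum(nil)
}

// scramClientProof computes the SCRAM-SHA-256 ClientProof for a given
// password, salt, iteration count, and AuthMessage, per RFC 5802/7677:
//   SaltedPassword  := PBKDF2(password, salt, i)
//   ClientKey       := HMAC(SaltedPassword, "Client Key")
//   StoredKey       := H(ClientKey)
//   ClientSignature := HMAC(StoredKey, AuthMessage)
//   ClientProof     := ClientKey XOR ClientSignature
func scramClientProof(password string, salt []byte, iters int, authMessage string) []byte {
	saltedPassword := pbkdf2.Key([]byte(password), salt, iters, sha256.Size, sha256.New)
	clientKey := hmacSHA256(saltedPassword, []byte("Client Key"))
	storedKey := sha256.Sum256(clientKey)
	clientSignature := hmacSHA256(storedKey[:], []byte(authMessage))
	proof := make([]byte, len(clientKey))
	for i := range proof {
		proof[i] = clientKey[i] ^ clientSignature[i]
	}
	return proof
}

func main() {
	// Toy values purely for illustration; a real AuthMessage is built from the
	// client-first, server-first, and client-final-without-proof messages.
	proof := scramClientProof("secret", []byte("salt"), 4096, "n=user,r=...,s=...,i=4096,c=biws,r=...")
	fmt.Printf("%x\n", proof)
}
```

On the server side, verification recomputes ClientSignature from the StoredKey and checks that H(ClientProof XOR ClientSignature) equals the StoredKey, which is why real passwords (or stored keys) are needed before this can be tested end to end.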
`ALTER TABLE`, starting with adding a primary key
Adding initial support for converting `ALTER TABLE` statements. This first iteration only supports `ALTER TABLE t ADD PRIMARY KEY (...);`.
Related to `ALTER TABLE` support dolthub/doltgresql#724
`use mydb/main` without quoting and implemented the IF function
Most of this PR is changes to the doltgres engine testing harness to make it pass more tests.
Also includes parser support for unquoted db identifiers with a `/` in a USE statement, and implements the IF function (apparently a cockroach extension).
`COPY` support for `HEADER` option
Adds support for using the `HEADER` option in `COPY` statements.
In this first iteration, we only support specifying `HEADER` or `HEADER true`. This form causes the tabular data loader and CSV data loader to skip over the initial header line in import data. In addition to this form, `COPY` also supports a `HEADER MATCH` option, where the server asserts that the columns in the import data exactly match the name and the order of the columns in the destination table.
(Note: this PR is based off of Feature: `COPY FROM STDIN` support for CSV files dolthub/doltgresql#700 to help split up the changes to make them easier to review)
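As a rough sketch of the two `HEADER` behaviors described above, assuming a plain `encoding/csv` reader (the actual Doltgres data loaders are structured differently):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

type headerMode int

const (
	headerOff   headerMode = iota // no HEADER option
	headerSkip                    // HEADER / HEADER true: discard the first line
	headerMatch                   // HEADER MATCH: first line must equal the destination columns
)

// readCSV illustrates the two supported behaviors: skipping the header line,
// or asserting that it matches the destination table's column names in order.
func readCSV(r io.Reader, mode headerMode, destCols []string) ([][]string, error) {
	cr := csv.NewReader(r)
	if mode != headerOff {
		header, err := cr.Read()
		if err != nil {
			return nil, err
		}
		if mode == headerMatch {
			if len(header) != len(destCols) {
				return nil, fmt.Errorf("column count mismatch: %d vs %d", len(header), len(destCols))
			}
			for i, name := range header {
				if name != destCols[i] {
					return nil, fmt.Errorf("column %d is %q, expected %q", i, name, destCols[i])
				}
			}
		}
	}
	return cr.ReadAll()
}

func main() {
	data := "id,name\n1,abc\n2,def\n"
	rows, err := readCSV(strings.NewReader(data), headerMatch, []string{"id", "name"})
	fmt.Println(rows, err) // [[1 abc] [2 def]] <nil>
}
```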
This works around the problem described here: race analysis fails for several tests on Amazon Linux dolthub/doltgresql#718
`pgproto3` for handling server connection messages
`COPY FROM STDIN` support for CSV files
Support for loading data via `COPY FROM STDIN` using CSV data.
dolt_reset now works correctly
Code changes are all in Dolt: support for schemas in various version control operations dolthub/dolt#8343
`ANALYZE` statements
Adds support for converting Postgres' `ANALYZE` statement for a single table and running it through the GMS SQL engine. There are still lots of unsupported options in Postgres' `ANALYZE` statement, but this change allows Doltgres to process the simplest form, where a single table is being analyzed.
Since it's common to run `ANALYZE` at the end of data load scripts (example), this change is intended to make it easier to load dumps into Doltgres.
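For example, the final step of a load script driven from a Go client might look like the sketch below; the connection string and the `lib/pq` driver are illustrative assumptions (Doltgres speaks the Postgres wire protocol), not a prescribed setup:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed Postgres driver; Doltgres speaks the Postgres wire protocol
)

func main() {
	// Connection string is illustrative only.
	db, err := sql.Open("postgres", "host=127.0.0.1 port=5432 user=doltgres dbname=mydb sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// ... bulk load data into mytable here ...

	// The simplest supported form: analyze a single table after the load.
	if _, err := db.Exec("ANALYZE mytable"); err != nil {
		log.Fatal(err)
	}
}
```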
`COPY FROM STDIN` support
Adds support for `COPY ... FROM STDIN`. When copying from `STDIN`, the `COPY FROM` statement starts a multi-message flow between the server and client: the client will send `COPY DATA` messages until all the data has been sent, and then send a `COPY DONE` (or `COPY FAIL`) message to finalize the transfer and let the server process more queries.
This PR adds a new `TabularDataLoader` type, with the idea that we can create a `DataLoader` interface for that when we extend this to add support for loading data from CSV data files, too (a rough sketch of such an interface follows after this item).
This PR also depends on a GMS change to allow us to create a new `sql.Context` instance: Add `sql.ContextProvider` interface dolthub/go-mysql-server#2652
This was pulled from:
Separating the `COPY FROM` portion from the regression tests.
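A hypothetical shape for that `DataLoader` interface is sketched below; the names and signatures are assumptions for illustration, not the actual Doltgres types:

```go
package main

import (
	"fmt"
	"strings"
)

// DataLoader is a hypothetical interface abstracting over the tabular and CSV
// loaders: chunks from COPY DATA messages are fed in incrementally, then the
// load is finished on COPY DONE or aborted on COPY FAIL.
type DataLoader interface {
	// LoadChunk buffers and parses the payload of one COPY DATA message.
	LoadChunk(chunk []byte) error
	// Finish flushes any buffered rows and returns the number loaded (COPY DONE).
	Finish() (rowsLoaded int, err error)
	// Abort discards any partially loaded data (COPY FAIL).
	Abort() error
}

// tabularDataLoader is a toy stand-in for the PR's TabularDataLoader: it
// splits tab-separated lines and counts rows instead of writing to a table.
type tabularDataLoader struct {
	partial string
	rows    int
}

func (t *tabularDataLoader) LoadChunk(chunk []byte) error {
	data := t.partial + string(chunk)
	lines := strings.Split(data, "\n")
	t.partial = lines[len(lines)-1] // last element may be an incomplete line
	t.rows += len(lines) - 1
	return nil
}

func (t *tabularDataLoader) Finish() (int, error) {
	if t.partial != "" {
		t.rows++
	}
	return t.rows, nil
}

func (t *tabularDataLoader) Abort() error { return nil }

func main() {
	var dl DataLoader = &tabularDataLoader{}
	dl.LoadChunk([]byte("1\ta\n2\tb\n"))
	dl.LoadChunk([]byte("3\tc\n"))
	n, _ := dl.Finish()
	fmt.Println(n) // 3
}
```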
Note: the prepared tests don't utilize this field in the `Parse` message, so it needs to extract the binding value types from the analyzed plan of the query.
This removes `IN` prematurely decaying, since it's only necessary for index filters. To complement this, I've implemented `SplitConjunction` and `SplitDisjunction` so that they're aware of Doltgres expression types. The GMS versions will see `*pgexprs.GMSCast` and do nothing, since we actually care about the child, but GMS is unaware of that.
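Conceptually, the cast-aware splitting looks something like this sketch (simplified stand-in types, not the GMS or Doltgres APIs):

```go
package main

import "fmt"

// Simplified stand-ins for expression nodes; the real GMS/Doltgres types differ.
type Expression interface{ String() string }

type And struct{ Left, Right Expression }

func (a And) String() string { return a.Left.String() + " AND " + a.Right.String() }

// GMSCast mimics a Doltgres wrapper that casts a Postgres expression for GMS;
// when splitting, we care about the wrapped child, not the cast itself.
type GMSCast struct{ Child Expression }

func (c GMSCast) String() string { return c.Child.String() }

type Literal string

func (l Literal) String() string { return string(l) }

// splitConjunction walks AND nodes, looking through GMSCast wrappers so that
// Doltgres expressions nested under casts are still split correctly.
func splitConjunction(e Expression) []Expression {
	switch e := e.(type) {
	case GMSCast:
		return splitConjunction(e.Child)
	case And:
		return append(splitConjunction(e.Left), splitConjunction(e.Right)...)
	default:
		return []Expression{e}
	}
}

func main() {
	expr := GMSCast{Child: And{Left: Literal("a = 1"), Right: And{Left: Literal("b > 2"), Right: Literal("c < 3")}}}
	fmt.Println(splitConjunction(expr)) // [a = 1 b > 2 c < 3]
}
```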
GMS PR: Fix revision databases not showing up for schema databases dolthub/go-mysql-server#2645
This adds a ton of tests, taken from the Postgres Regression Test suite, and made into Go tests using the new tool in `testing/go/regression/reader`. There are errors in some of the tests, but it at least gives a very rough idea of what works and what doesn't. The plan is to put this in a nightly job that displays results alongside the correctness tests, and it will also be able to tell us which tests fail that previously passed.
In addition, this also implements `COPY FROM`, since it's used by the regression tests for loading in the test data.
Improvements to testing harnesses to allow more tests to pass.
Relies on Fixed table resolution for reset dolthub/dolt#8313 and [no-release-notes] Testing harness changes dolthub/go-mysql-server#2647
Dolt PR: Export DoltSystemVariables var so that it can be used by doltgres dolthub/dolt#8306
Created by the Release workflow to update DoltgreSQL's version
This implements a proof-of-concept for true Postgres indexes.
Current Implementation
The current index implementation relies on the methods used in GMS and Dolt, which are inherently MySQL-based. There are a lot of layers to make indexes as efficient as possible (while also allowing different integrators to use their own implementations), but I'm going to focus on the inner "core" logic. There are two parts to the core: storage and iteration.
Storage
At a high level, we've hardcoded how values should be stored in an index by their observed behavior. `NULL` is treated as being smaller than the smallest possible value, and is always stored first. Integers are ordered from negative to positive. Strings are in the order defined by their collation, which is usually alphabetical order with some casing differences. In Dolt the ordering is concrete, and in GMS this order is assumed for all indexable types.
Iteration
In GMS, we take an expression (filter or join) and create a range (or multiple ranges) that expresses the values that the index should return. That range contains the lower and upper bounds, and those are based on the value given. For example, the filter `column >= 6` (where `column` is an integer) uses the value of 6 to construct a range of `[6, ∞)`. This range is then passed to Dolt, which uses the inclusive `6` as its starting point, and knows to iterate over the remaining index data until the end. If given some value that uses a different type than the indexed column's type, then that value is automatically cast to the index's type.
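As a toy illustration of that range construction (simplified types, not the actual GMS range representation):

```go
package main

import "fmt"

// Bound is one end of an index range; Unbounded means "no limit in this direction".
type Bound struct {
	Value     int
	Inclusive bool
	Unbounded bool
}

type Range struct{ Lower, Upper Bound }

// rangeForFilter builds the range for a simple comparison filter against an
// integer column, mirroring how GMS turns `column >= 6` into [6, ∞).
func rangeForFilter(op string, v int) Range {
	switch op {
	case ">=":
		return Range{Lower: Bound{Value: v, Inclusive: true}, Upper: Bound{Unbounded: true}}
	case ">":
		return Range{Lower: Bound{Value: v}, Upper: Bound{Unbounded: true}}
	case "<=":
		return Range{Lower: Bound{Unbounded: true}, Upper: Bound{Value: v, Inclusive: true}}
	case "<":
		return Range{Lower: Bound{Unbounded: true}, Upper: Bound{Value: v}}
	default: // "="
		return Range{Lower: Bound{Value: v, Inclusive: true}, Upper: Bound{Value: v, Inclusive: true}}
	}
}

func main() {
	r := rangeForFilter(">=", 6)
	fmt.Printf("[%d, ∞) inclusive=%v\n", r.Lower.Value, r.Lower.Inclusive) // [6, ∞) inclusive=true
}
```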
Postgres vs. MySQL
With the storage and iteration in place for how MySQL (GMS and Dolt) work, let's now look at some key differences with indexes in Postgres.
…(as decided by the `=` operator) values are not equivalent.
…(`=`, `>`, `>=`, `<`, `<=`), then any value may be used to iterate over storage. It is assumed that these operators map to some logical form of continuity, but that is not strictly required (the Postgres analyzer can actually catch some forms of discontinuity and apply additional filters, pretty cool actually). For example, it is possible that `<` and `>` could return `true` for the same input, but again it is assumed that this is not the case.
…For `UNIQUE` indexes, this controls whether we permit multiple `NULL` values. If `NULL`s are distinct, then multiple rows may use `NULL`. In MySQL, `NULL`s are always considered distinct.
Indexed Joins
This is originally what kickstarted this small project. Now that I've covered how the current implementation works, and a few ways in which Postgres differs, it should be much easier to show how proper indexed joins would not work in the current implementation. At their simplest, an index join has a form like `SELECT * FROM t1 JOIN t2 ON t1.col = t2.col;`. It may not be obvious, but this touches at least 2 of the 4 differences that I mentioned in the previous section.
`SELECT 0.99999994::float4 = 0.99999997::float8;` returns `false`, as there is a defined `=` operator for the two types. `SELECT 0.99999994::float4 = 0.99999997::float8::float4;` returns `true`, as casting from `float8` to `float4` loses some precision. In this exact case, as long as we keep the filter expression then it's okay, but it can lead to data corruption otherwise. There are many more examples than this, so don't take this as the only case, but it's an easier one to understand (compared to a more "realistic" example using `reg...` types). If our index framework is built on casting (like the current GMS implementation), then we will always have cases where we are tracking down bugs due to invalid behavior.
If `NULL` values are sorted differently between indexes, then that must be taken into account by some analyzer step. The current implementation does not do this, as it does not need to.
The Simplest Solution
Right now on `main`, we are implementing indexes by casting everything to the column type for filters. This "works" in that we are able to get some tests and performance metrics working, but that's only for items that have a cast. As mentioned earlier, this casting logic is not correct, but our limited testing at least works with it. Once we leave the small bubble, we start to see all of the changes that would have to be made in order to get Postgres indexes "working", such as special-casing types and expressions to work under the assumptions made in GMS and Dolt, and still it would be incorrect. Some of the special casing would even be hard to figure out in the first place, like how the `reg...` types that were mentioned earlier should interact with other types.
I propose, with this PR (and the attached Dolt PR), that the simplest solution is to just do what Postgres is doing. Postgres defines functions that control the layout of values, so we can implement those functions and simply pass them down to Dolt's storage layer, which uses them for ordering rather than the hardcoded versions. This PR doesn't yet implement this part, but it is what we are already doing with the introduction of `sql.ExtendedType`, which uses the comparisons defined on the type to control the layout. We just have to change which function is being used, which is relatively simple. This PR, instead, focuses on the filter and retrieval part (since it's the more involved portion).
Postgres simply passes the relevant operator functions down to its storage layer and runs a tree search (using those operators on its internal b-tree) to find where the storage iterator should start. It then iterates until those operators are no longer fulfilled. This completely sidesteps the casting part and focuses strictly on the comparison operators, which is exactly what we want. And that's all this PR (and the Dolt one) does in essence. It's a bit more complicated as I'm still trying to take advantage of as much infrastructure as possible, but at its core it's passing the operators down to Dolt's storage layer to find a start (and stop) point. By passing down functions, this not only gives us full support for everything, but it even allows us to handle things like custom types without any additional code.
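As a toy illustration of the operator-function approach (an ordered slice standing in for the b-tree; not the Dolt storage code):

```go
package main

import (
	"fmt"
	"sort"
)

// Op is a filter operator passed down to the storage layer: it reports whether
// an indexed value satisfies the filter (e.g. "v >= 6") without any casting.
type Op func(v int) bool

// seek finds the first position in the ordered index where startHere returns
// true (a stand-in for the b-tree search), then iterates while keep holds.
// Passing functions instead of cast values is the core idea described above.
func seek(index []int, startHere, keep Op) []int {
	start := sort.Search(len(index), func(i int) bool { return startHere(index[i]) })
	var out []int
	for i := start; i < len(index) && keep(index[i]); i++ {
		out = append(out, index[i])
	}
	return out
}

func main() {
	index := []int{-3, 0, 2, 6, 6, 9, 14} // values in index order
	// column >= 6 AND column < 10, expressed purely as operator functions.
	ge6 := func(v int) bool { return v >= 6 }
	lt10 := func(v int) bool { return v < 10 }
	fmt.Println(seek(index, ge6, lt10)) // [6 6 9]
}
```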
There are still additional things that need to be done, such as covering indexes vs. non-covering indexes, composite indexes, etc., but those are mostly Doltgres-side changes to match how Postgres behaves. Also, code layout is not final (everything is in one file), comments are missing, names are bad, things are stuffed into special structs rather than creating new fields, there are no GMS changes yet, etc. Look not at the code, but at the intention of the code, as none of this is final or production-quality.
Dolt PR
References
Closed Issues