Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update MSSQL test Docker image #2120

Merged
merged 2 commits into from
Sep 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 24 additions & 11 deletions ehrql/utils/mssql_log_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,14 +67,9 @@ def execute_with_log(connection, query, log, query_id=None):

# Regex to match IO statistics messages

Table\s'(?P<table>[^']+)'. \s+
Scan\scount\s (?P<scans>\d+), \s+
logical\sreads\s (?P<logical>\d+), \s+
physical\sreads\s (?P<physical>\d+), \s+
read-ahead\sreads\s (?P<read_ahead>\d+), \s+
lob\slogical\sreads\s (?P<lob_logical>\d+), \s+
lob\sphysical\sreads\s (?P<lob_physical>\d+), \s+
lob\sread-ahead\sreads\s (?P<lob_read_ahead>\d+)
Table\s'(?P<table>[^']+)'\. \s+
(?P<io_stats_line>.*)
\. $

) .*
""",
Expand Down Expand Up @@ -114,16 +109,18 @@ def parse_statistics_messages(messages):
timings[f"{prefix}_elapsed_ms"] += int(match["elapsed_ms"])
elif table := match["table"]:
table = table.decode(errors="ignore")
io_stats_line = match["io_stats_line"].decode(errors="ignore")
# Temporary table names are, internally to MSSQL, made globally unique
# by padding with underscores and appending a unique suffix. We need to
# restore the original name so our stats make sense. If you've got an
# actual temp table name with 5 underscores in it you deserve everything
# you get.
if table.startswith("#"):
table = table.partition("_____")[0]
stats = table_io[table]
for key in stats.keys():
stats[key] += int(match[key])
stats = parse_io_stats(io_stats_line)
cumulative_stats = table_io[table]
for key in cumulative_stats.keys():
cumulative_stats[key] += stats.get(key, 0)
else:
# Given the structure of the regex it shouldn't be possible to get here,
# but if somehow we did I'd rather drop the stats message than blow up
Expand All @@ -135,6 +132,22 @@ def parse_statistics_messages(messages):
return timings, table_io


def parse_io_stats(io_stats):
return dict(map(parse_io_stats_item, io_stats.split(",")))


def parse_io_stats_item(item):
item = item.strip()
name, _, value = item.rpartition(" ")
# Reformat MSSQL's names to the style we use internally
if name == "Scan count":
name = "scans"
name = name.removesuffix(" reads")
name = name.replace("-", "_").replace(" ", "_")
value = int(value)
return name, value


def format_table_io(table_io):
results_table = table_io_dict_to_table(table_io)
return format_table(results_table)
Expand Down
7 changes: 6 additions & 1 deletion tests/lib/databases.py
Original file line number Diff line number Diff line change
Expand Up @@ -177,7 +177,12 @@ def make_mssql_database(containers):
def run_mssql(container_name, containers, password, mssql_port): # pragma: no cover
containers.run_bg(
name=container_name,
image="mcr.microsoft.com/mssql/server:2017-CU30-ubuntu-18.04",
# This is *not* the version that TPP run for us in production which, as of
# 2024-09-24, is SQL Server 2016 (13.0.5893.48). That version is not available
# as a Docker image, so we run the oldest supported version instead. Both the
# production server and our test server set the "compatibility level" to the
# same value so the same feature set should be supported.
image="mcr.microsoft.com/mssql/server:2019-CU28-ubuntu-20.04",
volumes={
MSSQL_SETUP_DIR: {"bind": "/mssql", "mode": "ro"},
},
Expand Down
5 changes: 3 additions & 2 deletions tests/support/mssql/entrypoint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -46,12 +46,13 @@ if [ "$1" = '/opt/mssql/bin/sqlservr' ]; then

function _sqlcmd() {
# Extra arguments:
# -C : trust server certificate
# -b : exit with 1 on error rather than 0
# -h -1 : no headers in output
# -W : trim trailing whitespace
/opt/mssql-tools/bin/sqlcmd \
/opt/mssql-tools18/bin/sqlcmd \
-S localhost -U sa -P "$MSSQL_SA_PASSWORD" -d master \
-b -h -1 -W \
-C -b -h -1 -W \
"$@"
}

Expand Down
8 changes: 6 additions & 2 deletions tests/unit/utils/test_mssql_log_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,10 +40,14 @@ def test_parse_statistics_messages():
b" Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0,"
b" lob logical reads 0, lob physical reads 0, lob read-ahead reads 0."
),
# Newer version of SQL Server return IO stats in a different order and with
# additional items, so check we handle those correctly
(
b"Table '#tmp_1______________________________________________________________________________________________________________0000000014D9'."
b" Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0,"
b" lob logical reads 0, lob physical reads 0, lob read-ahead reads 0."
b" Scan count 1, logical reads 4, physical reads 0, page server reads 0,"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the new page server items also show up in the #tmp_1 entry in the table_io assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I've got it working at the moment is that it discards any stats it isn't expecting, because it only iterates the keys already in cumulative_stats:

table_io = defaultdict(
lambda: {
"scans": 0,
"logical": 0,
"physical": 0,
"read_ahead": 0,
"lob_logical": 0,
"lob_physical": 0,
"lob_read_ahead": 0,
}

for key in cumulative_stats.keys():
cumulative_stats[key] += stats.get(key, 0)

I wanted to try to keep the output consistent between dev and prod in this respect.

I agree the tests here aren't as comprehensible as they could be, but they do cover everything and I didn't think it was worth the time making them better for a fairly small feature.

b" read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0,"
b" lob physical reads 0, lob page server reads 0, lob read-ahead reads 0,"
b" lob page server read-ahead reads 0."
),
(
b"Table '#tmp_1______________________________________________________________________________________________________________0000000014B1'."
Expand Down
Loading