Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReadAsync throws open SSL error for specific table on Linux on Azure #145

Closed
ErikApption opened this issue Jul 31, 2019 · 22 comments · Fixed by #171
Closed

ReadAsync throws open SSL error for specific table on Linux on Azure #145

ErikApption opened this issue Jul 31, 2019 · 22 comments · Fixed by #171
Labels
🐛 Bug! Issues that are bugs in the drivers we maintain.
Milestone

Comments

@ErikApption
Copy link

Describe the bug

When connecting to SQL Server on Azure with Linux client we get a SQLException that has a error linked to OpenSSL. We tried both Ubuntu 18x and Ubuntu 19x and same error.

Oddly enough, this error happens on a very specific table.
Also same error with older client System.Data.SqlClient
This is on net core preview 7

Exception message:
Stack trace:
 ---> System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 35 - An internal exception was caught)
 ---> System.IO.IOException: The decryption operation failed, see inner exception.
 ---> Interop+OpenSsl+SslException: Decrypt failed with OpenSSL error - SSL_ERROR_SSL.
 ---> Interop+Crypto+OpenSslCryptographicException: error:1408F10B:SSL routines:ssl3_get_record:wrong version number
   --- End of inner exception stack trace ---
   at Interop.OpenSsl.Decrypt(SafeSslHandle context, Byte[] outBuffer, Int32 offset, Int32 count, SslErrorCode& errorCode)
   at System.Net.Security.SslStreamPal.EncryptDecryptHelper(SafeDeleteContext securityContext, ReadOnlyMemory`1 input, Int32 offset, Int32 size, Boolean encrypt, Byte[]& output, Int32& resultSize)
   --- End of inner exception stack trace ---
   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at System.Data.SqlClient.SNI.SNIPacket.<ReadFromStreamAsync>g__ReadFromStreamAsync|27_0(SNIPacket packet, SNIAsyncCallback cb, ValueTask`1 valueTask)
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParserStateObject.ReadAsyncCallbackCaptureException(TaskCompletionSource`1 source)
--- End of stack trace from previous location where exception was thrown ---
   at Apption.DBConnectors.SQLServer.SQLServerTableMetadata.ReadTable(SQLDatabase database, SQLTable table, ITargetBlock`1 target)

Expected behavior

SQL Clients reads data correctly from source data.

Further technical details

Microsoft.Data.SqlClient version: 1.0.19189.1
SQL Server version: Azure SQL Server
Operating system: Ubuntu 18.04 docker container

Additional context
Error seems to be related to SSL version https://stackoverflow.com/questions/29627991/1408f10bssl-routinesssl3-get-recordwrong-version-number-call-on-indy

@cheenamalhotra
Copy link
Member

Oddly enough, this error happens on a very specific table.

Could you provide more specific details so we can debug better?

@ErikApption
Copy link
Author

Oddly enough, this error happens on a very specific table.

Could you provide more specific details so we can debug better?

Yes, we could provide you a private access to that dataset and the repro code. How can we send you these details?

@cheenamalhotra
Copy link
Member

You may send me email with necessary details.

@lghinet
Copy link

lghinet commented Aug 12, 2019

same problem here on preview7
works fine with 4.7.0-preview6.19303.8

@rmja
Copy link
Contributor

rmja commented Aug 15, 2019

I also see this error. The app is running on the mcr.microsoft.com/dotnet/core/aspnet:3.0.0-preview8-alpine docker image.

System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 35 - An internal exception was caught)
 ---> System.IO.IOException: The decryption operation failed, see inner exception.
 ---> Interop+OpenSsl+SslException: Decrypt failed with OpenSSL error - SSL_ERROR_SSL.
 ---> Interop+Crypto+OpenSslCryptographicException: error:1408F10B:SSL routines:ssl3_get_record:wrong version number
   --- End of inner exception stack trace ---
   at Interop.OpenSsl.Decrypt(SafeSslHandle context, Byte[] outBuffer, Int32 offset, Int32 count, SslErrorCode& errorCode)
   at System.Net.Security.SslStreamPal.EncryptDecryptHelper(SafeDeleteContext securityContext, ReadOnlyMemory`1 input, Int32 offset, Int32 size, Boolean encrypt, Byte[]& output, Int32& resultSize)
   --- End of inner exception stack trace ---
   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at System.Data.SqlClient.SNI.SNIPacket.<ReadFromStreamAsync>g__ReadFromStreamAsync|27_0(SNIPacket packet, SNIAsyncCallback cb, ValueTask`1 valueTask)
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
   at System.Data.SqlClient.SqlDataReader.Read()

@rmja
Copy link
Contributor

rmja commented Aug 16, 2019

I have now tried with both System.Data.SqlClient 4.7.0 Previiew8 and Microsoft.Data.SqlClient 1.0.19221.1-Preview.

It turns out that System.Data throws this error but Microsoft.Data does not. It, however, returns corrupted data at some positions in the returned result!!!

@cheenamalhotra
Copy link
Member

@rmja Do you have a repro available that you can provide here?

@rmja
Copy link
Contributor

rmja commented Aug 16, 2019

I am working on it...

@rmja
Copy link
Contributor

rmja commented Aug 16, 2019

@cheenamalhotra Done....
Code is here: https://github.com/rmja/repro145

I have mailed the connection string to you. Please let me know when you are done with it, so that I can shut down the database instance.

The sample will run/fail depending on the launch configurration:
image

@rmja
Copy link
Contributor

rmja commented Aug 19, 2019

@cheenamalhotra were you able to reproduce the issue on your side?

@rmja
Copy link
Contributor

rmja commented Aug 30, 2019

I have investigated this further and these are my findings:

The following very simple program produces the issue:

image

Program.cs
class Program
    {
        private const string ConnectionString = "Server=tcp:xxx.database.windows.net,1433;Initial Catalog=xxx;Persist Security Info=False;User ID=xxx;Password=xxx;MultipleActiveResultSets=True;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;";

        static void Main(string[] args)
        {
            using (var connection = new Microsoft.Data.SqlClient.SqlConnection(ConnectionString))
            {
                connection.Open();

                //using (var command = new Microsoft.Data.SqlClient.SqlCommand("SELECT TOP(5811) Id, DomainId, Timezone, DeviceNumber FROM Devices ORDER BY Id", connection)) // Works
                //using (var command = new Microsoft.Data.SqlClient.SqlCommand("SELECT TOP(5812) Id, DomainId, Timezone, DeviceNumber FROM Devices ORDER BY Id", connection)) // Times out
                //using (var command = new Microsoft.Data.SqlClient.SqlCommand("SELECT TOP(5813) Id, DomainId, Timezone, DeviceNumber FROM Devices ORDER BY Id", connection)) // Times out
                //using (var command = new Microsoft.Data.SqlClient.SqlCommand("SELECT TOP(5814) Id, DomainId, Timezone, DeviceNumber FROM Devices ORDER BY Id", connection)) // Times out
                //using (var command = new Microsoft.Data.SqlClient.SqlCommand("SELECT TOP(5815) Id, DomainId, Timezone, DeviceNumber FROM Devices ORDER BY Id", connection)) // Throws if MultipleActiveResultSets=True
                using (var command = new Microsoft.Data.SqlClient.SqlCommand("SELECT Id, DomainId, Timezone, DeviceNumber FROM Devices ORDER BY Id", connection)) // Throws if MultipleActiveResultSets=True
                {
                    using (var reader = command.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            reader.GetInt32(0);
                            reader.GetString(1);
                            reader.GetString(2);
                            reader.GetString(3);
                        }
                    }
                }

                connection.Close();
            }
        }
    }
SQL Script seeding table
CREATE TABLE Devices (
	Id int identity(1,1) primary key,
	DomainId nvarchar(50) not null,
	Timezone nvarchar(50) not null,
	DeviceNumber nvarchar(50) not null
)

BEGIN TRANSACTION
DECLARE @i int = 0

WHILE @i < 10000
BEGIN
	INSERT INTO Devices (DomainId, Timezone, DeviceNumber) VALUES ('domain', 'Europe/Copenhagen', @i)
    SET @i = @i + 1
END

COMMIT TRANSACTION

I have only been able to reproduce it on SQL Azure, and not on my local SQL Server Developer instance.

The following two cases triggers the error:

  1. Run in a linux docker container with a package reference to
<PackageReference Include="Microsoft.Data.SqlClient" Version="1.0.19239.1" />
Dockerfile
FROM mcr.microsoft.com/dotnet/core/runtime:3.0.0-preview8-buster-slim AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/core/sdk:3.0-buster AS build
WORKDIR /src
COPY ["ConsoleApp3/ConsoleApp3.csproj", "ConsoleApp3/"]
RUN dotnet restore "ConsoleApp3/ConsoleApp3.csproj"
COPY . .
WORKDIR "/src/ConsoleApp3"
RUN dotnet build "ConsoleApp3.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "ConsoleApp3.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "ConsoleApp3.dll"]
  1. Alternatively, run in Windows with a debug build of Microsoft.Data.SqlClient and set the Microsoft.Data.SqlClient.UseManagedSNIOnWindows environment variable to true.

I have been unable to debug into the SqlClient inside docker, but as I have been able to reproduce it in Windows, the following is what I see when the error occurs:

image

This ultimately results in the following exception being thrown:
image

Some notes are:

  • The error only occurs if the number of rows/data received exceeds some threshold. With the example dataset selecting 5811 rows works fine but 5815 rows throws.
  • The errors seems to only happen if MARS is enabled.
  • I intuition (take it with a grain of salt) tells me that this is somehow related to https://github.com/dotnet/corefx/issues/40476

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2019

If you want to debug into the source in docker you can add the pdb to the nuget package and it'll let you trace in as long as JustMyCode isn't enabled. That's how i was looking at the bug in the linked PR.

[edit] They can't be related because M.D.SqlClient doesn't have the change from that PR. But i can investigate while i've got the test project setup.

@rmja
Copy link
Contributor

rmja commented Aug 30, 2019

@Wraith2 got it, here is the exception thrown from within the docker container:

image

[edit] (this is the exact same place as in Windows when ManagedSNI is enabled)

Note to myself: I disabled "Just My Code" and copied C:\Users\rmja\.nuget\packages\microsoft.data.sqlclient\1.0.19239.1\runtimes\unix\lib\netcoreapp2.1 to C:\Source\ConsoleApp3\ConsoleApp3\bin\Debug\netcoreapp3.0\runtimes\unix\lib\netcoreapp2.1 before starting the debug session.

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2019

It's going to take some poking at the TDS data to see what's wrong with it. I'll see if i can look into it this evening. It isn't going to be the same problem but it could easily be a related issue.

The managed implementation has a few rough areas which is one of the reasons i've made so many changes to it in corefx. Hopefully with the faster release cadance here we can identify and rectify them independently of corefx releases and in doing so improve non-windows reliability and performance.

@rmja
Copy link
Contributor

rmja commented Aug 30, 2019

@Wraith2 year, they are for sure not related. This is a different issue.

The problem here has to do with reading the length of a string when the length (2 bytes) is split across two packets:

This is a dump of what I see where a row corresponds to a SQL datarow:

// Packet header
04 00 1F 40 00 72 2E 00
// Last portion of row continued from previous packet
                                    00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 35 00 36 00 35 00
// A full row
D1 BF 15 00 00 0C 00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 35 00 36 00 36 00
// A full row
D1 C0 15 00 00 0C 00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 35 00 36 00 37 00
... // Many more rows
// A full row
D1 37 16 00 00 0C 00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 36 00 38 00 36 00
// A full row
D1 38 16 00 00 0C 00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 36 00 38 00 37 00
// Start of a row
D1 39 16 00 00 0C 00 64 00


// Next packet header
04 00 1F 40 00 72 2F 00
// Continuation of previous data
                           6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 36 00 38 00 38 00
// A full row
D1 3A 16 00 00 0C 00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 36 00 38 00 39 00
... // Many more rows
// A full row
D1 B2 16 00 00 0C 00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 38 00 30 00 39 00
// A full row
D1 B3 16 00 00 0C 00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 38 00 31 00 30 00
// A part of a row. NOTICE: 0C is the first out of 2 bytes describing the length of the string
D1 B4 16 00 00 0C

// The next header
04 00 1F 40 00 72 30 00
// Continuation of the previous packet. The first byte 00 is the last of the two length bytes
                  00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 38 00 31 00 31 00
D1 B5 16 00 00 0C 00 64 00 6F 00 6D 00 61 00 69 00 6E 00 22 00 45 00 75 00 72 00 6F 00 70 00 65 00 2F 00 43 00 6F 00 70 00 65 00 6E 00 68 00 61 00 67 00 65 00 6E 00 08 00 35 00 38 00 31 00 32 00
...

Clearly for this to work, the 0C has to be "saved" while reading the next packet. The code that should handle this is:
image

The following shows _bTmp after the 0C is read into the buffer:

image

However, after a single step over TryPrepareBuffer() _bTmp suddenly looks like this:
image

This is of cause causing a bad interpretation. _bTmp is somehow being used in between which is unfortunate. I have been unable to determine which code overwrites the buffer. It is somehow done from a different thread.

@rmja
Copy link
Contributor

rmja commented Aug 30, 2019

One more thing. It seems that if I skip the SendAckIfNecessary() call:

image

while reading invoking TryPrepareBuffer() during the read of the 2-byte length field then _bTmp does not get corrupted. Whether this is the cause or just a coincidence, I don't know.

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2019

It might be this dotnet/corefx#37270 it looks like the change but not the fix got applied and it's the only way i know of to corrupt that buffer, it's also string specific. If so applying the same fix should work.

@rmja
Copy link
Contributor

rmja commented Aug 30, 2019

@Wraith2 you are absolutely right. I can confirm that it fixes the issue. I created #171 which applies dotnet/corefx#37270 to M.D.SqlClient.

Thank you so much for helping me with this!

David-Engel pushed a commit that referenced this issue Aug 30, 2019
@David-Engel David-Engel added this to the 1.1.0 milestone Aug 30, 2019
@David-Engel David-Engel added the 🐛 Bug! Issues that are bugs in the drivers we maintain. label Aug 30, 2019
@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2019

Well I'm sorry I broke it in the first place. It's been a bad day for my quality of my work.

@rmja
Copy link
Contributor

rmja commented Aug 30, 2019

@David-Engel I just want to point out the severity of this bug. Basically hitting the exception is a good thing because it is quite obvious that something is wrong. However, subtle reads of bad data can happen and is why I found the bug in the first place. Getting it to throw is a special, lucky case.

I would say that 1.0.19239.1 is dangerous to use in a linux/osx invironment as incorrect reads of values across packets can and will happen without any errors shown to the user.

Because of this I would strongly recommend to get the fix out in a 1.0.1 patch release asap instead of waiting on a 1.1.0.

cc @Wraith2 @cheenamalhotra

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2019

I'm pretty sure that conversation is being had already.
I agree that the possibility of silent bad data warrants a release.

@rmja
Copy link
Contributor

rmja commented Sep 5, 2019

Sorry for my impatience to get this fix out in a hurry, but I really believe that it is urgent now where ef core relies on M.D.SqlClient. It's release is imminent and it will have a dependency with this fatal issue. I really hope that a patch release will be out asap.

I created dotnet/efcore#17592 to track the update of the reference if this gets a patch release before the ef core release

Cc @cheenamalhotra @David-Engel @divega

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 Bug! Issues that are bugs in the drivers we maintain.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants