Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mixed line endings not ignored when comparing hashes #247

Closed
rfbain opened this issue Apr 19, 2016 · 17 comments
Closed

Mixed line endings not ignored when comparing hashes #247

rfbain opened this issue Apr 19, 2016 · 17 comments

Comments

@rfbain
Copy link

rfbain commented Apr 19, 2016

This is a follow up to issue: Ignore EOL changes when checking whether script has changed. There is still a problem when the original text has a mix of windows and unix line endings. An attempt is made to convert the new text to all windows line endings and all unix line endings, but neither one matches the original with mixed line endings. I'd like to see a command line switch that lets the user choose to set line endings to all windows or to all unix prior to computing the hash which gets saved to table ScriptsRun. Note, I'm only asking for the hash to be changed to enable future checks to work correctly. I would like the original version of the text to be saved in column text_of_script. If others would like the text to be updated as well, perhaps that could use different values for the switch.

@ferventcoder
Copy link
Member

ferventcoder commented Apr 19, 2016

So I'm not fully following. If we are ignoring the \r\n / \n line endings for comparison, what does it matter what it is stored as?

@ferventcoder
Copy link
Member

As in, I'm pretty sure once the ignore is in, line endings would no longer be a factor.

@rfbain
Copy link
Author

rfbain commented Apr 19, 2016

Perhaps an example will help. Think of the following being included in a sql script.

-- this line was copied from a unix source.\n
-- this line was copied from a windows source.\r\n

Then someone looks at the file in a windows editor and saves it. Now it becomes.

-- this line was copied from a unix source.\r\n
-- this line was copied from a windows source.\r\n

Function have_same_hash_ignoring_platform in class DefaultDatabaseMigrator will show the file as having been changed and the script will either be run again, or trigger an error or warning when the script should be skipped.

@ferventcoder
Copy link
Member

Again, I'm not sure it would be a factor, that's what ignoring the line endings is about. It doesn't matter if the line ending is windows or unix format, or even older Mac format with \r, the line endings are completely ignored with the ignore line endings code. If they are not, it's a bug in the code itself.

@rfbain
Copy link
Author

rfbain commented Apr 20, 2016

The enclosed zip file has 4 batch files, and the logs produced by executing those files. Some sql scripts have lines that consistently end in CrLf. Some have lines that consistently end in Lf. And some have lines with a mix of line endings. You should be able to tell which is which from the filenames. Batch files CrLf_Lf and Lf_CrLf run as expected. This shows line endings are ignored when the line endings in individual files are consistent, even if they differ from one file to another. Batch files CrLf_Mixed and Mixed_Crlf show the bug when line endings differ in individual files.
rh_issue.zip

@ferventcoder
Copy link
Member

Curious though, has the other feature actually been released so you can test this?

@ferventcoder
Copy link
Member

Looking at #104 and the ReadMe, it looks like it was released in 0.8.6 - however looking at https://github.com/chucknorris/roundhouse/pull/104/files, it may have a possible bug. I would have expected it to just convert all line endings to one so they would be ignored.

@rfbain
Copy link
Author

rfbain commented Apr 20, 2016

It comes down to this function, in class DefaultDatabaseMigrator in version 0.8.6.0 which I got from the site a few days ago. I have not looked to see if this has been updated in the unreleased versions.
private bool have_same_hash_ignoring_platform(string sql_to_run, string old_text_hash)
{
// check with unix and windows line endings
const string line_ending_windows = "\r\n";
const string line_ending_unix = "\n";
string new_text_hash = create_hash(sql_to_run.Replace(line_ending_windows, line_ending_unix));
bool hash_is_same = hashes_are_equal(new_text_hash, old_text_hash);

        if (!hash_is_same)
        {
            // try other way around
            new_text_hash = create_hash(sql_to_run.Replace(line_ending_unix, line_ending_windows));
            hash_is_same = hashes_are_equal(new_text_hash, old_text_hash);
        }

        return hash_is_same;
    }

create_hash(sql_to_run.Replace(line_ending_unix, line_ending_windows));

will produce a mix of \r\r\n and \r\n when sql_to_run has a mix of line endings, explaining why there are problems when the new file has inconsistent endings. If the original file had inconsistent endings, the hash will be based on those endings, so neither substitution will match.

@ferventcoder
Copy link
Member

Right - so it's a bug in the code and the feature.

@jftaylorMn
Copy link

I think there are two related bugs.

The first is documented above. The code correctly tests if the EOL is unix or windows. However, it returns a false positive when there are both line types intermixed in the same file. The first time through, everything works fine because everything is new. After the first pass, mixed EOL files will consistently be reported as changed by have_same_hash_ignoring_platform even if they haven't been touched.

The second bug relates to the hash that is saved and subsequently passed to the above method. We have seen cases where mixed EOL files are repaired "automatically" - by just re-saving a file opened in an editor for example. Some evidence suggests that simply pulling the code from the repo using VS team explorer might actually trigger the "fix". The developer had no intention to make any change, and the SQL database wouldn't detect a change. In order to consistently ignore any EOL changes, the hash that is saved to the database needs to be the one for the EOL-agnostic version of the script - not what is in the current file. I am hoping that after this bug is fixed the -baseline switch could be used to "true up" the hash values in the database without actually executing the SQL code. I can see where this might be a controversial/breaking change so would entertain the introduction of a new switch to enforce this behavior.

@rfbain
Copy link
Author

rfbain commented Apr 20, 2016

I'm new to this forum and not sure about the next steps. Now that the roundhouse team is aware of the issue, should I leave it in their hands? Or is there more I should do? My original post requested a switch to enable the line endings to be normalized prior to computing the hash to be saved into the database. My testing in the past day shows it would be helpful to edit the function listed above to first change crlf to lf and then change lf to crlf prior to computing the has for the sql_to_run variable.

@rfbain
Copy link
Author

rfbain commented Apr 20, 2016

I tried not to bring other applications into the discussion, but yes, this issue was noticed in relation to using git with Visual Studio and performing merges. That's a bit much to introduce into a discussion like this one in a reproducible way.

@ferventcoder
Copy link
Member

It goes to the backlog as a bug.

@ferventcoder
Copy link
Member

Then it gets prioritized and fixed - someone from the community could also provide a PR to fix this behavior.

@rfbain
Copy link
Author

rfbain commented Apr 20, 2016

I'm not ready to enter this as a pull request without real world testing, but I'll mention this solved all of the test cases I entered in my attachment above. It did not require any new switches. I changed function hash in class MD5CryptographicService as listed below. I did not change function have_same_hash_ignoring_platform. However, if this change gets accepted, the line ending changes and multiple hash comparisons in function have_same_hash_ignoring_platform become unnecessary.

    public string hash(string clear_text_of_what_to_hash)
    {
        // roundhouse version returned hash based on source line endings
        // custom returns hash based on all line endings being unix line endings

        // start of custom addition
        const string line_ending_windows = "\r\n";
        const string line_ending_unix = "\n";
        const string line_ending_mac = "\r";
        string normalized_text_of_what_to_hash = clear_text_of_what_to_hash
            .Replace(line_ending_windows, line_ending_unix).Replace(line_ending_mac, line_ending_unix);
        // end of custom addition

        // byte[] clear_text_bytes = Encoding.UTF8.GetBytes(clear_text_of_what_to_hash); // removed in custom code
        byte[] clear_text_bytes = Encoding.UTF8.GetBytes(normalized_text_of_what_to_hash); // replacement in custom code

        byte[] cypher_bytes = crypto_provider.ComputeHash(clear_text_bytes);
        return Convert.ToBase64String(cypher_bytes);
    }

@ferventcoder
Copy link
Member

This is more along the lines of what I would have wanted for the earlier fix.

@BiggerNoise
Copy link
Member

This should be addressed by my recent changes to hash computation (on master or the next release). Please reopen with details if what I have doesn't get it done for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants