Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POSIX Transport creates file on Open() without Flush() #1765

Closed
philip-davis opened this issue Sep 19, 2019 · 13 comments
Closed

POSIX Transport creates file on Open() without Flush() #1765

philip-davis opened this issue Sep 19, 2019 · 13 comments

Comments

@philip-davis
Copy link
Collaborator

The open() system call can create the file on disk immediately. To demonstrate this, log into two different login nodes on, for example, Cori and run this code on one of them:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    int fd;
    struct stat st;

    open("abc.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    sleep(100);

    return(0);

}

abc.txt will be visible on the other login node immediately, both for NFS and Lustre file systems.

The reason I bring this up is that this open() call is the same as the one used by FilePOSIX::Open() when openMode is Mode::Write. If I'm reading the code correctly, FilePOSIX is the default Transport library, including for BP4. This matters for the File Metadata Index. Even though we wait until after writing the header to do a flush, that flush is a no-op for POSIX, and the zero-length file has already been created.

This is, in turn, a problem because the reader could see the zero-length metadata index file, and then interpreting the header as a new timestep. I expect ParseMetadata would fail noisily in this case, but I haven't looked closely enough at the code to know that for sure.

@williamfgc
Copy link
Contributor

williamfgc commented Sep 19, 2019

Hi @philip-davis I assume you're referring to the streaming with files case. At the user-level reading should be done after a writer EndStep, but you point out there is a "gray" area (0 bytes) between Open and right before EndStep. Am I understanding the issue correctly? If it is either the reader might have the appropriate logic, or the initial buffer must have a flag to say it exist but nothing was written yet.

@philip-davis
Copy link
Collaborator Author

Hi @williamfgc, yes this is for streaming. The gray area is the between Open and the writing of the header of the metadata index file (which contains things like the ADIOS version of the writer, and which is only written once.) This leads to a code path where the reader could misinterpret this header as the index for a new step.

@philip-davis
Copy link
Collaborator Author

The write happens only a few lines after the Open in InitTransports(), so it's not a very long time.

@williamfgc
Copy link
Contributor

@philip-davis thanks for the clarification. My first guess is that the reader would look for the appropriate index files flags and throw an exception if things are not as expected. As you said, it would be a matter of testing to look for race conditions (even if it's not a very long time).

@lwan86
Copy link
Contributor

lwan86 commented Sep 19, 2019

Hi @philip-davis, I'm not sure if I understand your question correctly, but I think the flush is called every time the metadata index file is written. Just take a look at the following code from line 530 to 533 in BP4Writer.cpp:

    m_FileMetadataIndexManager.WriteFiles(
        m_BP4Serializer.m_MetadataIndex.m_Buffer.data(),
        m_BP4Serializer.m_MetadataIndex.m_Position);
    m_FileMetadataIndexManager.FlushFiles();

What kind of problem did you run into? It's on the writer side or reader side?

@lwan86
Copy link
Contributor

lwan86 commented Sep 19, 2019

@philip-davis I see what you are saying. It's actually the code from line 215 to 233:

    m_FileMetadataIndexManager.OpenFiles(
        metadataIndexFileNames, m_OpenMode, m_IO.m_TransportsParameters,
        m_BP4Serializer.m_Profiler.m_IsActive);

    if (m_OpenMode != Mode::Append ||
        m_FileMetadataIndexManager.GetFileSize(0) == 0)
    {
        /* Prepare header and write now to Index Table indicating
         * the start of streaming */
        m_BP4Serializer.MakeHeader(m_BP4Serializer.m_MetadataIndex,
                                   "Index Table", true);

        m_FileMetadataIndexManager.WriteFiles(
            m_BP4Serializer.m_MetadataIndex.m_Buffer.data(),
            m_BP4Serializer.m_MetadataIndex.m_Position);
        m_FileMetadataIndexManager.FlushFiles();
        /* clear the metadata index buffer*/
        m_BP4Serializer.ResetBuffer(m_BP4Serializer.m_MetadataIndex, true);
    }

Basically, you are saying between "OpenFiles" and "FlushFiles", if the reader tries to read the metadata index file, error will occur since it is empty, right?

I think Norbert probably added some code on the reader side that handles this situation. But I need to double check with him.

@philip-davis
Copy link
Collaborator Author

philip-davis commented Sep 19, 2019

I'm not sure if I'm actually seeing this cause a problem, I just noticed this when I was trying to figure out what was going on with #1745. The issue is with the open:

 m_FileMetadataIndexManager.OpenFiles(
            metadataIndexFileNames, m_OpenMode, m_IO.m_TransportsParameters,
            m_BP4Serializer.m_Profiler.m_IsActive);

As soon as that happens, a zero length md0.idx is visible on the file system (even though there hasn't been a Flush yet.) Once that file exists, OpenFiles() can succeed on the read side, since the idx file and metadata files can be opened without an exception being thrown. Since the idx file is empty, m_MDIndexFileProcessedSize will be set to zero. Later, when the header is written, in BP4Reader::UpdateBuffer the reader will see that the idx file is larger and wrongly process that as a new timestep.

@philip-davis
Copy link
Collaborator Author

philip-davis commented Sep 19, 2019 via email

@lwan86
Copy link
Contributor

lwan86 commented Sep 19, 2019

@philip-davis I think this situation has been handled on the reader side. Take a look at the code from line 255 to 267 in BP4Reader.cpp:
while (waited <= timeoutSeconds)
{
startTime = MPI_Wtime();
const size_t idxFileSize = m_MDIndexFileManager.GetFileSize(0);
if (idxFileSize > 63)
{
flag = 0; // we have data
break;
}
std::this_thread::sleep_for(std::chrono::milliseconds(pollTime_ms));
endTime = MPI_Wtime();
waited += endTime - startTime;
}

If the size of the metadata index file is less than 63 bytes, which means the header has not been written, the reader will be just waiting.

@williamfgc
Copy link
Contributor

@philip-davis I think what @lwan86 explained addresses the concern. The Flush() function is virtual and it has actual value for buffered I/O backends, fstream and stdio. It's just a way to abstract many backends under the Transport base class. fstream is the default on Windows.

@philip-davis
Copy link
Collaborator Author

@lwan86, thanks for pointing that out; I didn't see that code. I think it would help with the situation, except that I can't see that it is updating m_MDIndexFileProcessedSize, unless I am missing it. As long as m_MDIndexFileProcessedSize isn't updated when the size changes, I think this issue is still present.

@pnorbert
Copy link
Contributor

pnorbert commented Sep 19, 2019 via email

@philip-davis
Copy link
Collaborator Author

@pnorbert Okay, I see that now. So we always have the idx header by the time we leave the IO::Open(). Thanks for the clarification!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants