Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: init and add base pouchd logs API #1298

Merged
merged 1 commit into from
May 14, 2018
Merged

feature: init and add base pouchd logs API #1298

merged 1 commit into from
May 14, 2018

Conversation

fuweid
Copy link
Contributor

@fuweid fuweid commented May 10, 2018

Signed-off-by: Wei Fu [email protected]

Ⅰ. Describe what this PR did

pouchd will redirect the container's stdout/stderr into the /var/lib/pouch/containers/{id}/json.log. However, we don't redirect the exec log into the json.log.

However, this commit doesn't contains compress and logrotate functionality. And Haven't support show details in the stream yet. We will upgrade the functionality in the future.

Ⅱ. Does this pull request fix one issue?

NONE.

Ⅲ. Describe how you did it

Using thecontainerio pkg to redirect the stdout/stderr into the json.log. Since the raw-file in the containerio pkg is like the json.log, in order to avoid duplicate data, I remove the raw-file option in the pkg.

The important parts are tailFile and followFile.

tailFile

Before reading log, we must use the tailLine to seek the offset in the file. With the offset, we can set it to file and read it.

I don't want to load all the content into memory during the seeking the offset. Therefore, it maybe impact the IO in the host if the tailLine is too large or the each line contains so many content.

followFIle

I use github.com/fsnotify/fsnotify to watch the file. But it has limitation about the Remove event. In order to make it work, I use os.Stat to check the file if the watcher has timeout. Workaround.... sad :( ...

Besides this, limit of current design of containerio, we cannot get the event when the container has been stopped. I have to use the timeout-check the status of container. sad :( :(....+2

Ⅳ. Describe how to verify it

first, we run the following command:

➜  pouch run -it reg.docker.alibaba-inc.com/base/busybox:latest ls -al
total 44
drwxr-xr-x    1 root     root          4096 May 10 10:05 .
drwxr-xr-x    1 root     root          4096 May 10 10:05 ..
drwxr-xr-x    2 root     root         12288 Mar  9  2017 bin
drwxr-xr-x    5 root     root           360 May 10 10:05 dev
drwxr-xr-x    2 root     root          4096 Mar  9  2017 etc
drwxr-xr-x    2 nobody   nogroup       4096 Mar  9  2017 home
dr-xr-xr-x  168 root     root             0 May 10 10:05 proc
drwxr-xr-x    2 root     root          4096 Mar  9  2017 root
drwxr-xr-x    2 root     root            40 May 10 10:05 run
dr-xr-xr-x   13 root     root             0 May 10 10:05 sys
drwxrwxrwt    2 root     root          4096 Mar  9  2017 tmp
drwxr-xr-x    3 root     root          4096 Mar  9  2017 usr
drwxr-xr-x    4 root     root          4096 Mar  9  2017 var

Since we don't add the related cli, we can use docker to check

docker -H unix:///var/run/pouchd.sock logs  9e053b257714a3139b1c342c5ffdd2cfa0daba0ec7bdf1a63cb80e308e3ec576
total 44
drwxr-xr-x    1 root     root          4096 May 10 10:05 .
drwxr-xr-x    1 root     root          4096 May 10 10:05 ..
drwxr-xr-x    2 root     root         12288 Mar  9  2017 bin
drwxr-xr-x    5 root     root           360 May 10 10:05 dev
drwxr-xr-x    2 root     root          4096 Mar  9  2017 etc
drwxr-xr-x    2 nobody   nogroup       4096 Mar  9  2017 home
dr-xr-xr-x  168 root     root             0 May 10 10:05 proc
drwxr-xr-x    2 root     root          4096 Mar  9  2017 root
drwxr-xr-x    2 root     root            40 May 10 10:05 run
dr-xr-xr-x   13 root     root             0 May 10 10:05 sys
drwxrwxrwt    2 root     root          4096 Mar  9  2017 tmp
drwxr-xr-x    3 root     root          4096 Mar  9  2017 usr
drwxr-xr-x    4 root     root          4096 Mar  9  2017 var

and

➜  docker -H unix:///var/run/pouchd.sock logs --timestamps 9e053b257714a3139b1c342c5ffdd2cfa0daba0ec7bdf1a63cb80e308e3ec576
2018-05-10T10:05:31.125305017Z total 44
2018-05-10T10:05:31.125886382Z drwxr-xr-x    1 root     root          4096 May 10 10:05 .
2018-05-10T10:05:31.125909361Z drwxr-xr-x    1 root     root          4096 May 10 10:05 ..
2018-05-10T10:05:31.125916756Z drwxr-xr-x    2 root     root         12288 Mar  9  2017 bin
2018-05-10T10:05:31.125923447Z drwxr-xr-x    5 root     root           360 May 10 10:05 dev
2018-05-10T10:05:31.125929621Z drwxr-xr-x    2 root     root          4096 Mar  9  2017 etc
2018-05-10T10:05:31.125935903Z drwxr-xr-x    2 nobody   nogroup       4096 Mar  9  2017 home
2018-05-10T10:05:31.125942006Z dr-xr-xr-x  168 root     root             0 May 10 10:05 proc
2018-05-10T10:05:31.125948261Z drwxr-xr-x    2 root     root          4096 Mar  9  2017 root
2018-05-10T10:05:31.125954355Z drwxr-xr-x    2 root     root            40 May 10 10:05 run
2018-05-10T10:05:31.125960399Z dr-xr-xr-x   13 root     root             0 May 10 10:05 sys
2018-05-10T10:05:31.12596663Z drwxrwxrwt    2 root     root          4096 Mar  9  2017 tmp
2018-05-10T10:05:31.126030836Z drwxr-xr-x    3 root     root          4096 Mar  9  2017 usr
2018-05-10T10:05:31.126220486Z drwxr-xr-x    4 root     root          4096 Mar  9  2017 var

and

➜  docker -H unix:///var/run/pouchd.sock logs --since 2018-05-10T10:05:31.125916756Z 9e053b257714a3139b1c342c5ffdd2cfa0daba0ec7bdf1a63cb80e308e3ec576
drwxr-xr-x    2 root     root         12288 Mar  9  2017 bin
drwxr-xr-x    5 root     root           360 May 10 10:05 dev
drwxr-xr-x    2 root     root          4096 Mar  9  2017 etc
drwxr-xr-x    2 nobody   nogroup       4096 Mar  9  2017 home
dr-xr-xr-x  168 root     root             0 May 10 10:05 proc
drwxr-xr-x    2 root     root          4096 Mar  9  2017 root
drwxr-xr-x    2 root     root            40 May 10 10:05 run
dr-xr-xr-x   13 root     root             0 May 10 10:05 sys
drwxrwxrwt    2 root     root          4096 Mar  9  2017 tmp
drwxr-xr-x    3 root     root          4096 Mar  9  2017 usr
drwxr-xr-x    4 root     root          4096 Mar  9  2017 var

if you don't want to use docker, you can use curl:

# pouchd --listen tcp://localhost:5000
➜  curl "localhost:5000/v1.24/containers/9e053b257714a3139b1c342c5ffdd2cfa0daba0ec7bdf1a63cb80e308e3ec576/logs?stdout=1"
total 44
drwxr-xr-x    1 root     root          4096 May 10 10:05 .
drwxr-xr-x    1 root     root          4096 May 10 10:05 ..
drwxr-xr-x    2 root     root         12288 Mar  9  2017 bin
drwxr-xr-x    5 root     root           360 May 10 10:05 dev
drwxr-xr-x    2 root     root          4096 Mar  9  2017 etc
drwxr-xr-x    2 nobody   nogroup       4096 Mar  9  2017 home
dr-xr-xr-x  168 root     root             0 May 10 10:05 proc
drwxr-xr-x    2 root     root          4096 Mar  9  2017 root
drwxr-xr-x    2 root     root            40 May 10 10:05 run
dr-xr-xr-x   13 root     root             0 May 10 10:05 sys
drwxrwxrwt    2 root     root          4096 Mar  9  2017 tmp
drwxr-xr-x    3 root     root          4096 Mar  9  2017 usr
drwxr-xr-x    4 root     root          4096 Mar  9  2017 var

Ⅴ. Special notes for reviews

pouch cli will be done in the following PR.

And please read the code comment to get more detail. Thanks

@codecov-io
Copy link

codecov-io commented May 10, 2018

Codecov Report

Merging #1298 into master will increase coverage by 1%.
The diff coverage is 39.42%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #1298    +/-   ##
========================================
+ Coverage   16.42%   17.42%    +1%     
========================================
  Files         183      188     +5     
  Lines       11309    11753   +444     
========================================
+ Hits         1857     2048   +191     
- Misses       9316     9557   +241     
- Partials      136      148    +12
Impacted Files Coverage Δ
daemon/config/config.go 7.89% <ø> (ø) ⬆️
apis/server/utils.go 0% <0%> (ø)
apis/server/container_bridge.go 0% <0%> (ø) ⬆️
daemon/mgr/container.go 0% <0%> (ø) ⬆️
daemon/mgr/system.go 0% <0%> (ø) ⬆️
apis/server/router.go 0% <0%> (ø) ⬆️
daemon/containerio/options.go 0% <0%> (ø) ⬆️
daemon/containerio/jsonfile.go 1.49% <1.49%> (ø)
pkg/utils/utils.go 72.39% <100%> (+2.59%) ⬆️
daemon/logger/jsonfile/jsonfile.go 26.92% <26.92%> (ø)
... and 14 more


// type
Type string `json:"Type,omitempty"`
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In moby, the LogConfig has a type of

// LogConfig represents the default log configuration.
// It includes json tags to deserialize configuration from a file
// using the same names that the flags in the command line use.
type LogConfig struct {
	Type   string            `json:"log-driver,omitempty"`
	Config map[string]string `json:"log-opts,omitempty"`
}

The json naming of the fields are different, log-driver rather than Type, log-opts rather than Config.
Do you think we need to keep compatible? @fuweid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make senses. Will update it.

// will hang and cause goroutine leak.
func withCancelHandler(h handler) handler {
return func(ctx context.Context, rw http.ResponseWriter, req *http.Request) error {
if notifier, ok := rw.(http.CloseNotifier); ok {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using return fast to make less ident?

if notifier, ok := rw.(http.CloseNotifier); !ok {
    return h(ctx, rw, req)
}
var cancel context.CancelFunc
ctx, cancel = context.WithCancel(ctx)
......
return h(ctx, rw, req)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Will update it.


// JSONLogFile is uses to log the container's stdout and stderr.
type JSONLogFile struct {
mu sync.Mutex
Copy link
Collaborator

@allencloud allencloud May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering this lock will detect what kind of critical section.
At least, in my opinion, a f *os.File does not need locked. This is not an instance allocated in memory and shared within several goroutine. It is specific filesystem and io resource to guarantee the concurrent writing of the file. WDYT? @fuweid
Correct me if I missed anything.

In addition, @shaloulcy found that in the definition of os.File, it is already has a lock there to guarantee concurrency things in internal/poll/fd_unix.go:

// Write implements io.Writer.
func (fd *FD) Write(p []byte) (int, error) {
    if err := fd.writeLock(); err != nil {
        return 0, err 
    }   
    defer fd.writeUnlock()
    if err := fd.pd.prepareWrite(fd.isFile); err != nil {
        return 0, err 
    }   
    var nn int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still wondering the case that if we need file.Name() related action. The case still needs the lock. Other cases, like Write and Close, need the lock to make sure the concurrent issue. I think we can keep the lock here. After make the container.io stable and high performance, we will remove the lock if we don't need it. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense.

"time"

"github.com/alibaba/pouch/daemon/logger"
"github.com/sirupsen/logrus"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a blank line at line 12, and remove the blank line in L13.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Will do


const (
blockSize = 1024
eol = '\n'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using endOfLine intead of eol which seems to be less readable. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Will do

// NOTE: created container doesn't create IO.
if c.IsCreated() {
msgCh := make(chan *logger.LogMessage, 1)
close(msgCh)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot quite follow the above two line of code. Why to initialize a channel and close it immediately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The created container doesn't have the log file in current design.

For this case, we should fast return with closed channel. The receiver receives from closed channel and it will break the dead loop and finish the writeLogStream.

I just hold the change scope in the log side and don't want to change the logic of initialize container in this commit. Does it make senses? Maybe I can add the TODO to explain why I do.

"net/http"
"strings"

"github.com/alibaba/pouch/apis/types"
Copy link
Collaborator

@allencloud allencloud May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that a package in pkg like httputils should import packages from alibaba/pouch. WDYT
Otherwise, it is not so common that we need to split it out of pkg and locate it in project pouch/..., like logs_utils.go .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can put it into the apis package level. It's only used in logs API.

@@ -2131,6 +2115,27 @@ definitions:
HostConfig:
$ref: "#/definitions/HostConfig"

LogConfig:
Copy link
Collaborator

@allencloud allencloud May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid we still need to update the routes of /containers/{id}/logs: in this file, since the status code in the code from manager are more than that has been defined in /containers/{id}/logs:.
For example, 400.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make senses. Will do

@allencloud
Copy link
Collaborator

I left some comments there, while it is still not so easy to do the review work.
To be honest, when a large part of codes need merged into the master, I usually rely on the sufficient test case your wrote to make it move on. @fuweid 😄 🍺

In this commit, pouchd will redirect the stdout/stderr into the
/var/lib/pouch/containers/{id}/json.log. However, we don't redirect the
exec log into the json.log.

This commit doesn't contains compress log and logrotate functionality
and also doesn't support show details in the stream.

Signed-off-by: Wei Fu <[email protected]>
@fuweid
Copy link
Contributor Author

fuweid commented May 14, 2018

@allencloud , I have updated the code and please take a look. Thanks

@allencloud
Copy link
Collaborator

LGTM, let's kick the ball.

@allencloud allencloud merged commit 967f505 into AliyunContainerService:master May 14, 2018
@pouchrobot pouchrobot added the LGTM one maintainer or community participant agrees to merge the pull reuqest. label May 14, 2018
@fuweid fuweid deleted the feature_pouch_logs_api branch August 3, 2018 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
areas/log kind/feature LGTM one maintainer or community participant agrees to merge the pull reuqest. size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants