*: Refactor the structure #65

Merged
merged 12 commits into from
Apr 22, 2018
Changes from 10 commits
4 changes: 2 additions & 2 deletions README.md
@@ -95,10 +95,10 @@ Katib provides a Web UI based on ModelDB(https://github.com/mitdbg/modeldb). The
In addition to TensorFlow, other deep learning frameworks (e.g. PyTorch, MXNet) support TensorBoard format logging.
Katib integrates with TensorBoard easily. To use TensorBoard from Katib, we define a persistent volume claim and set the mount config for the Study. Katib searches each trial log in `{pvc mount path}/logs/{Study ID}/{Trial ID}`.
`{{STUDY_ID}}` and `{{TRIAL_ID}}` in the Studyconfig file are replaced with the corresponding values when each job is created.
-See example `conf/tf-nmt.yml` which is a config for parameter tuning of [tensorflow/nmt](https://github.com/tensorflow/nmt).
+See example `examples/tf-nmt.yml` which is a config for parameter tuning of [tensorflow/nmt](https://github.com/tensorflow/nmt).
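
The placeholder substitution and log layout described in the README text can be sketched in Go roughly as follows. This is a minimal illustration, not Katib's actual implementation: the helper names `expandPlaceholders` and `trialLogDir`, the sample IDs, and the sample `--out_dir` argument are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandPlaceholders substitutes the {{STUDY_ID}} and {{TRIAL_ID}} markers
// in one command-line argument. Hypothetical helper, not taken from Katib.
func expandPlaceholders(arg, studyID, trialID string) string {
	r := strings.NewReplacer("{{STUDY_ID}}", studyID, "{{TRIAL_ID}}", trialID)
	return r.Replace(arg)
}

// trialLogDir builds the directory layout the README describes:
// {pvc mount path}/logs/{Study ID}/{Trial ID}. Hypothetical helper.
func trialLogDir(mountPath, studyID, trialID string) string {
	return path.Join(mountPath, "logs", studyID, trialID)
}

func main() {
	// Illustrative argument; the IDs below are made up.
	arg := "--out_dir=/nfs-mnt/logs/{{STUDY_ID}}/{{TRIAL_ID}}"
	fmt.Println(expandPlaceholders(arg, "study-a", "trial-1"))
	fmt.Println(trialLogDir("/nfs-mnt", "study-a", "trial-1"))
}
```

If the training command writes to the expanded `--out_dir`, the directory it produces matches the path built by `trialLogDir`, which is where the README says the logs are searched.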

```bash
-./katib-cli -s gpu-node2:30678 -f ../conf/tf-nmt.yml Createstudy
+./katib-cli -s gpu-node2:30678 -f ../examples/tf-nmt.yml Createstudy
2018/04/03 05:52:11 connecting gpu-node2:30678
2018/04/03 05:52:11 study conf{tf-nmt root MINIMIZE 0 configs:<name:"--num_train_steps" parameter_type:INT feasible:<max:"1000" min:"1000" > > configs:<name:"--dropout" parameter_type:DOUBLE feasible:<max:"0.3" min:"0.1" > > configs:<name:"--beam_width" parameter_type:INT feasible:<max:"15" min:"5" > > configs:<name:"--num_units" parameter_type:INT feasible:<max:"1026" min:"256" > > configs:<name:"--attention" parameter_type:CATEGORICAL feasible:<list:"luong" list:"scaled_luong" list:"bahdanau" list:"normed_bahdanau" > > configs:<name:"--decay_scheme" parameter_type:CATEGORICAL feasible:<list:"luong234" list:"luong5" list:"luong10" > > configs:<name:"--encoder_type" parameter_type:CATEGORICAL feasible:<list:"bi" list:"uni" > > [] random median [name:"SuggestionNum" value:"10" name:"MaxParallel" value:"6" ] [] test_ppl [ppl bleu_dev bleu_test] yujioshima/tf-nmt:latest-gpu [python -m nmt.nmt --src=vi --tgt=en --out_dir=/nfs-mnt/logs/{{STUDY_ID}}_{{TRIAL_ID}} --vocab_prefix=/nfs-mnt/learndatas/wmt15_en_vi/vocab --train_prefix=/nfs-mnt/learndatas/wmt15_en_vi/train --dev_prefix=/nfs-mnt/learndatas/wmt15_en_vi/tst2012 --test_prefix=/nfs-mnt/learndatas/wmt15_en_vi/tst2013 --attention_architecture=standard --attention=normed_bahdanau --batch_size=128 --colocate_gradients_with_ops=true --eos=</s> --forget_bias=1.0 --init_weight=0.1 --learning_rate=1.0 --max_gradient_norm=5.0 --metrics=bleu --share_vocab=false --num_buckets=5 --optimizer=sgd --sos=<s> --steps_per_stats=100 --time_major=true --unit_type=lstm --src_max_len=50 --tgt_max_len=50 --infer_batch_size=32] 1 default-scheduler pvc:"nfs" path:"/nfs-mnt" }
2018/04/03 05:52:11 req Createstudy
12 changes: 0 additions & 12 deletions build.sh

This file was deleted.

4 changes: 2 additions & 2 deletions cli/Dockerfile → cmd/cli/Dockerfile
@@ -1,9 +1,9 @@
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
-WORKDIR /go/src/github.com/kubeflow/katib/cli
+WORKDIR /go/src/github.com/kubeflow/katib/cmd/cli
RUN go build -o katib-cli

FROM alpine:3.7
WORKDIR /app
-COPY --from=build-env /go/src/github.com/kubeflow/katib/cli/katib-cli /app/
+COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/cli/katib-cli /app/
2 changes: 1 addition & 1 deletion cli/main.go → cmd/cli/main.go
@@ -12,7 +12,7 @@ import (

"google.golang.org/grpc"

-pb "github.com/kubeflow/katib/api"
+pb "github.com/kubeflow/katib/pkg/api"
)

var server = flag.String("s", "127.0.0.1:6789", "server address")
@@ -1,10 +1,10 @@
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
-WORKDIR /go/src/github.com/kubeflow/katib/earlystopping/medianstopping
+WORKDIR /go/src/github.com/kubeflow/katib/cmd/earlystopping/medianstopping
RUN go build -o medianstopping

FROM alpine:3.7
WORKDIR /app
-COPY --from=build-env /go/src/github.com/kubeflow/katib/earlystopping/medianstopping /app/
+COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/earlystopping/medianstopping /app/
ENTRYPOINT ["./medianstopping"]
@@ -7,8 +7,8 @@ import (
"google.golang.org/grpc"
"google.golang.org/grpc/reflection"

-pb "github.com/kubeflow/katib/api"
-"github.com/kubeflow/katib/earlystopping"
+pb "github.com/kubeflow/katib/pkg/api"
+"github.com/kubeflow/katib/pkg/earlystopping"
)

func main() {
12 changes: 12 additions & 0 deletions cmd/manager/Dockerfile
@@ -0,0 +1,12 @@
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
WORKDIR /go/src/github.com/kubeflow/katib/cmd/manager
RUN go build -o vizier-manager

FROM alpine:3.7
WORKDIR /app
COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/manager/vizier-manager /app/
COPY --from=build-env /go/src/github.com/kubeflow/katib/pkg/manager/visualise /
ENTRYPOINT ["./vizier-manager"]
CMD ["-w", "dlk"]
17 changes: 9 additions & 8 deletions manager/main.go → cmd/manager/main.go
@@ -9,14 +9,14 @@ import (
"strconv"
"time"

-pb "github.com/kubeflow/katib/api"
-kdb "github.com/kubeflow/katib/db"
-"github.com/kubeflow/katib/manager/modelstore"
-tbif "github.com/kubeflow/katib/manager/visualise/tensorboard"
-"github.com/kubeflow/katib/manager/worker_interface"
-dlkwif "github.com/kubeflow/katib/manager/worker_interface/dlk"
-k8swif "github.com/kubeflow/katib/manager/worker_interface/kubernetes"
-nvdwif "github.com/kubeflow/katib/manager/worker_interface/nvdocker"
+pb "github.com/kubeflow/katib/pkg/api"
+kdb "github.com/kubeflow/katib/pkg/db"
+"github.com/kubeflow/katib/pkg/manager/modelstore"
+tbif "github.com/kubeflow/katib/pkg/manager/visualise/tensorboard"
+"github.com/kubeflow/katib/pkg/manager/worker_interface"
+dlkwif "github.com/kubeflow/katib/pkg/manager/worker_interface/dlk"
+k8swif "github.com/kubeflow/katib/pkg/manager/worker_interface/kubernetes"
+nvdwif "github.com/kubeflow/katib/pkg/manager/worker_interface/nvdocker"

"google.golang.org/grpc"
"google.golang.org/grpc/reflection"
@@ -394,6 +394,7 @@ func main() {
switch *workerType {
case "kubernetes":
log.Printf("Worker: kubernetes\n")
+// Notice: Missing in the repo.
kc, err := clientcmd.BuildConfigFromFlags("", "/conf/kubeconfig")
if err != nil {
log.Fatal(err)
8 changes: 4 additions & 4 deletions manager/main_test.go → cmd/manager/main_test.go
@@ -6,10 +6,10 @@ import (

"github.com/golang/mock/gomock"

-api "github.com/kubeflow/katib/api"
-mockdb "github.com/kubeflow/katib/mock/db"
-mockmodelstore "github.com/kubeflow/katib/mock/modelstore"
-mockworker "github.com/kubeflow/katib/mock/worker"
+api "github.com/kubeflow/katib/pkg/api"
+mockdb "github.com/kubeflow/katib/pkg/mock/db"
+mockmodelstore "github.com/kubeflow/katib/pkg/mock/modelstore"
+mockworker "github.com/kubeflow/katib/pkg/mock/worker"
)

func TestCreateStudy(t *testing.T) {
File renamed without changes.
@@ -1,7 +1,7 @@
FROM python:3

ADD . /usr/src/app/github.com/kubeflow/katib
-WORKDIR /usr/src/app/github.com/kubeflow/katib/suggestion/bayesianoptimization
+WORKDIR /usr/src/app/github.com/kubeflow/katib/cmd/suggestion/bayesianoptimization
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /usr/src/app/github.com/kubeflow/katib

@@ -3,9 +3,9 @@

import time

-from api.python import api_pb2_grpc
-from suggestion.bayesian_service import BayesianService
-from suggestion.types import DEFAULT_PORT
+from pkg.api.python import api_pb2_grpc
+from pkg.suggestion.bayesian_service import BayesianService
+from pkg.suggestion.types import DEFAULT_PORT

_ONE_DAY_IN_SECONDS = 60 * 60 * 24

@@ -14,6 +14,7 @@ def serve():
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
api_pb2_grpc.add_SuggestionServicer_to_server(BayesianService(), server)
server.add_insecure_port(DEFAULT_PORT)
+print("Listening...")
server.start()
try:
while True:
4 changes: 2 additions & 2 deletions suggestion/grid/Dockerfile → cmd/suggestion/grid/Dockerfile
@@ -1,10 +1,10 @@
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
-WORKDIR /go/src/github.com/kubeflow/katib/suggestion/grid
+WORKDIR /go/src/github.com/kubeflow/katib/cmd/suggestion/grid
RUN go build -o grid

FROM alpine:3.7
WORKDIR /app
-COPY --from=build-env /go/src/github.com/kubeflow/katib/suggestion/grid /app/
+COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/suggestion/grid /app/
ENTRYPOINT ["./grid"]
4 changes: 2 additions & 2 deletions suggestion/grid/main.go → cmd/suggestion/grid/main.go
@@ -7,8 +7,8 @@ import (
"google.golang.org/grpc"
"google.golang.org/grpc/reflection"

-pb "github.com/kubeflow/katib/api"
-"github.com/kubeflow/katib/suggestion"
+pb "github.com/kubeflow/katib/pkg/api"
+"github.com/kubeflow/katib/pkg/suggestion"
)

func main() {
@@ -1,10 +1,10 @@
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
-WORKDIR /go/src/github.com/kubeflow/katib/suggestion/hyperband
+WORKDIR /go/src/github.com/kubeflow/katib/cmd/suggestion/hyperband
RUN go build -o hyperband

FROM alpine:3.7
WORKDIR /app
-COPY --from=build-env /go/src/github.com/kubeflow/katib/suggestion/hyperband /app/
+COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/suggestion/hyperband /app/
ENTRYPOINT ["./hyperband"]
@@ -7,8 +7,8 @@ import (
"google.golang.org/grpc"
"google.golang.org/grpc/reflection"

-pb "github.com/kubeflow/katib/api"
-"github.com/kubeflow/katib/suggestion"
+pb "github.com/kubeflow/katib/pkg/api"
+"github.com/kubeflow/katib/pkg/suggestion"
)

func main() {
@@ -1,10 +1,10 @@
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/katib
-WORKDIR /go/src/github.com/kubeflow/katib/suggestion/random
+WORKDIR /go/src/github.com/kubeflow/katib/cmd/suggestion/random
RUN go build -o random

FROM alpine:3.7
WORKDIR /app
-COPY --from=build-env /go/src/github.com/kubeflow/katib/suggestion/random /app/
+COPY --from=build-env /go/src/github.com/kubeflow/katib/cmd/suggestion/random /app/
ENTRYPOINT ["./random"]
4 changes: 2 additions & 2 deletions suggestion/random/main.go → cmd/suggestion/random/main.go
@@ -7,8 +7,8 @@ import (
"google.golang.org/grpc"
"google.golang.org/grpc/reflection"

-pb "github.com/kubeflow/katib/api"
-"github.com/kubeflow/katib/suggestion"
+pb "github.com/kubeflow/katib/pkg/api"
+"github.com/kubeflow/katib/pkg/suggestion"
)

func main() {
11 changes: 0 additions & 11 deletions deploy.sh

This file was deleted.

2 changes: 1 addition & 1 deletion docs/developer-guide.md
@@ -9,7 +9,7 @@
You can build all images from source.

```bash
-./build.sh
+./scripts/build.sh
```

## Implement new suggestion algorithm
4 changes: 2 additions & 2 deletions docs/getting-start.md
@@ -50,7 +50,7 @@ StudyID Name Owner RunningTrial CompletedTrial
Try Createstudy. A Study will be created and the hyperparameter search will start.

```bash
-$ katib-cli -s gpu-node2:30678 -f ../conf/random.yml Createstudy
+$ katib-cli -s gpu-node2:30678 -f ../examples/random.yml Createstudy
2018/04/03 05:16:37 connecting gpu-node2:30678
2018/04/03 05:16:37 study conf{cifer10 root MAXIMIZE 0 configs:<name:"--lr" parameter_type:DOUBLE feasible:<max:"0.07" min:"0.03" > > configs:<name:"--lr-factor" parameter_type:DOUBLE feasible:<max:"0.2" min:"0.05" > > configs:<name:"--max-random-h" parameter_type:INT feasible:<max:"46" min:"26" > > configs:<name:"--max-random-l" parameter_type:INT feasible:<max:"75" min:"25" > > configs:<name:"--num-epochs" parameter_type:INT feasible:<max:"3" min:"3" > > [] random median [name:"SuggestionNum" value:"2" name:"MaxParallel" value:"2" ] [] Validation-accuracy [accuracy] mxnet/python:gpu [python /mxnet/example/image-classification/train_cifar10.py --batch-size=512 --gpus=0,1] 2 default-scheduler <nil> }
2018/04/03 05:16:37 req Createstudy
@@ -215,7 +215,7 @@ parameterconfigs:
```

```bash
-$ katib-cli -s gpu-node2:30678 -f ../conf/random-pv.yml Createstudy
+$ katib-cli -s gpu-node2:30678 -f ../examples/random-pv.yml Createstudy
2018/04/03 05:49:47 connecting gpu-node2:30678
2018/04/03 05:49:47 study conf{cifer10-pv-test root MAXIMIZE 0 configs:<name:"--lr" parameter_type:DOUBLE feasible:<max:"0.07" min:"0.03" > > configs:<name:"--lr-factor" parameter_type:DOUBLE feasible:<max:"0.2" min:"0.05" > > configs:<name:"--max-random-h" parameter_type:INT feasible:<max:"46" min:"26" > > configs:<name:"--max-random-l" parameter_type:INT feasible:<max:"75" min:"25" > > configs:<name:"--num-epochs" parameter_type:INT feasible:<max:"3" min:"3" > > [] random median [name:"SuggestionNum" value:"2" name:"MaxParallel" value:"2" ] [] Validation-accuracy [accuracy] mxnet/python:gpu [python /mxnet/example/image-classification/train_cifar10.py --batch-size=512 --gpus=0,1] 2 default-scheduler pvc:"nfs" path:"/nfs-mnt" }
2018/04/03 05:49:47 req Createstudy
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Binary file added katib-manager
Binary file not shown.
12 changes: 0 additions & 12 deletions manager/Dockerfile

This file was deleted.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,7 +1,7 @@
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
import grpc

-import api.python.api_pb2 as api__pb2
+import pkg.api.python.api_pb2 as api__pb2


class ManagerStub(object):
File renamed without changes.
2 changes: 1 addition & 1 deletion db/interface.go → pkg/db/interface.go
@@ -12,7 +12,7 @@ import (
"strings"
"time"

-api "github.com/kubeflow/katib/api"
+api "github.com/kubeflow/katib/pkg/api"

_ "github.com/go-sql-driver/mysql"
)
2 changes: 1 addition & 1 deletion db/interface_test.go → pkg/db/interface_test.go
@@ -14,7 +14,7 @@ import (
"github.com/golang/protobuf/jsonpb"
"gopkg.in/DATA-DOG/go-sqlmock.v1"

-api "github.com/kubeflow/katib/api"
+api "github.com/kubeflow/katib/pkg/api"
)

var db_interface VizierDBInterface
2 changes: 1 addition & 1 deletion db/test/test.go → pkg/db/test/test.go
@@ -2,7 +2,7 @@ package main

import (
"fmt"
-"github.com/kubeflow/katib/db"
+"github.com/kubeflow/katib/pkg/db"
"os"
)

@@ -3,8 +3,8 @@ package earlystopping
import (
"context"
"errors"
-"github.com/kubeflow/katib/api"
-vdb "github.com/kubeflow/katib/db"
+"github.com/kubeflow/katib/pkg/api"
+vdb "github.com/kubeflow/katib/pkg/db"
"log"
"sort"
"strconv"
2 changes: 1 addition & 1 deletion earlystopping/types.go → pkg/earlystopping/types.go
@@ -3,7 +3,7 @@ package earlystopping
import (
"context"

-"github.com/kubeflow/katib/api"
+"github.com/kubeflow/katib/pkg/api"
)

const (
@@ -4,8 +4,8 @@ import (
"context"
"fmt"
"git.apache.org/thrift.git/lib/go/thrift"
-"github.com/kubeflow/katib/api"
-"github.com/kubeflow/katib/manager/modelstore/modeldb"
+"github.com/kubeflow/katib/pkg/api"
+"github.com/kubeflow/katib/pkg/manager/modelstore/modeldb"
"log"
"net"
"strconv"
@@ -1,7 +1,7 @@
package modelstore

import (
-"github.com/kubeflow/katib/api"
+"github.com/kubeflow/katib/pkg/api"
)

type ModelStore interface {
@@ -2,7 +2,7 @@ package tensorboard

import (
"bytes"
-"github.com/kubeflow/katib/api"
+"github.com/kubeflow/katib/pkg/api"
"io/ioutil"
apiv1 "k8s.io/api/core/v1"
exbeatav1 "k8s.io/api/extensions/v1beta1"
@@ -4,11 +4,11 @@ import (
"bytes"
"encoding/json"
"fmt"
-"github.com/kubeflow/katib/api"
-"github.com/kubeflow/katib/db"
+"github.com/kubeflow/katib/pkg/api"
+"github.com/kubeflow/katib/pkg/db"
dlkapi "github.com/kubeflow/katib/dlk/dlkmanager/api"
"github.com/kubeflow/katib/dlk/dlkmanager/datastore"
-wIF "github.com/kubeflow/katib/manager/worker_interface"
+wIF "github.com/kubeflow/katib/pkg/manager/worker_interface"
"io/ioutil"
"log"
"net/http"