This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Simplified Job Templates for NFS configured OpenPAI #2238

Closed
5 tasks done
scarlett2018 opened this issue Feb 28, 2019 · 11 comments
Comments

@scarlett2018
Member

scarlett2018 commented Feb 28, 2019

What would you like to be added:
A simplified job submission page for simple job types.

Why is this needed:
The current job submission page is too complex for most users to understand.

Goal

  • The 2 TensorFlow job types listed below, for jobs running on an OpenPAI instance that has NFS configured.

Non-Goals

  • Jobs on clusters that do not have NFS configured
  • Other Job Types

UX Mock Up (need to update with latest discussed changes)

Sample Job Type - Tensorflow-SingleNode
[screenshot]

Sample Job Type - Tensorflow-Distributed
[screenshot]


Exit Criteria

  • 3 job type examples that can run end-to-end (@abuccts Anbang)

Dependency
Plugin changes needed (issue to be created by @Gerhut):

  • server-side change to provide an endpoint per plugin: plugin configuration changes to pass the plugin dictionary
  • plugin versioning
  • submit v1 json to initialize form
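
For reference, a minimal v1 job JSON of the sort the plugin might submit to initialize its form could look like the following sketch. Field names follow the OpenPAI v1 job configuration format; the job name, image, and command are illustrative assumptions, not taken from this issue.

```json
{
  "jobName": "tensorflow-singlenode-example",
  "image": "tensorflow/tensorflow:1.12.0-gpu-py3",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 1,
      "command": "python train.py"
    }
  ]
}
```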

Job types needed:

  • Job type examples that can run end-to-end (@abuccts Anbang)
@fanyangCS
Contributor

Looks great! Please make the mount path UI configurable, i.e., we can configure it to disappear. Some deployments may not have a storage system that is suitable for mounting (e.g., HDFS).

@squirrelsc
Member

For JobType, are there any details about it? There could also be a clone button to clone from one's own jobs, or even to search more jobs.

@squirrelsc
Member

For extensibility, the resource section should be customizable and should support different resource types for clusters with different server models.

@squirrelsc
Member

squirrelsc commented Mar 1, 2019

  1. Directory paths inside Docker should be configurable, so that users don't need to update their code to align with OpenPAI.
  2. The paths may need credentials to access. Supporting different types of resources is a difficult part, but it also needs to be considered in the design.
  3. We should tell users, or let them choose, whether the files are copied or mounted, so that they know the disk space impact and whether they can get data out through those folders.

@scarlett2018
Member Author

For JobType, are there any details about it? There could also be a clone button to clone from one's own jobs, or even to search more jobs.

Created #2241 for this input; it will not be covered in the current issue.

@scarlett2018
Member Author

For extensibility, the resource section should be customizable and should support different resource types for clusters with different server models.

Yes, this will be part of the current end-to-end story. The admin is the one who customizes the resource options.

@scarlett2018
Member Author

scarlett2018 commented Mar 1, 2019

  1. Directory paths inside Docker should be configurable, so that users don't need to update their code to align with OpenPAI.
  2. The paths may need credentials to access. Supporting different types of resources is a difficult part, but it also needs to be considered in the design.
  3. We should tell users, or let them choose, whether the files are copied or mounted, so that they know the disk space impact and whether they can get data out through those folders.

Created #2247 for this input; it will not be covered in the current issue.

@xwzheng1020
Contributor

xwzheng1020 commented Mar 1, 2019

Maybe the JobType can be grouped into "Single Node" and "Distributed Nodes", with each group containing entries like "Tensorflow-py36-1.3". Then the Properties section can be removed.

@scarlett2018
Member Author

@xwzheng1020 - yes, agreed. The mock-up has been updated; the Properties section has been removed.

@scarlett2018 scarlett2018 changed the title Simplest Job Submission Page for Job (types) that will only run on 1 node Simplified Job Submission Page Mar 13, 2019
@scarlett2018 scarlett2018 changed the title Simplified Job Submission Page Simplified Job Templates for NFS configured OpenPAI Mar 19, 2019
@Gerhut
Member

Gerhut commented Mar 22, 2019

Test case:

  1. Configure service-configuration.yaml according to https://github.com/Microsoft/pai/tree/master/contrib/submit-nfs-job#install with uri: https://gerhut.github.io/store/simple-nfs-job.js
  2. Restart webportal service
  3. Validate that there is a "Submit NFS Job" item in the plugin menu.
  4. Enter the plugin without logging in → the login page appears.
  5. Enter the plugin as a logged-in user → validate that "Mount Directories" exists (which depends on the storage configuration existing in the k8s config map) and that the submit button (at the bottom) is enabled.
  6. Job submission:
    a. Submit a simple NFS job; the job should run successfully.
  7. Job NFS storage:
    a. Submit a TensorFlow single-node job with a changed work directory.
    b. The job should run successfully.
    c. The new work directory should exist on the NFS server and include some training data.
    d. The original dataset should be in the data directory of the NFS server.
  8. Job cloning:
    a. Submit a TensorFlow distributed job with different server count / GPU count / commands for ps and workers.
    b. The job details page appears.
    c. Click "Clone".
    d. Check that the plugin page appears and that the type / server count / GPU count / commands for ps and workers are the same as filled in step a.
    The job can also be submitted, since the job name is regenerated.
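
For step 1, the webportal plugin registration in service-configuration.yaml would look roughly like the sketch below, based on the linked submit-nfs-job install instructions. The title string is an assumption; only the uri value is taken from the test case above.

```yaml
webportal:
  plugins:
  # Registers the NFS job submission plugin in the webportal plugin menu.
  - title: Submit NFS Job
    uri: https://gerhut.github.io/store/simple-nfs-job.js
```

Restarting the webportal service (step 2) makes the new plugin entry appear in the menu.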

@mzmssg
Member

mzmssg commented Mar 25, 2019

Please set up environments before testing; refer to #2204.
