env_mach_specific.xml update needed on Cori-KNL? #2596

Closed
worleyph opened this issue Oct 26, 2018 · 15 comments

@worleyph
Contributor

On a recent master (v1.1.0-470-gce4cf01b2) case.setup on Cori-KNL is failing with

 ERROR: module command /opt/modules/default/bin/modulecmd python load python/2.7-anaconda-5.2 craype PrgEnv-intel cray-mpich failed with message:
 python/2.7-anaconda-5.2(12):ERROR:150: Module 'python/2.7-anaconda-5.2' conflicts with the currently loaded module(s) 'python/2.7-anaconda-4.4'
 python/2.7-anaconda-5.2(12):ERROR:102: Tcl command execution failed: conflict $name

Adding

  <command name="rm">python/2.7-anaconda-4.4</command>

right before

  <command name="load">python/2.7-anaconda-5.2</command>

eliminates the problem for me.
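
The workaround above amounts to placing an rm command directly ahead of the versioned load. A minimal Python sketch of that edit follows. It assumes a simplified `<modules>` fragment (the real env_mach_specific.xml has more surrounding structure), and the helper name `add_rm_before_load` is hypothetical, not part of CIME.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the Cori-KNL module block; the surrounding
# structure of the real env_mach_specific.xml is omitted (assumption).
FRAGMENT = """
<modules>
  <command name="load">python/2.7-anaconda-5.2</command>
  <command name="load">craype</command>
</modules>
"""


def add_rm_before_load(modules, target, conflicting):
    """Insert an 'rm' of `conflicting` right before the 'load' of `target`.

    Hypothetical helper for illustration; returns True if an insert happened.
    """
    for index, cmd in enumerate(list(modules)):
        if cmd.get("name") == "load" and (cmd.text or "").strip() == target:
            rm = ET.Element("command", {"name": "rm"})
            rm.text = conflicting
            modules.insert(index, rm)
            return True
    return False


modules = ET.fromstring(FRAGMENT)
changed = add_rm_before_load(
    modules, "python/2.7-anaconda-5.2", "python/2.7-anaconda-4.4"
)
# List the resulting commands in document order.
commands = [(c.get("name"), c.text.strip()) for c in modules]
print(commands)
```

The rm entry is inserted at the index of the matching load, so it runs first and the order of every other command is unchanged.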

@ndkeen
Contributor

ndkeen commented Oct 26, 2018

It looks like a commit on Sep 27th brought in a module load for a specific python version on cori-knl (@singhbalwinder):

commit aecba157fe20a7902368f0d9c3819b677d935fa3
Author: Balwinder Singh <[email protected]>
Date:   Thu Sep 27 11:53:17 2018 -0700

    Fixes netcdf data types and loads python for Cori
    
    [BFB] - Bit-For-Bit

I was unaware of this. The way it is now, if you have a different version of python loaded in your env, the E3SM module commands will fail (as you noted).
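
The failure mode can be modelled in a few lines of Python. This is a deliberately simplified sketch of what the modulefile's `conflict $name` line does, under the assumption that any other loaded version of the same module name conflicts. It is not how modulecmd is implemented, and `find_conflicts` is a hypothetical name.

```python
def find_conflicts(requested, loaded):
    """Return loaded modules that share `requested`'s name but differ in version.

    Simplified model (assumption): the part before '/' is the module name.
    """
    name = requested.split("/")[0]
    return [m for m in loaded if m.split("/")[0] == name and m != requested]


requested = "python/2.7-anaconda-5.2"

# A different python version is already loaded: the load would fail.
clash = find_conflicts(requested, ["python/2.7-anaconda-4.4", "craype"])

# No python loaded, or the requested version already loaded: no conflict.
none_loaded = find_conflicts(requested, ["craype"])
same_loaded = find_conflicts(requested, ["python/2.7-anaconda-5.2"])

print(clash, none_loaded, same_loaded)
```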

Questions before we can make a PR to address:

a) Do we even need python loaded? It wasn't there before. Should it also be loaded on other machines?
b) Do we need a specific version loaded? The one being used is curiously not the default.

For now, to get around this, you can just make sure you have no python loaded, or the version requested (which is NOT the default on the machine):

module rm python
module load python/2.7-anaconda-5.2
cori10% module avail python

---------------------------------------------------------------------------------- /usr/common/software/modulefiles ----------------------------------------------------------------------------------
python/2.7-anaconda              python/2.7-anaconda-5.2          python/3.6-anaconda-4.4
python/2.7-anaconda-4.4(default) python/3.5-anaconda              python/3.6-anaconda-5.2
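
To read the default out of a listing like the one above programmatically, a small Python sketch can help. It assumes the `(default)` suffix format shown in this output, and `parse_avail` is a hypothetical helper name.

```python
# Module names copied from the `module avail python` listing above.
AVAIL = """\
python/2.7-anaconda              python/2.7-anaconda-5.2          python/3.6-anaconda-4.4
python/2.7-anaconda-4.4(default) python/3.5-anaconda              python/3.6-anaconda-5.2
"""

DEFAULT_TAG = "(default)"


def parse_avail(text):
    """Return (list of module names, name tagged as default or None)."""
    names, default = [], None
    for token in text.split():
        if token.endswith(DEFAULT_TAG):
            # Strip the tag and remember this entry as the default.
            token = token[: -len(DEFAULT_TAG)]
            default = token
        names.append(token)
    return names, default


names, default = parse_avail(AVAIL)
print(default)
```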

@worleyph
Contributor Author

Thanks. Note that python appears to be loaded by default currently (I do not have this in any of my local rc files), and is python/2.7-anaconda-4.4 (not python/2.7-anaconda-5.2).

@ndkeen
Contributor

ndkeen commented Oct 26, 2018

Well, as you can see from the "module avail" output, "python/2.7-anaconda-4.4" is the current default. However, what a user actually has loaded when logging in (as seen with module list) is not always clear. Obviously some folks (including myself) have modules that are loaded at login. Some modules might be changed because of other module commands. And NERSC might make changes as well.

@singhbalwinder
Contributor

I need this version of python for one of the tests (PGN) that I added recently. I thought I chose the "default" version of python, but I do not recall that clearly. I think the default version should also work for my test. I will try that and create a PR to fix it. I will add a module rm python command so that any user-loaded python is removed before this version is loaded.

@ndkeen
Contributor

ndkeen commented Oct 26, 2018

Thanks Balwinder. Could you explain why python is needed for the test?
Would this also need to be handled on other machines?

Since CIME uses python, it may not be inert software, i.e. it wouldn't surprise me if something behaves differently when switching python versions. If we do need python, maybe we should ask @jgfouca or others if there are concerns regarding specific python versions (ones to be avoided or favored)?

When deciding whether we should choose a specific version of a module:
a) It's nice when the version number doesn't matter, as changes to the system will most likely not affect us -- but when changes to the system do affect us, it would happen without our knowing (i.e. if the default version changes).
b) Picking a specific version gives us more control -- EXCEPT when the system decides to remove that version and we are forced to make a change.

One option for (a) is to simply module rm python and then load it again, which should get you the default on all NERSC machines. Would that work?

Interestingly, the repo I've been testing most recently was from Sep 27th, but I must have cloned it just before this change was committed, as I don't have this (or I would have caught it sooner).

@singhbalwinder
Contributor

I did ask @jgfouca about adding python to the machine files before adding it, and he told me that it is fine to do that (of course, like me, he wasn't expecting that it would break the compilation this way).

I always thought it is better to explicitly state the versions of the compilers/libraries when we load them, as otherwise it will be hard to figure out if something changes without our knowledge. For example, if we load a compiler using module load intel, it will load the "default" version. The "default" version can change at any time without our knowledge, which can cause the model to produce non-BFB answers. So explicitly specifying versions avoids these scenarios, which can sometimes be hard to figure out given so many dependencies.
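
One way to see which loads depend on the system default is to flag module names that carry no version. A rough Python sketch follows. It assumes a name without a `/` is unpinned, and it uses the module list from the failing modulecmd line in the first comment rather than the full machine file. `unpinned` is a hypothetical helper name.

```python
def unpinned(modules):
    """Return module names that have no explicit '/version' part."""
    return [m for m in modules if "/" not in m]


# Modules named in the failing `modulecmd python load ...` line above.
requested = ["python/2.7-anaconda-5.2", "craype", "PrgEnv-intel", "cray-mpich"]

# These would resolve to whatever the system default is at load time.
floating = unpinned(requested)
print(floating)
```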

I would prefer doing module rm python and then loading a specific Python version (which can also be the default) as long as it is compatible. Please let me know if you see any downside in doing this.

@rljacob
Member

rljacob commented Oct 26, 2018

Because CIME needs python before you can run any CIME commands, CIME should not be loading python at all. It's the user's responsibility to have python loaded before running any CIME commands.

@ndkeen
Contributor

ndkeen commented Oct 26, 2018

Well, take git for example. We load the git module to allow for saving the repo version in log files. It was decided that we do not care what version of git it is, therefore we load the default. If what you need is very basic, then the default should work and no version is required. Certainly, we would always specify the version for most modules (such as the compilers!).

Rob: yea I thought of that as well. Without some version of python available, then very little would work in E3SM. I guess it depends on what is needed for this test and how it might interact with the version used for CIME. So you are suggesting that we should not be loading python at all? I was kinda thinking that myself. Maybe the right solution here is to take that command out (ie don't load any python) and we just require that python be loaded before beginning.

@singhbalwinder
Contributor

Because CIME needs python before you can run any CIME commands, CIME should not be loading python at all. It's the user's responsibility to have python loaded before running any CIME commands.

That makes sense. I need some python libraries to run my test, and they are only available in that conda package. I didn't know how to make them available to the test other than by loading this package via the machine files. Even if we load that conda package before running the test on the login node, that package might not be available on the compute nodes. Is there a way to get around that? If there is no simple fix, I will revert this change so that the machine files do not remove/load any python until we find another way to handle this.

@ndkeen
Contributor

ndkeen commented Oct 26, 2018

Currently, python is certainly being used on the compute nodes. I assume that it would be the version that you have loaded in your env (I don't know the details of this). But if what you need turns out to be only in a specific version of python, then maybe we just require this be loaded before running any E3SM commands.

@rljacob
Member

rljacob commented Oct 26, 2018

You'll have to include how to get those libraries in the documentation for the test. This is a consequence of having a test with a lot of dependencies, which is new for create_test. So yes, revert the loading of python.

@rljacob
Member

rljacob commented Oct 26, 2018

Getting back to the problem that started this thread: the e3sm_developer test suite is running fine on cori-knl according to cdash. So why can't Pat run?

@ndkeen
Contributor

ndkeen commented Oct 26, 2018

It also fails for me in the same way. It would just depend on what python version you have in your env. It would probably also work if you had NO version of python loaded.

OK, I just tried unloading python and this worked.
So I think there must always be a system-level python. Yep:

cori11% which python
/usr/common/software/python/2.7-anaconda-4.4/bin/python
cori11% module unload python
cori11% which python
/usr/bin/python

So unloading python will still let you run E3SM. I suppose that's yet another option.
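
The same check can be made from inside python itself, which is handy when it is unclear which interpreter a script ended up with. A minimal sketch using only the standard library follows. `python_report` is a hypothetical helper name, and the output depends entirely on the environment it runs in.

```python
import shutil
import sys


def python_report():
    """Describe the running interpreter and what `python` resolves to on PATH."""
    return {
        # Interpreter actually executing this script.
        "running": sys.executable,
        # Equivalent of `which python`; None if nothing is found on PATH.
        "on_path": shutil.which("python"),
        "version": "%d.%d.%d" % sys.version_info[:3],
    }


if __name__ == "__main__":
    for key, value in python_report().items():
        print("%s: %s" % (key, value))
```

Running it before and after module unload python should show the on_path entry change in the same way as the which python output above.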

@jhkennedy
Contributor

@singhbalwinder this issue is why I'm working on a CIME-compliant way of loading conda/pyenv environments that add in needed external python dependencies (like numpy for your test). I'd suggest not changing the machine python module commands until I've got that solution nailed down.

@singhbalwinder
Contributor

Sounds great @jhkennedy ! I have created a PR to revert this change.

singhbalwinder added a commit that referenced this issue Oct 26, 2018
Removes python from Cori machine files

Fixes #2596

[BFB] - Bit-For-Bit
singhbalwinder added a commit that referenced this issue Oct 26, 2018
Removes python from Cori machine files

Fixes #2596

[BFB] - Bit-For-Bit
jgfouca pushed a commit that referenced this issue Jan 16, 2019
Removes python from Cori machine files

Fixes #2596

[BFB] - Bit-For-Bit
jgfouca pushed a commit that referenced this issue Jan 18, 2024
…sion-nano-fix

Automatically Merged using E3SM Pull Request AutoTester
PR Title: EAMxx: fix setting of FillValue based on fp precision
PR Author: bartgol
PR LABELS: I/O, AT: AUTOMERGE, bugfix