Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 1324 netcdf parallel #1388

Merged
merged 49 commits into from
Jun 25, 2015
Merged

Issue 1324 netcdf parallel #1388

merged 49 commits into from
Jun 25, 2015

Conversation

doutriaux1
Copy link
Contributor

No description provided.

@doutriaux1
Copy link
Contributor Author

@dlonie @aashish24 this PR has evolved into something much diff than its original name.

It does turn on PARALLEL capabilities for both hdf5/netcdf4 and also cdms2.

BUT cdms2 // writing is still very much in development, // writing must be done "the right" so while the features are here for experts, the newbie will likely break its code. I will keep exploring the // aspect and improve both cdms2 and documentation on how to use it in this context.

What this PR is really about is
1- Many build fixes on various Lniux, most difficult one being rhea
2- newer NetCDF that gave us 2 order of magnitude speed gain on rhea (along with some configuration tweaks withini rhea)
3: 0d write bug fix in cdms2, string array bug fixes.

all of this is VERY important and high priority for ACME, alsofixes master build on linux rh and ubunutu for me.

possible side effect: loss of dap client (bug in the newer netcdf) I'm investigating this

@aashish24
Copy link
Contributor

@doutriaux1 one minor request. Can you squash some of the commits such as "getting there" so that history is bit cleaner and its easy to understand your changes. Also, we may need to get rid of building ffi (if not in this branch then in another PR when we update the add_cdat_package and deps cmake macros).

@doutriaux1
Copy link
Contributor Author

@aashish24 found a bug (ctest thank you) so put it in now. Let's keep ffi in, when we you use_system_packages it will just skip it, otherwise people like susannah with no admin won't be able to build it.

@doutriaux1
Copy link
Contributor Author

@aashish24 there's only one "getting there" and it's pretty descriptive there. I promise to work even harder on better commit messages in the future. Feel free to squash commits to your liking.

@doutriaux1
Copy link
Contributor Author

@aashish24 do you know of a quick tool to squash commits?

@doutriaux1
Copy link
Contributor Author

looking into test1 failure

@doutriaux1
Copy link
Contributor Author

ok last not dap related test passes see mac results at:
https://open.cdash.org/buildSummary.php?buildid=3859039

@alliepiper
Copy link
Contributor

Squashing commits: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History

git rebase -i [commitish] is the easiest way. Takes less than a minute for a simple squash.

@aashish24
Copy link
Contributor

thanks @doutriaux1 , you can use command referred by @dlonie. You can pick a commit that you want to use as a root commit (from that point you want to change history) and then it will show a list of commits something like

pick ff225e6 Fix #1029 - Update ffmpeg from 0.11.1 to 2.7
pick 1e21fec Add whitespace to trigger rebuild

You can change pick to s for squash. You cannot squash top commit in that list if that makes sense.

@doutriaux1
Copy link
Contributor Author

@dlonie yep I know that but it requires some thinking 😝 I was hoping for a gui-based once (I know I must be getting old when I ask for a GUI tool...) for less brain power drain.

@alliepiper
Copy link
Contributor

It's really quite simple -- run the command, change pick to f(ixup) (or s(quash) depending on whether you want to keep the second commit message), and it's good to go. Won't even have to worry about conflicts unless you rearrange the commits.

@doutriaux1
Copy link
Contributor Author

@dlonie it's no problem. Plus it's easier now that I'm at work, got the bigger screens can see multiple windows at once etc... Will squash in a minute.

@doutriaux1 doutriaux1 force-pushed the issue_1324_netcdf_parallel branch from 8b568fe to 9b22d85 Compare June 15, 2015 16:26
@doutriaux1
Copy link
Contributor Author

@aashish24 is it better now?

@aashish24
Copy link
Contributor

Its looking much better I think. It could be improved bit more but for now it will work.

hdf5 needed to be built with --enable-parallel
hdf5 and pnetcdf needed to be built with -fPIC
netcdf and hdf5 needed to be configured with CC=mpicc hence new cdatmpi_configure_step.cmake file
hdf5 parallel had changed/renamed/removed some functions and wasn't compatible anymore
pyopenssl suddenly refused to build...
had to enable six/cffi/ffi and add pycparser
it would leads to an error, now we remove the file if it's here and 'w'
mode and then create the file.
also turn off classic to ensure using netcdf4
so if deflate/shuffle are 0 AND classic is 0 we use netcdf4 mpiio

before shuffle and defalte 0 triggered netcdf3

we might want to be able to use NetCDF4 classic format, in which case we
will need to implement a cdms2.useNetCDF4(True) to force using netcdf4
no mattter what
the file opened by other process after the original (bad) one was removed
thsi seems to fix it
This may look innocent, but is important change in the way cdms2 works
File will now be produced as NetCDF4 by default even when setting
defalte and shuffle to zero
In order to get NetCDF3 files you need shuffle/defalte/deflatelevel set to 0
you also need netcdf4 set to 0
and you need classic set to 1 (which is the default)
@doutriaux1 doutriaux1 mentioned this pull request Jun 18, 2015
PATCH_COMMAND ${netcdf_PATCH_COMMAND}
CONFIGURE_COMMAND ${CMAKE_COMMAND} -DINSTALL_DIR=<INSTALL_DIR> -DWORKING_DIR=<SOURCE_DIR> -D CONFIGURE_ARGS=${netcdf_configure_args} -P ${cdat_CMAKE_BINARY_DIR}/cdat_configure_step.cmake
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you took out this?

@aashish24
Copy link
Contributor

@doutriaux1 looks like things are failing in this branch. Do you know why?

@aashish24
Copy link
Contributor

Successfully updated your environment to use UVCDAT
(changes are valid for this session/terminal only)
Version: 2.2.0-99-gb7f1a56
Location: /home/kitware/buildbot/buildbot-slave/uvcdat-test-laptop-linux-release/build/install
Traceback (most recent call last):
File "/home/kitware/buildbot/buildbot-slave/uvcdat-test-laptop-linux-release/source/testing/vcs/test_vcs_basic_vectors.py", line 25, in
import vcs
File "/home/kitware/buildbot/buildbot-slave/uvcdat-test-laptop-linux-release/build/install/lib/python2.7/site-packages/vcs/init.py", line 32, in
from utils import *
File "/home/kitware/buildbot/buildbot-slave/uvcdat-test-laptop-linux-release/build/install/lib/python2.7/site-packages/vcs/utils.py", line 3, in
import cdtime
ImportError: No module named cdtime

@doutriaux1
Copy link
Contributor Author

not sure works for me on mac(s), rhea, ubnutu.

@doutriaux1
Copy link
Contributor Author

@aashish24 I will do a fresh build just to be sure.

@doutriaux1
Copy link
Contributor Author

@aashish24 I got it the switch to CMake build seems to have changed where libnetcdf goes... it's now (on my ubuntu at least) under:

./Externals/lib/x86_64-linux-gnu/libnetcdf.so.7
./Externals/lib/x86_64-linux-gnu/libnetcdf.settings
./Externals/lib/x86_64-linux-gnu/libnetcdf.so.7.2.1
./Externals/lib/x86_64-linux-gnu/libnetcdf.so

Will need to go back to mac to double check what happen there

@doutriaux1
Copy link
Contributor Author

ok on my mac it's at the "right" place Externals/lib... Will revert the change for CMake build

@aashish24
Copy link
Contributor

@doutriaux1 bunch of tests are still waiting. Mind looking into it?

@doutriaux1
Copy link
Contributor Author

@aashish24 I know I fixed this somewhere... Probably didn't push the branch, will push soon.

@doutriaux1
Copy link
Contributor Author

@aashish24 seems to pass on garant, no idea what the test failure is about but I don't think it has anything to do with netcdf 😉

@doutriaux1
Copy link
Contributor Author

@dlonie I also found another bunch of object that are preciously kept by vcs forever in #1424 fixing this might help you too.

@durack1
Copy link
Member

durack1 commented Jun 24, 2015

@doutriaux1 the issue with garant can be fixed with the sleep(1) added in #1393 - a rebase should solve that..

@aashish24
Copy link
Contributor

thanks @doutriaux1 dashboards are looking pretty good now.

@aashish24
Copy link
Contributor

👍 I didn't verify the parallel code but in general it looks good to me.

aashish24 added a commit that referenced this pull request Jun 25, 2015
@aashish24 aashish24 merged commit a8d30d9 into master Jun 25, 2015
@aashish24 aashish24 deleted the issue_1324_netcdf_parallel branch June 25, 2015 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants