Import cross sections from csv #41

Yaxuan-Lii · 2021-07-15T13:30:05Z

No description provided.

galore/__init__.py

galore/cross_sections.py

ajjackson · 2021-07-16T09:27:33Z

galore/cross_sections.py

+    with tarfile.open(tar_file_name) as tf:
+        with tf.extractfile(file_path) as hello:
+            data = hello.read().decode()
+    a = data.split('\r\n')


Use descriptive names for variables. It is hard to read a line of code operating on a, b, c, and d and understand what it is supposed to be doing.

The new name data_string is a bit better because it is at least "greppable". But it's also a bit misleading because data_string is not actually a string, it's a list. Maybe something like data_lines would be better, as this conveys how it was split?

data_strings would at least be better. If you read data_string[0] it looks like it indexes a single letter from a string. Whereas data_strings[0] clearly gets a longer string, which can be split.
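
To illustrate the naming suggestion, here is a minimal sketch, not the PR's actual code. The function name read_data_lines and its arguments are hypothetical, and it uses splitlines() rather than split('\r\n'), which is an assumption about the desired behaviour (it accepts both line-ending styles and leaves no trailing empty string).

```python
import tarfile


def read_data_lines(tar_file_name, member_path):
    """Return the text lines of one file stored inside a tar archive.

    Hypothetical helper: the names are chosen so that each line of code
    says what it is operating on.
    """
    with tarfile.open(tar_file_name) as archive:
        with archive.extractfile(member_path) as member_file:
            data_text = member_file.read().decode()
    # splitlines() handles '\r\n' and '\n' endings alike
    data_lines = data_text.splitlines()
    return data_lines
```

With this naming, data_lines[0] clearly reads as "the first line of the file", which can then be split into fields.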

galore/cross_sections.py

test/test_process_pdos.py

Yaxuan-Lii

Sorry, wrong push.

galore/cross_sections.py

galore/__init__.py

- Remove merge conflict
- Remove unnecessary style changes

Not that there's anything wrong with style cleanup, but it's better if it doesn't dominate the changes under review. This could be a separate PR and discussion.

galore/cross_sections.py

ajjackson

Some comments on the updated read_csv_file; I'll take a closer look at the other parts tomorrow.

galore/cross_sections.py

ajjackson

Ok, comments on the rest of the file!

galore/cross_sections.py

ajjackson

Hi Yaxuan,

I have made some comments but haven't yet tried running the new code. This looks pretty functional already, which is great.

Please do use tools to check PEP8 compliance and clean up the spacing as you go along; it makes the code easier to read and review. (And it will be needed in the end anyway.)

I'm relieved to see that not many lines of code were needed to handle the installation locations, this looks simple and robust. The next step would be to add an environment variable for a user location, but that doesn't necessarily have to be part of this PR. Cleaning this up and handling/interpolating missing energy values is a higher priority.

galore/cli/galore_get_cs.py

ajjackson · 2021-08-12T13:44:47Z

galore/cli/galore_install_data.py

+def run(reference):
+    if reference == 'Scofield' or reference == 'Yeh':
+
+        url,data_file_dir,data_file_path =galore.cross_sections.get_csv_file_path(reference)


Please follow PEP8 guidelines for spacing around , and =; it makes the code easier to read.

ajjackson · 2021-08-12T13:46:58Z

galore/cli/galore_install_data.py

+    return parser
+
+def run(reference):
+    if reference == 'Scofield' or reference == 'Yeh':


Another way to do this is if reference in ('Scofield', 'Yeh'). What you have is fine here, but the other way is worth knowing if there are more options to compare 😅

Another nice trick which is used elsewhere in the code is if reference.lower() in ('scofield', 'yeh'); this makes it case-insensitive so our user doesn't have to mess around with shift keys.
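
As a small sketch of the case-insensitive membership test (the constant KNOWN_DATASETS and the function name are made up for illustration):

```python
# Hypothetical names, for illustration only
KNOWN_DATASETS = ('scofield', 'yeh')


def is_known_dataset(reference):
    """Return True if reference names a supported dataset, ignoring case."""
    return reference.lower() in KNOWN_DATASETS
```

Adding another dataset then only means extending the tuple, rather than chaining another "or" onto the condition.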

galore/cross_sections.py

ajjackson · 2021-08-12T15:33:14Z

galore/cross_sections.py

+
+
+    if os.path.isfile(data_file_path)== True:
+        print("Data file exists.")


What if somebody wants to update to a corrected newer version?

If we don't provide some kind of force=True option, it would be helpful to print where the file exists so it can be examined/deleted.
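
One possible shape for this, as a sketch only: the function name should_download and the force argument are hypothetical, not part of the PR.

```python
import os


def should_download(data_file_path, force=False):
    """Decide whether the data file needs to be (re-)downloaded.

    Hypothetical helper: with force=True an existing file is replaced;
    otherwise the user is told where the existing file lives.
    """
    if os.path.isfile(data_file_path) and not force:
        print("Data file already exists at {}. "
              "Delete it or use the force option to download again."
              .format(data_file_path))
        return False
    return True
```

Note that os.path.isfile already returns a bool, so the comparison with True is not needed.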

ajjackson · 2021-08-12T15:34:10Z

galore/cross_sections.py

+
+        try:
+            os.mkdir(data_file_dir)
+        except:


What is this except catching? "Bare" exceptions like this can catch any error, including ones we don't expect. It's best to be very specific about which exceptions are ok to ignore.
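
If the intent is to ignore the case where the directory already exists (an assumption on my part), a more specific version might look like this sketch; ensure_directory is a made-up name.

```python
import os


def ensure_directory(data_file_dir):
    """Create data_file_dir, tolerating only the 'already exists' case."""
    try:
        os.mkdir(data_file_dir)
    except FileExistsError:
        # The directory is already there, which is fine. Any other
        # problem (e.g. PermissionError) still propagates to the caller.
        pass
```

os.makedirs(data_file_dir, exist_ok=True) is a one-line alternative that also creates missing parent directories.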

galore/cross_sections.py

ajjackson · 2021-08-26T08:46:57Z

galore/cross_sections.py

+    electrons_numbers = np.array(
+        [value for key, value in electron_counts_by_subshells.items() if 's' in key])
+    # get highest obital cross section of obital s
+    highest_obital_cross_section = s_cross_sections[-1]/electrons_numbers[-1]


This would raise an IndexError if there is no s orbital data. (Say, because we are looking at very low energy and only a p orbital is available.)
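
A guard along these lines would avoid the crash. This is a sketch under assumptions: the function name is hypothetical, plain lists stand in for the numpy arrays, and returning None for "no s data" is just one option; how the caller should handle that case is a separate decision.

```python
def highest_s_cross_section_per_electron(s_cross_sections,
                                         electron_counts_by_subshells):
    """Per-electron cross section of the highest s subshell, or None.

    Hypothetical helper: returns None instead of raising IndexError
    when no s-orbital data is available.
    """
    s_electron_counts = [
        count for subshell, count in electron_counts_by_subshells.items()
        if 's' in subshell]
    if len(s_cross_sections) == 0 or len(s_electron_counts) == 0:
        return None
    return s_cross_sections[-1] / s_electron_counts[-1]
```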

ajjackson · 2021-09-01T11:14:58Z

galore/cli/galore_get_cs.py

@@ -54,28 +54,33 @@ def get_parser():
        Space-separated symbols for elements in material.""")

    parser.add_argument('--dataset', type=str,
-                        help='You can enter "Scofield" or "Yeh"')
+                        help=
+        """You can enter 'Scofield' or 'Yeh' """)


This isn't clearer than the previous formatting. I think autopep8 is sneaking in again?

It would still be good to use choices here; then argparse is responsible for handling bad user input.
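
For reference, a minimal sketch of the choices approach. The function name build_dataset_parser and the help text are placeholders; only the --dataset option and its two values come from the PR.

```python
import argparse


def build_dataset_parser():
    """Hypothetical parser showing only the --dataset option."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--dataset', type=str,
                        choices=('Scofield', 'Yeh'),
                        help='Cross-section dataset to use')
    return parser
```

With choices set, argparse lists the allowed values in the help output and exits with an error message if the user passes anything else.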

ajjackson · 2021-09-01T11:17:39Z

galore/cli/galore_get_cs.py

-    logging = galore.cross_sections.cross_sections_info(cross_sections)
-    logging.info("Photoionisation cross sections per electron:")
+    if cross_sections is None:
+        pass


Which scenario is this catching? Could there be a more helpful logging message in that case? Users don't like it when programs do nothing.
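
Something like the following sketch would at least tell the user what happened. The function name, the message wording and the assumption that cross_sections is a dict keyed by element are all mine; the right message depends on which scenario actually produces None.

```python
import logging


def log_cross_sections(cross_sections):
    """Log the cross sections, or warn when there is nothing to report."""
    if cross_sections is None:
        logging.warning("No cross-section data was found; nothing to "
                        "report. Check the elements, energy and dataset.")
        return
    logging.info("Photoionisation cross sections per electron:")
    for element, orbitals in cross_sections.items():
        logging.info("%s: %s", element, orbitals)
```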

ajjackson · 2021-09-02T09:37:21Z

galore/cross_sections.py

+                cross_sections, closest_energy = _cross_sections_from_csv_data(
+                    energy, data, dataset)
+                cross_sections_dict[element] = cross_sections
+            print('The closest energy of input is {energy} keV'.format(


Use logging instead of print here: the user should have a record of this.
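
A minimal sketch of the same message sent through logging (the function name is hypothetical and the wording is lightly adjusted):

```python
import logging


def log_closest_energy(closest_energy):
    """Record which tabulated energy was used for the lookup."""
    logging.info('The closest energy to the input is %s keV',
                 closest_energy)
```

Passing the value as an argument, rather than formatting the string first, lets the logging module skip the formatting when the message is filtered out.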

ajjackson

The general approach with sample test data looks like a good one for the current data setup.

The new test file looks functional, but isn't running at the moment because the filename needs tweaking.

ajjackson · 2021-09-02T15:13:26Z

test/new_test.py

@@ -0,0 +1,72 @@
+import numpy as np


This should have a different name than new_test.py. It won't be "new" for long, and then it won't be easy to guess what it tests without looking inside.

More importantly, the tests aren't actually running on the CI at the moment: I don't see them in the logs here https://github.com/SMTG-UCL/galore/pull/41/checks

That's because the test framework expects files containing tests to begin with the word "test" - this needs to be renamed to something like "test_csv_data.py" so they are found by setup.py test.

ajjackson · 2021-09-02T15:23:32Z

test/new_test.py

+
+    ##check the function goes well with above datatable
+        cross_sections_scofield,_= _cross_sections_from_csv_data(800,data,'Yeh')
+        self.assertAlmostEqual(cross_sections_scofield['s'], 0.00145)


Should this variable be cross_sections_yeh? These tests are nearly duplicated, which suggests they might be more cleanly implemented by calling a common function.

(This is not urgent, I can clean it up later if need be.)
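
Roughly what I have in mind, as a sketch rather than a drop-in replacement. The stand-in function and the data table below are invented so the example runs on its own; the Scofield value is made up, and only the Yeh value 0.00145 is taken from the test above.

```python
import unittest


def fake_cross_sections_from_csv_data(energy, data, dataset):
    """Hypothetical stand-in for _cross_sections_from_csv_data."""
    return data[dataset], energy


class TestCsvData(unittest.TestCase):
    # Invented sample table; the Scofield number is a placeholder
    data = {'Scofield': {'s': 0.0021}, 'Yeh': {'s': 0.00145}}

    def check_s_cross_section(self, dataset, expected):
        """Shared check used by the per-dataset tests."""
        cross_sections, _ = fake_cross_sections_from_csv_data(
            800, self.data, dataset)
        self.assertAlmostEqual(cross_sections['s'], expected)

    def test_scofield(self):
        self.check_s_cross_section('Scofield', 0.0021)

    def test_yeh(self):
        self.check_s_cross_section('Yeh', 0.00145)
```

Each test then names its dataset exactly once, so a variable cannot end up labelled with the wrong dataset.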

ajjackson · 2021-09-02T15:35:31Z

test/new_test.py

+    def test_scofield_directory_and_path(self):
+        ## Simulate expected correct directory and path for scofield data
+        correct_path = os.path.join(correct_directory, 'Scofield_csv_database.zip')
+        _,_,Scofield_file_path = get_csv_file_path('Scofield')


This should be scofield_file_path (PEP8)

Yaxuan Li and others added 14 commits June 28, 2021 14:57

add test for process_pdos

0a0270c

fix the imports

d5c8552

removeed irrelevant files

553738f

import is fixed

5300d62

try to remove irrelevant fils second time

dab091b

remove irrelevant files second time

d69d61b

delet irrelevant files

7984c15

remove irrelevant files

66b1c6f

delete .DS_Store

4fdafa7

Modified test_process_pdos.py

dca2d49

add test.py

eb079f6

Restore some files that were accidentally deleted

8190183

This reverts part of commit 7984c15 and restores a newline that got lost while restoring test/test.py

use flake8 to optimise format

946efaa

import cross-sections from CSV archives

cd0f3c0

ajjackson reviewed Jul 16, 2021


galore/__init__.py (outdated)

ajjackson reviewed Jul 16, 2021
